Class MarkSweepGarbageCollector

  • All Implemented Interfaces:
    BlobGarbageCollector

    public class MarkSweepGarbageCollector
    extends java.lang.Object
    implements BlobGarbageCollector
    Mark and sweep garbage collector. Stores its intermediate state on the file system while a collection is in progress, so that very large data sets can be handled. This class is not thread safe.
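    To illustrate why keeping state on disk helps with large data sets, here is a minimal sketch. It assumes blob ids are written one per line to sorted temporary files and then compared in a single streaming pass, so neither input has to fit in memory. The names FileBackedDiffSketch and diff are hypothetical and not part of this class; the actual on-disk layout may differ.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the collector's actual implementation.
final class FileBackedDiffSketch {

    /**
     * Streams two files of sorted, unique ids (one per line) and returns the
     * ids present in 'available' but absent from 'marked'. A real collector
     * would likely stream the result to a file as well.
     */
    static List<String> diff(Path available, Path marked) throws IOException {
        List<String> unused = new ArrayList<>();
        try (BufferedReader a = Files.newBufferedReader(available, StandardCharsets.UTF_8);
             BufferedReader m = Files.newBufferedReader(marked, StandardCharsets.UTF_8)) {
            String id = a.readLine();
            String ref = m.readLine();
            while (id != null) {
                int cmp = (ref == null) ? -1 : id.compareTo(ref);
                if (cmp < 0) {          // no matching reference: id is unused
                    unused.add(id);
                    id = a.readLine();
                } else if (cmp == 0) {  // id is referenced: advance both files
                    id = a.readLine();
                    ref = m.readLine();
                } else {                // reference to an id that is not available
                    ref = m.readLine();
                }
            }
        }
        return unused;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("gc-sketch");
        Path available = Files.write(dir.resolve("available"), Arrays.asList("a", "b", "c", "d"));
        Path marked = Files.write(dir.resolve("marked"), Arrays.asList("b", "d"));
        System.out.println(diff(available, marked));
    }
}
```

    Because both inputs are sorted, the comparison needs only one line of each file in memory at a time.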
    • Field Detail

      • LOG

        public static final Logger LOG
      • TEMP_DIR

        public static final java.lang.String TEMP_DIR
    • Constructor Detail

      • MarkSweepGarbageCollector

        public MarkSweepGarbageCollector​(BlobReferenceRetriever marker,
                                         GarbageCollectableBlobStore blobStore,
                                         java.util.concurrent.Executor executor,
                                         java.lang.String root,
                                         int batchCount,
                                         long maxLastModifiedInterval,
                                         boolean checkConsistencyAfterGc,
                                         boolean sweepIfRefsPastRetention,
                                         @Nullable java.lang.String repositoryId,
                                         @Nullable Whiteboard whiteboard,
                                         @Nullable StatisticsProvider statisticsProvider)
                                  throws java.io.IOException
        Creates an instance of MarkSweepGarbageCollector
        Parameters:
        marker - BlobReferenceRetriever instance used to fetch referenced blob entries
        blobStore - the blob store instance
        executor - executor
        root - absolute path of the root directory under which temporary files are created
        batchCount - batch size used for saving intermediate state
        maxLastModifiedInterval - interval in milliseconds. Only blobs whose last modified time is earlier than the reference time minus this interval are considered for GC
        repositoryId - unique repository id for this node
        whiteboard - whiteboard instance
        statisticsProvider - statistics provider instance
        Throws:
        java.io.IOException
      • MarkSweepGarbageCollector

        public MarkSweepGarbageCollector​(BlobReferenceRetriever marker,
                                         GarbageCollectableBlobStore blobStore,
                                         java.util.concurrent.Executor executor,
                                         java.lang.String root,
                                         int batchCount,
                                         long maxLastModifiedInterval,
                                         @Nullable java.lang.String repositoryId)
                                  throws java.io.IOException
        Throws:
        java.io.IOException
      • MarkSweepGarbageCollector

        public MarkSweepGarbageCollector​(BlobReferenceRetriever marker,
                                         GarbageCollectableBlobStore blobStore,
                                         java.util.concurrent.Executor executor,
                                         long maxLastModifiedInterval,
                                         @Nullable java.lang.String repositoryId,
                                         @Nullable Whiteboard whiteboard,
                                         @Nullable StatisticsProvider statisticsProvider)
                                  throws java.io.IOException
        Instantiates a new blob garbage collector.
        Throws:
        java.io.IOException
    • Method Detail

      • collectGarbage

        public void collectGarbage​(boolean markOnly)
                            throws java.lang.Exception
        Description copied from interface: BlobGarbageCollector
        Marks garbage blobs from the passed node store instance. Collects them only if markOnly is false.
        Specified by:
        collectGarbage in interface BlobGarbageCollector
        Parameters:
        markOnly - whether to only mark references and not sweep in the mark and sweep operation.
        Throws:
        java.lang.Exception - the exception
      • collectGarbage

        public void collectGarbage​(boolean markOnly,
                                   boolean forceBlobRetrieve)
                            throws java.lang.Exception
        Description copied from interface: BlobGarbageCollector
        Marks garbage blobs from the passed node store instance. Collects them only if markOnly is false. Also forces retrieval of blob ids from the blob store rather than using any local tracking.
        Specified by:
        collectGarbage in interface BlobGarbageCollector
        Parameters:
        markOnly - whether to only mark references and not sweep in the mark and sweep operation.
        forceBlobRetrieve - whether to force retrieval of blob ids from the data store
        Throws:
        java.lang.Exception
      • getStats

        public java.util.List<GarbageCollectionRepoStats> getStats()
                                                            throws java.lang.Exception
        Returns the stats related to GC for all repos
        Specified by:
        getStats in interface BlobGarbageCollector
        Returns:
        a list of GarbageCollectionRepoStats objects
        Throws:
        java.lang.Exception
      • markAndSweep

        protected void markAndSweep​(boolean markOnly,
                                    boolean forceBlobRetrieve)
                             throws java.lang.Exception
        Mark and sweep. Main entry method for GC.
        Parameters:
        markOnly - whether to only mark references and skip the sweep
        forceBlobRetrieve - whether to force retrieval of blob ids from the blob store
        Throws:
        java.lang.Exception - the exception
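        The effect of markOnly on the overall run can be sketched as follows. This is a minimal sketch of the control flow only; the Phases interface and the run method are hypothetical stand-ins and are not part of this class.

```java
// Hypothetical sketch of the mark-only versus full run control flow.
final class MarkAndSweepFlowSketch {

    /** Hypothetical stand-in for the two phases; not an interface of this library. */
    interface Phases {
        void mark();
        long sweep(); // returns the number of blobs deleted
    }

    /** Always marks; sweeps only when markOnly is false. Returns the number of blobs deleted. */
    static long run(Phases phases, boolean markOnly) {
        phases.mark();
        return markOnly ? 0L : phases.sweep();
    }

    public static void main(String[] args) {
        Phases demo = new Phases() {
            @Override public void mark() { System.out.println("mark"); }
            @Override public long sweep() { System.out.println("sweep"); return 3L; }
        };
        System.out.println(run(demo, true));
        System.out.println(run(demo, false));
    }
}
```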
      • sweep

        protected long sweep​(GarbageCollectorFileState fs,
                             long markStart,
                             boolean forceBlobRetrieve)
                      throws java.lang.Exception
        Sweep phase: deletes the GC candidates.

        Performs the following steps, depending on the type of the blob store (see SharedDataStore.Type):

        • Shared
          • Merge all marked references (from mark phases run independently) available in the data store meta store, across all configured independent repositories.
          • Retrieve all available blob ids.
          • Diff the two sets above to obtain the list of blob ids that are not in use.
          • From that list, delete only blobs last modified before (earliest time stamp of the marked references - #maxLastModifiedInterval).
        • Default
          • The mark phase has already run.
          • Retrieve all available blob ids.
          • Diff the two sets above to obtain the list of blob ids that are not in use.
          • From that list, delete only blobs last modified before (time stamp of the marked references - #maxLastModifiedInterval).
        Parameters:
        fs - the garbage collector file state
        markStart - the start time of mark to take as reference for deletion
        forceBlobRetrieve - whether to force retrieval of blob ids from the blob store
        Returns:
        the number of blobs deleted
        Throws:
        java.lang.Exception - the exception
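        The sweep steps listed above can be sketched in memory as follows. This is a minimal sketch, not the actual implementation: SweepSketch, sweepCandidates and its parameters are hypothetical names. It assumes, in line with the maxLastModifiedInterval parameter description, that only blobs last modified before (mark time - maxLastModifiedInterval) are eligible for deletion. The Default case corresponds to passing a single set of marked references.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical in-memory illustration of the sweep steps; not the real implementation.
final class SweepSketch {

    /**
     * @param lastModifiedById all blob ids in the store mapped to their last modified time (millis)
     * @param markedRefsPerRepo one set of marked references per repository
     * @param earliestMarkTime earliest mark time stamp across repositories (millis)
     * @param maxLastModifiedInterval interval in millis
     * @return the ids that would be deleted, in sorted order
     */
    static List<String> sweepCandidates(Map<String, Long> lastModifiedById,
                                        Collection<Set<String>> markedRefsPerRepo,
                                        long earliestMarkTime,
                                        long maxLastModifiedInterval) {
        // Step 1: merge the marked references of all repositories.
        Set<String> marked = new HashSet<>();
        for (Set<String> refs : markedRefsPerRepo) {
            marked.addAll(refs);
        }
        // Steps 2 and 3: take all available ids and drop the referenced ones.
        Map<String, Long> unused = new TreeMap<>(lastModifiedById);
        unused.keySet().removeAll(marked);
        // Step 4: keep only blobs last modified before the cutoff (assumption, see lead-in).
        long cutoff = earliestMarkTime - maxLastModifiedInterval;
        List<String> toDelete = new ArrayList<>();
        for (Map.Entry<String, Long> e : unused.entrySet()) {
            if (e.getValue() < cutoff) {
                toDelete.add(e.getKey());
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        Map<String, Long> store = new HashMap<>();
        store.put("a", 100L);
        store.put("b", 100L);
        store.put("c", 900L);
        store.put("d", 100L);
        Set<String> repo1 = new HashSet<>(Arrays.asList("a"));
        Set<String> repo2 = new HashSet<>(Arrays.asList("b"));
        System.out.println(sweepCandidates(store, Arrays.asList(repo1, repo2), 1000L, 500L));
    }
}
```

        Using the earliest mark time stamp across repositories keeps the cutoff conservative: a blob added after any repository started marking is never a deletion candidate.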
      • iterateNodeTree

        protected void iterateNodeTree​(GarbageCollectorFileState fs,
                                       boolean logPath)
                                throws java.io.IOException
        Iterates the complete node tree and collects all blob references.
        Parameters:
        fs - the garbage collector file state
        logPath - whether to also log the node path in the file
        Throws:
        java.io.IOException
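        As an illustration of the traversal described above, here is a minimal sketch that walks a tree and records each blob reference, optionally with the path of the node that holds it. The Node class, the collect method and the "id,path" line format are all assumptions made for this example; the real method works on the repository's node store and writes to the garbage collector file state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of collecting blob references from a node tree.
final class NodeTreeWalkSketch {

    /** Hypothetical stand-in for a repository node. */
    static final class Node {
        final String path;
        final List<String> blobIds;
        final List<Node> children;

        Node(String path, List<String> blobIds, List<Node> children) {
            this.path = path;
            this.blobIds = blobIds;
            this.children = children;
        }
    }

    /** Depth-first walk; appends one line per blob reference to 'out'. */
    static void collect(Node node, boolean logPath, List<String> out) {
        for (String id : node.blobIds) {
            // The "id,path" format is an assumption made for this sketch.
            out.add(logPath ? id + "," + node.path : id);
        }
        for (Node child : node.children) {
            collect(child, logPath, out);
        }
    }

    public static void main(String[] args) {
        Node child = new Node("/a", Arrays.asList("b2"), Collections.<Node>emptyList());
        Node root = new Node("/", Arrays.asList("b1"), Arrays.asList(child));
        List<String> lines = new ArrayList<>();
        collect(root, true, lines);
        System.out.println(lines);
    }
}
```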
      • checkConsistency

        public long checkConsistency​(boolean markOnly)
                              throws java.lang.Exception
        Description copied from interface: BlobGarbageCollector
        Collects the blob references and consolidates references from other repositories if available in the DataStore. Adds relevant metrics.
        Specified by:
        checkConsistency in interface BlobGarbageCollector
        Returns:
        Throws:
        java.lang.Exception
      • checkConsistency

        public long checkConsistency()
                              throws java.lang.Exception
        Checks for the DataStore consistency and reports the number of missing blobs still referenced.
        Specified by:
        checkConsistency in interface BlobGarbageCollector
        Returns:
        the number of missing blobs
        Throws:
        java.lang.Exception
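        The idea behind the consistency check can be sketched as counting referenced blob ids that are no longer present in the data store. This is a minimal in-memory sketch; ConsistencySketch and countMissing are hypothetical names, and the real method works on the collected reference and id files.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of the consistency check; not the real implementation.
final class ConsistencySketch {

    /** Counts ids that are still referenced but not available in the store. */
    static long countMissing(Set<String> referencedIds, Set<String> availableIds) {
        long missing = 0;
        for (String id : referencedIds) {
            if (!availableIds.contains(id)) {
                missing++;
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Set<String> referenced = new HashSet<>(Arrays.asList("a", "b", "c"));
        Set<String> available = new HashSet<>(Arrays.asList("a", "c", "d"));
        System.out.println(countMissing(referenced, available));
    }
}
```

        A result of zero means every referenced blob was found; unreferenced blobs in the store (such as "d" in the demo) do not affect the count.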
      • setTraceOutput

        public void setTraceOutput​(boolean trace)
      • setClock

        public void setClock​(Clock clock)