Class MarkSweepGarbageCollector

java.lang.Object
org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector
All Implemented Interfaces:
BlobGarbageCollector

public class MarkSweepGarbageCollector extends Object implements BlobGarbageCollector
Mark and sweep garbage collector. Uses the file system to store internal state while a collection is in progress, so that very large data sets can be handled. This class is not thread safe.
  • Constructor Details

    • MarkSweepGarbageCollector

      public MarkSweepGarbageCollector(BlobReferenceRetriever marker, GarbageCollectableBlobStore blobStore, Executor executor, String root, int batchCount, long maxLastModifiedInterval, boolean checkConsistencyAfterGc, boolean sweepIfRefsPastRetention, @Nullable String repositoryId, @Nullable Whiteboard whiteboard, @Nullable StatisticsProvider statisticsProvider) throws IOException
      Creates an instance of MarkSweepGarbageCollector.
      Parameters:
      marker - BlobReferenceRetriever instance used to fetch referenced blob entries
      blobStore - the blob store instance
      executor - the executor used to run tasks in the background during GC
      root - the absolute path of the root directory under which temporary files are created
      batchCount - batch size used for saving intermediate state
      maxLastModifiedInterval - interval in milliseconds. Only blobs last modified more than this interval before the GC run are considered for GC
      repositoryId - unique repository id for this node
      whiteboard - whiteboard instance
      statisticsProvider - statistics provider instance
      Throws:
      IOException
    • MarkSweepGarbageCollector

      public MarkSweepGarbageCollector(BlobReferenceRetriever marker, GarbageCollectableBlobStore blobStore, Executor executor, String root, int batchCount, long maxLastModifiedInterval, @Nullable String repositoryId) throws IOException
      Throws:
      IOException
    • MarkSweepGarbageCollector

      public MarkSweepGarbageCollector(BlobReferenceRetriever marker, GarbageCollectableBlobStore blobStore, Executor executor, long maxLastModifiedInterval, @Nullable String repositoryId, @Nullable Whiteboard whiteboard, @Nullable StatisticsProvider statisticsProvider) throws IOException
      Instantiates a new blob garbage collector.
      Throws:
      IOException
  • Method Details

    • collectGarbage

      public void collectGarbage(boolean markOnly) throws Exception
      Description copied from interface: BlobGarbageCollector
      Marks garbage blobs from the passed node store instance. Collects them only if markOnly is false.
      Specified by:
      collectGarbage in interface BlobGarbageCollector
      Parameters:
      markOnly - whether to only mark references and not sweep in the mark and sweep operation.
      Throws:
      Exception - the exception
    • collectGarbage

      public void collectGarbage(boolean markOnly, boolean forceBlobRetrieve) throws Exception
      Description copied from interface: BlobGarbageCollector
      Marks garbage blobs from the passed node store instance. Collects them only if markOnly is false. Also forces retrieval of blob ids from the blob store rather than using any local tracking.
      Specified by:
      collectGarbage in interface BlobGarbageCollector
      Parameters:
      markOnly - whether to only mark references and not sweep in the mark and sweep operation.
      forceBlobRetrieve - whether to force retrieval of blob ids from the data store
      Throws:
      Exception
    • getStats

      public List<GarbageCollectionRepoStats> getStats() throws Exception
      Returns the GC-related statistics for all repositories.
      Specified by:
      getStats in interface BlobGarbageCollector
      Returns:
      a list of GarbageCollectionRepoStats objects
      Throws:
      Exception
    • getOperationStats

      public OperationsStatsMBean getOperationStats()
      Description copied from interface: BlobGarbageCollector
      Returns operation statistics
      Specified by:
      getOperationStats in interface BlobGarbageCollector
      Returns:
      stats object
    • getConsistencyOperationStats

      public OperationsStatsMBean getConsistencyOperationStats()
      Description copied from interface: BlobGarbageCollector
      Returns consistency operation statistics
      Specified by:
      getConsistencyOperationStats in interface BlobGarbageCollector
      Returns:
      stats object
    • markAndSweep

      protected void markAndSweep(boolean markOnly, boolean forceBlobRetrieve) throws Exception
      Mark and sweep. Main entry method for GC.
      Parameters:
      markOnly - whether to only mark references and skip the sweep
      forceBlobRetrieve - whether to force retrieval of blob ids from the blob store
      Throws:
      Exception - the exception
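      The control flow described above (mark first, then sweep unless markOnly is set) can be sketched as follows. This is a minimal illustration, not the Oak implementation: the Phases interface, its method names, and the use of System.currentTimeMillis() as the mark start time are assumptions made for the example.

```java
/**
 * Hypothetical sketch of a mark-and-sweep driver. The Phases interface is an
 * illustrative assumption standing in for the real mark and sweep phases.
 */
final class MarkAndSweepSketch {

    /** Hypothetical hooks for the two GC phases. */
    interface Phases {
        void mark();
        long sweep(long markStart, boolean forceBlobRetrieve);
    }

    /** Runs mark, then sweep unless markOnly is set; returns the number of blobs deleted. */
    static long markAndSweep(Phases phases, boolean markOnly, boolean forceBlobRetrieve) {
        // Reference time that the sweep uses to compute its deletion cut-off.
        long markStart = System.currentTimeMillis();
        phases.mark();
        if (markOnly) {
            return 0; // references are recorded, but nothing is deleted
        }
        return phases.sweep(markStart, forceBlobRetrieve);
    }
}
```

Keeping the mark start time outside the phases means the sweep's cut-off is anchored to when marking began, not to when sweeping begins.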
    • mark

      protected void mark(GarbageCollectorFileState fs) throws IOException, DataStoreException
      Mark phase of the GC.
      Parameters:
      fs - the garbage collector file state
      Throws:
      IOException
      DataStoreException
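      A minimal sketch of what a file-backed mark phase might look like. It assumes the references arrive as an iterator of blob id strings and that batchCount bounds how many ids are buffered in memory before being written out; both are assumptions, and the class and method names are hypothetical. The final sort is done in memory for brevity, whereas a repository with a huge number of references would need an external sort.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

final class MarkSketch {
    /**
     * Writes every referenced blob id to markedFile, flushing in batches of batchCount
     * so only one batch is held in memory at a time. Returns the number of ids written.
     */
    static long mark(Iterator<String> references, Path markedFile, int batchCount) throws IOException {
        long count = 0;
        List<String> batch = new ArrayList<>(batchCount);
        try (BufferedWriter out = Files.newBufferedWriter(markedFile, StandardCharsets.UTF_8)) {
            while (references.hasNext()) {
                batch.add(references.next());
                if (batch.size() >= batchCount) {
                    count += flush(batch, out);
                }
            }
            count += flush(batch, out); // write the final, possibly partial, batch
        }
        // Sort the file so that a later sweep can diff it line by line against the
        // sorted blob store listing. In-memory sort here; large inputs need an external sort.
        List<String> lines = Files.readAllLines(markedFile, StandardCharsets.UTF_8);
        lines.sort(Comparator.naturalOrder());
        Files.write(markedFile, lines, StandardCharsets.UTF_8);
        return count;
    }

    /** Writes the buffered ids, one per line, clears the buffer and returns how many were written. */
    private static int flush(List<String> batch, BufferedWriter out) throws IOException {
        for (String id : batch) {
            out.write(id);
            out.newLine();
        }
        int n = batch.size();
        batch.clear();
        return n;
    }
}
```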
    • sweep

      protected long sweep(GarbageCollectorFileState fs, long markStart, boolean forceBlobRetrieve) throws Exception
      Sweep phase of GC: deletes the unreferenced candidate blobs.

      Performs the following steps, depending on the type of the blob store (see SharedDataStore.Type):

      • Shared
        • Merge all marked references (from the mark phase run independently) available in the data store meta store (from all configured independent repositories).
        • Retrieve all available blob ids.
        • Diff the two sets above to obtain the list of unused blob ids.
        • From that list, delete only blobs last modified before (earliest time stamp of the marked references - #maxLastModifiedInterval).
      • Default
        • The mark phase has already run.
        • Retrieve all available blob ids.
        • Diff the marked references against these ids to obtain the list of unused blob ids.
        • From that list, delete only blobs last modified before (time stamp of the marked references - #maxLastModifiedInterval).
      Parameters:
      fs - the garbage collector file state
      markStart - the start time of mark to take as reference for deletion
      forceBlobRetrieve - whether to force retrieval of blob ids from the blob store
      Returns:
      the number of blobs deleted
      Throws:
      Exception - the exception
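      The diff-and-filter step of the sweep can be sketched as a single merge pass over two inputs sorted by blob id, which is what makes a file-based approach workable for very large blob stores. The BlobEntry record and the method names are hypothetical. In the Shared case, markedSorted would hold the merged references from all repositories and markStart the earliest of their time stamps.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class SweepSketch {
    /** Hypothetical blob listing entry: an id plus its last-modified time in milliseconds. */
    record BlobEntry(String id, long lastModified) {}

    /**
     * Returns the ids of blobs that are not referenced and were last modified before
     * markStart - maxLastModifiedInterval. Both inputs must be sorted by id in ascending
     * order, so the diff needs one pass and no in-memory set of all ids.
     */
    static List<String> candidates(Iterator<String> markedSorted, Iterator<BlobEntry> availableSorted,
                                   long markStart, long maxLastModifiedInterval) {
        long cutoff = markStart - maxLastModifiedInterval;
        List<String> result = new ArrayList<>();
        String marked = markedSorted.hasNext() ? markedSorted.next() : null;
        while (availableSorted.hasNext()) {
            BlobEntry blob = availableSorted.next();
            // Advance the marked cursor past references that sort before the current blob id.
            while (marked != null && marked.compareTo(blob.id()) < 0) {
                marked = markedSorted.hasNext() ? markedSorted.next() : null;
            }
            boolean referenced = marked != null && marked.equals(blob.id());
            if (!referenced && blob.lastModified() < cutoff) {
                result.add(blob.id()); // unreferenced and older than the grace period
            }
        }
        return result;
    }
}
```

For example, with markStart = 1000 and maxLastModifiedInterval = 500, an unreferenced blob last modified at 950 is kept because it falls inside the grace period.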
    • iterateNodeTree

      protected void iterateNodeTree(GarbageCollectorFileState fs, boolean logPath) throws IOException
      Iterates the complete node tree and collects all blob references.
      Parameters:
      fs - the garbage collector file state
      logPath - whether to log path in the file or not
      Throws:
      IOException
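      A depth-first walk of the kind iterateNodeTree performs can be sketched over a hypothetical in-memory Node type. The Node record and the comma separator between blob id and path are assumptions for illustration; the real method reads the node store and writes its output to the garbage collector file state instead of returning a list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

final class TreeWalkSketch {
    /** Hypothetical node: blob ids referenced by its properties, plus named children. */
    record Node(List<String> blobIds, Map<String, Node> children) {}

    /**
     * Visits every node depth-first (explicit stack, so deep trees cannot overflow the
     * call stack) and emits one line per blob reference, optionally followed by the path.
     */
    static List<String> collectReferences(Node root, boolean logPath) {
        List<String> lines = new ArrayList<>();
        Deque<Map.Entry<String, Node>> stack = new ArrayDeque<>();
        stack.push(Map.entry("/", root));
        while (!stack.isEmpty()) {
            Map.Entry<String, Node> current = stack.pop();
            String path = current.getKey();
            Node node = current.getValue();
            for (String id : node.blobIds()) {
                lines.add(logPath ? id + "," + path : id);
            }
            for (Map.Entry<String, Node> child : node.children().entrySet()) {
                String childPath = path.equals("/") ? "/" + child.getKey() : path + "/" + child.getKey();
                stack.push(Map.entry(childPath, child.getValue()));
            }
        }
        return lines;
    }
}
```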
    • checkConsistency

      public long checkConsistency(boolean markOnly) throws Exception
      Description copied from interface: BlobGarbageCollector
      Collects the blob references and consolidates references from other repositories if available in the DataStore. Adds relevant metrics.
      Specified by:
      checkConsistency in interface BlobGarbageCollector
      Returns:
      Throws:
      Exception
    • checkConsistency

      public long checkConsistency() throws Exception
      Checks the DataStore for consistency and reports the number of blobs that are still referenced but missing.
      Specified by:
      checkConsistency in interface BlobGarbageCollector
      Returns:
      the number of missing blobs
      Throws:
      Exception
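      Conceptually, the consistency check counts blob ids that are referenced from the repository but absent from the blob store. The minimal in-memory sketch below uses a hypothetical class name; for very large repositories a file-based, sorted diff would be more appropriate than an in-memory set.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

final class ConsistencySketch {
    /** Counts distinct referenced blob ids that are absent from the blob store listing. */
    static long countMissing(Collection<String> referencedIds, Collection<String> availableIds) {
        Set<String> available = new HashSet<>(availableIds);
        return referencedIds.stream()
                .distinct() // count each missing blob once, even if several nodes reference it
                .filter(id -> !available.contains(id))
                .count();
    }
}
```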
    • setTraceOutput

      public void setTraceOutput(boolean trace)
    • setClock

      public void setClock(Clock clock)