Class MarkSweepGarbageCollector
java.lang.Object
org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector
- All Implemented Interfaces:
BlobGarbageCollector
Mark and sweep garbage collector.
Uses the file system to store internal state while running, so that very large data sets can be handled.
This class is not thread safe.
-
Field Summary
Fields -
Constructor Summary
Constructors
MarkSweepGarbageCollector(BlobReferenceRetriever marker, GarbageCollectableBlobStore blobStore, Executor executor, long maxLastModifiedInterval, @Nullable String repositoryId, @Nullable Whiteboard whiteboard, @Nullable StatisticsProvider statisticsProvider)
    Instantiates a new blob garbage collector.
MarkSweepGarbageCollector(BlobReferenceRetriever marker, GarbageCollectableBlobStore blobStore, Executor executor, String root, int batchCount, long maxLastModifiedInterval, boolean checkConsistencyAfterGc, boolean sweepIfRefsPastRetention, @Nullable String repositoryId, @Nullable Whiteboard whiteboard, @Nullable StatisticsProvider statisticsProvider)
    Creates an instance of MarkSweepGarbageCollector.
MarkSweepGarbageCollector(BlobReferenceRetriever marker, GarbageCollectableBlobStore blobStore, Executor executor, String root, int batchCount, long maxLastModifiedInterval, @Nullable String repositoryId)
-
Method Summary
long checkConsistency()
    Checks for the DataStore consistency and reports the number of missing blobs still referenced.
long checkConsistency(boolean markOnly)
    Collects the blob references and consolidates references from other repositories if available in the DataStore.
void collectGarbage(boolean markOnly)
    Marks garbage blobs from the passed node store instance.
void collectGarbage(boolean markOnly, boolean forceBlobRetrieve)
    Marks garbage blobs from the passed node store instance.
getConsistencyOperationStats()
    Returns consistency operation statistics.
getOperationStats()
    Returns operation statistics.
getStats()
    Returns the stats related to GC for all repos.
protected void iterateNodeTree(GarbageCollectorFileState fs, boolean logPath)
    Iterates the complete node tree and collects all blob references.
protected void mark(GarbageCollectorFileState fs)
    Mark phase of the GC.
protected void markAndSweep(boolean markOnly, boolean forceBlobRetrieve)
    Mark and sweep.
void setClock
void setTraceOutput(boolean trace)
protected long sweep(GarbageCollectorFileState fs, long markStart, boolean forceBlobRetrieve)
    Sweep phase of gc candidate deletion.
-
Field Details
-
LOG
-
TEMP_DIR
-
DEFAULT_BATCH_COUNT
public static final int DEFAULT_BATCH_COUNT
-
DELIM
-
-
Constructor Details
-
MarkSweepGarbageCollector
public MarkSweepGarbageCollector(BlobReferenceRetriever marker, GarbageCollectableBlobStore blobStore, Executor executor, String root, int batchCount, long maxLastModifiedInterval, boolean checkConsistencyAfterGc, boolean sweepIfRefsPastRetention, @Nullable String repositoryId, @Nullable Whiteboard whiteboard, @Nullable StatisticsProvider statisticsProvider) throws IOException
Creates an instance of MarkSweepGarbageCollector.
Parameters:
marker - BlobReferenceRetriever instance used to fetch referenced blob entries
blobStore - the blob store instance
executor - executor
root - the absolute path of the root directory under which temporary files are created
batchCount - batch size used for saving intermediate state
maxLastModifiedInterval - last-modified interval in milliseconds. Only files last modified before (reference time - maxLastModifiedInterval) are considered for GC
repositoryId - unique repository id for this node
whiteboard - whiteboard instance
statisticsProvider - statistics provider instance
Throws:
IOException
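The batchCount parameter is documented only as the batch size used for saving intermediate state. As a rough illustration of that idea, and not the Oak implementation, the following sketch buffers ids in memory and appends them to a file once the buffer reaches batchCount entries. The class BatchedStateSketch and the method writeInBatches are hypothetical names introduced here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: batched, file-backed intermediate state. Not Oak code. */
final class BatchedStateSketch {

    /**
     * Appends the given ids to the file, one per line, flushing every
     * batchCount entries. Returns the number of flushes performed.
     */
    static int writeInBatches(Iterable<String> ids, Path file, int batchCount) {
        if (batchCount <= 0) {
            throw new IllegalArgumentException("batchCount must be positive");
        }
        List<String> buffer = new ArrayList<>(batchCount);
        int flushes = 0;
        for (String id : ids) {
            buffer.add(id);
            if (buffer.size() >= batchCount) {
                flush(buffer, file);
                flushes++;
            }
        }
        // Write whatever is left over from the last, partial batch.
        if (!buffer.isEmpty()) {
            flush(buffer, file);
            flushes++;
        }
        return flushes;
    }

    private static void flush(List<String> buffer, Path file) {
        try {
            Files.write(file, buffer, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        buffer.clear();
    }
}
```

Keeping only one batch in memory at a time bounds heap use regardless of how many ids are collected, which is the motivation the class description gives for storing state on the file system.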
-
MarkSweepGarbageCollector
public MarkSweepGarbageCollector(BlobReferenceRetriever marker, GarbageCollectableBlobStore blobStore, Executor executor, String root, int batchCount, long maxLastModifiedInterval, @Nullable String repositoryId) throws IOException
Throws:
IOException
-
MarkSweepGarbageCollector
public MarkSweepGarbageCollector(BlobReferenceRetriever marker, GarbageCollectableBlobStore blobStore, Executor executor, long maxLastModifiedInterval, @Nullable String repositoryId, @Nullable Whiteboard whiteboard, @Nullable StatisticsProvider statisticsProvider) throws IOException
Instantiates a new blob garbage collector.
Throws:
IOException
-
-
Method Details
-
collectGarbage
Description copied from interface: BlobGarbageCollector
Marks garbage blobs from the passed node store instance. Collects them only if markOnly is false.
Specified by:
collectGarbage in interface BlobGarbageCollector
Parameters:
markOnly - whether to only mark references and not sweep in the mark and sweep operation.
Throws:
Exception - the exception
-
collectGarbage
Description copied from interface: BlobGarbageCollector
Marks garbage blobs from the passed node store instance. Collects them only if markOnly is false. Also forces retrieval of blob ids from the blob store rather than using any local tracking.
Specified by:
collectGarbage in interface BlobGarbageCollector
Parameters:
markOnly - whether to only mark references and not sweep in the mark and sweep operation.
forceBlobRetrieve - whether to force retrieval of blob ids from the datastore
Throws:
Exception
-
getStats
Returns the stats related to GC for all repos.
Specified by:
getStats in interface BlobGarbageCollector
Returns:
a list of GarbageCollectionRepoStats objects
Throws:
Exception
-
getOperationStats
Description copied from interface: BlobGarbageCollector
Returns operation statistics.
Specified by:
getOperationStats in interface BlobGarbageCollector
Returns:
stats object
-
getConsistencyOperationStats
Description copied from interface: BlobGarbageCollector
Returns consistency operation statistics.
Specified by:
getConsistencyOperationStats in interface BlobGarbageCollector
Returns:
stats object
-
markAndSweep
Mark and sweep. Main entry method for GC.
Parameters:
markOnly - whether to mark only
forceBlobRetrieve - whether to force retrieval of blob ids
Throws:
Exception - the exception
-
mark
Mark phase of the GC.
Parameters:
fs - the garbage collector file state
Throws:
IOException
DataStoreException
-
sweep
protected long sweep(GarbageCollectorFileState fs, long markStart, boolean forceBlobRetrieve) throws Exception
Sweep phase of gc candidate deletion.
Performs the following steps depending upon the type of the blob store (refer to SharedDataStore.Type):
Shared
- Merge all marked references (from the mark phase run independently) available in the data store meta store (from all configured independent repositories).
- Retrieve all blob ids available.
- Diff the 2 sets above to retrieve the list of blob ids not used.
- Delete from the above set only blobs created before (earliest time stamp of the marked references - #maxLastModifiedInterval).
Default
- Mark phase already run.
- Retrieve all blob ids available.
- Diff the 2 sets above to retrieve the list of blob ids not used.
- Delete from the above set only blobs created before (time stamp of the marked references - #maxLastModifiedInterval).
Parameters:
fs - the garbage collector file state
markStart - the start time of mark to take as reference for deletion
forceBlobRetrieve - whether to force retrieval of blob ids from the datastore
Returns:
the number of blobs deleted
Throws:
Exception - the exception
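The diff-and-delete part of the sweep steps can be modelled with plain collections. This is a minimal sketch, not the Oak implementation: it assumes, following the maxLastModifiedInterval parameter description ("only files with time less than this time would be considered for GC"), that a blob is eligible only when its time stamp is earlier than markStart minus maxLastModifiedInterval. The class SweepSketch, the method deletable, and the in-memory map standing in for the blob store are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical in-memory model of the sweep steps. Not Oak code. */
final class SweepSketch {

    /**
     * Returns the ids of blobs that are unreferenced and old enough to delete.
     *
     * @param referenced ids collected by the mark phase
     * @param available  blob id to time stamp in millis (stand-in for the blob store)
     * @param markStart  reference time stamp of the mark phase, in millis
     * @param maxLastModifiedInterval protection window in millis
     */
    static List<String> deletable(Set<String> referenced,
                                  Map<String, Long> available,
                                  long markStart,
                                  long maxLastModifiedInterval) {
        // Assumption: blobs at or after this cutoff are too recent to delete.
        long cutoff = markStart - maxLastModifiedInterval;
        List<String> result = new ArrayList<>();
        // Copy into a TreeMap for a deterministic, sorted iteration order.
        for (Map.Entry<String, Long> entry : new TreeMap<>(available).entrySet()) {
            boolean unused = !referenced.contains(entry.getKey());
            boolean oldEnough = entry.getValue() < cutoff;
            if (unused && oldEnough) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

For the Shared case, the same method would be called with the union of the reference sets from all repositories and the earliest of their time stamps as markStart. The age check exists so that a blob added after the references were collected is not removed merely because it was not yet seen by the mark phase.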
-
iterateNodeTree
Iterates the complete node tree and collects all blob references.
Parameters:
fs - the garbage collector file state
logPath - whether to log path in the file or not
Throws:
IOException
-
checkConsistency
Description copied from interface: BlobGarbageCollector
Collects the blob references and consolidates references from other repositories if available in the DataStore. Adds relevant metrics.
Specified by:
checkConsistency in interface BlobGarbageCollector
Throws:
Exception
-
checkConsistency
Checks for the DataStore consistency and reports the number of missing blobs still referenced.
Specified by:
checkConsistency in interface BlobGarbageCollector
Returns:
the missing blobs
Throws:
Exception
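Conceptually, a consistency check of this kind is the reverse of the sweep diff: it looks for ids that are still referenced but no longer present in the store. A minimal sketch under that assumption, using in-memory sets rather than Oak's file-backed state; ConsistencySketch and missing are hypothetical names:

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of a consistency check. Not Oak code. */
final class ConsistencySketch {

    /** Returns the referenced ids that are absent from the available ids. */
    static Set<String> missing(Set<String> referenced, Set<String> available) {
        // Copy first so the caller's set is left untouched.
        Set<String> result = new TreeSet<>(referenced);
        result.removeAll(available);
        return result;
    }
}
```

The size of the returned set corresponds to the count the method description reports as the number of missing blobs still referenced.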
-
setTraceOutput
public void setTraceOutput(boolean trace)
-
setClock
-