Class DefaultAheadOfTimeBlobDownloader
- java.lang.Object
-
- org.apache.jackrabbit.oak.index.indexer.document.flatfile.DefaultAheadOfTimeBlobDownloader
-
- All Implemented Interfaces:
Closeable, AutoCloseable, AheadOfTimeBlobDownloader
public class DefaultAheadOfTimeBlobDownloader extends Object implements AheadOfTimeBlobDownloader
Scans a FlatFileStore for non-inlined blobs in nodes matching a given pattern and downloads them from the blob store. The goal of this class is to populate the local data store cache with the non-inlined blobs required by the indexer, so that when the indexing thread tries to retrieve a blob, it finds it locally and avoids an expensive call to the blob store. When indexing repositories with many non-inlined renditions, pre-populating the cache can cut the indexing time by more than half.
This AOT download is intended to run asynchronously with the indexing thread. It starts the following threads:
- [scanner] - scans the FFS, searching for blobs to download. A blob is selected for download if it is a binary property in a node whose name matches the suffix given as parameter to this class, and is non-inlined.
- [downloader-n] - a configurable number of threads that download the blobs that were discovered by the scanner thread.
The indexer should call updateIndexed(long) to inform the AOT downloader of the last line indexed. This is necessary to keep the AOT downloader roughly in sync with the indexer, that is, to prevent it from falling behind and from going too far ahead.
This AOT downloader should be configured with enough threads to stay ahead of the indexer. Whether it can remain ahead depends on the number of blobs to download and the speed of the connection to the blob store. As a rough guide, in a cloud environment with blobs stored in Azure Blob Store or Amazon S3, 4 download threads should be enough. If the AOT downloader falls behind the indexer, it skips any nodes that are behind the last known indexing position in order to catch up.
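The catch-up rule described above can be illustrated with a minimal sketch. This is not the Oak implementation: the class `CatchUpScannerSketch` and its methods are hypothetical, and treating a line at exactly the last indexed position as already behind is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the scanner's catch-up rule; not the Oak implementation.
class CatchUpScannerSketch {
    // Last line reported by the indexer; written by the indexing thread.
    private final AtomicLong lastIndexed = new AtomicLong(-1);

    void updateIndexed(long positionIndexed) {
        lastIndexed.set(positionIndexed);
    }

    // Assumption: a line at or before the last indexed position is "behind".
    boolean shouldConsider(long linePosition) {
        return linePosition > lastIndexed.get();
    }

    // Scans positions [from, to) and returns those still worth downloading.
    List<Long> scan(long from, long to) {
        List<Long> selected = new ArrayList<>();
        for (long pos = from; pos < to; pos++) {
            if (shouldConsider(pos)) {
                selected.add(pos);
            }
        }
        return selected;
    }
}
```

In this model, once the indexer reports position 4, a scan over positions 0 to 7 selects only 5, 6 and 7.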
The AOT downloader also tries not to get too far ahead of the indexer. This avoids filling up the local blob store cache, which would cause blobs to be evicted before the indexer gets around to using them. In that case, the indexer would have to download the blob again, which would negate the benefits of using this AOT downloader. The AOT downloader takes as a parameter the maximum amount of data that it is allowed to prefetch (maxPrefetchWindowMB). It then tries not to download more than this amount of data, pausing its progress whenever the prefetch window is full. For details on how this is implemented, see AheadOfTimeBlobDownloaderThrottler.
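The prefetch window can be pictured with a simplified model. The sketch below is not the code of AheadOfTimeBlobDownloaderThrottler: the class `PrefetchWindowSketch` and all its methods are hypothetical, and it assumes that blobs are reserved in increasing position order and that an empty window always admits one blob, even an oversized one.

```java
import java.util.ArrayDeque;

// Hypothetical model of a size-bounded prefetch window; not the Oak throttler.
class PrefetchWindowSketch {
    private static final class Entry {
        final long position;
        final long bytes;

        Entry(long position, long bytes) {
            this.position = position;
            this.bytes = bytes;
        }
    }

    private final long maxWindowBytes;
    private final ArrayDeque<Entry> window = new ArrayDeque<>();
    private long usedBytes = 0;

    PrefetchWindowSketch(int maxPrefetchWindowMB) {
        this.maxWindowBytes = maxPrefetchWindowMB * 1024L * 1024L;
    }

    // Non-blocking attempt to add a blob to the window.
    synchronized boolean tryReserve(long position, long bytes) {
        // An empty window admits any blob, so an oversized blob cannot stall forever.
        if (!window.isEmpty() && usedBytes + bytes > maxWindowBytes) {
            return false;
        }
        window.addLast(new Entry(position, bytes));
        usedBytes += bytes;
        return true;
    }

    // Blocking variant: pauses the caller while the window is full.
    synchronized void reserve(long position, long bytes) throws InterruptedException {
        while (!tryReserve(position, bytes)) {
            wait();
        }
    }

    // Called when the indexer advances: frees blobs it has already passed.
    synchronized void advanceIndexer(long positionIndexed) {
        while (!window.isEmpty() && window.peekFirst().position <= positionIndexed) {
            usedBytes -= window.removeFirst().bytes;
        }
        notifyAll();
    }

    synchronized long usedBytes() {
        return usedBytes;
    }
}
```

In this model a download thread would call `reserve` before fetching a blob, and the indexer's progress reports would drive `advanceIndexer`, which is what lets a paused downloader resume.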
-
-
Field Summary
-
Fields inherited from interface org.apache.jackrabbit.oak.index.indexer.document.flatfile.AheadOfTimeBlobDownloader
NOOP
-
-
Constructor Summary
DefaultAheadOfTimeBlobDownloader(@NotNull String binaryBlobsPathSuffix, @NotNull File ffsPath, @NotNull Compression algorithm, @NotNull GarbageCollectableBlobStore blobStore, @NotNull List<org.apache.jackrabbit.oak.plugins.index.search.IndexDefinition> indexDefinitions, int nDownloadThreads, int maxPrefetchWindowSize, int maxPrefetchWindowMB)
-
Method Summary
Modifier and Type, Method
void close()
String formatAggregateStatistics()
long getBlobsEnqueuedForDownload()
long getLinesScanned()
long getNotIncludedInIndex()
long getTotalBlobsDownloaded()
void join()
void start()
void stop()
void updateIndexed(long positionIndexed)
-
-
-
Constructor Detail
-
DefaultAheadOfTimeBlobDownloader
public DefaultAheadOfTimeBlobDownloader(@NotNull String binaryBlobsPathSuffix, @NotNull File ffsPath, @NotNull Compression algorithm, @NotNull GarbageCollectableBlobStore blobStore, @NotNull List<org.apache.jackrabbit.oak.plugins.index.search.IndexDefinition> indexDefinitions, int nDownloadThreads, int maxPrefetchWindowSize, int maxPrefetchWindowMB)
- Parameters:
binaryBlobsPathSuffix - Suffix of nodes that are to be considered for AOT download. Any node that does not match this suffix is ignored.
ffsPath - Flat file store path.
algorithm - Compression algorithm of the flat file store.
blobStore - The blob store. This should be the same blob store used by the indexer, and its cache should be large enough to hold maxPrefetchWindowMB of data.
indexDefinitions - The indexes for which AOT blob download is enabled.
nDownloadThreads - Number of download threads.
maxPrefetchWindowMB - Size of the prefetch window, that is, how much data the downloader will retrieve ahead of the indexer.
-
-
Method Detail
-
start
public void start()
- Specified by:
start
in interface AheadOfTimeBlobDownloader
-
join
public void join() throws ExecutionException, InterruptedException
-
updateIndexed
public void updateIndexed(long positionIndexed)
- Specified by:
updateIndexed
in interface AheadOfTimeBlobDownloader
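As an illustration of how an indexer might drive updateIndexed(long), here is a minimal sketch written against a stand-in interface rather than the real Oak types. `AotDownloaderStandIn`, `IndexerLoopSketch`, `RecordingDownloader` and the `reportInterval` parameter are all hypothetical; only the method names start, updateIndexed and close come from this page.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the subset of AheadOfTimeBlobDownloader used here (hypothetical).
interface AotDownloaderStandIn extends AutoCloseable {
    void start();

    void updateIndexed(long positionIndexed);

    @Override
    void close();
}

class IndexerLoopSketch {
    // Indexes lines 0..totalLines-1 and reports progress every reportInterval lines.
    static void indexAll(AotDownloaderStandIn downloader, long totalLines, long reportInterval) {
        // try-with-resources ensures the downloader is closed even if indexing fails.
        try (AotDownloaderStandIn d = downloader) {
            d.start();
            for (long line = 0; line < totalLines; line++) {
                // ... index the entry at this line ...
                if ((line + 1) % reportInterval == 0) {
                    d.updateIndexed(line);
                }
            }
        }
    }
}

// Test double that records the calls it receives.
class RecordingDownloader implements AotDownloaderStandIn {
    final List<String> events = new ArrayList<>();

    @Override
    public void start() {
        events.add("start");
    }

    @Override
    public void updateIndexed(long positionIndexed) {
        events.add("indexed:" + positionIndexed);
    }

    @Override
    public void close() {
        events.add("close");
    }
}
```

Reporting every few lines instead of every line is a design choice in this sketch to keep the progress calls cheap; the page does not say how often the real indexer reports.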
-
close
public void close()
- Specified by:
close
in interface AutoCloseable
- Specified by:
close
in interface Closeable
-
stop
public void stop()
-
formatAggregateStatistics
public String formatAggregateStatistics()
-
getBlobsEnqueuedForDownload
public long getBlobsEnqueuedForDownload()
-
getTotalBlobsDownloaded
public long getTotalBlobsDownloaded()
-
getLinesScanned
public long getLinesScanned()
-
getNotIncludedInIndex
public long getNotIncludedInIndex()
-
-