java.lang.Object
- org.apache.jackrabbit.oak.index.indexer.document.flatfile.DefaultAheadOfTimeBlobDownloader

All Implemented Interfaces:

Closeable, AutoCloseable, AheadOfTimeBlobDownloader
```
public class DefaultAheadOfTimeBlobDownloader
extends Object
implements AheadOfTimeBlobDownloader
```
Scans a FlatFileStore for non-inlined blobs in nodes matching a given pattern and downloads them from the blob store. The goal of this class is to populate the local data store cache with the non-inlined blobs that are required by the indexer, so that when the indexing thread tries to retrieve the blob, it will find it locally, thereby avoiding an expensive call to the blob store. When indexing repositories with many non-inlined renditions, pre-populating the cache can cut the indexing time by more than half.
This AOT download is intended to run asynchronously with the indexing thread. It starts the following threads:
- [scanner] - scans the FFS, searching for blobs to download. A blob is selected for download if it is a binary property in a node whose name matches the suffix given as parameter to this class, and is non-inlined.
- [downloader-n] - a configurable number of threads that download the blobs that were discovered by the scanner thread.
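The scanner's selection rule can be illustrated with a small, self-contained sketch. It is not the class's actual code: `BinaryRef`, `shouldDownload` and `selectBlobIds` are hypothetical names, and matching the node with `String.endsWith` is an assumption about how the suffix is applied.

```java
import java.util.ArrayList;
import java.util.List;

final class BlobSelectionSketch {

    /** Hypothetical stand-in for one binary property found while scanning the FFS. */
    static final class BinaryRef {
        final String nodePath;
        final String blobId;
        final boolean inlined;

        BinaryRef(String nodePath, String blobId, boolean inlined) {
            this.nodePath = nodePath;
            this.blobId = blobId;
            this.inlined = inlined;
        }
    }

    /** Assumed rule: the node path ends with the suffix and the blob is non-inlined. */
    static boolean shouldDownload(BinaryRef ref, String binaryBlobsPathSuffix) {
        return !ref.inlined && ref.nodePath.endsWith(binaryBlobsPathSuffix);
    }

    /** Returns the ids of the blobs that would be handed to the download threads. */
    static List<String> selectBlobIds(List<BinaryRef> refs, String binaryBlobsPathSuffix) {
        List<String> selected = new ArrayList<>();
        for (BinaryRef ref : refs) {
            if (shouldDownload(ref, binaryBlobsPathSuffix)) {
                selected.add(ref.blobId);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<BinaryRef> refs = List.of(
                new BinaryRef("/content/a/renditions/original/jcr:content", "blob-1", false),
                new BinaryRef("/content/b/renditions/original/jcr:content", "blob-2", true),
                new BinaryRef("/content/a/metadata", "blob-3", false));
        // Only blob-1 matches the suffix and is non-inlined.
        System.out.println(selectBlobIds(refs, "/renditions/original/jcr:content"));
    }
}
```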
The indexer should periodically call updateIndexed(long) to inform the AOT downloader of the last line indexed. This is necessary to keep the AOT downloader more or less in sync with the indexer, that is, to prevent it from falling behind and to prevent it from going too far ahead.
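A minimal sketch of how an indexing loop might report its progress follows. The `Downloader` interface is a hypothetical local mirror of the start(), updateIndexed(long) and close() methods documented on this page, used only so the example compiles on its own. The reporting interval `reportEvery` is likewise an assumption, not a value taken from Oak.

```java
import java.util.ArrayList;
import java.util.List;

final class IndexerLoopSketch {

    /** Hypothetical local mirror of the calls used below; not the Oak interface itself. */
    interface Downloader extends AutoCloseable {
        void start();
        void updateIndexed(long positionIndexed);
        @Override
        void close();
    }

    /** Processes totalLines lines and reports the position every reportEvery lines. */
    static void runIndexer(Downloader downloader, long totalLines, long reportEvery) {
        try (Downloader d = downloader) {
            d.start();
            for (long line = 1; line <= totalLines; line++) {
                // ... index the flat file store entry at this line (omitted) ...
                if (line % reportEvery == 0) {
                    d.updateIndexed(line);
                }
            }
        } // close() is called here, even if indexing throws
    }

    public static void main(String[] args) {
        List<Long> reported = new ArrayList<>();
        Downloader recording = new Downloader() {
            @Override public void start() { }
            @Override public void updateIndexed(long positionIndexed) { reported.add(positionIndexed); }
            @Override public void close() { }
        };
        runIndexer(recording, 10, 4);
        System.out.println(reported); // prints [4, 8]
    }
}
```

Reporting on every line would also work, but a coarser interval keeps the synchronization cost between the two sides low.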
This AOT downloader should be configured with enough threads that it is able to stay ahead of the indexer. Whether it can remain ahead depends on the number of blobs to download and the speed of the connection to the blob store. As a rough guide, in a cloud environment with blobs stored in Azure Blob Storage or Amazon S3, 4 download threads should be enough. If the AOT downloader falls behind the indexer, it will skip any nodes that are behind the last known indexing position, to try to catch up.
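The catch-up behaviour can be sketched as a simple position check. This is an illustration under assumptions, not the actual implementation: `CatchUpSketch`, `shouldProcess` and `getSkipped` are hypothetical names, and positions are assumed to be FFS line numbers.

```java
import java.util.concurrent.atomic.AtomicLong;

final class CatchUpSketch {

    // Written by the indexing thread, read by the scanner thread.
    private final AtomicLong lastIndexedLine = new AtomicLong(0);
    // Only touched by the scanner thread in this sketch.
    private long skipped;

    /** Called by the indexer to publish the last line it has indexed. */
    void updateIndexed(long positionIndexed) {
        lastIndexedLine.set(positionIndexed);
    }

    /** Returns false for lines the indexer has already passed, so the scanner skips them. */
    boolean shouldProcess(long lineNumber) {
        if (lineNumber <= lastIndexedLine.get()) {
            skipped++;
            return false;
        }
        return true;
    }

    long getSkipped() {
        return skipped;
    }

    public static void main(String[] args) {
        CatchUpSketch scanner = new CatchUpSketch();
        scanner.updateIndexed(100);
        System.out.println(scanner.shouldProcess(100)); // false: already indexed
        System.out.println(scanner.shouldProcess(101)); // true: still ahead of the indexer
    }
}
```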
The AOT downloader will also try not to get too far ahead of the indexer. This is done to avoid filling up the local blob store cache, which would cause blobs to be evicted before the indexer gets around to using them. In this case, the indexer would have to download the blob again, which would negate the benefits of using this AOT downloader. The AOT downloader takes as a parameter the maximum amount of data that it is allowed to prefetch (maxPrefetchWindowMB). It will then try not to download more than this amount of data, pausing its progress whenever the prefetch window is full. For details on how this is implemented, see AheadOfTimeBlobDownloaderThrottler.
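The idea of a bounded prefetch window can be sketched as follows. This is not the code of AheadOfTimeBlobDownloaderThrottler: the class and method names are hypothetical, and the sketch assumes that a single scanner thread reserves positions in increasing order.

```java
import java.util.ArrayDeque;

final class PrefetchWindowSketch {

    private static final class Entry {
        final long position;
        final long bytes;

        Entry(long position, long bytes) {
            this.position = position;
            this.bytes = bytes;
        }
    }

    private final long maxWindowBytes;
    private final ArrayDeque<Entry> window = new ArrayDeque<>();
    private long bytesInWindow;
    private long indexedPosition;

    PrefetchWindowSketch(int maxPrefetchWindowMB) {
        this.maxWindowBytes = maxPrefetchWindowMB * 1024L * 1024L;
    }

    /**
     * Blocks while the window is full, then records the blob. Returns false if the
     * indexer has already passed this position, in which case nothing is recorded.
     */
    synchronized boolean reserve(long position, long bytes) throws InterruptedException {
        // An empty window always admits one blob, so an oversized blob cannot deadlock.
        while (position > indexedPosition
                && !window.isEmpty()
                && bytesInWindow + bytes > maxWindowBytes) {
            wait();
        }
        if (position <= indexedPosition) {
            return false;
        }
        window.addLast(new Entry(position, bytes));
        bytesInWindow += bytes;
        return true;
    }

    /** Frees the space held by blobs at or before the indexed position. */
    synchronized void advanceIndexer(long positionIndexed) {
        indexedPosition = Math.max(indexedPosition, positionIndexed);
        while (!window.isEmpty() && window.peekFirst().position <= indexedPosition) {
            bytesInWindow -= window.removeFirst().bytes;
        }
        notifyAll();
    }

    synchronized long bytesInWindow() {
        return bytesInWindow;
    }

    public static void main(String[] args) throws InterruptedException {
        PrefetchWindowSketch w = new PrefetchWindowSketch(1);
        w.reserve(1, 600 * 1024);  // fits in the 1 MB window
        w.advanceIndexer(1);       // indexer consumed line 1, space is freed
        w.reserve(2, 600 * 1024);  // fits again without blocking
        System.out.println(w.bytesInWindow()); // prints 614400
    }
}
```

In this sketch the scanner thread would call `reserve` before enqueuing a blob, and the indexer's progress reports would be forwarded to `advanceIndexer`.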

Field Summary
- Fields inherited from interface org.apache.jackrabbit.oak.index.indexer.document.flatfile.AheadOfTimeBlobDownloader
  NOOP

Constructor Summary

Constructors
Constructor	Description
`DefaultAheadOfTimeBlobDownloader(@NotNull String binaryBlobsPathSuffix, @NotNull File ffsPath, @NotNull Compression algorithm, @NotNull GarbageCollectableBlobStore blobStore, @NotNull List<org.apache.jackrabbit.oak.plugins.index.search.IndexDefinition> indexDefinitions, int nDownloadThreads, int maxPrefetchWindowSize, int maxPrefetchWindowMB)`

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method	Description
`void`	`close()`
`String`	`formatAggregateStatistics()`
`long`	`getBlobsEnqueuedForDownload()`
`long`	`getLinesScanned()`
`long`	`getNotIncludedInIndex()`
`long`	`getTotalBlobsDownloaded()`
`void`	`join()`
`void`	`start()`
`void`	`stop()`
`void`	`updateIndexed(long positionIndexed)`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

DefaultAheadOfTimeBlobDownloader

public DefaultAheadOfTimeBlobDownloader(@NotNull String binaryBlobsPathSuffix,
                                        @NotNull File ffsPath,
                                        @NotNull Compression algorithm,
                                        @NotNull GarbageCollectableBlobStore blobStore,
                                        @NotNull List<org.apache.jackrabbit.oak.plugins.index.search.IndexDefinition> indexDefinitions,
                                        int nDownloadThreads,
                                        int maxPrefetchWindowSize,
                                        int maxPrefetchWindowMB)

Parameters:
- binaryBlobsPathSuffix - Suffix of nodes that are to be considered for AOT download. Any node that does not match this suffix is ignored.
- ffsPath - Flat file store path.
- algorithm - Compression algorithm of the flat file store.
- blobStore - The blob store. This should be the same blob store used by the indexer, and its cache should be large enough to hold maxPrefetchWindowMB of data.
- indexDefinitions - The indexes for which AOT blob download is enabled.
- nDownloadThreads - Number of download threads.
- maxPrefetchWindowMB - Size of the prefetch window, that is, how much data the downloader will retrieve ahead of the indexer.

Method Detail

start
```
public void start()
```
Specified by:

start in interface AheadOfTimeBlobDownloader

join

public void join()
          throws ExecutionException,
                 InterruptedException

Throws:
- ExecutionException
- InterruptedException

updateIndexed
```
public void updateIndexed(long positionIndexed)
```
Specified by:

updateIndexed in interface AheadOfTimeBlobDownloader

close
```
public void close()
```
Specified by:

close in interface AutoCloseable

Specified by:

close in interface Closeable

stop
```
public void stop()
```

formatAggregateStatistics

public String formatAggregateStatistics()

getBlobsEnqueuedForDownload

public long getBlobsEnqueuedForDownload()

getTotalBlobsDownloaded
```
public long getTotalBlobsDownloaded()
```

getLinesScanned
```
public long getLinesScanned()
```

getNotIncludedInIndex
```
public long getNotIncludedInIndex()
```
