java.lang.Object
- org.apache.jackrabbit.oak.index.indexer.document.flatfile.pipelined.PipelinedMongoDownloadTask

All Implemented Interfaces: Callable<PipelinedMongoDownloadTask.Result>

public class PipelinedMongoDownloadTask
extends Object
implements Callable<PipelinedMongoDownloadTask.Result>

Nested Class Summary

Nested Classes
Modifier and Type	Class	Description
`static class`	`PipelinedMongoDownloadTask.Result`

Field Summary

Fields
Modifier and Type	Field	Description
`static int`	`DEFAULT_OAK_INDEXER_PIPELINED_MONGO_CONNECTION_RETRY_SECONDS`
`static String`	`DEFAULT_OAK_INDEXER_PIPELINED_MONGO_CUSTOM_EXCLUDE_ENTRIES_REGEX`
`static String`	`DEFAULT_OAK_INDEXER_PIPELINED_MONGO_CUSTOM_EXCLUDED_PATHS`
`static boolean`	`DEFAULT_OAK_INDEXER_PIPELINED_MONGO_PARALLEL_DUMP`
`static boolean`	`DEFAULT_OAK_INDEXER_PIPELINED_MONGO_PARALLEL_DUMP_SECONDARIES_ONLY`
`static boolean`	`DEFAULT_OAK_INDEXER_PIPELINED_MONGO_REGEX_PATH_FILTERING`
`static int`	`DEFAULT_OAK_INDEXER_PIPELINED_MONGO_REGEX_PATH_FILTERING_MAX_PATHS`
`static boolean`	`DEFAULT_OAK_INDEXER_PIPELINED_RETRY_ON_CONNECTION_ERRORS`
`static String`	`OAK_INDEXER_PIPELINED_MONGO_CONNECTION_RETRY_SECONDS`
`static String`	`OAK_INDEXER_PIPELINED_MONGO_CUSTOM_EXCLUDE_ENTRIES_REGEX`	Any document with a path that matches this regex pattern will not be downloaded.
`static String`	`OAK_INDEXER_PIPELINED_MONGO_CUSTOM_EXCLUDED_PATHS`	Additional Oak paths to exclude from downloading from Mongo.
`static String`	`OAK_INDEXER_PIPELINED_MONGO_PARALLEL_DUMP`	Whether to download in parallel from Mongo with two streams, one per secondary.
`static String`	`OAK_INDEXER_PIPELINED_MONGO_PARALLEL_DUMP_SECONDARIES_ONLY`	When using parallel download, allow downloading from any replica.
`static String`	`OAK_INDEXER_PIPELINED_MONGO_REGEX_PATH_FILTERING`	Whether to do path filtering in the Mongo query instead of doing a full traversal of the document store and filtering in the indexing job.
`static String`	`OAK_INDEXER_PIPELINED_MONGO_REGEX_PATH_FILTERING_MAX_PATHS`	Maximum number of elements in the included/excluded paths list used for regex path filtering.
`static String`	`OAK_INDEXER_PIPELINED_RETRY_ON_CONNECTION_ERRORS`	Whether to retry on connection errors to MongoDB.
`static org.bson.RawBsonDocument[]`	`SENTINEL_MONGO_DOCUMENT`

Constructor Summary

Constructors
Constructor	Description
`PipelinedMongoDownloadTask(com.mongodb.MongoClientURI mongoClientURI, MongoDocumentStore docStore, int maxBatchSizeBytes, int maxBatchNumberOfDocuments, BlockingQueue<org.bson.RawBsonDocument[]> queue, List<PathFilter> pathFilters, StatisticsProvider statisticsProvider, IndexingReporter reporter, ThreadFactory threadFactory)`
`PipelinedMongoDownloadTask(com.mongodb.MongoClientURI mongoClientURI, MongoDocumentStore docStore, int maxBatchSizeBytes, int maxBatchNumberOfDocuments, BlockingQueue<org.bson.RawBsonDocument[]> queue, List<PathFilter> pathFilters, StatisticsProvider statisticsProvider, IndexingReporter reporter, ThreadFactory threadFactory, long minModified)`

Method Summary

Methods
Modifier and Type	Method	Description
`PipelinedMongoDownloadTask.Result`	`call()`
- Methods inherited from class java.lang.Object
  clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - SENTINEL_MONGO_DOCUMENT
```
public static final org.bson.RawBsonDocument[] SENTINEL_MONGO_DOCUMENT
```
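    This page does not describe the field. Judging only by its name and by the `BlockingQueue<org.bson.RawBsonDocument[]>` constructor parameter, it plausibly acts as an end-of-stream marker placed on the queue. The sketch below shows that general pattern under this assumption; `String[]` stands in for `org.bson.RawBsonDocument[]`, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical stand-in: String[] batches replace org.bson.RawBsonDocument[].
class SentinelQueueSketch {
    // A unique array instance used as the end-of-stream marker; compared by identity.
    static final String[] SENTINEL = new String[0];

    // Producer side: enqueue every batch, then the sentinel.
    static void produce(BlockingQueue<String[]> queue, List<String[]> batches) {
        try {
            for (String[] batch : batches) {
                queue.put(batch);
            }
            queue.put(SENTINEL);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Consumer side: take batches until the sentinel instance is seen.
    static List<String> consume(BlockingQueue<String[]> queue) {
        List<String> docs = new ArrayList<>();
        try {
            while (true) {
                String[] batch = queue.take();
                if (batch == SENTINEL) {
                    return docs;
                }
                Collections.addAll(docs, batch);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Runs a producer thread against a consumer on a small bounded queue.
    static List<String> run() {
        BlockingQueue<String[]> queue = new ArrayBlockingQueue<>(2);
        List<String[]> batches = new ArrayList<>();
        batches.add(new String[] {"a", "b"});
        batches.add(new String[] {"c"});
        Thread producer = new Thread(() -> produce(queue, batches));
        producer.start();
        List<String> result = consume(queue);
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```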
  - OAK_INDEXER_PIPELINED_RETRY_ON_CONNECTION_ERRORS
```
public static final String OAK_INDEXER_PIPELINED_RETRY_ON_CONNECTION_ERRORS
```
    Whether to retry on connection errors to MongoDB. This property affects the query used to download the documents from MongoDB. If set to true, the query traverses the results in order of the _modified property (an index scan), which allows the download to resume from where it left off after a failed connection. If set to false, it uses a potentially more efficient query that does not impose any order on the results (a simple collection scan).
    
    See Also:
    
    Constant Field Values
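    To illustrate why an ordered traversal makes the download resumable, the following sketch simulates a store in memory and resumes after one simulated connection failure by asking only for documents that sort after the last one received. It is not the Oak implementation: the `Doc` class, the tie-break on `_id`, and the failure simulation are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ResumableDownloadSketch {
    // Hypothetical stand-in for a Mongo document: only the ordering fields.
    static final class Doc {
        final long modified;
        final String id;

        Doc(long modified, String id) {
            this.modified = modified;
            this.id = id;
        }
    }

    // Order by _modified, then by _id to break ties (assumed for this sketch).
    static final Comparator<Doc> ORDER =
            Comparator.comparingLong((Doc d) -> d.modified).thenComparing(d -> d.id);

    // Simulates an ordered query: documents strictly after 'last', or all if 'last' is null.
    static List<Doc> fetchAfter(List<Doc> store, Doc last) {
        List<Doc> sorted = new ArrayList<>(store);
        sorted.sort(ORDER);
        List<Doc> out = new ArrayList<>();
        for (Doc d : sorted) {
            if (last == null || ORDER.compare(d, last) > 0) {
                out.add(d);
            }
        }
        return out;
    }

    // Downloads all ids, simulating one dropped connection after 'failAfter' documents.
    static List<String> download(List<Doc> store, int failAfter) {
        List<String> downloaded = new ArrayList<>();
        Doc last = null;
        boolean alreadyFailed = false;
        while (true) {
            boolean connectionLost = false;
            for (Doc d : fetchAfter(store, last)) {
                if (!alreadyFailed && downloaded.size() == failAfter) {
                    alreadyFailed = true;
                    connectionLost = true; // simulated failure: reopen the cursor
                    break;
                }
                downloaded.add(d.id);
                last = d; // remember the resume point
            }
            if (!connectionLost) {
                return downloaded;
            }
        }
    }
}
```

    Without a defined order there is no meaningful "last document received", so a failed unordered download would have to start over.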
  - DEFAULT_OAK_INDEXER_PIPELINED_RETRY_ON_CONNECTION_ERRORS
```
public static final boolean DEFAULT_OAK_INDEXER_PIPELINED_RETRY_ON_CONNECTION_ERRORS
```
    See Also:
    
    Constant Field Values
  - OAK_INDEXER_PIPELINED_MONGO_CONNECTION_RETRY_SECONDS
```
public static final String OAK_INDEXER_PIPELINED_MONGO_CONNECTION_RETRY_SECONDS
```
    See Also:
    
    Constant Field Values
  - DEFAULT_OAK_INDEXER_PIPELINED_MONGO_CONNECTION_RETRY_SECONDS
```
public static final int DEFAULT_OAK_INDEXER_PIPELINED_MONGO_CONNECTION_RETRY_SECONDS
```
    See Also:
    
    Constant Field Values
  - OAK_INDEXER_PIPELINED_MONGO_REGEX_PATH_FILTERING
```
public static final String OAK_INDEXER_PIPELINED_MONGO_REGEX_PATH_FILTERING
```
    Whether to do path filtering in the Mongo query instead of doing a full traversal of the document store and filtering in the indexing job. This feature may significantly reduce the number of documents downloaded from Mongo. The performance gains may not be proportional to the reduction in the number of documents downloaded because Mongo still has to traverse all the documents. This is required because the regex used for path filtering starts with a wildcard (the _id starts with the depth of the path, so the regex must ignore this part). Because of the wildcard at the start, Mongo cannot use the index on _id.
    
    See Also:
    
    Constant Field Values
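    The effect of the depth prefix on the pattern can be sketched locally with `java.util.regex`. The `_id` layout `<depth>:<path>` and the exact pattern shape are assumptions made for illustration; the query that Oak actually sends to Mongo may differ.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class DepthPrefixRegexSketch {
    // Assumed _id layout: "<depth>:<path>", for example "3:/content/dam/a".
    static Pattern subtreePattern(String path) {
        // [0-9]+: skips the variable depth prefix, so the pattern has no fixed
        // literal prefix that an index on _id could use.
        // Pattern.quote treats the path as a literal string.
        return Pattern.compile("^[0-9]+:" + Pattern.quote(path) + "(/.*)?$");
    }

    // Keeps only the ids that are the included path itself or one of its descendants.
    static List<String> filter(List<String> ids, String includedPath) {
        Pattern p = subtreePattern(includedPath);
        return ids.stream()
                .filter(id -> p.matcher(id).matches())
                .collect(Collectors.toList());
    }
}
```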
  - DEFAULT_OAK_INDEXER_PIPELINED_MONGO_REGEX_PATH_FILTERING
```
public static final boolean DEFAULT_OAK_INDEXER_PIPELINED_MONGO_REGEX_PATH_FILTERING
```
    See Also:
    
    Constant Field Values
  - OAK_INDEXER_PIPELINED_MONGO_CUSTOM_EXCLUDE_ENTRIES_REGEX
```
public static final String OAK_INDEXER_PIPELINED_MONGO_CUSTOM_EXCLUDE_ENTRIES_REGEX
```
    Any document with a path that matches this regex pattern will not be downloaded. This pattern is included in the Mongo query, that is, the filtering is done server-side by Mongo, which avoids downloading the documents that match it. This is typically a _suffix_, for example "/metadata.xml$|/renditions/.*.jpg$". To exclude subtrees such as /content/abc, use mongoFilterPaths instead.
    
    See Also:
    
    Constant Field Values
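    To check what a candidate pattern would exclude before using it, the example pattern above can be tried locally with `java.util.regex`. This is only an approximation of the server-side behaviour: Mongo evaluates regexes with its own engine, and the class and method names here are hypothetical.

```java
import java.util.regex.Pattern;

class ExcludeEntriesRegexSketch {
    // The example pattern from the description, used unchanged.
    static final Pattern EXCLUDE =
            Pattern.compile("/metadata.xml$|/renditions/.*.jpg$");

    // A path is excluded if the pattern matches anywhere in it; the $ anchors
    // make both alternatives behave as suffix matches.
    static boolean isExcluded(String path) {
        return EXCLUDE.matcher(path).find();
    }

    public static void main(String[] args) {
        System.out.println(isExcluded("/content/a/metadata.xml"));
        System.out.println(isExcluded("/content/a/renditions/thumb.jpg"));
        System.out.println(isExcluded("/content/a/page"));
    }
}
```

    Note that the dots in the example are unescaped, so they match any character; a path ending in /metadataZxml would also be excluded. Escape them (`\.`) if only a literal dot should match.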
  - DEFAULT_OAK_INDEXER_PIPELINED_MONGO_CUSTOM_EXCLUDE_ENTRIES_REGEX
```
public static final String DEFAULT_OAK_INDEXER_PIPELINED_MONGO_CUSTOM_EXCLUDE_ENTRIES_REGEX
```
    See Also:
    
    Constant Field Values
  - OAK_INDEXER_PIPELINED_MONGO_REGEX_PATH_FILTERING_MAX_PATHS
```
public static final String OAK_INDEXER_PIPELINED_MONGO_REGEX_PATH_FILTERING_MAX_PATHS
```
    Maximum number of elements in the included/excluded paths list used for regex path filtering. If, after merging and deduplicating the paths of all the path filters, the number of included or excluded paths exceeds this value, path filtering is disabled to avoid creating Mongo queries with a large number of filters.
    
    See Also:
    
    Constant Field Values
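    The threshold check described above can be sketched as follows. Merging is reduced here to a plain set union; the real merge may do more, for example dropping paths already covered by an ancestor. All names are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class MaxPathsSketch {
    // Union of the paths of all filters; the set removes duplicates.
    static Set<String> mergeAndDeduplicate(List<List<String>> pathsPerFilter) {
        Set<String> merged = new TreeSet<>();
        for (List<String> paths : pathsPerFilter) {
            merged.addAll(paths);
        }
        return merged;
    }

    // Regex filtering stays enabled only if neither list exceeds maxPaths.
    static boolean regexFilteringEnabled(List<List<String>> included,
                                         List<List<String>> excluded,
                                         int maxPaths) {
        return mergeAndDeduplicate(included).size() <= maxPaths
                && mergeAndDeduplicate(excluded).size() <= maxPaths;
    }
}
```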
  - DEFAULT_OAK_INDEXER_PIPELINED_MONGO_REGEX_PATH_FILTERING_MAX_PATHS
```
public static final int DEFAULT_OAK_INDEXER_PIPELINED_MONGO_REGEX_PATH_FILTERING_MAX_PATHS
```
    See Also:
    
    Constant Field Values
  - OAK_INDEXER_PIPELINED_MONGO_CUSTOM_EXCLUDED_PATHS
```
public static final String OAK_INDEXER_PIPELINED_MONGO_CUSTOM_EXCLUDED_PATHS
```
    Additional Oak paths to exclude from downloading from Mongo. This is a comma-separated list of paths. These paths are only filtered if the included paths computed from the indexes resolve to the root tree (/), otherwise the value of this property is ignored.
    
    See Also:
    
    Constant Field Values
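    The rule above (a comma-separated list that is honoured only when the included paths resolve to the root) might be sketched like this. Interpreting "resolve to the root tree" as the included list being exactly `/` is an assumption, as are the names used.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CustomExcludedPathsSketch {
    // Returns the custom excluded paths that would take effect, or an empty
    // list when the property is ignored.
    static List<String> effectiveCustomExcludes(List<String> includedPaths, String property) {
        // Assumption: "resolves to the root" means the included list is exactly ["/"].
        boolean rootIncluded = includedPaths.size() == 1 && "/".equals(includedPaths.get(0));
        if (!rootIncluded || property == null || property.trim().isEmpty()) {
            return List.of();
        }
        // Split on commas, trim whitespace, drop empty entries.
        return Arrays.stream(property.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }
}
```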
  - DEFAULT_OAK_INDEXER_PIPELINED_MONGO_CUSTOM_EXCLUDED_PATHS
```
public static final String DEFAULT_OAK_INDEXER_PIPELINED_MONGO_CUSTOM_EXCLUDED_PATHS
```
    See Also:
    
    Constant Field Values
  - OAK_INDEXER_PIPELINED_MONGO_PARALLEL_DUMP
```
public static final String OAK_INDEXER_PIPELINED_MONGO_PARALLEL_DUMP
```
    Whether to download in parallel from Mongo with two streams, one per secondary. This applies only if Mongo is a cluster with two secondaries. One thread downloads in ascending order of (_modified, _id) and the other in descending order, until they cross. This feature requires that the retryOnConnectionErrors property is set to true, because it relies on downloading in a given order (if retryOnConnectionErrors is false, the download is done in natural order, that is, the order is undefined).
    
    See Also:
    
    Constant Field Values
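    The "until they cross" idea can be shown with a deterministic, single-threaded simulation: one cursor walks the ordered documents from the front, the other from the back, and the download stops once the two meet. The real task uses two threads and Mongo cursors; the alternating turns and the names below are assumptions made to keep the sketch reproducible.

```java
import java.util.ArrayList;
import java.util.List;

class CrossingStreamsSketch {
    // 'sorted' is assumed to be ordered ascending by (_modified, _id).
    // Returns two lists: what the ascending stream and the descending stream downloaded.
    static List<List<String>> download(List<String> sorted) {
        List<String> ascending = new ArrayList<>();
        List<String> descending = new ArrayList<>();
        int lo = 0;                  // next position for the ascending stream
        int hi = sorted.size() - 1;  // next position for the descending stream
        boolean ascendingTurn = true;
        // Stop as soon as the two positions cross, so no document is taken twice.
        while (lo <= hi) {
            if (ascendingTurn) {
                ascending.add(sorted.get(lo++));
            } else {
                descending.add(sorted.get(hi--));
            }
            ascendingTurn = !ascendingTurn;
        }
        return List.of(ascending, descending);
    }
}
```

    This only works because both streams agree on a total order; with an undefined order there would be no well-defined crossing point.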
  - DEFAULT_OAK_INDEXER_PIPELINED_MONGO_PARALLEL_DUMP
```
public static final boolean DEFAULT_OAK_INDEXER_PIPELINED_MONGO_PARALLEL_DUMP
```
    See Also:
    
    Constant Field Values
  - OAK_INDEXER_PIPELINED_MONGO_PARALLEL_DUMP_SECONDARIES_ONLY
```
public static final String OAK_INDEXER_PIPELINED_MONGO_PARALLEL_DUMP_SECONDARIES_ONLY
```
    When using parallel download, allow downloading from any replica. By default, we download only from secondaries, allowing only a single download from each of the secondaries. This is done to minimize the load on the primary and to spread the load between the two secondaries. But in some cases it may be preferable to allow unrestricted download from any replica, for instance, if downloading from a standalone Mongo cluster. Even when there is a single replica, downloading in parallel with two connections might yield better performance. This is also useful for testing.
    
    See Also:
    
    Constant Field Values
  - DEFAULT_OAK_INDEXER_PIPELINED_MONGO_PARALLEL_DUMP_SECONDARIES_ONLY
```
public static boolean DEFAULT_OAK_INDEXER_PIPELINED_MONGO_PARALLEL_DUMP_SECONDARIES_ONLY
```
- Constructor Detail
  - PipelinedMongoDownloadTask
```
public PipelinedMongoDownloadTask(com.mongodb.MongoClientURI mongoClientURI,
                                  MongoDocumentStore docStore,
                                  int maxBatchSizeBytes,
                                  int maxBatchNumberOfDocuments,
                                  BlockingQueue<org.bson.RawBsonDocument[]> queue,
                                  List<PathFilter> pathFilters,
                                  StatisticsProvider statisticsProvider,
                                  IndexingReporter reporter,
                                  ThreadFactory threadFactory)
```
  - PipelinedMongoDownloadTask
```
public PipelinedMongoDownloadTask(com.mongodb.MongoClientURI mongoClientURI,
                                  MongoDocumentStore docStore,
                                  int maxBatchSizeBytes,
                                  int maxBatchNumberOfDocuments,
                                  BlockingQueue<org.bson.RawBsonDocument[]> queue,
                                  List<PathFilter> pathFilters,
                                  StatisticsProvider statisticsProvider,
                                  IndexingReporter reporter,
                                  ThreadFactory threadFactory,
                                  long minModified)
```
- Method Detail
  - call
```
public PipelinedMongoDownloadTask.Result call()
                                       throws Exception
```
    Specified by:
    
    call in interface Callable<PipelinedMongoDownloadTask.Result>
    
    Throws:
    
    Exception
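    Because the class implements `Callable`, it can be handed to an `ExecutorService` like any other callable. The sketch below shows only that submission pattern with a stand-in task; the stand-in `Result` class and its `documentsDownloaded` field are hypothetical, since this page does not list the members of `PipelinedMongoDownloadTask.Result`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CallableSubmitSketch {
    // Hypothetical stand-in for PipelinedMongoDownloadTask.Result.
    static final class Result {
        final long documentsDownloaded;

        Result(long documentsDownloaded) {
            this.documentsDownloaded = documentsDownloaded;
        }
    }

    // Stand-in for the download task: any Callable<Result> is submitted the same way.
    static Callable<Result> standInTask() {
        return () -> new Result(42);
    }

    // Submits the task, waits for completion, and unwraps failures from call().
    static long runAndGetCount() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Result> future = executor.submit(standInTask());
            return future.get().documentsDownloaded;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            // An exception thrown by call() arrives wrapped in ExecutionException.
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }
}
```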
