Class PipelinedMongoDownloadTask
- java.lang.Object
  - org.apache.jackrabbit.oak.index.indexer.document.flatfile.pipelined.PipelinedMongoDownloadTask
- All Implemented Interfaces:
java.util.concurrent.Callable<PipelinedMongoDownloadTask.Result>
public class PipelinedMongoDownloadTask extends java.lang.Object implements java.util.concurrent.Callable<PipelinedMongoDownloadTask.Result>
-
-
Nested Class Summary
Modifier and Type, Class, Description
static class PipelinedMongoDownloadTask.Result
-
Field Summary
Modifier and Type, Field, Description
static int DEFAULT_OAK_INDEXER_PIPELINED_MONGO_CONNECTION_RETRY_SECONDS
static java.lang.String DEFAULT_OAK_INDEXER_PIPELINED_MONGO_CUSTOM_EXCLUDE_ENTRIES_REGEX
static java.lang.String DEFAULT_OAK_INDEXER_PIPELINED_MONGO_CUSTOM_EXCLUDED_PATHS
static boolean DEFAULT_OAK_INDEXER_PIPELINED_MONGO_REGEX_PATH_FILTERING
static int DEFAULT_OAK_INDEXER_PIPELINED_MONGO_REGEX_PATH_FILTERING_MAX_PATHS
static boolean DEFAULT_OAK_INDEXER_PIPELINED_RETRY_ON_CONNECTION_ERRORS
static java.lang.String OAK_INDEXER_PIPELINED_MONGO_CONNECTION_RETRY_SECONDS
static java.lang.String OAK_INDEXER_PIPELINED_MONGO_CUSTOM_EXCLUDE_ENTRIES_REGEX
    Any document with a path that matches this regex pattern will not be downloaded.
static java.lang.String OAK_INDEXER_PIPELINED_MONGO_CUSTOM_EXCLUDED_PATHS
    Additional Oak paths to exclude from downloading from Mongo.
static java.lang.String OAK_INDEXER_PIPELINED_MONGO_REGEX_PATH_FILTERING
    Whether to do path filtering in the Mongo query instead of doing a full traversal of the document store and filtering in the indexing job.
static java.lang.String OAK_INDEXER_PIPELINED_MONGO_REGEX_PATH_FILTERING_MAX_PATHS
    Maximum number of elements in the included/excluded paths list used for regex path filtering.
static java.lang.String OAK_INDEXER_PIPELINED_RETRY_ON_CONNECTION_ERRORS
    Whether to retry on connection errors to MongoDB.
-
Constructor Summary
PipelinedMongoDownloadTask(com.mongodb.client.MongoDatabase mongoDatabase, MongoDocumentStore mongoDocStore, int maxBatchSizeBytes, int maxBatchNumberOfDocuments, java.util.concurrent.BlockingQueue<NodeDocument[]> queue, java.util.List<PathFilter> pathFilters, StatisticsProvider statisticsProvider, IndexingReporter reporter)
-
Method Summary
Modifier and Type, Method, Description
PipelinedMongoDownloadTask.Result call()
-
-
-
Field Detail
-
OAK_INDEXER_PIPELINED_RETRY_ON_CONNECTION_ERRORS
public static final java.lang.String OAK_INDEXER_PIPELINED_RETRY_ON_CONNECTION_ERRORS
Whether to retry on connection errors to MongoDB. This property affects the query that is used to download the documents from MongoDB. If set to true, the query traverses the results in order of the _modified property (an index scan), which allows it to resume from where it left off after a failed connection. If set to false, it uses a potentially more efficient query that does not impose any order on the results (a simple collection scan).
- See Also:
- Constant Field Values
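The resume-after-failure behaviour described above can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not the class's actual query logic: `Doc`, `ConnectionLost`, `scan` and `downloadWithRetry` are hypothetical names, and breaking ties on `_modified` by `_id` is an assumption made here so that the resume position is unambiguous.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ResumableDownloadSketch {
    /** Hypothetical stand-in for a downloaded document. */
    static final class Doc {
        final String id;
        final long modified;
        Doc(String id, long modified) { this.id = id; this.modified = modified; }
    }

    /** Simulated connection failure. */
    static final class ConnectionLost extends RuntimeException {}

    /** Assumed traversal order: by _modified, ties broken by _id. */
    static int compare(Doc a, Doc b) {
        int c = Long.compare(a.modified, b.modified);
        return c != 0 ? c : a.id.compareTo(b.id);
    }

    /**
     * Appends to out every document positioned after resumeAfter (null = start).
     * Throws ConnectionLost when a further document is due after failAfter
     * documents were delivered; a negative failAfter never fails.
     */
    static void scan(List<Doc> sorted, Doc resumeAfter, int failAfter, List<Doc> out) {
        int delivered = 0;
        for (Doc d : sorted) {
            if (resumeAfter != null && compare(d, resumeAfter) <= 0) continue;
            if (delivered == failAfter) throw new ConnectionLost();
            out.add(d);
            delivered++;
        }
    }

    /** Downloads everything, resuming after the last received document on failure. */
    static List<Doc> downloadWithRetry(List<Doc> collection, int failAfter) {
        List<Doc> sorted = new ArrayList<>(collection);
        sorted.sort(ResumableDownloadSketch::compare);
        List<Doc> out = new ArrayList<>();
        int nextFailure = failAfter;
        while (true) {
            Doc last = out.isEmpty() ? null : out.get(out.size() - 1);
            try {
                scan(sorted, last, nextFailure, out);
                return out;
            } catch (ConnectionLost e) {
                nextFailure = -1; // the simulated failure happens only once
            }
        }
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc("c", 20), new Doc("a", 10), new Doc("b", 20), new Doc("d", 30));
        for (Doc d : downloadWithRetry(docs, 2)) {
            System.out.println(d.modified + " " + d.id);
        }
    }
}
```

Without a defined order there is no position to resume from, which is why the unordered query cannot be combined with retries in this model.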
-
DEFAULT_OAK_INDEXER_PIPELINED_RETRY_ON_CONNECTION_ERRORS
public static final boolean DEFAULT_OAK_INDEXER_PIPELINED_RETRY_ON_CONNECTION_ERRORS
- See Also:
- Constant Field Values
-
OAK_INDEXER_PIPELINED_MONGO_CONNECTION_RETRY_SECONDS
public static final java.lang.String OAK_INDEXER_PIPELINED_MONGO_CONNECTION_RETRY_SECONDS
- See Also:
- Constant Field Values
-
DEFAULT_OAK_INDEXER_PIPELINED_MONGO_CONNECTION_RETRY_SECONDS
public static final int DEFAULT_OAK_INDEXER_PIPELINED_MONGO_CONNECTION_RETRY_SECONDS
- See Also:
- Constant Field Values
-
OAK_INDEXER_PIPELINED_MONGO_REGEX_PATH_FILTERING
public static final java.lang.String OAK_INDEXER_PIPELINED_MONGO_REGEX_PATH_FILTERING
Whether to do path filtering in the Mongo query instead of doing a full traversal of the document store and filtering in the indexing job. This feature may significantly reduce the number of documents downloaded from Mongo. The performance gains may not be proportional to the reduction in the number of documents downloaded, because Mongo still has to traverse all the documents. The traversal is needed because the regex used for path filtering starts with a wildcard: the _id starts with the depth of the path, so the regex must skip that part. Because of the leading wildcard, Mongo cannot use the index on _id.
- See Also:
- Constant Field Values
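To make the leading-wildcard point concrete, here is a minimal sketch of such a pattern. It assumes an _id of the form `<depth>:<path>` (for example `3:/content/abc/x`); the class and method names are hypothetical, and the regex that the indexing job actually builds may differ.

```java
import java.util.regex.Pattern;

final class PathRegexSketch {
    /**
     * Builds a pattern matching the _id of any descendant of the given path.
     * Assumption: _id is "<depth>:<path>". The variable-length depth prefix
     * is matched by "[0-9]+:", so the pattern has no fixed literal prefix
     * that an index on _id could use.
     */
    static Pattern descendantsOf(String path) {
        return Pattern.compile("^[0-9]+:" + Pattern.quote(path + "/") + ".*$");
    }

    static boolean matches(String path, String id) {
        return descendantsOf(path).matcher(id).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("/content/abc", "3:/content/abc/x"));
        System.out.println(matches("/content/abc", "3:/content/abcd/x"));
    }
}
```

Appending `/` to the path before quoting it keeps siblings such as `/content/abcd` from matching a filter on `/content/abc`.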
-
DEFAULT_OAK_INDEXER_PIPELINED_MONGO_REGEX_PATH_FILTERING
public static final boolean DEFAULT_OAK_INDEXER_PIPELINED_MONGO_REGEX_PATH_FILTERING
- See Also:
- Constant Field Values
-
OAK_INDEXER_PIPELINED_MONGO_CUSTOM_EXCLUDE_ENTRIES_REGEX
public static final java.lang.String OAK_INDEXER_PIPELINED_MONGO_CUSTOM_EXCLUDE_ENTRIES_REGEX
Any document with a path that matches this regex pattern will not be downloaded. This pattern is included in the Mongo query, that is, the filtering is done server-side by Mongo, which avoids downloading the documents that match it. This is typically a suffix, for example "/metadata.xml$|/renditions/.*.jpg$". To exclude subtrees such as /content/abc, use mongoFilterPaths instead.
- See Also:
- Constant Field Values
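The sample pattern can be tried locally with `java.util.regex`. This sketch only illustrates suffix matching; it assumes that the server-side match behaves like an unanchored search (`Matcher.find()`), and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class ExcludeRegexSketch {
    // The example pattern from the field description, used verbatim.
    static final Pattern EXCLUDE =
            Pattern.compile("/metadata.xml$|/renditions/.*.jpg$");

    /** True if the path would be excluded under an unanchored search. */
    static boolean excluded(String path) {
        return EXCLUDE.matcher(path).find();
    }

    public static void main(String[] args) {
        String[] paths = {
            "/content/a/metadata.xml",
            "/content/a/renditions/thumb.jpg",
            "/content/a/metadata.xml/jcr:content",
            "/content/a/page"
        };
        for (String p : paths) {
            System.out.println(p + " excluded=" + excluded(p));
        }
    }
}
```

Note that the dots in the sample pattern are unescaped, so each one matches any character; writing `\.` instead would restrict the match to a literal dot.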
-
DEFAULT_OAK_INDEXER_PIPELINED_MONGO_CUSTOM_EXCLUDE_ENTRIES_REGEX
public static final java.lang.String DEFAULT_OAK_INDEXER_PIPELINED_MONGO_CUSTOM_EXCLUDE_ENTRIES_REGEX
- See Also:
- Constant Field Values
-
OAK_INDEXER_PIPELINED_MONGO_REGEX_PATH_FILTERING_MAX_PATHS
public static final java.lang.String OAK_INDEXER_PIPELINED_MONGO_REGEX_PATH_FILTERING_MAX_PATHS
Maximum number of elements in the included/excluded paths list used for regex path filtering. If, after merging and deduplicating the paths of all the path filters, the number of included or excluded paths exceeds this value, path filtering is disabled to avoid creating Mongo queries with a large number of filters.
- See Also:
- Constant Field Values
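A rough sketch of the threshold check follows. The merge step shown here (drop exact duplicates and paths already covered by an ancestor) is an assumption about what merging and deduplication means; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class MaxPathsSketch {
    /**
     * Merges the paths of all filters, removing duplicates and any path
     * that is a descendant of another path in the merged set (assumed rule).
     */
    static List<String> mergeAndDeduplicate(List<List<String>> pathsPerFilter) {
        // Sorted order guarantees an ancestor is visited before its descendants.
        TreeSet<String> all = new TreeSet<>();
        for (List<String> paths : pathsPerFilter) all.addAll(paths);
        List<String> result = new ArrayList<>();
        for (String candidate : all) {
            boolean covered = false;
            for (String kept : result) {
                if (kept.equals("/") || candidate.startsWith(kept + "/")) {
                    covered = true;
                    break;
                }
            }
            if (!covered) result.add(candidate);
        }
        return result;
    }

    /** Path filtering stays enabled only if neither list exceeds maxPaths. */
    static boolean regexFilteringEnabled(List<String> included, List<String> excluded, int maxPaths) {
        return included.size() <= maxPaths && excluded.size() <= maxPaths;
    }

    public static void main(String[] args) {
        List<String> included = mergeAndDeduplicate(Arrays.asList(
                Arrays.asList("/content/a", "/content/a/b"),
                Arrays.asList("/content/a", "/etc")));
        System.out.println(included);
        System.out.println(regexFilteringEnabled(included, new ArrayList<>(), 1));
    }
}
```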
-
DEFAULT_OAK_INDEXER_PIPELINED_MONGO_REGEX_PATH_FILTERING_MAX_PATHS
public static final int DEFAULT_OAK_INDEXER_PIPELINED_MONGO_REGEX_PATH_FILTERING_MAX_PATHS
- See Also:
- Constant Field Values
-
OAK_INDEXER_PIPELINED_MONGO_CUSTOM_EXCLUDED_PATHS
public static final java.lang.String OAK_INDEXER_PIPELINED_MONGO_CUSTOM_EXCLUDED_PATHS
Additional Oak paths to exclude from downloading from Mongo. This is a comma-separated list of paths. These paths are only filtered if the included paths computed from the indexes resolve to the root tree (/), otherwise the value of this property is ignored.
- See Also:
- Constant Field Values
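The rule above might be sketched as follows. It assumes that "resolve to the root tree" means the included list is exactly `/`; the parsing details (trimming, skipping empty entries) and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class CustomExcludedPathsSketch {
    /** Splits a comma-separated property value, ignoring blanks. */
    static List<String> parse(String value) {
        List<String> paths = new ArrayList<>();
        if (value == null) return paths;
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) paths.add(trimmed);
        }
        return paths;
    }

    /**
     * Returns the additional excluded paths that take effect: the parsed
     * property value when the included paths are just "/", otherwise nothing.
     */
    static List<String> additionalExcludedPaths(List<String> includedPaths, String customExcluded) {
        boolean rootIncluded = includedPaths.size() == 1 && includedPaths.get(0).equals("/");
        return rootIncluded ? parse(customExcluded) : Collections.<String>emptyList();
    }

    public static void main(String[] args) {
        System.out.println(additionalExcludedPaths(Arrays.asList("/"), "/var, /tmp"));
        System.out.println(additionalExcludedPaths(Arrays.asList("/content"), "/var, /tmp"));
    }
}
```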
-
DEFAULT_OAK_INDEXER_PIPELINED_MONGO_CUSTOM_EXCLUDED_PATHS
public static final java.lang.String DEFAULT_OAK_INDEXER_PIPELINED_MONGO_CUSTOM_EXCLUDED_PATHS
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
PipelinedMongoDownloadTask
public PipelinedMongoDownloadTask(com.mongodb.client.MongoDatabase mongoDatabase, MongoDocumentStore mongoDocStore, int maxBatchSizeBytes, int maxBatchNumberOfDocuments, java.util.concurrent.BlockingQueue<NodeDocument[]> queue, java.util.List<PathFilter> pathFilters, StatisticsProvider statisticsProvider, IndexingReporter reporter)
-
-
Method Detail
-
call
public PipelinedMongoDownloadTask.Result call() throws java.lang.Exception
- Specified by:
call in interface java.util.concurrent.Callable<PipelinedMongoDownloadTask.Result>
- Throws:
java.lang.Exception
-
-