Class SearchIndex

    • Field Detail

      • VALID_SYSTEM_INDEX_NODE_TYPE_NAMES

        public static final Collection<Name> VALID_SYSTEM_INDEX_NODE_TYPE_NAMES
        Valid node type names under /jcr:system. Used to determine whether a query must also be executed against the /jcr:system tree.
      • DEFAULT_EXTRACTOR_POOL_SIZE

        public static final int DEFAULT_EXTRACTOR_POOL_SIZE
        Deprecated.
        This value is no longer used. Instead, the default value is calculated as 2 * Runtime.getRuntime().availableProcessors().
        The default value for property extractorPoolSize.
        See Also:
        Constant Field Values
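        As a rough illustration of the formula quoted in the deprecation note, the sketch below computes the same value with the standard Runtime API. The class and method names are hypothetical and are not part of Jackrabbit.

```java
// Hypothetical helper, not Jackrabbit code: reproduces the formula from the
// deprecation note for the default extractorPoolSize.
final class ExtractorPoolSizeSketch {

    static int defaultExtractorPoolSize() {
        // Two extractor threads per processor reported by the JVM.
        return 2 * Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("default extractorPoolSize = " + defaultExtractorPoolSize());
    }
}
```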
      • DEFAULT_EXTRACTOR_TIMEOUT

        public static final long DEFAULT_EXTRACTOR_TIMEOUT
        The default timeout in milliseconds which is granted to the text extraction process until fulltext indexing is deferred to a background thread.
        See Also:
        Constant Field Values
      • PATH_FACTORY

        protected static final PathFactory PATH_FACTORY
        The path factory.
      • ROOT_PATH

        protected static final Path ROOT_PATH
        The path of the root node.
      • JCR_SYSTEM_PATH

        protected static final Path JCR_SYSTEM_PATH
        The path /jcr:system.
      • index

        protected MultiIndex index
        The actual index.
    • Constructor Detail

      • SearchIndex

        public SearchIndex()
    • Method Detail

      • doInit

        protected void doInit()
                       throws IOException
        Initializes this QueryHandler. This implementation requires that a path parameter is set in the configuration. If this condition is not met, an IOException is thrown.
        Specified by:
        doInit in class AbstractQueryHandler
        Throws:
        IOException - if an error occurs while initializing this handler.
      • deleteNode

        public void deleteNode​(NodeId id)
                        throws IOException
        Removes the node with id from the search index.
        Parameters:
        id - the id of the node to remove from the index.
        Throws:
        IOException - if an error occurs while removing the node from the index.
      • createExecutableQuery

        public ExecutableQuery createExecutableQuery​(SessionContext sessionContext,
                                                     String statement,
                                                     String language)
                                              throws InvalidQueryException
        Creates a new query by specifying the query statement itself and the language in which the query is stated. If the query statement is syntactically invalid, given the language specified, an InvalidQueryException is thrown. language must specify a query language string from among those returned by QueryManager.getSupportedQueryLanguages(); if it is not then an InvalidQueryException is thrown.
        Parameters:
        sessionContext - component context of the current session
        statement - the query statement.
        language - the syntax of the query statement.
        Returns:
        A Query object.
        Throws:
        InvalidQueryException - if statement is invalid or language is unsupported.
      • getWeaklyReferringNodes

        public Iterable<NodeId> getWeaklyReferringNodes​(NodeId id)
                                                 throws RepositoryException,
                                                        IOException
        Returns the ids of the nodes that refer to the node with id by weak references.
        Parameters:
        id - the id of the target node.
        Returns:
        the ids of the referring nodes.
        Throws:
        RepositoryException - if an error occurs.
        IOException - if an error occurs while reading from the index.
      • getQueryNodeFactory

        protected DefaultQueryNodeFactory getQueryNodeFactory()
        Returns the QueryNodeFactory used to parse queries. This method may be overridden to provide a customized QueryNodeFactory.
        Returns:
        the query node factory.
      • flush

        public void flush()
                   throws RepositoryException
        Waits until all pending text extraction tasks have been processed and the updated index has been flushed to disk.
        Throws:
        RepositoryException - if the index update cannot be written.
      • executeQuery

        public MultiColumnQueryHits executeQuery​(SessionImpl session,
                                                 AbstractQueryImpl queryImpl,
                                                 org.apache.lucene.search.Query query,
                                                 Path[] orderProps,
                                                 boolean[] orderSpecs,
                                                 String[] orderFuncs,
                                                 long resultFetchHint)
                                          throws IOException
        Executes the query on the search index.
        Parameters:
        session - the session that executes the query.
        queryImpl - the query impl.
        query - the lucene query.
        orderProps - name of the properties for sort order.
        orderSpecs - the order specs for the sort order properties. true indicates ascending order, false indicates descending.
        orderFuncs - functions for the properties for sort order.
        resultFetchHint - a hint on how many results should be fetched.
        Returns:
        the query hits.
        Throws:
        IOException - if an error occurs while searching the index.
      • executeQuery

        public MultiColumnQueryHits executeQuery​(SessionImpl session,
                                                 MultiColumnQuery query,
                                                 Ordering[] orderings,
                                                 long resultFetchHint)
                                          throws IOException
        Executes the query on the search index.
        Parameters:
        session - the session that executes the query.
        query - the query.
        orderings - the order specs for the sort order.
        resultFetchHint - a hint on how many results should be fetched.
        Returns:
        the query hits.
        Throws:
        IOException - if an error occurs while searching the index.
      • createExcerptProvider

        public ExcerptProvider createExcerptProvider​(org.apache.lucene.search.Query query)
                                              throws IOException
        Creates an excerpt provider for the given query.
        Parameters:
        query - the query.
        Returns:
        an excerpt provider for the given query.
        Throws:
        IOException - if the provider cannot be created.
      • getTextAnalyzer

        public org.apache.lucene.analysis.Analyzer getTextAnalyzer()
        Returns the analyzer in use for indexing.
        Returns:
        the analyzer in use for indexing.
      • getTikaConfigPath

        public String getTikaConfigPath()
        Returns the path of the Tika configuration used for text extraction.
        Returns:
        path of the Tika configuration file
      • setTikaConfigPath

        public void setTikaConfigPath​(String tikaConfigPath)
        Sets the path of the Tika configuration used for text extraction. The path can be either a file system path or a class resource path. The default setting is the tika-config.xml class resource relative to org.apache.jackrabbit.core.query.lucene.
        Parameters:
        tikaConfigPath - path of the Tika configuration file
      • getForkJavaCommand

        public String getForkJavaCommand()
        Returns the java command used to fork external parser processes, or null (the default) for in-process text extraction.
        Returns:
        fork java command
      • setForkJavaCommand

        public void setForkJavaCommand​(String command)
        Sets the java command used to fork external parser processes.
        Parameters:
        command - fork java command, or null for in-process extraction
      • getParser

        public org.apache.tika.parser.Parser getParser()
        Returns the parser used for extracting text content from binary properties for full text indexing.
        Returns:
        the configured parser
      • getNamespaceMappings

        public NamespaceMappings getNamespaceMappings()
        Returns the namespace mappings for the internal representation.
        Returns:
        the namespace mappings for the internal representation.
      • getIndexingConfig

        public IndexingConfiguration getIndexingConfig()
        Returns:
        the indexing configuration or null if there is none.
      • getSynonymProvider

        public SynonymProvider getSynonymProvider()
        Returns:
        the synonym provider of this search index. If none is set for this search index the synonym provider of the parent handler is returned if there is any.
      • getSpellChecker

        public SpellChecker getSpellChecker()
        Returns:
        the spell checker of this search index. If none is configured this method returns null.
      • getSimilarity

        public org.apache.lucene.search.Similarity getSimilarity()
        Returns:
        the similarity, which should be used for indexing and searching.
      • getIndexReader

        public org.apache.lucene.index.IndexReader getIndexReader()
                                                           throws IOException
        Returns an index reader for this search index. The caller of this method is responsible for closing the index reader when finished using it.
        Returns:
        an index reader for this search index.
        Throws:
        IOException - the index reader cannot be obtained.
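        The close-after-use contract described above can be sketched as follows. Lucene is not on the class path of this example, so FakeReader is a hypothetical stand-in for the returned index reader; only the closing pattern is the point.

```java
import java.io.Closeable;

// Sketch of the caller's obligation to close a reader after use.
// FakeReader is a hypothetical stand-in, not a Lucene or Jackrabbit type.
final class ReaderCloseSketch {

    static final class FakeReader implements Closeable {
        boolean closed;

        int numDocs() {
            return 3;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Uses the reader and closes it afterwards, even if reading throws. */
    static int countDocs(FakeReader reader) {
        try (FakeReader r = reader) {
            return r.numDocs();
        }
    }

    public static void main(String[] args) {
        FakeReader reader = new FakeReader();
        System.out.println(countDocs(reader) + " docs, closed=" + reader.closed);
    }
}
```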
      • getIndexFormatVersion

        public IndexFormatVersion getIndexFormatVersion()
        Returns the index format version that this search index is able to support when a query is executed on this index.
        Returns:
        the index format version for this search index.
      • getDirectoryManager

        public DirectoryManager getDirectoryManager()
        Returns:
        the directory manager for this search index.
      • getRedoLogFactory

        public RedoLogFactory getRedoLogFactory()
        Returns:
        the redo log factory for this search index.
      • runConsistencyCheck

        public ConsistencyCheck runConsistencyCheck()
                                             throws IOException
        Runs a consistency check on this search index.
        Returns:
        the result of the consistency check.
        Throws:
        IOException - if an error occurs while running the check.
      • getIndexReader

        protected org.apache.lucene.index.IndexReader getIndexReader​(boolean includeSystemIndex)
                                                              throws IOException
        Returns an index reader for this search index. The caller of this method is responsible for closing the index reader when finished using it.
        Parameters:
        includeSystemIndex - if true the index reader will cover the complete workspace. If false the returned index reader will not contain any nodes under /jcr:system.
        Returns:
        an index reader for this search index.
        Throws:
        IOException - the index reader cannot be obtained.
      • createSortFields

        protected org.apache.lucene.search.SortField[] createSortFields​(Path[] orderProps,
                                                                        boolean[] orderSpecs,
                                                                        String[] orderFuncs)
        Creates the SortFields for the order properties.
        Parameters:
        orderProps - the order properties.
        orderSpecs - the order specs for the properties.
        orderFuncs - the functions for the properties.
        Returns:
        an array of sort fields
      • createOrderings

        protected Ordering[] createOrderings​(OrderingImpl[] orderings)
                                      throws RepositoryException
        Creates internal orderings for the QOM ordering specifications.
        Parameters:
        orderings - the QOM ordering specifications.
        Returns:
        the internal orderings.
        Throws:
        RepositoryException - if an error occurs.
      • createDocument

        protected org.apache.lucene.document.Document createDocument​(NodeState node,
                                                                     NamespaceMappings nsMappings,
                                                                     IndexFormatVersion indexFormatVersion)
                                                              throws RepositoryException
        Creates a lucene Document for a node state using the namespace mappings nsMappings.
        Parameters:
        node - the node state to index.
        nsMappings - the namespace mappings of the search index.
        indexFormatVersion - the index format version that should be used to index the passed node state.
        Returns:
        a lucene Document that contains all properties of node.
        Throws:
        RepositoryException - if an error occurs while indexing the node.
      • getIndex

        protected MultiIndex getIndex()
        Returns the actual index.
        Returns:
        the actual index.
      • getSortComparatorSource

        protected SharedFieldComparatorSource getSortComparatorSource()
        Returns:
        the field comparator source for this index.
      • createIndexingConfiguration

        protected IndexingConfiguration createIndexingConfiguration​(NamespaceMappings namespaceMappings)
        Parameters:
        namespaceMappings - The namespace mappings
        Returns:
        the fulltext indexing configuration or null if there is no configuration.
      • createSynonymProvider

        protected SynonymProvider createSynonymProvider()
        Returns:
        the configured synonym provider or null if none is configured or an error occurs.
      • createSynonymProviderConfigResource

        protected FileSystemResource createSynonymProviderConfigResource()
                                                                  throws FileSystemException,
                                                                         IOException
        Creates a file system resource to the synonym provider configuration.
        Returns:
        a file system resource or null if no path was configured.
        Throws:
        FileSystemException - if an exception occurs accessing the file system.
        IOException - if another exception occurs.
      • createSpellChecker

        protected SpellChecker createSpellChecker()
        Creates a spell checker for this query handler.
        Returns:
        the spell checker or null if none is configured or an error occurs.
      • getIndexingConfigurationDOM

        protected Element getIndexingConfigurationDOM()
        Returns the document element of the indexing configuration or null if there is no indexing configuration.
        Returns:
        the indexing configuration or null if there is none.
      • mergeAggregatedNodeIndexes

        protected void mergeAggregatedNodeIndexes​(NodeState state,
                                                  org.apache.lucene.document.Document doc,
                                                  IndexFormatVersion ifv)
        Merges the fulltext indexed fields of the aggregated node states into doc.
        Parameters:
        state - the node state on which doc was created.
        doc - the lucene document with index fields from state.
        ifv - the current index format version.
      • retrieveAggregateRoot

        protected void retrieveAggregateRoot​(NodeState state,
                                             Map<NodeId,​NodeState> aggregates)
        Retrieves the root of the indexing aggregate for state and puts it into aggregates map.
        Parameters:
        state - the node state for which we want to retrieve the aggregate root.
        aggregates - aggregate roots are collected in this map.
      • retrieveAggregateRoot

        protected void retrieveAggregateRoot​(Set<NodeId> removedIds,
                                             Map<NodeId,​NodeState> aggregates)
        Retrieves the roots of the indexing aggregates for removedIds and puts them into the aggregates map.
        Parameters:
        removedIds - the ids of removed nodes.
        aggregates - aggregate roots are collected in this map.
      • setAnalyzer

        public void setAnalyzer​(String analyzerClassName)
        Sets the default analyzer in use for indexing. The given analyzer class name must satisfy the following conditions:
        • the class must exist in the class path
        • the class must have a public default constructor, or a constructor that takes a Lucene Version argument
        • the class must be a Lucene Analyzer

        If the above conditions are met, then a new instance of the class is set as the analyzer. Otherwise a warning is logged and the current analyzer is not changed.

        This property setter method is normally invoked by the Jackrabbit configuration mechanism if the "analyzer" parameter is set in the search configuration.

        Parameters:
        analyzerClassName - the analyzer class name
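        The "validate, then replace, otherwise warn and keep" behaviour described above might look roughly like the following reflection sketch. It is not the Jackrabbit implementation: CharSequence stands in for Lucene's Analyzer, the constructor variant that takes a Lucene Version is omitted, and all names are hypothetical.

```java
import java.util.logging.Logger;

// Sketch only: CharSequence plays the role of the Lucene Analyzer type.
final class AnalyzerSetterSketch {
    private static final Logger LOG = Logger.getLogger(AnalyzerSetterSketch.class.getName());

    // Stand-in for the analyzer currently in use.
    private CharSequence current = "default";

    void setByClassName(String className) {
        try {
            // Condition 1: the class must exist on the class path.
            Class<?> clazz = Class.forName(className);
            // Condition 3: the class must be of the expected type.
            if (!CharSequence.class.isAssignableFrom(clazz)) {
                LOG.warning(className + " is not of the expected type; keeping current value");
                return;
            }
            // Condition 2: the class must have a public no-argument constructor.
            current = (CharSequence) clazz.getConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Missing class or unusable constructor: warn and leave the current value alone.
            LOG.warning("Cannot instantiate " + className + "; keeping current value: " + e);
        }
    }

    String currentClassName() {
        return current.getClass().getName();
    }
}
```

        With this sketch, a call such as setByClassName("no.such.Clazz") only logs a warning; currentClassName() keeps returning the previously configured class.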
      • getAnalyzer

        public String getAnalyzer()
        Returns the class name of the default analyzer that is currently in use.
        Returns:
        class name of analyzer in use.
      • setPath

        public void setPath​(String path)
        Sets the location of the search index.
        Parameters:
        path - the location of the search index.
      • getPath

        public String getPath()
        Returns the location of the search index. Returns null if not set.
        Returns:
        the location of the search index.
      • setUseCompoundFile

        public void setUseCompoundFile​(boolean b)
        The lucene index writer property: useCompoundFile
      • getUseCompoundFile

        public boolean getUseCompoundFile()
        Returns the current value for useCompoundFile.
        Returns:
        the current value for useCompoundFile.
      • setMinMergeDocs

        public void setMinMergeDocs​(int minMergeDocs)
        The lucene index writer property: minMergeDocs
      • getMinMergeDocs

        public int getMinMergeDocs()
        Returns the current value for minMergeDocs.
        Returns:
        the current value for minMergeDocs.
      • setVolatileIdleTime

        public void setVolatileIdleTime​(int volatileIdleTime)
        Sets the property: volatileIdleTime
        Parameters:
        volatileIdleTime - idle time in seconds
      • getVolatileIdleTime

        public int getVolatileIdleTime()
        Returns the current value for volatileIdleTime.
        Returns:
        the current value for volatileIdleTime.
      • setMaxMergeDocs

        public void setMaxMergeDocs​(int maxMergeDocs)
        The lucene index writer property: maxMergeDocs
      • getMaxMergeDocs

        public int getMaxMergeDocs()
        Returns the current value for maxMergeDocs.
        Returns:
        the current value for maxMergeDocs.
      • setMergeFactor

        public void setMergeFactor​(int mergeFactor)
        The lucene index writer property: mergeFactor
      • getMergeFactor

        public int getMergeFactor()
        Returns the current value for the merge factor.
        Returns:
        the current value for the merge factor.
      • setBufferSize

        public void setBufferSize​(int size)
        See Also:
        VolatileIndex.setBufferSize(int)
      • getBufferSize

        public int getBufferSize()
        Returns the current value for the buffer size.
        Returns:
        the current value for the buffer size.
      • setRespectDocumentOrder

        public void setRespectDocumentOrder​(boolean docOrder)
      • getRespectDocumentOrder

        public boolean getRespectDocumentOrder()
      • setForceConsistencyCheck

        public void setForceConsistencyCheck​(boolean b)
      • getForceConsistencyCheck

        public boolean getForceConsistencyCheck()
      • setAutoRepair

        public void setAutoRepair​(boolean b)
      • getAutoRepair

        public boolean getAutoRepair()
      • setCacheSize

        public void setCacheSize​(int size)
      • getCacheSize

        public int getCacheSize()
      • setMaxFieldLength

        public void setMaxFieldLength​(int length)
      • getMaxFieldLength

        public int getMaxFieldLength()
      • setMaxExtractLength

        public void setMaxExtractLength​(int length)
      • getMaxExtractLength

        public int getMaxExtractLength()
      • setTextFilterClasses

        public void setTextFilterClasses​(String filterClasses)
        Deprecated.
        Sets the list of text extractors (and text filters) to use for extracting text content from binary properties. The list must be comma (or whitespace) separated, and contain fully qualified class names of the TextExtractor (and org.apache.jackrabbit.core.query.TextFilter) classes to be used. The configured classes must all have a public default constructor.
        Parameters:
        filterClasses - comma separated list of class names
      • getTextFilterClasses

        public String getTextFilterClasses()
        Deprecated.
        Returns the fully qualified class names of the text filter instances currently in use. The names are comma separated.
        Returns:
        class names of the text filters in use.
      • setResultFetchSize

        public void setResultFetchSize​(int size)
        Tells the query handler how many results should be fetched initially when a query is executed.
        Parameters:
        size - the number of results to fetch initially.
      • getResultFetchSize

        public int getResultFetchSize()
        Returns:
        the number of results the query handler will fetch initially when a query is executed.
      • setExtractorPoolSize

        public void setExtractorPoolSize​(int numThreads)
        The number of background threads for the extractor pool.
        Parameters:
        numThreads - the number of threads.
      • getExtractorPoolSize

        public int getExtractorPoolSize()
        Returns:
        the size of the thread pool which is used to run the text extractors when binary content is indexed.
      • setExtractorBackLogSize

        public void setExtractorBackLogSize​(int backLog)
        The number of extractor jobs that are queued until a new job is executed with the current thread instead of using the thread pool.
        Parameters:
        backLog - size of the extractor job queue.
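        One way to picture the back log is a bounded queue in front of a fixed thread pool, with overflow jobs run by the submitting thread. The sketch below uses java.util.concurrent to show that behaviour; it is an assumption about the semantics, not the Jackrabbit implementation, and all names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch only: a fixed pool with a bounded back log; overflow runs on the caller.
final class ExtractorBackLogSketch {

    static ThreadPoolExecutor newPool(int numThreads, int backLog) {
        return new ThreadPoolExecutor(
                numThreads, numThreads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(backLog),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Returns the name of the thread that ran a job submitted while pool and queue were full. */
    static String overflowThreadName() {
        ThreadPoolExecutor pool = newPool(1, 1);
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        pool.execute(blocker); // occupies the single worker thread
        pool.execute(blocker); // fills the single back-log slot
        String[] ranOn = new String[1];
        // Pool and queue are full, so this job runs synchronously on the calling thread.
        pool.execute(() -> ranOn[0] = Thread.currentThread().getName());
        release.countDown();
        pool.shutdown();
        return ranOn[0];
    }

    public static void main(String[] args) {
        System.out.println("overflow job ran on: " + overflowThreadName());
    }
}
```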
      • getExtractorBackLogSize

        public int getExtractorBackLogSize()
        Returns:
        the size of the extractor queue back log.
      • setExtractorTimeout

        public void setExtractorTimeout​(long timeout)
        The timeout in milliseconds which is granted to the text extraction process until fulltext indexing is deferred to a background thread.
        Parameters:
        timeout - the timeout in milliseconds.
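        The idea of granting extraction a bounded wait and otherwise letting it finish in the background can be sketched with a Future. This is a simplified illustration under that assumption, not the Jackrabbit implementation; the names are hypothetical and a null return stands for "deferred".

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch only: wait up to a timeout for extracted text, otherwise defer.
final class ExtractorTimeoutSketch {

    /** Returns the extracted text, or null if extraction is still running after timeoutMillis. */
    static String extractOrDefer(ExecutorService pool, Callable<String> extraction, long timeoutMillis) {
        Future<String> result = pool.submit(extraction);
        try {
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The job keeps running on the pool thread; the caller would index the text later.
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } catch (ExecutionException e) {
            throw new IllegalStateException("extraction failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        System.out.println(extractOrDefer(pool, () -> "extracted text", 2000L));
        pool.shutdown();
    }
}
```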
      • getExtractorTimeout

        public long getExtractorTimeout()
        Returns:
        the extractor timeout in milliseconds.
      • setSizeEstimate

        public void setSizeEstimate​(boolean b)
        If enabled, NodeIterator.getSize() may report a larger value than the actual result. This value may shrink when the query result encounters non-existing nodes or the session does not have access to a node. This might be a security problem.
        Parameters:
        b - true to enable
      • getSizeEstimate

        public boolean getSizeEstimate()
        Get the size estimate setting.
        Returns:
        the setting
      • setSupportHighlighting

        public void setSupportHighlighting​(boolean b)
        If set to true, additional information is stored in the index to support highlighting using the rep:excerpt pseudo property.
        Parameters:
        b - true to enable highlighting support.
      • getSupportHighlighting

        public boolean getSupportHighlighting()
        Returns:
        true if highlighting support is enabled.
      • setExcerptProviderClass

        public void setExcerptProviderClass​(String className)
        Sets the class name for the ExcerptProvider that should be used for the rep:excerpt pseudo property in a query.
        Parameters:
        className - the name of a class that implements ExcerptProvider.
      • getExcerptProviderClass

        public String getExcerptProviderClass()
        Returns:
        the class name of the excerpt provider implementation.
      • setIndexingConfiguration

        public void setIndexingConfiguration​(String path)
        Sets the path to the indexing configuration file.
        Parameters:
        path - the path to the configuration file.
      • getIndexingConfiguration

        public String getIndexingConfiguration()
        Returns:
        the path to the indexing configuration file.
      • setIndexingConfigurationClass

        public void setIndexingConfigurationClass​(String className)
        Sets the name of the class that implements IndexingConfiguration. The default value is org.apache.jackrabbit.core.query.lucene.IndexingConfigurationImpl.
        Parameters:
        className - the name of the class that implements IndexingConfiguration.
      • getIndexingConfigurationClass

        public String getIndexingConfigurationClass()
        Returns:
        the class name of the indexing configuration implementation.
      • setSynonymProviderClass

        public void setSynonymProviderClass​(String className)
        Sets the name of the class that implements SynonymProvider. The default value is null (none set).
        Parameters:
        className - name of the class that implements SynonymProvider.
      • getSynonymProviderClass

        public String getSynonymProviderClass()
        Returns:
        the class name of the synonym provider implementation or null if none is set.
      • setSpellCheckerClass

        public void setSpellCheckerClass​(String className)
        Sets the name of the class that implements SpellChecker. The default value is null (none set).
        Parameters:
        className - name of the class that implements SpellChecker.
      • getSpellCheckerClass

        public String getSpellCheckerClass()
        Returns:
        the class name of the spell checker implementation or null if none is set.
      • setEnableConsistencyCheck

        public void setEnableConsistencyCheck​(boolean b)
        Enables or disables the consistency check on startup. Consistency checks are disabled by default.
        Parameters:
        b - true enables consistency checks.
        See Also:
        setForceConsistencyCheck(boolean)
      • getEnableConsistencyCheck

        public boolean getEnableConsistencyCheck()
        Returns:
        true if consistency checks are enabled.
      • setSynonymProviderConfigPath

        public void setSynonymProviderConfigPath​(String path)
        Sets the configuration path for the synonym provider.
        Parameters:
        path - the configuration path for the synonym provider.
      • getSynonymProviderConfigPath

        public String getSynonymProviderConfigPath()
        Returns:
        the configuration path for the synonym provider. If none is set this method returns null.
      • setSimilarityClass

        public void setSimilarityClass​(String className)
        Sets the similarity implementation, which will be used for indexing and searching. The implementation must extend Similarity.
        Parameters:
        className - a Similarity implementation.
      • getSimilarityClass

        public String getSimilarityClass()
        Returns:
        the name of the similarity class.
      • setMaxVolatileIndexSize

        public void setMaxVolatileIndexSize​(long maxVolatileIndexSize)
        Sets a new maxVolatileIndexSize value.
        Parameters:
        maxVolatileIndexSize - the new value.
      • getMaxVolatileIndexSize

        public long getMaxVolatileIndexSize()
        Returns:
        the maxVolatileIndexSize in bytes.
      • getDirectoryManagerClass

        public String getDirectoryManagerClass()
        Returns:
        the name of the directory manager class.
      • setDirectoryManagerClass

        public void setDirectoryManagerClass​(String className)
        Sets the name of the directory manager class. The class must implement DirectoryManager.
        Parameters:
        className - the name of the class that implements directory manager.
      • setUseSimpleFSDirectory

        public void setUseSimpleFSDirectory​(boolean useSimpleFSDirectory)
        If set to true, indicates to the DirectoryManager that it should use SimpleFSDirectory.
        Parameters:
        useSimpleFSDirectory - whether to use SimpleFSDirectory or automatically pick an implementation based on the current platform.
      • isUseSimpleFSDirectory

        public boolean isUseSimpleFSDirectory()
        Returns:
        true if the DirectoryManager should use the SimpleFSDirectory.
      • getTermInfosIndexDivisor

        public int getTermInfosIndexDivisor()
        Returns:
        the current value for termInfosIndexDivisor.
      • setTermInfosIndexDivisor

        public void setTermInfosIndexDivisor​(int termInfosIndexDivisor)
        Sets a new value for termInfosIndexDivisor.
        Parameters:
        termInfosIndexDivisor - the new value.
      • isInitializeHierarchyCache

        public boolean isInitializeHierarchyCache()
        Returns:
        true if the hierarchy cache should be initialized immediately on startup.
      • setInitializeHierarchyCache

        public void setInitializeHierarchyCache​(boolean initializeHierarchyCache)
        Whether the hierarchy cache should be initialized immediately on startup.
        Parameters:
        initializeHierarchyCache - true if the cache should be initialized immediately.
      • getMaxHistoryAge

        public long getMaxHistoryAge()
        Returns:
        the maximum age in seconds for outdated generations of IndexInfos.
      • setMaxHistoryAge

        public void setMaxHistoryAge​(long maxHistoryAge)
        Sets a new value for the maximum age in seconds for outdated generations of IndexInfos.
        Parameters:
        maxHistoryAge - age in seconds.
      • getRedoLogFactoryClass

        public String getRedoLogFactoryClass()
        Returns:
        the name of the redo log factory class.
      • setRedoLogFactoryClass

        public void setRedoLogFactoryClass​(String className)
        Sets the name of the redo log factory class. Must implement RedoLogFactory.
        Parameters:
        className - the name of the redo log factory class.
      • checkOpen

        protected void checkOpen()
                          throws IOException
        Checks if this SearchIndex is open, otherwise throws an IOException.
        Throws:
        IOException - if this SearchIndex has been closed.
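        A guard of this kind is commonly written as a flag check at the start of each operation. The following is a minimal sketch of that pattern with hypothetical names; it does not reproduce the SearchIndex source.

```java
import java.io.IOException;

// Sketch only: a closed flag guarded by a checkOpen() method.
final class CheckOpenSketch {
    private volatile boolean closed;

    void close() {
        closed = true;
    }

    /** Throws an IOException once close() has been called. */
    void checkOpen() throws IOException {
        if (closed) {
            throw new IOException("handler is closed and cannot be used anymore");
        }
    }

    /** Convenience wrapper for callers that prefer a boolean over an exception. */
    boolean isUsable() {
        try {
            checkOpen();
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```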