Class NodeIndexer


  • public class NodeIndexer
    extends Object
    Creates a lucene Document object from a Node.
    • Field Detail

      • DEFAULT_BOOST

        protected static final float DEFAULT_BOOST
        The default boost for a lucene field: 1.0f.
      • node

        protected final NodeState node
        The NodeState of the node to index
      • stateProvider

        protected final ItemStateManager stateProvider
        The persistent item state provider
      • mappings

        protected final NamespaceMappings mappings
        Namespace mappings to use for indexing. This is the internal namespace mapping.
      • indexingConfig

        protected IndexingConfiguration indexingConfig
        The indexing configuration or null if none is available.
      • supportHighlighting

        protected boolean supportHighlighting
        If set to true the fulltext field is stored and a term vector is created with offset information.
      • indexFormatVersion

        protected IndexFormatVersion indexFormatVersion
        Indicates index format for this node indexer.
      • doNotUseInExcerpt

        protected List<org.apache.lucene.document.Fieldable> doNotUseInExcerpt
        List of FieldNames.FULLTEXT fields which should not be used in an excerpt.
    • Constructor Detail

      • NodeIndexer

        public NodeIndexer​(NodeState node,
                           ItemStateManager stateProvider,
                           NamespaceMappings mappings,
                           Executor executor,
                           org.apache.tika.parser.Parser parser)
        Creates a new node indexer.
        Parameters:
        node - the node state to index.
        stateProvider - the persistent item state manager to retrieve properties.
        mappings - internal namespace mappings.
        executor - background task executor for text extraction
        parser - parser for binary properties
    • Method Detail

      • getNodeId

        public NodeId getNodeId()
        Returns the NodeId of the indexed node.
        Returns:
        the NodeId of the indexed node.
      • setSupportHighlighting

        public void setSupportHighlighting​(boolean b)
        If set to true additional information is stored in the index to support highlighting using the rep:excerpt pseudo property.
        Parameters:
        b - true to enable highlighting support.
      • setIndexFormatVersion

        public void setIndexFormatVersion​(IndexFormatVersion indexFormatVersion)
        Sets the index format version for this node indexer.
        Parameters:
        indexFormatVersion - the index format version
      • setIndexingConfiguration

        public void setIndexingConfiguration​(IndexingConfiguration config)
        Sets the indexing configuration for this node indexer.
        Parameters:
        config - the indexing configuration.
      • getMaxExtractLength

        public int getMaxExtractLength()
        Returns the maximum number of characters to extract from binaries.
        Returns:
        maximum extraction length
      • setMaxExtractLength

        public void setMaxExtractLength​(int length)
        Sets the maximum number of characters to extract from binaries.
        Parameters:
        length - maximum extraction length
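        As a rough illustration of what a character limit on extracted text means, the sketch below caps a string at a maximum length. ExtractLimitSketch and its limit method are hypothetical names, not part of Jackrabbit, and this page does not state how NodeIndexer enforces the limit internally.

```java
// Hypothetical helper, not part of Jackrabbit: caps extracted text at a
// maximum number of characters, in the spirit of getMaxExtractLength().
final class ExtractLimitSketch {

    private ExtractLimitSketch() {
    }

    /** Returns text unchanged if it fits, otherwise its first maxLength characters. */
    static String limit(String text, int maxLength) {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must not be negative");
        }
        return text.length() <= maxLength ? text : text.substring(0, maxLength);
    }
}
```

        A smaller limit keeps the index compact at the cost of not matching terms that occur late in large binaries.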
      • createDoc

        public org.apache.lucene.document.Document createDoc()
                                                      throws RepositoryException
        Creates a lucene Document.
        Returns:
        the lucene Document with the index layout.
        Throws:
        RepositoryException - if an error occurs while reading property values from the ItemStateManager.
      • throwRepositoryException

        protected void throwRepositoryException​(Exception e)
                                         throws RepositoryException
        Wraps the exception e into a RepositoryException and throws the created exception.
        Parameters:
        e - the base exception.
        Throws:
        RepositoryException
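        The wrap-and-throw pattern described above can be sketched as follows. SketchRepositoryException is a stand-in for javax.jcr.RepositoryException so that the example compiles without the JCR API, and the message text is an assumption rather than the wording Jackrabbit uses.

```java
// Stand-in for javax.jcr.RepositoryException (assumption, for self-containment).
class SketchRepositoryException extends Exception {
    SketchRepositoryException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class WrapAndThrowSketch {

    private WrapAndThrowSketch() {
    }

    /** Wraps e as the cause of a new checked exception and throws it. */
    static void throwWrapped(Exception e) throws SketchRepositoryException {
        // Hypothetical message; the original exception is preserved as the cause.
        String msg = "Error while indexing node: " + e.getMessage();
        throw new SketchRepositoryException(msg, e);
    }
}
```

        Keeping the original exception as the cause means callers still see the full stack trace of the underlying failure.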
      • addMVPName

        protected void addMVPName​(org.apache.lucene.document.Document doc,
                                  Name name)
        Adds a FieldNames.MVP field to doc with the resolved name using the internal search index namespace mapping.
        Parameters:
        doc - the lucene document.
        name - the name of the multi-value property.
      • addValue

        protected void addValue​(org.apache.lucene.document.Document doc,
                                InternalValue value,
                                Name name)
                         throws RepositoryException
        Adds a value to the lucene Document.
        Parameters:
        doc - the document.
        value - the internal jackrabbit value.
        name - the name of the property.
        Throws:
        RepositoryException
      • addValueProperty

        protected void addValueProperty​(org.apache.lucene.document.Document doc,
                                        InternalValue value,
                                        Name name,
                                        String fieldName)
                                 throws RepositoryException
        Adds a property-related value to the lucene Document, such as the length of an indexed field.
        Parameters:
        doc - the document.
        value - the internal jackrabbit value.
        name - the name of the property.
        Throws:
        RepositoryException
      • addPropertyName

        protected void addPropertyName​(org.apache.lucene.document.Document doc,
                                       Name name)
        Adds the property name to the lucene _:PROPERTIES_SET field.
        Parameters:
        doc - the document.
        name - the name of the property.
      • addBinaryValue

        protected void addBinaryValue​(org.apache.lucene.document.Document doc,
                                      String fieldName,
                                      InternalValue internalValue)
        Adds the binary value to the document as the named field.

        This implementation checks if this node is of type nt:resource and if that is the case, tries to extract text from the binary property using the parser.

        Parameters:
        doc - The document to which to add the field
        fieldName - The name of the field to add
        internalValue - The value for the field to add to the document.
      • getValue

        protected InternalValue getValue​(Name name)
                                  throws ItemStateException
        Utility method that extracts the first value of the named property of the current node. Returns null if the property does not exist or contains no values.
        Parameters:
        name - property name
        Returns:
        value of the named property, or null
        Throws:
        ItemStateException - if the property can not be accessed
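        The "first value or null" contract can be illustrated with plain collections. In this sketch a Map from property name to a list of strings stands in for the node's property states; FirstValueSketch is a hypothetical name and the real method works on InternalValue objects obtained through the ItemStateManager.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model: properties are a map from name to a list of values.
final class FirstValueSketch {

    private FirstValueSketch() {
    }

    /** Returns the first value of the named property, or null if absent or empty. */
    static String firstValue(Map<String, List<String>> properties, String name) {
        List<String> values = properties.get(name);
        if (values == null || values.isEmpty()) {
            return null; // property does not exist or contains no values
        }
        return values.get(0);
    }
}
```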
      • addBooleanValue

        protected void addBooleanValue​(org.apache.lucene.document.Document doc,
                                       String fieldName,
                                       Object internalValue)
        Adds the string representation of the boolean value to the document as the named field.
        Parameters:
        doc - The document to which to add the field
        fieldName - The name of the field to add
        internalValue - The value for the field to add to the document.
      • createFieldWithoutNorms

        protected org.apache.lucene.document.Field createFieldWithoutNorms​(String fieldName,
                                                                           String internalValue,
                                                                           int propertyType)
        Creates a field of name fieldName with the value of internalValue. The created field is indexed without norms.
        Parameters:
        fieldName - The name of the field to add
        internalValue - The value for the field to add to the document.
        propertyType - the property type.
      • addCalendarValue

        protected void addCalendarValue​(org.apache.lucene.document.Document doc,
                                        String fieldName,
                                        Calendar internalValue)
        Adds the calendar value to the document as the named field. The calendar value is converted to an indexable string value using the DateField class.
        Parameters:
        doc - The document to which to add the field
        fieldName - The name of the field to add
        internalValue - The value for the field to add to the document.
      • addDoubleValue

        protected void addDoubleValue​(org.apache.lucene.document.Document doc,
                                      String fieldName,
                                      double internalValue)
        Adds the double value to the document as the named field. The double value is converted to an indexable string value using the DoubleField class.
        Parameters:
        doc - The document to which to add the field
        fieldName - The name of the field to add
        internalValue - The value for the field to add to the document.
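        An "indexable string value" for a number is typically one whose lexicographic order matches the numeric order, so that range queries work on terms. The sketch below shows one common way to do this for doubles. It is an assumption for illustration only; the format actually produced by the DoubleField class is not described on this page, and SortableDoubleSketch is a hypothetical name.

```java
// Hypothetical encoder: maps doubles to fixed-width hex strings whose
// lexicographic order follows the numeric order of the inputs.
final class SortableDoubleSketch {

    private SortableDoubleSketch() {
    }

    static String encode(double value) {
        long bits = Double.doubleToLongBits(value);
        // For negative doubles, flip the 63 value bits so that larger
        // magnitudes become smaller numbers.
        bits ^= (bits >> 63) & Long.MAX_VALUE;
        // Flip the sign bit so negatives sort before positives when the
        // result is printed as an unsigned 16-digit hex string.
        return String.format("%016x", bits ^ Long.MIN_VALUE);
    }
}
```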
      • addLongValue

        protected void addLongValue​(org.apache.lucene.document.Document doc,
                                    String fieldName,
                                    long internalValue)
        Adds the long value to the document as the named field. The long value is converted to an indexable string value using the LongField class.
        Parameters:
        doc - The document to which to add the field
        fieldName - The name of the field to add
        internalValue - The value for the field to add to the document.
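        For long values, an order-preserving string form can be obtained by flipping the sign bit and printing a fixed-width unsigned number. This is a sketch of the general technique under that assumption, not the documented output of the LongField class; SortableLongSketch is a hypothetical name.

```java
// Hypothetical encoder: fixed-width hex strings that sort like the longs
// they were created from, with a matching decoder.
final class SortableLongSketch {

    private SortableLongSketch() {
    }

    static String encode(long value) {
        // Flipping the sign bit moves negative values below positive ones
        // in unsigned order; %016x pads to a constant width.
        return String.format("%016x", value ^ Long.MIN_VALUE);
    }

    static long decode(String encoded) {
        return Long.parseUnsignedLong(encoded, 16) ^ Long.MIN_VALUE;
    }
}
```

        The fixed width matters: without padding, "9" would sort after "10" as a string.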
      • addDecimalValue

        protected void addDecimalValue​(org.apache.lucene.document.Document doc,
                                       String fieldName,
                                       BigDecimal internalValue)
        Adds the decimal value to the document as the named field. The decimal value is converted to an indexable string value using the DecimalField class.
        Parameters:
        doc - The document to which to add the field
        fieldName - The name of the field to add
        internalValue - The value for the field to add to the document.
      • addReferenceValue

        protected void addReferenceValue​(org.apache.lucene.document.Document doc,
                                         String fieldName,
                                         NodeId internalValue,
                                         boolean weak)
        Adds the reference value to the document as the named field. The value's string representation is added as the reference data. Additionally the reference data is stored in the index. As of Jackrabbit 2.0 this method also adds the reference UUID as a FieldNames.WEAK_REFS field to the index if it is a weak reference.
        Parameters:
        doc - The document to which to add the field
        fieldName - The name of the field to add
        internalValue - The value for the field to add to the document.
        weak - Flag indicating whether it's a WEAKREFERENCE (true) or a REFERENCE (false)
      • addPathValue

        protected void addPathValue​(org.apache.lucene.document.Document doc,
                                    String fieldName,
                                    Path internalValue)
        Adds the path value to the document as the named field. The path value is converted to an indexable string value using the namespace mappings with which this class has been created.
        Parameters:
        doc - The document to which to add the field
        fieldName - The name of the field to add
        internalValue - The value for the field to add to the document.
      • addURIValue

        protected void addURIValue​(org.apache.lucene.document.Document doc,
                                   String fieldName,
                                   URI internalValue)
        Adds the uri value to the document as the named field.
        Parameters:
        doc - The document to which to add the field
        fieldName - The name of the field to add
        internalValue - The value for the field to add to the document.
      • addStringValue

        protected void addStringValue​(org.apache.lucene.document.Document doc,
                                      String fieldName,
                                      String internalValue)
        Adds the string value to the document both as the named field and for full text indexing.
        Parameters:
        doc - The document to which to add the field
        fieldName - The name of the field to add
        internalValue - The value for the field to add to the document.
      • addStringValue

        protected void addStringValue​(org.apache.lucene.document.Document doc,
                                      String fieldName,
                                      String internalValue,
                                      boolean tokenized)
        Adds the string value to the document as the named field and, if tokenized is true, also for full text indexing.
        Parameters:
        doc - The document to which to add the field
        fieldName - The name of the field to add
        internalValue - The value for the field to add to the document.
        tokenized - If true the string is also tokenized and fulltext indexed.
      • addStringValue

        protected void addStringValue​(org.apache.lucene.document.Document doc,
                                      String fieldName,
                                      String internalValue,
                                      boolean tokenized,
                                      boolean includeInNodeIndex,
                                      float boost)
        Adds the string value to the document as the named field and, if tokenized is true, also for full text indexing.
        Parameters:
        doc - The document to which to add the field
        fieldName - The name of the field to add
        internalValue - The value for the field to add to the document.
        tokenized - If true the string is also tokenized and fulltext indexed.
        includeInNodeIndex - If true the string is also tokenized and added to the node scope fulltext index.
        boost - the boost value for this string field.
      • addStringValue

        protected void addStringValue​(org.apache.lucene.document.Document doc,
                                      String fieldName,
                                      String internalValue,
                                      boolean tokenized,
                                      boolean includeInNodeIndex,
                                      float boost,
                                      boolean useInExcerpt)
        Adds the string value to the document as the named field and, if tokenized is true, also for full text indexing.
        Parameters:
        doc - The document to which to add the field
        fieldName - The name of the field to add
        internalValue - The value for the field to add to the document.
        tokenized - If true the string is also tokenized and fulltext indexed.
        includeInNodeIndex - If true the string is also tokenized and added to the node scope fulltext index.
        boost - the boost value for this string field.
        useInExcerpt - If true the string may show up in an excerpt.
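        How the three boolean flags combine can be sketched with a small model that only records which fields would be produced. The field labels, the class StringValueFlagsSketch, and the assumption that includeInNodeIndex only takes effect when tokenized is true are illustrative; the real field names come from FieldNames and the real method also applies the boost.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the flag handling; it records field labels
// instead of creating lucene fields.
final class StringValueFlagsSketch {

    // Placeholder label for the node scope fulltext field (assumption).
    static final String NODE_SCOPE_FULLTEXT = "FULLTEXT";

    /** Node scope fulltext entries that must not contribute to excerpts. */
    final List<String> doNotUseInExcerpt = new ArrayList<>();

    List<String> fieldsFor(String fieldName, boolean tokenized,
                           boolean includeInNodeIndex, boolean useInExcerpt) {
        List<String> fields = new ArrayList<>();
        fields.add(fieldName); // the named field is always added
        if (tokenized) {
            fields.add("FULL:" + fieldName); // property scope fulltext (label assumed)
            if (includeInNodeIndex) {
                fields.add(NODE_SCOPE_FULLTEXT);
                if (!useInExcerpt) {
                    // remembered so excerpt creation can skip this text
                    doNotUseInExcerpt.add(fieldName);
                }
            }
        }
        return fields;
    }
}
```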
      • addNameValue

        protected void addNameValue​(org.apache.lucene.document.Document doc,
                                    String fieldName,
                                    Name internalValue)
        Adds the name value to the document as the named field. The name value is converted to an indexable string by treating the internal value as a Name and mapping its namespace using the namespace mappings with which this class has been created.
        Parameters:
        doc - The document to which to add the field
        fieldName - The name of the field to add
        internalValue - The value for the field to add to the document.
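        Mapping a name to an indexable string amounts to replacing the namespace URI with a prefix from the index-internal mapping. A minimal sketch, assuming a simple URI-to-prefix map: NameMappingSketch and its methods are hypothetical, and the prefixes a real NamespaceMappings instance assigns may differ from the registered JCR prefixes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an internal namespace mapping.
final class NameMappingSketch {

    private final Map<String, String> uriToPrefix = new HashMap<>();

    void register(String namespaceUri, String prefix) {
        uriToPrefix.put(namespaceUri, prefix);
    }

    /** Converts a namespace URI and local name into the form prefix:localName. */
    String toIndexString(String namespaceUri, String localName) {
        String prefix = uriToPrefix.get(namespaceUri);
        if (prefix == null) {
            throw new IllegalArgumentException("Unmapped namespace: " + namespaceUri);
        }
        return prefix + ":" + localName;
    }
}
```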
      • createFulltextField

        protected org.apache.lucene.document.Field createFulltextField​(String value,
                                                                       boolean store,
                                                                       boolean withOffsets)
        Creates a fulltext field for the string value.
        Parameters:
        value - the string value.
        store - if the value of the field should be stored.
        withOffsets - if a term vector with offsets should be stored.
        Returns:
        a lucene field.
      • createFulltextField

        protected org.apache.lucene.document.Field createFulltextField​(String value,
                                                                       boolean store,
                                                                       boolean withOffsets,
                                                                       boolean withNorms)
        Creates a fulltext field for the string value.
        Parameters:
        value - the string value.
        store - if the value of the field should be stored.
        withOffsets - if a term vector with offsets should be stored.
        withNorms - if norm information should be added for this value
        Returns:
        a lucene field.
      • createFulltextField

        protected org.apache.lucene.document.Fieldable createFulltextField​(InternalValue value,
                                                                           org.apache.tika.metadata.Metadata metadata)
        Creates a fulltext field for the reader value.
        Parameters:
        value - the binary value
        metadata - document metadata
        Returns:
        a lucene field.
      • createFulltextField

        protected org.apache.lucene.document.Fieldable createFulltextField​(InternalValue value,
                                                                           org.apache.tika.metadata.Metadata metadata,
                                                                           boolean withNorms)
        Creates a fulltext field for the reader value.
        Parameters:
        value - the binary value
        metadata - document metadata
        withNorms - if norm information should be added for this value
        Returns:
        a lucene field.
      • isIndexed

        protected boolean isIndexed​(Name propertyName)
        Returns true if the property with the given name should be indexed. The default is to index all properties unless explicit indexing configuration is specified. The jcr:primaryType and jcr:mixinTypes properties are always indexed for correct node type resolution in queries.
        Parameters:
        propertyName - name of a property.
        Returns:
        true if the property should be indexed; false otherwise.
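        The decision described above can be sketched as three checks in order. In this sketch a nullable Predicate over property names stands in for the IndexingConfiguration, and property names are plain strings rather than Name objects; both are assumptions made to keep the example self-contained.

```java
import java.util.function.Predicate;

// Hypothetical model of the isIndexed decision.
final class IsIndexedSketch {

    // Stand-in for the indexing configuration; null means none is available.
    private final Predicate<String> indexingConfig;

    IsIndexedSketch(Predicate<String> indexingConfig) {
        this.indexingConfig = indexingConfig;
    }

    boolean isIndexed(String propertyName) {
        // Node type properties are always indexed so that queries can
        // resolve node types.
        if ("jcr:primaryType".equals(propertyName)
                || "jcr:mixinTypes".equals(propertyName)) {
            return true;
        }
        // Without a configuration, every property is indexed.
        if (indexingConfig == null) {
            return true;
        }
        return indexingConfig.test(propertyName);
    }
}
```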
      • isIncludedInNodeIndex

        protected boolean isIncludedInNodeIndex​(Name propertyName)
        Returns true if the property with the given name should also be added to the node scope index.
        Parameters:
        propertyName - the name of a property.
        Returns:
        true if it should be added to the node scope index; false otherwise.
      • useInExcerpt

        protected boolean useInExcerpt​(Name propertyName)
        Returns true if the content of the property with the given name should be used to create an excerpt.
        Parameters:
        propertyName - the name of a property.
        Returns:
        true if it should be used to create an excerpt; false otherwise.
      • isSupportedMediaType

        protected boolean isSupportedMediaType​(String type)
        Returns true if the provided type is among the types supported by the Tika parser we are using.
        Parameters:
        type - the type to check.
        Returns:
        whether the type is supported by the Tika parser we are using.
      • getPropertyBoost

        protected float getPropertyBoost​(Name propertyName)
        Returns the boost value for the given property name.
        Parameters:
        propertyName - the name of a property.
        Returns:
        the boost value for the given property name.
      • getNodeBoost

        protected float getNodeBoost()
        Returns:
        the boost value for this node state.
      • addLength

        protected void addLength​(org.apache.lucene.document.Document doc,
                                 String propertyName,
                                 InternalValue value)
        Adds a FieldNames.PROPERTY_LENGTHS field to the document with a named length value.
        Parameters:
        doc - the lucene document.
        propertyName - the property name.
        value - the internal value.
      • addNodeName

        protected void addNodeName​(org.apache.lucene.document.Document doc,
                                   String namespaceURI,
                                   String localName)
                            throws NamespaceException
        Depending on the index format version adds one or two fields to the document for the node name.
        Parameters:
        doc - the lucene document.
        namespaceURI - the namespace URI of the node name.
        localName - the local name of the node.
        Throws:
        NamespaceException
      • addParentChildRelation

        protected void addParentChildRelation​(org.apache.lucene.document.Document doc,
                                              NodeId parentId)
                                       throws ItemStateException,
                                              RepositoryException
        Adds a parent child relation to the given doc.
        Parameters:
        doc - the document.
        parentId - the id of the parent node.
        Throws:
        ItemStateException - if the parent node cannot be read.
        RepositoryException - if the parent node does not have a child node entry for the current node.