Class NodeIndexer
- java.lang.Object
-
- org.apache.jackrabbit.core.query.lucene.NodeIndexer
-
-
Field Summary
Fields (modifier and type, field, description)
protected static float DEFAULT_BOOST
The default boost for a lucene field: 1.0f.
protected List<Fieldable> doNotUseInExcerpt
List of FieldNames.FULLTEXT fields which should not be used in an excerpt.
protected IndexFormatVersion indexFormatVersion
Indicates index format for this node indexer.
protected IndexingConfiguration indexingConfig
The indexing configuration, or null if none is available.
protected NamespaceMappings mappings
Namespace mappings to use for indexing.
protected NodeState node
The NodeState of the node to index.
protected NamePathResolver resolver
Name and Path resolver.
protected ItemStateManager stateProvider
The persistent item state provider.
protected boolean supportHighlighting
If set to true, the fulltext field is stored and a term vector is created with offset information.
-
Constructor Summary
Constructors (constructor, description)
NodeIndexer(NodeState node, ItemStateManager stateProvider, NamespaceMappings mappings, Executor executor, org.apache.tika.parser.Parser parser)
Creates a new node indexer.
-
Method Summary
Methods (modifier and type, method, description)
protected void addBinaryValue(Document doc, String fieldName, InternalValue internalValue)
Adds the binary value to the document as the named field.
protected void addBooleanValue(Document doc, String fieldName, Object internalValue)
Adds the string representation of the boolean value to the document as the named field.
protected void addCalendarValue(Document doc, String fieldName, Calendar internalValue)
Adds the calendar value to the document as the named field.
protected void addDecimalValue(Document doc, String fieldName, BigDecimal internalValue)
Adds the decimal value to the document as the named field.
protected void addDoubleValue(Document doc, String fieldName, double internalValue)
Adds the double value to the document as the named field.
protected void addLength(Document doc, String propertyName, InternalValue value)
Adds a FieldNames.PROPERTY_LENGTHS field to document with a named length value.
protected void addLongValue(Document doc, String fieldName, long internalValue)
Adds the long value to the document as the named field.
protected void addMVPName(Document doc, Name name)
Adds a FieldNames.MVP field to doc with the resolved name using the internal search index namespace mapping.
protected void addNameValue(Document doc, String fieldName, Name internalValue)
Adds the name value to the document as the named field.
protected void addNodeName(Document doc, String namespaceURI, String localName)
Depending on the index format version, adds one or two fields to the document for the node name.
protected void addParentChildRelation(Document doc, NodeId parentId)
Adds a parent child relation to the given doc.
protected void addPathValue(Document doc, String fieldName, Path internalValue)
Adds the path value to the document as the named field.
protected void addPropertyName(Document doc, Name name)
Adds the property name to the lucene _:PROPERTIES_SET field.
protected void addReferenceValue(Document doc, String fieldName, NodeId internalValue, boolean weak)
Adds the reference value to the document as the named field.
protected void addStringValue(Document doc, String fieldName, String internalValue)
Deprecated. Use addStringValue(Document, String, String, boolean) instead.
protected void addStringValue(Document doc, String fieldName, String internalValue, boolean tokenized)
Adds the string value to the document both as the named field and optionally for full text indexing if tokenized is true.
protected void addStringValue(Document doc, String fieldName, String internalValue, boolean tokenized, boolean includeInNodeIndex, float boost)
Deprecated.
protected void addStringValue(Document doc, String fieldName, String internalValue, boolean tokenized, boolean includeInNodeIndex, float boost, boolean useInExcerpt)
Adds the string value to the document both as the named field and optionally for full text indexing if tokenized is true.
protected void addURIValue(Document doc, String fieldName, URI internalValue)
Adds the URI value to the document as the named field.
protected void addValue(Document doc, InternalValue value, Name name)
Adds a value to the lucene Document.
protected void addValueProperty(Document doc, InternalValue value, Name name, String fieldName)
Adds a property related value to the lucene Document.
Document createDoc()
Creates a lucene Document.
protected Field createFieldWithoutNorms(String fieldName, String internalValue, int propertyType)
Creates a field of name fieldName with the value of internalValue.
protected Field createFulltextField(String value)
Deprecated.
protected Field createFulltextField(String value, boolean store, boolean withOffsets)
Deprecated.
protected Field createFulltextField(String value, boolean store, boolean withOffsets, boolean withNorms)
Creates a fulltext field for the string value.
protected Fieldable createFulltextField(InternalValue value, org.apache.tika.metadata.Metadata metadata)
Deprecated.
protected Fieldable createFulltextField(InternalValue value, org.apache.tika.metadata.Metadata metadata, boolean withNorms)
Creates a fulltext field for the reader value.
int getMaxExtractLength()
Returns the maximum number of characters to extract from binaries.
protected float getNodeBoost()
Returns the boost value for this node state.
NodeId getNodeId()
Returns the NodeId of the indexed node.
protected float getPropertyBoost(Name propertyName)
Returns the boost value for the given property name.
protected InternalValue getValue(Name name)
Utility method that extracts the first value of the named property of the current node.
protected boolean isIncludedInNodeIndex(Name propertyName)
Returns true if the property with the given name should also be added to the node scope index.
protected boolean isIndexed(Name propertyName)
Returns true if the property with the given name should be indexed.
protected boolean isSupportedMediaType(String type)
Returns true if the provided type is among the types supported by the Tika parser we are using.
void setIndexFormatVersion(IndexFormatVersion indexFormatVersion)
Sets the index format version.
void setIndexingConfiguration(IndexingConfiguration config)
Sets the indexing configuration for this node indexer.
void setMaxExtractLength(int length)
Sets the maximum number of characters to extract from binaries.
void setSupportHighlighting(boolean b)
If set to true, additional information is stored in the index to support highlighting using the rep:excerpt pseudo property.
protected void throwRepositoryException(Exception e)
Wraps the exception e into a RepositoryException and throws the created exception.
protected boolean useInExcerpt(Name propertyName)
Returns true if the content of the property with the given name should be used to create an excerpt.
-
-
-
Field Detail
-
DEFAULT_BOOST
protected static final float DEFAULT_BOOST
The default boost for a lucene field: 1.0f.
See Also:
Constant Field Values
-
node
protected final NodeState node
The NodeState of the node to index.
-
stateProvider
protected final ItemStateManager stateProvider
The persistent item state provider
-
mappings
protected final NamespaceMappings mappings
Namespace mappings to use for indexing. This is the internal namespace mapping.
-
resolver
protected final NamePathResolver resolver
Name and Path resolver.
-
indexingConfig
protected IndexingConfiguration indexingConfig
The indexing configuration, or null if none is available.
-
supportHighlighting
protected boolean supportHighlighting
If set to true, the fulltext field is stored and a term vector is created with offset information.
-
indexFormatVersion
protected IndexFormatVersion indexFormatVersion
Indicates index format for this node indexer.
-
doNotUseInExcerpt
protected List<Fieldable> doNotUseInExcerpt
List of FieldNames.FULLTEXT fields which should not be used in an excerpt.
-
-
Constructor Detail
-
NodeIndexer
public NodeIndexer(NodeState node, ItemStateManager stateProvider, NamespaceMappings mappings, Executor executor, org.apache.tika.parser.Parser parser)
Creates a new node indexer.
Parameters:
node - the node state to index.
stateProvider - the persistent item state manager to retrieve properties.
mappings - internal namespace mappings.
executor - background task executor for text extraction.
parser - parser for binary properties.
-
-
Method Detail
-
getNodeId
public NodeId getNodeId()
Returns the NodeId of the indexed node.
Returns:
the NodeId of the indexed node.
-
setSupportHighlighting
public void setSupportHighlighting(boolean b)
If set to true, additional information is stored in the index to support highlighting using the rep:excerpt pseudo property.
Parameters:
b - true to enable highlighting support.
-
setIndexFormatVersion
public void setIndexFormatVersion(IndexFormatVersion indexFormatVersion)
Sets the index format version.
Parameters:
indexFormatVersion - the index format version.
-
setIndexingConfiguration
public void setIndexingConfiguration(IndexingConfiguration config)
Sets the indexing configuration for this node indexer.
Parameters:
config - the indexing configuration.
-
getMaxExtractLength
public int getMaxExtractLength()
Returns the maximum number of characters to extract from binaries.
Returns:
maximum extraction length
-
setMaxExtractLength
public void setMaxExtractLength(int length)
Sets the maximum number of characters to extract from binaries.
Parameters:
length - maximum extraction length
-
createDoc
public Document createDoc() throws RepositoryException
Creates a lucene Document.
Returns:
the lucene Document with the index layout.
Throws:
RepositoryException - if an error occurs while reading property values from the ItemStateProvider.
-
throwRepositoryException
protected void throwRepositoryException(Exception e) throws RepositoryException
Wraps the exception e into a RepositoryException and throws the created exception.
Parameters:
e - the base exception.
Throws:
RepositoryException
-
addMVPName
protected void addMVPName(Document doc, Name name)
Adds a FieldNames.MVP field to doc with the resolved name using the internal search index namespace mapping.
Parameters:
doc - the lucene document.
name - the name of the multi-value property.
-
addValue
protected void addValue(Document doc, InternalValue value, Name name) throws RepositoryException
Adds a value to the lucene Document.
Parameters:
doc - the document.
value - the internal jackrabbit value.
name - the name of the property.
Throws:
RepositoryException
-
addValueProperty
protected void addValueProperty(Document doc, InternalValue value, Name name, String fieldName) throws RepositoryException
Adds a property related value to the lucene Document, such as length for indexed fields.
Parameters:
doc - the document.
value - the internal jackrabbit value.
name - the name of the property.
Throws:
RepositoryException
-
addPropertyName
protected void addPropertyName(Document doc, Name name)
Adds the property name to the lucene _:PROPERTIES_SET field.
Parameters:
doc - the document.
name - the name of the property.
-
addBinaryValue
protected void addBinaryValue(Document doc, String fieldName, InternalValue internalValue)
Adds the binary value to the document as the named field.
This implementation checks if this node is of type nt:resource and, if that is the case, tries to extract text from the binary property using the parser.
Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.
-
getValue
protected InternalValue getValue(Name name) throws ItemStateException
Utility method that extracts the first value of the named property of the current node. Returns null if the property does not exist or contains no values.
Parameters:
name - property name
Returns:
value of the named property, or null
Throws:
ItemStateException - if the property can not be accessed
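The "first value or null" contract described above can be sketched in plain Java. This is only an illustration of the contract, not Jackrabbit code: `FirstValueSketch` is a hypothetical name, and a `Map<String, List<String>>` stands in for the node's properties.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in illustrating the getValue contract; not part of Jackrabbit.
final class FirstValueSketch {
    private FirstValueSketch() {}

    /**
     * Returns the first value of the named property, or null if the
     * property does not exist or contains no values.
     */
    static String firstValue(Map<String, List<String>> properties, String name) {
        List<String> values = properties.get(name);
        if (values == null || values.isEmpty()) {
            return null; // missing property or empty multi-value property
        }
        return values.get(0);
    }
}
```

Callers therefore have to handle a null result for both the "missing" and the "empty" case.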
-
addBooleanValue
protected void addBooleanValue(Document doc, String fieldName, Object internalValue)
Adds the string representation of the boolean value to the document as the named field.
Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.
-
createFieldWithoutNorms
protected Field createFieldWithoutNorms(String fieldName, String internalValue, int propertyType)
Creates a field of name fieldName with the value of internalValue. The created field is indexed without norms.
Parameters:
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.
propertyType - the property type.
-
addCalendarValue
protected void addCalendarValue(Document doc, String fieldName, Calendar internalValue)
Adds the calendar value to the document as the named field. The calendar value is converted to an indexable string value using the DateField class.
Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.
-
addDoubleValue
protected void addDoubleValue(Document doc, String fieldName, double internalValue)
Adds the double value to the document as the named field. The double value is converted to an indexable string value using the DoubleField class.
Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.
-
addLongValue
protected void addLongValue(Document doc, String fieldName, long internalValue)
Adds the long value to the document as the named field. The long value is converted to an indexable string value using the LongField class.
Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.
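Why a number needs converting to an "indexable string" at all: when index terms are compared as strings, a numeric value has to be encoded so that string order matches numeric order. The sketch below shows one common way to do this for a long (flip the sign bit, then zero-pad as hexadecimal). It is an illustration under stated assumptions: `SortableLongSketch`, `encode`, and `decode` are hypothetical names, and the format actually produced by the LongField class may differ.

```java
// Hypothetical illustration of an order-preserving long-to-string encoding.
// This is NOT the LongField implementation; its actual format may differ.
final class SortableLongSketch {
    private SortableLongSketch() {}

    /** Encodes a long as a 16-character hex string whose string order matches numeric order. */
    static String encode(long value) {
        // Flipping the sign bit maps Long.MIN_VALUE..Long.MAX_VALUE onto the
        // unsigned range 0..2^64-1 while preserving order.
        long shifted = value ^ Long.MIN_VALUE;
        // Fixed-width, zero-padded hex so shorter numbers do not sort out of place.
        return String.format("%016x", shifted);
    }

    /** Reverses encode(long). */
    static long decode(String encoded) {
        return Long.parseUnsignedLong(encoded, 16) ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        for (long v : new long[] {Long.MIN_VALUE, -1L, 0L, 1L, Long.MAX_VALUE}) {
            System.out.println(v + " -> " + encode(v));
        }
    }
}
```

With an encoding of this kind, a range comparison on the stored strings gives the same result as comparing the original numbers, which is the property a range query on a numeric field needs.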
-
addDecimalValue
protected void addDecimalValue(Document doc, String fieldName, BigDecimal internalValue)
Adds the decimal value to the document as the named field. The decimal value is converted to an indexable string value using the DecimalField class.
Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.
-
addReferenceValue
protected void addReferenceValue(Document doc, String fieldName, NodeId internalValue, boolean weak)
Adds the reference value to the document as the named field. The value's string representation is added as the reference data. Additionally, the reference data is stored in the index. As of Jackrabbit 2.0, this method also adds the reference UUID as a FieldNames.WEAK_REFS field to the index if it is a weak reference.
Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.
weak - Flag indicating whether it's a WEAKREFERENCE (true) or a REFERENCE (false)
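The effect of the weak flag described above can be sketched without Lucene. In this illustration a list of name/value pairs stands in for the lucene Document; `ReferenceFieldSketch`, `referenceFields`, and the placeholder field name `WEAK_REFS` are assumptions made for the example, not the actual Jackrabbit field name or code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the behaviour described above; not Jackrabbit code.
final class ReferenceFieldSketch {
    // Placeholder only; the real value of FieldNames.WEAK_REFS is not assumed here.
    static final String WEAK_REFS_FIELD = "WEAK_REFS";

    private ReferenceFieldSketch() {}

    /** Returns the {name, value} pairs that would be added for one reference value. */
    static List<String[]> referenceFields(String fieldName, String targetId, boolean weak) {
        List<String[]> fields = new ArrayList<>();
        // The reference is always added under the named field.
        fields.add(new String[] {fieldName, targetId});
        if (weak) {
            // Weak references are additionally recorded in a dedicated field.
            fields.add(new String[] {WEAK_REFS_FIELD, targetId});
        }
        return fields;
    }
}
```

A hard reference thus contributes one entry and a weak reference two, the second one making it possible to look up weak references to a given target.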
-
addPathValue
protected void addPathValue(Document doc, String fieldName, Path internalValue)
Adds the path value to the document as the named field. The path value is converted to an indexable string value using the namespace mappings with which this class has been created.
Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.
-
addURIValue
protected void addURIValue(Document doc, String fieldName, URI internalValue)
Adds the URI value to the document as the named field.
Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.
-
addStringValue
protected void addStringValue(Document doc, String fieldName, String internalValue)
Deprecated. Use addStringValue(Document, String, String, boolean) instead.
Adds the string value to the document both as the named field and for full text indexing.
Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.
-
addStringValue
protected void addStringValue(Document doc, String fieldName, String internalValue, boolean tokenized)
Adds the string value to the document both as the named field and optionally for full text indexing if tokenized is true.
Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.
tokenized - If true the string is also tokenized and fulltext indexed.
-
addStringValue
protected void addStringValue(Document doc, String fieldName, String internalValue, boolean tokenized, boolean includeInNodeIndex, float boost)
Deprecated.
Adds the string value to the document both as the named field and optionally for full text indexing if tokenized is true.
Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.
tokenized - If true the string is also tokenized and fulltext indexed.
includeInNodeIndex - If true the string is also tokenized and added to the node scope fulltext index.
boost - the boost value for this string field.
-
addStringValue
protected void addStringValue(Document doc, String fieldName, String internalValue, boolean tokenized, boolean includeInNodeIndex, float boost, boolean useInExcerpt)
Adds the string value to the document both as the named field and optionally for full text indexing if tokenized is true.
Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.
tokenized - If true the string is also tokenized and fulltext indexed.
includeInNodeIndex - If true the string is also tokenized and added to the node scope fulltext index.
boost - the boost value for this string field.
useInExcerpt - If true the string may show up in an excerpt.
-
addNameValue
protected void addNameValue(Document doc, String fieldName, Name internalValue)
Adds the name value to the document as the named field. The name value is converted to an indexable string by treating the internal value as a Name and mapping the namespace using the namespace mappings with which this class has been created.
Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.
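The name conversion described above amounts to replacing a namespace URI with an index-internal prefix and joining it with the local name. The following sketch shows that idea in isolation. `IndexNameSketch`, its methods, and the counter-based prefix scheme are assumptions for illustration; the prefixes that the real NamespaceMappings implementation assigns may look different.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for index-internal namespace mappings; not Jackrabbit code.
final class IndexNameSketch {
    // Namespace URI -> prefix used inside the index.
    private final Map<String, String> prefixByUri = new LinkedHashMap<>();

    /** Returns the prefix for a namespace URI, assigning a new one on first use. */
    String prefixFor(String namespaceUri) {
        String prefix = prefixByUri.get(namespaceUri);
        if (prefix == null) {
            // Assumption: prefixes are simply numbered in order of first use.
            prefix = Integer.toString(prefixByUri.size());
            prefixByUri.put(namespaceUri, prefix);
        }
        return prefix;
    }

    /** Builds the indexable form "prefix:localName" of a qualified name. */
    String toIndexName(String namespaceUri, String localName) {
        return prefixFor(namespaceUri) + ":" + localName;
    }
}
```

Because the prefix depends only on the mapping held by the indexer, the same name always produces the same indexed string, independent of any prefix remapping done in a user session.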
-
createFulltextField
protected Field createFulltextField(String value)
Deprecated.
Creates a fulltext field for the string value.
Parameters:
value - the string value.
Returns:
a lucene field.
-
createFulltextField
protected Field createFulltextField(String value, boolean store, boolean withOffsets)
Deprecated.
Creates a fulltext field for the string value.
Parameters:
value - the string value.
store - if the value of the field should be stored.
withOffsets - if a term vector with offsets should be stored.
Returns:
a lucene field.
-
createFulltextField
protected Field createFulltextField(String value, boolean store, boolean withOffsets, boolean withNorms)
Creates a fulltext field for the string value.
Parameters:
value - the string value.
store - if the value of the field should be stored.
withOffsets - if a term vector with offsets should be stored.
withNorms - if norm information should be added for this value.
Returns:
a lucene field.
-
createFulltextField
protected Fieldable createFulltextField(InternalValue value, org.apache.tika.metadata.Metadata metadata)
Deprecated.
Creates a fulltext field for the reader value.
Parameters:
value - the binary value
metadata - document metadata
Returns:
a lucene field.
-
createFulltextField
protected Fieldable createFulltextField(InternalValue value, org.apache.tika.metadata.Metadata metadata, boolean withNorms)
Creates a fulltext field for the reader value.
Parameters:
value - the binary value
metadata - document metadata
withNorms - if norm information should be added for this value
Returns:
a lucene field.
-
isIndexed
protected boolean isIndexed(Name propertyName)
Returns true if the property with the given name should be indexed. The default is to index all properties unless explicit indexing configuration is specified. The jcr:primaryType and jcr:mixinTypes properties are always indexed for correct node type resolution in queries.
Parameters:
propertyName - name of a property.
Returns:
true if the property should be indexed; false otherwise.
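The decision rule described above can be written out as a small stand-alone sketch. It is not the NodeIndexer implementation: `IsIndexedSketch` is a hypothetical name, property names are plain strings, and a `Predicate<String>` (null when absent) stands in for the indexing configuration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration of the isIndexed rule; not Jackrabbit code.
final class IsIndexedSketch {
    // Properties that are always indexed so node type resolution in queries works.
    private static final Set<String> ALWAYS_INDEXED =
            new HashSet<>(Arrays.asList("jcr:primaryType", "jcr:mixinTypes"));

    private IsIndexedSketch() {}

    /**
     * @param propertyName  the property name to check
     * @param indexingRule  stand-in for an indexing configuration, or null if none is available
     */
    static boolean isIndexed(String propertyName, Predicate<String> indexingRule) {
        if (ALWAYS_INDEXED.contains(propertyName)) {
            return true; // never excluded, whatever the configuration says
        }
        if (indexingRule == null) {
            return true; // default: index all properties
        }
        return indexingRule.test(propertyName);
    }
}
```

For example, a rule that excludes everything would still leave jcr:primaryType and jcr:mixinTypes indexed, while passing null indexes every property.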
-
isIncludedInNodeIndex
protected boolean isIncludedInNodeIndex(Name propertyName)
Returns true if the property with the given name should also be added to the node scope index.
Parameters:
propertyName - the name of a property.
Returns:
true if it should be added to the node scope index; false otherwise.
-
useInExcerpt
protected boolean useInExcerpt(Name propertyName)
Returns true if the content of the property with the given name should be used to create an excerpt.
Parameters:
propertyName - the name of a property.
Returns:
true if it should be used to create an excerpt; false otherwise.
-
isSupportedMediaType
protected boolean isSupportedMediaType(String type)
Returns true if the provided type is among the types supported by the Tika parser we are using.
Parameters:
type - the type to check.
Returns:
whether the type is supported by the Tika parser we are using.
-
getPropertyBoost
protected float getPropertyBoost(Name propertyName)
Returns the boost value for the given property name.
Parameters:
propertyName - the name of a property.
Returns:
the boost value for the given property name.
-
getNodeBoost
protected float getNodeBoost()
Returns:
the boost value for this node state.
-
addLength
protected void addLength(Document doc, String propertyName, InternalValue value)
Adds a FieldNames.PROPERTY_LENGTHS field to document with a named length value.
Parameters:
doc - the lucene document.
propertyName - the property name.
value - the internal value.
-
addNodeName
protected void addNodeName(Document doc, String namespaceURI, String localName) throws NamespaceException
Depending on the index format version, adds one or two fields to the document for the node name.
Parameters:
doc - the lucene document.
namespaceURI - the namespace URI of the node name.
localName - the local name of the node.
Throws:
NamespaceException
-
addParentChildRelation
protected void addParentChildRelation(Document doc, NodeId parentId) throws ItemStateException, RepositoryException
Adds a parent child relation to the given doc.
Parameters:
doc - the document.
parentId - the id of the parent node.
Throws:
ItemStateException - if the parent node cannot be read.
RepositoryException - if the parent node does not have a child node entry for the current node.
-
-