Class FuzzyTermsEnum
- All Implemented Interfaces:
BytesRefIterator
Term enumerations are always ordered by getComparator(). Each term in the enumeration is greater than all that precede it.
-
Nested Class Summary
Nested Classes
static interface - Reuses compiled automata across different segments, because they are independent of the index.
static final class - Stores compiled automata as a list (indexed by edit distance).
Nested classes/interfaces inherited from class org.apache.lucene.index.TermsEnum
TermsEnum.SeekStatus
-
Field Summary
Fields
protected int maxEdits
protected final float minSimilarity
protected final boolean raw
protected final int realPrefixLength
protected final float scale_factor
protected final int termLength
protected final Terms terms
protected final int[] termText
-
Constructor Summary
Constructors
FuzzyTermsEnum(Terms terms, AttributeSource atts, Term term, float minSimilarity, int prefixLength, boolean transpositions) - Constructor for enumeration of all terms from the specified reader which share a prefix of length prefixLength with term and which have a fuzzy similarity > minSimilarity.
-
Method Summary
Methods
int docFreq() - Returns the number of documents containing the current term.
docs - Get DocsEnum for the current term, with control over whether freqs are required.
docsAndPositions(Bits liveDocs, DocsAndPositionsEnum reuse, int flags) - Get DocsAndPositionsEnum for the current term, with control over whether offsets and payloads are required.
protected TermsEnum getAutomatonEnum(int editDistance, BytesRef lastTerm) - Returns an automata-based enum for matching up to editDistance from lastTerm, if possible.
getComparator - Return the BytesRef Comparator used to sort terms provided by the iterator.
float getMinSimilarity()
float getScaleFactor()
protected void maxEditDistanceChanged(BytesRef lastTerm, int maxEdits, boolean init)
next() - Increments the iteration to the next BytesRef in the iterator.
long ord() - Returns ordinal position for current term.
seekCeil - Seeks to the specified term, if it exists, or to the next (ceiling) term.
void seekExact(long ord) - Seeks to the specified term by ordinal (position) as previously returned by TermsEnum.ord().
boolean seekExact - Attempts to seek to the exact term, returning true if the term is found.
void seekExact - Expert: Seeks a specific position by TermState previously obtained from TermsEnum.termState().
protected void setEnum - Swaps in a new actual enum to proxy to.
term() - Returns current term.
termState - Expert: Returns the TermsEnum's internal state to position the TermsEnum without re-seeking the term dictionary.
long totalTermFreq - Returns the total number of occurrences of this term across all documents (the sum of the freq() for each doc that has this term).
Methods inherited from class org.apache.lucene.index.TermsEnum
attributes, docs, docsAndPositions
-
Field Details
-
minSimilarity
protected final float minSimilarity
-
scale_factor
protected final float scale_factor
-
termLength
protected final int termLength
-
maxEdits
protected int maxEdits
-
raw
protected final boolean raw
-
terms
protected final Terms terms
-
termText
protected final int[] termText
-
realPrefixLength
protected final int realPrefixLength
-
-
Constructor Details
-
FuzzyTermsEnum
public FuzzyTermsEnum(Terms terms, AttributeSource atts, Term term, float minSimilarity, int prefixLength, boolean transpositions) throws IOException
Constructor for enumeration of all terms from the specified reader which share a prefix of length prefixLength with term and which have a fuzzy similarity > minSimilarity.
After calling the constructor, the enumeration is already pointing to the first valid term if such a term exists.
- Parameters:
terms - Delivers terms.
atts - AttributeSource created by the rewrite method of MultiTermQuery that contains information about competitive boosts during rewrite. It is also used to cache DFAs between segment transitions.
term - Pattern term.
minSimilarity - Minimum required similarity for terms from the reader. Pass an integer value representing edit distance. Passing a fraction is deprecated.
prefixLength - Length of required common prefix. Default value is 0.
- Throws:
IOException - if there is a low-level IO error
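The acceptance rule described by the constructor (a shared prefix of prefixLength characters plus a bounded edit distance) can be illustrated with a minimal, self-contained sketch. It does not use Lucene and is not how FuzzyTermsEnum is implemented; it uses a plain dynamic-programming Levenshtein distance without transpositions, and the names FuzzyMatchSketch, editDistance, accepts and filter are hypothetical. It also assumes the integer form of minSimilarity, treated here as a maximum number of edits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FuzzyMatchSketch {

    // Classic two-row Levenshtein distance over chars (insert, delete, substitute).
    // Illustrative only; transpositions are not counted as a single edit here.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumption: a candidate is accepted when it shares the first prefixLength
    // characters with the pattern term and is within maxEdits edits of it.
    static boolean accepts(String term, String candidate, int prefixLength, int maxEdits) {
        int p = Math.min(prefixLength, term.length());
        if (!candidate.startsWith(term.substring(0, p))) {
            return false;
        }
        return editDistance(term, candidate) <= maxEdits;
    }

    // Keeps the accepted candidates in their original order.
    static List<String> filter(List<String> candidates, String term, int prefixLength, int maxEdits) {
        List<String> out = new ArrayList<>();
        for (String c : candidates) {
            if (accepts(term, c, prefixLength, maxEdits)) {
                out.add(c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("lucene", "lucine", "mucene", "lucerne", "apache");
        System.out.println(filter(dictionary, "lucene", 1, 1));
    }
}
```

In this sketch, a larger prefixLength rejects more candidates before any distance computation is needed, which is the practical reason to require a common prefix.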
-
-
Method Details
-
getAutomatonEnum
Returns an automata-based enum for matching up to editDistance from lastTerm, if possible.
- Throws:
IOException
-
setEnum
Swaps in a new actual enum to proxy to.
-
maxEditDistanceChanged
protected void maxEditDistanceChanged(BytesRef lastTerm, int maxEdits, boolean init) throws IOException
- Throws:
IOException
-
next
Description copied from interface: BytesRefIterator
Increments the iteration to the next BytesRef in the iterator. Returns the resulting BytesRef or null if the end of the iterator is reached. The returned BytesRef may be re-used across calls to next. After this method returns null, do not call it again: the results are undefined.
- Returns:
the next BytesRef in the iterator or null if the end of the iterator is reached.
- Throws:
IOException - If there is a low-level I/O error.
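Because the returned BytesRef may be re-used across calls, callers that keep terms beyond the next call need to copy them. The following sketch illustrates that contract with plain Java only; ReusingIterator is a hypothetical stand-in for a BytesRefIterator that hands back the same byte array on every call, not a Lucene class. In Lucene itself, BytesRef.deepCopyOf is one way to take such a copy.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ReusedBufferSketch {

    // Hypothetical iterator: next() returns null at the end and otherwise
    // returns the SAME byte[] each time, overwriting its contents.
    static final class ReusingIterator {
        private final String[] terms;
        private final byte[] shared;
        private int pos = 0;
        int length; // number of valid bytes currently in the shared buffer

        ReusingIterator(String... terms) {
            this.terms = terms.clone();
            int max = 0;
            for (String t : terms) {
                max = Math.max(max, t.getBytes(StandardCharsets.UTF_8).length);
            }
            this.shared = new byte[max];
        }

        byte[] next() {
            if (pos == terms.length) {
                return null; // end of iteration; callers should stop here
            }
            byte[] src = terms[pos++].getBytes(StandardCharsets.UTF_8);
            System.arraycopy(src, 0, shared, 0, src.length);
            length = src.length;
            return shared;
        }
    }

    // Safe pattern: copy the current term before advancing.
    static List<String> collect(ReusingIterator it) {
        List<String> out = new ArrayList<>();
        byte[] ref;
        while ((ref = it.next()) != null) {
            out.add(new String(ref, 0, it.length, StandardCharsets.UTF_8));
        }
        return out;
    }

    // Unsafe pattern: retaining the returned reference keeps only one buffer.
    static List<byte[]> retainWithoutCopy(ReusingIterator it) {
        List<byte[]> out = new ArrayList<>();
        byte[] ref;
        while ((ref = it.next()) != null) {
            out.add(ref);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect(new ReusingIterator("ample", "apple", "apply")));
    }
}
```

The loop also stops at the first null and never calls next() again, matching the note that results after the end are undefined.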
-
docFreq
Description copied from class: TermsEnum
Returns the number of documents containing the current term. Do not call this when the enum is unpositioned (see TermsEnum.SeekStatus.END).
- Specified by:
docFreq in class TermsEnum
- Throws:
IOException
-
totalTermFreq
Description copied from class: TermsEnum
Returns the total number of occurrences of this term across all documents (the sum of the freq() for each doc that has this term). This will be -1 if the codec doesn't support this measure. Note that, like other term measures, this measure does not take deleted documents into account.
- Specified by:
totalTermFreq in class TermsEnum
- Throws:
IOException
-
docs
Description copied from class: TermsEnum
Get DocsEnum for the current term, with control over whether freqs are required. Do not call this when the enum is unpositioned. This method will not return null.
- Specified by:
docs in class TermsEnum
- Parameters:
liveDocs - unset bits are documents that should not be returned
reuse - pass a prior DocsEnum for possible reuse
flags - specifies which optional per-document values you require; see DocsEnum.FLAG_FREQS
- Throws:
IOException
- See Also:
-
docsAndPositions
public DocsAndPositionsEnum docsAndPositions(Bits liveDocs, DocsAndPositionsEnum reuse, int flags) throws IOException
Description copied from class: TermsEnum
Get DocsAndPositionsEnum for the current term, with control over whether offsets and payloads are required. Some codecs may be able to optimize their implementation when offsets and/or payloads are not required. Do not call this when the enum is unpositioned. This will return null if positions were not indexed.
- Specified by:
docsAndPositions in class TermsEnum
- Parameters:
liveDocs - unset bits are documents that should not be returned
reuse - pass a prior DocsAndPositionsEnum for possible reuse
flags - specifies which optional per-position values you require; see DocsAndPositionsEnum.FLAG_OFFSETS and DocsAndPositionsEnum.FLAG_PAYLOADS.
- Throws:
IOException
-
seekExact
Description copied from class: TermsEnum
Expert: Seeks a specific position by TermState previously obtained from TermsEnum.termState(). Callers should maintain the TermState to use this method. Low-level implementations may position the TermsEnum without re-seeking the term dictionary.
Seeking by TermState should be used only if the state was obtained from the same TermsEnum instance.
NOTE: Using this method with an incompatible TermState might leave this TermsEnum in an undefined state. On a segment level, TermState instances are compatible only if the source and the target TermsEnum operate on the same field. If operating on segment level, TermState instances must not be used across segments.
NOTE: A seek by TermState might not restore the AttributeSource's state. AttributeSource states must be maintained separately if this method is used.
- Overrides:
seekExact in class TermsEnum
- Parameters:
term - the term the TermState corresponds to
state - the TermState
- Throws:
IOException
-
termState
Description copied from class: TermsEnum
Expert: Returns the TermsEnum's internal state to position the TermsEnum without re-seeking the term dictionary.
NOTE: A seek by TermState might not capture the AttributeSource's state. Callers must maintain the AttributeSource states separately.
- Overrides:
termState in class TermsEnum
- Throws:
IOException
- See Also:
-
getComparator
Description copied from interface: BytesRefIterator
Return the BytesRef Comparator used to sort terms provided by the iterator. This may return null if there are no items or the iterator is not sorted. Callers may invoke this method many times, so it's best to cache a single instance and reuse it.
-
ord
Description copied from class: TermsEnum
Returns ordinal position for current term. This is an optional method (the codec may throw UnsupportedOperationException). Do not call this when the enum is unpositioned.
- Specified by:
ord in class TermsEnum
- Throws:
IOException
-
seekExact
Description copied from class: TermsEnum
Attempts to seek to the exact term, returning true if the term is found. If this returns false, the enum is unpositioned. For some codecs, seekExact may be substantially faster than TermsEnum.seekCeil(org.apache.lucene.util.BytesRef).
- Overrides:
seekExact in class TermsEnum
- Throws:
IOException
-
seekCeil
Description copied from class: TermsEnum
Seeks to the specified term, if it exists, or to the next (ceiling) term. Returns SeekStatus to indicate whether exact term was found, a different term was found, or EOF was hit. The target term may be before or after the current term. If this returns SeekStatus.END, the enum is unpositioned.
- Specified by:
seekCeil in class TermsEnum
- Throws:
IOException
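The three outcomes of a ceiling seek can be shown with a small sketch over a sorted array. This is an illustration only, not Lucene code: SortedTerms is a hypothetical class, the local SeekStatus enum merely mirrors the names FOUND, NOT_FOUND and END, and String ordering stands in for the comparator returned by getComparator().

```java
import java.util.Arrays;

class SeekCeilSketch {

    // Local stand-in for the three seek outcomes described above.
    enum SeekStatus { END, FOUND, NOT_FOUND }

    // Hypothetical sorted term dictionary with a single cursor.
    static final class SortedTerms {
        private final String[] terms;
        private int pos = -1; // -1 means unpositioned

        SortedTerms(String... values) {
            terms = values.clone();
            Arrays.sort(terms);
        }

        // Positions on the target if present, otherwise on the smallest term
        // greater than the target; leaves the cursor unpositioned at END.
        SeekStatus seekCeil(String target) {
            int idx = Arrays.binarySearch(terms, target);
            if (idx >= 0) {
                pos = idx;
                return SeekStatus.FOUND;
            }
            int insertionPoint = -idx - 1;
            if (insertionPoint == terms.length) {
                pos = -1;
                return SeekStatus.END;
            }
            pos = insertionPoint;
            return SeekStatus.NOT_FOUND;
        }

        // Must not be called while unpositioned.
        String term() {
            if (pos < 0) {
                throw new IllegalStateException("enum is unpositioned");
            }
            return terms[pos];
        }
    }

    public static void main(String[] args) {
        SortedTerms te = new SortedTerms("apple", "banana", "cherry");
        System.out.println(te.seekCeil("blueberry") + " " + te.term());
    }
}
```

Since the search is independent of the current cursor, the target may sort before or after the current term, as the description states.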
-
seekExact
Description copied from class: TermsEnum
Seeks to the specified term by ordinal (position) as previously returned by TermsEnum.ord(). The target ord may be before or after the current ord, and must be within bounds.
- Specified by:
seekExact in class TermsEnum
- Throws:
IOException
-
term
Description copied from class: TermsEnum
Returns current term. Do not call this when the enum is unpositioned.
- Specified by:
term in class TermsEnum
- Throws:
IOException
-
getMinSimilarity
public float getMinSimilarity()
-
getScaleFactor
public float getScaleFactor()
-