Package org.apache.lucene.codecs
Class BlockTreeTermsReader
java.lang.Object
org.apache.lucene.index.Fields
org.apache.lucene.codecs.FieldsProducer
org.apache.lucene.codecs.BlockTreeTermsReader
- All Implemented Interfaces:
Closeable, AutoCloseable, Iterable<String>
A block-based terms index and dictionary that assigns terms to variable-length blocks according to how they share prefixes. The terms index is a prefix trie whose leaves are term blocks. The advantage of this approach is that seekExact can often determine that a term cannot exist without doing any IO, and intersection with Automata is very fast. Note that this terms dictionary has its own fixed terms index (i.e., it does not support a pluggable terms index implementation).
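The early-rejection behavior described above can be illustrated with a toy in-memory prefix index. This is only a sketch of the general idea and not the data structure Lucene uses (the class and method names PrefixIndexSketch, addBlock, and findBlock are hypothetical): each registered prefix points at a block id, and a lookup that finds no matching prefix can answer "term cannot exist" without touching any block.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy prefix index for illustration only; all names here are hypothetical, not Lucene API. */
final class PrefixIndexSketch {
    /** Trie node: children keyed by the next character; blockId >= 0 marks a prefix that owns a block. */
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int blockId = -1;
    }

    private final Node root = new Node();

    /** Registers a block that holds every term starting with the given prefix. */
    void addBlock(String prefix, int blockId) {
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.blockId = blockId;
    }

    /**
     * Returns the id of the deepest block whose prefix matches the term,
     * or -1 if no block can contain it. A result of -1 needs no block read.
     */
    int findBlock(String term) {
        Node node = root;
        int best = root.blockId;
        for (char c : term.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                break; // no longer prefix is registered; stop walking
            }
            if (node.blockId >= 0) {
                best = node.blockId;
            }
        }
        return best;
    }
}
```

In this sketch, a term that maps to a block id would still have to be looked up inside that block; only the -1 case is resolved purely from the in-memory index.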
NOTE: this terms dictionary does not support index divisor when opening an IndexReader. Instead, you can change the min/maxItemsPerBlock during indexing.
The data structure used by this implementation is very similar to a burst trie (http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3499), but with added logic to break up too-large blocks of all terms sharing a given prefix into smaller ones.
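For readers unfamiliar with burst tries, the following is a minimal sketch of a plain top-down burst: a group of terms sharing a prefix is kept as one block until it exceeds a size limit, and is then split by the next character. It does not model Lucene's minimum block size or its additional logic for breaking up too-large blocks; the class BlockSplitSketch and its maxItemsPerBlock parameter are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Simplified burst-style block assignment; illustrative only, not Lucene's algorithm. */
final class BlockSplitSketch {
    /** Returns a map from block prefix to the terms stored in that block. */
    static Map<String, List<String>> assign(List<String> sortedTerms, int maxItemsPerBlock) {
        Map<String, List<String>> blocks = new TreeMap<>();
        burst("", sortedTerms, maxItemsPerBlock, blocks);
        return blocks;
    }

    private static void burst(String prefix, List<String> terms, int max,
                              Map<String, List<String>> out) {
        if (terms.size() <= max) {
            out.put(prefix, new ArrayList<>(terms)); // small enough: keep as one block
            return;
        }
        // Too large: group the terms by the character that follows the shared prefix.
        Map<Character, List<String>> byNext = new TreeMap<>();
        List<String> exact = new ArrayList<>(); // terms equal to the prefix itself
        for (String t : terms) {
            if (t.length() == prefix.length()) {
                exact.add(t);
            } else {
                byNext.computeIfAbsent(t.charAt(prefix.length()), k -> new ArrayList<>()).add(t);
            }
        }
        if (!exact.isEmpty()) {
            out.put(prefix, exact);
        }
        for (Map.Entry<Character, List<String>> e : byNext.entrySet()) {
            burst(prefix + e.getKey(), e.getValue(), max, out);
        }
    }
}
```

With a limit of 2, the sorted terms apple, apply, apt, bar, bat, cat end up in four blocks keyed by the prefixes app, apt, b, and c.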
Use CheckIndex with the -verbose
option to see summary statistics on the blocks in the
dictionary.
See BlockTreeTermsWriter.
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
final class | BlockTreeTermsReader.FieldReader | BlockTree's implementation of Terms.
static class | BlockTreeTermsReader.Stats | BlockTree statistics for a single field returned by BlockTreeTermsReader.FieldReader.computeStats().
-
Field Summary
Fields inherited from class org.apache.lucene.index.Fields
EMPTY_ARRAY
-
Constructor Summary
Constructors
Constructor | Description
BlockTreeTermsReader(Directory dir, FieldInfos fieldInfos, SegmentInfo info, PostingsReaderBase postingsReader, IOContext ioContext, String segmentSuffix, int indexDivisor) | Sole constructor.
-
Method Summary
Modifier and Type | Method | Description
void | close() |
Iterator<String> | iterator() | Returns an iterator that will step through all field names.
long | ramBytesUsed() | Returns approximate RAM bytes used.
protected int | readHeader(IndexInput input) | Reads terms file header.
protected int | readIndexHeader(IndexInput input) | Reads index file header.
protected void | seekDir(IndexInput input, long dirOffset) | Seek input to the directory offset.
int | size() | Returns the number of fields or -1 if the number of distinct field names is unknown.
Terms | terms(String field) | Get the Terms for this field.
Methods inherited from class org.apache.lucene.index.Fields
getUniqueTermCount
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Constructor Details
-
BlockTreeTermsReader
public BlockTreeTermsReader(Directory dir, FieldInfos fieldInfos, SegmentInfo info, PostingsReaderBase postingsReader, IOContext ioContext, String segmentSuffix, int indexDivisor) throws IOException
Sole constructor.
- Throws:
IOException
-
-
Method Details
-
readHeader
Reads terms file header.
- Throws:
IOException
-
readIndexHeader
Reads index file header.
- Throws:
IOException
-
seekDir
Seek input to the directory offset.
- Throws:
IOException
-
close
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Specified by:
close in class FieldsProducer
- Throws:
IOException
-
iterator
Description copied from class: Fields
Returns an iterator that will step through all field names. This will not return null.
-
terms
Description copied from class: Fields
Get the Terms for this field. This will return null if the field does not exist.
- Specified by:
terms in class Fields
- Throws:
IOException
-
size
public int size()
Description copied from class: Fields
Returns the number of fields or -1 if the number of distinct field names is unknown. If >= 0, Fields.iterator() will return as many field names.
-
ramBytesUsed
public long ramBytesUsed()
Description copied from class: FieldsProducer
Returns approximate RAM bytes used.
- Specified by:
ramBytesUsed in class FieldsProducer
-