Class IndexReader
- java.lang.Object
-
- org.apache.lucene.index.IndexReader
-
- All Implemented Interfaces: Closeable, AutoCloseable
- Direct Known Subclasses: AtomicReader, CompositeReader
public abstract class IndexReader extends Object implements Closeable
IndexReader is an abstract class, providing an interface for accessing an index. Search of an index is done entirely through this abstract interface, so that any subclass which implements it is searchable.

There are two different types of IndexReaders:
- AtomicReader: These indexes do not consist of several sub-readers, they are atomic. They support retrieval of stored fields, doc values, terms, and postings.
- CompositeReader: Instances (like DirectoryReader) of this reader can only be used to get stored fields from the underlying AtomicReaders, but it is not possible to directly retrieve postings. To do that, get the sub-readers via CompositeReader.getSequentialSubReaders(). Alternatively, you can mimic an AtomicReader (with a serious slowdown), by wrapping composite readers with SlowCompositeReaderWrapper.
IndexReader instances for indexes on disk are usually constructed with a call to one of the static DirectoryReader.open() methods, e.g. DirectoryReader.open(org.apache.lucene.store.Directory). DirectoryReader implements the CompositeReader interface, so it is not possible to directly get postings from it.

For efficiency, in this API documents are often referred to via document numbers, non-negative integers which each name a unique document in the index. These document numbers are ephemeral -- they may change as documents are added to and deleted from an index. Clients should thus not rely on a given document having the same number between sessions.
NOTE: IndexReader instances are completely thread safe, meaning multiple threads can call any of their methods concurrently. If your application requires external synchronization, you should not synchronize on the IndexReader instance; use your own (non-Lucene) objects instead.
-
-
Nested Class Summary
Nested Classes (modifier and type, class, description):
- static interface IndexReader.ReaderClosedListener
  A custom listener that's invoked when the IndexReader is closed.
-
Method Summary
Methods (modifier and type, method, description):
- void addReaderClosedListener(IndexReader.ReaderClosedListener listener)
  Expert: adds an IndexReader.ReaderClosedListener.
- void close()
  Closes files associated with this index.
- void decRef()
  Expert: decreases the refCount of this IndexReader instance.
- abstract int docFreq(Term term)
  Returns the number of documents containing the term.
- protected abstract void doClose()
  Implements close.
- Document document(int docID)
  Returns the stored fields of the nth Document in this index.
- Document document(int docID, Set<String> fieldsToLoad)
  Like document(int) but only loads the specified fields.
- abstract void document(int docID, StoredFieldVisitor visitor)
  Expert: visits the fields of a stored document, for custom processing/loading of each field.
- protected void ensureOpen()
  Throws AlreadyClosedException if this IndexReader or any of its child readers is closed, otherwise returns.
- boolean equals(Object obj)
- Object getCombinedCoreAndDeletesKey()
  Expert: Returns a key for this IndexReader that also includes deletions, so FieldCache/CachingWrapperFilter can find it again.
- abstract IndexReaderContext getContext()
  Expert: Returns the root IndexReaderContext for this IndexReader's sub-reader tree.
- Object getCoreCacheKey()
  Expert: Returns a key for this IndexReader, so FieldCache/CachingWrapperFilter can find it again.
- abstract int getDocCount(String field)
  Returns the number of documents that have at least one term for this field, or -1 if this measure isn't stored by the codec.
- int getRefCount()
  Expert: returns the current refCount for this reader.
- abstract long getSumDocFreq(String field)
  Returns the sum of TermsEnum.docFreq() for all terms in this field, or -1 if this measure isn't stored by the codec.
- abstract long getSumTotalTermFreq(String field)
  Returns the sum of TermsEnum.totalTermFreq() for all terms in this field, or -1 if this measure isn't stored by the codec (or if this field omits term freq and positions).
- Terms getTermVector(int docID, String field)
  Retrieve term vector for this document and field, or null if term vectors were not indexed.
- abstract Fields getTermVectors(int docID)
  Retrieve term vectors for this document, or null if term vectors were not indexed.
- boolean hasDeletions()
  Returns true if any documents have been deleted.
- int hashCode()
- void incRef()
  Expert: increments the refCount of this IndexReader instance.
- List<AtomicReaderContext> leaves()
  Returns the reader's leaves, or itself if this reader is atomic.
- abstract int maxDoc()
  Returns one greater than the largest possible document number.
- int numDeletedDocs()
  Returns the number of deleted documents.
- abstract int numDocs()
  Returns the number of documents in this index.
- static DirectoryReader open(IndexCommit commit)
  Deprecated.
- static DirectoryReader open(IndexCommit commit, int termInfosIndexDivisor)
  Deprecated.
- static DirectoryReader open(IndexWriter writer, boolean applyAllDeletes)
  Deprecated.
- static DirectoryReader open(Directory directory)
  Deprecated.
- static DirectoryReader open(Directory directory, int termInfosIndexDivisor)
  Deprecated.
- void registerParentReader(IndexReader reader)
  Expert: This method is called by IndexReaders which wrap other readers (e.g. CompositeReader or FilterAtomicReader) to register the parent at the child (this reader) on construction of the parent.
- void removeReaderClosedListener(IndexReader.ReaderClosedListener listener)
  Expert: remove a previously added IndexReader.ReaderClosedListener.
- abstract long totalTermFreq(Term term)
  Returns the total number of occurrences of term across all documents (the sum of the freq() for each doc that has this term).
- boolean tryIncRef()
  Expert: increments the refCount of this IndexReader instance only if the IndexReader has not been closed yet and returns true iff the refCount was successfully incremented, otherwise false.
-
-
-
Method Detail
-
addReaderClosedListener
public final void addReaderClosedListener(IndexReader.ReaderClosedListener listener)
Expert: adds an IndexReader.ReaderClosedListener. The provided listener will be invoked when this reader is closed.
-
removeReaderClosedListener
public final void removeReaderClosedListener(IndexReader.ReaderClosedListener listener)
Expert: remove a previously added IndexReader.ReaderClosedListener.
-
registerParentReader
public final void registerParentReader(IndexReader reader)
Expert: This method is called by IndexReaders which wrap other readers (e.g. CompositeReader or FilterAtomicReader) to register the parent at the child (this reader) on construction of the parent. When this reader is closed, it will mark all registered parents as closed, too. The references to parent readers are weak only, so they can be GCed once they are no longer in use.
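The parent registration described above can be pictured with a small stand-alone model. This is only a sketch under stated assumptions: ParentRegistrySketch is a hypothetical class, not part of Lucene, and it is assumed here that a weak set of parents plus a recursive "mark closed" pass is a reasonable approximation of the behavior described.

```java
import java.util.Collections;
import java.util.Set;
import java.util.WeakHashMap;

/** Hypothetical toy model of parent registration; not Lucene's implementation. */
public class ParentRegistrySketch {
    // Weak set: a registered parent can still be garbage collected once unused.
    private final Set<ParentRegistrySketch> parents = Collections.synchronizedSet(
            Collections.newSetFromMap(new WeakHashMap<ParentRegistrySketch, Boolean>()));
    private volatile boolean closed = false;

    /** Called by a wrapping (parent) reader on construction. */
    public void registerParentReader(ParentRegistrySketch parent) {
        parents.add(parent);
    }

    public boolean isOpen() {
        return !closed;
    }

    /** Closing this reader also marks every registered parent as closed. */
    public void close() {
        closed = true;
        markParentsClosed();
    }

    private void markParentsClosed() {
        synchronized (parents) { // iteration over a synchronized set needs the lock
            for (ParentRegistrySketch parent : parents) {
                parent.closed = true;
                parent.markParentsClosed(); // propagate up the wrapper chain
            }
        }
    }

    public static void main(String[] args) {
        ParentRegistrySketch child = new ParentRegistrySketch();
        ParentRegistrySketch wrapper = new ParentRegistrySketch();
        child.registerParentReader(wrapper);
        child.close();
        System.out.println(wrapper.isOpen()); // prints "false"
    }
}
```

The weak set is the relevant design choice: the child never keeps a wrapper alive merely because the wrapper registered itself.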
-
getRefCount
public final int getRefCount()
Expert: returns the current refCount for this reader.
-
incRef
public final void incRef()
Expert: increments the refCount of this IndexReader instance. RefCounts are used to determine when a reader can be closed safely, i.e. as soon as there are no more references. Be sure to always call a corresponding decRef(), in a finally clause; otherwise the reader may never be closed. Note that close() simply calls decRef(), which means that the IndexReader will not really be closed until decRef() has been called for all outstanding references.
- See Also: decRef(), tryIncRef()
-
tryIncRef
public final boolean tryIncRef()
Expert: increments the refCount of this IndexReader instance only if the IndexReader has not been closed yet and returns true iff the refCount was successfully incremented, otherwise false. If this method returns false the reader is either already closed or is currently being closed. Either way this reader instance shouldn't be used by an application unless true is returned.

RefCounts are used to determine when a reader can be closed safely, i.e. as soon as there are no more references. Be sure to always call a corresponding decRef(), in a finally clause; otherwise the reader may never be closed. Note that close() simply calls decRef(), which means that the IndexReader will not really be closed until decRef() has been called for all outstanding references.
-
decRef
public final void decRef() throws IOException
Expert: decreases the refCount of this IndexReader instance. If the refCount drops to 0, then this reader is closed. If an exception is hit, the refCount is unchanged.
- Throws: IOException - in case an IOException occurs in doClose()
- See Also: incRef()
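The interplay of incRef(), tryIncRef(), decRef() and close() can be sketched with a plain-Java model. RefCountSketch below is a hypothetical stand-in and not Lucene code; it assumes the count starts at 1 for the creator and that a reader is released when the count reaches 0, as described above. Error handling for a failing doClose() is left out.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical toy model of the reference counting described above. */
public class RefCountSketch {
    private final AtomicInteger refCount = new AtomicInteger(1); // creator's reference
    private volatile boolean closeCalled = false;
    private volatile boolean released = false;

    public int getRefCount() {
        return refCount.get();
    }

    /** True once the last reference is gone and resources would be freed. */
    public boolean isReleased() {
        return released;
    }

    /** Takes a reference only while the count is still positive. */
    public boolean tryIncRef() {
        int count;
        while ((count = refCount.get()) > 0) {
            if (refCount.compareAndSet(count, count + 1)) {
                return true;
            }
        }
        return false;
    }

    public void incRef() {
        if (!tryIncRef()) {
            throw new IllegalStateException("reader is already closed");
        }
    }

    public void decRef() {
        int rc = refCount.decrementAndGet();
        if (rc == 0) {
            released = true; // last reference dropped: this is the real close
        } else if (rc < 0) {
            throw new IllegalStateException("too many decRef calls");
        }
    }

    /** Drops the creator's reference once; further calls do nothing. */
    public void close() {
        if (!closeCalled) {
            closeCalled = true;
            decRef();
        }
    }

    public static void main(String[] args) {
        RefCountSketch reader = new RefCountSketch();
        if (reader.tryIncRef()) { // e.g. a search thread borrows the reader
            try {
                reader.close(); // only drops the creator's reference
                System.out.println(reader.isReleased()); // prints "false"
            } finally {
                reader.decRef(); // the borrower gives its reference back
            }
        }
        System.out.println(reader.isReleased()); // prints "true"
        System.out.println(reader.tryIncRef()); // prints "false"
    }
}
```

The try/finally around the borrowed reference mirrors the advice above: every successful incRef() or tryIncRef() needs exactly one matching decRef().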
-
ensureOpen
protected final void ensureOpen() throws AlreadyClosedException
Throws AlreadyClosedException if this IndexReader or any of its child readers is closed, otherwise returns.
- Throws: AlreadyClosedException
-
equals
public final boolean equals(Object obj)
For caching purposes, IndexReader subclasses are not allowed to implement equals/hashCode, so methods are declared final. To look up instances from caches use getCoreCacheKey() and getCombinedCoreAndDeletesKey().
-
hashCode
public final int hashCode()
For caching purposes, IndexReader subclasses are not allowed to implement equals/hashCode, so methods are declared final. To look up instances from caches use getCoreCacheKey() and getCombinedCoreAndDeletesKey().
-
open
@Deprecated public static DirectoryReader open(Directory directory) throws IOException
Deprecated.
Returns an IndexReader reading the index in the given Directory.
- Parameters:
  directory - the index directory
- Throws: IOException - if there is a low-level IO error
-
open
@Deprecated public static DirectoryReader open(Directory directory, int termInfosIndexDivisor) throws IOException
Deprecated.
Expert: Returns an IndexReader reading the index in the given Directory with the given termInfosIndexDivisor.
- Parameters:
  directory - the index directory
  termInfosIndexDivisor - Subsamples which indexed terms are loaded into RAM. This has the same effect as IndexWriterConfig.setTermIndexInterval(int) except that setting must be done at indexing time while this setting can be set per reader. When set to N, then one in every N*termIndexInterval terms in the index is loaded into memory. By setting this to a value > 1 you can reduce memory usage, at the expense of higher latency when loading a TermInfo. The default value is 1. Set this to -1 to skip loading the terms index entirely.
- Throws: IOException - if there is a low-level IO error
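The effect of termInfosIndexDivisor on memory can be estimated with simple arithmetic. The sketch below is illustrative only: TermIndexDivisorSketch and approxLoadedTerms are hypothetical names, the term count and the termIndexInterval of 32 are made-up inputs, and the result is an approximation of "one in every N*termIndexInterval terms", not a value Lucene reports.

```java
/** Hypothetical helper estimating how many index terms end up in RAM. */
public class TermIndexDivisorSketch {

    /**
     * Approximates the number of terms loaded into memory when one in every
     * divisor * termIndexInterval terms is kept. A divisor of -1 means the
     * terms index is not loaded at all.
     */
    public static long approxLoadedTerms(long totalTerms, int termIndexInterval, int divisor) {
        if (divisor == -1) {
            return 0;
        }
        if (divisor < 1 || termIndexInterval < 1) {
            throw new IllegalArgumentException("divisor and interval must be >= 1 (or divisor == -1)");
        }
        return totalTerms / ((long) divisor * termIndexInterval);
    }

    public static void main(String[] args) {
        long totalTerms = 1000000L; // made-up index size
        int interval = 32;          // made-up termIndexInterval
        System.out.println(approxLoadedTerms(totalTerms, interval, 1));  // prints "31250"
        System.out.println(approxLoadedTerms(totalTerms, interval, 4));  // prints "7812"
        System.out.println(approxLoadedTerms(totalTerms, interval, -1)); // prints "0"
    }
}
```

With these inputs, raising the divisor from 1 to 4 cuts the in-memory terms to roughly a quarter, which is the memory-for-latency trade described in the parameter text.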
-
open
@Deprecated public static DirectoryReader open(IndexWriter writer, boolean applyAllDeletes) throws IOException
Deprecated.
Open a near real time IndexReader from the IndexWriter.
- Parameters:
  writer - The IndexWriter to open from
  applyAllDeletes - If true, all buffered deletes will be applied (made visible) in the returned reader. If false, the deletes are not applied but remain buffered (in IndexWriter) so that they will be applied in the future. Applying deletes can be costly, so if your app can tolerate deleted documents being returned you might gain some performance by passing false.
- Returns: The new IndexReader
- Throws: IOException - if there is a low-level IO error
- See Also: DirectoryReader.openIfChanged(DirectoryReader,IndexWriter,boolean)
-
open
@Deprecated public static DirectoryReader open(IndexCommit commit) throws IOException
Deprecated.
Expert: returns an IndexReader reading the index in the given IndexCommit.
- Parameters:
  commit - the commit point to open
- Throws: IOException - if there is a low-level IO error
-
open
@Deprecated public static DirectoryReader open(IndexCommit commit, int termInfosIndexDivisor) throws IOException
Deprecated.
Expert: returns an IndexReader reading the index in the given IndexCommit and termInfosIndexDivisor.
- Parameters:
  commit - the commit point to open
  termInfosIndexDivisor - Subsamples which indexed terms are loaded into RAM. This has the same effect as IndexWriterConfig.setTermIndexInterval(int) except that setting must be done at indexing time while this setting can be set per reader. When set to N, then one in every N*termIndexInterval terms in the index is loaded into memory. By setting this to a value > 1 you can reduce memory usage, at the expense of higher latency when loading a TermInfo. The default value is 1. Set this to -1 to skip loading the terms index entirely.
- Throws: IOException - if there is a low-level IO error
-
getTermVectors
public abstract Fields getTermVectors(int docID) throws IOException
Retrieve term vectors for this document, or null if term vectors were not indexed. The returned Fields instance acts like a single-document inverted index (the docID will be 0).
- Throws: IOException
-
getTermVector
public final Terms getTermVector(int docID, String field) throws IOException
Retrieve term vector for this document and field, or null if term vectors were not indexed. The returned Terms instance acts like a single-document inverted index (the docID will be 0).
- Throws: IOException
-
numDocs
public abstract int numDocs()
Returns the number of documents in this index.
-
maxDoc
public abstract int maxDoc()
Returns one greater than the largest possible document number. This may be used to, e.g., determine how big to allocate an array which will have an element for every document number in an index.
-
numDeletedDocs
public final int numDeletedDocs()
Returns the number of deleted documents.
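The three counters relate to each other by simple arithmetic. The sketch below assumes that numDocs() counts only live (non-deleted) documents and that numDeletedDocs() equals maxDoc() minus numDocs(). DocCountSketch is a hypothetical model that uses a java.util.BitSet in place of Lucene's live-docs Bits.

```java
import java.util.BitSet;

/** Hypothetical model relating maxDoc, numDocs and numDeletedDocs. */
public class DocCountSketch {

    /** Number of live documents: one set bit per non-deleted document number. */
    public static int numDocs(BitSet liveDocs) {
        return liveDocs.cardinality();
    }

    /** Deleted documents still occupy a document number below maxDoc. */
    public static int numDeletedDocs(int maxDoc, BitSet liveDocs) {
        return maxDoc - numDocs(liveDocs);
    }

    public static boolean hasDeletions(int maxDoc, BitSet liveDocs) {
        return numDeletedDocs(maxDoc, liveDocs) > 0;
    }

    public static void main(String[] args) {
        int maxDoc = 5;                 // document numbers 0..4
        BitSet liveDocs = new BitSet(maxDoc);
        liveDocs.set(0, maxDoc);        // all documents start out live
        liveDocs.clear(2);              // document 2 is deleted, not yet merged away

        System.out.println(numDocs(liveDocs));                // prints "4"
        System.out.println(numDeletedDocs(maxDoc, liveDocs)); // prints "1"
        System.out.println(hasDeletions(maxDoc, liveDocs));   // prints "true"

        // Per-document arrays are sized by maxDoc, not numDocs, so that every
        // document number (including deleted ones) has a slot.
        float[] perDocValues = new float[maxDoc];
        System.out.println(perDocValues.length);              // prints "5"
    }
}
```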
-
document
public abstract void document(int docID, StoredFieldVisitor visitor) throws IOException
Expert: visits the fields of a stored document, for custom processing/loading of each field. If you simply want to load all fields, use document(int). If you want to load a subset, use DocumentStoredFieldVisitor.
- Throws: IOException
-
document
public final Document document(int docID) throws IOException
Returns the stored fields of the nth Document in this index. This is just sugar for using DocumentStoredFieldVisitor.

NOTE: for performance reasons, this method does not check if the requested document is deleted, and therefore asking for a deleted document may yield unspecified results. Usually this is not required; however, you can test if the doc is deleted by checking the Bits returned from MultiFields.getLiveDocs(org.apache.lucene.index.IndexReader).

NOTE: only the content of a field is returned, if that field was stored during indexing. Metadata like boost, omitNorm, IndexOptions, tokenized, etc., are not preserved.
- Throws: IOException - if there is a low-level IO error
-
document
public final Document document(int docID, Set<String> fieldsToLoad) throws IOException
Like document(int) but only loads the specified fields. Note that this is simply sugar for DocumentStoredFieldVisitor(Set).
- Throws: IOException
-
hasDeletions
public boolean hasDeletions()
Returns true if any documents have been deleted.
-
close
public final void close() throws IOException
Closes files associated with this index. Also saves any new deletions to disk. No other methods should be called after this has been called.
- Specified by: close in interface AutoCloseable
- Specified by: close in interface Closeable
- Throws: IOException - if there is a low-level IO error
-
doClose
protected abstract void doClose() throws IOException
Implements close.
- Throws: IOException
-
getContext
public abstract IndexReaderContext getContext()
Expert: Returns the root IndexReaderContext for this IndexReader's sub-reader tree.

Iff this reader is composed of sub readers, i.e. this reader being a composite reader, this method returns a CompositeReaderContext holding the reader's direct children as well as a view of the reader tree's atomic leaf contexts. All sub-IndexReaderContext instances referenced from this reader's top-level context are private to this reader and are not shared with another context tree. For example, IndexSearcher uses this API to drive searching by one atomic leaf reader at a time. If this reader is not composed of child readers, this method returns an AtomicReaderContext.

Note: Any of the sub-CompositeReaderContext instances referenced from this top-level context do not support CompositeReaderContext.leaves(). Only the top-level context maintains the convenience leaf-view for performance reasons.
-
leaves
public final List<AtomicReaderContext> leaves()
Returns the reader's leaves, or itself if this reader is atomic. This is a convenience method calling this.getContext().leaves().
- See Also: IndexReaderContext.leaves()
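When working leaf by leaf, a top-level document number has to be translated into a leaf and a leaf-local document number. The sketch below models that translation in plain Java. It assumes each leaf records the offset of its first document within the top-level reader (to the best of my knowledge AtomicReaderContext exposes this as docBase in Lucene 4.x). LeafLookupSketch and its methods are hypothetical names, not Lucene API.

```java
import java.util.Arrays;

/** Hypothetical model of mapping a top-level doc number to a leaf. */
public class LeafLookupSketch {

    /** Starting top-level doc number of each leaf, given each leaf's maxDoc. */
    public static int[] docBases(int[] leafMaxDocs) {
        int[] bases = new int[leafMaxDocs.length];
        int base = 0;
        for (int i = 0; i < leafMaxDocs.length; i++) {
            bases[i] = base;
            base += leafMaxDocs[i];
        }
        return bases;
    }

    /** Index of the leaf containing globalDoc: the last leaf whose base is <= globalDoc. */
    public static int leafIndex(int[] bases, int globalDoc) {
        int idx = Arrays.binarySearch(bases, globalDoc);
        if (idx < 0) {
            idx = -idx - 2; // insertion point minus one
        }
        // Empty leaves share a base with their successor; skip to the last one.
        while (idx + 1 < bases.length && bases[idx + 1] == bases[idx]) {
            idx++;
        }
        return idx;
    }

    public static void main(String[] args) {
        int[] bases = docBases(new int[] {3, 5, 2}); // three leaves, 10 docs in total
        int globalDoc = 4;
        int leaf = leafIndex(bases, globalDoc);
        int localDoc = globalDoc - bases[leaf];
        System.out.println("leaf " + leaf + ", local doc " + localDoc); // prints "leaf 1, local doc 1"
    }
}
```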
-
getCoreCacheKey
public Object getCoreCacheKey()
Expert: Returns a key for this IndexReader, so FieldCache/CachingWrapperFilter can find it again. This key must not have equals()/hashCode() methods, so "equals" means "identical".
-
getCombinedCoreAndDeletesKey
public Object getCombinedCoreAndDeletesKey()
Expert: Returns a key for this IndexReader that also includes deletions, so FieldCache/CachingWrapperFilter can find it again. This key must not have equals()/hashCode() methods, so "equals" means "identical".
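Because these keys carry no equals()/hashCode() of their own, a cache keyed by them effectively compares by identity. A minimal sketch of such a cache follows; CoreKeyCacheSketch is a hypothetical class, the int[] value merely stands in for some per-reader data, and this is not how FieldCache or CachingWrapperFilter are actually written.

```java
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical per-reader cache keyed by an opaque cache key object. */
public class CoreKeyCacheSketch {
    // Keys without equals()/hashCode() overrides match by identity only.
    // Weak keys let an entry disappear once the key is unreachable.
    private final Map<Object, int[]> cache = new WeakHashMap<Object, int[]>();
    private int loads = 0;

    /** Returns the cached array for this key, building it on first use. */
    public synchronized int[] get(Object cacheKey, int maxDoc) {
        int[] values = cache.get(cacheKey);
        if (values == null) {
            loads++;
            values = new int[maxDoc]; // placeholder for expensive per-reader data
            cache.put(cacheKey, values);
        }
        return values;
    }

    public synchronized int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        CoreKeyCacheSketch cache = new CoreKeyCacheSketch();
        Object keyA = new Object(); // stands in for one reader's cache key
        Object keyB = new Object(); // a different reader's key

        int[] first = cache.get(keyA, 10);
        int[] second = cache.get(keyA, 10);
        cache.get(keyB, 10);

        System.out.println(first == second);   // prints "true"
        System.out.println(cache.loadCount()); // prints "2"
    }
}
```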
-
docFreq
public abstract int docFreq(Term term) throws IOException
Returns the number of documents containing the term. This method returns 0 if the term or field does not exist. This method does not take into account deleted documents that have not yet been merged away.
- Throws: IOException
- See Also: TermsEnum.docFreq()
-
totalTermFreq
public abstract long totalTermFreq(Term term) throws IOException
Returns the total number of occurrences of term across all documents (the sum of the freq() for each doc that has this term). This will be -1 if the codec doesn't support this measure. Note that, like other term measures, this measure does not take deleted documents into account.
- Throws: IOException
-
getSumDocFreq
public abstract long getSumDocFreq(String field) throws IOException
Returns the sum of TermsEnum.docFreq() for all terms in this field, or -1 if this measure isn't stored by the codec. Note that, just like other term measures, this measure does not take deleted documents into account.
- Throws: IOException
- See Also: Terms.getSumDocFreq()
-
getDocCount
public abstract int getDocCount(String field) throws IOException
Returns the number of documents that have at least one term for this field, or -1 if this measure isn't stored by the codec. Note that, just like other term measures, this measure does not take deleted documents into account.
- Throws: IOException
- See Also: Terms.getDocCount()
-
getSumTotalTermFreq
public abstract long getSumTotalTermFreq(String field) throws IOException
Returns the sum of TermsEnum.totalTermFreq() for all terms in this field, or -1 if this measure isn't stored by the codec (or if this field omits term freq and positions). Note that, just like other term measures, this measure does not take deleted documents into account.
- Throws: IOException
- See Also: Terms.getSumTotalTermFreq()
-
-