Class LiveIndexWriterConfig
- Direct Known Subclasses:
IndexWriterConfig

Holds all the configuration used by IndexWriter, with a few setters for settings that can be changed on an IndexWriter instance "live".
- Since:
- 4.0
-
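The "live" usage pattern can be sketched as follows. This is an illustrative sketch assuming a Lucene 4.5-era classpath; the directory, analyzer, and chosen values are placeholders, not recommendations:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LiveIndexWriterConfig;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class LiveConfigSketch {
  public static void main(String[] args) throws Exception {
    RAMDirectory dir = new RAMDirectory();
    IndexWriterConfig iwc =
        new IndexWriterConfig(Version.LUCENE_45, new StandardAnalyzer(Version.LUCENE_45));
    IndexWriter writer = new IndexWriter(dir, iwc);

    // IndexWriter.getConfig() exposes a LiveIndexWriterConfig; its setters
    // take effect on the running writer without reopening it.
    LiveIndexWriterConfig live = writer.getConfig();
    live.setRAMBufferSizeMB(64.0);   // next add/update/delete picks this up
    live.setUseCompoundFile(false);  // applies to newly flushed segments

    writer.close();
    dir.close();
  }
}
```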
Field Summary
- protected Codec codec: Codec used to write new segments.
- protected IndexCommit commit: IndexCommit that IndexWriter is opened on.
- protected IndexDeletionPolicy delPolicy: IndexDeletionPolicy controlling when commit points are deleted.
- protected org.apache.lucene.index.FlushPolicy flushPolicy: FlushPolicy to control when segments are flushed.
- protected org.apache.lucene.index.DocumentsWriterPerThreadPool indexerThreadPool: DocumentsWriterPerThreadPool to control how threads are allocated to DocumentsWriterPerThread.
- protected org.apache.lucene.index.DocumentsWriterPerThread.IndexingChain indexingChain: DocumentsWriterPerThread.IndexingChain that determines how documents are indexed.
- protected InfoStream infoStream: InfoStream for debugging messages.
- protected final Version matchVersion: Version that IndexWriter should emulate.
- protected MergePolicy mergePolicy: MergePolicy for selecting merges.
- protected MergeScheduler mergeScheduler: MergeScheduler to use for running merges.
- protected IndexWriterConfig.OpenMode openMode: IndexWriterConfig.OpenMode that IndexWriter is opened with.
- protected int perThreadHardLimitMB: Sets the hard upper bound on RAM usage for a single segment, after which the segment is forced to flush.
- protected boolean readerPooling: True if readers should be pooled.
- protected Similarity similarity: Similarity to use when encoding norms.
- protected boolean useCompoundFile: True if segment flushes should use compound file format.
- protected long writeLockTimeout: Timeout when trying to obtain the write lock on init.
-
Method Summary
- Analyzer getAnalyzer(): Returns the default analyzer to use for indexing documents.
- Codec getCodec(): Returns the current Codec.
- IndexCommit getIndexCommit(): Returns the IndexCommit as specified in IndexWriterConfig.setIndexCommit(IndexCommit) or the default, null, which specifies to open the latest index commit point.
- IndexDeletionPolicy getIndexDeletionPolicy(): Returns the IndexDeletionPolicy specified in IndexWriterConfig.setIndexDeletionPolicy(IndexDeletionPolicy) or the default KeepOnlyLastCommitDeletionPolicy.
- InfoStream getInfoStream(): Returns the InfoStream used for debugging.
- int getMaxBufferedDeleteTerms(): Returns the number of buffered deleted terms that will trigger a flush of all buffered deletes if enabled.
- int getMaxBufferedDocs(): Returns the number of buffered added documents that will trigger a flush if enabled.
- int getMaxThreadStates(): Returns the max number of simultaneous threads that may be indexing documents at once in IndexWriter.
- IndexWriter.IndexReaderWarmer getMergedSegmentWarmer(): Returns the current merged segment warmer.
- MergePolicy getMergePolicy(): Returns the current MergePolicy in use by this writer.
- MergeScheduler getMergeScheduler(): Returns the MergeScheduler that was set by IndexWriterConfig.setMergeScheduler(MergeScheduler).
- IndexWriterConfig.OpenMode getOpenMode(): Returns the IndexWriterConfig.OpenMode set by IndexWriterConfig.setOpenMode(OpenMode).
- double getRAMBufferSizeMB(): Returns the value set by setRAMBufferSizeMB(double) if enabled.
- int getRAMPerThreadHardLimitMB(): Returns the max amount of memory each DocumentsWriterPerThread can consume until forcefully flushed.
- boolean getReaderPooling(): Returns true if IndexWriter should pool readers even if DirectoryReader.open(IndexWriter, boolean) has not been called.
- int getReaderTermsIndexDivisor(): Returns the termInfosIndexDivisor.
- Similarity getSimilarity(): Expert: returns the Similarity implementation used by this IndexWriter.
- int getTermIndexInterval(): Returns the interval between indexed terms.
- boolean getUseCompoundFile(): Returns true iff the IndexWriter packs newly written segments in a compound file.
- long getWriteLockTimeout(): Returns allowed timeout when acquiring the write lock.
- LiveIndexWriterConfig setMaxBufferedDeleteTerms(int maxBufferedDeleteTerms): Determines the maximum number of delete-by-term operations that will be buffered before both the buffered in-memory delete terms and queries are applied and flushed.
- LiveIndexWriterConfig setMaxBufferedDocs(int maxBufferedDocs): Determines the minimum number of documents required before the buffered in-memory documents are flushed as a new segment.
- LiveIndexWriterConfig setMergedSegmentWarmer(IndexWriter.IndexReaderWarmer mergeSegmentWarmer): Set the merged segment warmer.
- LiveIndexWriterConfig setRAMBufferSizeMB(double ramBufferSizeMB): Determines the amount of RAM that may be used for buffering added documents and deletions before they are flushed to the Directory.
- LiveIndexWriterConfig setReaderTermsIndexDivisor(int divisor): Sets the termsIndexDivisor passed to any readers that IndexWriter opens, for example when applying deletes or creating a near-real-time reader in DirectoryReader.open(IndexWriter, boolean).
- LiveIndexWriterConfig setTermIndexInterval(int interval): Expert: set the interval between indexed terms.
- LiveIndexWriterConfig setUseCompoundFile(boolean useCompoundFile): Sets whether the IndexWriter should pack newly written segments in a compound file.
- String toString()
-
Field Details
-
delPolicy
IndexDeletionPolicy
controlling when commit points are deleted. -
commit
IndexCommit that IndexWriter is opened on. -
openMode
IndexWriterConfig.OpenMode that IndexWriter is opened with. -
similarity
Similarity
to use when encoding norms. -
mergeScheduler
MergeScheduler
to use for running merges. -
writeLockTimeout
protected volatile long writeLockTimeout
Timeout when trying to obtain the write lock on init. -
indexingChain
protected volatile org.apache.lucene.index.DocumentsWriterPerThread.IndexingChain indexingChain
DocumentsWriterPerThread.IndexingChain that determines how documents are indexed. -
codec
Codec
used to write new segments. -
infoStream
InfoStream
for debugging messages. -
mergePolicy
MergePolicy
for selecting merges. -
indexerThreadPool
protected volatile org.apache.lucene.index.DocumentsWriterPerThreadPool indexerThreadPool
DocumentsWriterPerThreadPool to control how threads are allocated to DocumentsWriterPerThread. -
readerPooling
protected volatile boolean readerPooling
True if readers should be pooled. -
flushPolicy
protected volatile org.apache.lucene.index.FlushPolicy flushPolicy
FlushPolicy to control when segments are flushed. -
perThreadHardLimitMB
protected volatile int perThreadHardLimitMB
Sets the hard upper bound on RAM usage for a single segment, after which the segment is forced to flush. -
matchVersion
Version that IndexWriter should emulate. -
useCompoundFile
protected volatile boolean useCompoundFile
True if segment flushes should use compound file format.
-
-
Method Details
-
getAnalyzer
Returns the default analyzer to use for indexing documents. -
setTermIndexInterval
Expert: set the interval between indexed terms. Large values cause less memory to be used by an IndexReader, but slow random access to terms. Small values cause more memory to be used by an IndexReader, and speed random access to terms.
This parameter determines the amount of computation required per query term, regardless of the number of documents that contain that term. In particular, it is the maximum number of other terms that must be scanned before a term is located and its frequency and position information may be processed. In a large index with user-entered query terms, query processing time is likely to be dominated not by term lookup but rather by the processing of frequency and positional data. In a small index or when many uncommon query terms are generated (e.g., by wildcard queries) term lookup may become a dominant cost.
In particular, numUniqueTerms/interval terms are read into memory by an IndexReader, and, on average, interval/2 terms must be scanned for each random term access.
Takes effect immediately, but only applies to newly flushed/merged segments.
NOTE: This parameter does not apply to all PostingsFormat implementations, including the default one in this release. It only makes sense for term indexes that are implemented as a fixed gap between terms. For example, Lucene41PostingsFormat implements the term index instead based upon how terms share prefixes. To configure its parameters (the minimum and maximum size for a block), you would instead use Lucene41PostingsFormat(int, int), which can also be configured on a per-field basis:

    // customize Lucene41PostingsFormat, passing minBlockSize=50, maxBlockSize=100
    final PostingsFormat tweakedPostings = new Lucene41PostingsFormat(50, 100);
    iwc.setCodec(new Lucene45Codec() {
      @Override
      public PostingsFormat getPostingsFormatForField(String field) {
        if (field.equals("fieldWithTonsOfTerms"))
          return tweakedPostings;
        else
          return super.getPostingsFormatForField(field);
      }
    });

Note that other implementations may have their own parameters, or no parameters at all. -
getTermIndexInterval
public int getTermIndexInterval()
Returns the interval between indexed terms.
- See Also:
-
setMaxBufferedDeleteTerms
Determines the maximum number of delete-by-term operations that will be buffered before both the buffered in-memory delete terms and queries are applied and flushed.
Disabled by default (writer flushes by RAM usage).
NOTE: This setting won't trigger a segment flush.
Takes effect immediately, but only the next time a document is added, updated or deleted. Also, if you only delete-by-query, this setting has no effect, i.e. delete queries are buffered until the next segment is flushed.
- Throws:
IllegalArgumentException
- if maxBufferedDeleteTerms is enabled but smaller than 1
- See Also:
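As a sketch, assuming `writer` is an open IndexWriter (the threshold of 1000 is arbitrary):

```java
// Apply buffered deletes after every 1000 buffered delete terms.
// Per the note above, this applies deletes but does not flush segments.
writer.getConfig().setMaxBufferedDeleteTerms(1000);
```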
-
getMaxBufferedDeleteTerms
public int getMaxBufferedDeleteTerms()
Returns the number of buffered deleted terms that will trigger a flush of all buffered deletes if enabled.
- See Also:
-
setRAMBufferSizeMB
Determines the amount of RAM that may be used for buffering added documents and deletions before they are flushed to the Directory. Generally, for faster indexing performance it's best to flush by RAM usage instead of document count, and to use as large a RAM buffer as you can.
When this is set, the writer will flush whenever buffered documents and deletions use this much RAM. Pass in IndexWriterConfig.DISABLE_AUTO_FLUSH to prevent triggering a flush due to RAM usage. Note that if flushing by document count is also enabled, then the flush will be triggered by whichever comes first.
The maximum RAM limit is inherently determined by the JVM's available memory. Yet, an IndexWriter session can consume a significantly larger amount of memory than the given RAM limit, since this limit is just an indicator of when to flush memory-resident documents to the Directory. Flushes are likely to happen concurrently while other threads are adding documents to the writer. For application stability, the available memory in the JVM should be significantly larger than the RAM buffer used for indexing.
NOTE: the accounting of RAM usage for pending deletions is only approximate. Specifically, if you delete by Query, Lucene currently has no way to measure the RAM usage of individual Queries, so the accounting will under-estimate; you should compensate by either calling commit() periodically yourself, or by using setMaxBufferedDeleteTerms(int) to flush and apply buffered deletes by count instead of RAM usage (for each buffered delete Query a constant number of bytes is used to estimate RAM usage). Note that enabling setMaxBufferedDeleteTerms(int) will not trigger any segment flushes.
NOTE: It's not guaranteed that all memory-resident documents are flushed once this limit is exceeded. Depending on the configured FlushPolicy, only a subset of the buffered documents are flushed, and therefore only part of the RAM buffer is released.
The default value is IndexWriterConfig.DEFAULT_RAM_BUFFER_SIZE_MB.
Takes effect immediately, but only the next time a document is added, updated or deleted.
- Throws:
IllegalArgumentException
- if ramBufferSize is enabled but non-positive, or it disables ramBufferSize when maxBufferedDocs is already disabled
- See Also:
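For example, to flush by RAM usage only (a sketch; 256 MB is an arbitrary choice and `analyzer` is assumed to be an existing Analyzer):

```java
IndexWriterConfig iwc =
    new IndexWriterConfig(Version.LUCENE_45, analyzer);
iwc.setRAMBufferSizeMB(256.0);                                 // flush at ~256 MB of buffered docs/deletes
iwc.setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH);  // disable the doc-count trigger
```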
-
getRAMBufferSizeMB
public double getRAMBufferSizeMB()
Returns the value set by setRAMBufferSizeMB(double) if enabled. -
setMaxBufferedDocs
Determines the minimum number of documents required before the buffered in-memory documents are flushed as a new segment. Large values generally give faster indexing.
When this is set, the writer will flush every maxBufferedDocs added documents. Pass in IndexWriterConfig.DISABLE_AUTO_FLUSH to prevent triggering a flush due to number of buffered documents. Note that if flushing by RAM usage is also enabled, then the flush will be triggered by whichever comes first.
Disabled by default (writer flushes by RAM usage).
Takes effect immediately, but only the next time a document is added, updated or deleted.
- Throws:
IllegalArgumentException
- if maxBufferedDocs is enabled but smaller than 2, or it disables maxBufferedDocs when ramBufferSize is already disabled
- See Also:
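A sketch, assuming `iwc` is an existing IndexWriterConfig (10,000 is an arbitrary threshold):

```java
// Flush every 10,000 buffered documents; if flushing by RAM usage is
// also enabled, whichever trigger fires first wins.
iwc.setMaxBufferedDocs(10000);
```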
-
getMaxBufferedDocs
public int getMaxBufferedDocs()
Returns the number of buffered added documents that will trigger a flush if enabled.
- See Also:
-
setMergedSegmentWarmer
public LiveIndexWriterConfig setMergedSegmentWarmer(IndexWriter.IndexReaderWarmer mergeSegmentWarmer)
Set the merged segment warmer. See IndexWriter.IndexReaderWarmer.
Takes effect on the next merge.
-
getMergedSegmentWarmer
Returns the current merged segment warmer. See IndexWriter.IndexReaderWarmer. -
setReaderTermsIndexDivisor
Sets the termsIndexDivisor passed to any readers that IndexWriter opens, for example when applying deletes or creating a near-real-time reader in DirectoryReader.open(IndexWriter, boolean). If you pass -1, the terms index won't be loaded by the readers. This is only useful in advanced situations when you will only .next() through all terms; attempts to seek will hit an exception.
Takes effect immediately, but only applies to readers opened after this call.
NOTE: divisor settings > 1 do not apply to all PostingsFormat implementations, including the default one in this release. It only makes sense for terms indexes that can efficiently re-sample terms at load time.
-
getReaderTermsIndexDivisor
public int getReaderTermsIndexDivisor()
Returns the termInfosIndexDivisor.
- See Also:
-
getOpenMode
Returns the IndexWriterConfig.OpenMode set by IndexWriterConfig.setOpenMode(OpenMode). -
getIndexDeletionPolicy
Returns the IndexDeletionPolicy specified in IndexWriterConfig.setIndexDeletionPolicy(IndexDeletionPolicy) or the default KeepOnlyLastCommitDeletionPolicy. -
getIndexCommit
Returns the IndexCommit as specified in IndexWriterConfig.setIndexCommit(IndexCommit) or the default, null, which specifies to open the latest index commit point. -
getSimilarity
Expert: returns the Similarity implementation used by this IndexWriter. -
getMergeScheduler
Returns the MergeScheduler that was set by IndexWriterConfig.setMergeScheduler(MergeScheduler). -
getWriteLockTimeout
public long getWriteLockTimeout()
Returns allowed timeout when acquiring the write lock. -
getCodec
Returns the current Codec. -
getMergePolicy
Returns the current MergePolicy in use by this writer. -
getMaxThreadStates
public int getMaxThreadStates()
Returns the max number of simultaneous threads that may be indexing documents at once in IndexWriter. -
getReaderPooling
public boolean getReaderPooling()
Returns true if IndexWriter should pool readers even if DirectoryReader.open(IndexWriter, boolean) has not been called. -
getRAMPerThreadHardLimitMB
public int getRAMPerThreadHardLimitMB()
Returns the max amount of memory each DocumentsWriterPerThread can consume until forcefully flushed. -
getInfoStream
Returns the InfoStream used for debugging. -
setUseCompoundFile
Sets whether the IndexWriter should pack newly written segments in a compound file. Default is true.
Use false for batch indexing with very large RAM buffer settings.
Note: To control compound file usage during segment merges, see MergePolicy.setNoCFSRatio(double) and MergePolicy.setMaxCFSSegmentSizeMB(double). This setting only applies to newly created segments. -
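A batch-indexing sketch combining this setting with the merge-policy control mentioned in the note (assuming `iwc` is an existing IndexWriterConfig):

```java
// Skip compound files for newly flushed segments...
iwc.setUseCompoundFile(false);
// ...and for merged segments as well, via the merge policy
// (a noCFSRatio of 0.0 means merges never produce compound files).
iwc.getMergePolicy().setNoCFSRatio(0.0);
```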
getUseCompoundFile
public boolean getUseCompoundFile()
Returns true iff the IndexWriter packs newly written segments in a compound file. -
toString
-