Class IndexWriterConfig
- All Implemented Interfaces: Cloneable

Holds all the configuration that is used to create an IndexWriter. Once IndexWriter has been created with this object, changes to this object will not affect the IndexWriter instance. To change settings on a live writer, use the LiveIndexWriterConfig that is returned from IndexWriter.getConfig().

All setter methods return IndexWriterConfig to allow chaining settings conveniently, for example:

    IndexWriterConfig conf = new IndexWriterConfig(matchVersion, analyzer);
    conf.setter1().setter2();
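The two behaviours described above (chainable setters, and a writer that ignores later changes to the config it was built from) can be modelled with a small stand-in. This is a toy sketch, not Lucene code: `SketchConfig` and `SketchWriter` are hypothetical classes, and copying the config in the constructor is an assumption about how such isolation could be achieved.

```java
/** Hypothetical stand-in for a config with chainable setters; NOT Lucene's class. */
final class SketchConfig implements Cloneable {
    private double ramBufferSizeMB = 16.0;   // default documented for the RAM buffer
    private boolean useCompoundFile = true;  // default documented for compound files

    // Each setter returns this, which is what makes conf.a().b() chaining possible.
    SketchConfig setRamBufferSizeMB(double mb) { this.ramBufferSizeMB = mb; return this; }
    SketchConfig setUseCompoundFile(boolean use) { this.useCompoundFile = use; return this; }

    double getRamBufferSizeMB() { return ramBufferSizeMB; }
    boolean getUseCompoundFile() { return useCompoundFile; }

    @Override
    public SketchConfig clone() {
        try {
            return (SketchConfig) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: the class is Cloneable
        }
    }
}

/** Hypothetical writer that keeps a private copy of the config it was created with. */
final class SketchWriter {
    private final SketchConfig live;

    SketchWriter(SketchConfig conf) {
        this.live = conf.clone(); // assumed mechanism: later edits to conf are invisible here
    }

    /** Plays the role of IndexWriter.getConfig(): the object to use for live changes. */
    SketchConfig getConfig() { return live; }
}
```

In this model, calling a setter on the original `SketchConfig` after constructing the writer leaves `writer.getConfig()` unchanged, while calling a setter on `writer.getConfig()` is visible to the writer.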
- Since: 3.1
-
Nested Class Summary
Nested Classes -
Field Summary
Fields (modifier and type, field, description):
- static final int DEFAULT_MAX_BUFFERED_DELETE_TERMS: Disabled by default (because IndexWriter flushes by RAM usage by default).
- static final int DEFAULT_MAX_BUFFERED_DOCS: Disabled by default (because IndexWriter flushes by RAM usage by default).
- static final int DEFAULT_MAX_THREAD_STATES: The maximum number of simultaneous threads that may be indexing documents at once in IndexWriter; if more than this many threads arrive they will wait for others to finish.
- static final double DEFAULT_RAM_BUFFER_SIZE_MB: Default value is 16 MB (which means flush when buffered docs consume approximately 16 MB RAM).
- static final int DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB: Default value is 1945.
- static final boolean DEFAULT_READER_POOLING: Default setting for setReaderPooling(boolean).
- static final int DEFAULT_READER_TERMS_INDEX_DIVISOR: Default value is 1.
- static final int DEFAULT_TERM_INDEX_INTERVAL: Default value is 32.
- static final boolean DEFAULT_USE_COMPOUND_FILE_SYSTEM: Default value for compound file system for newly written segments (set to true).
- static final int DISABLE_AUTO_FLUSH: Denotes that a flush trigger is disabled.
- static long WRITE_LOCK_TIMEOUT: Default value for the write lock timeout (1,000 ms).

Fields inherited from class org.apache.lucene.index.LiveIndexWriterConfig:
codec, commit, delPolicy, flushPolicy, indexerThreadPool, indexingChain, infoStream, matchVersion, mergePolicy, mergeScheduler, openMode, perThreadHardLimitMB, readerPooling, similarity, useCompoundFile, writeLockTimeout
-
Constructor Summary
Constructors -
Method Summary
Methods (modifier and type where shown, method, description):
- clone()
- getAnalyzer(): Returns the default analyzer to use for indexing documents.
- getCodec(): Returns the current Codec.
- static long getDefaultWriteLockTimeout(): Returns the default write lock timeout for newly instantiated IndexWriterConfigs.
- getIndexCommit(): Returns the IndexCommit as specified in setIndexCommit(IndexCommit) or the default, null, which specifies to open the latest index commit point.
- getIndexDeletionPolicy(): Returns the IndexDeletionPolicy specified in setIndexDeletionPolicy(IndexDeletionPolicy) or the default KeepOnlyLastCommitDeletionPolicy.
- getInfoStream(): Returns the InfoStream used for debugging.
- int getMaxBufferedDeleteTerms(): Returns the number of buffered deleted terms that will trigger a flush of all buffered deletes if enabled.
- int getMaxBufferedDocs(): Returns the number of buffered added documents that will trigger a flush if enabled.
- int getMaxThreadStates(): Returns the max number of simultaneous threads that may be indexing documents at once in IndexWriter.
- getMergedSegmentWarmer(): Returns the current merged segment warmer.
- getMergePolicy(): Returns the current MergePolicy in use by this writer.
- getMergeScheduler(): Returns the MergeScheduler that was set by setMergeScheduler(MergeScheduler).
- getOpenMode(): Returns the IndexWriterConfig.OpenMode set by setOpenMode(OpenMode).
- double getRAMBufferSizeMB(): Returns the value set by LiveIndexWriterConfig.setRAMBufferSizeMB(double) if enabled.
- int getRAMPerThreadHardLimitMB(): Returns the max amount of memory each DocumentsWriterPerThread can consume until forcefully flushed.
- boolean getReaderPooling(): Returns true if IndexWriter should pool readers even if DirectoryReader.open(IndexWriter, boolean) has not been called.
- int getReaderTermsIndexDivisor(): Returns the termInfosIndexDivisor.
- getSimilarity(): Expert: returns the Similarity implementation used by this IndexWriter.
- int getTermIndexInterval(): Returns the interval between indexed terms.
- long getWriteLockTimeout(): Returns the allowed timeout when acquiring the write lock.
- setCodec(Codec): Set the Codec.
- static void setDefaultWriteLockTimeout(long writeLockTimeout): Sets the default (for any instance) maximum time to wait for a write lock (in milliseconds).
- setIndexCommit(IndexCommit commit): Expert: allows opening a specific commit point.
- setIndexDeletionPolicy(IndexDeletionPolicy delPolicy): Expert: allows an optional IndexDeletionPolicy implementation to be specified.
- setInfoStream(PrintStream printStream): Convenience method that uses PrintStreamInfoStream.
- setInfoStream(InfoStream infoStream): Information about merges, deletes and a message when maxFieldLength is reached will be printed to this.
- setMaxBufferedDeleteTerms(int maxBufferedDeleteTerms): Determines the maximum number of delete-by-term operations that will be buffered before both the buffered in-memory delete terms and queries are applied and flushed.
- setMaxBufferedDocs(int maxBufferedDocs): Determines the minimal number of documents required before the buffered in-memory documents are flushed as a new Segment.
- setMaxThreadStates(int maxThreadStates): Sets the max number of simultaneous threads that may be indexing documents at once in IndexWriter.
- setMergedSegmentWarmer(IndexWriter.IndexReaderWarmer mergeSegmentWarmer): Set the merged segment warmer.
- setMergePolicy(MergePolicy mergePolicy): Expert: MergePolicy is invoked whenever there are changes to the segments in the index.
- setMergeScheduler(MergeScheduler mergeScheduler): Expert: sets the merge scheduler used by this writer.
- setOpenMode(IndexWriterConfig.OpenMode openMode): Specifies the IndexWriterConfig.OpenMode of the index.
- setRAMBufferSizeMB(double ramBufferSizeMB): Determines the amount of RAM that may be used for buffering added documents and deletions before they are flushed to the Directory.
- setRAMPerThreadHardLimitMB(int perThreadHardLimitMB): Expert: Sets the maximum memory consumption per thread triggering a forced flush if exceeded.
- setReaderPooling(boolean readerPooling): By default, IndexWriter does not pool the SegmentReaders it must open for deletions and merging, unless a near-real-time reader has been obtained by calling DirectoryReader.open(IndexWriter, boolean).
- setReaderTermsIndexDivisor(int divisor): Sets the termsIndexDivisor passed to any readers that IndexWriter opens, for example when applying deletes or creating a near-real-time reader in DirectoryReader.open(IndexWriter, boolean).
- setSimilarity(Similarity similarity): Expert: set the Similarity implementation used by this IndexWriter.
- setTermIndexInterval(int interval): Expert: set the interval between indexed terms.
- setUseCompoundFile(boolean useCompoundFile): Sets whether the IndexWriter should pack newly written segments in a compound file.
- setWriteLockTimeout(long writeLockTimeout): Sets the maximum time to wait for a write lock (in milliseconds) for this instance.
- toString()
Methods inherited from class org.apache.lucene.index.LiveIndexWriterConfig
getUseCompoundFile
-
Field Details
-
DEFAULT_TERM_INDEX_INTERVAL
public static final int DEFAULT_TERM_INDEX_INTERVAL
Default value is 32. Change using setTermIndexInterval(int).
-
DISABLE_AUTO_FLUSH
public static final int DISABLE_AUTO_FLUSH
Denotes that a flush trigger is disabled.
-
DEFAULT_MAX_BUFFERED_DELETE_TERMS
public static final int DEFAULT_MAX_BUFFERED_DELETE_TERMS
Disabled by default (because IndexWriter flushes by RAM usage by default).
-
DEFAULT_MAX_BUFFERED_DOCS
public static final int DEFAULT_MAX_BUFFERED_DOCS
Disabled by default (because IndexWriter flushes by RAM usage by default).
-
DEFAULT_RAM_BUFFER_SIZE_MB
public static final double DEFAULT_RAM_BUFFER_SIZE_MB
Default value is 16 MB (which means flush when buffered docs consume approximately 16 MB RAM).
-
WRITE_LOCK_TIMEOUT
public static long WRITE_LOCK_TIMEOUT
Default value for the write lock timeout (1,000 ms).
-
DEFAULT_READER_POOLING
public static final boolean DEFAULT_READER_POOLING
Default setting for setReaderPooling(boolean).
-
DEFAULT_READER_TERMS_INDEX_DIVISOR
public static final int DEFAULT_READER_TERMS_INDEX_DIVISOR
Default value is 1. Change using setReaderTermsIndexDivisor(int).
-
DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
public static final int DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
Default value is 1945. Change using setRAMPerThreadHardLimitMB(int).
-
DEFAULT_MAX_THREAD_STATES
public static final int DEFAULT_MAX_THREAD_STATES
The maximum number of simultaneous threads that may be indexing documents at once in IndexWriter; if more than this many threads arrive they will wait for others to finish. Default value is 8.
-
DEFAULT_USE_COMPOUND_FILE_SYSTEM
public static final boolean DEFAULT_USE_COMPOUND_FILE_SYSTEM
Default value for compound file system for newly written segments (set to true). For batch indexing with very large RAM buffers use false.
-
-
Constructor Details
-
IndexWriterConfig
Creates a new config with defaults that match the specified Version as well as the default Analyzer. If matchVersion is >= Version.LUCENE_32, TieredMergePolicy is used for merging; otherwise LogByteSizeMergePolicy is used. Note that TieredMergePolicy is free to select non-contiguous merges, which means docIDs may not remain monotonic over time. If this is a problem, you should switch to LogByteSizeMergePolicy or LogDocMergePolicy.
-
-
Method Details
-
setDefaultWriteLockTimeout
public static void setDefaultWriteLockTimeout(long writeLockTimeout)
Sets the default (for any instance) maximum time to wait for a write lock (in milliseconds).
-
getDefaultWriteLockTimeout
public static long getDefaultWriteLockTimeout()
Returns the default write lock timeout for newly instantiated IndexWriterConfigs.
-
clone
-
setOpenMode
Specifies the IndexWriterConfig.OpenMode of the index.
Only takes effect when IndexWriter is first created.
-
getOpenMode
Description copied from class: LiveIndexWriterConfig
Returns the IndexWriterConfig.OpenMode set by setOpenMode(OpenMode).
- Overrides: getOpenMode in class LiveIndexWriterConfig
-
setIndexDeletionPolicy
Expert: allows an optional IndexDeletionPolicy implementation to be specified. You can use this to control when prior commits are deleted from the index. The default policy is KeepOnlyLastCommitDeletionPolicy, which removes all prior commits as soon as a new commit is done (this matches behavior before 2.2).
Creating your own policy can allow you to explicitly keep previous "point in time" commits alive in the index for some time, to allow readers to refresh to the new commit without having the old commit deleted out from under them. This is necessary on filesystems like NFS that do not support "delete on last close" semantics, which Lucene's "point in time" search normally relies on.
NOTE: the deletion policy cannot be null.
Only takes effect when IndexWriter is first created.
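The difference between the default policy and a policy that keeps older commits alive can be pictured with a toy retention function. This is only a model of the idea: `CommitRetentionSketch`, `retain`, and the string commit names are hypothetical and do not reflect the IndexDeletionPolicy interface itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of commit retention; NOT the IndexDeletionPolicy API. */
final class CommitRetentionSketch {

    /**
     * Returns the commits that survive when only the newest {@code keep} are retained.
     * keep == 1 mirrors the documented default (all prior commits removed on a new commit);
     * keep > 1 mirrors a custom policy that leaves older point-in-time commits in place.
     */
    static List<String> retain(List<String> commitsOldestFirst, int keep) {
        if (keep < 1) {
            throw new IllegalArgumentException("keep must be at least 1");
        }
        int from = Math.max(0, commitsOldestFirst.size() - keep);
        return new ArrayList<>(commitsOldestFirst.subList(from, commitsOldestFirst.size()));
    }
}
```

With three commits in the list, `retain(commits, 1)` leaves only the newest one, whereas `retain(commits, 2)` also keeps the previous commit so that a reader still using it is not disturbed.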
-
getIndexDeletionPolicy
Description copied from class: LiveIndexWriterConfig
Returns the IndexDeletionPolicy specified in setIndexDeletionPolicy(IndexDeletionPolicy) or the default KeepOnlyLastCommitDeletionPolicy.
- Overrides: getIndexDeletionPolicy in class LiveIndexWriterConfig
-
setIndexCommit
Expert: allows opening a specific commit point. The default is null, which opens the latest commit point.
Only takes effect when IndexWriter is first created.
-
getIndexCommit
Description copied from class: LiveIndexWriterConfig
Returns the IndexCommit as specified in setIndexCommit(IndexCommit) or the default, null, which specifies to open the latest index commit point.
- Overrides: getIndexCommit in class LiveIndexWriterConfig
-
setSimilarity
Expert: set the Similarity implementation used by this IndexWriter.
NOTE: the similarity cannot be null.
Only takes effect when IndexWriter is first created.
-
getSimilarity
Description copied from class: LiveIndexWriterConfig
Expert: returns the Similarity implementation used by this IndexWriter.
- Overrides: getSimilarity in class LiveIndexWriterConfig
-
setMergeScheduler
Expert: sets the merge scheduler used by this writer. The default is ConcurrentMergeScheduler.
NOTE: the merge scheduler cannot be null.
Only takes effect when IndexWriter is first created.
-
getMergeScheduler
Description copied from class: LiveIndexWriterConfig
Returns the MergeScheduler that was set by setMergeScheduler(MergeScheduler).
- Overrides: getMergeScheduler in class LiveIndexWriterConfig
-
setWriteLockTimeout
Sets the maximum time to wait for a write lock (in milliseconds) for this instance. You can change the default value for all instances by calling setDefaultWriteLockTimeout(long).
Only takes effect when IndexWriter is first created.
-
getWriteLockTimeout
public long getWriteLockTimeout()
Description copied from class: LiveIndexWriterConfig
Returns the allowed timeout when acquiring the write lock.
- Overrides: getWriteLockTimeout in class LiveIndexWriterConfig
-
setMergePolicy
Expert: MergePolicy is invoked whenever there are changes to the segments in the index. Its role is to select which merges to do, if any, and return a MergePolicy.MergeSpecification describing the merges. It also selects merges to do for forceMerge.
Only takes effect when IndexWriter is first created.
-
setCodec
Set the Codec.
Only takes effect when IndexWriter is first created.
-
getCodec
Description copied from class: LiveIndexWriterConfig
Returns the current Codec.
- Overrides: getCodec in class LiveIndexWriterConfig
-
getMergePolicy
Description copied from class: LiveIndexWriterConfig
Returns the current MergePolicy in use by this writer.
- Overrides: getMergePolicy in class LiveIndexWriterConfig
-
setMaxThreadStates
Sets the max number of simultaneous threads that may be indexing documents at once in IndexWriter. Values < 1 are invalid; if one is passed, maxThreadStates will be set to DEFAULT_MAX_THREAD_STATES.
Only takes effect when IndexWriter is first created.
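The effect of a thread-state cap (extra threads wait until a slot frees up, and invalid values fall back to the default) can be imitated with a counting semaphore. This is an analogy under stated assumptions, not IndexWriter's internals: `ThreadStateGate` and all of its methods are hypothetical, and only the default of 8 and the fallback rule come from the documentation.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical semaphore-based model of a thread-state cap; NOT Lucene's implementation. */
final class ThreadStateGate {
    static final int DEFAULT_MAX_THREAD_STATES = 8; // documented default

    private final int maxStates;
    private final Semaphore slots;
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    ThreadStateGate(int maxThreadStates) {
        // Documented rule: values < 1 are invalid and are replaced by the default.
        this.maxStates = maxThreadStates < 1 ? DEFAULT_MAX_THREAD_STATES : maxThreadStates;
        this.slots = new Semaphore(maxStates);
    }

    int maxStates() { return maxStates; }

    /** Runs work while holding a slot; callers beyond the cap block until one is released. */
    void index(Runnable work) throws InterruptedException {
        slots.acquire();
        try {
            peak.accumulateAndGet(active.incrementAndGet(), Math::max);
            work.run();
        } finally {
            active.decrementAndGet();
            slots.release();
        }
    }

    /** Starts the given number of threads against one gate and reports the peak concurrency seen. */
    static int peakConcurrency(int maxThreadStates, int threads) {
        ThreadStateGate gate = new ThreadStateGate(maxThreadStates);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    gate.index(() -> pause(20)); // simulated indexing work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("demo did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return gate.peak.get();
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

For instance, `ThreadStateGate.peakConcurrency(3, 12)` starts twelve threads but never observes more than three inside `index` at once.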
-
getMaxThreadStates
public int getMaxThreadStates()
Description copied from class: LiveIndexWriterConfig
Returns the max number of simultaneous threads that may be indexing documents at once in IndexWriter.
- Overrides: getMaxThreadStates in class LiveIndexWriterConfig
-
setReaderPooling
By default, IndexWriter does not pool the SegmentReaders it must open for deletions and merging, unless a near-real-time reader has been obtained by calling DirectoryReader.open(IndexWriter, boolean). This method lets you enable pooling without getting a near-real-time reader.
NOTE: if you set this to false, IndexWriter will still pool readers once DirectoryReader.open(IndexWriter, boolean) is called.
Only takes effect when IndexWriter is first created.
-
getReaderPooling
public boolean getReaderPooling()
Description copied from class: LiveIndexWriterConfig
Returns true if IndexWriter should pool readers even if DirectoryReader.open(IndexWriter, boolean) has not been called.
- Overrides: getReaderPooling in class LiveIndexWriterConfig
-
setRAMPerThreadHardLimitMB
Expert: Sets the maximum memory consumption per thread triggering a forced flush if exceeded. A DocumentsWriterPerThread is forcefully flushed once it exceeds this limit even if the getRAMBufferSizeMB() has not been exceeded. This is a safety limit to prevent a DocumentsWriterPerThread from address space exhaustion due to its internal 32 bit signed integer based memory addressing. The given value must be less than 2 GB (2048 MB).
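The 2048 MB ceiling follows from the 32 bit signed addressing mentioned above: a signed int can index at most 2^31 - 1 bytes, which is just under 2048 MB. The arithmetic can be checked with a small sketch. `HardLimitCheck` and its methods are hypothetical helpers, and the lower bound of 1 MB is an assumption that the text does not state.

```java
/** Hypothetical validation helper illustrating the < 2048 MB rule; NOT Lucene's code. */
final class HardLimitCheck {
    static final int DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB = 1945; // documented default

    /** Converts megabytes to bytes using long arithmetic to avoid int overflow. */
    static long toBytes(int megabytes) {
        return megabytes * 1024L * 1024L;
    }

    /** True if every byte offset within the limit fits in a signed 32 bit int. */
    static boolean fitsSignedIntAddressing(int megabytes) {
        // megabytes > 0 is an assumed lower bound, not taken from the documentation.
        return megabytes > 0 && toBytes(megabytes) <= Integer.MAX_VALUE;
    }

    /** Returns the value unchanged if valid, otherwise rejects it. */
    static int requireValid(int megabytes) {
        if (!fitsSignedIntAddressing(megabytes)) {
            throw new IllegalArgumentException(
                "per-thread hard limit must be greater than 0 and less than 2048 MB, got " + megabytes);
        }
        return megabytes;
    }
}
```

Under this check the default of 1945 MB and the value 2047 are accepted, while 2048 is rejected because 2048 * 1024 * 1024 bytes is one more than Integer.MAX_VALUE.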
-
getRAMPerThreadHardLimitMB
public int getRAMPerThreadHardLimitMB()
Description copied from class: LiveIndexWriterConfig
Returns the max amount of memory each DocumentsWriterPerThread can consume until forcefully flushed.
- Overrides: getRAMPerThreadHardLimitMB in class LiveIndexWriterConfig
-
getInfoStream
Description copied from class: LiveIndexWriterConfig
Returns the InfoStream used for debugging.
- Overrides: getInfoStream in class LiveIndexWriterConfig
-
getAnalyzer
Description copied from class: LiveIndexWriterConfig
Returns the default analyzer to use for indexing documents.
- Overrides: getAnalyzer in class LiveIndexWriterConfig
-
getMaxBufferedDeleteTerms
public int getMaxBufferedDeleteTerms()
Description copied from class: LiveIndexWriterConfig
Returns the number of buffered deleted terms that will trigger a flush of all buffered deletes if enabled.
- Overrides: getMaxBufferedDeleteTerms in class LiveIndexWriterConfig
-
getMaxBufferedDocs
public int getMaxBufferedDocs()
Description copied from class: LiveIndexWriterConfig
Returns the number of buffered added documents that will trigger a flush if enabled.
- Overrides: getMaxBufferedDocs in class LiveIndexWriterConfig
-
getMergedSegmentWarmer
Description copied from class: LiveIndexWriterConfig
Returns the current merged segment warmer. See IndexWriter.IndexReaderWarmer.
- Overrides: getMergedSegmentWarmer in class LiveIndexWriterConfig
-
getRAMBufferSizeMB
public double getRAMBufferSizeMB()
Description copied from class: LiveIndexWriterConfig
Returns the value set by LiveIndexWriterConfig.setRAMBufferSizeMB(double) if enabled.
- Overrides: getRAMBufferSizeMB in class LiveIndexWriterConfig
-
getReaderTermsIndexDivisor
public int getReaderTermsIndexDivisor()
Description copied from class: LiveIndexWriterConfig
Returns the termInfosIndexDivisor.
- Overrides: getReaderTermsIndexDivisor in class LiveIndexWriterConfig
-
getTermIndexInterval
public int getTermIndexInterval()
Description copied from class: LiveIndexWriterConfig
Returns the interval between indexed terms.
- Overrides: getTermIndexInterval in class LiveIndexWriterConfig
-
setInfoStream
Information about merges, deletes and a message when maxFieldLength is reached will be printed to this. Must not be null, but InfoStream.NO_OUTPUT may be used to suppress output.
-
setInfoStream
Convenience method that uses PrintStreamInfoStream. Must not be null.
setMaxBufferedDeleteTerms
Description copied from class: LiveIndexWriterConfig
Determines the maximum number of delete-by-term operations that will be buffered before both the buffered in-memory delete terms and queries are applied and flushed.
Disabled by default (writer flushes by RAM usage).
NOTE: This setting won't trigger a segment flush.
Takes effect immediately, but only the next time a document is added, updated or deleted. Also, if you only delete-by-query, this setting has no effect, i.e. delete queries are buffered until the next segment is flushed.
- Overrides: setMaxBufferedDeleteTerms in class LiveIndexWriterConfig
-
setMaxBufferedDocs
Description copied from class: LiveIndexWriterConfig
Determines the minimal number of documents required before the buffered in-memory documents are flushed as a new Segment. Large values generally give faster indexing.
When this is set, the writer will flush every maxBufferedDocs added documents. Pass in DISABLE_AUTO_FLUSH to prevent triggering a flush due to number of buffered documents. Note that if flushing by RAM usage is also enabled, then the flush will be triggered by whichever comes first.
Disabled by default (writer flushes by RAM usage).
Takes effect immediately, but only the next time a document is added, updated or deleted.
- Overrides: setMaxBufferedDocs in class LiveIndexWriterConfig
-
setMergedSegmentWarmer
Description copied from class: LiveIndexWriterConfig
Set the merged segment warmer. See IndexWriter.IndexReaderWarmer.
Takes effect on the next merge.
- Overrides: setMergedSegmentWarmer in class LiveIndexWriterConfig
-
setRAMBufferSizeMB
Description copied from class: LiveIndexWriterConfig
Determines the amount of RAM that may be used for buffering added documents and deletions before they are flushed to the Directory. Generally for faster indexing performance it's best to flush by RAM usage instead of document count and use as large a RAM buffer as you can.
When this is set, the writer will flush whenever buffered documents and deletions use this much RAM. Pass in DISABLE_AUTO_FLUSH to prevent triggering a flush due to RAM usage. Note that if flushing by document count is also enabled, then the flush will be triggered by whichever comes first.
The maximum RAM limit is inherently determined by the JVM's available memory. Yet, an IndexWriter session can consume a significantly larger amount of memory than the given RAM limit, since this limit is just an indicator of when to flush memory-resident documents to the Directory. Flushes are likely to happen concurrently while other threads are adding documents to the writer. For application stability the available memory in the JVM should be significantly larger than the RAM buffer used for indexing.
NOTE: the accounting of RAM usage for pending deletions is only approximate. Specifically, if you delete by Query, Lucene currently has no way to measure the RAM usage of individual Queries, so the accounting will under-estimate and you should compensate by either calling commit() periodically yourself, or by using LiveIndexWriterConfig.setMaxBufferedDeleteTerms(int) to flush and apply buffered deletes by count instead of RAM usage (for each buffered delete Query a constant number of bytes is used to estimate RAM usage). Note that enabling LiveIndexWriterConfig.setMaxBufferedDeleteTerms(int) will not trigger any segment flushes.
NOTE: It's not guaranteed that all memory-resident documents are flushed once this limit is exceeded. Depending on the configured FlushPolicy, only a subset of the buffered documents is flushed and therefore only part of the RAM buffer is released.
The default value is DEFAULT_RAM_BUFFER_SIZE_MB.
Takes effect immediately, but only the next time a document is added, updated or deleted.
- Overrides: setRAMBufferSizeMB in class LiveIndexWriterConfig
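The "whichever comes first" interaction between the document-count trigger and the RAM trigger can be written down as a simple predicate. This is a sketch of the rule as described, not the writer's flush logic: `FlushTriggerSketch` is hypothetical, and the numeric value chosen for `DISABLE_AUTO_FLUSH` here is an assumed sentinel, since the text does not give it.

```java
/** Hypothetical model of the two flush triggers; NOT Lucene's FlushPolicy. */
final class FlushTriggerSketch {
    /** Assumed sentinel meaning "this trigger is off"; the real constant's value is not stated here. */
    static final int DISABLE_AUTO_FLUSH = -1;

    /**
     * Returns true when either enabled trigger has been reached.
     * A trigger set to DISABLE_AUTO_FLUSH never fires.
     */
    static boolean shouldFlush(int bufferedDocs, double ramUsedMB,
                               int maxBufferedDocs, double ramBufferSizeMB) {
        boolean byDocCount = maxBufferedDocs != DISABLE_AUTO_FLUSH
                && bufferedDocs >= maxBufferedDocs;
        boolean byRamUsage = ramBufferSizeMB != DISABLE_AUTO_FLUSH
                && ramUsedMB >= ramBufferSizeMB;
        return byDocCount || byRamUsage; // whichever comes first
    }
}
```

With the documented defaults (document-count trigger disabled, 16 MB RAM buffer), a million small buffered documents using 3 MB do not trigger a flush, while ten documents using 16.5 MB do.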
-
setReaderTermsIndexDivisor
Description copied from class: LiveIndexWriterConfig
Sets the termsIndexDivisor passed to any readers that IndexWriter opens, for example when applying deletes or creating a near-real-time reader in DirectoryReader.open(IndexWriter, boolean). If you pass -1, the terms index won't be loaded by the readers. This is only useful in advanced situations when you will only .next() through all terms; attempts to seek will hit an exception.
Takes effect immediately, but only applies to readers opened after this call.
NOTE: divisor settings > 1 do not apply to all PostingsFormat implementations, including the default one in this release. It only makes sense for terms indexes that can efficiently re-sample terms at load time.
- Overrides: setReaderTermsIndexDivisor in class LiveIndexWriterConfig
-
setTermIndexInterval
Description copied from class: LiveIndexWriterConfig
Expert: set the interval between indexed terms. Large values cause less memory to be used by IndexReader, but slow random-access to terms. Small values cause more memory to be used by an IndexReader, and speed random-access to terms.
This parameter determines the amount of computation required per query term, regardless of the number of documents that contain that term. In particular, it is the maximum number of other terms that must be scanned before a term is located and its frequency and position information may be processed. In a large index with user-entered query terms, query processing time is likely to be dominated not by term lookup but rather by the processing of frequency and positional data. In a small index or when many uncommon query terms are generated (e.g., by wildcard queries) term lookup may become a dominant cost.
In particular, numUniqueTerms/interval terms are read into memory by an IndexReader, and, on average, interval/2 terms must be scanned for each random term access.
Takes effect immediately, but only applies to newly flushed/merged segments.
NOTE: This parameter does not apply to all PostingsFormat implementations, including the default one in this release. It only makes sense for term indexes that are implemented as a fixed gap between terms. For example, Lucene41PostingsFormat implements the term index instead based upon how terms share prefixes. To configure its parameters (the minimum and maximum size for a block), you would instead use Lucene41PostingsFormat(int, int), which can also be configured on a per-field basis:

    // customize Lucene41PostingsFormat, passing minBlockSize=50, maxBlockSize=100
    final PostingsFormat tweakedPostings = new Lucene41PostingsFormat(50, 100);
    iwc.setCodec(new Lucene45Codec() {
      @Override
      public PostingsFormat getPostingsFormatForField(String field) {
        if (field.equals("fieldWithTonsOfTerms")) {
          return tweakedPostings;
        } else {
          return super.getPostingsFormatForField(field);
        }
      }
    });

Note that other implementations may have their own parameters, or no parameters at all.
- Overrides: setTermIndexInterval in class LiveIndexWriterConfig
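The memory versus lookup trade-off stated for setTermIndexInterval (numUniqueTerms/interval terms held in memory, interval/2 terms scanned on average) is easy to work through numerically. The helper below is a hypothetical sketch of that arithmetic only; the term count of ten million is an illustrative assumption, and the formulas apply only to fixed-gap term indexes as the note explains.

```java
/** Hypothetical arithmetic helper for the documented fixed-gap term index formulas. */
final class TermIndexIntervalMath {

    /** numUniqueTerms / interval: terms an IndexReader reads into memory. */
    static long termsHeldInMemory(long numUniqueTerms, int interval) {
        return numUniqueTerms / interval;
    }

    /** interval / 2: average number of terms scanned per random term access. */
    static double averageTermsScanned(int interval) {
        return interval / 2.0;
    }

    public static void main(String[] args) {
        long numUniqueTerms = 10_000_000L; // illustrative assumption
        for (int interval : new int[] {32, 128}) {
            System.out.println("interval=" + interval
                + " inMemory=" + termsHeldInMemory(numUniqueTerms, interval)
                + " avgScanned=" + averageTermsScanned(interval));
        }
    }
}
```

For ten million unique terms, the default interval of 32 keeps 312,500 terms in memory with 16 terms scanned on average, while an interval of 128 keeps 78,125 in memory at the cost of 64 terms scanned on average.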
-
setUseCompoundFile
Description copied from class: LiveIndexWriterConfig
Sets whether the IndexWriter should pack newly written segments in a compound file. Default is true.
Use false for batch indexing with very large RAM buffer settings.
Note: To control compound file usage during segment merges see MergePolicy.setNoCFSRatio(double) and MergePolicy.setMaxCFSSegmentSizeMB(double). This setting only applies to newly created segments.
- Overrides: setUseCompoundFile in class LiveIndexWriterConfig
-
toString
- Overrides: toString in class LiveIndexWriterConfig
-