Package org.apache.lucene.codecs
The Codec API allows you to customise the way the following pieces of index information are stored:
- Postings lists - see PostingsFormat
- DocValues - see DocValuesFormat
- Stored fields - see StoredFieldsFormat
- Term vectors - see TermVectorsFormat
- FieldInfos - see FieldInfosFormat
- SegmentInfo - see SegmentInfoFormat
- Norms - see NormsFormat
- Live documents - see LiveDocsFormat
Codecs are identified by name through the Java Service Provider Interface. To create your own codec, extend Codec and pass the new codec's name to the super() constructor:

    public class MyCodec extends Codec {
        public MyCodec() {
            super("MyCodecName");
        }
        ...
    }

You will need to register the Codec class so that the ServiceLoader can find it, by including a META-INF/services/org.apache.lucene.codecs.Codec file on your classpath that contains the package-qualified name of your codec.
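As an illustration of the registration step, the provider-configuration file might look like the fragment below. The package com.example is a hypothetical placeholder, not something defined by this package; substitute the real fully qualified name of your codec class. Lines starting with # are comments in the ServiceLoader file format.

```
# File: META-INF/services/org.apache.lucene.codecs.Codec
# com.example is a hypothetical package used only for illustration.
com.example.MyCodec
```

Each non-comment line names one implementation class, so several codecs can be listed in the same file, one per line.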
If you just want to customise the PostingsFormat, or use different postings formats for different fields, then you can register your custom postings format in the same way (in META-INF/services/org.apache.lucene.codecs.PostingsFormat), and then extend the default Lucene46Codec and override Lucene46Codec.getPostingsFormatForField(String) to return your custom postings format.
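The per-field override pattern can be sketched without any Lucene dependency. The classes below are hypothetical stand-ins written for this sketch, not Lucene classes: DefaultCodecModel plays the role of a default codec such as Lucene46Codec, and format names are modelled as plain strings rather than PostingsFormat instances. The field name "id" and the format name "MyPostings" are likewise assumptions chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a default codec; NOT a Lucene class.
class DefaultCodecModel {
    // Hook that decides which postings format a field uses.
    // Here a format is modelled by its name only.
    public String getPostingsFormatForField(String field) {
        return "DefaultPostings";
    }
}

// Subclass that routes selected fields to a custom format and
// falls back to the parent's choice for every other field.
class PerFieldCodecModel extends DefaultCodecModel {
    private final Map<String, String> overrides = new HashMap<>();

    PerFieldCodecModel() {
        // Assumed mapping: only the "id" field gets the custom format.
        overrides.put("id", "MyPostings");
    }

    @Override
    public String getPostingsFormatForField(String field) {
        String custom = overrides.get(field);
        return custom != null ? custom : super.getPostingsFormatForField(field);
    }
}
```

With a real codec the override would return a PostingsFormat object instead of a string, but the shape is the same: consult a per-field mapping first, then defer to the default codec's implementation.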
Similarly, if you just want to customise the DocValuesFormat per-field, have a look at Lucene46Codec.getDocValuesFormatForField(String).
Classes
- Holds all state required for PostingsReaderBase to produce a DocsEnum without re-seeking the terms dict.
- A block-based terms index and dictionary that assigns terms to variable length blocks according to how they share prefixes.
- BlockTree statistics for a single field returned by BlockTreeTermsReader.FieldReader.computeStats().
- Block-based terms index and dictionary writer.
- Encodes/decodes an inverted index segment.
- Utility class for reading and writing versioned headers.
- Abstract API that consumes numeric, binary and sorted docvalues.
- Encodes/decodes per-document values.
- Abstract API that produces numeric, binary and sorted docvalues.
- A simple implementation of DocValuesProducer.getDocsWithField(org.apache.lucene.index.FieldInfo) that returns true if a document has an ordinal >= 0.
- A simple implementation of DocValuesProducer.getDocsWithField(org.apache.lucene.index.FieldInfo) that returns true if a document has any ordinals.
- Encodes/decodes FieldInfos.
- Codec API for reading FieldInfos.
- Codec API for writing FieldInfos.
- Abstract API that consumes terms, doc, freq, prox, offset and payloads postings.
- Abstract API that produces terms, doc, freq, prox, offset and payloads postings.
- A codec that forwards all its method calls to another codec.
- Format for live/deleted documents.
- Exposes flex API, merged from flex API of sub-segments, remapping docIDs (this is used for segment merging).
- Exposes flex API, merged from flex API of sub-segments, remapping docIDs (this is used for segment merging).
- This abstract class reads skip lists with multiple levels.
- This abstract class writes skip lists with multiple levels.
- Encodes/decodes per-document score normalization values.
- Provides a PostingsReaderBase and PostingsWriterBase.
- Abstract API that consumes postings for an individual term.
- Encodes/decodes terms, postings, and proximity data.
- The core terms dictionaries (BlockTermsReader, BlockTreeTermsReader) interact with a single instance of this class to manage creation of DocsEnum and DocsAndPositionsEnum instances.
- Extension of PostingsConsumer to support pluggable term dictionaries.
- Expert: Controls the format of the SegmentInfo (segment metadata file).
- Specifies an API for classes that can read SegmentInfo information.
- Specifies an API for classes that can write out SegmentInfo data.
- Controls the format of stored fields.
- Codec API for reading stored fields.
- Codec API for writing stored fields.
- Abstract API that consumes terms for an individual field.
- Holder for per-term statistics.
- Controls the format of term vectors.
- Codec API for reading term vectors.
- Codec API for writing term vectors.