Class Lucene40DocValuesFormat

java.lang.Object
org.apache.lucene.codecs.DocValuesFormat
org.apache.lucene.codecs.lucene40.Lucene40DocValuesFormat
All Implemented Interfaces:
NamedSPILoader.NamedSPI

@Deprecated public class Lucene40DocValuesFormat extends DocValuesFormat
Deprecated.
Only for reading old 4.0 and 4.1 segments.
Lucene 4.0 DocValues format.

Files:

Entries within the compound file:
  • <segment>_<fieldNumber>.dat: data values
  • <segment>_<fieldNumber>.idx: index into the .dat for the DEREF, SORTED, and BYTES_VAR_STRAIGHT types

There are many types of DocValues, each with its own encoding. From the perspective of filenames, all types store their values in .dat entries within the compound file. For the dereferenced and sorted types, the .dat contains only the unique values, and an additional .idx entry contains pointers to (or ordinals of) these unique values. BYTES_VAR_STRAIGHT also has an .idx entry, but its .dat is not deduplicated.
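As a small illustration of the naming rule above, the sketch below lists which compound-file entries a field of a given type would occupy, using the .dat/.idx split from the Formats list that follows. The class `DocValuesEntryNames`, the `DvType` enum, and the segment name `_0` are hypothetical and exist only for this example; this is not Lucene's own file-naming code (which lives elsewhere and may add details such as segment suffixes).

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

// Hypothetical helper, not part of Lucene: lists the compound-file entries a field would use.
class DocValuesEntryNames {

    // The value types from the Formats list.
    enum DvType {
        VAR_INTS, FIXED_INTS_8, FIXED_INTS_16, FIXED_INTS_32, FIXED_INTS_64,
        FLOAT_32, FLOAT_64,
        BYTES_FIXED_STRAIGHT, BYTES_VAR_STRAIGHT,
        BYTES_FIXED_DEREF, BYTES_VAR_DEREF,
        BYTES_FIXED_SORTED, BYTES_VAR_SORTED
    }

    // Types whose Formats entry includes an .idx file in addition to the .dat.
    static final EnumSet<DvType> USES_IDX = EnumSet.of(
        DvType.BYTES_VAR_STRAIGHT,
        DvType.BYTES_FIXED_DEREF, DvType.BYTES_VAR_DEREF,
        DvType.BYTES_FIXED_SORTED, DvType.BYTES_VAR_SORTED);

    // Builds "<segment>_<fieldNumber>.dat", plus ".idx" when the type needs one.
    static List<String> entryNames(String segment, int fieldNumber, DvType type) {
        String base = segment + "_" + fieldNumber;
        List<String> names = new ArrayList<>();
        names.add(base + ".dat");
        if (USES_IDX.contains(type)) {
            names.add(base + ".idx");
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(entryNames("_0", 3, DvType.FIXED_INTS_32));    // [_0_3.dat]
        System.out.println(entryNames("_0", 3, DvType.BYTES_VAR_SORTED)); // [_0_3.dat, _0_3.idx]
    }
}
```

The numeric and fixed-width straight types need only the .dat, because a document's value can be located from its docid alone; every other bytes type needs the extra level of indirection that the .idx provides.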

Formats:
  • VAR_INTS .dat --> Header, PackedType, MinValue, DefaultValue, PackedStream
  • FIXED_INTS_8 .dat --> Header, ValueSize, Byte^maxdoc
  • FIXED_INTS_16 .dat --> Header, ValueSize, Short^maxdoc
  • FIXED_INTS_32 .dat --> Header, ValueSize, Int32^maxdoc
  • FIXED_INTS_64 .dat --> Header, ValueSize, Int64^maxdoc
  • FLOAT_32 .dat --> Header, ValueSize, Float32^maxdoc
  • FLOAT_64 .dat --> Header, ValueSize, Float64^maxdoc
  • BYTES_FIXED_STRAIGHT .dat --> Header, ValueSize, (Byte * ValueSize)^maxdoc
  • BYTES_VAR_STRAIGHT .idx --> Header, TotalBytes, Addresses
  • BYTES_VAR_STRAIGHT .dat --> Header, (Byte * variable ValueSize)^maxdoc
  • BYTES_FIXED_DEREF .idx --> Header, NumValues, Addresses
  • BYTES_FIXED_DEREF .dat --> Header, ValueSize, (Byte * ValueSize)^NumValues
  • BYTES_VAR_DEREF .idx --> Header, TotalVarBytes, Addresses
  • BYTES_VAR_DEREF .dat --> Header, (LengthPrefix + Byte * variable ValueSize)^NumValues
  • BYTES_FIXED_SORTED .idx --> Header, NumValues, Ordinals
  • BYTES_FIXED_SORTED .dat --> Header, ValueSize, (Byte * ValueSize)^NumValues
  • BYTES_VAR_SORTED .idx --> Header, TotalVarBytes, Addresses, Ordinals
  • BYTES_VAR_SORTED .dat --> Header, (Byte * variable ValueSize)^NumValues
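For the fixed-width straight layouts above, every document has exactly one value of ValueSize bytes, so a value can be found by arithmetic alone, with no .idx. The sketch below shows that arithmetic. It is a hypothetical helper (`FixedStraightLayout` is not a Lucene class), the per-type widths are simply the sizes of Byte, Short, Int32, Int64, Float32, and Float64, and offsets are measured from the first value (that is, after Header and ValueSize), since the header length is not specified here.

```java
// Hypothetical helper, not part of Lucene: offset arithmetic for fixed-width straight .dat files.
class FixedStraightLayout {

    // Bytes per value for the fixed-width numeric types in the Formats list.
    static int valueWidth(String type) {
        switch (type) {
            case "FIXED_INTS_8":  return 1;
            case "FIXED_INTS_16": return 2;
            case "FIXED_INTS_32": case "FLOAT_32": return 4;
            case "FIXED_INTS_64": case "FLOAT_64": return 8;
            default: throw new IllegalArgumentException("not a fixed-width numeric type: " + type);
        }
    }

    // Offset of docId's value relative to the first value (after Header and ValueSize).
    static long valueOffset(int docId, int valueSize) {
        return (long) docId * valueSize;
    }

    // Size of the value region: one value per document, maxDoc values in total.
    static long valueRegionBytes(int maxDoc, int valueSize) {
        return (long) maxDoc * valueSize;
    }

    public static void main(String[] args) {
        int maxDoc = 1000;
        int w = valueWidth("FIXED_INTS_32");
        System.out.println(valueRegionBytes(maxDoc, w)); // 4000
        System.out.println(valueOffset(10, w));          // 40
        // BYTES_FIXED_STRAIGHT with ValueSize = 16: doc 10 starts 160 bytes into the values.
        System.out.println(valueOffset(10, 16));         // 160
    }
}
```

The trade-off is that straight layouts pay for every document, including those without a meaningful value, while the DEREF and SORTED layouts store each unique value once and pay for the extra indirection instead.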
Data Types:

Notes:
  • PackedType is 0 when the stream is compressed and 1 when it is written as plain 64-bit integers.
  • Addresses stores pointers to the actual byte location (indexed by docid). In the VAR_STRAIGHT case, each entry can have a different length, so to determine a document's length, the address for docid+1 is retrieved as well. A sentinel address is written at the end for the VAR_STRAIGHT case, so the Addresses stream contains maxdoc+1 entries. For the deduplicated VAR_DEREF case, each length is instead encoded as a prefix to the data itself as a VInt (maximum of 2 bytes).
  • Ordinals stores, for each docid, the ordinal of that document's value in sorted order. In the FIXED_SORTED case, the address into the .dat can be computed from the ordinal as Header+ValueSize+(ordinal*ValueSize), because the byte length is fixed. In the VAR_SORTED case, there is double indirection (docid -> ordinal -> address), and an additional sentinel ordinal+address is always written (so there are NumValues+1 ordinals). To determine a value's length, the address of ord+1 is looked up as well.
  • BYTES_VAR_STRAIGHT: in contrast to the other straight variants, BYTES_VAR_STRAIGHT uses an .idx file to improve lookup performance. In contrast to BYTES_VAR_DEREF, it does not deduplicate document values.
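The notes above describe three lookup paths that rely on a sentinel entry. The sketch below models them with plain in-memory arrays: a straight address lookup, a fixed-size ordinal lookup, and the docid -> ordinal -> address double indirection. It is a hypothetical model, not Lucene's reader classes. It assumes addresses are byte offsets relative to the first value in the .dat (so adding the header length and the ValueSize field gives the absolute position described for FIXED_SORTED), and that ordinals are 0-based.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical in-memory model, not Lucene's reader classes. Addresses are assumed to be
// byte offsets relative to the first value in the .dat, and ordinals are assumed 0-based.
class DocValuesLookups {

    // BYTES_VAR_STRAIGHT: addresses has maxDoc + 1 entries (the last is the sentinel),
    // so the value of docId spans [addresses[docId], addresses[docId + 1]).
    static byte[] varStraight(long[] addresses, byte[] data, int docId) {
        int start = (int) addresses[docId];
        int end = (int) addresses[docId + 1];
        return Arrays.copyOfRange(data, start, end);
    }

    // BYTES_FIXED_SORTED: docId -> ordinal; the offset is ordinal * valueSize because
    // every unique value has the same length.
    static byte[] fixedSorted(int[] ordinals, byte[] data, int valueSize, int docId) {
        int start = ordinals[docId] * valueSize;
        return Arrays.copyOfRange(data, start, start + valueSize);
    }

    // BYTES_VAR_SORTED: docId -> ordinal -> address; the address of ord + 1
    // (the sentinel, for the last ordinal) marks where the value ends.
    static byte[] varSorted(int[] ordinals, long[] addresses, byte[] data, int docId) {
        int ord = ordinals[docId];
        int start = (int) addresses[ord];
        int end = (int) addresses[ord + 1];
        return Arrays.copyOfRange(data, start, end);
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    static String ascii(byte[] b) {
        return new String(b, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Three docs with values "abc", "" and "de", stored back to back; 5 is the sentinel.
        long[] straightAddr = {0, 3, 3, 5};
        System.out.println(ascii(varStraight(straightAddr, bytes("abcde"), 2))); // de

        // Unique sorted values "aa" < "bb" < "cc"; four docs referencing them by ordinal.
        int[] fixedOrds = {2, 0, 2, 1};
        System.out.println(ascii(fixedSorted(fixedOrds, bytes("aabbcc"), 2, 3))); // bb

        // Unique sorted values "a" < "bcd" < "ef"; addresses include the sentinel 6.
        long[] sortedAddr = {0, 1, 4, 6};
        int[] varOrds = {1, 1, 0, 2};
        System.out.println(ascii(varSorted(varOrds, sortedAddr, bytes("abcdef"), 0))); // bcd
    }
}
```

Writing the sentinel means the last document (or ordinal) needs no special case: its length is computed exactly like every other one, as the difference between two consecutive addresses.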

Limitations:

  • Binary doc values can be at most MAX_BINARY_FIELD_LENGTH in length.

  • Field Details

    • MAX_BINARY_FIELD_LENGTH

      public static final int MAX_BINARY_FIELD_LENGTH
      Deprecated.
      Maximum length for each binary doc values field.
  • Constructor Details

    • Lucene40DocValuesFormat

      public Lucene40DocValuesFormat()
      Deprecated.
      Sole constructor.
  • Method Details