Class SegmentInfos

  • All Implemented Interfaces:
    Cloneable, Iterable<SegmentCommitInfo>

    public final class SegmentInfos
    extends Object
    implements Cloneable, Iterable<SegmentCommitInfo>
    A collection of SegmentInfo objects, with methods for operating on those segments in relation to the file system.

    The active segments in the index are stored in the segment info file, segments_N. There may be one or more segments_N files in the index; however, the one with the largest generation is the active one. Older segments_N files are present only because they temporarily cannot be deleted, because a writer is in the process of committing, or because a custom IndexDeletionPolicy is in use. This file lists each segment by name and records details about its codec and its generation of deletes.
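The "largest generation wins" rule can be sketched as follows. This is a minimal illustration, not Lucene's code: the class name ActiveSegmentsFile is hypothetical, and it assumes (as Lucene's IndexFileNames does) that N in segments_N is written in base 36 (Character.MAX_RADIX) and that a bare "segments" file counts as generation 0.

```java
public class ActiveSegmentsFile {
    // Assumed to mirror Lucene's IndexFileNames.SEGMENTS and SEGMENTS_GEN names.
    static final String SEGMENTS = "segments";
    static final String SEGMENTS_GEN = "segments.gen";

    // Parses N from "segments_N"; assumes N is base 36 (Character.MAX_RADIX).
    static long generationOf(String fileName) {
        if (fileName.equals(SEGMENTS)) {
            return 0; // a bare "segments" file is treated as generation 0
        }
        return Long.parseLong(fileName.substring(SEGMENTS.length() + 1), Character.MAX_RADIX);
    }

    // Returns the largest generation among segments_N files, or -1 if there are none.
    static long lastCommitGeneration(String[] files) {
        long max = -1;
        for (String file : files) {
            // segments.gen also starts with "segments" but is not a commit point
            if (file.startsWith(SEGMENTS) && !file.equals(SEGMENTS_GEN)) {
                max = Math.max(max, generationOf(file));
            }
        }
        return max;
    }

    public static void main(String[] args) {
        String[] listing = {"_0.cfs", "segments_9", "segments_a", "segments.gen", "_1.si"};
        System.out.println(lastCommitGeneration(listing)); // prints 10 ("a" in base 36)
    }
}
```

Parsing the number matters: a plain string comparison would rank segments_9 above segments_10, even though generation 36 is newer than generation 9.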

    There is also a file segments.gen. This file contains the current generation (the _N in segments_N) of the index. This is used only as a fallback in case the current generation cannot be accurately determined by directory listing alone (as is the case for some NFS clients with time-based directory cache expiration). This file simply contains an Int32 version header (FORMAT_SEGMENTS_GEN_CURRENT), followed by the generation recorded as Int64, written twice.
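The segments.gen layout above can be sketched with java.io streams. The class name and the header value -2 are hypothetical placeholders for FORMAT_SEGMENTS_GEN_CURRENT, and the big-endian encoding is simply what DataOutputStream produces. The sketch also assumes one plausible reading of "written twice": the reader trusts the generation only when both copies agree, so a torn or truncated write is rejected rather than misread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class SegmentsGenSketch {
    // Hypothetical stand-in for FORMAT_SEGMENTS_GEN_CURRENT; the real value is defined by Lucene.
    static final int GEN_HEADER = -2;

    // Layout: Int32 header, then the generation as Int64, written twice.
    static byte[] write(long generation) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(GEN_HEADER);
            out.writeLong(generation);
            out.writeLong(generation);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    // Returns the recorded generation, or -1 when the file cannot be trusted:
    // unknown header, truncated data, or two copies that disagree.
    static long read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            if (in.readInt() != GEN_HEADER) {
                return -1;
            }
            long gen0 = in.readLong();
            long gen1 = in.readLong();
            return gen0 == gen1 ? gen0 : -1;
        } catch (IOException e) {
            return -1; // e.g. EOFException on a truncated file
        }
    }

    public static void main(String[] args) {
        byte[] file = write(42);
        System.out.println(file.length); // prints 20 (4 + 8 + 8 bytes)
        System.out.println(read(file));  // prints 42
    }
}
```

Returning -1 instead of throwing fits the file's role as a fallback: when it cannot be trusted, the caller moves on to its other lookup strategies.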

    Files:

    • segments.gen: GenHeader, Generation, Generation
    • segments_N: Header, Version, NameCounter, SegCount, <SegName, SegCodec, DelGen, DeletionCount, FieldInfosGen, UpdatesFiles>^SegCount, CommitUserData, Checksum

    Data types:

    Field Descriptions:

    • Version counts how often the index has been changed by adding or deleting documents.
    • NameCounter is used to generate names for new segment files.
    • SegName is the name of the segment, and is used as the file name prefix for all of the files that compose the segment's index.
    • DelGen is the generation count of the deletes file. If this is -1, there are no deletes. Anything above zero means there are deletes stored by LiveDocsFormat.
    • DeletionCount records the number of deleted documents in this segment.
    • Checksum contains the CRC32 checksum of all bytes in the segments_N file up until the checksum. This is used to verify integrity of the file on opening the index.
    • SegCodec is the name of the Codec that encoded this segment.
    • CommitUserData stores an optional user-supplied opaque Map<String,String> that was passed to IndexWriter.setCommitData(java.util.Map).
    • FieldInfosGen is the generation count of the fieldInfos file. If this is -1, there are no updates to the fieldInfos in that segment. Anything above zero means there are updates to fieldInfos stored by FieldInfosFormat.
    • UpdatesFiles stores the list of files that were updated in that segment.
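The Checksum rule (a CRC32 over every byte that precedes it) can be illustrated with java.util.zip.CRC32. This is a sketch under simplifying assumptions: the payload is arbitrary bytes rather than a real segments_N body, the class name is hypothetical, and the checksum is stored as a trailing big-endian 8-byte value.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class ChecksumSketch {
    // Appends the CRC32 of the payload as a trailing 8-byte long.
    static byte[] appendChecksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + Long.BYTES)
                .put(payload)
                .putLong(crc.getValue())
                .array();
    }

    // Recomputes the CRC32 over every byte before the trailing checksum and compares.
    static boolean verify(byte[] file) {
        if (file.length < Long.BYTES) {
            return false; // too short to hold a checksum at all
        }
        int end = file.length - Long.BYTES;
        CRC32 crc = new CRC32();
        crc.update(file, 0, end);
        long stored = ByteBuffer.wrap(file, end, Long.BYTES).getLong();
        return stored == crc.getValue();
    }

    public static void main(String[] args) {
        byte[] file = appendChecksum("hypothetical segments_N body".getBytes(StandardCharsets.UTF_8));
        System.out.println(verify(file)); // prints true
        file[0] ^= 1;                     // flip one bit in the body
        System.out.println(verify(file)); // prints false
    }
}
```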

    • Field Detail

      • VERSION_40

        public static final int VERSION_40
        The file format version for the segments_N codec header, up to 4.5.
        See Also:
        Constant Field Values
      • VERSION_46

        public static final int VERSION_46
        The file format version for the segments_N codec header, since 4.6.
        See Also:
        Constant Field Values
      • FORMAT_SEGMENTS_GEN_CURRENT

        public static final int FORMAT_SEGMENTS_GEN_CURRENT
        Used for the segments.gen file only! Whenever you add a new format, make it 1 smaller (negative version logic)!
        See Also:
        Constant Field Values
      • counter

        public int counter
        Used to name new segments.
      • version

        public long version
        Counts how often the index has been changed.
      • userData

        public Map<String,String> userData
        Opaque Map<String, String> that the user can specify during IndexWriter.commit.
    • Method Detail

      • getLastCommitGeneration

        public static long getLastCommitGeneration(String[] files)
        Get the generation of the most recent commit in the list of index files (the N in the segments_N file).
        Parameters:
        files - array of file names to check
      • getLastCommitGeneration

        public static long getLastCommitGeneration(Directory directory)
                                            throws IOException
        Get the generation of the most recent commit to the index in this directory (N in the segments_N file).
        Parameters:
        directory - directory to search for the latest segments_N file
        Throws:
        IOException
      • getLastCommitSegmentsFileName

        public static String getLastCommitSegmentsFileName(String[] files)
        Get the filename of the segments_N file for the most recent commit in the list of index files.
        Parameters:
        files - array of file names to check
      • getLastCommitSegmentsFileName

        public static String getLastCommitSegmentsFileName(Directory directory)
                                                    throws IOException
        Get the filename of the segments_N file for the most recent commit to the index in this Directory.
        Parameters:
        directory - directory to search for the latest segments_N file
        Throws:
        IOException
      • getSegmentsFileName

        public String getSegmentsFileName()
        Get the segments_N filename in use by this segment infos.
      • generationFromSegmentsFileName

        public static long generationFromSegmentsFileName(String fileName)
        Parse the generation off the segments file name and return it.
      • writeSegmentsGen

        public static void writeSegmentsGen(Directory dir,
                                            long generation)
        A utility for writing the IndexFileNames.SEGMENTS_GEN file to a Directory.

        NOTE: this is an internal utility which is kept public so that it's accessible by code from other packages. You should avoid calling this method unless you're absolutely sure what you're doing!

      • getNextSegmentFileName

        public String getNextSegmentFileName()
        Get the next segments_N filename that will be written.
      • read

        public final void read(Directory directory,
                               String segmentFileName)
                        throws IOException
        Read a particular segmentFileName. Note that this may throw an IOException if a commit is in progress.
        Parameters:
        directory - directory containing the segments file
        segmentFileName - segment file to load
        Throws:
        CorruptIndexException - if the index is corrupt
        IOException - if there is a low-level IO error
      • clone

        public SegmentInfos clone()
        Returns a copy of this instance, also copying each SegmentInfo.
        Overrides:
        clone in class Object
      • getVersion

        public long getVersion()
        Version number when this SegmentInfos was generated.
      • getGeneration

        public long getGeneration()
        Returns current generation.
      • getLastGeneration

        public long getLastGeneration()
        Returns the last successfully read or written generation.
      • setInfoStream

        public static void setInfoStream(PrintStream infoStream)
        If non-null, information about retries when loading the segments file will be printed to this stream.
      • setDefaultGenLookaheadCount

        public static void setDefaultGenLookaheadCount(int count)
        Advanced: set how many times to try incrementing the generation when loading the segments file. This only runs if the primary (listing the directory) and secondary (reading the segments.gen file) methods fail to find the segments file.
      • files

        public Collection<String> files(Directory dir,
                                        boolean includeSegmentsFile)
                                 throws IOException
        Returns all file names referenced by SegmentInfo instances matching the provided Directory (i.e., files associated with any "external" segments are skipped). The returned collection is recomputed on each invocation.
        Throws:
        IOException
      • toString

        public String toString(Directory directory)
        Returns a readable description of these segments.
      • totalDocCount

        public int totalDocCount()
        Returns the sum of all segments' docCounts. Note that deleted documents are not subtracted from this total.
      • changed

        public void changed()
        Call this before committing if changes have been made to the segments.
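The conversions between generations and file names used by getSegmentsFileName, generationFromSegmentsFileName and getNextSegmentFileName can be sketched as below. The helpers are hypothetical stand-ins, not Lucene's API. They assume base-36 generations, that generation 0 maps to the bare "segments" name, and that -1 marks "no generation yet". Unlike a real lookup, fileNameFromGeneration here simply rejects negative generations.

```java
public class SegmentsFileNames {
    static final String SEGMENTS = "segments"; // assumed to match IndexFileNames.SEGMENTS

    // Builds "segments_N" with N in base 36; generation 0 maps to the bare "segments" name.
    static String fileNameFromGeneration(long gen) {
        if (gen < 0) {
            throw new IllegalArgumentException("no segments file for generation " + gen);
        }
        return gen == 0 ? SEGMENTS : SEGMENTS + "_" + Long.toString(gen, Character.MAX_RADIX);
    }

    // Inverse of fileNameFromGeneration, analogous to generationFromSegmentsFileName.
    static long generationFromFileName(String fileName) {
        if (fileName.equals(SEGMENTS)) {
            return 0;
        }
        if (!fileName.startsWith(SEGMENTS + "_")) {
            throw new IllegalArgumentException("not a segments file: " + fileName);
        }
        return Long.parseLong(fileName.substring(SEGMENTS.length() + 1), Character.MAX_RADIX);
    }

    // Next file to write: generation 1 if there is no generation yet (-1), else gen + 1.
    static String nextSegmentFileName(long currentGen) {
        return fileNameFromGeneration(currentGen == -1 ? 1 : currentGen + 1);
    }

    public static void main(String[] args) {
        System.out.println(fileNameFromGeneration(35));             // prints segments_z
        System.out.println(nextSegmentFileName(35));                // prints segments_10
        System.out.println(generationFromFileName("segments_10"));  // prints 36
    }
}
```

For example, generation 35 is written as segments_z, and the next commit would be written to segments_10 (generation 36).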