Interface TermToBytesRefAttribute

  • All Superinterfaces:
    Attribute
    All Known Implementing Classes:
    CharTermAttributeImpl, NumericTokenStream.NumericTermAttributeImpl, Token

    public interface TermToBytesRefAttribute
    extends Attribute
    This attribute is requested by TermsHashPerField to index the contents. This attribute can be used to customize the final byte[] encoding of terms.

    Consumers of this attribute call getBytesRef() up-front, and then invoke fillBytesRef() for each term. Example:

       final TermToBytesRefAttribute termAtt = tokenStream.getAttribute(TermToBytesRefAttribute.class);
       final BytesRef bytes = termAtt.getBytesRef();
    
       while (tokenStream.incrementToken()) {
    
         // you must call termAtt.fillBytesRef() before doing something with the bytes.
         // this encodes the term value (internally it might be a char[], etc) into the bytes.
         int hashCode = termAtt.fillBytesRef();
    
         if (isInteresting(bytes)) {
         
           // because the bytes are reused by the attribute (like CharTermAttribute's char[] buffer),
           // you should make a copy if you need persistent access to the bytes, otherwise they will
           // be rewritten across calls to incrementToken()
    
           doSomethingWith(new BytesRef(bytes));
         }
       }
       ...
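
The copy in the example above matters because the attribute rewrites the same bytes for every term. The following is a minimal, self-contained sketch of that effect. It does not use Lucene: `ReusingProducer` and `collect` are hypothetical stand-ins, and a one-byte array plays the role of the shared BytesRef.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReusedBufferSketch {

    /** Hypothetical producer that overwrites one shared buffer per "term". */
    static final class ReusingProducer {
        final byte[] shared = new byte[1];

        void fill(byte value) {
            shared[0] = value; // same array every time, like a reused BytesRef
        }
    }

    /** Collects three "terms", either by reference or as defensive copies. */
    static List<byte[]> collect(boolean copy) {
        ReusingProducer producer = new ReusingProducer();
        List<byte[]> kept = new ArrayList<>();
        for (byte value : new byte[] {1, 2, 3}) {
            producer.fill(value);
            kept.add(copy
                    ? Arrays.copyOf(producer.shared, producer.shared.length)
                    : producer.shared);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Without a copy, every kept entry aliases the last value written.
        System.out.println(collect(false).get(0)[0]); // prints 3
        // With a copy, the first entry keeps its own value.
        System.out.println(collect(true).get(0)[0]);  // prints 1
    }
}
```

In the Lucene example, `new BytesRef(bytes)` is the step that corresponds to `Arrays.copyOf` in this sketch.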
     
    • Method Detail

      • fillBytesRef

        int fillBytesRef()
        Updates the bytes returned by getBytesRef() to contain this term's final encoding, and returns its hashcode.
        Returns:
        the hashcode as defined by BytesRef.hashCode():
          int hash = 0;
          for (int i = termBytes.offset; i < termBytes.offset+termBytes.length; i++) {
            hash = 31*hash + termBytes.bytes[i];
          }
         
        Computing the hash here is a performance optimization: implement it when your code can calculate the hash on the fly while encoding the term. If it cannot, just return termBytes.hashCode().
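
As a rough illustration of calculating the hash on the fly, the sketch below encodes a term and accumulates the hash in the same pass, then checks the result against the loop shown above. It is not Lucene code: `TermBytes` is a hypothetical stand-in for BytesRef, `fillAndHash` is an invented helper, and the encoder assumes ASCII-only input to keep the example short.

```java
public class HashOnTheFlySketch {

    /** Hypothetical stand-in for BytesRef: a buffer plus offset and length. */
    static final class TermBytes {
        byte[] bytes = new byte[16];
        int offset = 0;
        int length = 0;
    }

    /** Reference hash: the same loop as in the documentation above. */
    static int referenceHash(TermBytes termBytes) {
        int hash = 0;
        for (int i = termBytes.offset; i < termBytes.offset + termBytes.length; i++) {
            hash = 31 * hash + termBytes.bytes[i];
        }
        return hash;
    }

    /** Writes the term into the buffer and hashes each byte as it is written. */
    static int fillAndHash(CharSequence term, TermBytes target) {
        if (target.bytes.length < term.length()) {
            target.bytes = new byte[term.length()];
        }
        int hash = 0;
        for (int i = 0; i < term.length(); i++) {
            byte b = (byte) term.charAt(i); // assumption: ASCII-only terms
            target.bytes[i] = b;
            hash = 31 * hash + b;           // hash computed during encoding
        }
        target.offset = 0;
        target.length = term.length();
        return hash;
    }

    public static void main(String[] args) {
        TermBytes termBytes = new TermBytes();
        int hash = fillAndHash("lucene", termBytes);
        // The single-pass hash matches a second pass over the finished bytes.
        System.out.println(hash == referenceHash(termBytes)); // prints true
    }
}
```

The single pass avoids iterating over the encoded bytes a second time, which is the saving the method description refers to.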
      • getBytesRef

        BytesRef getBytesRef()
        Retrieves this attribute's BytesRef. The bytes are updated from the current term when the consumer calls fillBytesRef().
        Returns:
        this Attribute's internal BytesRef.