Class Tokenizer
java.lang.Object
  org.apache.lucene.util.AttributeSource
    org.apache.lucene.analysis.TokenStream
      org.apache.lucene.analysis.Tokenizer
All Implemented Interfaces: Closeable, AutoCloseable
public abstract class Tokenizer extends TokenStream
A Tokenizer is a TokenStream whose input is a Reader.
This is an abstract class; subclasses must override TokenStream.incrementToken().
NOTE: Subclasses overriding TokenStream.incrementToken() must call AttributeSource.clearAttributes() before setting attributes.
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State
-
-
Constructor Summary
Constructors
protected Tokenizer(Reader input)
Construct a token stream processing the given input.
protected Tokenizer(AttributeSource.AttributeFactory factory, Reader input)
Construct a token stream processing the given input using the given AttributeFactory.
-
Method Summary
void close()
Releases resources associated with this stream.
protected int correctOffset(int currentOff)
Return the corrected offset.
void reset()
This method is called by a consumer before it begins consumption using TokenStream.incrementToken().
void setReader(Reader input)
Expert: Set a new reader on the Tokenizer.
-
Methods inherited from class org.apache.lucene.analysis.TokenStream
end, incrementToken
-
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
-
-
Field Detail
-
input
protected Reader input
The text source for this Tokenizer.
-
-
Constructor Detail
-
Tokenizer
protected Tokenizer(Reader input)
Construct a token stream processing the given input.
-
Tokenizer
protected Tokenizer(AttributeSource.AttributeFactory factory, Reader input)
Construct a token stream processing the given input using the given AttributeFactory.
-
-
Method Detail
-
close
public void close() throws IOException
Releases resources associated with this stream.
If you override this method, always call super.close(), otherwise some internal state will not be correctly reset (e.g., Tokenizer will throw IllegalStateException on reuse).
NOTE: The default implementation closes the input Reader, so be sure to call super.close() when overriding this method.
Specified by: close in interface AutoCloseable
Specified by: close in interface Closeable
Overrides: close in class TokenStream
Throws: IOException
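The override rule for close() can be illustrated with a minimal sketch that uses only JDK classes. BaseStream and BufferingStream are hypothetical stand-ins (they are not Lucene classes); BaseStream plays the role of Tokenizer in that its close() closes the input Reader, and the subclass releases its own state before delegating to super.close().

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class CloseContractSketch {

    /** Hypothetical stand-in for Tokenizer: owns a Reader and closes it in close(). */
    static class BaseStream implements Closeable {
        protected final Reader input;
        boolean inputClosed = false;

        BaseStream(Reader input) {
            this.input = input;
        }

        @Override
        public void close() throws IOException {
            input.close();
            inputClosed = true;
        }
    }

    /** Hypothetical subclass that holds an extra buffer it wants to release on close. */
    static class BufferingStream extends BaseStream {
        char[] buffer = new char[256];

        BufferingStream(Reader input) {
            super(input);
        }

        @Override
        public void close() throws IOException {
            buffer = null;   // release subclass-specific state
            super.close();   // always delegate so the base class can close the Reader
        }
    }

    /** Opens and closes a BufferingStream; returns true if the base close() ran. */
    static boolean closesInput() {
        BufferingStream stream = new BufferingStream(new StringReader("some text"));
        try (BufferingStream s = stream) {
            // a real consumer would read tokens here
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return stream.inputClosed;
    }

    public static void main(String[] args) {
        System.out.println("input closed: " + closesInput());
    }
}
```

Because the stand-in implements Closeable, it works with try-with-resources, which is also how a class implementing Closeable and AutoCloseable, as Tokenizer does, can be closed by a consumer.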
-
correctOffset
protected final int correctOffset(int currentOff)
Return the corrected offset. If input is a CharFilter subclass, this method calls CharFilter.correctOffset(int); otherwise it returns currentOff.
Parameters: currentOff - offset as seen in the output
Returns: corrected offset based on the input
See Also: CharFilter.correctOffset(int)
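The dispatch described for correctOffset can be sketched with JDK classes alone. OffsetMappingReader and PrefixStrippedReader below are hypothetical stand-ins for CharFilter and one of its subclasses; the static correctOffset method only mirrors the documented behavior (delegate when the input can correct offsets, otherwise return the offset unchanged) and is not Lucene's source.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class CorrectOffsetSketch {

    /** Hypothetical stand-in for CharFilter: a Reader that can map offsets back to the original text. */
    abstract static class OffsetMappingReader extends Reader {
        protected final Reader in;

        OffsetMappingReader(Reader in) {
            this.in = in;
        }

        abstract int correctOffset(int currentOff);

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            return in.read(cbuf, off, len);
        }

        @Override
        public void close() throws IOException {
            in.close();
        }
    }

    /** Hypothetical filter that pretends a fixed-length prefix was stripped from the original text. */
    static class PrefixStrippedReader extends OffsetMappingReader {
        private final int strippedChars;

        PrefixStrippedReader(Reader in, int strippedChars) {
            super(in);
            this.strippedChars = strippedChars;
        }

        @Override
        int correctOffset(int currentOff) {
            // offsets in the filtered text are shifted left by the stripped prefix
            return currentOff + strippedChars;
        }
    }

    /** Mirrors the documented dispatch: delegate if the input can correct offsets, else return unchanged. */
    static int correctOffset(Reader input, int currentOff) {
        if (input instanceof OffsetMappingReader) {
            return ((OffsetMappingReader) input).correctOffset(currentOff);
        }
        return currentOff;
    }

    public static void main(String[] args) {
        Reader plain = new StringReader("hello");
        Reader filtered = new PrefixStrippedReader(new StringReader("hello"), 3);
        System.out.println(correctOffset(plain, 2));     // prints 2
        System.out.println(correctOffset(filtered, 2));  // prints 5
    }
}
```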
-
setReader
public final void setReader(Reader input) throws IOException
Expert: Set a new reader on the Tokenizer. Typically, an analyzer (in its tokenStream method) will use this to re-use a previously created tokenizer.
Throws: IOException
-
reset
public void reset() throws IOException
Description copied from class: TokenStream
This method is called by a consumer before it begins consumption using TokenStream.incrementToken().
Resets this stream to a clean state. Stateful implementations must implement this method so that they can be reused, just as if they had been created fresh.
If you override this method, always call super.reset(), otherwise some internal state will not be correctly reset (e.g., Tokenizer will throw IllegalStateException on further usage).
Overrides: reset in class TokenStream
Throws: IOException
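To make the reuse cycle concrete (set a new reader, reset, consume, close, then repeat), here is a simplified model written against the JDK only. ModelTokenizer is a hypothetical class, not Lucene's Tokenizer: it splits on whitespace, returns plain strings instead of populating attributes, and omits end(). Its state checks are an assumption chosen to illustrate why skipping reset() or close() surfaces as an IllegalStateException.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReuseLifecycleSketch {

    /** Hypothetical whitespace splitter; a simplified model, not Lucene's Tokenizer. */
    static class ModelTokenizer {
        private Reader pending;   // set by setReader(), not yet active
        private Reader input;     // active only between reset() and close()

        void setReader(Reader reader) {
            if (input != null) {
                throw new IllegalStateException("close() call missing");
            }
            pending = reader;
        }

        void reset() {
            if (pending == null) {
                throw new IllegalStateException("setReader() call missing");
            }
            input = pending;      // activate the new reader
            pending = null;
        }

        /** Returns the next whitespace-delimited token, or null when the input is exhausted. */
        String nextToken() {
            if (input == null) {
                throw new IllegalStateException("reset() call missing");
            }
            StringBuilder sb = new StringBuilder();
            try {
                int c;
                while ((c = input.read()) != -1) {
                    if (Character.isWhitespace(c)) {
                        if (sb.length() > 0) {
                            break;            // end of the current token
                        }
                    } else {
                        sb.append((char) c);
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return sb.length() > 0 ? sb.toString() : null;
        }

        void close() {
            try {
                if (input != null) {
                    input.close();
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } finally {
                input = null;     // back to the "not ready" state until the next reset()
            }
        }
    }

    /** Runs one full cycle on an existing instance: setReader, reset, consume, close. */
    static List<String> tokenize(ModelTokenizer tokenizer, String text) {
        List<String> tokens = new ArrayList<>();
        tokenizer.setReader(new StringReader(text));
        tokenizer.reset();
        try {
            String token;
            while ((token = tokenizer.nextToken()) != null) {
                tokens.add(token);
            }
        } finally {
            tokenizer.close();
        }
        return tokens;
    }

    public static void main(String[] args) {
        ModelTokenizer tokenizer = new ModelTokenizer();           // created once
        System.out.println(tokenize(tokenizer, "hello world"));    // prints [hello, world]
        System.out.println(tokenize(tokenizer, "reuse works"));    // prints [reuse, works]
    }
}
```

In this model, consuming before reset() fails fast, which is the kind of contract violation the IllegalStateException mentioned above is meant to expose.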
-
-