Class SimilarityBase
java.lang.Object
org.apache.lucene.search.similarities.Similarity
org.apache.lucene.search.similarities.SimilarityBase
- Direct Known Subclasses:
- DFRSimilarity, IBSimilarity, LMSimilarity
A subclass of Similarity that provides a simplified API for its
 descendants. Subclasses are only required to implement the score(org.apache.lucene.search.similarities.BasicStats, float, float)
 and toString() methods. Implementing
 explain(Explanation, BasicStats, int, float, float) is optional,
 inasmuch as SimilarityBase already provides a basic explanation of the score
 and the term frequency. However, implementers of a subclass are encouraged to
 include as much detail about the scoring method as possible.
 Note: multi-word queries such as phrase queries are scored differently than in Lucene's default ranking algorithm. The default algorithm "fakes" an IDF value for the phrase as a whole (since it does not know the real value), whereas this class scores a phrase as the sum of the individual term scores.
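The contract described above can be sketched without a Lucene dependency. This is a minimal illustration, not the library's code: ToyStats, ToySimilarityBase, ToyTfIdf and scoreTerms are hypothetical stand-ins for BasicStats, SimilarityBase, a concrete subclass and the multi-term summation, and the formula is an arbitrary tf-idf variant chosen only for the example.

```java
import java.util.List;

/** Hypothetical stand-in for BasicStats; holds only what this sketch needs. */
final class ToyStats {
    final long numberOfDocuments; // documents in the collection
    final long docFreq;           // documents that contain the term

    ToyStats(long numberOfDocuments, long docFreq) {
        this.numberOfDocuments = numberOfDocuments;
        this.docFreq = docFreq;
    }
}

/** Hypothetical stand-in mirroring the two methods a subclass must supply. */
abstract class ToySimilarityBase {
    protected abstract float score(ToyStats stats, float freq, float docLen);

    @Override
    public abstract String toString();

    /** Scores a multi-term query as the sum of the individual term scores. */
    final float scoreTerms(List<ToyStats> terms, float freq, float docLen) {
        float sum = 0f;
        for (ToyStats stats : terms) {
            sum += score(stats, freq, docLen);
        }
        return sum;
    }
}

/** Example subclass: length-normalized term frequency times a base-2 IDF. */
final class ToyTfIdf extends ToySimilarityBase {
    @Override
    protected float score(ToyStats stats, float freq, float docLen) {
        double idf = Math.log((double) stats.numberOfDocuments / stats.docFreq) / Math.log(2);
        return (float) ((freq / docLen) * idf);
    }

    @Override
    public String toString() {
        return "ToyTfIdf";
    }
}
```

In this sketch the subclass never sees the query structure; it only maps (stats, freq, docLen) to a number, and the base class takes care of combining terms.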
- 
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
Similarity.SimScorer, Similarity.SimWeight
- 
Field Summary
Fields
- protected boolean discountOverlaps: True if overlap tokens (tokens with a position increment of zero) are discounted from the document's length.
- 
Constructor Summary
Constructors
- SimilarityBase(): Sole constructor.
- 
Method Summary
- long computeNorm(FieldInvertState state): Encodes the document length in the same way as TFIDFSimilarity.
- final Similarity.SimWeight computeWeight(float queryBoost, CollectionStatistics collectionStats, TermStatistics... termStats): Compute any collection-level weight (e.g. IDF, average document length, etc) needed for scoring a query.
- protected float decodeNormValue(byte norm): Decodes a normalization factor (document length) stored in an index.
- protected byte encodeNormValue(float boost, float length): Encodes the length to a byte via SmallFloat.
- protected void explain(Explanation expl, BasicStats stats, int doc, float freq, float docLen): Subclasses should implement this method to explain the score.
- protected Explanation explain(BasicStats stats, int doc, Explanation freq, float docLen): Explains the score.
- protected void fillBasicStats(BasicStats stats, CollectionStatistics collectionStats, TermStatistics termStats): Fills all member fields defined in BasicStats in stats.
- boolean getDiscountOverlaps(): Returns true if overlap tokens are discounted from the document's length.
- static double log2(double x): Returns the base two logarithm of x.
- protected BasicStats newStats: Factory method to return a custom stats object.
- protected abstract float score(BasicStats stats, float freq, float docLen): Scores the document doc.
- void setDiscountOverlaps(boolean v): Determines whether overlap tokens (tokens with 0 position increment) are ignored when computing norm.
- Similarity.SimScorer simScorer(Similarity.SimWeight stats, AtomicReaderContext context): Creates a new Similarity.SimScorer to score matching documents from a segment of the inverted index.
- abstract String toString(): Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.
Methods inherited from class org.apache.lucene.search.similarities.Similarity
coord, queryNorm
- 
Field Details
discountOverlaps
protected boolean discountOverlaps
True if overlap tokens (tokens with a position increment of zero) are discounted from the document's length.
 
- 
- 
Constructor Details
SimilarityBase
public SimilarityBase()
Sole constructor. (For invocation by subclass constructors, typically implicit.)
 
- 
- 
Method Details
setDiscountOverlaps
public void setDiscountOverlaps(boolean v)
Determines whether overlap tokens (tokens with 0 position increment) are ignored when computing norm. By default this is true, meaning overlap tokens do not count when computing norms.
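To illustrate what discounting means for the document length, the sketch below counts tokens from their position increments and leaves out those with an increment of zero when the flag is set. FieldLengthSketch and fieldLength are hypothetical names used only here; the real computation happens inside computeNorm and may differ in detail.

```java
/** Hypothetical helper showing the effect of discounting overlap tokens. */
final class FieldLengthSketch {

    /**
     * Returns the number of tokens in a field, optionally ignoring tokens
     * whose position increment is zero (for example, injected synonyms).
     */
    static int fieldLength(int[] positionIncrements, boolean discountOverlaps) {
        int length = 0;
        int overlaps = 0;
        for (int increment : positionIncrements) {
            length++;
            if (increment == 0) {
                overlaps++; // token shares a position with the previous one
            }
        }
        return discountOverlaps ? length - overlaps : length;
    }
}
```

For a field whose tokens have increments 1, 0, 1, 1, 0, this sketch reports a length of 3 with discounting and 5 without.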
- 
getDiscountOverlaps
public boolean getDiscountOverlaps()
Returns true if overlap tokens are discounted from the document's length.
 
- 
computeWeight
public final Similarity.SimWeight computeWeight(float queryBoost, CollectionStatistics collectionStats, TermStatistics... termStats)
Description copied from class: Similarity
Compute any collection-level weight (e.g. IDF, average document length, etc) needed for scoring a query.
- Specified by:
- computeWeight in class Similarity
- Parameters:
- queryBoost - the query-time boost.
- collectionStats - collection-level statistics, such as the number of tokens in the collection.
- termStats - term-level statistics, such as the document frequency of a term across the collection.
- Returns:
- SimWeight object with the information this Similarity needs to score a query.
 
- 
newStats
Factory method to return a custom stats object.
- 
fillBasicStats
protected void fillBasicStats(BasicStats stats, CollectionStatistics collectionStats, TermStatistics termStats)
Fills all member fields defined in BasicStats in stats. Subclasses can override this method to fill additional stats.
- 
score
protected abstract float score(BasicStats stats, float freq, float docLen)
Scores the document doc. Subclasses must apply their scoring formula in this method.
- Parameters:
- stats - the corpus level statistics.
- freq - the term frequency.
- docLen - the document length.
- Returns:
- the score.
 
- 
explain
protected void explain(Explanation expl, BasicStats stats, int doc, float freq, float docLen)
Subclasses should implement this method to explain the score. expl already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to explain details of their scoring formulae. The default implementation does nothing.
- Parameters:
- expl - the explanation to extend with details.
- stats - the corpus level statistics.
- doc - the document id.
- freq - the term frequency.
- docLen - the document length.
 
- 
explain
protected Explanation explain(BasicStats stats, int doc, Explanation freq, float docLen)
Explains the score. The implementation here provides a basic explanation in the format score(name-of-similarity, doc=doc-id, freq=term-frequency), computed from:, and attaches the score (computed via the score(BasicStats, float, float) method) and the explanation for the term frequency. Subclasses content with this format may add additional details in explain(Explanation, BasicStats, int, float, float).
- Parameters:
- stats - the corpus level statistics.
- doc - the document id.
- freq - the term frequency and its explanation.
- docLen - the document length.
- Returns:
- the explanation.
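The description format quoted above can be reproduced with plain string concatenation. This is only a sketch of the format as stated in the text; ExplainFormatSketch and describe are hypothetical names, and how the library actually builds the string is not shown on this page.

```java
/** Hypothetical helper that renders the basic explanation header. */
final class ExplainFormatSketch {

    /**
     * Builds "score(name-of-similarity, doc=doc-id, freq=term-frequency), computed from:"
     * for the given values.
     */
    static String describe(String similarityName, int doc, float freq) {
        return "score(" + similarityName + ", doc=" + doc + ", freq=" + freq + "), computed from:";
    }
}
```

A subclass that overrides the five-argument explain would then attach its own detail clauses beneath a header of this shape.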
 
- 
simScorer
public Similarity.SimScorer simScorer(Similarity.SimWeight stats, AtomicReaderContext context) throws IOException
Description copied from class: Similarity
Creates a new Similarity.SimScorer to score matching documents from a segment of the inverted index.
- Specified by:
- simScorer in class Similarity
- Parameters:
- stats - collection information from Similarity.computeWeight(float, CollectionStatistics, TermStatistics...)
- context - segment of the inverted index to be scored.
- Returns:
- SloppySimScorer for scoring documents across context
- Throws:
- IOException - if there is a low-level I/O error
 
- 
toString
abstract String toString()
Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.
- 
computeNorm
long computeNorm(FieldInvertState state)
Encodes the document length in the same way as TFIDFSimilarity.
- Specified by:
- computeNorm in class Similarity
- Parameters:
- state - current processing state for this field
- Returns:
- computed norm value
 
- 
decodeNormValue
protected float decodeNormValue(byte norm)
Decodes a normalization factor (document length) stored in an index.
 
- 
encodeNormValue
protected byte encodeNormValue(float boost, float length)
Encodes the length to a byte via SmallFloat.
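Storing a length in a single byte is necessarily lossy. The sketch below shows one way such a round trip can work, as a self-contained illustration rather than the library's code. Everything in it is an assumption that does not come from this page: the class NormCodecSketch, the bit layout of the one-byte float, the choice to encode boost / sqrt(length), and the choice to decode back to a length as 1 / (f * f).

```java
/** Hypothetical one-byte norm codec; all details are assumptions for illustration. */
final class NormCodecSketch {

    private static final int ZERO_POINT = (63 - 15) << 3; // offset of the smallest encodable exponent

    /** Packs a non-negative float into one byte by keeping only its top bits. */
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21; // drop the low 21 mantissa bits
        if (small <= ZERO_POINT) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative, or underflow
        }
        if (small >= ZERO_POINT + 0x100) {
            return (byte) -1; // overflow: clamp to the largest code
        }
        return (byte) (small - ZERO_POINT);
    }

    /** Expands a byte produced by encode back into a float. */
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << 21;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    /** Assumed encoding of a document length: boost / sqrt(length), then one byte. */
    static byte encodeNorm(float boost, float length) {
        return encode(boost / (float) Math.sqrt(length));
    }

    /** Assumed decoding: invert the square root to recover an approximate length. */
    static float decodeLength(byte norm) {
        float f = decode(norm);
        return 1.0f / (f * f);
    }
}
```

Under these assumptions a length of 16 survives the round trip exactly, while a length of 10 comes back as about 10.24, which shows why scores based on decoded lengths are approximate.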
- 
log2
public static double log2(double x)
Returns the base two logarithm of x.
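A base two logarithm can be obtained from the natural logarithm by a change of base. The following is a minimal sketch of that identity; Log2Sketch is a hypothetical class, and this page does not say how the library computes the value.

```java
/** Hypothetical helper computing a base two logarithm by change of base. */
final class Log2Sketch {

    private static final double LN_2 = Math.log(2);

    /** Returns log(x) / log(2), which equals the base two logarithm of x. */
    static double log2(double x) {
        return Math.log(x) / LN_2;
    }
}
```

Because the result goes through floating-point division, callers comparing it with an integer should allow a small tolerance.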
 
-