Class LMSimilarity
- java.lang.Object
  - org.apache.lucene.search.similarities.Similarity
    - org.apache.lucene.search.similarities.SimilarityBase
      - org.apache.lucene.search.similarities.LMSimilarity
Direct Known Subclasses:
LMDirichletSimilarity, LMJelinekMercerSimilarity
public abstract class LMSimilarity extends SimilarityBase
Abstract superclass for language modeling Similarities. The following inner types are introduced:
- LMSimilarity.LMStats, which defines a new statistic: the probability that the collection language model generates the current term;
- LMSimilarity.CollectionModel, a strategy interface for objects that compute the collection language model p(w|C);
- LMSimilarity.DefaultCollectionModel, an implementation of the former that computes the term probability as the number of occurrences of the term in the collection divided by the total number of tokens.
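The relationship between the three inner types can be illustrated with a small standalone sketch. It does not use Lucene: `SketchLMStats`, `SketchCollectionModel`, and `SketchDefaultCollectionModel` are hypothetical stand-ins, and the two statistics are passed in directly instead of coming from Lucene's `BasicStats`. The sketch uses the plain ratio described above; the Nested Class Summary mentions a "+ 1" term for `DefaultCollectionModel`, so the real class may smooth differently.

```java
/** Hypothetical stand-in for LMSimilarity.LMStats: usual statistics plus p(w|C). */
class SketchLMStats {
    final long totalTermFreq;       // occurrences of the term in the collection
    final long numberOfFieldTokens; // total number of tokens in the collection
    float collectionProbability;    // p(w|C), stored once a collection model computes it

    SketchLMStats(long totalTermFreq, long numberOfFieldTokens) {
        this.totalTermFreq = totalTermFreq;
        this.numberOfFieldTokens = numberOfFieldTokens;
    }
}

/** Hypothetical stand-in for the LMSimilarity.CollectionModel strategy interface. */
interface SketchCollectionModel {
    /** Computes p(w|C) for the term described by the given statistics. */
    float computeProbability(SketchLMStats stats);
}

/** Hypothetical stand-in for LMSimilarity.DefaultCollectionModel. */
class SketchDefaultCollectionModel implements SketchCollectionModel {
    @Override
    public float computeProbability(SketchLMStats stats) {
        // Plain ratio: term occurrences divided by total tokens (no smoothing term).
        return (float) stats.totalTermFreq / stats.numberOfFieldTokens;
    }
}
```

For a term that occurs 30 times in a collection of 1,000 tokens, this sketch stores p(w|C) = 0.03 in the stats object.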
Nested Class Summary
Modifier and Type | Class | Description
static interface | LMSimilarity.CollectionModel | A strategy for computing the collection language model.
static class | LMSimilarity.DefaultCollectionModel | Models p(w|C) as the number of occurrences of the term in the collection, divided by the total number of tokens + 1.
static class | LMSimilarity.LMStats | Stores the collection distribution of the current term.
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
Similarity.SimScorer, Similarity.SimWeight
Field Summary
Modifier and Type | Field | Description
protected LMSimilarity.CollectionModel | collectionModel | The collection model.
Fields inherited from class org.apache.lucene.search.similarities.SimilarityBase
discountOverlaps
Constructor Summary
Constructor | Description
LMSimilarity() | Creates a new instance with the default collection language model.
LMSimilarity(LMSimilarity.CollectionModel collectionModel) | Creates a new instance with the specified collection language model.
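The two constructors follow a common delegation pattern: the no-argument constructor supplies the default collection model and hands it to the one-argument constructor, which stores it in the `collectionModel` field. The following is a minimal standalone sketch of that pattern, assuming hypothetical types `SketchCollectionModel`, `SketchDefaultCollectionModel`, and `SketchLMSimilarity`; it is not Lucene's source.

```java
/** Hypothetical strategy interface standing in for LMSimilarity.CollectionModel. */
interface SketchCollectionModel {
    double computeProbability(long totalTermFreq, long numberOfFieldTokens);
}

/** Hypothetical default strategy: term occurrences divided by total tokens. */
class SketchDefaultCollectionModel implements SketchCollectionModel {
    @Override
    public double computeProbability(long totalTermFreq, long numberOfFieldTokens) {
        return (double) totalTermFreq / numberOfFieldTokens;
    }
}

/** Hypothetical abstract base mirroring the two constructors documented above. */
abstract class SketchLMSimilarity {
    /** The collection model, fixed at construction time. */
    protected final SketchCollectionModel collectionModel;

    /** Uses the default collection language model. */
    SketchLMSimilarity() {
        this(new SketchDefaultCollectionModel());
    }

    /** Uses the supplied collection language model. */
    SketchLMSimilarity(SketchCollectionModel collectionModel) {
        this.collectionModel = collectionModel;
    }
}
```

Because the field is `final`, a similarity instance keeps the same collection model for its whole lifetime.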
Method Summary
Modifier and Type | Method | Description
protected void | explain(Explanation expl, BasicStats stats, int doc, float freq, float docLen) | Subclasses should implement this method to explain the score.
protected void | fillBasicStats(BasicStats stats, CollectionStatistics collectionStats, TermStatistics termStats) | Computes the collection probability of the current term in addition to the usual statistics.
abstract String | getName() | Returns the name of the LM method.
protected BasicStats | newStats(String field, float queryBoost) | Factory method to return a custom stats object.
String | toString() | Returns the name of the LM method.
Methods inherited from class org.apache.lucene.search.similarities.SimilarityBase
computeNorm, computeWeight, decodeNormValue, encodeNormValue, explain, getDiscountOverlaps, log2, score, setDiscountOverlaps, simScorer
Methods inherited from class org.apache.lucene.search.similarities.Similarity
coord, queryNorm
Field Detail
collectionModel
protected final LMSimilarity.CollectionModel collectionModel
The collection model.
Constructor Detail
LMSimilarity
public LMSimilarity(LMSimilarity.CollectionModel collectionModel)
Creates a new instance with the specified collection language model.
LMSimilarity
public LMSimilarity()
Creates a new instance with the default collection language model.
Method Detail
newStats
protected BasicStats newStats(String field, float queryBoost)
Description copied from class: SimilarityBase
Factory method to return a custom stats object.
Overrides:
newStats in class SimilarityBase
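The factory-method idea behind `newStats` can be shown without Lucene: the base class obtains its stats object through an overridable method, and the LM subclass overrides it to return a richer subtype that has room for p(w|C). All class names below are hypothetical; only the `(field, queryBoost)` parameter list mirrors the signature documented above.

```java
/** Hypothetical base stats object standing in for BasicStats. */
class SketchBasicStats {
    final String field;
    final float queryBoost;

    SketchBasicStats(String field, float queryBoost) {
        this.field = field;
        this.queryBoost = queryBoost;
    }
}

/** Hypothetical LM-specific stats with an extra slot for the collection probability. */
class SketchLMStats extends SketchBasicStats {
    float collectionProbability;

    SketchLMStats(String field, float queryBoost) {
        super(field, queryBoost);
    }
}

/** Hypothetical base class: stats objects are always created via the factory method. */
class SketchSimilarityBase {
    protected SketchBasicStats newStats(String field, float queryBoost) {
        return new SketchBasicStats(field, queryBoost);
    }
}

/** Hypothetical LM variant: overrides the factory to return the richer subtype. */
class SketchLMSimilarity extends SketchSimilarityBase {
    @Override
    protected SketchBasicStats newStats(String field, float queryBoost) {
        return new SketchLMStats(field, queryBoost);
    }
}
```

Code in the base class that later fills or reads the stats never needs to know which concrete subtype it received.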
fillBasicStats
protected void fillBasicStats(BasicStats stats, CollectionStatistics collectionStats, TermStatistics termStats)
Computes the collection probability of the current term in addition to the usual statistics.
Overrides:
fillBasicStats in class SimilarityBase
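"In addition to the usual statistics" suggests a two-stage fill: the override first lets the superclass copy the usual corpus statistics, then asks the collection model for p(w|C). The sketch below assumes that structure. The names are hypothetical, and the statistics are reduced to two `long` arguments, whereas Lucene's real method receives `CollectionStatistics` and `TermStatistics` objects.

```java
import java.util.function.ToDoubleFunction;

/** Hypothetical stats holder: usual statistics plus p(w|C). */
class SketchStats {
    long totalTermFreq;
    long numberOfFieldTokens;
    double collectionProbability;
}

/** Hypothetical base class that fills only the usual statistics. */
class SketchBase {
    protected void fillBasicStats(SketchStats stats, long numberOfFieldTokens, long totalTermFreq) {
        stats.numberOfFieldTokens = numberOfFieldTokens;
        stats.totalTermFreq = totalTermFreq;
    }
}

/** Hypothetical LM subclass that also records the collection probability. */
class SketchLM extends SketchBase {
    // Hypothetical collection model: any function from stats to p(w|C).
    private final ToDoubleFunction<SketchStats> collectionModel;

    SketchLM(ToDoubleFunction<SketchStats> collectionModel) {
        this.collectionModel = collectionModel;
    }

    @Override
    protected void fillBasicStats(SketchStats stats, long numberOfFieldTokens, long totalTermFreq) {
        super.fillBasicStats(stats, numberOfFieldTokens, totalTermFreq); // usual statistics first
        stats.collectionProbability = collectionModel.applyAsDouble(stats); // then p(w|C)
    }
}
```

Calling `super` first matters here: the collection model reads the counts that the superclass has just stored.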
explain
protected void explain(Explanation expl, BasicStats stats, int doc, float freq, float docLen)
Description copied from class: SimilarityBase
Subclasses should implement this method to explain the score. expl already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to explain details of their scoring formulae.
The default implementation does nothing.
Overrides:
explain in class SimilarityBase
Parameters:
expl - the explanation to extend with details.
stats - the corpus-level statistics.
doc - the document id.
freq - the term frequency.
docLen - the document length.
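Per the contract above, an override only appends detail clauses to an explanation that already holds the score and term frequency. The sketch below illustrates that contract with a hypothetical, minimal `SketchExplanation` tree instead of Lucene's `Explanation` class. The parameter list is reduced to the collection probability, and the clause text "collection probability" is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, minimal stand-in for an explanation tree node. */
class SketchExplanation {
    final float value;
    final String description;
    final List<SketchExplanation> details = new ArrayList<>();

    SketchExplanation(float value, String description) {
        this.value = value;
        this.description = description;
    }

    void addDetail(SketchExplanation detail) {
        details.add(detail);
    }
}

/** Hypothetical LM explainer that appends one clause and leaves existing ones alone. */
class SketchLMExplainer {
    protected void explain(SketchExplanation expl, float collectionProbability) {
        // expl already carries the score; only an extra detail clause is added.
        expl.addDetail(new SketchExplanation(collectionProbability, "collection probability"));
    }
}
```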
getName
public abstract String getName()
Returns the name of the LM method. The values of the parameters should be included as well. Used in toString().
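Because the name should include parameter values, a subclass typically formats its smoothing parameter into the string. The sketch below assumes a hypothetical Dirichlet-style subclass with a single parameter `mu`. The format string is an illustration and is not necessarily what Lucene's `LMDirichletSimilarity` produces.

```java
import java.util.Locale;

/** Hypothetical abstract base declaring the naming hook. */
abstract class SketchLMSimilarity {
    /** Returns the name of the LM method, including its parameter values. */
    public abstract String getName();
}

/** Hypothetical Dirichlet-style subclass with one smoothing parameter. */
class SketchDirichlet extends SketchLMSimilarity {
    private final float mu;

    SketchDirichlet(float mu) {
        this.mu = mu;
    }

    @Override
    public String getName() {
        // Locale.ROOT keeps the decimal separator stable regardless of the JVM locale.
        return String.format(Locale.ROOT, "Dirichlet(%.1f)", mu);
    }
}
```

With `mu = 2000`, this sketch returns `Dirichlet(2000.0)`, so two instances with different parameters are distinguishable by name.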
toString
public String toString()
Returns the name of the LM method. If a custom collection model strategy is used, its name is included as well.
Specified by:
toString in class SimilarityBase
See Also:
getName(), LMSimilarity.CollectionModel.getName(), LMSimilarity.DefaultCollectionModel
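One way to implement "its name is included as well" is to append the collection model's name only when the model reports one. This minimal sketch assumes that the default model signals "no name" by returning `null`. That convention, the `"LM %s - %s"` layout, and all class names are assumptions for illustration, not Lucene's confirmed behavior.

```java
import java.util.Locale;

/** Hypothetical stand-in for the CollectionModel strategy, reduced to its name. */
interface SketchCollectionModel {
    /** Returns the model's name, or null for the default model (assumed convention). */
    String getName();
}

/** Hypothetical abstract base combining the method name and the model name. */
abstract class SketchLMSimilarity {
    protected final SketchCollectionModel collectionModel;

    SketchLMSimilarity(SketchCollectionModel collectionModel) {
        this.collectionModel = collectionModel;
    }

    /** Name of the LM method, supplied by subclasses. */
    public abstract String getName();

    @Override
    public String toString() {
        String coll = collectionModel.getName();
        // Append the collection model only when it has a name (i.e. a custom model).
        return coll == null
            ? String.format(Locale.ROOT, "LM %s", getName())
            : String.format(Locale.ROOT, "LM %s - %s", getName(), coll);
    }
}
```

Under these assumptions, a similarity named `Dirichlet` prints as `LM Dirichlet` with the default model and as `LM Dirichlet - Custom` with a model named `Custom`.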