Class IBSimilarity
- java.lang.Object
  - org.apache.lucene.search.similarities.Similarity
    - org.apache.lucene.search.similarities.SimilarityBase
      - org.apache.lucene.search.similarities.IBSimilarity
public class IBSimilarity extends SimilarityBase
Provides a framework for the family of information-based models, as described in Stéphane Clinchant and Eric Gaussier. 2010. Information-based models for ad hoc IR. In Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval (SIGIR '10). ACM, New York, NY, USA, 234-241.

The retrieval function is of the form

$$\mathrm{RSV}(q, d) = \sum_w -x^q_w \log \mathrm{Prob}(X_w \geq t^d_w \mid \lambda_w)$$

where
- $x^q_w$ is the query boost;
- $X_w$ is a random variable that counts the occurrences of word $w$;
- $t^d_w$ is the normalized term frequency;
- $\lambda_w$ is the parameter of the probability distribution.
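To make the formula concrete, the following sketch evaluates $\mathrm{RSV}(q, d)$ for a single document in plain Java. It assumes the log-logistic distribution from the Clinchant and Gaussier paper, for which $\mathrm{Prob}(X_w \geq t \mid \lambda_w) = \lambda_w / (t + \lambda_w)$. The class `IbRsvSketch`, the `QueryTerm` record and the sample values are hypothetical; this models the formula only and is not Lucene's implementation.

```java
import java.util.List;

/** Hypothetical plain-Java model of the IB retrieval function (not Lucene code). */
final class IbRsvSketch {

    /** One query word: x^q_w (query boost), t^d_w (normalized tf in d) and lambda_w. */
    record QueryTerm(double queryBoost, double normalizedTf, double lambda) {}

    /** Assumed log-logistic survival function: Prob(X >= t | lambda) = lambda / (t + lambda). */
    static double probAtLeast(double t, double lambda) {
        return lambda / (t + lambda);
    }

    /** RSV(q, d) = sum over w of -x^q_w * log Prob(X_w >= t^d_w | lambda_w). */
    static double rsv(List<QueryTerm> terms) {
        double sum = 0.0;
        for (QueryTerm term : terms) {
            sum += -term.queryBoost() * Math.log(probAtLeast(term.normalizedTf(), term.lambda()));
        }
        return sum;
    }

    public static void main(String[] args) {
        List<QueryTerm> query = List.of(
                new QueryTerm(1.0, 3.0, 0.01), // rare word, frequent in the document
                new QueryTerm(1.0, 1.0, 0.5)); // common word, appears once
        System.out.println(rsv(query));
    }
}
```

A word with $t^d_w = 0$ contributes nothing, because the probability of observing at least zero occurrences is 1; for the same $t^d_w$, a smaller $\lambda_w$ (a rarer word) yields a larger contribution.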
The framework described in the paper has many similarities to the DFR framework (see DFRSimilarity). It is possible that the two Similarities will be merged at some point.

To construct an IBSimilarity, you must specify the implementations for all three components of the information-based model:
- Distribution: probabilistic distribution used to model term occurrence
  - DistributionLL: log-logistic
  - DistributionSPL: smoothed power-law
- Lambda: the $\lambda_w$ parameter of the probability distribution
- Normalization: term frequency normalization
  - Any supported DFR normalization (listed in DFRSimilarity)
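The Distribution component supplies $\mathrm{Prob}(X_w \geq t \mid \lambda_w)$. The sketch below models the log-logistic and smoothed power-law distributions as interchangeable implementations of a hypothetical `TermDistribution` interface; it is plain Java, not Lucene's `Distribution` API. The survival functions are assumed to be those of the Clinchant and Gaussier paper: $\lambda / (t + \lambda)$ for the log-logistic and $(\lambda^{t/(t+1)} - \lambda) / (1 - \lambda)$ for the smoothed power-law, which is undefined at $\lambda = 1$.

```java
/** Hypothetical stand-alone model of the IB Distribution component (not Lucene's API). */
final class IbDistributionSketch {

    interface TermDistribution {
        /** Prob(X_w >= tfn | lambda). */
        double probAtLeast(double tfn, double lambda);

        /** Per-word score: -log Prob(X_w >= tfn | lambda). */
        default double score(double tfn, double lambda) {
            return -Math.log(probAtLeast(tfn, lambda));
        }
    }

    /** Log-logistic: Prob(X >= t | lambda) = lambda / (t + lambda). */
    static final class LogLogistic implements TermDistribution {
        @Override
        public double probAtLeast(double tfn, double lambda) {
            return lambda / (tfn + lambda);
        }
    }

    /** Smoothed power-law: (lambda^(t / (t + 1)) - lambda) / (1 - lambda). */
    static final class SmoothedPowerLaw implements TermDistribution {
        @Override
        public double probAtLeast(double tfn, double lambda) {
            // Division by zero at lambda = 1; this sketch rejects it instead of adjusting lambda.
            if (lambda <= 0.0 || lambda == 1.0) {
                throw new IllegalArgumentException("smoothed power-law undefined for lambda = " + lambda);
            }
            return (Math.pow(lambda, tfn / (tfn + 1.0)) - lambda) / (1.0 - lambda);
        }
    }

    public static void main(String[] args) {
        TermDistribution[] models = {new LogLogistic(), new SmoothedPowerLaw()};
        for (TermDistribution model : models) {
            for (double tfn : new double[] {0.0, 1.0, 3.0}) {
                System.out.printf("%s tfn=%.1f score=%.4f%n",
                        model.getClass().getSimpleName(), tfn, model.score(tfn, 0.1));
            }
        }
    }
}
```

Keeping the distribution behind a single interface is what allows any implementation to be swapped in without touching the rest of the scoring code.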
- See Also: DFRSimilarity
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
Similarity.SimScorer, Similarity.SimWeight
Field Summary
Fields
| Modifier and Type | Field | Description |
|---|---|---|
| protected Distribution | distribution | The probabilistic distribution used to model term occurrence. |
| protected Lambda | lambda | The lambda ($\lambda_w$) parameter. |
| protected Normalization | normalization | The term frequency normalization. |

Fields inherited from class org.apache.lucene.search.similarities.SimilarityBase
discountOverlaps
Constructor Summary
| Constructor | Description |
|---|---|
| IBSimilarity(Distribution distribution, Lambda lambda, Normalization normalization) | Creates IBSimilarity from the three components. |
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| protected void | explain(Explanation expl, BasicStats stats, int doc, float freq, float docLen) | Subclasses should implement this method to explain the score. |
| Distribution | getDistribution() | Returns the distribution. |
| Lambda | getLambda() | Returns the distribution's lambda parameter. |
| Normalization | getNormalization() | Returns the term frequency normalization. |
| protected float | score(BasicStats stats, float freq, float docLen) | Scores the document doc. |
| String | toString() | The names of IB methods follow the pattern IB <distribution> <lambda><normalization>. |
Methods inherited from class org.apache.lucene.search.similarities.SimilarityBase
computeNorm, computeWeight, decodeNormValue, encodeNormValue, explain, fillBasicStats, getDiscountOverlaps, log2, newStats, setDiscountOverlaps, simScorer
Methods inherited from class org.apache.lucene.search.similarities.Similarity
coord, queryNorm
Field Detail

distribution
protected final Distribution distribution
The probabilistic distribution used to model term occurrence.

lambda
protected final Lambda lambda
The lambda ($\lambda_w$) parameter.

normalization
protected final Normalization normalization
The term frequency normalization.
Constructor Detail

IBSimilarity
public IBSimilarity(Distribution distribution, Lambda lambda, Normalization normalization)
Creates IBSimilarity from the three components. Note that null values are not allowed: if you want no normalization, pass Normalization.NoNormalization instead.
Parameters:
- distribution - probabilistic distribution modeling term occurrence
- lambda - the distribution's $\lambda_w$ parameter
- normalization - term frequency normalization
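The rule that null is not accepted, and that a no-op normalization is passed instead, is an instance of the null-object pattern. The sketch below illustrates it with hypothetical stand-in interfaces for the three components and a no-op normalization that returns the raw term frequency; it rejects null eagerly with `Objects.requireNonNull`. None of these names are Lucene's, the document-frequency lambda in `main` is an assumed smoothed form of $N_w / N$, and whether the real constructor checks for null eagerly is not stated here.

```java
import java.util.Objects;

/** Hypothetical model of a three-component IB configuration (not Lucene's IBSimilarity). */
final class IbModelSketch {

    /** Hypothetical stand-ins for the three IB components. */
    interface Distribution { double score(double tfn, double lambda); }
    interface Lambda { double lambda(long docFreq, long numDocs); }
    interface Normalization { double tfn(double freq, double docLen, double avgDocLen); }

    /** Null-object normalization: returns the raw term frequency unchanged. */
    static final Normalization NO_NORMALIZATION = (freq, docLen, avgDocLen) -> freq;

    final Distribution distribution;
    final Lambda lambda;
    final Normalization normalization;

    IbModelSketch(Distribution distribution, Lambda lambda, Normalization normalization) {
        // Reject nulls eagerly so a missing component fails at construction, not at scoring time.
        this.distribution = Objects.requireNonNull(distribution, "distribution");
        this.lambda = Objects.requireNonNull(lambda, "lambda");
        this.normalization = Objects.requireNonNull(normalization, "normalization");
    }

    public static void main(String[] args) {
        IbModelSketch model = new IbModelSketch(
                (tfn, l) -> -Math.log(l / (tfn + l)),   // log-logistic score
                (df, n) -> (df + 1.0) / (n + 1.0),      // assumed smoothed N_w / N
                NO_NORMALIZATION);                      // passed instead of null
        System.out.println(model.normalization.tfn(3.0, 100.0, 50.0)); // prints 3.0
        try {
            new IbModelSketch(model.distribution, model.lambda, null);
        } catch (NullPointerException e) {
            System.out.println("rejected: " + e.getMessage()); // prints "rejected: normalization"
        }
    }
}
```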
Method Detail

score
protected float score(BasicStats stats, float freq, float docLen)
Description copied from class: SimilarityBase
Scores the document doc. Subclasses must apply their scoring formula in this class.
Specified by:
score in class SimilarityBase
Parameters:
- stats - the corpus level statistics.
- freq - the term frequency.
- docLen - the document length.
Returns:
the score.
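Following the retrieval function, each matching word contributes $x^q_w \cdot (-\log \mathrm{Prob}(X_w \geq t^d_w \mid \lambda_w))$. A sketch of a per-term score along those lines is below. It assumes a log-logistic distribution, a document-frequency $\lambda_w = (N_w + 1) / (N + 1)$ where $N_w$ of the $N$ documents contain $w$, and an H2-style normalization $t^d_w = \mathit{freq} \cdot \log_2(1 + c \cdot \mathit{avgdl} / \mathit{docLen})$ with $c = 1$. The `Stats` record and all method names are hypothetical; this is not Lucene's `BasicStats` or its scoring code.

```java
/** Hypothetical plain-Java model of the per-term IB score (not Lucene's implementation). */
final class IbScoreSketch {

    /** Hypothetical subset of the corpus-level statistics the score needs. */
    record Stats(long numberOfDocuments, long docFreq, double avgFieldLength, double boost) {}

    /** Assumed H2-style normalization: freq * log2(1 + c * avgFieldLength / docLen). */
    static double tfnH2(Stats stats, double freq, double docLen, double c) {
        return freq * (Math.log(1.0 + c * stats.avgFieldLength() / docLen) / Math.log(2.0));
    }

    /** Assumed smoothed document-frequency lambda: (docFreq + 1) / (numberOfDocuments + 1). */
    static double lambdaDF(Stats stats) {
        return (stats.docFreq() + 1.0) / (stats.numberOfDocuments() + 1.0);
    }

    /** Log-logistic score: -log(lambda / (tfn + lambda)). */
    static double logLogistic(double tfn, double lambda) {
        return -Math.log(lambda / (tfn + lambda));
    }

    /** One term of the RSV sum: boost * distribution(tfn(freq, docLen), lambda(stats)). */
    static double score(Stats stats, double freq, double docLen) {
        double tfn = tfnH2(stats, freq, docLen, 1.0);
        return stats.boost() * logLogistic(tfn, lambdaDF(stats));
    }

    public static void main(String[] args) {
        Stats stats = new Stats(1_000, 9, 100.0, 1.0);
        System.out.println(score(stats, 2.0, 100.0)); // document of average length
        System.out.println(score(stats, 2.0, 400.0)); // same freq, longer document
    }
}
```

The normalization shrinks the raw frequency for documents longer than average, so the same `freq` yields a lower score in a longer document.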
explain
protected void explain(Explanation expl, BasicStats stats, int doc, float freq, float docLen)
Description copied from class: SimilarityBase
Subclasses should implement this method to explain the score. expl already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to explain details of their scoring formulae. The default implementation does nothing.
Overrides:
explain in class SimilarityBase
Parameters:
- expl - the explanation to extend with details.
- stats - the corpus level statistics.
- doc - the document id.
- freq - the term frequency.
- docLen - the document length.
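Because expl already holds the score, an override only needs to append one clause per component. The sketch below does this with a hypothetical minimal `Node` tree standing in for Lucene's `Explanation`. The choice of clauses (the boost when it differs from 1, the normalized term frequency, $\lambda_w$, and a log-logistic distribution score) and the rendering format are assumptions for illustration, not a description of the real output.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical explanation tree and explain step (not Lucene's Explanation API). */
final class IbExplainSketch {

    static final class Node {
        final double value;
        final String description;
        final List<Node> details = new ArrayList<>();

        Node(double value, String description) {
            this.value = value;
            this.description = description;
        }

        void addDetail(Node child) {
            details.add(child);
        }

        /** Renders the tree, indenting each nested level by two spaces. */
        String render(String indent) {
            StringBuilder sb = new StringBuilder(indent + value + " = " + description + "\n");
            for (Node child : details) {
                sb.append(child.render(indent + "  "));
            }
            return sb.toString();
        }
    }

    /** Appends one clause per IB component to an explanation that already holds the score. */
    static void explain(Node expl, double boost, double tfn, double lambda) {
        if (boost != 1.0) {
            expl.addDetail(new Node(boost, "boost"));
        }
        expl.addDetail(new Node(tfn, "tfn, normalized term frequency"));
        expl.addDetail(new Node(lambda, "lambda"));
        expl.addDetail(new Node(-Math.log(lambda / (tfn + lambda)), "log-logistic distribution score"));
    }

    public static void main(String[] args) {
        double boost = 1.0, tfn = 2.0, lambda = 0.01;
        double score = boost * -Math.log(lambda / (tfn + lambda));
        Node expl = new Node(score, "score(doc=0, freq=2.0)");
        explain(expl, boost, tfn, lambda);
        System.out.print(expl.render(""));
    }
}
```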
toString
public String toString()
The names of IB methods follow the pattern IB <distribution> <lambda><normalization>. The name of the distribution is the same as in the original paper; for the names of lambda parameters, refer to the javadoc of the Lambda classes.
Specified by:
toString in class SimilarityBase

getDistribution
public Distribution getDistribution()
Returns the distribution.

getLambda
public Lambda getLambda()
Returns the distribution's lambda parameter.

getNormalization
public Normalization getNormalization()
Returns the term frequency normalization.