Class IndriDirichletSimilarity
- java.lang.Object
- org.apache.lucene.search.similarities.Similarity
- org.apache.lucene.search.similarities.SimilarityBase
- org.apache.lucene.search.similarities.LMSimilarity
- org.apache.lucene.search.similarities.IndriDirichletSimilarity
public class IndriDirichletSimilarity extends LMSimilarity
Bayesian smoothing using Dirichlet priors as implemented in the Indri search engine (http://www.lemurproject.org/indri.php). Indri Dirichlet smoothing:
P(t|E) = (tf_E + mu*P(t|D)) / (documentLength + documentMu)
where P(t|D) = (mu*P(t|C) + tf_D) / (doclen + mu)
A larger value for mu produces more smoothing. Smoothing is most important for short documents, where the probabilities are more granular.
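To make the effect of mu concrete, here is a minimal sketch of the document-level estimate P(t|D) written as plain Java with no Lucene dependency. The class name, method name, and all input values are hypothetical and chosen only for illustration; this is not the library's implementation.

```java
final class DirichletSmoothingSketch {

    /**
     * Document-level estimate as documented for this class:
     * P(t|D) = (mu * P(t|C) + tf_D) / (doclen + mu).
     */
    static double smoothedProbability(double tfD, double docLen,
                                      double pTermInCollection, double mu) {
        return (mu * pTermInCollection + tfD) / (docLen + mu);
    }

    public static void main(String[] args) {
        double pC = 0.001;  // hypothetical collection probability P(t|C)
        double mu = 2000.0; // default mu stated in the constructor docs

        // Both documents contain the term at a 10% relative frequency,
        // but they differ greatly in length.
        double shortDoc = smoothedProbability(1, 10, pC, mu);
        double longDoc = smoothedProbability(1000, 10000, pC, mu);

        System.out.println("short document: " + shortDoc);
        System.out.println("long document:  " + longDoc);
    }
}
```

With mu much larger than the document length, the short document's estimate stays close to P(t|C), while the long document's estimate stays closer to its observed relative frequency. Setting mu to 0 reduces the estimate to tf_D / doclen.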
-
-
Nested Class Summary
Nested Classes
Modifier and Type - Class - Description
static class - IndriDirichletSimilarity.IndriCollectionModel - Models p(w|C) as the number of occurrences of the term in the collection, divided by the total number of tokens + 1.
-
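The collection model description can be sketched as a single division. The sketch below reads the description literally, as occurrences divided by (total tokens + 1); that reading is an assumption, and the actual IndriCollectionModel may compute the value differently. The class name, method name, and counts are hypothetical.

```java
final class CollectionModelSketch {

    /**
     * p(w|C) under a literal reading of the description:
     * occurrences of the term in the collection / (total tokens + 1).
     * This reading is an assumption, not a confirmed library formula.
     */
    static double collectionProbability(long totalTermFreq, long numberOfFieldTokens) {
        return (double) totalTermFreq / (numberOfFieldTokens + 1.0);
    }

    public static void main(String[] args) {
        // Hypothetical counts: the term occurs 50 times among 999,999 tokens.
        System.out.println(collectionProbability(50, 999_999));
    }
}
```

Adding 1 to the denominator keeps the division defined even for an empty field.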
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.LMSimilarity
LMSimilarity.CollectionModel, LMSimilarity.DefaultCollectionModel, LMSimilarity.LMStats
-
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
Similarity.SimScorer
-
-
Field Summary
-
Fields inherited from class org.apache.lucene.search.similarities.LMSimilarity
collectionModel
-
-
Constructor Summary
Constructors
Constructor - Description
IndriDirichletSimilarity() - Instantiates the similarity with the default μ value of 2000.
IndriDirichletSimilarity(float mu) - Instantiates the similarity with the provided μ parameter.
IndriDirichletSimilarity(LMSimilarity.CollectionModel collectionModel) - Instantiates the similarity with the default μ value of 2000.
IndriDirichletSimilarity(LMSimilarity.CollectionModel collectionModel, boolean discountOverlaps, float mu) - Instantiates the similarity with the provided parameters.
IndriDirichletSimilarity(LMSimilarity.CollectionModel collectionModel, float mu) - Instantiates the similarity with the provided μ parameter.
-
Method Summary
Modifier and Type - Method - Description
protected void - explain(List<Explanation> subs, BasicStats stats, double freq, double docLen) - Subclasses should implement this method to explain the score.
float - getMu() - Returns the μ parameter.
String - getName() - Returns the name of the LM method.
protected double - score(BasicStats stats, double freq, double docLen) - Scores the document doc.
-
Methods inherited from class org.apache.lucene.search.similarities.LMSimilarity
fillBasicStats, newStats, toString
-
Methods inherited from class org.apache.lucene.search.similarities.SimilarityBase
explain, log2, scorer
-
Methods inherited from class org.apache.lucene.search.similarities.Similarity
computeNorm, getDiscountOverlaps
-
-
-
-
Constructor Detail
-
IndriDirichletSimilarity
public IndriDirichletSimilarity(LMSimilarity.CollectionModel collectionModel, boolean discountOverlaps, float mu)
Instantiates the similarity with the provided parameters.
-
IndriDirichletSimilarity
public IndriDirichletSimilarity(LMSimilarity.CollectionModel collectionModel, float mu)
Instantiates the similarity with the provided μ parameter.
-
IndriDirichletSimilarity
public IndriDirichletSimilarity(float mu)
Instantiates the similarity with the provided μ parameter.
-
IndriDirichletSimilarity
public IndriDirichletSimilarity(LMSimilarity.CollectionModel collectionModel)
Instantiates the similarity with the default μ value of 2000.
-
IndriDirichletSimilarity
public IndriDirichletSimilarity()
Instantiates the similarity with the default μ value of 2000.
-
-
Method Detail
-
score
protected double score(BasicStats stats, double freq, double docLen)
Description copied from class: SimilarityBase
Scores the document doc. Subclasses must apply their scoring formula in this method.
- Specified by:
score in class SimilarityBase
- Parameters:
stats - the corpus level statistics.
freq - the term frequency.
docLen - the document length.
- Returns:
the score.
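As a rough illustration of how the three arguments could combine, the sketch below mirrors the shape of score(stats, freq, docLen) in plain Java. It assumes the score is the natural logarithm of the Dirichlet-smoothed probability, which is common for language-model scorers but is not stated in this documentation. The collection probability that would normally come from stats is passed in directly, and every name and value here is hypothetical.

```java
final class IndriScoreSketch {

    /**
     * Hypothetical stand-in for score(stats, freq, docLen).
     * collectionProbability plays the role of the value read from stats.
     * Taking the natural log is an assumption of this sketch.
     */
    static double score(double collectionProbability, double mu,
                        double freq, double docLen) {
        double smoothed = (freq + mu * collectionProbability) / (docLen + mu);
        return Math.log(smoothed);
    }

    public static void main(String[] args) {
        double pC = 0.0005; // hypothetical P(t|C)
        double mu = 2000.0; // default mu from the constructor docs
        for (double freq : new double[] {0, 1, 5}) {
            System.out.println("freq=" + freq + " score=" + score(pC, mu, freq, 300));
        }
    }
}
```

Under these assumptions the scores are negative, because they are logarithms of probabilities below 1, and they rise with the term frequency. A term absent from the document still receives a finite score as long as its collection probability is positive.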
-
explain
protected void explain(List<Explanation> subs, BasicStats stats, double freq, double docLen)
Description copied from class: SimilarityBase
Subclasses should implement this method to explain the score. expl already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to explain details of their scoring formulae.
The default implementation does nothing.
- Overrides:
explain in class LMSimilarity
- Parameters:
subs - the list of details of the explanation to extend
stats - the corpus level statistics.
freq - the term frequency.
docLen - the document length.
-
getMu
public float getMu()
Returns the μ parameter.
-
getName
public String getName()
Description copied from class: LMSimilarity
Returns the name of the LM method. The values of the parameters should be included as well.
Used in LMSimilarity.toString().
- Specified by:
getName in class LMSimilarity
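One way to satisfy the "include the parameter values" requirement is to format μ into the returned name. The sketch below is a standalone illustration: the label "IndriDirichlet" and the number format are assumptions, and the exact string the library returns is not given here.

```java
import java.util.Locale;

final class LmNameSketch {

    /** Builds a method name that embeds the mu parameter (hypothetical format). */
    static String name(float mu) {
        // Locale.ROOT keeps the decimal separator stable across default locales.
        return String.format(Locale.ROOT, "IndriDirichlet(%f)", mu);
    }

    public static void main(String[] args) {
        System.out.println(name(2000f)); // prints "IndriDirichlet(2000.000000)"
    }
}
```

Embedding the parameter makes two similarities with different μ values distinguishable in toString() output and in logs.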
-
-