{{Underlinked|date=September 2024}}
In the field of [[information retrieval]], '''divergence from randomness''' ('''DFR''') is a type of probabilistic model in which the weight of a term in a document is computed by measuring how much the term's distribution in the document diverges from a distribution produced by a random process.
==Definition==
The divergence from randomness model assigns a term ''t'' in a document ''d'' a weight based on the probability of its frequency under a model ''M'' of randomness:
<math>\text{weight}(t|d) = k \cdot \text{Prob}_M(t \in d \mid \text{Collection})</math>
# ''M'' represents the type of model of randomness employed to calculate the probability.
# ''d'' is the total number of words in the document.
# ''t'' is the number of occurrences of a specific word in ''d''.
# ''k'' is defined by ''M''.
It is possible to use different [[urn problem|urn models]] to choose the appropriate model ''M'' of randomness.
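The weighting idea above can be illustrated numerically. The sketch below assumes a binomial model of randomness for ''M'' (one hypothetical choice; the function name and parameters are illustrative, not part of any standard library): the less probable a term's observed frequency is under randomness, the more informative the term.

```python
import math

def dfr_weight(tf, doc_len, term_coll_freq, coll_len):
    """Sketch of a DFR-style weight under a binomial model of randomness."""
    # The term's prior probability, estimated from the whole collection.
    p = term_coll_freq / coll_len
    # Binomial probability of observing exactly tf occurrences
    # of the term in doc_len independent trials.
    prob = math.comb(doc_len, tf) * p**tf * (1 - p)**(doc_len - tf)
    # A frequency that is unlikely under pure randomness carries
    # more information, so the weight is the negative log-probability.
    return -math.log2(prob)
```

For example, a term that occurs five times in a 100-word document, while being rare in the collection, receives a much higher weight than a term occurring once, because five occurrences are far less likely under the random model.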
== Probability space ==
Utility-Theoretic Indexing, developed by Cooper and Maron, is a theory of indexing based on [[Utility|utility theory]]. [[Index term|Index terms]] are assigned to documents so as to reflect the 'value' that users are expected to derive from them. The [[probability distribution]] assigns probabilities to all sets of terms in the vocabulary.
In [[information retrieval]], the term ''experiment'' alludes to the notion that a document can be treated as a sequence of outcomes, or simply as a [[Statistical population|sample of a population]]. Each word occurrence is a 'trial'; the trials can be assumed to be [[Independence (probability theory)|independent of each other]], with the probability distribution over the vocabulary being the [[Bernoulli process|same]] for every trial.
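Under this independence assumption, the probability of a document is simply the product of the probabilities of its individual words, each drawn from the same distribution over the vocabulary. A minimal sketch (the function name and the toy distribution are illustrative assumptions):

```python
import math

def document_log_prob(doc_tokens, term_probs):
    """Log-probability of a document under the independence assumption:
    each word occurrence is an independent trial from the same
    distribution over the vocabulary, so probabilities multiply
    (log-probabilities add)."""
    return sum(math.log(term_probs[t]) for t in doc_tokens)

# Toy vocabulary distribution (hypothetical numbers).
probs = {"a": 0.5, "b": 0.3, "c": 0.2}
# P(["a", "b", "a"]) = 0.5 * 0.3 * 0.5 = 0.075
log_p = document_log_prob(["a", "b", "a"], probs)
```

Working in log-space avoids numerical underflow for long documents, where a product of many small probabilities would round to zero.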
==References==
<references />
=== General references ===
*Amati, Giambattista (2003). [https://theses.gla.ac.uk/1570/1/2003amatiphd.pdf Probabilistic Models of Information Retrieval Based on Measuring the Divergence from Randomness]. [[University of Glasgow]].
==External links==
*[http://terrier.org/docs/v3.5/dfr_description.html Description of the DFR framework in the Terrier documentation]
[[Category:Ranking functions]]