Article

Understanding inverse document frequency: on theoretical arguments for IDF

Journal

JOURNAL OF DOCUMENTATION
Volume 60, Issue 5, Pages 503-520

Publisher

EMERALD GROUP PUBLISHING LTD
DOI: 10.1108/00220410410560582

Keywords

information theory; probabilistic analysis; modelling; text retrieval

The term-weighting function known as IDF was proposed in 1972, and has since been extremely widely used, usually as part of a TF*IDF function. It is often described as a heuristic, and many papers have been written (some based on Shannon's Information Theory) seeking to establish some theoretical basis for it. Some of these attempts are reviewed, and it is shown that the Information Theory approaches are problematic, but that there are good theoretical justifications of both IDF and TF*IDF in the traditional probabilistic model of information retrieval.
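
The abstract does not spell out a formula, so the following is only a minimal sketch for orientation, assuming the common textbook form idf(t) = log(N / n_t), where N is the number of documents in the collection and n_t is the number of documents containing term t. The function names, the toy collection, and the use of raw term counts for TF are illustrative assumptions, not the paper's own formulation.

```python
import math
from collections import Counter


def idf(num_docs, doc_freq):
    """Assumed textbook IDF: log of collection size over document frequency."""
    return math.log(num_docs / doc_freq)


def tf_idf(docs):
    """Return one {term: weight} dict per document, using raw counts as TF."""
    n = len(docs)
    # Document frequency: the number of documents each term occurs in.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return [
        {term: count * idf(n, df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


# Hypothetical three-document collection, already tokenised.
docs = [
    ["inverse", "document", "frequency"],
    ["document", "retrieval"],
    ["term", "weighting", "document"],
]
weights = tf_idf(docs)
```

In this toy collection "document" occurs in every document, so its weight is log(3/3) = 0, while a term confined to a single document, such as "retrieval", receives log(3). Terms that are rare across the collection are weighted up, which is the behaviour the theoretical arguments reviewed in the paper set out to justify.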
