Article

Interpreting TF-IDF term weights as making relevance decisions

ACM TRANSACTIONS ON INFORMATION SYSTEMS (2008)

Journal

ACM TRANSACTIONS ON INFORMATION SYSTEMS

Volume 26, Issue 3, Pages -

Publisher

ASSOC COMPUTING MACHINERY

DOI: 10.1145/1361684.1361686

Keywords

design; experimentation; languages; performance; information retrieval; term weight; relevance decision

Category

Computer Science, Information Systems


Abstract

A novel probabilistic retrieval model is presented. It forms a basis for interpreting the TF-IDF term weights as making relevance decisions. It simulates the local relevance decision-making for every location of a document, and combines all of these local relevance decisions into the document-wide relevance decision. The significance of interpreting TF-IDF in this way is the potential to: (1) establish a unifying perspective about information retrieval as relevance decision-making; and (2) develop advanced TF-IDF-related term weights for future elaborate retrieval models. Our novel retrieval model is simplified to a basic ranking formula that directly corresponds to the TF-IDF term weights. In general, we show that the term-frequency factor of the ranking formula can be rendered into different term-frequency factors of existing retrieval systems. In the basic ranking formula, the remaining quantity $-\log p(\bar{r} \mid t \in d)$ is the negative logarithm of the probability of randomly picking a nonrelevant usage (denoted by $\bar{r}$) of term $t$. Mathematically, we show that this quantity can be approximated by the inverse document-frequency (IDF). Empirically, we show that this quantity is related to IDF, using four reference TREC ad hoc retrieval data collections.
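
For readers unfamiliar with the weighting scheme the abstract refers to, the following is a minimal sketch of a classic TF-IDF ranking, in which each query term contributes its raw term frequency times $\log(N/\mathrm{df})$. It is not the paper's probabilistic model: the function names `idf` and `tf_idf_score`, the toy collection, and the choice of raw counts and an unsmoothed logarithm are all assumptions made for illustration. The abstract only states that the quantity $-\log p(\bar{r} \mid t \in d)$ can be approximated by an IDF of this general kind.

```python
import math
from collections import Counter


def idf(term, docs):
    """Classic IDF, log(N / df); hypothetical helper, not taken from the paper."""
    df = sum(1 for d in docs if term in d)  # number of documents containing the term
    return math.log(len(docs) / df) if df else 0.0


def tf_idf_score(query, doc, docs):
    """Sum of (raw term frequency * IDF) over the query terms."""
    counts = Counter(doc)
    return sum(counts[t] * idf(t, docs) for t in query)


# Toy collection (illustrative data only): each document is a list of tokens.
docs = [
    ["term", "weight", "retrieval"],
    ["retrieval", "model", "retrieval"],
    ["relevance", "decision"],
]
query = ["retrieval", "model"]

scores = [tf_idf_score(query, d, docs) for d in docs]
# Document indices ordered from highest to lowest score.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

In this toy collection the second document ranks first because it contains "retrieval" twice as well as the rarer term "model", while the third document matches no query term and scores zero. Real systems typically replace the raw count with a dampened or length-normalized term-frequency factor, which is the part the abstract says can be rendered into the factors used by existing retrieval systems.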
