Article

Joint upper & expected value normalization for evaluation of retrieval systems: A case study with Learning-to-Rank methods

INFORMATION PROCESSING & MANAGEMENT (2023)

Journal

INFORMATION PROCESSING & MANAGEMENT

Volume 60, Issue 4

Publisher

ELSEVIER SCI LTD

DOI: 10.1016/j.ipm.2023.103404

Keywords

Information retrieval evaluation; Upper expected value; Normalization; Learning to Rank

Categories

Computer Science, Information Systems; Information Science & Library Science

AI Summary

This paper introduces a new approach for information retrieval evaluation metrics that combines upper bound normalization and expected value normalization. Two case studies demonstrate the advantages of this new approach compared to traditional methods. Experimental results show that the proposed expected value normalized metrics have better discriminatory power and consistency, suggesting that the IR community should seriously consider expected value normalization when computing nDCG and MAP.

Abstract

While the original IR evaluation metrics are normalized by their upper bounds, computed from an ideal ranked list, a corresponding expected value normalization for them has not yet been studied. We present a framework with both upper and expected value normalization, where the expected value is estimated from a randomized ranking of the corresponding documents present in the evaluation set. We then conduct two case studies, instantiating the new framework for two popular IR evaluation metrics (i.e., nDCG and MAP) and comparing them against the traditional metrics. Experiments on two Learning-to-Rank (LETOR) benchmark data sets, MSLR-WEB30K (30K queries and 3,771K documents) and MQ2007 (1,700 queries and 60K documents), with eight LETOR methods (pairwise and listwise), demonstrate the following properties of the new expected value normalized metrics: (1) Some differences between two methods that are statistically significant under the original metric are no longer significant under the Upper Expected (UE) normalized version, and vice versa, especially for uninformative query sets. (2) Compared with the original metrics, the proposed UE-normalized metrics increase Discriminatory Power by an average of 23% on MSLR-WEB30K and 19% on MQ2007. We observe similar improvements in consistency; for example, UE-normalized MAP reduces the swap rate by 28% when comparing across data sets and by 26% when comparing across query sets within the same data set. These findings suggest that the IR community should seriously consider UE normalization when computing nDCG and MAP, and that a more in-depth study of UE normalization for general IR evaluation is warranted.
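The abstract does not state the exact UE formula, so the following is only a minimal sketch under an assumed form: UE(m) = (m - E[m]) / (m_ideal - E[m]), where m_ideal is the metric on the ideal ranking (the usual upper bound) and E[m] is approximated by averaging m over uniformly random orderings of the same judged documents. The function names `dcg`, `average_precision`, `expected_random`, and `ue_normalize` are hypothetical, and the gain 2^rel - 1 with a log2 discount is a common nDCG convention, not necessarily the one used in the paper.

```python
import math
import random
from functools import partial
from typing import Callable, List, Optional, Sequence

Metric = Callable[[List[float]], float]

def dcg(rels: Sequence[float], k: Optional[int] = None) -> float:
    """DCG with gain 2^rel - 1 and discount 1 / log2(rank + 1), ranks starting at 1."""
    top = list(rels)[:k]  # a slice with k=None keeps the whole list
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(top))

def average_precision(rels: Sequence[float]) -> float:
    """AP over the full ranked list; any label > 0 counts as relevant."""
    hits, total = 0, 0.0
    for i, r in enumerate(rels):
        if r > 0:
            hits += 1
            total += hits / (i + 1)  # precision at this relevant rank
    return total / hits if hits else 0.0

def expected_random(metric: Metric, rels: Sequence[float],
                    n_samples: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[metric] under a uniformly random ordering of rels."""
    rng = random.Random(seed)
    pool = list(rels)
    acc = 0.0
    for _ in range(n_samples):
        rng.shuffle(pool)  # in-place uniform random permutation
        acc += metric(pool)
    return acc / n_samples

def ue_normalize(metric: Metric, rels: Sequence[float],
                 n_samples: int = 1000, seed: int = 0) -> Optional[float]:
    """Assumed UE form: (observed - expected) / (ideal - expected).

    rels holds relevance labels in the order the system ranked the documents.
    Returns None when ideal == expected (e.g. all labels equal): such a query
    cannot separate good from random rankings. Skipping it is an assumption.
    """
    observed = metric(list(rels))
    ideal = metric(sorted(rels, reverse=True))  # upper bound: ideal ranking
    expected = expected_random(metric, rels, n_samples, seed)
    denom = ideal - expected
    if abs(denom) < 1e-9:
        return None
    return (observed - expected) / denom

if __name__ == "__main__":
    run = [2, 0, 1, 0, 0, 1]  # hypothetical graded labels, in system order
    print("nDCG@10   ", dcg(run, k=10) / dcg(sorted(run, reverse=True), k=10))
    print("UE-nDCG@10", ue_normalize(partial(dcg, k=10), run))
    print("AP        ", average_precision(run))
    print("UE-AP     ", ue_normalize(average_precision, run))
```

Under this assumed form, an ideal ranking scores 1, a ranking no better than random scores about 0, and a worse-than-random ranking goes negative, which plain nDCG and AP do not show. For DCG the random expectation also has a closed form, the mean gain over all judged documents times the sum of the position discounts, which is a useful check on the sampler.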