Article

Which noise affects algorithm robustness for learning to rank

Journal

INFORMATION RETRIEVAL JOURNAL
Volume 18, Issue 3, Pages 215-245

Publisher

SPRINGER
DOI: 10.1007/s10791-015-9253-3

Keywords

Learning to rank; Label noise; Robust data

Funding

  1. 973 Program of China [2014CB340401, 2012CB316303]
  2. 863 Program of China [2014AA015204]
  3. National Natural Science Foundation of China [61472401, 61203298]
  4. National Key Technology R&D Program of China [2012BAH46B04]


When learning to rank algorithms are applied in real search applications, noise in human-labeled training data is an inevitable problem that affects the performance of the algorithms. Previous work has mainly focused on how noise affects ranking algorithms and how to design robust ranking algorithms. In our work, we investigate what inherent characteristics make training data robust to label noise and how to use them to guide labeling. Our work is motivated by an interesting observation: the same ranking algorithm may show very different sensitivities to label noise on different data sets. We therefore investigate the underlying reason for this observation using three typical kinds of learning to rank algorithms (i.e., pointwise, pairwise and listwise methods) and three public data sets (i.e., OHSUMED, TD2003 and MSLR-WEB10K) with different properties. We find that when label noise increases in the training data, it is the document pair noise ratio (referred to as pNoise), rather than the document noise ratio (referred to as dNoise), that explains the performance degradation of a ranking algorithm well. We further identify two inherent characteristics of the training data, namely relevance levels and label balance, that have a great impact on how pNoise varies with label noise (i.e., dNoise). Based on these results, we discuss some guidelines on labeling strategies for constructing robust training data for learning to rank algorithms in practice.
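The distinction between dNoise and pNoise can be made concrete with a small sketch. The abstract does not give formal definitions, so the code below assumes one plausible reading: dNoise is the fraction of documents in a query whose label differs from the true label, and pNoise is the fraction of document pairs whose preference relation (better, worse, or tied) differs between the true and the noisy labels. The function names `d_noise` and `p_noise` and the toy label lists are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def d_noise(true_labels, noisy_labels):
    """Assumed definition: fraction of documents whose label was changed."""
    changed = sum(t != n for t, n in zip(true_labels, noisy_labels))
    return changed / len(true_labels)


def p_noise(true_labels, noisy_labels):
    """Assumed definition: fraction of document pairs whose preference
    relation (>, <, or =) differs between true and noisy labels."""
    pairs = list(combinations(range(len(true_labels)), 2))
    changed = sum(
        sign(true_labels[i] - true_labels[j])
        != sign(noisy_labels[i] - noisy_labels[j])
        for i, j in pairs
    )
    return changed / len(pairs)


# Toy query with two relevance levels; one document is mislabeled.
binary_true = [1, 1, 0, 0]
binary_noisy = [0, 1, 0, 0]

# Toy query with three relevance levels; one document is mislabeled
# by one grade.
graded_true = [2, 1, 0, 0]
graded_noisy = [1, 1, 0, 0]

print(d_noise(binary_true, binary_noisy), p_noise(binary_true, binary_noisy))
print(d_noise(graded_true, graded_noisy), p_noise(graded_true, graded_noisy))
```

Under these assumed definitions, both toy queries have a dNoise of 0.25 (one of four labels changed), yet pNoise is 0.5 for the two-level list and about 0.17 for the three-level list. This only illustrates that the same dNoise can correspond to different pNoise values depending on the label structure; it does not reproduce the paper's experiments.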
