Article

Agreement, the F-measure, and reliability in information retrieval

Publisher

HANLEY & BELFUS INC
DOI: 10.1197/jamia.M1733

Keywords

-

Funding

  1. NLM NIH HHS [R01 LM06919, N01 LM07079] Funding Source: Medline

Abstract

Information retrieval studies that involve searching the Internet or marking phrases usually lack a well-defined number of negative cases. This prevents the use of traditional interrater reliability metrics such as the κ statistic to assess the quality of expert-generated gold standards. Such studies often quantify system performance as precision, recall, and F-measure, or as agreement. It can be shown that the average F-measure among pairs of experts is numerically identical to the average positive specific agreement among experts and that κ approaches these measures as the number of negative cases grows large. Positive specific agreement, or the equivalent F-measure, may be an appropriate way to quantify interrater reliability and therefore to assess the reliability of a gold standard in these studies.
