Journal
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION
Volume 12, Issue 3, Pages 296-298
Publisher
HANLEY & BELFUS INC
DOI: 10.1197/jamia.M1733
Funding
- NLM NIH HHS [R01 LM06919, N01 LM07079] Funding Source: Medline
Information retrieval studies that involve searching the Internet or marking phrases usually lack a well-defined number of negative cases. This prevents the use of traditional interrater reliability metrics like the κ statistic to assess the quality of expert-generated gold standards. Such studies often quantify system performance as precision, recall, and F-measure, or as agreement. It can be shown that the average F-measure among pairs of experts is numerically identical to the average positive specific agreement among experts, and that κ approaches these measures as the number of negative cases grows large. Positive specific agreement, or the equivalent F-measure, may therefore be an appropriate way to quantify interrater reliability and to assess the reliability of a gold standard in these studies.
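The relationship stated in the abstract can be checked numerically for a single pair of raters. The sketch below is illustrative only and is not taken from the paper: it assumes a 2x2 agreement table whose cell names (a = both raters positive, b = only rater 1 positive, c = only rater 2 positive, d = both negative) and function names are hypothetical, and it uses the standard two-rater Cohen formula for κ.

```python
def positive_specific_agreement(a, b, c):
    """Positive specific agreement: 2a / (2a + b + c). Does not use d."""
    return 2 * a / (2 * a + b + c)


def f_measure(a, b, c):
    """Balanced F-measure of rater 2 scored against rater 1 as the reference."""
    precision = a / (a + c)  # rater 2's positives that rater 1 also marked
    recall = a / (a + b)     # rater 1's positives that rater 2 also marked
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(a, b, c, d):
    """Cohen's kappa for two raters; requires the negative count d."""
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    a, b, c = 40, 10, 5  # made-up counts for illustration
    print("PSA:", positive_specific_agreement(a, b, c))
    print("F  :", f_measure(a, b, c))
    # Kappa moves toward the same value as the negative count d grows.
    for d in (0, 100, 10_000, 1_000_000):
        print("kappa, d =", d, ":", cohen_kappa(a, b, c, d))
```

Because neither positive specific agreement nor the F-measure depends on d, both remain computable when the number of negative cases is undefined, whereas κ cannot be computed without it.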