Article

Agreement, the F-measure, and reliability in information retrieval

Journal

Journal of the American Medical Informatics Association (JAMIA)

Publisher

Hanley & Belfus, Inc.
DOI: 10.1197/jamia.M1733

Keywords

-

Funding

  1. NLM NIH HHS [R01 LM06919, N01 LM07079] Funding Source: Medline

Abstract

Information retrieval studies that involve searching the Internet or marking phrases usually lack a well-defined number of negative cases. This prevents the use of traditional interrater reliability metrics such as the κ statistic to assess the quality of expert-generated gold standards. Such studies often quantify system performance as precision, recall, and F-measure, or as agreement. It can be shown that the average F-measure among pairs of experts is numerically identical to the average positive specific agreement among experts, and that κ approaches these measures as the number of negative cases grows large. Positive specific agreement, or the equivalent F-measure, may be an appropriate way to quantify interrater reliability and therefore to assess the reliability of a gold standard in these studies.
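
As an illustration of the identity stated above, the following is a minimal Python sketch, not taken from the paper: the counts a (items both raters mark positive), b and c (items only one rater marks positive), d (items neither marks positive), the function names, and the example values are all assumptions made here for demonstration. For a single pair of raters, the F-measure computed by treating one rater as the gold standard simplifies to 2a/(2a + b + c), which is exactly positive specific agreement, and Cohen's kappa tends toward the same quantity as d grows large.

    def f_measure(a, b, c):
        # Precision a/(a+b) and recall a/(a+c), treating one rater as the
        # gold standard; their harmonic mean simplifies to 2a/(2a + b + c).
        precision = a / (a + b)
        recall = a / (a + c)
        return 2 * precision * recall / (precision + recall)

    def positive_specific_agreement(a, b, c):
        # Proportion of specific (positive) agreement between the two raters.
        return 2 * a / (2 * a + b + c)

    def kappa(a, b, c, d):
        # Cohen's kappa over the full 2x2 table, which requires the count of
        # negative cases d.
        n = a + b + c + d
        p_observed = (a + d) / n
        p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
        return (p_observed - p_expected) / (1 - p_expected)

    a, b, c = 40, 10, 15                         # illustrative counts
    print(f_measure(a, b, c))                    # 0.7619...
    print(positive_specific_agreement(a, b, c))  # 0.7619..., identical to F
    for d in (10, 100, 10_000, 1_000_000):
        print(d, kappa(a, b, c, d))              # climbs toward 0.7619... as d grows

With these counts, the F-measure and positive specific agreement are both 80/105, about 0.762, while kappa rises from about 0.21 at d = 10 toward 0.762 as d becomes large, which is the limiting behavior described in the abstract.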

Authors

George Hripcsak, Adam S. Rothschild
