Article

An Empirical Comparative Assessment of Inter-Rater Agreement of Binary Outcomes and Multiple Raters

Journal

SYMMETRY-BASEL
Volume 14, Issue 2, Pages: -

Publisher

MDPI
DOI: 10.3390/sym14020262

Keywords

inter-rater agreement; inter-rater reliability; observer agreement; Kappa; AC1; Kappa Paradox; meta-analysis; evidence synthesis


This study evaluated the performance of four commonly used inter-rater agreement statistics in the context of multiple raters. The expected values of all four statistics were equal when the outcome prevalence was symmetric, but only the expected values of the three Kappa statistics were equal when the outcome prevalence was asymmetric. Fleiss' Kappa yielded a higher variance in the symmetric case, while Gwet's AC1 yielded a lower variance in the asymmetric case. The authors suggest favoring Gwet's AC1 statistic when the population-level prevalence of outcomes is unknown, and conducting transformations between statistics for direct comparisons between inter-rater agreement measures.
Background: Many methods under the umbrella of inter-rater agreement (IRA) have been proposed to evaluate how well two or more medical experts agree on a set of outcomes. The objective of this work was to assess key IRA statistics in the context of multiple raters with binary outcomes.

Methods: We simulated the responses of several raters (2-5) with 20, 50, 300, and 500 observations. For each combination of raters and observations, we estimated the expected value and variance of four commonly used inter-rater agreement statistics (Fleiss' Kappa, Light's Kappa, Conger's Kappa, and Gwet's AC1).

Results: In the case of equal outcome prevalence (symmetric), the estimated expected values of all four statistics were equal. In the asymmetric case, only the estimated expected values of the three Kappa statistics were equal. In the symmetric case, Fleiss' Kappa yielded a higher estimated variance than the other three statistics. In the asymmetric case, Gwet's AC1 yielded a lower estimated variance than the three Kappa statistics for each scenario.

Conclusion: Since the population-level prevalence of a set of outcomes may not be known a priori, Gwet's AC1 statistic should be favored over the three Kappa statistics. For meaningful direct comparisons between IRA measures, transformations between statistics should be conducted.
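The statistics compared in the abstract share the same observed-agreement term and differ only in how they model chance agreement. As a minimal illustrative sketch (not the authors' simulation code), Fleiss' Kappa and Gwet's AC1 for multiple raters can be computed from a subjects-by-categories count matrix; the function name `agreement_stats` is a placeholder for this example:

```python
import numpy as np

def agreement_stats(counts):
    """Fleiss' Kappa and Gwet's AC1 from an (n_subjects x k_categories)
    count matrix, where counts[i, j] is the number of raters who assigned
    subject i to category j. Assumes every subject has the same number
    of raters."""
    counts = np.asarray(counts, dtype=float)
    n, k = counts.shape
    r = counts[0].sum()  # raters per subject (assumed constant)
    # Observed agreement: average pairwise agreement across subjects
    p_obs = ((counts ** 2).sum(axis=1) - r).mean() / (r * (r - 1))
    # Marginal category prevalences
    pi = counts.sum(axis=0) / (n * r)
    # Chance agreement under each statistic's model
    pe_kappa = (pi ** 2).sum()                    # Fleiss' Kappa
    pe_ac1 = (pi * (1 - pi)).sum() / (k - 1)      # Gwet's AC1
    kappa = (p_obs - pe_kappa) / (1 - pe_kappa)
    ac1 = (p_obs - pe_ac1) / (1 - pe_ac1)
    return kappa, ac1

# Binary outcomes, 3 raters, 4 subjects
kappa, ac1 = agreement_stats([[3, 0], [0, 3], [3, 0], [1, 2]])
```

The AC1 chance term shrinks toward zero as prevalence becomes asymmetric (pi near 0 or 1), which is why AC1 avoids the "Kappa paradox" of low Kappa values despite high observed agreement.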
