Article

A Meta-Analysis of Machine Learning-Based Science Assessments: Factors Impacting Machine-Human Score Agreements

Journal

JOURNAL OF SCIENCE EDUCATION AND TECHNOLOGY
Volume 30, Issue 3, Pages 361-379

Publisher

SPRINGER
DOI: 10.1007/s10956-020-09875-z

Keywords

Machine learning; Science assessment; Meta-analysis; Interrater reliability; Validity; Cohen's kappa; Artificial Intelligence

Funding

  1. National Science Foundation [DUE-1561159]

Abstract

This study conducted a meta-analysis of machine scoring in science assessment, identifying six factors that impact scoring success and showing that algorithm and subject domain have significant effects on scoring success.
Machine learning (ML) has been increasingly employed in science assessment to facilitate automatic scoring efforts, although with varying degrees of success (i.e., magnitudes of machine-human score agreements [MHAs]). Little work has empirically examined the factors that impact MHA disparities in this growing field, thus constraining the improvement of machine scoring capacity and its wide applications in science education. We performed a meta-analysis of 110 studies of MHAs in order to identify the factors most strongly contributing to scoring success (i.e., high Cohen's kappa [kappa]). We empirically examined six factors proposed as contributors to MHA magnitudes: algorithm, subject domain, assessment format, construct, school level, and machine supervision type. Our analyses of 110 MHAs revealed substantial heterogeneity in kappa (weighted mean = .64; range = .09-.97). Using three-level random-effects modeling, MHA score heterogeneity was explained by the variability both within publications (i.e., the assessment task level: 82.6%) and between publications (i.e., the individual study level: 16.7%). Our results also suggest that all six factors have significant moderator effects on scoring success magnitudes. Among these, algorithm and subject domain had significantly larger effects than the other factors, suggesting that technical features and assessment external features might be primary targets for improving MHAs and ML-based science assessments.
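The agreement metric the meta-analysis pools is Cohen's kappa, which corrects raw human-machine agreement for agreement expected by chance. A minimal stdlib sketch of the computation, using hypothetical human and machine scores on a 3-level rubric (illustrative data only, not drawn from the study):

```python
from collections import Counter

def cohens_kappa(human, machine):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(human) == len(machine) and human
    n = len(human)
    # Observed proportion of exact score agreements
    observed = sum(h == m for h, m in zip(human, machine)) / n
    # Chance agreement from each rater's marginal label frequencies
    ph, pm = Counter(human), Counter(machine)
    expected = sum((ph[c] / n) * (pm[c] / n) for c in set(human) | set(machine))
    return (observed - expected) / (1 - expected)

# Hypothetical rubric scores (0 = low, 2 = high) for ten responses
human   = [0, 1, 2, 2, 1, 0, 2, 1, 1, 0]
machine = [0, 1, 2, 1, 1, 0, 2, 2, 1, 0]
print(round(cohens_kappa(human, machine), 3))  # → 0.697
```

Here observed agreement is .80 but chance agreement is .34, giving kappa ≈ .70, i.e., slightly above the weighted mean MHA of .64 reported in the abstract.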
