Article

Human ratings take time: A hierarchical facets model for the joint analysis of ratings and rating times

Journal

BEHAVIOR RESEARCH METHODS
Volume -, Issue -, Pages -

Publisher

SPRINGER
DOI: 10.3758/s13428-023-02259-2

Keywords

Performance assessments; Onscreen rating; Rating time; Rasch facets models; Item response theory; Rater cognition

Assessments that use onscreen or internet-based technology for human ratings have the benefit of automatically recording rating times. The hierarchical facets model for ratings and rating times (HFM-RT) is proposed to incorporate rating times as an additional data source and thereby improve the quality of assessment outcomes. In a simulation study, the HFM-RT successfully recovered examinee and rater parameters and yielded superior reliability indices. In the real-data analysis, however, the gain in reliability was not salient, owing to the substantial heterogeneity of examinees' writing proficiency.
Performance assessments increasingly use onscreen or internet-based technology to collect human ratings. One benefit of onscreen rating is that rating times are recorded automatically along with the ratings. Treating rating times as an additional data source can provide a more detailed picture of the rating process and improve the psychometric quality of assessment outcomes. However, currently available models for analyzing performance assessments do not incorporate rating times. The present research aims to fill this gap by advancing a joint modeling approach, the hierarchical facets model for ratings and rating times (HFM-RT). The model includes two examinee parameters (ability and time intensity) and three rater parameters (severity, centrality, and speed). In a simulation study, the HFM-RT successfully recovered examinee and rater parameters and yielded superior reliability indices. A real-data analysis of English essay ratings collected in a high-stakes assessment context revealed that raters differed considerably in their speed measures, spent more time on high-quality than on low-quality essays, and tended to rate essays faster as their severity increased. However, owing to the substantial heterogeneity of examinees' writing proficiency, the HFM-RT did not yield a salient improvement in the assessment's reliability in the real-data example. The discussion focuses on the advantages of treating rating times as a source of information in rating quality studies and highlights perspectives the HFM-RT opens for future research on rater cognition.
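The abstract names the model's parameters but not its equations. To make the idea of jointly generating ratings and rating times more concrete, the following Python sketch simulates data under a simplified, hypothetical model. It is not the HFM-RT as specified in the paper and it omits the model's hierarchical structure. Assumptions: ratings follow a rating-scale facets model in which rater severity shifts the logits and rater centrality multiplies the step thresholds. Rating times are lognormal, with examinee time intensity raising the expected log time and rater speed lowering it. The function names `category_probs` and `simulate`, the parameter values, and the seconds scale are illustrative only.

```python
import math
import random


def category_probs(theta, severity, centrality, thresholds):
    """Category probabilities for one examinee-rater pair.

    Hypothetical rating-scale facets parameterization (an assumption, not the
    HFM-RT's published form): the logit for moving from category h-1 to h is
    theta - severity - centrality * thresholds[h-1]. Centrality > 1 spreads the
    thresholds apart, lowering the probability of the extreme categories.
    """
    logits = [0.0]  # category 0 is the reference category
    for tau in thresholds:
        logits.append(logits[-1] + theta - severity - centrality * tau)
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(value - top) for value in logits]
    total = sum(weights)
    return [w / total for w in weights]


def simulate(examinees, raters, thresholds, sigma=0.3, seed=1):
    """Simulate a fully crossed design of ratings and rating times.

    examinees: list of (ability, time_intensity) pairs.
    raters: list of (severity, centrality, speed) triples.
    Rating times follow an assumed lognormal model:
    log(time) = time_intensity - speed + Normal(0, sigma).
    """
    rng = random.Random(seed)
    records = []
    for n, (ability, intensity) in enumerate(examinees):
        for r, (severity, centrality, speed) in enumerate(raters):
            probs = category_probs(ability, severity, centrality, thresholds)
            rating = rng.choices(range(len(probs)), weights=probs)[0]
            log_time = intensity - speed + rng.gauss(0.0, sigma)
            records.append((n, r, rating, math.exp(log_time)))
    return records


if __name__ == "__main__":
    # Hypothetical parameter values; time intensity is on a log-seconds scale.
    examinees = [(-1.0, 4.0), (0.0, 4.2), (1.5, 4.5)]
    raters = [(0.5, 1.0, 0.2), (-0.3, 1.5, -0.1)]
    thresholds = [-1.5, 0.0, 1.5]  # four categories: 0, 1, 2, 3
    for examinee, rater, rating, seconds in simulate(examinees, raters, thresholds):
        print(f"examinee {examinee}, rater {rater}: rating {rating}, {seconds:.1f} s")
```

The sketch covers only the data-generating side. Fitting a joint model such as the HFM-RT would mean estimating the ability, time-intensity, severity, centrality, and speed parameters from the ratings and times together.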
