Article

Analyzing Polytomous Test Data: A Comparison Between an Information-Based IRT Model and the Generalized Partial Credit Model

Publisher

SAGE PUBLICATIONS INC
DOI: 10.3102/10769986231207879

Keywords

item response theory; item characteristic curves; nonparametric IRT; simulation

Item response theory (IRT) models the relationship between test item scores and a test taker's latent trait. This study compares two models for tests with polytomously scored items: the optimal scoring (OS) model and the generalized partial credit (GPC) model. The OS model fits better than the GPC model in the real data examples, but has larger standard errors in the simulation study. The study also explores surprisal arc length, a scale-invariant measure of ability, and illustrates its potential as an alternative to sum scores.
Item response theory (IRT) models the relationship between the possible scores on a test item and a test taker's attainment of the latent trait that the item is intended to measure. In this study, we compare two models for tests with polytomously scored items: the optimal scoring (OS) model, a nonparametric IRT model based on the principles of information theory, and the generalized partial credit (GPC) model, a widely used parametric alternative. We evaluate these models using both simulated and real test data. In the real data examples, the OS model demonstrates superior model fit compared to the GPC model across all analyzed datasets. In our simulation study, the OS model outperforms the GPC model in terms of bias, but at the cost of larger standard errors for the probabilities along the estimated item response functions. Furthermore, we illustrate how surprisal arc length, an IRT scale-invariant measure of ability with metric properties, can be used to put scores from vastly different types of IRT models on a common scale. We also demonstrate how arc length can be a viable alternative to sum scores for scoring test takers.
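
To make the quantities named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It computes category probabilities under the standard GPC parameterization, converts them to surprisal, and approximates arc length along the resulting surprisal curves. The function names (`gpc_probs`, `surprisal`, `arc_length`) and the item parameters are hypothetical, and taking the logarithm to base M (the number of score categories) is an assumption about the surprisal convention. The sketch does not estimate the OS model; it only illustrates arc length on curves that are already known.

```python
import numpy as np


def gpc_probs(theta, a, b):
    """GPC category probabilities for a single item.

    theta : ability values, shape (n,)
    a     : item discrimination (scalar)
    b     : step parameters, shape (m,), giving score categories 0..m
    Returns an array of shape (n, m + 1) whose rows sum to 1.
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    b = np.asarray(b, dtype=float)
    # Exponent for category k is sum_{v<=k} a * (theta - b_v); category 0 gets 0.
    steps = a * (theta[:, None] - b[None, :])
    z = np.concatenate(
        [np.zeros((theta.size, 1)), np.cumsum(steps, axis=1)], axis=1
    )
    z -= z.max(axis=1, keepdims=True)  # subtract the row maximum for stability
    expz = np.exp(z)
    return expz / expz.sum(axis=1, keepdims=True)


def surprisal(probs):
    """Surprisal -log_M(P), with M the number of categories (assumed convention)."""
    n_categories = probs.shape[1]
    return -np.log(probs) / np.log(n_categories)


def arc_length(curves):
    """Cumulative polyline arc length along the rows of `curves`.

    curves : shape (n, K), all surprisal curves evaluated on an ordered grid.
    Returns shape (n,), starting at 0 and non-decreasing.
    """
    segment = np.sqrt((np.diff(curves, axis=0) ** 2).sum(axis=1))
    return np.concatenate([[0.0], np.cumsum(segment)])


# Illustrative (made-up) two-item test on an ability grid.
theta_grid = np.linspace(-4.0, 4.0, 401)
items = [(1.2, [-1.0, 0.5]), (0.8, [-0.5, 0.0, 1.5])]  # (a, b) pairs
curves = np.hstack([surprisal(gpc_probs(theta_grid, a, b)) for a, b in items])
s = arc_length(curves)  # arc length position for each grid point
```

Because the arc length depends only on the points the surprisal curves pass through, and not on how the ability axis is labelled, a monotone relabelling of `theta_grid` leaves `s` unchanged. This is the sense in which the sketch treats arc length as invariant to the IRT scale.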
