Evaluating comparability in computerized adaptive testing: Issues, criteria and an example

JOURNAL OF EDUCATIONAL MEASUREMENT (2001)

Journal

JOURNAL OF EDUCATIONAL MEASUREMENT

Volume 38, Issue 1, Pages 19-49

Publisher

NATL COUNC MEAS EDUC

DOI: 10.1111/j.1745-3984.2001.tb01115.x

Abstract

When a computerized adaptive testing (CAT) version of a test co-exists with its paper-and-pencil (P&P) version, it is important for scores from the CAT version to be comparable to scores from its P&P version. The CAT version may require multiple item pools for test security reasons, and CAT scores based on alternate pools also need to be comparable to each other. In this paper we review the research literature on CAT comparability and synthesize the issues specific to these two settings. A framework for evaluating comparability was developed that contains three categories of criteria: a validity criterion, a psychometric property/reliability criterion, and a statistical assumption/test administration condition criterion. Methods for evaluating comparability under these criteria, as well as various algorithms for improving comparability, are described and discussed. Focusing on the psychometric property/reliability criterion, an example using an item pool of ACT Assessment Mathematics items is provided to demonstrate a process for developing comparable CAT versions and for evaluating comparability. This example illustrates how simulations can be used to improve comparability at the early stages of the development of a CAT. The effects of different specifications of practical constraints, such as content balancing and item exposure rate control, and the effects of using alternate item pools are examined. One interesting finding from this study is that a large part of incomparability may be due to the change from number-correct score-based scoring to IRT ability estimation-based scoring. In addition, changes in components of a CAT such as exposure rate control, content balancing, test length, and item pool size were found to result in different levels of comparability in test scores.
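The contrast between number-correct scoring and IRT ability estimation mentioned in the abstract can be illustrated with a minimal sketch. This is not the simulation reported in the paper: the two-parameter logistic (2PL) model, the expected a posteriori (EAP) estimator with a standard normal prior, and the item parameters below are all assumptions chosen only for illustration.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (assumed model, for illustration)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def number_correct(responses):
    """Number-correct score: the count of correct (1) responses."""
    return sum(responses)


def eap_theta(responses, items, lo=-4.0, hi=4.0, step=0.01):
    """EAP ability estimate on a grid, using a standard normal prior."""
    num = 0.0
    den = 0.0
    n_points = int(round((hi - lo) / step))
    for k in range(n_points + 1):
        theta = lo + k * step
        prior = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) density
        likelihood = 1.0
        for u, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            likelihood *= p if u == 1 else 1.0 - p
        num += theta * likelihood * prior
        den += likelihood * prior
    return num / den


# Hypothetical (discrimination a, difficulty b) pairs; not taken from the paper.
items = [(0.6, -1.0), (0.8, -0.5), (1.0, 0.0), (1.4, 0.5), (1.8, 1.0)]

# Two response patterns with the same number-correct score.
pattern_easy = [1, 1, 1, 0, 0]  # correct on the less discriminating items
pattern_hard = [0, 0, 1, 1, 1]  # correct on the more discriminating items

nc_easy = number_correct(pattern_easy)
nc_hard = number_correct(pattern_hard)
theta_easy = eap_theta(pattern_easy, items)
theta_hard = eap_theta(pattern_hard, items)

print(nc_easy, nc_hard)
print(round(theta_easy, 3), round(theta_hard, 3))
```

Both patterns receive the same number-correct score, but under the assumed 2PL model the ability estimate depends on which items were answered correctly, so the two estimates differ. This is one way in which a change of scoring method alone could alter score comparability.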
