Evaluation of construct-irrelevant variance yielded by machine and human scoring of a science teacher PCK constructed response assessment

STUDIES IN EDUCATIONAL EVALUATION (2020)

Journal

STUDIES IN EDUCATIONAL EVALUATION

Volume 67

Publisher

ELSEVIER

DOI: 10.1016/j.stueduc.2020.100916

Keywords

Construct-irrelevant variance; Machine learning; Pedagogical content knowledge; Science teacher; Constructed response assessment

Funding

National Science Foundation [DGE 1438739, DUE 1323162]

Abstract

Machine learning has been frequently employed to automatically score constructed response assessments. However, there is little evidence of how this predictive scoring approach might be compromised by construct-irrelevant variance (CIV), which is a threat to test validity. In this study, we evaluated machine scores and human scores with regard to potential CIV. We developed two assessment tasks targeting science teacher pedagogical content knowledge (PCK); each task contains three video-based constructed response questions. A total of 187 in-service science teachers watched the videos, each of which presented a given classroom teaching scenario, and then responded to the constructed response items. Three human experts rated the responses, and the human consensus scores were used to develop machine learning algorithms to predict ratings of the responses. Treating the machine as another independent rater alongside the three human raters, we employed the many-facet Rasch measurement model to examine CIV from three sources: variability of scenarios, rater severity, and rater sensitivity to the scenarios. Results indicate that variability of scenarios affects teachers' performance, but the impact depends significantly on the construct of interest. For each assessment task, the machine is always the most severe rater compared with the three human raters. However, the machine is less sensitive than the human raters to the task scenarios, which means that machine scoring is more consistent and stable across scenarios within each of the two tasks.
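
For readers unfamiliar with the many-facet Rasch measurement model named in the abstract, its general rating-scale form can be sketched as follows. This is the textbook formulation, not necessarily the exact specification fitted in the study; the symbols and the interaction term are illustrative assumptions chosen to mirror the three CIV sources listed above.

```latex
% General many-facet Rasch model (rating-scale form), with an
% assumed rater-by-scenario interaction term. All symbols are
% illustrative and are not taken from the paper itself.
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right)
  = \theta_n - \delta_i - \alpha_j - \phi_{ij} - \tau_k
% P_{nijk}  : probability that teacher n receives score k from rater j on scenario i
% \theta_n  : proficiency (PCK) of teacher n
% \delta_i  : difficulty of scenario i (variability of scenarios)
% \alpha_j  : severity of rater j (human or machine)
% \phi_{ij} : rater-by-scenario interaction (rater sensitivity to scenarios)
% \tau_k    : threshold between score categories k-1 and k
```

Under this reading, a larger \alpha_j for the machine corresponds to the finding that it is the most severe rater, and smaller \phi_{ij} values for the machine correspond to its lower sensitivity to scenarios.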
