Article

Using Machine Learning to Score Multi-Dimensional Assessments of Chemistry and Physics

Journal

JOURNAL OF SCIENCE EDUCATION AND TECHNOLOGY
Volume 30, Issue 2, Pages 239-254

Publisher

SPRINGER
DOI: 10.1007/s10956-020-09895-9

Keywords

Three-dimensional science learning; Machine learning; Automatic scoring

Funding

  1. National Science Foundation (NSF), OISE [1545684]


Researchers advocate for science learning assessments that require deeper understanding and reasoning to enhance science literacy. This study explored the use of machine learning text analysis as an alternative to human scoring for constructed response items, finding that machine scoring algorithms demonstrated comparable accuracy to human raters across multiple dimensions of science learning assessments.
In response to the call for promoting three-dimensional science learning (NRC, 2012), researchers argue for developing assessment items that go beyond rote memorization and instead require deeper understanding and reasoning that can improve science literacy. Such assessment items are usually performance-based constructed responses and require technological support to ease the scoring burden placed on teachers. This study responds to that call by examining the use and accuracy of a machine learning text analysis protocol as an alternative to human scoring of constructed response items. The items we employed represent multiple dimensions of science learning as articulated in the 2012 NRC report. Using a sample of over 26,000 constructed responses written by 6,700 students in chemistry and physics, we trained human raters and compiled a robust training set to develop machine algorithmic models and cross-validate the machine scores. Results show that human raters yielded good (Cohen's κ = .40-.75) to excellent (Cohen's κ > .75) interrater reliability on assessment items with varied numbers of dimensions. A comparison reveals that the machine scoring algorithms achieved scoring accuracy comparable to human raters on these same items. Results also show that responses containing formal vocabulary (e.g., velocity) tended to yield lower machine-human agreement, which may reflect the fact that fewer students employed formal phrases than their informal alternatives.
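To illustrate the kind of workflow the abstract describes, the sketch below shows a generic supervised text-scoring pipeline with cross-validated machine scores compared against human scores using Cohen's kappa. This is a minimal illustration, not the authors' protocol: the abstract does not name the learning algorithm, features, or software, so a TF-IDF plus logistic regression model in scikit-learn stands in as an assumption, and the toy responses and score levels are invented.

```python
# Minimal sketch (assumed pipeline, invented toy data): TF-IDF features with a
# logistic regression classifier stand in for the unspecified scoring model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

# Constructed responses with human-assigned score levels (0 = low, 2 = high).
responses = [
    "it falls because things fall",                        # 0
    "the ball drops down",                                 # 0
    "gravity makes it go",                                 # 0
    "the ball speeds up as it falls toward the ground",    # 1
    "it gets faster and faster while falling",             # 1
    "the speed increases the longer it falls",             # 1
    "velocity increases at a constant rate because gravity is a constant force",        # 2
    "the gravitational force causes constant acceleration so velocity grows steadily",  # 2
    "a net downward force from gravity produces uniform acceleration of the ball",      # 2
]
human_scores = [0, 0, 0, 1, 1, 1, 2, 2, 2]

# Word and bigram TF-IDF features feed a classifier trained on the
# human-scored responses (the "training set" role in the abstract).
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)

# Cross-validated machine scores are compared with human scores via Cohen's
# kappa, the agreement statistic cited in the abstract
# (good: .40-.75, excellent: > .75).
machine_scores = cross_val_predict(model, responses, human_scores, cv=3)
print("Cohen's kappa (machine vs. human):",
      round(cohen_kappa_score(human_scores, machine_scores), 2))
```

In the study itself, agreement of this kind was evaluated against a far larger, human-validated corpus of over 26,000 responses rather than a toy set, which is what makes the reported kappa bands meaningful.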

