Article

Assessing Scientific Practices Using Machine-Learning Methods: How Closely Do They Match Clinical Interview Performance?

Journal

JOURNAL OF SCIENCE EDUCATION AND TECHNOLOGY
Volume 23, Issue 1, Pages 160-182

Publisher

SPRINGER
DOI: 10.1007/s10956-013-9461-9

Keywords

Applications in subject areas; Evaluation methodologies; Improving classroom teaching; Pedagogical issues; Teaching/learning strategies

Funding

  1. National Science Foundation REESE program [DRL 0909999]
  2. Directorate for Education and Human Resources, Division of Undergraduate Education [1347578, 1323022, 1347733, 1347740, 1323162, 1322962]. Funding Source: National Science Foundation
  3. Directorate for Education and Human Resources, Division of Research on Learning [1340578]. Funding Source: National Science Foundation
  4. Directorate for Education and Human Resources, Division of Undergraduate Education [1322872, 1347700, 1323011, 1347729, 1347626]. Funding Source: National Science Foundation

Abstract

The landscape of science education is being transformed by the new Framework for Science Education (National Research Council, A framework for K-12 science education: practices, crosscutting concepts, and core ideas. The National Academies Press, Washington, DC, 2012), which emphasizes the centrality of scientific practices, such as explanation, argumentation, and communication, in science teaching, learning, and assessment. A major challenge facing the field of science education is developing assessment tools that are capable of validly and efficiently evaluating these practices. Our study examined the efficacy of a free, open-source machine-learning tool for evaluating the quality of students' written explanations of the causes of evolutionary change relative to three other approaches: (1) human-scored written explanations, (2) a multiple-choice test, and (3) clinical oral interviews. A large sample of undergraduates (n = 104) exposed to varying amounts of evolution content completed all three assessments: a clinical oral interview, a written open-response assessment, and a multiple-choice test. Rasch analysis was used to compute linear person measures and linear item measures on a single logit scale. We found that the multiple-choice test displayed poor person and item fit (mean square outfit > 1.3), whereas both the oral interview measures and the computer-generated written-response measures exhibited acceptable fit (average mean square outfit for the interview: person 0.97, item 0.97; for the computer: person 1.03, item 1.06). Multiple-choice test measures were more weakly associated with interview measures (r = 0.35) than were the computer-scored explanation measures (r = 0.63). Overall, Rasch analysis indicated that computer-scored written explanation measures (1) have the strongest correspondence to oral interview measures; (2) are capable of capturing students' normative scientific and naive ideas as accurately as human-scored explanations; and (3) more validly detect understanding than the multiple-choice assessment. These findings demonstrate the great potential of machine-learning tools for assessing key scientific practices highlighted in the new Framework for Science Education.
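
For readers unfamiliar with the statistics cited in the abstract, the sketch below illustrates, under the standard dichotomous Rasch parameterization, how person and item measures on a logit scale yield expected response probabilities, how an outfit mean-square fit statistic is computed from standardized residuals, and how correspondence between two sets of person measures can be summarized with a Pearson r. This is a minimal illustration only: the data, item set, and all names are hypothetical assumptions for this example and do not come from the study or reproduce the authors' analysis pipeline.

    import numpy as np

    def rasch_probability(person, item):
        # Dichotomous Rasch model: P(X = 1) = exp(B - D) / (1 + exp(B - D))
        # for person ability B and item difficulty D, both in logits.
        return 1.0 / (1.0 + np.exp(-(person - item)))

    def outfit_mean_square(scores, persons, items):
        # Unweighted (outfit) mean-square per item: the mean over persons of the
        # squared standardized residual (x - E)^2 / Var. Values near 1.0 indicate
        # acceptable fit; values above roughly 1.3 flag misfit, the threshold
        # referenced in the abstract.
        expected = rasch_probability(persons[:, None], items[None, :])
        variance = expected * (1.0 - expected)
        return ((scores - expected) ** 2 / variance).mean(axis=0)

    # Toy data: 104 simulated examinees (matching the study's n) and 10 hypothetical items.
    rng = np.random.default_rng(0)
    persons = rng.normal(0.0, 1.0, size=104)
    items = np.linspace(-2.0, 2.0, 10)
    probs = rasch_probability(persons[:, None], items[None, :])
    scores = (rng.random(probs.shape) < probs).astype(float)

    print("item outfit MNSQ:", np.round(outfit_mean_square(scores, persons, items), 2))

    # Correspondence between two sets of person measures (e.g., interview-based vs.
    # computer-scored) summarized as a Pearson correlation:
    interview_like = persons + rng.normal(0.0, 0.8, size=persons.shape)
    print("Pearson r:", round(float(np.corrcoef(persons, interview_like)[0, 1]), 2))

Because both sets of person measures in the study are expressed on the same logit scale, a correlation of this kind is a natural summary of how closely one assessment's ordering of students matches another's.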
