Article

Applying Computerized-Scoring Models of Written Biological Explanations across Courses and Colleges: Prospects and Limitations

Journal

CBE-Life Sciences Education
Volume 10, Issue 4, Pages 379-393

Publisher

American Society for Cell Biology
DOI: 10.1187/cbe.11-08-0081

Funding

  1. National Science Foundation (NSF) [0909999]
  2. AACR group
  3. NSF Directorate for Education and Human Resources, Division of Undergraduate Education [1022791]
  4. NSF Directorate for Education and Human Resources, Division of Research on Learning [1340578]
  5. NSF Directorate for Education and Human Resources, Division of Research on Learning [0909999]
  6. NSF Directorate for Education and Human Resources, Division of Undergraduate Education [1022653]

Abstract

Our study explored the prospects and limitations of using machine-learning software to score introductory biology students' written explanations of evolutionary change. We investigated three research questions: 1) Do scoring models built using student responses at one university function effectively at another university? 2) How many human-scored student responses are needed to build scoring models suitable for cross-institutional application? 3) What factors limit computer-scoring efficacy, and how can these factors be mitigated? To answer these questions, two biology experts scored a corpus of 2556 short-answer explanations (from biology majors and nonmajors) at two universities for the presence or absence of five key concepts of evolution. Human- and computer-generated scores were compared using kappa agreement statistics. We found that machine-learning software was, in most cases, capable of accurately evaluating the degree of scientific sophistication in undergraduate majors' and nonmajors' written explanations of evolutionary change. In cases in which the software did not perform at the benchmark of near-perfect agreement (kappa > 0.80), we located the causes of poor performance and identified a series of strategies for their mitigation. Machine-learning software holds promise as an assessment tool for use in undergraduate biology education, but like most assessment tools, it also has limitations.
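
For context, the kappa statistic cited above corrects raw agreement for chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement between the two raters and p_e is the agreement expected by chance. Below is a minimal sketch of the human-versus-machine comparison, using invented presence/absence scores (not the study's data) and scikit-learn's cohen_kappa_score:

    # Hypothetical illustration of the kappa agreement statistic; the
    # scores below are invented and do not come from the paper's corpus.
    from sklearn.metrics import cohen_kappa_score

    # Presence (1) / absence (0) of one key evolution concept in 12 responses,
    # as judged by a human expert and by a machine-learning scoring model.
    human_scores = [1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1]
    machine_scores = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1]

    kappa = cohen_kappa_score(human_scores, machine_scores)
    print(f"kappa = {kappa:.2f}")  # ~0.82 here; > 0.80 meets the paper's benchmark

With kappa > 0.80 taken as near-perfect agreement, the same computation would be repeated for each of the five key concepts scored per response.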
