Article

Testing the Impact of Novel Assessment Sources and Machine Learning Methods on Predictive Outcome Modeling in Undergraduate Biology

Journal

JOURNAL OF SCIENCE EDUCATION AND TECHNOLOGY
Volume 30, Issue 2, Pages 193-209

Publisher

SPRINGER
DOI: 10.1007/s10956-020-09888-8

Keywords

Machine learning; Assessment; Predictive learning analytics; Concept inventories; Course- vs. institution-specific data sources; Introductory biology

Funding

  1. Howard Hughes Medical Institute Science Education Program


High levels of attrition characterize undergraduate science courses in the USA. Predictive analytics research seeks to build models that identify at-risk students and suggest interventions that enhance student success. This study examines whether incorporating a novel assessment type (concept inventories [CI]) and using machine learning (ML) methods (1) improves prediction quality, (2) enables successful prediction at earlier time points, and (3) suggests more actionable course-level interventions. A corpus of university- and course-level assessment and non-assessment variables (53 variables in total) from 3225 students (over six semesters) was gathered. Five ML methods were employed (two individual, three ensemble) at three time points (pre-course, week 3, week 6) to quantify predictive efficacy. Inclusion of course-specific CI data along with university-specific corpora significantly improved prediction performance. Ensemble ML methods, in particular the generalized linear model with elastic net (GLMNET), yielded significantly higher area under the curve (AUC) values compared with non-ensemble techniques. Logistic regression consistently achieved the poorest prediction performance. Surprisingly, increasing corpus size (i.e., the amount of historical data) did not meaningfully impact prediction success. We discuss the roles that novel assessment types and ML techniques may play in advancing predictive learning analytics and addressing attrition in undergraduate science education.
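The comparison the abstract describes, scoring a plain logistic regression against a GLMNET-style elastic-net learner by AUC, can be sketched as follows. This is not the authors' pipeline: the data here are synthetic stand-ins for the 53-variable student corpus, and scikit-learn's elastic-net logistic regression is used as an analog of GLMNET.

```python
# Hedged sketch: compare AUC of plain logistic regression vs an elastic-net
# (GLMNET-style) model on synthetic data shaped like the study's corpus
# (3225 students, 53 variables, minority "at-risk" class). Hypothetical data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the student corpus; class 1 = at-risk (minority).
X, y = make_classification(n_samples=3225, n_features=53, n_informative=10,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    # Elastic net mixes L1/L2 penalties (l1_ratio=0.5), as GLMNET does.
    "elastic_net": LogisticRegression(penalty="elasticnet", solver="saga",
                                      l1_ratio=0.5, C=1.0, max_iter=5000),
}
aucs = {name: roc_auc_score(y_te, m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1])
        for name, m in models.items()}
print(aucs)
```

On real course data one would also repeat this at each time point (pre-course, week 3, week 6) as predictors accumulate; AUC is used because it is insensitive to the class imbalance typical of attrition outcomes.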

