Article

Best Practices in Supervised Machine Learning: A Tutorial for Psychologists

ADVANCES IN METHODS AND PRACTICES IN PSYCHOLOGICAL SCIENCE (2023)

Journal

ADVANCES IN METHODS AND PRACTICES IN PSYCHOLOGICAL SCIENCE

Volume 6, Issue 3

Publisher

SAGE PUBLICATIONS INC

DOI: 10.1177/25152459231162559

Keywords

tutorial; supervised machine learning; cross-validation; interpretable machine learning; random forest; open data; open materials

Categories

Psychology; Psychology, Multidisciplinary

Summary

Supervised machine learning is gaining popularity in psychology and other social sciences, but it is not widely taught in psychology programs. This tutorial provides an intuitive but comprehensive introduction to supervised machine learning for psychologists in four modules, covering resampling methods for evaluating model performance, the random forest model, benchmark experiments, and the interpretation of machine learning models. The tutorial uses the R programming language and the mlr3 package, with demonstrations on the PhoneStudy data set.

Abstract

Supervised machine learning (ML) is becoming an influential analytical method in psychology and other social sciences. However, theoretical ML concepts and predictive-modeling techniques are not yet widely taught in psychology programs. This tutorial is intended to provide an intuitive but thorough primer and introduction to supervised ML for psychologists in four consecutive modules. After introducing the basic terminology and mindset of supervised ML, in Module 1, we cover how to use resampling methods to evaluate the performance of ML models (bias-variance trade-off, performance measures, k-fold cross-validation). In Module 2, we introduce the nonlinear random forest, a type of ML model that is particularly user-friendly and well suited to predicting psychological outcomes. Module 3 is about performing empirical benchmark experiments (comparing the performance of several ML models on multiple data sets). Finally, in Module 4, we discuss the interpretation of ML models, including permutation variable importance measures, effect plots (partial dependence plots, individual conditional-expectation profiles), and the concept of model fairness. Throughout the tutorial, intuitive descriptions of theoretical concepts are provided, with as few mathematical formulas as possible, and followed by code examples using the mlr3 and companion packages in R. Key practical-analysis steps are demonstrated on the publicly available PhoneStudy data set (N = 624), which includes more than 1,800 variables from smartphone sensing to predict Big Five personality trait scores. The article contains a checklist to be used as a reminder of important elements when performing, reporting, or reviewing ML analyses in psychology. Additional examples and more advanced concepts are demonstrated in online materials (https://osf.io/9273g/).
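
To make the k-fold cross-validation idea from Module 1 concrete, here is a minimal sketch. It is not the tutorial's own code: the tutorial works in R with mlr3, whereas this illustration uses plain Python with only the standard library. The function names (`k_fold_indices`, `cross_validate`, `fit_mean`, `predict_mean`), the synthetic data, and the choice of a featureless mean predictor as the model are all assumptions made for illustration.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(x, y, fit, predict, k=5, seed=0):
    """Return the mean squared error on each held-out fold."""
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        # The model only ever sees the training part of the data.
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(model, x[i]) for i in test_idx]
        scores.append(mean((p - y[i]) ** 2 for p, i in zip(preds, test_idx)))
    return scores


def fit_mean(x_train, y_train):
    """Hypothetical featureless baseline: remember the training mean."""
    return mean(y_train)


def predict_mean(model, x_row):
    """Predict the stored training mean for any observation."""
    return model


# Synthetic data (illustrative only, not the PhoneStudy data).
rng = random.Random(1)
x = [[rng.random()] for _ in range(100)]
y = [2.0 * row[0] + rng.gauss(0, 0.1) for row in x]

fold_mse = cross_validate(x, y, fit_mean, predict_mean, k=5)
cv_mse = mean(fold_mse)  # aggregate estimate of out-of-sample error
```

Each observation lands in exactly one test fold, so every prediction that enters the performance estimate comes from a model that did not see that observation during fitting. With n = 624 (the PhoneStudy sample size) and k = 5, this splitting scheme yields four folds of 125 observations and one of 124.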
