Article

Building Better Models: Prediction, Replication, and Machine Learning in the Social Sciences

Publisher

SAGE PUBLICATIONS INC
DOI: 10.1177/0002716215570279

Keywords

big data; machine learning; predictive modeling; data science; penalized regression; ensemble learning; the Lasso

Analytic techniques developed for big data have much broader applications in the social sciences, outperforming standard regression models even, or rather especially, in smaller datasets. This article offers an overview of machine learning methods well-suited to social science problems, including decision trees, dimension reduction methods, nearest neighbor algorithms, support vector models, and penalized regression. In addition to novel algorithms, machine learning places great emphasis on model checking (through holdout samples and cross-validation) and model shrinkage (adjusting predictions toward the mean to reduce overfitting). This article advocates replacing typical regression analyses with two different sorts of models used in concert. A multi-algorithm ensemble approach should be used to determine the noise floor of a given dataset, while simpler methods such as penalized regression or decision trees should be used for theory building and hypothesis testing.
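
The two ideas the abstract pairs, model checking by cross-validation and shrinkage through penalized regression, can be illustrated with a minimal sketch. It is not taken from the article: the data are synthetic, the helper names (ridge_fit, cv_mse) are hypothetical, and ridge regression is used in place of the Lasso only because it has a closed-form solution that needs nothing beyond NumPy.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge fit. Centering the data leaves the intercept unpenalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    # lam = 0 reduces to ordinary least squares; larger lam shrinks beta toward 0.
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta


def cv_mse(X, y, lam, k=5, seed=0):
    """Mean squared prediction error averaged over k held-out folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)          # every row not in the held-out fold
        b0, b = ridge_fit(X[train], y[train], lam)
        pred = b0 + X[fold] @ b
        errors.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errors))


# Synthetic "small dataset" (assumption): 40 cases, 20 predictors, 3 of which matter.
rng = np.random.default_rng(42)
n, p = 40, 20
X = rng.normal(size=(n, p))
true_beta = np.zeros(p)
true_beta[:3] = [2.0, -1.0, 0.5]
y = X @ true_beta + rng.normal(scale=1.0, size=n)

# Compare the unpenalized fit (lam = 0) with increasingly strong penalties.
lams = [0.0, 0.1, 1.0, 10.0, 100.0]
scores = {lam: cv_mse(X, y, lam) for lam in lams}
best_lam = min(scores, key=scores.get)

for lam in lams:
    print(f"lambda={lam:>6}: cross-validated MSE = {scores[lam]:.3f}")
print("penalty with lowest held-out error:", best_lam)
```

The penalty is chosen by held-out error rather than in-sample fit, which is the sense of "model checking" in the abstract. In-sample fit would always favor the unpenalized model, so it cannot be used to pick the amount of shrinkage.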
