Journal
PSYCHOLOGICAL METHODS
Volume 21, Issue 4, Pages 493-506
Publisher
AMER PSYCHOLOGICAL ASSOC
DOI: 10.1037/met0000105
Keywords
computational social science; big data; digital footprints; R; personality
Funding
- Robert Bosch Stanford Graduate Fellowship
- Google Faculty Research Award
- National Science Foundation
- Defense Advanced Research Projects Agency (DARPA)
- Stanford Center for the Study of Language and Information
Abstract
This article aims to introduce the reader to essential tools that can be used to obtain insights and build predictive models using large data sets. Recent user proliferation in the digital environment has led to the emergence of large samples containing a wealth of traces of human behaviors, communication, and social interactions. Such samples offer the opportunity to greatly improve our understanding of individuals, groups, and societies, but their analysis presents unique methodological challenges. In this tutorial, we discuss potential sources of such data and explain how to efficiently store them. Then, we introduce two methods that are often employed to extract patterns and reduce the dimensionality of large data sets: singular value decomposition and latent Dirichlet allocation. Finally, we demonstrate how to use dimensions or clusters extracted from data to build predictive models in a cross-validated way. The text is accompanied by examples of R code and a sample data set, allowing the reader to practice the methods discussed here. A companion website (http://dataminingtutorial.com) provides additional learning resources.