Article

Supervised Latent Dirichlet Allocation With Covariates: A Bayesian Structural and Measurement Model of Text and Covariates

PSYCHOLOGICAL METHODS (2023)

Journal

PSYCHOLOGICAL METHODS

Volume -, Issue -, Pages -

Publisher

AMER PSYCHOLOGICAL ASSOC

DOI: 10.1037/met0000541

Keywords

text mining; supervised topic modeling; mixture modeling; Bayesian estimation; regression

Category

Psychology, Multidisciplinary

Summary

This paper proposes a novel statistical model, SLDAX, which combines a latent variable measurement model with a structural regression model to better estimate the topics in text data and use them as predictors. A simulation study and an empirical application demonstrate the effectiveness of the SLDAX model in psychological research.

Abstract

Text is a burgeoning data source for psychological researchers, but little methodological research has focused on adapting popular modeling approaches for text to the context of psychological research. One popular measurement model for text, topic modeling, uses a latent mixture model to represent topics underlying a body of documents. Recently, psychologists have studied relationships between these topics and other psychological measures by using estimates of the topics as regression predictors along with other manifest variables. While similar two-stage approaches involving estimated latent variables are known to yield biased estimates and incorrect standard errors, two-stage topic modeling approaches have received limited statistical study and, as we show, are subject to the same problems. To address these problems, we proposed a novel statistical model, supervised latent Dirichlet allocation with covariates (SLDAX), that jointly incorporates a latent variable measurement model of text and a structural regression model to allow the latent topics and other manifest variables to serve as predictors of an outcome. Using a simulation study with data characteristics consistent with psychological text data, we found that SLDAX estimates were generally more accurate and more efficient than two-stage estimates. To illustrate both approaches, we provide an empirical clinical application comparing the two-stage and SLDAX approaches. Finally, we implemented the SLDAX model in an open-source R package to facilitate its use and further study.

Text data is an increasingly popular data source in psychological research that can be analyzed with a variety of models and algorithms. Topic models are popular measurement models that use latent variables to represent constructs underlying a set of documents (e.g., clinical interviews, survey open responses, written or spoken educational assessments). Recent applications have used estimates of these topics as predictors of other variables in a regression model, but the statistical behavior of this approach has not been well studied. Similar approaches with other latent variable models are known to yield incorrect regression coefficient estimates and incorrect inferences. We showed that the use of topic estimates as regression predictors is also prone to these problems. As a solution, we proposed a model that jointly estimates the topic model and the regression model: supervised latent Dirichlet allocation with covariates (SLDAX). Using a simulation study under typical psychological text data conditions, we found that SLDAX estimates were generally more accurate and more precise than those from the two-stage approach. We illustrate the SLDAX and two-stage approaches in a clinical study of nonsuicidal self-injury and emotional dysregulation with participant interpersonal narratives. To allow researchers to apply the SLDAX model, we developed an open-source R software package.
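
To make the joint structure described in the abstract concrete, the following is a minimal Python sketch of how data might be generated under a model of this kind. It is not the authors' implementation or their R package. It assumes, as in standard supervised LDA, that the outcome depends on each document's empirical topic proportions, and it omits an intercept because those proportions sum to one. The function name `simulate_sldax`, the single covariate, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_sldax(n_docs=200, n_topics=3, vocab_size=50, doc_len=80, seed=0):
    """Simulate documents, one covariate, and an outcome (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Measurement model: topic-word distributions and document-topic mixtures.
    phi = rng.dirichlet(np.full(vocab_size, 0.5), size=n_topics)   # (K, V)
    theta = rng.dirichlet(np.full(n_topics, 1.0), size=n_docs)     # (D, K)

    # One manifest covariate per document (hypothetical).
    x = rng.normal(size=(n_docs, 1))

    # Structural model coefficients (hypothetical values).
    beta_x = np.array([0.5])               # covariate effect
    beta_z = np.array([-1.0, 0.0, 1.0])    # topic effects, one per topic
    sigma = 0.5                            # residual standard deviation

    docs = np.empty((n_docs, doc_len), dtype=int)
    zbar = np.empty((n_docs, n_topics))
    for d in range(n_docs):
        # Assign each token a latent topic, then draw its word from that topic.
        z = rng.choice(n_topics, size=doc_len, p=theta[d])
        docs[d] = [rng.choice(vocab_size, p=phi[k]) for k in z]
        # Empirical topic proportions for the document (each row sums to 1).
        zbar[d] = np.bincount(z, minlength=n_topics) / doc_len

    # Outcome depends jointly on the covariate and the latent topic proportions.
    y = x @ beta_x + zbar @ beta_z + rng.normal(scale=sigma, size=n_docs)
    return docs, x, zbar, y


docs, x, zbar, y = simulate_sldax()
```

In a two-stage analysis, `zbar` would be unobserved and replaced by estimates from a separately fitted topic model before regressing `y` on them. The abstract's point is that this substitution ignores the uncertainty in those estimates, whereas a joint model estimates the topics and the regression coefficients together.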
