Article

How to make causal inferences using texts

Journal

SCIENCE ADVANCES
Volume 8, Issue 42, Pages -

Publisher

AMER ASSOC ADVANCEMENT SCIENCE
DOI: 10.1126/sciadv.abg2652

Funding

  1. Eunice Kennedy Shriver National Institute of Child Health and Human Development of the National Institutes of Health [P2CHD047879]
  2. National Science Foundation under the Resource Implementations for Data Intensive Research program [1738411, 1738288]
  3. Division of Social and Economic Sciences
  4. Directorate for Social, Behavioral and Economic Sciences [1738288, 1738411] (funding source: National Science Foundation)

Text-as-data techniques can be used to test social science theories with large collections of text. However, estimating the latent representation of the text from the same data used for inference can introduce an identification problem or overfitting. To address these risks, the authors introduce a split-sample workflow for rigorous causal inference.
Text as data techniques offer a great promise: the ability to inductively discover measures that are useful for testing social science theories with large collections of text. Nearly all text-based causal inferences depend on a latent representation of the text, but we show that estimating this latent representation from the data creates underacknowledged risks: we may introduce an identification problem or overfit. To address these risks, we introduce a split-sample workflow for making rigorous causal inferences with discovered measures as treatments or outcomes. We then apply it to estimate causal effects from an experiment on immigration attitudes and a study on bureaucratic responsiveness.
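The split-sample idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the word-frequency "discovery" step is a toy stand-in for whatever text model a researcher would actually fit, and every name (`split_sample`, `discover_measure`, `estimate_effect`, the word list) is hypothetical. The point it shows is only the ordering: the text measure is learned on one random half of the data, frozen, and then applied to the other half, where the effect is estimated.

```python
import random
from collections import Counter
from statistics import mean


def split_sample(units, train_frac=0.5, seed=0):
    """Randomly partition units into a discovery set and a held-out estimation set."""
    rng = random.Random(seed)
    shuffled = units[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def discover_measure(train):
    """Toy discovery step (an assumption, not the paper's model): pick the word
    whose document frequency differs most between treated and control texts,
    using the discovery set only, and return a fixed text -> {0, 1} function."""
    def doc_freq(group):
        counts = Counter(w for u in group for w in set(u["text"].split()))
        return {w: c / len(group) for w, c in counts.items()}

    ft = doc_freq([u for u in train if u["treated"]])
    fc = doc_freq([u for u in train if not u["treated"]])
    vocab = sorted(set(ft) | set(fc))
    word = max(vocab, key=lambda w: abs(ft.get(w, 0) - fc.get(w, 0)))
    return lambda text: int(word in text.split())


def estimate_effect(held_out, g):
    """Difference in means of the frozen measure g, computed on held-out data only."""
    y1 = [g(u["text"]) for u in held_out if u["treated"]]
    y0 = [g(u["text"]) for u in held_out if not u["treated"]]
    return mean(y1) - mean(y0)


# Synthetic responses (purely illustrative): treated units mention "deport" more often.
rng = random.Random(42)
fillers = ["policy", "border", "family", "work", "law"]
units = []
for i in range(200):
    treated = i % 2 == 0
    words = rng.sample(fillers, 3)
    if rng.random() < (0.7 if treated else 0.2):
        words.append("deport")
    units.append({"treated": treated, "text": " ".join(words)})

train, held_out = split_sample(units, train_frac=0.5, seed=1)
g = discover_measure(train)           # measure is discovered on the training half
effect = estimate_effect(held_out, g)  # and evaluated only on the untouched half
print(f"Estimated effect on held-out half: {effect:.2f}")
```

Because `g` is fixed before the held-out half is examined, the estimate on that half is not contaminated by the search over candidate measures, which is the overfitting risk the abstract raises.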
