Article

Expert-Informed Topic Models for Document Set Discovery

Journal

COMMUNICATION METHODS AND MEASURES
Volume 16, Issue 1, Pages 39-58

Publisher

ROUTLEDGE JOURNALS, TAYLOR & FRANCIS LTD
DOI: 10.1080/19312458.2021.1920008

Funding

  1. German Research Foundation (DFG) [WE 2888/6-1]

In text-as-data studies, expert-informed topic modeling (EITM) is proposed as a flexible and efficient approach to help researchers identify and select subsets of documents addressing specific topics within large text corpora by combining external domain knowledge and probabilistic topic models.
The first step in many text-as-data studies is to find documents that address a specific topic within a larger document set. Researchers often rely on simple keyword searches to do this, even though this may introduce considerable selection bias. Such bias may be even greater when researchers lack the domain knowledge required to make informed search decisions, for example, in cross-national research or research on unfamiliar social contexts. We propose expert-informed topic modeling (EITM) as a hybrid approach to tackle this problem. EITM combines the validity of external domain knowledge captured through expert surveys with probabilistic topic models to help researchers identify subsets of documents that cover initially unknown domain-specific topics, such as specific events and debates, that belong to a researcher-defined master topic. EITM is a flexible and efficient approach to the thematic selection of documents from large text corpora for further study. We benchmark and validate the method by discovering blog posts that address the public role of religion within large corpora of Australian, Swiss, and Turkish blog posts and provide researchers with a complete workflow to guide the application of EITM in their own work.
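
The abstract does not spell out how expert knowledge and topic-model output are matched, so the following is only a minimal sketch of the general idea, not the authors' workflow. It assumes a topic model has already been fitted and that its output is available as two plain dictionaries: the top terms per topic and the topic shares per document. The term-overlap score, the thresholds, and every name here (`score_topics`, `select_documents`, `expert_terms`, the toy data) are hypothetical choices made for illustration.

```python
from typing import Dict, List, Set


def score_topics(topic_terms: Dict[int, List[str]],
                 expert_terms: Set[str]) -> Dict[int, float]:
    """Return, per topic, the share of its top terms named by the experts.

    Assumption: simple term overlap stands in for whatever matching
    procedure the paper actually uses.
    """
    return {
        topic: len(set(terms) & expert_terms) / len(terms)
        for topic, terms in topic_terms.items()
        if terms
    }


def select_documents(doc_topics: Dict[str, Dict[int, float]],
                     relevant_topics: Set[int],
                     min_share: float) -> List[str]:
    """Keep documents whose combined share of relevant topics reaches min_share."""
    selected = []
    for doc_id, shares in doc_topics.items():
        relevant_share = sum(shares.get(t, 0.0) for t in relevant_topics)
        if relevant_share >= min_share:
            selected.append(doc_id)
    return selected


# Hypothetical toy data: top terms per topic from an already fitted model.
topic_terms = {
    0: ["church", "school", "prayer", "religion"],
    1: ["football", "match", "goal", "league"],
    2: ["mosque", "headscarf", "ban", "court"],
}

# Hypothetical terms collected through an expert survey.
expert_terms = {"church", "prayer", "religion", "headscarf", "mosque"}

# Hypothetical per-document topic shares from the same model.
doc_topics = {
    "post_a": {0: 0.6, 1: 0.3, 2: 0.1},
    "post_b": {0: 0.05, 1: 0.9, 2: 0.05},
    "post_c": {0: 0.2, 1: 0.3, 2: 0.5},
}

topic_scores = score_topics(topic_terms, expert_terms)
# Treat a topic as relevant when at least half of its top terms match.
relevant_topics = {t for t, s in topic_scores.items() if s >= 0.5}
selected_docs = select_documents(doc_topics, relevant_topics, min_share=0.5)
```

In this toy setup, topics 0 and 2 count as relevant and `post_a` and `post_c` are selected. Both thresholds are arbitrary here and would need validation against hand-coded documents in any real application.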
