Article

WordPPR: A Researcher-Driven Computational Keyword Selection Method for Text Data Retrieval from Digital Media

Publisher

ROUTLEDGE JOURNALS, TAYLOR & FRANCIS LTD
DOI: 10.1080/19312458.2023.2278177

Despite the increasing use of digital media data in communication research, a central challenge persists: retrieving data with maximal accuracy and coverage. Our investigation of keyword-based data collection practices in extant communication research reveals a one-step process, whereas our cross-disciplinary literature review suggests an iterative query expansion process guided by both human knowledge and computer intelligence. Hence, we introduce the WordPPR method for keyword selection and text data retrieval, which entails four steps: 1) collecting an initial dataset using core/seed keyword(s); 2) constructing a word graph from the dataset; 3) applying the Personalized PageRank (PPR) algorithm to rank words by their proximity to the seed keyword(s) and selecting new keywords that optimize retrieval precision and recall; and 4) repeating steps 1-3 to determine whether additional data collection is needed. Because it requires neither corpus-wide sampling/analysis nor extensive manual annotation, the method is well suited for data collection from large-scale digital media corpora. Our simulation studies demonstrate its robustness to parameter choice and its improvement over other methods in suggesting additional keywords. An application to Twitter data retrieval is also provided. By advancing a more systematic approach to text data retrieval, this study contributes to improving digital media data retrieval practices in communication research and beyond.
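The word-graph and PPR ranking stages of the four steps listed in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: linking two words when they co-occur in the same document, the whitespace tokenizer, the damping factor `alpha=0.85`, and the helper names `build_word_graph`, `personalized_pagerank` and `suggest_keywords` are all assumptions made for illustration. The sketch only ranks candidate words around the seed keyword(s); the precision/recall-based keyword selection and the iteration of step 4 are not reproduced.

```python
from collections import defaultdict
from itertools import combinations


def build_word_graph(docs):
    """Weighted, undirected co-occurrence graph (assumption: two words are
    linked when they appear in the same document; the weight counts how
    many documents they share)."""
    graph = defaultdict(lambda: defaultdict(float))
    for doc in docs:
        words = sorted(set(doc.lower().split()))  # naive whitespace tokenizer
        for a, b in combinations(words, 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def personalized_pagerank(graph, seeds, alpha=0.85, max_iter=200, tol=1e-12):
    """Power iteration: with probability alpha the walker follows an edge
    (proportionally to its weight); otherwise it jumps back to a seed."""
    nodes = list(graph)
    seed_set = {s for s in seeds if s in graph}
    if not seed_set:
        raise ValueError("no seed keyword occurs in the word graph")
    teleport = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    # Every node is created from a co-occurring pair, so each has at least
    # one edge: there are no dangling nodes and the total score stays at 1.
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    rank = dict(teleport)
    for _ in range(max_iter):
        new = {n: (1.0 - alpha) * teleport[n] for n in nodes}
        for n in nodes:
            share = alpha * rank[n] / out_weight[n]
            for m, w in graph[n].items():
                new[m] += share * w
        delta = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def suggest_keywords(docs, seeds, top_k=5, alpha=0.85):
    """Hypothetical helper: rank non-seed words by PPR score around the seeds."""
    seeds = [s.lower() for s in seeds]
    scores = personalized_pagerank(build_word_graph(docs), seeds, alpha=alpha)
    ranked = sorted(
        (w for w in scores if w not in seeds),
        key=lambda w: (-scores[w], w),  # highest score first, ties by word
    )
    return ranked[:top_k]


if __name__ == "__main__":
    docs = [
        "vaccine mandate protest downtown",
        "vaccine booster shot clinic",
        "booster shot appointment clinic",
        "football match downtown tonight",
    ]
    print(suggest_keywords(docs, ["vaccine"], top_k=4))
```

On this toy corpus, booster, clinic and shot (tied by symmetry) score highest, followed by downtown, while words reachable from the seed only through downtown, such as football, score lowest. In keeping with the researcher-driven framing, such a ranked list is best read as a set of candidates for human review rather than a final keyword list.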
