Article

WordPPR: A Researcher-Driven Computational Keyword Selection Method for Text Data Retrieval from Digital Media

Journal

COMMUNICATION METHODS AND MEASURES

Publisher

ROUTLEDGE JOURNALS, TAYLOR & FRANCIS LTD
DOI: 10.1080/19312458.2023.2278177

Despite the increasing use of digital media data in communication research, the challenge of retrieving data with maximal accuracy and coverage persists. This study introduces the WordPPR method for keyword selection and text data retrieval, which uses an iterative query expansion process and the Personalized PageRank algorithm to optimize retrieval precision and recall. Simulation studies show that the method is robust to parameter choice and improves upon other methods in suggesting additional keywords.
Despite the increasing use of digital media data in communication research, a central challenge persists: retrieving data with maximal accuracy and coverage. Our investigation of keyword-based data collection practices in extant communication research reveals a one-step process, whereas our cross-disciplinary literature review suggests an iterative query expansion process guided by human knowledge and computer intelligence. Hence, we introduce the WordPPR method for keyword selection and text data retrieval, which entails four steps: (1) collecting an initial dataset using core/seed keyword(s); (2) constructing a word graph based on the dataset; (3) applying the Personalized PageRank (PPR) algorithm to rank words in proximity to the seed keyword(s) and selecting new keywords that optimize retrieval precision and recall; and (4) repeating steps 1-3 to determine whether additional data collection is needed. Because it requires neither corpus-wide sampling/analysis nor extensive manual annotation, this method is well suited to data collection from large-scale digital media corpora. Our simulation studies demonstrate its robustness to parameter choice and its improvement upon other methods in suggesting additional keywords. We also illustrate its application to Twitter data retrieval. By advancing a more systematic approach to text data retrieval, this study contributes to improving digital media data retrieval practices in communication research and beyond.
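
Steps 2 and 3 of the method described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a simple document-level co-occurrence graph (the abstract does not say how the word graph is built), a damping factor of 0.85, and whitespace tokenization. The function names `build_word_graph`, `personalized_pagerank`, and `suggest_keywords` are hypothetical. The sketch only ranks candidate words. The precision/recall-based selection in step 3 and the repeated collection in step 4 are left to the researcher.

```python
from collections import defaultdict
from itertools import combinations


def build_word_graph(docs):
    """Build an undirected co-occurrence graph (an assumed construction).

    The edge weight between two words is the number of documents
    in which both words appear.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for doc in docs:
        words = sorted(set(doc.lower().split()))
        for u, v in combinations(words, 2):
            graph[u][v] += 1.0
            graph[v][u] += 1.0
    return graph


def personalized_pagerank(graph, seeds, alpha=0.85, iters=100):
    """Power iteration for Personalized PageRank restarting at the seeds.

    alpha is the probability of following an edge; with probability
    1 - alpha the random walk jumps back to a seed word.
    """
    nodes = list(graph)
    seeds_in_graph = [s for s in seeds if s in graph]
    if not seeds_in_graph:
        raise ValueError("no seed keyword occurs in the word graph")
    restart = {n: 0.0 for n in nodes}
    for s in seeds_in_graph:
        restart[s] = 1.0 / len(seeds_in_graph)

    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1.0 - alpha) * restart[n] for n in nodes}
        for u in nodes:
            out_weight = sum(graph[u].values())
            for v, w in graph[u].items():
                nxt[v] += alpha * rank[u] * w / out_weight
        rank = nxt
    return rank


def suggest_keywords(docs, seeds, k=5):
    """Return the k non-seed words ranked closest to the seed keywords."""
    graph = build_word_graph(docs)
    rank = personalized_pagerank(graph, seeds)
    candidates = [w for w in rank if w not in seeds]
    candidates.sort(key=lambda w: (-rank[w], w))
    return candidates[:k]


if __name__ == "__main__":
    # Toy corpus standing in for an initial dataset retrieved with the seed.
    toy_docs = [
        "vaccine covid shot",
        "vaccine jab covid",
        "football game score",
        "game score win",
    ]
    print(suggest_keywords(toy_docs, ["vaccine"], k=3))
```

In practice the candidate list would be reviewed by the researcher, and the retained keywords would be used to collect a new dataset before the graph is rebuilt and re-ranked.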
