☆ 4.4 Review

Text Preprocessing for Text Mining in Organizational Research: Review and Recommendations

ORGANIZATIONAL RESEARCH METHODS (2022)

Journal

ORGANIZATIONAL RESEARCH METHODS

Volume 25, Issue 1, Pages 114-146

Publisher

SAGE PUBLICATIONS INC

DOI: 10.1177/1094428120971683

Keywords

open vocabulary; closed vocabulary; stemming; text mining; best practices

Categories

Psychology, Applied; Management

AI Summary

Recent advances in text mining have provided new methods for leveraging the abundant natural language text data generated by organizations, employees, and customers. However, the decisions made during text preprocessing significantly impact the capture of language content and style, the statistical power of subsequent analyses, and the validity of insights derived from text mining. This study conducts complementary reviews to provide empirically grounded recommendations for text preprocessing decisions, taking into account the type of text mining, research questions, and dataset characteristics. It also provides recommendations for reporting text mining to promote transparency and reproducibility.

Abstract

Recent advances in text mining have provided new methods for capitalizing on the voluminous natural language text data created by organizations, their employees, and their customers. Although often overlooked, decisions made during text preprocessing affect whether the content and/or style of language are captured, the statistical power of subsequent analyses, and the validity of insights derived from text mining. Past methodological articles have described the general process of obtaining and analyzing text data, but recommendations for preprocessing text data were inconsistent. Furthermore, primary studies use and report different preprocessing techniques. To address this, we conduct two complementary reviews of computational linguistics and organizational text mining research to provide empirically grounded text preprocessing decision-making recommendations that account for the type of text mining conducted (i.e., open or closed vocabulary), the research question under investigation, and the data set's characteristics (i.e., corpus size and average document length). Notably, deviations from these recommendations will be appropriate and, at times, necessary due to the unique characteristics of one's text data. We also provide recommendations for reporting text mining to promote transparency and reproducibility.
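
To make concrete the point that preprocessing decisions change what a text-mining analysis sees, here is a minimal sketch in Python. It is not the authors' procedure: the stop word list, the `PreprocessConfig` fields, the `crude_stem` suffix stripper (a rough stand-in for a real stemmer), and the sample sentence are all hypothetical and chosen only for illustration. The config object also shows one way to record the choices that were made, in the spirit of the reporting recommendations.

```python
import re
from dataclasses import dataclass, asdict

# Hypothetical, deliberately tiny stop word list (an assumption for this sketch).
STOP_WORDS = frozenset({"the", "a", "an", "and", "of", "to", "is", "are", "was", "were", "in"})


@dataclass(frozen=True)
class PreprocessConfig:
    """Records each preprocessing decision so it can be reported alongside results."""
    lowercase: bool = True
    remove_stop_words: bool = True
    strip_suffixes: bool = False  # crude stand-in for stemming, see crude_stem


def crude_stem(token: str) -> str:
    """Strip one common English suffix. This is NOT the Porter stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str, config: PreprocessConfig) -> list[str]:
    """Tokenize text and apply the steps switched on in config."""
    if config.lowercase:
        text = text.lower()
    tokens = re.findall(r"[A-Za-z']+", text)
    if config.remove_stop_words:
        tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    if config.strip_suffixes:
        tokens = [crude_stem(t) for t in tokens]
    return tokens


if __name__ == "__main__":
    doc = "The managers were praising the teams and rewarded the employees"
    for cfg in (
        PreprocessConfig(lowercase=False, remove_stop_words=False),
        PreprocessConfig(),
        PreprocessConfig(strip_suffixes=True),
    ):
        # Print the recorded decisions next to the tokens they produce.
        print(asdict(cfg), preprocess(doc, cfg))
```

Running the script prints the same sentence under three configurations. Dropping stop words removes the function words that a style-focused analysis might rely on, while suffix stripping merges word forms and shrinks the vocabulary, which is the kind of trade-off the abstract ties to the research question and the data set's characteristics.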
