Comparing automated text classification methods

INTERNATIONAL JOURNAL OF RESEARCH IN MARKETING (2019)

Journal

INTERNATIONAL JOURNAL OF RESEARCH IN MARKETING

Volume 36, Issue 1, Pages 20-38

Publisher

ELSEVIER

DOI: 10.1016/j.ijresmar.2018.09.009

Keywords

Text classification; Social media; Machine learning; User-generated content; Sentiment analysis; Natural language processing

Funding

German Research Foundation (DFG) research unit 1452, How Social Media is Changing Marketing [HE 6703/1-2]

Abstract

Online social media drive the growth of unstructured text data. Many marketing applications require structuring this data at scales inaccessible to human coding, e.g., to detect communication shifts in sentiment or other researcher-defined content categories. Several methods have been proposed to automatically classify unstructured text. This paper compares the performance of ten such approaches (five lexicon-based, five machine learning algorithms) across 41 social media datasets covering major social media platforms, various sample sizes, and languages. So far, marketing research has relied predominantly on support vector machines (SVM) and Linguistic Inquiry and Word Count (LIWC). Across all tasks we study, either random forest (RF) or naive Bayes (NB) performs best in terms of correctly uncovering human intuition. In particular, RF exhibits consistently high performance for three-class sentiment, NB for small sample sizes. SVMs never outperform the remaining methods. All lexicon-based approaches, LIWC in particular, perform poorly compared with machine learning. In some applications, accuracies only slightly exceed chance. Since additional considerations of text classification choice are also in favor of NB and RF, our results suggest that marketing research can benefit from considering these alternatives. (C) 2018 Elsevier B.V. All rights reserved.
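
To make the contrast between the two families of methods concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It places a word-counting lexicon classifier next to a multinomial naive Bayes classifier with add-one smoothing. The training posts, the mini-lexicon, and all function names are hypothetical and chosen only for illustration; real lexicons such as LIWC are far larger, and nothing here reproduces the paper's datasets or results.

```python
import math
from collections import Counter, defaultdict

# Toy, made-up posts for illustration only; not data from the paper.
TRAIN = [
    ("love this phone great battery", "pos"),
    ("great service really happy", "pos"),
    ("happy with the fast delivery", "pos"),
    ("terrible battery very disappointed", "neg"),
    ("awful service never again", "neg"),
    ("disappointed with the slow delivery", "neg"),
]

# Hypothetical mini-lexicon; a fixed word list that is never fitted to data.
POSITIVE = {"love", "great", "happy", "good"}
NEGATIVE = {"terrible", "awful", "bad", "hate"}


def lexicon_classify(text):
    """Label a text by counting positive minus negative lexicon hits."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return "neutral"  # no hits, or hits cancel out


def train_nb(examples):
    """Collect class and per-class word counts from labeled examples."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in text.lower().split():
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def nb_classify(model, text):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label in sorted(class_counts):
        logp = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # words unseen in training are skipped
                logp += math.log((word_counts[label][tok] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


MODEL = train_nb(TRAIN)
```

On this toy input, `lexicon_classify("slow delivery disappointed")` returns `"neutral"` because none of the tokens appear in the fixed word list, whereas `nb_classify(MODEL, "slow delivery disappointed")` returns `"neg"` because it has learned word-class associations from the labeled examples. This illustrates only the mechanism by which a trained classifier can adapt to domain vocabulary; it is not evidence for the accuracy differences reported in the abstract.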
