Article

On the cost-effectiveness of neural and non-neural approaches and representations for text classification: A comprehensive comparative study

INFORMATION PROCESSING & MANAGEMENT (2021)

Journal

INFORMATION PROCESSING & MANAGEMENT

Volume 58, Issue 3

Publisher

ELSEVIER SCI LTD

DOI: 10.1016/j.ipm.2020.102481

Keywords

Text classification; Comparative study; Systematic review

Funding

CNPq, Brazil
CAPES, Brazil
FAPEMIG, Brazil
Amazon Web Services, United States
NVIDIA, United States
Google Research Awards
Brazilian National Research Council (CNPq) [159985/2018-8]

Abstract

This article makes two major contributions. First, we present the results of a critical analysis of recent scientific articles on neural and non-neural approaches and representations for automatic text classification (ATC). This analysis focuses on assessing the scientific rigor of such studies. It reveals a profusion of potential issues related to experimental procedures, including: (i) the use of inadequate experimental protocols, such as performing no repetitions to assess variability and generalization; (ii) a lack of statistical treatment of the results; (iii) a lack of detail on hyperparameter tuning, especially of the baselines; and (iv) the use of inadequate measures of classification effectiveness (e.g., accuracy under skewed class distributions). Second, we bring some organization and grounding to the field by performing a comprehensive and scientifically sound comparison of recent neural and non-neural ATC solutions. Our study provides a more complete picture by looking beyond classification effectiveness and taking into account the trade-off between effectiveness and model cost (i.e., training time). Our evaluation is guided by scientific rigor, which, as our literature review shows, is missing from a large body of work. Our experimental results, based on more than 1500 measurements, reveal that on the smaller datasets, the simplest and cheapest non-neural methods are among the best performers. On the larger datasets, neural Transformers achieve better classification effectiveness. However, compared with the best (properly tuned) non-neural solutions, their gains in effectiveness are not very substantial, especially considering their much longer training times (up to 23x slower). Our findings call for reflection on best practices in the field, from the way experiments are conducted and analyzed to the choice of proper baselines for each situation and scenario.
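To illustrate why accuracy can be an inadequate effectiveness measure when class distributions are skewed (issue (iv) in the abstract), the minimal Python sketch below compares accuracy with macro-averaged F1 for a trivial classifier that always predicts the majority class. The 90/10 label split and the helper names `accuracy` and `macro_f1` are hypothetical, chosen only for illustration; this is not the evaluation code used in the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        # F1 is defined as 0 when both precision and recall are 0.
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical skewed test set: 90 documents of class "A", 10 of class "B".
y_true = ["A"] * 90 + ["B"] * 10
# Trivial classifier that always predicts the majority class "A".
y_pred = ["A"] * 100

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # accuracy = 0.900
print(f"macro-F1 = {macro_f1(y_true, y_pred):.3f}")  # macro-F1 = 0.474
```

A classifier that never outputs the minority class reaches 0.900 accuracy but only about 0.474 macro-F1, because macro-averaging gives the ignored minority class (F1 = 0) the same weight as the majority class. This is why class-balanced measures are commonly reported alongside, or instead of, accuracy on imbalanced data.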
