Review

A systematic review of emerging feature selection optimization methods for optimal text classification: the present state and prospective opportunities

NEURAL COMPUTING & APPLICATIONS (2021)

Journal

NEURAL COMPUTING & APPLICATIONS

Volume 33, Issue 22, Pages 15091-15118

Publisher

SPRINGER LONDON LTD

DOI: 10.1007/s00521-021-06406-8

Keywords

Feature selection; Hyper-heuristics; Metaheuristic algorithm; Optimization; Text classification

Categories

Computer Science, Artificial Intelligence

Summary

Data preparation techniques such as feature selection are crucial for optimizing predictive models for classification tasks. Traditional feature selection methods may not effectively reduce the high dimensionality of text data, but emerging metaheuristic and hyper-heuristic optimization methods offer new possibilities for improving model accuracy and efficiency. Despite these potential benefits, best practices for applying these emerging feature selection methods to text classification tasks are still needed.

Abstract

Specialized data preparation techniques, including data cleaning, outlier detection, missing value imputation, and feature selection (FS), are required to get the most out of data and, consequently, to obtain optimal performance from predictive models for classification tasks. FS is a vital technique that enables the model to run faster, eliminates noisy data, removes redundancy, reduces overfitting, improves precision, and increases generalization on test data. While conventional FS techniques have been leveraged for classification tasks over the past few decades, they fail to optimally reduce the high dimensionality of the feature space of texts, and thus yield inefficient predictive models. Emerging technologies such as metaheuristic and hyper-heuristic optimization methods provide a new paradigm for FS because they improve classification accuracy, reduce computational and storage demands, and solve complex optimization problems in less time. However, little is known about best practices for case-by-case use of emerging FS methods. The literature continues to report both clear and unclear findings on leveraging effective methods, and FS that is not performed accurately degrades precision, real-world feasibility, and the predictive model's overall performance. This paper reviews the present state of FS with respect to metaheuristic and hyper-heuristic methods. Through a systematic literature review of over 200 articles, we set out the most recent findings and trends to inform analysts, practitioners, and researchers in data analytics who seek clarity in understanding and implementing effective FS optimization methods for improved text classification.
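The abstract describes metaheuristic FS only at a high level, so a small illustration may help. The following is a minimal sketch, not the method of the review or of any paper it covers: it assumes a genetic algorithm as the metaheuristic, a made-up six-document corpus, and a deliberately crude term-overlap classifier scored by leave-one-out accuracy. All names (`DOCS`, `loo_accuracy`, `fitness`, `genetic_feature_selection`) and all parameter values are hypothetical and chosen only for illustration.

```python
import random

# Toy corpus (hypothetical data, for illustration only): (tokens, label)
DOCS = [
    ("the match ended with a late goal".split(), "sport"),
    ("the team won the league match".split(), "sport"),
    ("a striker scored the goal".split(), "sport"),
    ("the bank raised the interest rate".split(), "finance"),
    ("a market rally lifted the bank shares".split(), "finance"),
    ("the interest rate hit the market".split(), "finance"),
]
VOCAB = sorted({t for tokens, _ in DOCS for t in tokens})


def loo_accuracy(mask):
    """Leave-one-out accuracy of a term-overlap classifier on the selected terms."""
    selected = {t for t, keep in zip(VOCAB, mask) if keep}
    correct = 0
    for i, (tokens, label) in enumerate(DOCS):
        scores = {}
        for j, (other, other_label) in enumerate(DOCS):
            if i != j:
                overlap = len(selected & set(tokens) & set(other))
                scores[other_label] = scores.get(other_label, 0) + overlap
        # Ties are broken by sorted label order so the result is deterministic.
        predicted = max(sorted(scores), key=lambda c: scores[c])
        correct += predicted == label
    return correct / len(DOCS)


def fitness(mask, penalty=0.01):
    """Wrapper objective: accuracy minus a small cost per retained feature."""
    return loo_accuracy(mask) - penalty * sum(mask)


def genetic_feature_selection(pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Search over binary feature masks with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    n = len(VOCAB)
    # Start from the full feature set plus random subsets.
    population = [[1] * n] + [
        [rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]  # elitism: the best masks always survive
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each bit flips with probability mutation_rate.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = genetic_feature_selection()
    print("selected terms:", [t for t, keep in zip(VOCAB, best) if keep])
    print("leave-one-out accuracy:", loo_accuracy(best))
```

Because the initial population contains the all-features mask and the best half of each generation is carried over unchanged, the returned subset never scores worse on this objective than using every term. On real text data the classifier, the fitness function, and the search budget would all need to be replaced with something far more careful.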
