4.7 Article

A hybrid ensemble-filter wrapper feature selection approach for medical data classification

Publisher

ELSEVIER
DOI: 10.1016/j.chemolab.2021.104396

Keywords

Hybrid filter; wrapper method; Feature selection; Medical data; Classification; Ensemble learning

The study introduces an ensemble-filter based hybrid feature selection model for disease detection. Compared with other state-of-the-art algorithms, it performs better in terms of accuracy, sensitivity, specificity, F1-score, area under the curve (AUC), and number of selected features. The authors conclude that the hybrid approach selects highly discriminative features more effectively and reliably, making it a promising tool for clinicians and researchers who want to improve classification performance on medical datasets.

Background and objective: Medical data plays a decisive role in disease diagnosis. The classification accuracy of high-dimensional datasets is often diminished by redundant and irrelevant features, which makes feature selection an indispensable step. Feature selection aims to identify a feature subspace that retains classification accuracy while reducing the computational cost of the learning model and eliminating noise. Whether a feature selection approach is suitable depends heavily on how well it matches the problem framework and discovers the intrinsic patterns within the data. The prime objective of this paper is to develop an ensemble-filter based hybrid feature selection model for disease detection.

Methods: A four-step hybrid ensemble feature selection algorithm is introduced. First, the dataset is partitioned using a cross-validation procedure. Second, in the filter step, various filter methods based on weighted scores are ensembled to generate a ranking of features. Third, a sequential forward selection algorithm is used as the wrapper technique to obtain an optimal subset of features. Finally, the resulting optimal subset is passed on to the subsequent classification task.

Results: Experiments were performed on twenty benchmark medical datasets of different dimensionalities. The proposed hybrid approach is compared with fourteen state-of-the-art feature selection algorithms using four benchmark classifiers: Naive Bayes, Support Vector Machine with Radial Basis Function, Random Forest, and k-Nearest Neighbor. The empirical results show that the proposed hybrid method outperforms the competing methods with respect to accuracy, sensitivity, specificity, F1-score, area under the curve (AUC), and number of selected features. Statistical analysis of the results shows that the proposed hybrid method outperforms, and remains competitive with, various state-of-the-art algorithms.

Conclusions: The study concludes that the proposed hybrid approach is a more effective and reliable feature selection technique for selecting highly discriminative features. Clinicians and researchers can use the framework as a promising tool for enhancing the classification performance of medical datasets.
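
The four steps described in the Methods part of the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which filters are ensembled, how they are weighted, or how the wrapper scores a subset. Here the two filters (absolute Pearson correlation and a Fisher-style score), the equal weights, the nearest-centroid classifier, the `top_k` cutoff, and the synthetic data are all hypothetical stand-ins chosen to keep the example dependency-free.

```python
import random
from statistics import mean, pstdev

def pearson_score(col, y):
    """Assumed filter 1: absolute Pearson correlation of a feature with the label."""
    mx, my, sx, sy = mean(col), mean(y), pstdev(col), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((a - mx) * (b - my) for a, b in zip(col, y))
    return abs(cov / (sx * sy))

def fisher_score(col, y):
    """Assumed filter 2: squared class-mean gap over within-class variance (binary labels)."""
    g0 = [a for a, b in zip(col, y) if b == 0]
    g1 = [a for a, b in zip(col, y) if b == 1]
    within = pstdev(g0) ** 2 + pstdev(g1) ** 2
    return (mean(g0) - mean(g1)) ** 2 / within if within > 0 else 0.0

def ensemble_rank(X, y, filters, weights):
    """Filter step: weighted sum of max-normalised filter scores, best feature first."""
    n_feat = len(X[0])
    total = [0.0] * n_feat
    for score_fn, w in zip(filters, weights):
        scores = [score_fn([row[j] for row in X], y) for j in range(n_feat)]
        top = max(scores) or 1.0
        for j, s in enumerate(scores):
            total[j] += w * s / top
    return sorted(range(n_feat), key=lambda j: -total[j])

def centroid_accuracy(Xtr, ytr, Xte, yte, feats):
    """Stand-in classifier: nearest class centroid restricted to the given features."""
    cents = {}
    for c in set(ytr):
        rows = [r for r, label in zip(Xtr, ytr) if label == c]
        cents[c] = [mean(r[j] for r in rows) for j in feats]
    hits = 0
    for r, label in zip(Xte, yte):
        pred = min(cents, key=lambda c: sum((r[j] - m) ** 2
                                            for j, m in zip(feats, cents[c])))
        hits += pred == label
    return hits / len(yte)

def sfs(Xtr, ytr, candidates):
    """Wrapper step: greedily add the feature that most improves inner validation accuracy."""
    fit_X, fit_y = Xtr[::2], ytr[::2]      # even rows fit the centroids
    val_X, val_y = Xtr[1::2], ytr[1::2]    # odd rows validate each trial subset
    chosen, best = [], 0.0
    while True:
        trials = [(centroid_accuracy(fit_X, fit_y, val_X, val_y, chosen + [j]), j)
                  for j in candidates if j not in chosen]
        if not trials:
            break
        acc, j = max(trials, key=lambda t: t[0])
        if acc <= best:                    # stop when no candidate helps
            break
        chosen.append(j)
        best = acc
    return chosen

def run(X, y, k=5, top_k=3):
    results = []
    for f in range(k):                     # step 1: k-fold partition by index stride
        tr = [i for i in range(len(y)) if i % k != f]
        te = [i for i in range(len(y)) if i % k == f]
        Xtr, ytr = [X[i] for i in tr], [y[i] for i in tr]
        Xte, yte = [X[i] for i in te], [y[i] for i in te]
        ranking = ensemble_rank(Xtr, ytr, [pearson_score, fisher_score], [0.5, 0.5])  # step 2
        chosen = sfs(Xtr, ytr, ranking[:top_k])                                       # step 3
        results.append((ranking, chosen,
                        centroid_accuracy(Xtr, ytr, Xte, yte, chosen)))               # step 4
    return results

def make_data(n=60, n_feat=6, seed=0):
    """Synthetic data: features 0 and 1 carry the class signal, the rest are noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(3.0 * label if j < 2 else 0.0, 1.0) for j in range(n_feat)]
         for label in y]
    return X, y

if __name__ == "__main__":
    for ranking, chosen, acc in run(*make_data()):
        print("ranking:", ranking, "selected:", chosen, "test accuracy:", round(acc, 3))
```

Running the ranking and the wrapper only on the training part of each fold keeps the held-out fold untouched until the final classification step, which is one plausible reading of why the partitioning comes first in the described pipeline.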
