Article

A Method of Classification Performance Improvement Via a Strategy of Clustering-Based Data Elimination Integrated with k-Fold Cross-Validation

ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING (2021)

Journal

ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING

Volume 46, Issue 2, Pages 1199-1212

Publisher

SPRINGER HEIDELBERG

DOI: 10.1007/s13369-020-04972-y

Keywords

Clustering-based data elimination; Relief; Medical dataset classification

Category

Multidisciplinary Sciences

Funding

Necmettin Erbakan University
Selcuk University Scientific Research Projects Coordinatorship

Summary

The article introduces MKMA-RAC, a data reduction method that eliminates noisy data to improve the performance of classification systems. Experiments on the Hepatitis, Liver Disorders, SPECT images, and Statlog (Heart) datasets show that the proposed method achieves higher classification success rates than traditional methods.

Abstract

Non-system errors that occur during data entry or data collection create noisy data that reduce the success of classification systems. To eliminate such data, a classification system was developed around a new data reduction method, MKMA-RAC, which consists of a modified k-means algorithm that uses Relief algorithm coefficients. The main theme of this article is the elimination of noisy data and its consistent application to the classification system through k-fold cross-validation. In every fold, MKMA-RAC-based data reduction was integrated with support vector machine (SVM), linear discriminant analysis (LDA), and decision tree classifiers so that the training data were free of noisy data. The data reduction process was not applied to the test data. The datasets used were Hepatitis, Liver Disorders, SPECT images, and Statlog (Heart), all taken from the UCI database. Classification performance with tenfold CV was reported for these datasets both with and without the proposed method. For the Hepatitis, Liver Disorders, SPECT images, and Statlog (Heart) datasets, respectively, the classification success of the proposed system was 96.88%, 74.56%, 87.24%, and 90.00% with the SVM classifier; 94.91%, 69.05%, 82.38%, and 88.52% with the LDA classifier; and 96.25%, 77.73%, 88.77%, and 89.63% with the decision tree classifier. The test results show that the proposed system generally achieved higher classification performance than results reported in the literature. The performance is therefore very encouraging for pattern recognition applications.
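
The abstract does not spell out the internals of MKMA-RAC, so the following is only a minimal sketch of the fold-wise protocol it describes: in each fold, noisy samples are removed from the training split only, and the untouched test split is used for scoring. Everything else here is an assumption made for illustration: the filtering rule (drop samples whose label disagrees with the majority label of their cluster), the weight vector `w` (a placeholder for Relief coefficients, which are not computed here), and the nearest-centroid classifier (a stand-in for SVM, LDA, or a decision tree).

```python
import random
from collections import Counter

def sq_dist(a, b, w):
    # Feature-weighted squared Euclidean distance; w is a placeholder
    # for Relief coefficients (an assumption, not computed here).
    return sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w))

def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def weighted_kmeans(X, k, w, iters=20, seed=0):
    # Plain Lloyd iterations using the weighted distance; returns cluster ids.
    centers = random.Random(seed).sample(X, k)
    assign = [0] * len(X)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(x, centers[c], w)) for x in X]
        for c in range(k):
            members = [x for x, a in zip(X, assign) if a == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = mean(members)
    return assign

def eliminate_noisy(X, y, w, k=2):
    # Hypothetical elimination rule: drop samples whose label differs
    # from the majority label of the cluster they fall into.
    assign = weighted_kmeans(X, k, w)
    majority = {
        c: Counter(yi for yi, a in zip(y, assign) if a == c).most_common(1)[0][0]
        for c in set(assign)
    }
    keep = [i for i, (yi, a) in enumerate(zip(y, assign)) if majority[a] == yi]
    return [X[i] for i in keep], [y[i] for i in keep]

def fit_nearest_centroid(X, y):
    # Stand-in classifier: one centroid per class.
    return {c: mean([x for x, yi in zip(X, y) if yi == c]) for c in set(y)}

def predict(model, x, w):
    return min(model, key=lambda c: sq_dist(x, model[c], w))

def cross_validate(X, y, w, n_folds=10, reduce=True, seed=0):
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        X_tr = [X[i] for i in idx if i not in held_out]
        y_tr = [y[i] for i in idx if i not in held_out]
        if reduce:
            # Reduction touches the training split only, never the test split.
            X_tr, y_tr = eliminate_noisy(X_tr, y_tr, w)
        model = fit_nearest_centroid(X_tr, y_tr)
        correct += sum(predict(model, X[i], w) == y[i] for i in test_idx)
    return correct / len(X)
```

Calling `cross_validate(X, y, w, reduce=False)` gives the baseline without elimination, which mirrors the with/without comparison the abstract reports. Filtering inside the fold loop, rather than once on the full dataset, keeps information from the test samples out of the elimination step.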
