Article

A Hybrid Classification Framework Based on Clustering

Journal

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS
Volume 16, Issue 4, Pages 2177-2188

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TII.2019.2933675

Keywords

Training; Clustering algorithms; Decision trees; Neural networks; Supervised learning; Partitioning algorithms; Industries; Classification; clustering; decision tree; hybrid model; hybrid information gain ratio; industrial application

Funding

  1. Major Project of the National Social Science Foundation of China [18VZL006]
  2. EU Horizon 2020 RISE Project ULTRACEPT [778062]
  3. National Natural Science Foundation of China [71471124]
  4. Tianfu Ten-Thousand Talents Program of Sichuan Province
  5. Excellent Youth Fund of Sichuan University [skqx201607, sksyl201709, skzx2016-rcrw14]
  6. Leading Cultivation Talents Program of Sichuan University

Traditional supervised classification algorithms tend to focus on uncovering the relationship between sample attributes and class labels; they seldom consider the latent structural characteristics of the sample space, which often leads to unsatisfactory classification results. To improve the performance of classification models, many scholars have constructed hybrid models that combine supervised and unsupervised learning. Although existing hybrid models have shown significant potential in industrial applications, our experiments indicate that some shortcomings remain. To overcome these shortcomings, this article proposes a hybrid classification framework based on clustering (HCFC). First, the HCFC applies a clustering algorithm to partition the training samples into K clusters. It then constructs a clustering-based attribute selection measure, namely the hybrid information gain ratio, and uses it to train a C4.5 decision tree. By varying the clustering algorithm used, this article constructs two versions of the HCFC (HCFC-K and HCFC-D) and tests them on eight benchmark datasets from the healthcare and disease diagnosis industries and on 15 datasets from other fields. The results indicate that both versions of the HCFC achieve classification performance comparable to or better than that of the three other hybrid models and six single models considered. In addition, the HCFC-D is more robust to class noise than the HCFC-K.
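The abstract names the C4.5 gain ratio as the basis of the attribute selection measure but does not state how the hybrid information gain ratio combines it with the clustering result. The following Python sketch computes the standard C4.5 gain ratio for a discrete attribute and then, purely as an assumption, forms a hypothetical `hybrid_gain_ratio` as a weighted sum of the gain ratio against the class labels and the gain ratio against the cluster labels. The function name and the weight `alpha` are illustrative and are not taken from the paper; the cluster labels are assumed to come from some clustering algorithm (for example, k-means) run beforehand on the training samples.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(attribute: Sequence[Hashable], target: Sequence[Hashable]) -> float:
    """Standard C4.5 gain ratio of a discrete attribute with respect to a target labeling."""
    n = len(target)
    # Partition the target values by the value the attribute takes.
    groups: dict = {}
    for a, t in zip(attribute, target):
        groups.setdefault(a, []).append(t)
    # Information gain = H(target) - sum_v (|S_v| / |S|) * H(target | attribute = v).
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(target) - conditional
    # Split information is the entropy of the attribute's own value distribution.
    split_info = entropy(attribute)
    return info_gain / split_info if split_info > 0 else 0.0


def hybrid_gain_ratio(attribute, class_labels, cluster_labels, alpha: float = 0.5) -> float:
    """HYPOTHETICAL hybrid measure (an assumption, not the paper's formula):
    a weighted sum of the gain ratio w.r.t. class labels and w.r.t. cluster labels."""
    return (alpha * gain_ratio(attribute, class_labels)
            + (1 - alpha) * gain_ratio(attribute, cluster_labels))


if __name__ == "__main__":
    attribute = ["a", "a", "b", "b"]
    class_labels = [0, 0, 1, 1]    # attribute separates the classes perfectly
    cluster_labels = [0, 1, 0, 1]  # assumed output of a prior clustering step
    print(gain_ratio(attribute, class_labels))    # 1.0
    print(gain_ratio(attribute, cluster_labels))  # 0.0
    print(hybrid_gain_ratio(attribute, class_labels, cluster_labels))  # 0.5
```

In this toy case the attribute is perfectly informative about the classes but independent of the clusters, so the assumed hybrid score sits halfway between the two gain ratios. Any real reproduction should replace `hybrid_gain_ratio` with the definition given in the full article.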
