4.6 Article

Semi-Supervised Deep Fuzzy C-Mean Clustering lefor Software Fault Prediction

期刊

IEEE ACCESS
卷 6, 期 -, 页码 25675-25685

出版社

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2018.2835304

关键词

Semi-supervised learning; fuzzy C-Mean clustering; feature learning; software fault prediction

资金

  1. National Basic Research Program (973 Program) of China [2013CB329402]
  2. National Natural Science Foundation of China [61573267, 61473215, 61571342, 61572383, 61501353, 61502369, 61271302, 61272282, 61202176]
  3. Fund for Foreign Scholars in University Research and Teaching Programs (111 Project) [B07048]
  4. Major Research Plan of the National Natural Science Foundation of China [91438201, 91438103]

向作者/读者索取更多资源

Software fault prediction is a consequential research area in software quality promise. In this paper, we propose a semi-supervised deep fuzzy C-mean (DFCM) clustering for software fault prediction, which is the cumulation of semi-supervised DFCM clustering and feature compression techniques. Deep is utilized for the feature-based multi clusters of unlabeled and labeled data sets along with their labeled classes. In our approach, for the training model, we simultaneously deal with the unsupervised data and supervised data to exploit the obnubilated information from unlabeled data to labeled data to support the construction of the precise model. We utilize DFCM clustering to handle the class imbalance problem and withal fuzzy theory logic is very akin to human logic and it is facile to comprehend. We further ameliorate the prediction performance with the coalescence of feature learning techniques-feature extraction and feature selection (using random-under sampling) to generate good features and remove irrelevant and redundant features to reduce the noisy data for classification. However, by the performance of the model results, the amalgamation of deep multi clusters and feature techniques work better due to their ability to identify and amalgamation essential information in data feature. The classification model is predicted on the maximum homogeneous between the features of labeled and unlabeled data, the model is trained on the un-noisy data set obtained by the deep coalescence of multi clusters and feature techniques. To check the efficacy of our approach, we chose data sets from real-world software project (NASA & Eclipse), and then we compared our approach with a number of latest classical base-line methods, and investigate the performance by using performance measures such as probability of detection, F-measure, and area under the curve.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.6
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据