Article

LR-SMOTE - An improved unbalanced data set oversampling based on K-means and SVM

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 196, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2020.105845

Keywords

Unbalanced data sets; SMOTE; Loose particles signal; LR-SMOTE algorithm

Funding

  1. National Natural Science Foundation of China [51607059, 51077022, 61271347]
  2. Natural Science Foundation of Heilongjiang Province [QC2017059]
  3. Postdoctoral Fund in Heilongjiang Province [LBHZ16169]
  4. Talent Innovation Special Project of Heilongjiang Province [HDRCCX-201604]
  5. Science and Technology Innovative Research Team in Higher Educational Institutions of Heilongjiang Province [2012TD007]
  6. Heilongjiang University Youth Science Fund Project [QL201505]

Machine learning classification algorithms are widely used, and one of the main problems they face is unbalanced data sets. Because classification algorithms are insensitive to class imbalance, unbalanced data sets are difficult to classify accurately. Class imbalance also arises in loose particle detection for sealed electronic components: signals generated by internal components always outnumber those generated by loose particles, which easily leads to misclassification. To classify unbalanced data sets more accurately, this paper proposes the LR-SMOTE algorithm, an improvement on the traditional SMOTE oversampling algorithm that keeps newly generated samples close to the sample center, avoiding outlier samples and preserving the distribution of the data set. Experiments were carried out on four public UCI data sets and six self-built data sets. The unmodified data sets and the data sets balanced by LR-SMOTE and by SMOTE were each classified with the random forest and support vector machine algorithms. The experimental results show that LR-SMOTE outperforms SMOTE in terms of G-means, F-measure, and AUC. (C) 2020 Elsevier B.V. All rights reserved.
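For reference, the baseline that LR-SMOTE builds on can be sketched as follows. This is a minimal, simplified version of classic SMOTE (the seed sample is drawn at random rather than iterating over every minority sample), assuming numeric feature vectors given as Python lists. It is not the LR-SMOTE algorithm: the paper's center-aware generation step (based on K-means and SVM, per the title) is not specified here. The names `smote`, `g_mean`, and `f_measure` are hypothetical helpers; the last two compute the G-means and F-measure metrics mentioned in the abstract from confusion-matrix counts, treating the minority class as positive.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Simplified classic SMOTE (NOT LR-SMOTE): create n_new synthetic samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample x and one of its k nearest minority-class neighbours nn:
        x_new = x + lam * (nn - x),  lam drawn uniformly from [0, 1)
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Brute-force nearest neighbours within the minority class, excluding x itself.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(x, p))
        nn = rng.choice(others[:k])
        lam = rng.random()
        # Linear interpolation between x and its chosen neighbour.
        synthetic.append([xi + lam * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    return math.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall (F1) for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy 2-D minority class (hypothetical data, for illustration only).
    minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
    new_samples = smote(minority, n_new=6, k=2)
    print(len(new_samples))  # 6
    print(round(g_mean(tp=8, fn=2, tn=90, fp=10), 3))  # 0.849
    print(round(f_measure(tp=8, fp=10, fn=2), 3))  # 0.571
```

Because every synthetic point is a convex combination of two minority samples, plain SMOTE interpolating from an outlying minority sample can place new points in sparse regions or near the class boundary; this is the behavior the abstract says LR-SMOTE avoids by keeping new samples close to the sample center.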
