Article

LR-SMOTE - An improved unbalanced data set oversampling based on K-means and SVM

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 196

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2020.105845

Keywords

Unbalanced data sets; SMOTE; Loose particles signal; LR-SMOTE algorithm

Funding

  1. National Natural Science Foundation of China [51607059, 51077022, 61271347]
  2. Natural Science Foundation of Heilongjiang Province [QC2017059]
  3. Postdoctoral Fund in Heilongjiang Province [LBHZ16169]
  4. Talent Innovation Special Project of Heilongjiang Province [HDRCCX-201604]
  5. Science and Technology Innovative Research Team in Higher Educational Institutions of Heilongjiang Province [2012TD007]
  6. Heilongjiang University Youth Science Fund Project [QL201505]


Machine learning classification algorithms are now widely used, and one of the main problems they face is unbalanced data sets. Classification algorithms tend to be insensitive to the minority class, so unbalanced data sets are difficult to classify accurately. The same imbalance arises in loose particle detection for sealed electronic components: the signals generated by internal components always outnumber the signals generated by loose particles, which easily leads to misclassification. To classify unbalanced data sets more accurately, this paper proposes the LR-SMOTE algorithm, which builds on the traditional SMOTE oversampling algorithm. LR-SMOTE keeps newly generated samples close to the sample center, which avoids generating outlier samples or changing the distribution of the data set. Experiments were carried out on four UCI public data sets and six self-built data sets. The unmodified data sets and the data sets balanced by LR-SMOTE and SMOTE were classified with the random forest and support vector machine algorithms. The experimental results show that LR-SMOTE performs better than SMOTE in terms of G-means value, F-measure value and AUC. (C) 2020 Elsevier B.V. All rights reserved.
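For readers unfamiliar with the baseline, the following is a minimal sketch of classic SMOTE interpolation, the method the abstract says LR-SMOTE improves on. It is not the LR-SMOTE algorithm itself, whose steps are not given in this abstract. The function name `smote`, its parameters, and the toy data are illustrative assumptions, not taken from the paper.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Classic SMOTE sketch (hypothetical helper, not the paper's LR-SMOTE).

    Each synthetic sample lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class: four points at the corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_samples = smote(minority, n_new=6, k=2)
```

Because every synthetic point is drawn on a segment between two existing minority samples, a base sample that is itself an outlier produces more samples near that outlier. That is the behaviour the abstract says LR-SMOTE is designed to avoid by keeping new samples close to the sample center.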

