4.7 Article

High-Dimensional Unbalanced Binary Classification by Genetic Programming with Multi-Criterion Fitness Evaluation and Selection

Journal

EVOLUTIONARY COMPUTATION
Volume 30, Issue 1, Pages 99-129

Publisher

MIT PRESS
DOI: 10.1162/evco_a_00304

Keywords

Classification; genetic programming; high dimensionality; class imbalance

Funding

  1. Marsden Fund of New Zealand government [VUW1615, VUW1913, VUW1914]
  2. Science for Technological Innovation Challenge (SfTI) [2019-S7-CRS]
  3. University Research Fund at Victoria University of Wellington [223805/3986]
  4. MBIE Data Science SSIF Fund [RTVU1914]
  5. National Natural Science Foundation of China (NSFC) [61876169, 51975294]
  6. China Scholarship Council/Victoria University of Wellington Scholarship

This article proposes a new fitness function and a multi-criteria selection method to address the bias issue of genetic programming in high-dimensional unbalanced classification. Experimental results show that the proposed method achieves better classification performance than other methods.
High-dimensional unbalanced classification is challenging because of the joint effects of high dimensionality and class imbalance. Genetic programming (GP) has potential benefits for high-dimensional classification due to its built-in capability to select informative features. However, when the classes are not evenly distributed, GP tends to develop biased classifiers that achieve high accuracy on the majority class but low accuracy on the minority class. Unfortunately, the minority class is often at least as important as the majority class. It is therefore important to investigate how GP can be effectively utilized for high-dimensional unbalanced classification. To address the performance bias issue of GP, this article develops a new fitness function based on two criteria: an approximation of the area under the curve (AUC) and classification clarity (i.e., how well a program can separate the two classes). The values obtained on the two criteria are combined as a pair rather than summed together. Furthermore, this article designs a three-criterion tournament selection to effectively identify and select good programs to be used by genetic operators for generating offspring during the evolutionary learning process. The experimental results show that the proposed method achieves better classification performance than the compared methods.
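
The idea of keeping the two fitness criteria as a pair and selecting parents by several criteria can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formulas, so everything here is an assumption made for illustration. The AUC is estimated with the Wilcoxon-Mann-Whitney statistic, `clarity` is a hypothetical separation measure (the gap between class mean outputs scaled by the output range), the pair is compared lexicographically, and the third selection criterion (smaller program size) is likewise assumed.

```python
import random
from typing import Callable, List, Sequence, Tuple


def wmw_auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Wilcoxon-Mann-Whitney estimate of AUC.

    Fraction of (minority, majority) output pairs that the program
    ranks correctly; ties count as half a win.
    """
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def clarity(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Hypothetical clarity measure (an assumption, not from the paper).

    Gap between the class mean outputs, scaled by the overall output
    range and clipped at zero, so the value lies in [0, 1].
    """
    lo = min(min(pos), min(neg))
    hi = max(max(pos), max(neg))
    if hi == lo:
        return 0.0
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    return max(0.0, (mean_pos - mean_neg) / (hi - lo))


def fitness(pos: Sequence[float], neg: Sequence[float]) -> Tuple[float, float]:
    """Keep the two criteria as a pair instead of summing them."""
    return (wmw_auc(pos, neg), clarity(pos, neg))


def tournament(population: List[dict],
               key: Callable[[dict], tuple],
               k: int,
               rng: random.Random) -> dict:
    """Sample k programs and return the best one under `key`.

    Python compares tuples lexicographically, so a tuple-valued key
    uses later criteria only to break ties on earlier ones.
    """
    contestants = rng.sample(population, k)
    return max(contestants, key=key)


def three_criterion_key(program: dict) -> tuple:
    """Assumed ordering: AUC first, then clarity, then smaller size."""
    auc, clar = program["fit"]
    return (auc, clar, -program["size"])
```

A tuple-valued key keeps each criterion on its own scale, which avoids choosing weights for a summed fitness. The lexicographic ordering used here is only one of several possible ways to compare pairs.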
