Article

An analysis of heuristic metrics for classifier ensemble pruning based on ordered aggregation

Journal

PATTERN RECOGNITION
Volume 124

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2021.108493

Keywords

Heuristic optimization; Ensemble selection; Ensemble pruning; Classifier ensemble; Machine learning; Difficult samples; Ordering-based pruning; Classifier complementariness

Funding

  1. European Union [665959]
  2. LOGISTAR project - European Union [769142]
  3. Polish National Science Center [2017/27/B/ST6/01325]
  4. [PID2019-106827GB-I00/AEI/10.13039/501100011033]


Abstract

Classifier ensemble pruning is a strategy through which a subensemble is identified by optimizing a predefined performance criterion. Choosing an optimal or near-optimal subensemble reduces the initial ensemble size and increases its predictive performance. In this article, a set of heuristic metrics is analyzed to guide the pruning process. The analyzed metrics reorder the classifiers produced by the bagging algorithm and select the first set in the resulting queue. These criteria include general accuracy, complementarity of decisions, ensemble diversity, the margin of samples, minimum redundancy, discriminant classifiers, and margin hybrid diversity. The efficacy of these metrics is affected by the original ensemble size, the required subensemble size, the type of individual classifiers, and the number of classes, while their efficiency is measured in terms of computational cost and memory requirements. The performance of the metrics is assessed on fifteen binary and fifteen multiclass benchmark classification tasks. In addition, their robustness to randomness is measured by the distribution of their accuracy around the median. Results show that ordered aggregation is an efficient strategy for generating subensembles that improve both the predictive performance and the computational and memory complexity of the whole bagging ensemble. (c) 2021 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
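The ordered-aggregation idea described in the abstract can be sketched in code. The following is a minimal illustration only, using a greedy validation-accuracy ordering (reduce-error-style pruning, one family of the heuristics the paper analyzes), not the authors' exact metrics; the dataset, ensemble size, and pruned size `k` are all arbitrary choices for demonstration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative setup: a bagging ensemble of decision trees on synthetic data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
bag.fit(X_tr, y_tr)

# Cache each member's predictions on a held-out validation set.
preds = np.array([est.predict(X_val) for est in bag.estimators_])

def vote_accuracy(indices):
    """Majority-vote accuracy of the subensemble given by `indices`."""
    votes = preds[indices].astype(int)  # shape: (k, n_val)
    maj = np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, votes)
    return float((maj == y_val).mean())

# Ordered aggregation: greedily reorder the ensemble so that each added
# member maximizes the accuracy of the growing subensemble.
remaining = list(range(len(preds)))
order = []
while remaining:
    best = max(remaining, key=lambda i: vote_accuracy(order + [i]))
    order.append(best)
    remaining.remove(best)

# Pruning then simply keeps the first k classifiers in the queue.
k = 10
pruned_acc = vote_accuracy(order[:k])
full_acc = vote_accuracy(list(range(len(preds))))
print(f"full ensemble: {full_acc:.3f}  pruned (k={k}): {pruned_acc:.3f}")
```

Because the subensemble is a prefix of a fixed ordering, a single greedy pass yields pruned ensembles for every size k at once, which is what makes ordering-based pruning cheap compared with searching all subsets.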
