A probabilistic classifier ensemble weighting scheme based on cross-validated accuracy estimates

DATA MINING AND KNOWLEDGE DISCOVERY (2019)

Journal

DATA MINING AND KNOWLEDGE DISCOVERY

Volume 33, Issue 6, Pages 1674-1709

Publisher

SPRINGER

DOI: 10.1007/s10618-019-00638-y

Keywords

Classification; Heterogeneous; Ensemble; Weighted

Funding

UK Engineering and Physical Sciences Research Council (EPSRC) [EP/M015807/1]
Biotechnology and Biological Sciences Research Council (BBSRC) Norwich Research Park Biosciences Doctoral Training Partnership [BB/M011216/1]
Research and Specialist Computing Support service at the University of East Anglia
BBSRC [1786465] Funding Source: UKRI
EPSRC [EP/M015807/1] Funding Source: UKRI

Abstract

Our hypothesis is that building ensembles of small sets of strong classifiers constructed with different learning algorithms is, on average, the best approach to classification for real-world problems. We propose a simple mechanism for building small heterogeneous ensembles based on exponentially weighting the probability estimates of the base classifiers with an estimate of the accuracy formed through cross-validation on the training data. We demonstrate through extensive experimentation that, given the same small set of base classifiers, this method has measurable benefits over commonly used alternative weighting, selection or meta-classifier approaches to heterogeneous ensembles. We also show that five well-known, fast classifiers can be combined into an ensemble that is not significantly worse than large homogeneous ensembles and tuned individual classifiers on datasets from the UCI archive. We provide evidence that the performance of the cross-validation accuracy weighted probabilistic ensemble (CAWPE) generalises to a completely separate set of datasets, the UCR time series classification archive, and we also demonstrate that our ensemble technique can significantly improve the state-of-the-art classifier for this problem domain. We investigate the performance in more detail and find that the improvement is most marked in problems with smaller training sets. We perform a sensitivity analysis and an ablation study to demonstrate the robustness of the ensemble and the significant contribution of each design element of the classifier. We conclude that it is, on average, better to ensemble strong classifiers with a weighting scheme than to perform extensive tuning, and that CAWPE is a sensible starting point for combining classifiers.
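The weighting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each base classifier supplies a probability vector over the classes and a cross-validated accuracy estimate, and that "exponentially weighting" means raising that accuracy to a power before using it as the classifier's weight. The function name `combine_weighted_probabilities` and the parameter `alpha` are hypothetical, and the abstract does not state what exponent is used, so the values below are purely illustrative.

```python
from typing import List, Sequence


def combine_weighted_probabilities(
    probas: Sequence[Sequence[float]],
    cv_accuracies: Sequence[float],
    alpha: float,
) -> List[float]:
    """Combine per-classifier class probabilities for one test instance.

    probas[i]        : probability estimates of base classifier i over the classes
    cv_accuracies[i] : cross-validated accuracy estimate of base classifier i
    alpha            : exponent applied to each accuracy (an assumed parameter;
                       the abstract does not give its value)
    """
    if len(probas) != len(cv_accuracies):
        raise ValueError("need one accuracy estimate per base classifier")
    if not probas:
        raise ValueError("need at least one base classifier")

    # Weight of each classifier: its accuracy estimate raised to the power alpha.
    weights = [acc ** alpha for acc in cv_accuracies]

    n_classes = len(probas[0])
    combined = [0.0] * n_classes
    for weight, proba in zip(weights, probas):
        if len(proba) != n_classes:
            raise ValueError("all classifiers must score the same classes")
        for c, p_c in enumerate(proba):
            combined[c] += weight * p_c

    total = sum(combined)
    if total == 0.0:
        raise ValueError("all weighted probabilities are zero")
    # Normalise so the combined scores form a probability distribution.
    return [value / total for value in combined]


# Illustrative numbers only: one stronger classifier favours class 1,
# two weaker classifiers favour class 0.
probas = [[0.4, 0.6], [0.7, 0.3], [0.7, 0.3]]
cv_accuracies = [0.9, 0.6, 0.6]

linear = combine_weighted_probabilities(probas, cv_accuracies, alpha=1.0)
sharpened = combine_weighted_probabilities(probas, cv_accuracies, alpha=4.0)

print(linear.index(max(linear)))        # class chosen with alpha = 1
print(sharpened.index(max(sharpened)))  # class chosen with alpha = 4
```

In this toy case, with `alpha=1.0` the two weaker classifiers outvote the stronger one and class 0 is chosen. With `alpha=4.0` the gap between the accuracy estimates is amplified and the stronger classifier's preference for class 1 prevails.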
