Article

An experimental comparison of performance measures for classification

Journal

PATTERN RECOGNITION LETTERS
Volume 30, Issue 1, Pages 27-38

Publisher

ELSEVIER
DOI: 10.1016/j.patrec.2008.08.010

Keywords

Classification; Performance measures; Ranking; Calibration

Funding

  1. EU (FEDER)
  2. Spanish MEC [TIN 2007-68093C02-02]
  3. Generalitat Valenciana [GV06/301]
  4. UPV
  5. TAMAT
  6. Spanish project Agreement Technologies (Consolider Ingenio) [CSD2007-00022]

Performance metrics in classification are fundamental in assessing the quality of learning methods and learned models. However, many different measures have been defined in the literature with the aim of making better choices in general or for a specific application area. Choices made by one metric are claimed to be different from choices made by other metrics. In this work, we analyse experimentally the behaviour of 18 different performance metrics in several scenarios, identifying clusters and relationships between measures. We also perform a sensitivity analysis for all of them in terms of several traits: class threshold choice, separability/ranking quality, calibration performance and sensitivity to changes in prior class distribution. From the definitions and experiments, we make a comprehensive analysis of the relationships between metrics, and a taxonomy and arrangement of them according to the previous traits. This can be useful for choosing the most adequate measure (or set of measures) for a specific application. Additionally, the study also highlights some niches in which new measures might be defined and also shows that some supposedly innovative measures make the same choices (or almost) as existing ones. Finally, this work can also be used as a reference for comparing experimental results in pattern recognition and machine learning literature, when using different measures. (C) 2008 Elsevier B.V. All rights reserved.
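
The claim that different metrics can make different choices can be illustrated with a minimal sketch. It is not taken from the paper: the toy labels, the two score vectors (`model_a`, `model_b`) and the helper functions are hypothetical, and the three measures shown (accuracy at a fixed threshold, AUC, and Brier score) are chosen here only as common examples of threshold-based, ranking-based and calibration-based measures. The abstract does not list which 18 metrics were studied.

```python
# Toy illustration (hypothetical data): three kinds of measure can disagree
# about which of two scoring classifiers is "better".

def accuracy(y_true, scores, threshold=0.5):
    """Threshold-based measure: fraction of correct hard predictions."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)

def auc(y_true, scores):
    """Ranking measure: probability that a random positive outscores a
    random negative (ties count as one half)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier(y_true, scores):
    """Calibration-sensitive measure: mean squared error of the scores
    against the 0/1 labels (lower is better)."""
    return sum((s - y) ** 2 for s, y in zip(scores, y_true)) / len(y_true)

# Assumed toy data: three negatives followed by three positives.
y = [0, 0, 0, 1, 1, 1]
# Model A ranks perfectly, but every score sits below the 0.5 threshold.
model_a = [0.40, 0.42, 0.44, 0.46, 0.48, 0.49]
# Model B spreads its scores out but misranks one negative/positive pair.
model_b = [0.10, 0.20, 0.60, 0.55, 0.80, 0.90]

for name, scores in [("A", model_a), ("B", model_b)]:
    print(name,
          "acc=%.3f" % accuracy(y, scores),
          "auc=%.3f" % auc(y, scores),
          "brier=%.3f" % brier(y, scores))
```

On this toy data, AUC prefers model A while accuracy and Brier score prefer model B, which is the kind of disagreement between measures that the abstract refers to.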
