Handling data irregularities in classification: Foundations, trends, and future challenges

PATTERN RECOGNITION (2018)

Journal

PATTERN RECOGNITION

Volume 81, Pages 674-693

Publisher

ELSEVIER SCI LTD

DOI: 10.1016/j.patcog.2018.03.008

Keywords

Data irregularities; Class imbalance; Small disjuncts; Class-distribution skew; Missing features; Absent features

Funding

Indian National Academy of Engineering (INAE)

Abstract

Most of the traditional pattern classifiers assume their input data to be well-behaved in terms of similar underlying class distributions, balanced size of classes, the presence of a full set of observed features in all data instances, etc. Practical datasets, however, show up with various forms of irregularities that are, very often, sufficient to confuse a classifier, thus degrading its ability to learn from the data. In this article, we provide a bird's eye view of such data irregularities, beginning with a taxonomy and characterization of various distribution-based and feature-based irregularities. Subsequently, we discuss the notable and recent approaches that have been taken to make the existing stand-alone as well as ensemble classifiers robust against such irregularities. We also discuss the interrelation and co-occurrences of the data irregularities including class imbalance, small disjuncts, class skew, missing features, and absent (non-existing or undefined) features. Finally, we uncover a number of interesting future research avenues that are equally contextual with respect to the regular as well as deep machine learning paradigms. © 2018 Elsevier Ltd. All rights reserved.
