4.7 Article

Handling data irregularities in classification: Foundations, trends, and future challenges

Journal

PATTERN RECOGNITION
Volume 81, Issue -, Pages 674-693

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2018.03.008

Keywords

Data irregularities; Class imbalance; Small disjuncts; Class-distribution skew; Missing features; Absent features

Funding

  1. Indian National Academy of Engineering (INAE)

Ask authors/readers for more resources

Most of the traditional pattern classifiers assume their input data to be well-behaved in terms of similar underlying class distributions, balanced size of classes, the presence of a full set of observed features in all data instances, etc. Practical datasets, however, show up with various forms of irregularities that are, very often, sufficient to confuse a classifier, thus degrading its ability to learn from the data. In this article, we provide a bird's eye view of such data irregularities, beginning with a taxonomy and characterization of various distribution-based and feature-based irregularities. Subsequently, we discuss the notable and recent approaches that have been taken to make the existing stand-alone as well as ensemble classifiers robust against such irregularities. We also discuss the interrelation and co-occurrences of the data irregularities including class imbalance, small disjuncts, class skew, missing features, and absent (non-existing or undefined) features. Finally, we uncover a number of interesting future research avenues that are equally contextual with respect to the regular as well as deep machine learning paradigms. (C) 2018 Elsevier Ltd. All rights reserved.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available