Journal
PATTERN RECOGNITION
Volume 81, Issue -, Pages 674-693Publisher
ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2018.03.008
Keywords
Data irregularities; Class imbalance; Small disjuncts; Class-distribution skew; Missing features; Absent features
Funding
- Indian National Academy of Engineering (INAE)
Ask authors/readers for more resources
Most of the traditional pattern classifiers assume their input data to be well-behaved in terms of similar underlying class distributions, balanced size of classes, the presence of a full set of observed features in all data instances, etc. Practical datasets, however, show up with various forms of irregularities that are, very often, sufficient to confuse a classifier, thus degrading its ability to learn from the data. In this article, we provide a bird's eye view of such data irregularities, beginning with a taxonomy and characterization of various distribution-based and feature-based irregularities. Subsequently, we discuss the notable and recent approaches that have been taken to make the existing stand-alone as well as ensemble classifiers robust against such irregularities. We also discuss the interrelation and co-occurrences of the data irregularities including class imbalance, small disjuncts, class skew, missing features, and absent (non-existing or undefined) features. Finally, we uncover a number of interesting future research avenues that are equally contextual with respect to the regular as well as deep machine learning paradigms. (C) 2018 Elsevier Ltd. All rights reserved.
Authors
I am an author on this paper
Click your name to claim this paper and add it to your profile.
Reviews
Recommended
No Data Available