Journal
IET SOFTWARE
Volume 13, Issue 6, Pages 479-496
Publisher
WILEY
DOI: 10.1049/iet-sen.2018.5193
Keywords
learning (artificial intelligence); reviews; systematic literature review method; systematic literature review guidelines; data sets; data quality; machine learning; imbalanced data preprocessing techniques; defect reduction; quality assessment criteria
Funding
- Ministry of Education under University of Malaya High Impact Research grant [UM.C/625/1/HIR/MOHE/FCSIT/13]
- Ministry of Education under Fundamental Research Grant Scheme (FRGS) [FP001-2016]
Data preprocessing remains an important step in machine learning studies: proper preprocessing of imbalanced data can help researchers reduce defects as far as possible, which, in turn, may lead to the elimination of defects in existing data sets. Despite the remarkable achievements in machine learning research, systematic literature reviews of imbalanced data preprocessing techniques remain scarce. In this study, the authors assess the existing literature to identify the key issues related to data quality and handling, and to provide a convenient collection of the techniques used to address these issues during data preprocessing. They applied a systematic literature review method involving a manual search to select articles published from January 2010 to September 2018. The quality of the existing studies was assessed against a set of quality assessment criteria. Of the 118 relevant studies found, only 2% were identified as having followed systematic literature review guidelines. This study therefore calls for more systematic literature reviews on data preprocessing to improve the quality of the data used in machine learning studies.