4.7 Article

A Bolasso based consistent feature selection enabled random forest classification algorithm: An application to credit risk assessment

Journal

APPLIED SOFT COMPUTING
Volume 86, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.asoc.2019.105936

Keywords

Bootstrap-lasso; Stable feature selection; BS-RF; Credit risk assessment; Random forest in credit risk

Ask authors/readers for more resources

Credit risk assessment has been a crucial issue as it forecasts whether an individual will default on loan or not. Classifying an applicant as good or bad debtor helps lender to make a wise decision. The modern data mining and machine learning techniques have been found to be very useful and accurate in credit risk predictive capability and correct decision making. Classification is one of the most widely used techniques in machine learning. To increase prediction accuracy of standalone classifiers while keeping overall cost to a minimum, feature selection techniques have been utilized, as feature selection removes redundant and irrelevant attributes from dataset. This paper initially introduces Bolasso (Bootstrap-Lasso) which selects consistent and relevant features from pool of features. The consistent feature selection is defined as robustness of selected features with respect to changes in dataset Bolasso generated shortlisted features are then applied to various classification algorithms like Random Forest (RF), Support Vector Machine (SVM), Naive Bayes (NB) and K-Nearest Neighbors (K-NN) to test its predictive accuracy. It is observed that Bolasso enabled Random Forest algorithm (BS-RF) provides best results forcredit risk evaluation. The classifiers are built on training and test data partition (70:30) of three datasets (Lending Club's peer to peer dataset, Kaggle's Bank loan status dataset and German credit dataset obtained from UCI). The performance of Bolasso enabled various classification algorithms is then compared with that of other baseline feature selection methods like Chi Square, Gain Ratio, ReliefF and stand-alone classifiers (no feature selection method applied). The experimental results shows that Bolasso provides phenomenal stability of features when compared with stability of other algorithms. Jaccard Stability Measure (JSM) is used to assess stability of feature selection methods. Moreover BS-RF have good classification accuracy and is better than other methods in terms of AUC and Accuracy resulting in effectively improving the decision making process of lenders. (C) 2019 Elsevier B.V. All rights reserved.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available