Default prediction in P2P lending from high-dimensional data based on machine learning

PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS (2019)

Journal

PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS

Volume 534, Issue -, Pages -

Publisher

ELSEVIER

DOI: 10.1016/j.physa.2019.122370

Keywords

Default prediction; High-dimensional data; Imbalanced data; Machine learning; P2P lending

Funding

National Natural Science Foundation of China [91846107, 71571058, 61773286, 71690235]
Fundamental Research Funds for the Central Universities [PA2019GDQT0021]

Abstract

In recent years, peer-to-peer (P2P) lending, a new Internet-based unsecured credit model, has flourished and become a successful complement to the traditional credit business. However, credit risk remains inevitable. A key challenge for a P2P lending platform is building a default prediction model that can effectively and accurately predict the default probability of each loan. Because P2P lending credit data are high-dimensional and class-imbalanced, conventional statistical models and machine learning algorithms cannot predict default probability effectively and accurately. To address this issue, this paper proposes a heterogeneous ensemble default prediction model based on decision tree models for accurate prediction of customer default in P2P lending. Gradient boosting decision trees (GBDT), extreme gradient boosting (XGBoost) and light gradient boosting machine (LightGBM) are employed as individual classifiers to create a heterogeneous ensemble learning-based default prediction model. Learning model-based feature ranking is applied to the P2P lending credit data, and the individual classifiers undergo hyperparameter optimization. Finally, comparison with benchmark models shows that the proposed model achieves desirable prediction results and thus effectively addresses the challenge of prediction on high-dimensional and imbalanced data. © 2019 Elsevier B.V. All rights reserved.
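The abstract names the three individual classifiers but does not say how their outputs are combined. As a rough illustration only, the sketch below assumes the simplest option, a weighted average of per-loan default probabilities (soft voting). The function names `soft_vote` and `flag_defaults`, the 0.5 threshold, and the probability values are all hypothetical and are not taken from the paper.

```python
def soft_vote(prob_by_model, weights=None):
    """Combine per-loan default probabilities from several models
    by a weighted average (soft voting). This is an assumed
    combination rule, not necessarily the one used in the paper."""
    names = list(prob_by_model)
    if not names:
        raise ValueError("at least one model is required")
    if weights is None:
        # Equal weights when none are supplied.
        weights = {name: 1.0 for name in names}
    n_loans = len(prob_by_model[names[0]])
    if any(len(prob_by_model[name]) != n_loans for name in names):
        raise ValueError("all models must score the same loans")
    total = sum(weights[name] for name in names)
    return [
        sum(weights[name] * prob_by_model[name][i] for name in names) / total
        for i in range(n_loans)
    ]


def flag_defaults(probs, threshold=0.5):
    """Mark a loan as a predicted default when its combined
    probability reaches the (hypothetical) threshold."""
    return [p >= threshold for p in probs]


# Hypothetical default probabilities for three loans, one list per
# individual classifier named in the abstract.
scores = {
    "gbdt": [0.10, 0.70, 0.40],
    "xgboost": [0.20, 0.80, 0.60],
    "lightgbm": [0.30, 0.90, 0.35],
}

combined = soft_vote(scores)
flags = flag_defaults(combined)
```

With class-imbalanced data, a threshold below 0.5 is often chosen so that fewer defaults are missed; the value here is only a placeholder.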
