4.6 Article

Default prediction in P2P lending from high-dimensional data based on machine learning

Publisher

ELSEVIER
DOI: 10.1016/j.physa.2019.122370

Keywords

Default prediction; High-dimensional data; Imbalanced data; Machine learning; P2P lending

Funding

  1. National Natural Science Foundation of China [91846107, 71571058, 61773286, 71690235]
  2. Fundamental Research Funds for the Central Universities [PA2019GDQT0021]

In recent years, peer-to-peer (P2P) lending, a new Internet-based unsecured credit model, has flourished and become a successful complement to the traditional credit business. However, credit risk remains inevitable. A key challenge for a P2P lending platform is building a default prediction model that can accurately predict the default probability of each loan. Because P2P lending credit data are high-dimensional and class-imbalanced, conventional statistical models and machine learning algorithms cannot predict default probability effectively and accurately. To address this issue, this paper proposes a heterogeneous ensemble default prediction model based on decision tree models for accurate prediction of customer default in P2P lending. Gradient boosting decision trees (GBDT), extreme gradient boosting (XGBoost) and light gradient boosting machine (LightGBM) are employed as the individual classifiers of the ensemble. Learning model-based feature ranking is applied to the P2P lending credit data, and the individual classifiers undergo hyperparameter optimization. Finally, comparison with benchmark models shows that the proposed model achieves desirable prediction results and thus effectively addresses the challenge of prediction from high-dimensional and imbalanced data. (C) 2019 Elsevier B.V. All rights reserved.
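
The abstract does not say how the outputs of the three individual classifiers are combined. As a minimal sketch, assuming simple soft voting (a weighted average of predicted default probabilities), the combination step of a heterogeneous ensemble might look like the following. The function names `soft_vote` and `flag_defaults`, the 0.5 threshold, and the probability values are all hypothetical and chosen for illustration only; they are not taken from the paper.

```python
from typing import List, Optional, Sequence


def soft_vote(member_probs: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Average per-loan default probabilities across classifiers.

    member_probs[k][i] is classifier k's predicted default probability
    for loan i. Soft voting is an assumption here, not the paper's
    stated combination rule.
    """
    if not member_probs:
        raise ValueError("need at least one classifier")
    n_loans = len(member_probs[0])
    if any(len(p) != n_loans for p in member_probs):
        raise ValueError("all classifiers must score the same loans")
    if weights is None:
        weights = [1.0] * len(member_probs)  # equal weights by default
    if len(weights) != len(member_probs):
        raise ValueError("one weight per classifier is required")
    total = sum(weights)
    return [
        sum(w * p[i] for w, p in zip(weights, member_probs)) / total
        for i in range(n_loans)
    ]


def flag_defaults(probs: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Mark a loan as a predicted default when its probability meets the threshold."""
    return [p >= threshold for p in probs]


# Hypothetical probabilities for three loans, standing in for the outputs
# of fitted GBDT, XGBoost and LightGBM models (illustrative values only).
gbdt_probs = [0.10, 0.70, 0.40]
xgb_probs = [0.20, 0.80, 0.30]
lgbm_probs = [0.30, 0.60, 0.20]

ensemble_probs = soft_vote([gbdt_probs, xgb_probs, lgbm_probs])
flags = flag_defaults(ensemble_probs, threshold=0.5)
```

With class-imbalanced data, a threshold other than 0.5 may be more appropriate; the value here is a placeholder rather than a recommendation.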
