期刊
PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS
卷 534, 期 -, 页码 -出版社
ELSEVIER
DOI: 10.1016/j.physa.2019.122370
关键词
Default prediction; High-dimensional data; Imbalanced data; Machine learning; P2P lending
资金
- National Natural Science Foundation of China [91846107, 71571058, 61773286, 71690235]
- Fundamental Research Funds for the Central Universities [PA2019GDQT0021]
In recent years, a new Internet-based unsecured credit model, peer-to-peer (P2P) lending, is flourishing and has become a successful complement to the traditional credit business. However, credit risk remains inevitable. A key challenge is creating a default prediction model that can effectively and accurately predict the default probability of each loan for a P2P lending platform. Due to the characteristics of P2P lending credit data, such as high dimension and class imbalance, conventional statistical models and machine learning algorithms cannot effectively and accurately predict default probability. To address this issue, a decision tree model-based heterogeneous ensemble default prediction model is proposed in this paper for accurate prediction of customer default in P2P lending. Gradient boosting decision trees (GBDT), extreme gradient boosting (XGBoost) and light gradient boosting machine (LightGBM) are employed as individual classifiers to create a heterogeneous ensemble learning-based default prediction model. Learning model-based feature ranking is applied to P2P lending credit data, and individual classifiers undergo hyperparameter optimization. Finally, comparison with benchmark models shows that the prediction model can achieve desirable prediction results and thus effectively solve the challenge of predictions based on high-dimensional and imbalanced data. (C) 2019 Elsevier B.V. All rights reserved.
作者
我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。
推荐
暂无数据