4.7 Article

Big data analytics for default prediction using graph theory

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 176

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2021.114840

Keywords

Big data analytics; Graph theory; Machine learning; Default prediction; SHAP value

This study introduces two new models for default prediction. Running on a Big Data Analytics platform, they combine statistical and machine learning methods to predict default for about one million companies in Turkey, reaching an AUC of up to 0.89.
With the unprecedented growth of data worldwide, companies and industries in the financial sector try to remain competitive by transforming themselves into data-driven organizations. By analyzing large volumes of financial data, companies can obtain valuable information for strategic planning, such as risk control, crisis management, or growth management. However, as the amount of data increases dramatically, traditional data analytics platforms face difficulties in storing, managing, and analyzing it. Emerging Big Data Analytics (BDA) platforms overcome these problems by providing decentralized and distributed processing. In this study, we propose two new models for default prediction. In the first model, called DPModel-1, statistical (logistic regression) and machine learning (decision tree, random forest, gradient boosting) methods are employed to predict company default. Derived from the first model, we propose DPModel-2, which is based on graph theory and additionally includes new variables obtained from the trading interactions of companies. In both models, grid search optimization is used to determine the best hyperparameters, and SHapley Additive exPlanations (SHAP) values are used to make the models interpretable. By leveraging balance sheet, credit, and invoice datasets, default prediction is performed for about one million companies in Turkey between 2010 and 2018. The yearly default rates of companies range between 3% and 6%. The experiments are conducted on a BDA platform. According to the DPModel-1 results, the highest AUC score, 0.87, is achieved by random forest. Adding the new graph-theory variables improves the results for each technique separately. According to the DPModel-2 results, the best AUC score, 0.89, is again achieved by random forest.
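
The abstract does not say which graph variables DPModel-2 derives from the trading interactions. As a rough illustration only, the sketch below treats each invoice as a directed edge from seller to buyer and computes a few plausible per-company variables. The function name `graph_features`, the `(seller, buyer, amount)` schema, and the four feature names are all assumptions made for this example, not the authors' definitions.

```python
from collections import defaultdict


def graph_features(invoices, defaulted):
    """Derive per-company network variables from an invoice edge list.

    invoices  : iterable of (seller, buyer, amount) tuples (hypothetical schema)
    defaulted : set of company ids already known to be in default
    """
    buyers = defaultdict(set)    # seller -> companies it sold to
    sellers = defaultdict(set)   # buyer  -> companies it bought from
    volume = defaultdict(float)  # company -> total invoiced amount, both directions

    for seller, buyer, amount in invoices:
        buyers[seller].add(buyer)
        sellers[buyer].add(seller)
        volume[seller] += amount
        volume[buyer] += amount

    features = {}
    # Every company here appears in at least one invoice, so it has >= 1 partner.
    for company in set(buyers) | set(sellers):
        partners = buyers[company] | sellers[company]
        features[company] = {
            "out_degree": len(buyers[company]),    # number of distinct customers
            "in_degree": len(sellers[company]),    # number of distinct suppliers
            "trade_volume": volume[company],
            "defaulted_partner_share": len(partners & defaulted) / len(partners),
        }
    return features


# Toy data, invented for illustration: A sells to B and C, B sells to C.
invoices = [("A", "B", 100.0), ("A", "C", 50.0), ("B", "C", 25.0)]
features = graph_features(invoices, defaulted={"C"})
```

Variables of this kind could then be appended to the balance sheet and credit variables before fitting the same classifiers, which is the general idea the abstract describes for moving from DPModel-1 to DPModel-2.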
