Article

An Ensemble Random Forest Algorithm for Insurance Big Data Analysis

Journal

IEEE ACCESS
Volume 5, Pages 16568-16575

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2017.2738069

Keywords

Classification algorithms; ensemble learning; random forest; big data; Spark

Funding

  1. National Natural Science Foundation of China [61402183, 61772205]
  2. National Science and Technology Ministry [2015BAK36B06]
  3. Guangdong Provincial Scientific and Technological Projects [2017A010101008, 2017A010101014, 2017B090901061, 2016A010101007, 2016B090918021, 2014B010117001]
  4. Guangzhou Science and Technology Projects [201607010048, 201604010040]
  5. Opening Project of Guangdong Province Key Laboratory of Big Data Analysis and Processing [2017004]
  6. Fundamental Research Funds for the Central Universities, SCUT


Because business data is often imbalanced, user features are frequently missing, and for many other reasons, applying big data techniques directly to real-world business data tends to produce results that deviate from business goals. Insurance business data is difficult to model with classification algorithms such as logistic regression and support vector machines (SVM). In this paper, we combine a heuristic bootstrap sampling approach with ensemble learning to mine large-scale insurance business data, and propose an ensemble random forest algorithm that uses the parallel computing capability and memory-cache mechanism provided by Spark. We collected insurance business data from China Life Insurance Company and used the proposed algorithm to analyze potential customers. We use F-Measure and G-mean to evaluate the performance of the algorithm. Experimental results show that the ensemble random forest algorithm outperformed SVM and other classification algorithms in both performance and accuracy on imbalanced data, and that it is useful for improving the accuracy of product marketing compared to the traditional manual approach.
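The abstract names two evaluation metrics and a bootstrap sampling step without defining them. As a minimal sketch, the block below computes F-Measure and G-mean using their standard definitions, and draws a class-balanced bootstrap sample. The function names (`confusion_counts`, `f_measure`, `g_mean`, `balanced_bootstrap`) are hypothetical. The balanced sampling scheme is an assumption made for illustration and is not necessarily the paper's heuristic, which the abstract does not describe.

```python
import math
import random


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f_measure(y_true, y_pred, positive=1, beta=1.0):
    """F-Measure: weighted harmonic mean of precision and recall."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def g_mean(y_true, y_pred, positive=1):
    """G-mean: geometric mean of sensitivity and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def balanced_bootstrap(labels, rng, positive=1):
    """Draw indices with replacement, the same number from each class.

    Assumption: each class contributes as many draws as the minority
    class has members. This is one common way to rebalance data and is
    only a stand-in for the paper's heuristic.
    """
    pos = [i for i, y in enumerate(labels) if y == positive]
    neg = [i for i, y in enumerate(labels) if y != positive]
    n = min(len(pos), len(neg))
    return ([rng.choice(pos) for _ in range(n)]
            + [rng.choice(neg) for _ in range(n)])


# Toy imbalanced example: 3 positives, 7 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(f_measure(y_true, y_pred), g_mean(y_true, y_pred))
print(balanced_bootstrap(y_true, random.Random(0)))
```

Both metrics depend on how well the minority class is recovered, which is why they are commonly preferred over plain accuracy on imbalanced data: a classifier that predicts only the majority class scores zero on both.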
