Article

A principle component analysis-based random forest with the potential nearest neighbor method for automobile insurance fraud identification

Journal

APPLIED SOFT COMPUTING
Volume 70, Pages 1000-1009

Publisher

ELSEVIER
DOI: 10.1016/j.asoc.2017.07.027

Keywords

Ensemble; Random forest; Principle component analysis; Potential nearest neighbors; Voting mechanism; Automobile insurance fraud

Funding

  1. Project of the National Natural Science Foundation of China [61502280, 61472228]
  2. Project of Qingdao Applied Basic Research of Qingdao [14-2-4-55-jch]
  3. Natural Science Foundation of Shandong province [ZR2014FM009]
  4. Graduate Education Innovation Program Project of Shandong University of Science and Technology [KDYC14016]

As a successful ensemble method, Random Forest has attracted much attention. In this paper, individual classifiers are appropriately combined, and a multiple classifier system with improved classification accuracy is presented. Following Breiman's methodology, we propose a multiple classifier system based on the Random Forest, Principle Component Analysis and Potential Nearest Neighbor methods. As Breiman suggested, the performance of the Random Forest depends on the strength of the weak learners in the forest and the diversity among them. The Principle Component Analysis method is applied to transform the data at each node into another space when computing the best split at that node. This process increases the diversity among the trees in the forest and thereby improves the overall accuracy. The Random Forest is then studied from the perspective of the Adaptive Nearest Neighbor. We introduce the concepts of monotone distance measures and potential nearest neighbors and show that the Random Forest can be viewed as an adaptive learning mechanism of k Potential Nearest Neighbors. Considering the information loss caused by out-of-bag samples, a new voting mechanism based on Potential Nearest Neighbors is also presented to replace the traditional majority vote. The proposed algorithm improves the classification accuracy of the ensemble classifier by increasing the diversity of the base classifiers. The performance of the proposed method is compared with those of the Oblique Decision Tree Ensemble, Rotation Forest and basic Random Forest on the data sets. The experimental results show that the proposed method achieves better classification accuracy and lower variance. The proposed method is also applied to detect automobile insurance fraud, and the fraud rules are obtained. (C) 2017 Elsevier B.V. All rights reserved.
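
The node-level transformation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the samples reaching a node are centered, projected onto their principal axes, and then searched exhaustively for the threshold that minimizes weighted Gini impurity. The function names (`pca_axes`, `best_split`, `pca_node_split`, `go_left`) and the choice of Gini impurity are assumptions made for illustration only.

```python
import numpy as np

def pca_axes(X):
    """Return the mean of X and its principal axes (one axis per row)."""
    mean = X.mean(axis=0)
    # SVD of the centered data: rows of vt are orthonormal principal directions.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt

def gini(y):
    """Gini impurity of a label vector (0.0 for an empty or pure set)."""
    if len(y) == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(Z, y):
    """Exhaustive search for the (feature, threshold) with the lowest weighted Gini."""
    n, d = Z.shape
    best = (None, None, np.inf)
    for j in range(d):
        values = np.unique(Z[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in (values[:-1] + values[1:]) / 2.0:
            left = Z[:, j] <= t
            score = (left.sum() * gini(y[left])
                     + (~left).sum() * gini(y[~left])) / n
            if score < best[2]:
                best = (j, t, score)
    return best

def pca_node_split(X, y):
    """Rotate the node's samples with PCA, then pick the best split in that space."""
    mean, axes = pca_axes(X)
    Z = (X - mean) @ axes.T
    j, t, score = best_split(Z, y)
    return {"mean": mean, "axes": axes, "feature": j,
            "threshold": t, "impurity": score}

def go_left(node, X):
    """Route samples through the node: True means the left child."""
    Z = (X - node["mean"]) @ node["axes"].T
    return Z[..., node["feature"]] <= node["threshold"]

# Toy data (hypothetical): two classes spread along the diagonal.
X_demo = np.array([[0., 0.], [1., 2.], [2., 1.], [4., 5.], [5., 4.], [6., 6.]])
y_demo = np.array([0, 0, 0, 1, 1, 1])
node_demo = pca_node_split(X_demo, y_demo)
mask_demo = go_left(node_demo, X_demo)
```

Because each node fits its own rotation to the samples it receives, trees grown on different bootstrap samples search for splits in different coordinate systems, which is one way to read the abstract's claim that the transformation increases diversity among the trees.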
