Article

Adjusted weight voting algorithm for random forests in handling missing values

Journal

PATTERN RECOGNITION
Volume 69, Pages 52-60

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2017.04.005

Keywords

Random forests; Missing values; Imputation approaches; Surrogate decisions; Weighted voting

Funding

  1. National Natural Science Foundation of China [81271662]
  2. Ministry of the Science and Technology of China [2014DFT3050]
  3. Natural Science Foundation of Zhejiang Province [LQ14H180001]
  4. Department of Science and Technology of Zhejiang Province [2011R50018]

Random forests (RF) is an efficient classification algorithm, but it depends on the integrity of the dataset. Conventional methods for handling missing values usually rely on estimation and imputation approaches, whose effectiveness is tied to assumptions about the data features. Recently, a surrogate-decision algorithm for RF was developed, and this paper proposes a random forests algorithm with modified surrogate splits (Adjusted Weight Voting Random Forest, AWVRF) that can handle incomplete data without imputation. Unlike the existing surrogate method, in AWVRF, when the primary splitting attribute and the surrogate attributes of an internal node are all missing, the instance being classified is allowed to exit at the current node with a vote. The weight of the vote is then adjusted by the strength of the involved attributes, and the final decision is made by weighted voting. AWVRF does not include an imputation step, so it is independent of data features. AWVRF is compared with mean imputation, LeoFill, knnimpute, BPCAfill and conventional RF with surrogate decisions (surrRF) using 50 repetitions of 5-fold cross-validation on 10 established datasets. Across a total of 22 experimental settings, AWVRF achieves the highest accuracy in 14 settings and the largest AUC in 7 settings, indicating an advantage over the other methods. Compared with surrRF, AWVRF is significantly more efficient and retains good discriminative power in prediction. Experimental results show that the AWVRF algorithm can successfully handle the classification task for incomplete data. (C) 2017 Elsevier Ltd. All rights reserved.
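
The early-exit and weighted-voting idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Node` layout, the per-node `weight` field, and the helper names `tree_vote` and `forest_predict` are hypothetical, and the abstract does not state how the weight is derived from attribute strength, so here it is simply a number stored on each node. Surrogate splits are also simplified to share the left/right direction of the primary split.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    # Hypothetical node layout, for illustration only.
    majority_class: int            # class voted for if an instance stops here
    weight: float = 1.0            # assumed vote weight for stopping at this node
    # Primary split first, then surrogates, as (attribute name, threshold).
    attrs: List[Tuple[str, float]] = field(default_factory=list)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def is_missing(value) -> bool:
    return value is None or (isinstance(value, float) and math.isnan(value))


def tree_vote(node: Node, x: Dict[str, Optional[float]]) -> Tuple[int, float]:
    """Route x down one tree; exit early when every split attribute is missing."""
    while node.left is not None and node.right is not None:
        for attr, threshold in node.attrs:
            value = x.get(attr)
            if not is_missing(value):
                node = node.left if value <= threshold else node.right
                break
        else:
            # Primary and all surrogate attributes are missing:
            # the instance exits at the current internal node with a vote.
            return node.majority_class, node.weight
    return node.majority_class, node.weight


def forest_predict(trees: List[Node], x: Dict[str, Optional[float]]) -> int:
    """Sum the weighted votes of all trees and return the heaviest class."""
    totals: Dict[int, float] = {}
    for root in trees:
        label, weight = tree_vote(root, x)
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)
```

In this sketch a vote cast from an internal node can carry a smaller weight than a vote from a leaf, so trees that had to stop early influence the final decision less. How AWVRF actually computes the adjustment is specified in the paper, not here.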
