Article

Decision tree approaches for zero-inflated count data

Journal

JOURNAL OF APPLIED STATISTICS
Volume 33, Issue 8, Pages 853-865

Publisher

TAYLOR & FRANCIS LTD
DOI: 10.1080/02664760600743613

Keywords

data mining; decision tree; homogeneity; maximum likelihood; zero-inflated Poisson (ZIP)

Many methodologies for zero-inflated data have been developed in statistics. However, there is little literature in the data mining field, even though zero-inflated data are common in real applications. In fact, there is no decision tree method suited to zero-inflated responses. To analyze a continuous target variable with decision trees, one of the data mining techniques, we use the F-statistic (CHAID) or variance reduction (CART) criterion to find the best split. But these methods are appropriate only for a continuous target variable. If the target variable consists of rare events or zero-inflated count data, these criteria may not give good results because of the characteristics of such data. In this paper, we propose a decision tree for zero-inflated count data that uses the maximized zero-inflated Poisson (ZIP) likelihood as the split criterion. In addition, we compare the performance of the split criteria on well-known data sets. When the analyst is interested in lower-value groups (e.g. no-defect areas, customers who do not claim), the suggested ZIP tree would be more efficient.
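To make the idea of a likelihood-based split criterion concrete, the following is a minimal sketch in Python. It is not the authors' implementation. The abstract does not say how the ZIP model is fitted or how candidate splits are scored, so the EM fit, the log-likelihood gain, and the function names `zip_loglik` and `best_zip_split` are all illustrative assumptions.

```python
import math


def zip_loglik(y, n_iter=200, tol=1e-10):
    """Maximized ZIP log-likelihood of the counts in y (assumed EM fit).

    Model: P(Y=0) = pi + (1-pi)*exp(-lam),
           P(Y=k) = (1-pi)*exp(-lam)*lam**k / k!  for k > 0.
    """
    n = len(y)
    total = sum(y)
    n0 = sum(1 for v in y if v == 0)
    if n == 0 or total == 0:
        # Empty or all-zero node: the likelihood tends to 1 as lam -> 0.
        return 0.0
    pi, lam = 0.5 * n0 / n, total / n  # crude starting values
    for _ in range(n_iter):
        # E-step: probability that an observed zero is a structural zero.
        p0 = pi + (1.0 - pi) * math.exp(-lam)
        z = pi / p0
        # M-step: update the zero-inflation weight and the Poisson mean.
        new_pi = n0 * z / n
        new_lam = total / (n - n0 * z)
        converged = abs(new_pi - pi) + abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break
    ll = 0.0
    if n0:
        ll += n0 * math.log(pi + (1.0 - pi) * math.exp(-lam))
    for v in y:
        if v > 0:
            ll += (math.log(1.0 - pi) - lam + v * math.log(lam)
                   - math.lgamma(v + 1))
    return ll


def best_zip_split(x, y):
    """Return (threshold, gain) for the split of numeric feature x that
    maximizes the gain in fitted ZIP log-likelihood (assumed criterion)."""
    parent = zip_loglik(y)
    best_thr, best_gain = None, 0.0
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0  # midpoint between adjacent distinct values
        left = [yi for xi, yi in zip(x, y) if xi <= thr]
        right = [yi for xi, yi in zip(x, y) if xi > thr]
        gain = zip_loglik(left) + zip_loglik(right) - parent
        if gain > best_gain:
            best_thr, best_gain = thr, gain
    return best_thr, best_gain
```

In this sketch a node that contains only zeros contributes a log-likelihood of zero, so a split that isolates an all-zero group is rewarded. That matches the abstract's interest in lower-value groups, though the paper's actual criterion may differ in detail.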
