Journal
JOURNAL OF APPLIED STATISTICS
Volume 33, Issue 8, Pages 853-865
Publisher
TAYLOR & FRANCIS LTD
DOI: 10.1080/02664760600743613
Keywords
data mining; decision tree; homogeneity; maximum likelihood; zero inflated Poisson (ZIP)
Many methods for zero-inflated data have been developed in statistics. The data mining literature, however, has paid them little attention, even though zero-inflated data are common in real applications. In particular, no decision tree method is suited to zero-inflated responses. When decision trees are used as a data mining technique to analyze a continuous target variable, the best split is found with an F-statistic (CHAID) or a variance reduction (CART) criterion. These criteria are appropriate only for a continuous target variable. When the target variable consists of rare events or zero-inflated count data, they do not give good results because of the distributional characteristics of such data. In this paper, we propose a decision tree for zero-inflated count data that uses the maximized zero-inflated Poisson (ZIP) likelihood as the split criterion. We also compare the performance of the split criteria on well-known data sets. When the analyst is interested in low-value groups (e.g. defect-free areas, customers who make no claims), the proposed ZIP tree would be more efficient.
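The idea of a likelihood-based split can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the ZIP model P(Y=0) = pi + (1-pi)exp(-lam) and P(Y=k) = (1-pi)exp(-lam)lam^k/k! for k > 0, fits each child node by a simple EM iteration, and scores a candidate threshold by the sum of the two maximized log-likelihoods. The function names `zip_loglik` and `best_split`, the midpoint thresholds, and the fixed iteration count are all assumptions made for illustration.

```python
import math


def zip_loglik(ys, iters=200):
    """Approximate maximized ZIP log-likelihood of the counts in ys (EM fit)."""
    n = len(ys)
    n0 = sum(1 for y in ys if y == 0)   # number of observed zeros
    total = sum(ys)
    if n == 0 or total == 0:
        # Empty or all-zero node: the likelihood can reach 1, so log-lik is 0.
        return 0.0
    # Starting values (an arbitrary but valid choice).
    pi, lam = 0.5 * n0 / n, total / n
    for _ in range(iters):
        # E-step: probability that an observed zero is a structural zero.
        z = pi / (pi + (1 - pi) * math.exp(-lam))
        # M-step: update the mixing weight and the Poisson mean.
        pi = n0 * z / n
        lam = total / (n - n0 * z)
    ll = n0 * math.log(pi + (1 - pi) * math.exp(-lam))
    for y in ys:
        if y > 0:
            ll += math.log(1 - pi) - lam + y * math.log(lam) - math.lgamma(y + 1)
    return ll


def best_split(xs, ys):
    """Return (threshold, score) maximizing the summed child log-likelihoods."""
    best = (None, -math.inf)
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2               # midpoint between adjacent x values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = zip_loglik(left) + zip_loglik(right)
        if score > best[1]:
            best = (t, score)
    return best


# Toy data (made up): counts are zero unless x == 3.
threshold, score = best_split([1, 1, 1, 2, 2, 2, 3, 3, 3],
                              [0, 0, 0, 0, 0, 0, 2, 3, 4])
print(threshold, score)
```

On the toy data the criterion favors the threshold that isolates the all-zero observations in one child, which is the kind of low-value group the abstract refers to. A practical tree would also need a stopping rule and a check that the gain over the unsplit node is large enough, neither of which is shown here.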