4.7 Article

Incorporating domain knowledge into data mining classifiers: An application in indirect lending

Journal

DECISION SUPPORT SYSTEMS
Volume 46, Issue 1, Pages 287-299

Publisher

ELSEVIER
DOI: 10.1016/j.dss.2008.06.013

Keywords

Data mining; Classification; Supervised learning; Domain knowledge; Expert system

Abstract

Data mining techniques have been applied to solve classification problems for a variety of applications such as credit scoring, bankruptcy prediction, insurance underwriting, and management fraud detection. In many of those application domains, there exist human experts whose knowledge could have a bearing on the effectiveness of the classification decision. The lack of research in combining data mining techniques with domain knowledge has prompted researchers to identify the fusion of data mining and knowledge-based expert systems as an important future direction. In this paper, we compare the performance of seven data mining classification methods (naive Bayes, logistic regression, decision tree, decision table, neural network, k-nearest neighbor, and support vector machine) with and without incorporating domain knowledge. The application we focus on is in the domain of indirect bank lending. An expert system capturing a lending expert's knowledge of rating a borrower's credit is used in combination with data mining to study whether the incorporation of domain knowledge improves classification performance. We use two performance measures: misclassification cost and AUC (area under the curve). A 2 × 7 factorial, repeated-measures ANOVA, with the two factors being domain knowledge (present or absent) and data mining method (seven methods), as well as a special statistical test for comparing AUCs, is used for analyzing the results. Analysis of the results reveals that incorporation of domain knowledge significantly improves classification performance with respect to both misclassification cost and AUC. There is an interaction between classification method and domain knowledge: incorporation of domain knowledge has a greater influence on performance for some methods than for others. Both measures, misclassification cost and AUC, yield similar results, indicating that the findings of the study are robust. © 2008 Elsevier B.V. All rights reserved.
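To make the two performance measures named in the abstract concrete, the following is a minimal sketch in Python of how misclassification cost and AUC could be computed for one classifier with and without an extra expert-derived input. It is not the authors' code. The labels, scores, decision threshold, and the cost values (`cost_fn`, `cost_fp`) are all made-up assumptions for illustration and are not taken from the paper.

```python
def misclassification_cost(y_true, y_pred, cost_fn=5.0, cost_fp=1.0):
    """Total cost of errors, with separate (hypothetical) costs per error type.

    Label 1 is assumed to mean a bad credit risk, so a false negative
    (a bad borrower predicted as good) is given the larger cost here.
    """
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return cost_fn * fn + cost_fp * fp


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def evaluate(y_true, scores, threshold=0.5):
    """Return (misclassification cost, AUC) for one set of scores."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return misclassification_cost(y_true, y_pred), auc(y_true, scores)


# Toy data, invented for illustration only (not results from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores_without = [0.9, 0.6, 0.4, 0.3, 0.7, 0.5, 0.2, 0.1]  # data-only model
scores_with = [0.9, 0.8, 0.6, 0.4, 0.5, 0.3, 0.2, 0.1]     # plus expert rating

if __name__ == "__main__":
    for name, scores in [("without", scores_without), ("with", scores_with)]:
        cost, area = evaluate(y_true, scores)
        print(f"{name} domain knowledge: cost={cost}, AUC={area}")
```

In a study of the kind described, such a pair of measurements would be collected for each of the seven methods over repeated samples, and the resulting values would then feed the 2 × 7 repeated-measures ANOVA; that statistical step is not shown here.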
