Article

Medical data mining by fuzzy modeling with selected features

Journal

ARTIFICIAL INTELLIGENCE IN MEDICINE
Volume 43, Issue 3, Pages 195-206

Publisher

ELSEVIER
DOI: 10.1016/j.artmed.2008.04.004

Keywords

feature selection; fuzzy models; data mining; medical data; diagnosis


Objective: Medical data is often very high dimensional. Depending on the use, some data dimensions might be more relevant than others. In processing medical data, choosing the optimal subset of features is important, not only to reduce the processing cost but also to improve the usefulness of the model built from the selected data. This paper presents a data mining study of medical data with fuzzy modeling methods that use feature subsets selected by several indices/methods.

Methods: Specifically, three fuzzy modeling methods are employed: the fuzzy k-nearest neighbor algorithm, a fuzzy clustering-based modeling method, and the adaptive network-based fuzzy inference system (ANFIS). For feature selection, a total of 11 indices/methods are used. The medical data mined are the Wisconsin breast cancer dataset and the Pima Indians diabetes dataset. Classification accuracy and computational time are reported. To gauge how good the best performer is, the globally optimal feature subset was also found by exhaustively testing all possible combinations of three features.

Results: For the Wisconsin breast cancer dataset, the best accuracy obtained was 97.17%, only 0.25% lower than that obtained by exhaustive testing. For the Pima Indians diabetes dataset, the best accuracy obtained was 77.65%, only 0.13% lower than that obtained by exhaustive testing.

Conclusion: This paper has shown that feature selection is important in mining medical data, both for reducing processing time and for increasing classification accuracy. However, not all combinations of feature selection and modeling methods are equally effective, and the best combination is often data-dependent, as the breast cancer and diabetes data analyzed in this paper show.

(C) 2008 Elsevier B.V. All rights reserved.
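The abstract combines two ideas that are easy to make concrete: a fuzzy k-nearest neighbor classifier and an exhaustive search over all three-feature subsets used as the reference optimum. The sketch below is illustrative only and is not the authors' implementation. It assumes a Keller-style fuzzy k-NN with crisp training labels, fuzzifier m = 2, k = 3, Euclidean distance, and leave-one-out accuracy as the selection criterion; the abstract does not state these settings or the validation scheme. The function names (`fuzzy_knn_predict`, `loo_accuracy`, `exhaustive_best_subset`) are hypothetical. For d features the search evaluates C(d, 3) subsets, for example 56 for the eight Pima attributes.

```python
from itertools import combinations


def fuzzy_knn_predict(train_X, train_y, x, k=3, m=2.0):
    """Keller-style fuzzy k-NN with crisp training labels (assumed variant).

    Returns (predicted_label, memberships), where memberships maps each
    class to a degree in [0, 1] and the degrees sum to 1.
    """
    # Squared Euclidean distance from x to every training point.
    dists = [(sum((a - b) ** 2 for a, b in zip(row, x)), label)
             for row, label in zip(train_X, train_y)]
    dists.sort(key=lambda t: t[0])
    neighbours = dists[:k]
    classes = sorted(set(train_y))
    exact = [lab for d, lab in neighbours if d == 0.0]
    if exact:
        # A zero distance would give infinite weight; share it among exact matches.
        memberships = {c: exact.count(c) / len(exact) for c in classes}
    else:
        # Weight 1 / ||x - x_j||^(2/(m-1)); with squared distance d this is 1 / d^(1/(m-1)).
        weights = [(1.0 / d ** (1.0 / (m - 1.0)), lab) for d, lab in neighbours]
        total = sum(w for w, _ in weights)
        memberships = {c: sum(w for w, lab in weights if lab == c) / total
                       for c in classes}
    best = max(classes, key=lambda c: memberships[c])
    return best, memberships


def loo_accuracy(X, y, features, k=3, m=2.0):
    """Leave-one-out accuracy of fuzzy k-NN restricted to the given feature indices."""
    proj = [[row[f] for f in features] for row in X]
    correct = 0
    for i in range(len(proj)):
        train_X = proj[:i] + proj[i + 1:]
        train_y = y[:i] + y[i + 1:]
        pred, _ = fuzzy_knn_predict(train_X, train_y, proj[i], k, m)
        correct += pred == y[i]
    return correct / len(proj)


def exhaustive_best_subset(X, y, size=3, k=3, m=2.0):
    """Evaluate every feature subset of `size`; return (accuracy, subset).

    Ties keep the first subset in itertools.combinations order.
    """
    n_features = len(X[0])
    best = (-1.0, ())
    for subset in combinations(range(n_features), size):
        acc = loo_accuracy(X, y, subset, k, m)
        if acc > best[0]:
            best = (acc, subset)
    return best


# Toy data (invented for illustration): features 0-2 separate the classes, feature 3 is noise.
X = [
    [0.0, 0.1, 0.0, 9.0],
    [0.1, 0.0, 0.2, 1.0],
    [0.2, 0.1, 0.1, 5.0],
    [5.0, 5.1, 4.9, 2.0],
    [5.1, 4.9, 5.0, 8.0],
    [4.9, 5.0, 5.2, 4.0],
]
y = [0, 0, 0, 1, 1, 1]
print(exhaustive_best_subset(X, y))
```

A feature-selection index would replace the exhaustive loop by ranking features and passing only the top three to `loo_accuracy`; the gap between that accuracy and the exhaustive optimum corresponds to the comparison the Results paragraph reports.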

