3.8 Proceedings Paper

Evaluation of Wrapper-based Feature Selection using Hard, Moderate, and Easy Bioinformatics Data

Publisher

IEEE
DOI: 10.1109/BIBE.2014.62

Keywords

Noise injection; difficulty of learning; wrapper-based feature selection; bioinformatics

One of the most challenging problems encountered when analyzing real-world gene expression datasets is high dimensionality (an overabundance of features/attributes). This large number of features can lead to suboptimal classification performance and increased computation time. Feature selection, whereby only a subset of the original features is used for building a classification model, is the most commonly used technique to counter high dimensionality. One category of feature selection, wrapper-based techniques, employs a classifier to directly find the subset of features which performs best. Unfortunately, noise can negatively impact the effectiveness of data mining techniques and subsequently lead to suboptimal results. Class noise in particular has a detrimental effect on classification performance, making datasets perform poorly across a wide range of classifiers (i.e., having a high difficulty-of-learning). No previous work has examined the effectiveness of wrapper-based feature selection when learning from real-world high-dimensional gene expression datasets in the context of difficulty-of-learning due to noise. To study this effectiveness, we perform experiments using ten gene expression datasets which were first determined to be easy to learn from and then had artificial class noise injected in a controlled fashion, creating three levels of difficulty-of-learning (Easy, Moderate, and Hard). We perform wrapper-based feature selection using the Naive Bayes learner, followed by classification with four classifiers (Naive Bayes, Multilayer Perceptron, 5-Nearest Neighbor, and Support Vector Machines), and we compare these results to the classification performance without feature selection. The results show that the effectiveness of wrapper-based feature selection depends on the choice of learner: for the Multilayer Perceptron, wrapper selection improved performance compared to using no feature selection; for Naive Bayes it slightly reduced performance; and for the remaining learners it reduced performance further. Because its performance relative to no feature selection varied with the choice of learner, we recommend that wrapper selection at least be considered in future bioinformatics experiments, especially if the goal is gene discovery rather than classification. Also, as dimensionality reduction techniques are not only useful but necessary for high-dimensional bioinformatics datasets, the no-feature-selection case may not be feasible in practice.
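
The abstract describes the experimental pipeline (controlled class-noise injection, wrapper selection driven by a Naive Bayes learner, then classification with four learners with and without selection) but not the tooling used. The sketch below is a hypothetical scikit-learn illustration of that pipeline, not the authors' actual setup; the synthetic dataset, the noise rates chosen for the Easy/Moderate/Hard levels, the selected subset size, and the classifier settings are all placeholder assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for a high-dimensional gene expression dataset
# (the paper's experiments used ten real gene expression datasets).
X, y = make_classification(n_samples=100, n_features=100, n_informative=15,
                           random_state=0)

def inject_class_noise(labels, noise_rate, rng):
    """Flip the class labels of a fixed fraction of instances (controlled class noise)."""
    noisy = labels.copy()
    n_flip = int(noise_rate * len(labels))
    flip_idx = rng.choice(len(labels), size=n_flip, replace=False)
    noisy[flip_idx] = 1 - noisy[flip_idx]  # binary labels assumed
    return noisy

rng = np.random.default_rng(42)
# Hypothetical noise rates standing in for the Easy / Moderate / Hard levels.
for level, rate in [("Easy", 0.0), ("Moderate", 0.2), ("Hard", 0.4)]:
    y_level = inject_class_noise(y, rate, rng)

    # Wrapper-based selection: the Naive Bayes learner directly evaluates
    # candidate feature subsets via cross-validated performance.
    selector = SequentialFeatureSelector(GaussianNB(), n_features_to_select=5, cv=3)
    X_selected = selector.fit_transform(X, y_level)

    # Classify with the four learners, with and without feature selection.
    for name, clf in [("NB", GaussianNB()),
                      ("MLP", MLPClassifier(max_iter=500)),
                      ("5-NN", KNeighborsClassifier(n_neighbors=5)),
                      ("SVM", SVC())]:
        acc_all = cross_val_score(clf, X, y_level, cv=3).mean()
        acc_sel = cross_val_score(clf, X_selected, y_level, cv=3).mean()
        print(f"{level:<8} {name:<5} all features: {acc_all:.3f}  selected: {acc_sel:.3f}")
```

Note that, for brevity, this sketch performs selection on the full dataset before cross-validating; a careful evaluation would nest the wrapper selection inside each cross-validation fold to avoid selection bias.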
