3.8 Proceedings Paper

Evaluation of Wrapper-based Feature Selection using Hard, Moderate, and Easy Bioinformatics Data

Publisher

IEEE
DOI: 10.1109/BIBE.2014.62

Keywords

Noise injection; difficulty of learning; wrapper-based feature selection; bioinformatics

One of the most challenging problems encountered when analyzing real-world gene expression datasets is high dimensionality (an overabundance of features/attributes). This large number of features can lead to suboptimal classification performance and increased computation time. Feature selection, whereby only a subset of the original features is used for building a classification model, is the most commonly used technique to counter high dimensionality. One category of feature selection, wrapper-based techniques, employs a classifier to directly find the subset of features which performs best. Unfortunately, noise can negatively impact the effectiveness of data mining techniques and subsequently lead to suboptimal results. Class noise in particular has a detrimental effect on classification performance, making datasets perform poorly across a wide range of classifiers (i.e., having a high difficulty-of-learning). No previous work has examined the effectiveness of wrapper-based feature selection when learning from real-world high-dimensional gene expression datasets in the context of difficulty-of-learning due to noise. To study this effectiveness, we perform experiments using ten gene expression datasets which were first determined to be easy to learn from and then had artificial class noise injected in a controlled fashion, creating three levels of difficulty-of-learning (Easy, Moderate, and Hard). We perform wrapper-based feature selection using the Naive Bayes learner, followed by classification with four classifiers (Naive Bayes, Multilayer Perceptron, 5-Nearest Neighbor, and Support Vector Machines), and we compare these results to the classification performance without feature selection. The results show that the effectiveness of wrapper-based feature selection depends on the choice of learner: for the Multilayer Perceptron, wrapper selection improved performance compared to using no feature selection; for Naive Bayes it slightly reduced performance; and for the remaining learners it reduced performance further. Because its performance relative to no feature selection varied with the choice of learner, we recommend that wrapper selection at least be considered in future bioinformatics experiments, especially if the goal is gene discovery rather than classification. Also, as dimensionality reduction techniques are not only useful but necessary for high-dimensional bioinformatics datasets, the no-feature-selection case may not be feasible in practice.
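
The abstract describes the experimental pipeline (controlled class-noise injection, wrapper selection driven by a Naive Bayes learner, then classification with four learners with and without selection) but not the tooling used. The sketch below is a hypothetical scikit-learn illustration of that pipeline, not the authors' actual setup; the synthetic dataset, the noise rates chosen for the Easy/Moderate/Hard levels, the selected subset size, and the classifier settings are all placeholder assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for a high-dimensional gene expression dataset
# (the paper's experiments used ten real gene expression datasets).
X, y = make_classification(n_samples=100, n_features=100, n_informative=15,
                           random_state=0)

def inject_class_noise(labels, noise_rate, rng):
    """Flip the class labels of a fixed fraction of instances (controlled class noise)."""
    noisy = labels.copy()
    n_flip = int(noise_rate * len(labels))
    flip_idx = rng.choice(len(labels), size=n_flip, replace=False)
    noisy[flip_idx] = 1 - noisy[flip_idx]  # binary labels assumed
    return noisy

rng = np.random.default_rng(42)
# Hypothetical noise rates standing in for the Easy / Moderate / Hard levels.
for level, rate in [("Easy", 0.0), ("Moderate", 0.2), ("Hard", 0.4)]:
    y_level = inject_class_noise(y, rate, rng)

    # Wrapper-based selection: the Naive Bayes learner directly evaluates
    # candidate feature subsets via cross-validated performance.
    selector = SequentialFeatureSelector(GaussianNB(), n_features_to_select=5, cv=3)
    X_selected = selector.fit_transform(X, y_level)

    # Classify with the four learners, with and without feature selection.
    for name, clf in [("NB", GaussianNB()),
                      ("MLP", MLPClassifier(max_iter=500)),
                      ("5-NN", KNeighborsClassifier(n_neighbors=5)),
                      ("SVM", SVC())]:
        acc_all = cross_val_score(clf, X, y_level, cv=3).mean()
        acc_sel = cross_val_score(clf, X_selected, y_level, cv=3).mean()
        print(f"{level:<8} {name:<5} all features: {acc_all:.3f}  selected: {acc_sel:.3f}")
```

Note that, for brevity, this sketch performs selection on the full dataset before cross-validating; a careful evaluation would nest the wrapper selection inside each cross-validation fold to avoid selection bias.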
