
Stable bagging feature selection on medical data

JOURNAL OF BIG DATA (2021)

Journal

JOURNAL OF BIG DATA

Volume 8, Issue 1, Pages -

Publisher

SPRINGERNATURE

DOI: 10.1186/s40537-020-00385-8

Keywords

Feature selection; Ensemble technique; Bagging; Dimensionality reduction; Medical data; Microarray; Variance; Bias

Funding

Deanship of Scientific Research at King Khalid University [R.G.P2/100/41]

Automated Summary

In the medical field, identifying relevant genes for a specific disease is crucial but challenging due to the curse of dimensionality. This paper proposes an ensemble approach based on the bagging technique to improve feature selection stability in medical datasets by reducing data variance. Experimental results show a significant improvement in stability while maintaining classification accuracy, with stability enhancement ranging from 20 to 50 percent.

Abstract

In the medical field, distinguishing the genes that are relevant to a specific disease, such as colon cancer, is crucial to finding a cure and to understanding its causes and subsequent complications. Medical datasets usually comprise a very large number of dimensions with a considerably small sample size. Thus, for domain experts such as biologists, identifying these genes has become a very challenging task. Feature selection is a technique that aims to select these genes, called features in machine learning, with respect to the disease. However, learning from a medical dataset to identify relevant features suffers from the curse of dimensionality. Because the number of features is large and the sample size is small, the selection usually returns a different subset each time a new sample is introduced into the dataset. This selection instability is intrinsically related to data variance. We assume that reducing data variance improves selection stability. In this paper, we propose an ensemble approach based on the bagging technique to improve feature selection stability in medical datasets via data variance reduction. We conducted an experiment using four microarray datasets, each of which suffers from high dimensionality and a relatively small sample size. On each dataset, we applied five well-known feature selection algorithms to select varying numbers of features. The proposed technique shows a significant improvement in selection stability while at least maintaining the classification accuracy. The stability improvement ranges from 20 to 50 percent in all cases, which implies that the likelihood of selecting the same features increased by 20 to 50 percent. This is accompanied by an increase in classification accuracy in most cases, which supports the stability results.
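The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the five selection algorithms, the aggregation rule, or the stability measure, so everything below is an assumption made for illustration. `score_features` is a hypothetical stand-in filter (a simple class-mean-difference score), `bagged_select` aggregates per-bag selections by vote count, and `jaccard` is one common way to compare two selected subsets.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def score_features(X, y):
    """Hypothetical filter score per feature: |mean difference| / summed spread.

    A stand-in for any base feature selection algorithm; labels are 0 or 1.
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        if not pos or not neg:
            # A bootstrap sample may miss a class; the feature gets no score.
            scores.append(0.0)
            continue
        spread = pstdev(pos) + pstdev(neg) + 1e-12  # avoid division by zero
        scores.append(abs(mean(pos) - mean(neg)) / spread)
    return scores


def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features as a set."""
    scores = score_features(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def bagged_select(X, y, k, n_bags=25, seed=0):
    """Run the base selector on bootstrap samples and keep the top-voted features.

    Assumed aggregation rule: count how often each feature is selected across
    bags, then keep the k most frequent (ties broken by feature index).
    """
    rng = random.Random(seed)
    n = len(X)
    votes = Counter()
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        votes.update(select_top_k([X[i] for i in idx], [y[i] for i in idx], k))
    ranked = sorted(votes, key=lambda j: (-votes[j], j))
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard similarity of two feature subsets (1.0 means identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

To compare stability under these assumptions, one could perturb the dataset (for example by leaving out a few samples), run `select_top_k` and `bagged_select` on each perturbed copy, and compare the average pairwise `jaccard` of the resulting subsets. Vote counting is only one possible aggregation; averaging ranks or scores across bags would be an equally plausible choice.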
