
Benchmarking relief-based feature selection methods for bioinformatics data mining

JOURNAL OF BIOMEDICAL INFORMATICS (2018)

Journal

JOURNAL OF BIOMEDICAL INFORMATICS

Volume 85, Issue -, Pages 168-188

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE

DOI: 10.1016/j.jbi.2018.07.015

Keywords

Feature selection; ReliefF; Epistasis; Genetic heterogeneity; Classification; Regression

Funding

National Institutes of Health [AI116794, DK112217, ES013508, EY022300, HL134015, LM009012, LM010098, LM011360, TR001263]
Warren Center for Network and Data Science

Abstract

Modern biomedical data mining requires feature selection methods that can (1) be applied to large-scale feature spaces (e.g. 'omics' data), (2) function in noisy problems, (3) detect complex patterns of association (e.g. gene-gene interactions), (4) be flexibly adapted to various problem domains and data types (e.g. genetic variants, gene expression, and clinical data) and (5) be computationally tractable. To that end, this work examines a set of filter-style feature selection algorithms inspired by the 'Relief' algorithm, i.e. Relief-Based algorithms (RBAs). We implement and expand these RBAs in an open source framework called ReBATE (Relief-Based Algorithm Training Environment). We apply a comprehensive genetic simulation study comparing existing RBAs, a proposed RBA called MultiSURF, and other established feature selection methods, over a variety of problems. The results of this study (1) support the assertion that RBAs are particularly flexible, efficient, and powerful feature selection methods that differentiate relevant features having univariate, multivariate, epistatic, or heterogeneous associations, (2) confirm the efficacy of expansions for classification vs. regression, discrete vs. continuous features, missing data, multiple classes, or class imbalance, (3) identify previously unknown limitations of specific RBAs, and (4) suggest that while MultiSURF* performs best for explicitly identifying pure 2-way interactions, MultiSURF yields the most reliable feature selection performance across a wide range of problem types.
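To illustrate the kind of scoring the abstract refers to, the following is a minimal sketch of the original Relief idea (nearest hit and nearest miss) on a toy two-feature interaction. It is not the ReBATE implementation and not MultiSURF; the function name `relief_scores`, the toy data, and the choices to visit every instance and to handle only discrete features are assumptions made for illustration.

```python
def relief_scores(X, y):
    """Basic Relief sketch for discrete features (illustrative, not ReBATE).

    For every instance, find its nearest neighbour of the same class (hit)
    and of a different class (miss). A feature gains weight when it differs
    from the miss and loses weight when it differs from the hit.
    """
    n, p = len(X), len(X[0])

    def diff(u, v, f):
        # 0/1 mismatch on a single discrete feature
        return 0.0 if u[f] == v[f] else 1.0

    def distance(u, v):
        # Number of mismatching features between two instances
        return sum(diff(u, v, f) for f in range(p))

    weights = [0.0] * p
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue  # cannot score an instance without both neighbour types
        hit = min(hits, key=lambda j: distance(X[i], X[j]))
        miss = min(misses, key=lambda j: distance(X[i], X[j]))
        for f in range(p):
            weights[f] += (diff(X[i], X[miss], f) - diff(X[i], X[hit], f)) / n
    return weights


# Toy data (assumed): class = a XOR b, a pure 2-way interaction;
# the third feature c carries no signal.
X = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [a ^ b for a, b, _ in X]

scores = relief_scores(X, y)
print(scores)
```

On this toy table the two interacting features score higher than the irrelevant one, even though neither has a univariate association with the class, which is the property that motivates using RBAs for epistasis detection.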
