4.7 Review

A Review of Matched-pairs Feature Selection Methods for Gene Expression Data Analysis

Journal

Publisher

ELSEVIER
DOI: 10.1016/j.csbj.2018.02.005

Keywords

Matched-pairs feature selection; Matched case-control design; Paired data; Gene expression

Funding

  1. National Natural Science Foundation of China [61472159, 61572227]
  2. Projects of International Cooperation and Exchanges NSFC [81320108025]
  3. Development Project of Jilin Province of China [20160204022GX, 2017C033, 20180414012GH]
  4. National Science Foundation/EPSCoR Award [IIA-1355423]
  5. State of South Dakota Research Innovation Center
  6. Agriculture Experiment Station of South Dakota State University
  7. Sanford Health - South Dakota State University Collaborative Research Seed Grant Program
  8. National Science Foundation [ACI-1548562]
  9. Office of Integrative Activities
  10. Office of the Director [1355423] (funding source: National Science Foundation)

With the rapid accumulation of gene expression data from technologies such as microarray, RNA-sequencing (RNA-seq), and single-cell RNA-seq, dimensionality reduction and feature (signature gene) selection are necessary to make sense of such high-dimensional data. These computational methods greatly facilitate downstream analysis and interpretation, such as gene function enrichment analysis, cancer biomarker detection, and drug target identification in precision medicine. Although numerous feature selection methods have been developed in bioinformatics, it remains a challenge to choose the appropriate method for a specific problem and to obtain the most reasonable feature ranking. Meanwhile, paired gene expression data collected under a matched case-control design (MCCD) are becoming increasingly popular; such data have often been used in multi-omics integration studies and may increase feature selection efficiency by offsetting similar distributions of confounding features. Feature selection methods designed specifically for paired data, termed matched-pairs feature selection (MPFS), have not matured in parallel, however. In this review, we compare the performance of 10 feature selection methods (eight MPFS methods and two traditional unpaired methods) on two real datasets by applying three classification methods, and we analyze the algorithmic complexity of these methods by running their programs. This review aims to summarize and comprehensively present MPFS so that readers can easily understand its characteristics and find guidance in selecting appropriate methods for their analyses.

(c) 2018 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
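
The abstract's point that pairing can offset confounding features can be illustrated with a small simulation. The sketch below is not taken from the review, which does not name its ten methods here. It uses a paired t statistic as a stand-in for a matched-pairs ranking and a Welch t statistic as a stand-in for an unpaired ranking. The data are synthetic, and every function name and simulation parameter (`simulate`, `rank_genes`, the baseline and noise scales) is an assumption made for illustration.

```python
import math
import random
import statistics


def paired_t(case, control):
    """Paired t statistic for one gene; case[i] and control[i] form a matched pair."""
    diffs = [a - b for a, b in zip(case, control)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)
    if sd_d == 0.0:
        # Degenerate case (assumed handling): every pair has the same difference.
        return 0.0 if mean_d == 0.0 else math.copysign(math.inf, mean_d)
    return mean_d / (sd_d / math.sqrt(len(diffs)))


def welch_t(case, control):
    """Unpaired (Welch) t statistic for one gene; the pairing is ignored."""
    se = math.sqrt(statistics.variance(case) / len(case)
                   + statistics.variance(control) / len(control))
    diff = statistics.mean(case) - statistics.mean(control)
    if se == 0.0:
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def simulate(n_genes=50, n_pairs=20, n_shifted=5, seed=0):
    """Synthetic matched data: each pair shares a per-gene baseline (the confounder)."""
    rng = random.Random(seed)
    cases, controls = [], []
    for g in range(n_genes):
        shift = 1.0 if g < n_shifted else 0.0  # only the first n_shifted genes differ
        case_row, control_row = [], []
        for _ in range(n_pairs):
            baseline = rng.gauss(8.0, 2.0)  # shared by both members of the pair
            control_row.append(baseline + rng.gauss(0.0, 0.3))
            case_row.append(baseline + shift + rng.gauss(0.0, 0.3))
        cases.append(case_row)
        controls.append(control_row)
    return cases, controls


def rank_genes(cases, controls, statistic):
    """Gene indices sorted by decreasing absolute value of the given statistic."""
    scores = [abs(statistic(c, k)) for c, k in zip(cases, controls)]
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)


cases, controls = simulate()
print("top 5 (paired):  ", sorted(rank_genes(cases, controls, paired_t)[:5]))
print("top 5 (unpaired):", sorted(rank_genes(cases, controls, welch_t)[:5]))
```

In this setup the shared baseline cancels in the within-pair difference, so the paired statistic only has to contend with the measurement noise, whereas the unpaired statistic also absorbs the between-pair baseline variance. Real expression data would need normalization and multiple-testing control, which this sketch omits.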

Authors


Reviews

Primary Rating

4.7
Not enough ratings

