Article

Feature selection and classification over the network with missing node observations

Journal

STATISTICS IN MEDICINE
Volume 41, Issue 7, Pages 1242-1262

Publisher

WILEY
DOI: 10.1002/sim.9267

Keywords

Bayesian nonparametrics; false discovery rate control; feature selection; gene networks

Funding

  1. National Institutes of Health [R01GM124061, R01HL095479, R01MH105561]

Jointly analyzing transcriptomic data and existing biological networks leads to more robust and informative feature selection results and a better understanding of biological mechanisms. A new Bayesian node classification framework is proposed to handle missing values and improve classification accuracy while reducing bias in estimating gene effects. This method outperforms existing approaches in comprehensive simulation studies and analysis of real-world genomic data.
Jointly analyzing transcriptomic data and existing biological networks can yield more robust and informative feature selection results, as well as a better understanding of the biological mechanisms. Selecting and classifying node features over genome-scale networks has become increasingly important in genomic biology and genomic medicine. Existing methods have two critical drawbacks. First, they do not allow flexible modeling of different subtypes of selected nodes. Second, they ignore nodes with missing values, which is very likely to increase bias in estimation. To address these limitations, we propose a general modeling framework for Bayesian node classification (BNC) with missing values. A new prior model that incorporates the network structure is developed for the class indicators. For posterior computation, we use the Swendsen-Wang algorithm to update the class indicators efficiently. BNC naturally handles missing values within the Bayesian modeling framework, which improves node classification accuracy and reduces bias in estimating gene effects. We demonstrate the advantages of our method via extensive simulation studies and an analysis of the cutaneous melanoma dataset from The Cancer Genome Atlas.
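
To make the posterior computation step more concrete, here is a minimal sketch of one Swendsen-Wang sweep over class indicators on a network. It is not the authors' BNC implementation. It assumes a plain Potts prior, p(z) proportional to exp(beta * number of edges whose endpoints share a class), as a stand-in for the paper's network prior, and it assumes per-node log-likelihoods are supplied by the caller. A node with a missing observation is given a flat log-likelihood (all zeros), so its class is informed only through its neighbors. The function name and all parameter names are hypothetical.

```python
import math
import random


def swendsen_wang_sweep(labels, edges, log_lik, beta, rng):
    """One Swendsen-Wang update of class labels under an assumed Potts prior.

    labels  : list of current class indices, one per node
    edges   : list of (i, j) node pairs describing the network
    log_lik : log_lik[i][k] is the log-likelihood of node i's data under
              class k; a node with a missing observation gets all zeros
    beta    : Potts smoothing strength (>= 0), an assumed hyperparameter
    rng     : a random.Random instance
    """
    n = len(labels)
    n_classes = len(log_lik[0])
    parent = list(range(n))  # union-find forest over nodes

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Step 1: bond each edge whose endpoints share a label
    # with probability 1 - exp(-beta).
    p_bond = 1.0 - math.exp(-beta)
    for i, j in edges:
        if labels[i] == labels[j] and rng.random() < p_bond:
            parent[find(i)] = find(j)

    # Step 2: collect the connected clusters formed by the bonds.
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)

    # Step 3: draw one new label per cluster, weighted by the product of
    # its members' likelihoods (computed in log space for stability).
    new_labels = list(labels)
    for members in clusters.values():
        scores = [sum(log_lik[i][k] for i in members) for k in range(n_classes)]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        k = rng.choices(range(n_classes), weights=weights)[0]
        for i in members:
            new_labels[i] = k
    return new_labels
```

Because whole clusters change class together, a node with a missing observation that is bonded to observed neighbors is relabeled along with them, which is one simple way a network prior can carry information to unobserved nodes.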
