Article

Data mining in genomics

Journal

CLINICS IN LABORATORY MEDICINE
Volume 28, Issue 1, Pages 145-+

Publisher

W B SAUNDERS CO-ELSEVIER INC
DOI: 10.1016/j.cll.2007.10.010

Funding

  1. NHLBI NIH HHS [R01 HL081690-02, 1R01HL081690, R01 HL081690, R01 HL081690-01A1] Funding Source: Medline

This article reviews emerging statistical concepts, data mining techniques, and applications recently developed for genomic data analysis. First, general background and some critical issues in genomic data mining are summarized. A novel concept of statistical significance is then described: the false discovery rate (the rate of false positives among all positive findings), which has been suggested for controlling the large number of false positives that arise in large-scale screening analyses of biological data. Two recent statistical testing methods are then introduced: significance analysis of microarrays and the local pooled error test. Statistical modeling in genomic data analysis is presented next, including analysis of variance and heterogeneous error modeling approaches suggested for analyzing microarray data obtained from multiple experimental or biological conditions. Two sections then describe data exploration and discovery tools, broadly termed unsupervised learning and supervised learning. The former include several multivariate statistical methods for investigating coexpression patterns of multiple genes; the latter are classification methods for discovering genomic biomarker signatures that predict important subclasses of human diseases. The last section briefly summarizes various genomic data mining approaches in biomedical pathway analysis and in the prediction of patient outcome or chemotherapeutic response.
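
To make the false discovery rate concrete, the sketch below shows one common way of controlling it across many simultaneous tests: the Benjamini-Hochberg step-up procedure. This choice is an assumption for illustration; the abstract does not say which procedure the article uses. The function names `benjamini_hochberg` and `empirical_fdr`, the level `q`, and the p-values in the usage note are all hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Flag which hypotheses are rejected at target FDR level q.

    Illustrative Benjamini-Hochberg step-up procedure (an assumed choice,
    not necessarily the method reviewed in the article).
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose sorted p-value satisfies p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


def empirical_fdr(rejected, is_null):
    """Share of positive findings that are false positives (true nulls)."""
    n_rejected = sum(rejected)
    if n_rejected == 0:
        return 0.0
    false_positives = sum(1 for r, n in zip(rejected, is_null) if r and n)
    return false_positives / n_rejected
```

As a usage example with made-up p-values, `benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216], q=0.05)` flags only the first two tests. `empirical_fdr` simply applies the definition quoted in the abstract, false positives divided by all positive findings, and is computable only when the true null status is known, for example in a simulation.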
