On safari to Random Jungle: a fast implementation of Random Forests for high-dimensional data

BIOINFORMATICS (2010)

Journal

BIOINFORMATICS

Volume 26, Issue 14, Pages 1752-1758

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/bioinformatics/btq257

Funding

National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK)
DFG [KO 2250/3-1]
Medical Faculty of the University at Lubeck [E32-2009, SPP2]
NIH [5RO1-HL049609-14, 1R01-AG021917-01A1]
University of Minnesota
Minnesota Supercomputing Institute
GAW [R01-GM031575]
ENGAGE [201413]
Atherogenomics [01GS0831]

Abstract

Motivation: Genome-wide association (GWA) studies have proven to be a successful approach for helping unravel the genetic basis of complex genetic diseases. However, the identified associations are not well suited for disease prediction, and only a modest portion of the heritability can be explained for most diseases, such as Type 2 diabetes or Crohn's disease. This may partly be due to the low power of standard statistical approaches to detect gene-gene and gene-environment interactions when small marginal effects are present. A promising alternative is Random Forests, which have already been successfully applied in candidate gene analyses. Important single nucleotide polymorphisms are detected by permutation importance measures. To this day, the application to GWA data was highly cumbersome with existing implementations because of the high computational burden.

Results: Here, we present the new freely available software package Random Jungle (RJ), which facilitates the rapid analysis of GWA data. The program yields valid results and computes up to 159 times faster than the fastest alternative implementation, while still maintaining all options of other programs. Specifically, it offers the different permutation importance measures available. It includes new options such as the backward elimination method. We illustrate the application of RJ to a GWA of Crohn's disease. The most important single nucleotide polymorphisms (SNPs) validate recent findings in the literature and reveal potential interactions.
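The permutation importance idea mentioned in the abstract can be illustrated with a minimal sketch: shuffle one predictor at a time and record how much prediction accuracy drops. This is not RJ's code or interface. The function names, the toy genotype data and the rule-based `toy_predict` stand-in for a trained forest are all assumptions made for illustration, and the out-of-bag bookkeeping that Random Forests normally use for this measure is left out.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(rows)


def permutation_importance(predict, rows, labels, n_features, n_repeats=10, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only; all other columns stay as observed.
            column = [row[j] for row in rows]
            rng.shuffle(column)
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: genotype codes 0/1/2 for two SNPs.
# Only SNP 0 determines the label; SNP 1 is pure noise.
data_rng = random.Random(1)
rows = [[data_rng.randint(0, 2), data_rng.randint(0, 2)] for _ in range(200)]
labels = [1 if row[0] == 2 else 0 for row in rows]


def toy_predict(row):
    """Stand-in for a fitted model (assumption): it looks at SNP 0 only."""
    return 1 if row[0] == 2 else 0


scores = permutation_importance(toy_predict, rows, labels, n_features=2)
print(scores)  # importance of SNP 0 is positive, importance of SNP 1 is zero
```

In this toy setting the noise SNP receives an importance of exactly zero because the stand-in model never reads it. With a real forest, uninformative predictors typically score near zero rather than exactly zero.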
