4.5 Article

Unsupervised learning with random forest predictors

期刊

出版社

TAYLOR & FRANCIS INC
DOI: 10.1198/106186006X94072

关键词

biomarkers; cluster analysis; dissimilarity; ensemble predictors; tumor markers

向作者/读者索取更多资源

A random forest (RF) predictor is an ensemble of individual tree predictors. As part of their construction, RF predictors naturally lead to a dissimilarity measure between the observations. One can also define an RF dissimilarity measure between unlabeled data: the idea is to construct an RF predictor that distinguishes the observed data from suitably generated synthetic data. The observed data are the original unlabeled data and the synthetic data are drawn from a reference distribution. Here we describe the properties of the RF dissimilarity and make recommendations on how to use it in practice. An RF dissimilarity can be attractive because it handles mixed variable types well, is invariant to monotonic transformations of the input variables, and is robust to outlying observations. The RF dissimilarity easily deals with a large number of variables due to its intrinsic variable selection; for example, the Addcl1 RF dissimilarity weighs the contribution of each variable according to how dependent it is on other variables. We find that the RF dissimilarity is useful for detecting tumor sample clusters on the basis of tumor marker expressions. In this application, biologically meaningful clusters can often be described with simple thresholding rules.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.5
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据