4.7 Article

SRIQ clustering: A fusion of Random Forest, QT clustering, and KNN concepts

Journal

Publisher

ELSEVIER
DOI: 10.1016/j.csbj.2022.03.036

Keywords

Lung adenocarcinoma; Clustering; Molecular subtypes; Gene expression; Random Forest; QT clustering; KNN

Funding

  1. Swedish Cancer Society
  2. Mrs Berta Kamprad Foundation, Sweden
  3. Swedish Research Council
  4. National Health Services (Region Skane/ALF)

The study introduces a novel unsupervised clustering method called SRIQ, which can address some issues in commonly used unsupervised analysis methods. Using RNA sequencing data from lung adenocarcinomas, the technical reproducibility and performance of SRIQ are demonstrated and compared to the commonly used consensus clustering method. With differential gene expression analysis and auxiliary molecular data, SRIQ is able to define new tumor subsets that are biologically relevant and consistent with existing subtypes.
Gene expression profiling combined with unsupervised analysis methods, typically clustering, has been used extensively in cancer research to uncover, for example, new molecular subtypes that hold promise for disease refinement and may ultimately benefit patients. However, many of the commonly used methods require the number of clusters to be specified in advance and frequently require some type of feature preselection, e.g. variance filtering. This introduces subjectivity into the process of cluster discovery and the definition of putative novel tumor subtypes. Here, we introduce SRIQ, a novel unsupervised clustering method that could circumvent some of the issues in commonly used unsupervised analysis methods. SRIQ incorporates concepts from random forest machine learning as well as quality threshold (QT) and k-nearest neighbor (KNN) clustering. It is implemented as a Java and Python pipeline that includes data preprocessing, differential expression analysis, and pathway analysis. Using 434 lung adenocarcinomas profiled by RNA sequencing, we demonstrate the technical reproducibility of SRIQ and benchmark its performance against the commonly used consensus clustering method. Based on differential gene expression analysis and auxiliary molecular data, we show that SRIQ can define new tumor subsets that appear biologically relevant and consistent, and that these new subgroups seem to refine existing transcriptional subtypes that were defined using consensus clustering. Together, this provides support that SRIQ may be a useful new tool for unsupervised analysis of gene expression data from human malignancies. © 2022 The Author(s). Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology.
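
To make the point about prespecified cluster numbers concrete, the following is a minimal sketch of generic quality threshold (QT) clustering, one of the concepts the abstract names. It is not the SRIQ implementation: the function name `qt_cluster`, the parameters `max_diameter` and `min_size`, the Euclidean distance, and the toy data are all assumptions made for illustration. The number of clusters is not an input; it follows from the diameter threshold.

```python
import numpy as np

def qt_cluster(points, max_diameter, min_size=1):
    """Generic QT clustering sketch (illustrative only, not SRIQ itself)."""
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances between all samples (assumed metric).
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    remaining = set(range(len(points)))
    clusters = []
    while remaining:
        best = []
        # Grow one candidate cluster from every remaining sample.
        for seed in sorted(remaining):
            candidate = [seed]
            others = remaining - {seed}
            while others:
                others_list = sorted(others)
                # Largest distance from each outside sample to the candidate's
                # members; adding the sample with the smallest value keeps the
                # cluster diameter as small as possible.
                reach = dist[np.ix_(others_list, candidate)].max(axis=1)
                j = int(np.argmin(reach))
                if reach[j] > max_diameter:
                    break  # any further addition would exceed the threshold
                candidate.append(others_list[j])
                others.remove(others_list[j])
            if len(candidate) > len(best):
                best = candidate
        if len(best) < min_size:
            break  # leftover samples stay unassigned
        clusters.append(sorted(best))
        remaining -= set(best)
    return clusters, sorted(remaining)

# Toy data (hypothetical): two tight groups and one outlier.
toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [50, 50]]
clusters, unassigned = qt_cluster(toy, max_diameter=2.0, min_size=2)
```

In this sketch the threshold and minimum cluster size replace the cluster count as the tuning choices, and samples that fit no sufficiently large cluster are left unassigned rather than forced into one.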
