Article

CoIn: Correlation Induced Clustering for Cognition of High Dimensional Bioinformatics Data

IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS (2023)

Journal

IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS

Volume 27, Issue 2, Pages 598-607

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/JBHI.2022.3179265

Keywords

Correlation; Bioinformatics; Tumors; Clustering algorithms; Clustering methods; Biomedical imaging; Kernel; Clustering; correlation analysis; correlation induced clustering

Categories

Computer Science, Information Systems; Computer Science, Interdisciplinary Applications; Mathematical & Computational Biology; Medical Informatics

Summary

In this paper, a new correlation induced clustering method (CoIn) is proposed to address the problem of high-dimensional bioinformatics data clustering. The method captures complex correlations among high-dimensional data and guarantees correlation consistency within each cluster. The evaluation on a high-dimensional mass spectrometry dataset of liver cancer tumor demonstrates that the proposed method produces more explainable and understandable results for clinical analysis, showing the potential application for knowledge discovery in high-dimensional bioinformatics data.

Abstract

Analysis of high-dimensional biomedical data, such as microarray gene expression data and mass spectrometry images, is crucial to providing better medical services, including cancer subtyping and protein homology detection. Clustering is a fundamental cognitive task that aims to group unlabeled data into multiple clusters based on their intrinsic similarities. However, most clustering methods, including the widely used $K$-means algorithm, treat all features of high-dimensional data as equally relevant, which degrades performance when the data contain many redundant and correlated variables. In this paper, we address the problem of clustering high-dimensional bioinformatics data and propose a new correlation induced clustering method, CoIn, which captures complex correlations among high-dimensional data and guarantees correlation consistency within each cluster. We evaluate the proposed method on a high-dimensional mass spectrometry dataset of liver cancer tumor to explore metabolic differences across tissues and discover intra-tumor heterogeneity (ITH). Compared with the baselines, our method produces more explainable and understandable results for clinical analysis, which demonstrates that the proposed clustering paradigm has potential for knowledge discovery in high-dimensional bioinformatics data.
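The abstract's point about $K$-means weighting every feature equally can be illustrated with a small sketch. This is not the CoIn method, whose details are not given on this page; it only shows, under assumed toy values, how redundant copies of one feature can flip a Euclidean nearest-centroid assignment. The helper names `nearest_centroid` and `replicate_first_feature` and all numbers are hypothetical.

```python
import numpy as np

def nearest_centroid(x, centroids):
    """Index of the centroid closest to x in squared Euclidean distance
    (the assignment step used by K-means)."""
    d2 = ((centroids - x) ** 2).sum(axis=1)
    return int(np.argmin(d2))

def replicate_first_feature(data, copies):
    """Return data with column 0 repeated `copies` times in total,
    simulating perfectly correlated, redundant features."""
    first = data[:, :1]
    return np.hstack([np.repeat(first, copies, axis=1), data[:, 1:]])

# Toy values (assumed): two centroids and one point over features (a, b).
centroids = np.array([[0.0, 0.0],
                      [1.0, 1.5]])
point = np.array([[0.9, 0.2]])

# With the two original features, the point is closer to centroid 0.
before = nearest_centroid(point[0], centroids)

# After feature a is duplicated four times, its contribution to the
# distance is counted four times and the assignment changes.
after = nearest_centroid(replicate_first_feature(point, 4)[0],
                         replicate_first_feature(centroids, 4))

print(before, after)
```

Nothing about the point or the centroids changes between the two calls; only the number of redundant copies of one feature does, which is the kind of distortion the abstract attributes to treating all features as equally relevant.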
