Article

A new pipeline for structural characterization and classification of RNA-Seq microbiome data

BIODATA MINING (2021)

Journal

BIODATA MINING

Volume 14, Issue 1, Pages -

Publisher

BMC

DOI: 10.1186/s13040-021-00266-7

Keywords

Microbial communities; Compositional nature; Classification method; 16S rRNA sequencing

Categories

Mathematical & Computational Biology

Funding

COLCIENCIAS [1215-5693-4635, 0770-2013]
Gobernacion del Atlantico (Colombia) [673]
National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA [R01AI110385]
Universidad del Norte, Barranquilla, Colombia [FOFICO 32101 PE0031]

Summary

This study introduces a novel approach for the classification of new samples into two groups with different biological settings using compositional data. The results demonstrate that the proposed method achieves classification accuracy of 98% or greater with synthetic data and outperforms other classification methods when applied to real gene sequencing datasets.

Abstract

Background: High-throughput sequencing enables the analysis of the composition of numerous biological systems, such as microbial communities. Identifying dependencies within these systems requires analyzing the underlying interaction patterns among all the variables that make up the system. This task is challenging because data from DNA-sequencing experiments are compositional: traditional interaction metrics (e.g., correlation) produce unreliable results when applied to relative fractions instead of absolute abundances. The compositionality-associated challenges extend to classification, which usually involves characterizing the interactions between the principal descriptive variables of the datasets. Classifying new samples/patients into binary categories corresponding to dissimilar biological settings or phenotypes (e.g., controls and cases) could help researchers develop treatments/drugs.

Results: Here, we develop and exemplify a new approach, applicable to compositional data, for classifying new samples into two groups with different biological settings. We propose a new metric to characterize and quantify the overall deviation in correlation structure between these groups, and a dimensionality-reduction technique to facilitate graphical representation. We conduct simulation experiments with synthetic data to assess the proposed method's classification accuracy. Moreover, we illustrate the performance of the proposed approach using Operational Taxonomic Unit (OTU) count tables obtained through 16S rRNA gene sequencing from two microbiota experiments. We also compare our method's performance with that of two state-of-the-art methods.

Conclusions: Simulation experiments show that our method achieves a classification accuracy equal to or greater than 98% when using synthetic data. Finally, our method outperforms the other classification methods on real datasets from gene sequencing experiments.
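The compositionality problem mentioned in the Background can be illustrated with a small simulation. The sketch below is not the authors' pipeline or metric; it only shows, under assumed synthetic data, how converting independent absolute abundances to relative fractions can induce correlation that was not present before. The function names `closure` and `clr`, the lognormal abundances, and the pseudocount are illustrative assumptions. The centered log-ratio (CLR) transform is included only as a commonly used compositional-data tool, not as a step taken from the paper.

```python
import numpy as np


def closure(counts):
    """Convert absolute abundances to relative fractions (each row sums to 1)."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


def clr(fractions, pseudocount=1e-6):
    """Centered log-ratio transform (assumed helper, not from the paper).

    The pseudocount avoids log(0) for zero fractions; its value is arbitrary.
    """
    logged = np.log(np.asarray(fractions, dtype=float) + pseudocount)
    return logged - logged.mean(axis=1, keepdims=True)


# Hypothetical data: 500 samples of 3 taxa drawn independently of each other.
rng = np.random.default_rng(0)
absolute = rng.lognormal(mean=5.0, sigma=0.5, size=(500, 3))

# Sequencing-style data only preserve proportions, not totals.
relative = closure(absolute)

# Pearson correlation between taxa (columns) before and after closure.
corr_absolute = np.corrcoef(absolute, rowvar=False)
corr_relative = np.corrcoef(relative, rowvar=False)

print("taxon 0 vs taxon 1, absolute abundances:", corr_absolute[0, 1])
print("taxon 0 vs taxon 1, relative fractions: ", corr_relative[0, 1])

# CLR-transformed rows are centered: each row sums to (numerically) zero.
clr_values = clr(relative)
print("max |row sum| after CLR:", np.abs(clr_values.sum(axis=1)).max())
```

Because the fractions in each sample must sum to one, an increase in one taxon's share forces the others down, so the taxa appear negatively correlated even though their absolute abundances were generated independently.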
