4.6 Article

Metagenomic Geolocation Prediction Using an Adaptive Ensemble Classifier

Journal

FRONTIERS IN GENETICS
Volume 12, Issue -, Pages -

Publisher

FRONTIERS MEDIA SA
DOI: 10.3389/fgene.2021.642282

Keywords

metagenomics; machine learning; ensemble classifier; microbiome; geolocation

Funding

  1. National Science Foundation [DMS-1461948]

This study used microbiome samples from urban environments to predict the geographic location of unknown samples. It implemented multiple component classifiers and a robust ensemble approach, and it showed that relying on a single classification algorithm for metagenomic samples is unreliable. By combining several classifiers via an ensemble approach, the study achieved classification results comparable to those of the best-performing component classifier.
Microbiome samples harvested from urban environments can be informative in predicting the geographic location of unknown samples. Because different cities may have geographically distinct microbial signatures, city-specific microbiome samples can be used to predict the geographic origin of a sample. We implemented this idea by first using standard bioinformatics procedures to pre-process the raw metagenomics samples provided by the CAMDA organizers. We then trained several component classifiers and a robust ensemble classifier with data generated from taxonomy-dependent and taxonomy-free approaches. We also implemented class weighting and an optimal oversampling technique to overcome the class imbalance in the primary data. In each instance, we observed that the component classifiers performed differently, whereas the ensemble classifier consistently yielded optimal performance. Finally, we predicted the source cities of mystery samples provided by the organizers. Our results highlight the unreliability of restricting the classification of metagenomic samples by source origin to a single classification algorithm. By combining several component classifiers via the ensemble approach, we obtained classification results that were as good as those of the best-performing component classifier.
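
The abstract does not specify the oversampling technique or how the ensemble combines its component classifiers. As a rough illustration only, the sketch below uses two simple stand-ins: random oversampling to balance classes, and majority voting to combine component predictions. These are assumptions, not the authors' method, and the function names, city labels, and feature values are all hypothetical.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (a stand-in technique)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(features, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    out_features, out_labels = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            out_features.append(row)
            out_labels.append(label)
    return out_features, out_labels


def majority_vote(component_predictions):
    """Combine per-classifier label lists into one label per sample.
    On a tie, the label that appears first among the votes wins."""
    combined = []
    for votes in zip(*component_predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Toy data: hypothetical city labels and two made-up abundance features.
features = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.2, 0.8]]
labels = ["city_A", "city_A", "city_A", "city_B"]
balanced_features, balanced_labels = random_oversample(features, labels)
class_counts = Counter(balanced_labels)

# Hypothetical predictions from three component classifiers on three samples.
predictions = [
    ["city_A", "city_B", "city_A"],
    ["city_A", "city_B", "city_B"],
    ["city_B", "city_B", "city_A"],
]
ensemble_labels = majority_vote(predictions)
```

In this toy case each component classifier disagrees with the others on at least one sample, while the combined vote gives a single label per sample. This mirrors the motivation stated in the abstract, without reproducing the study's adaptive ensemble.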
