Article

Optimizing ancestral trait reconstruction of large HIV Subtype C datasets through multiple-trait subsampling

Journal

VIRUS EVOLUTION
Volume 9, Issue 2, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/ve/vead069

Keywords

HIV Subtype C; multiple-trait subsampling; ancestral trait reconstruction; phylogenetic comparative methods; subsampling approaches

Large datasets and sampling bias pose challenges for phylodynamic reconstructions, especially when data are obtained from heterogeneous sources and/or convenience sampling. This study evaluates how unbalanced sampling distributions affect the reconstruction of HIV-1 subtype C dynamics, using a comprehensive subsampling strategy. The results show that subsampling by all available traits, particularly with multigene datasets, yields the most suitable dataset for ancestral trait reconstruction. The study also demonstrates that sampling bias is inflated when considerable information for a trait is missing or of poor quality.
Large datasets along with sampling bias represent a challenge for phylodynamic reconstructions, particularly when the study data are obtained from various heterogeneous sources and/or through convenience sampling. In this study, we evaluate the presence of unbalanced sampling distributions by collection date, location, and risk group in human immunodeficiency virus Type 1 Subtype C datasets using a comprehensive subsampling strategy, and we assess their impact on the reconstruction of the viral spatial and risk group dynamics using phylogenetic comparative methods. Our study shows that the most suitable dataset for ancestral trait reconstruction can be obtained through subsampling by all available traits, particularly using multigene datasets. We also demonstrate that sampling bias is inflated when considerable information for a given trait is unavailable or of poor quality, as we observed for the trait risk group. In conclusion, we suggest that, even if traits are not well recorded, deliberately including them, rather than excluding them completely, optimizes the representativeness of the original dataset. Therefore, we advise the inclusion of as many traits as possible with the aid of subsampling approaches in order to optimize the dataset for phylodynamic analysis while reducing the computational burden. This will benefit research communities investigating the evolutionary and spatio-temporal patterns of infectious diseases.
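
As a rough illustration of subsampling by several traits at once, the sketch below caps the number of sequences kept for each combination of collection year, location, and risk group. It is not the authors' procedure: the function name `subsample_by_traits`, the field names, the cap of five per stratum, and the toy records are all assumptions made for this example. Records with a missing trait value are placed in their own "unknown" stratum instead of being discarded, in the spirit of the abstract's advice to include poorly recorded traits.

```python
import random
from collections import Counter, defaultdict


def subsample_by_traits(records, traits, per_stratum, seed=0):
    """Keep at most `per_stratum` records per combination of trait values.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for rec in records:
        # A missing value becomes its own "unknown" level rather than
        # causing the record to be dropped.
        key = tuple(str(rec.get(t) or "unknown") for t in traits)
        strata[key].append(rec)

    kept = []
    for key in sorted(strata):
        group = strata[key]
        if len(group) > per_stratum:
            group = rng.sample(group, per_stratum)  # thin over-sampled strata
        kept.extend(group)
    return kept


# Toy, made-up metadata: one heavily over-sampled stratum and two small ones.
records = (
    [{"id": f"ZA{i}", "year": 2005, "country": "ZA", "risk": "HET"} for i in range(40)]
    + [{"id": f"IN{i}", "year": 2005, "country": "IN", "risk": None} for i in range(3)]
    + [{"id": f"BR{i}", "year": 2010, "country": "BR", "risk": "MSM"} for i in range(2)]
)

kept = subsample_by_traits(records, ["year", "country", "risk"], per_stratum=5)
by_country = Counter(rec["country"] for rec in kept)
print(len(kept), dict(by_country))
```

In this toy case the over-represented stratum is thinned to the cap while the two small strata, including the one with an unrecorded risk group, are kept whole.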
