Article

Distinguishing between recent balancing selection and incomplete sweep using deep neural networks

Journal

MOLECULAR ECOLOGY RESOURCES
Volume 21, Issue 8, Pages 2706-2718

Publisher

WILEY
DOI: 10.1111/1755-0998.13379

Keywords

adaptation; genomics; proteomics; molecular evolution; natural selection and contemporary evolution; population genetics - empirical; population genetics - theoretical

Funding

  1. Leverhulme Trust [RPG-2018-208]


Balancing selection is an important adaptive mechanism for a wide range of phenotypes, but its detection from genomic data is challenging. In this study, two deep neural networks were developed and implemented to predict loci under recent selection, with the convolutional neural network (CNN) generally outperforming the artificial neural network (ANN). The trained networks were deployed on neutral genomic regions in European populations and within the MEFV gene region, where several common variants were predicted to be under incomplete sweep, two of which are functional changes.
Balancing selection is an important adaptive mechanism underpinning a wide range of phenotypes. Despite its relevance, the detection of recent balancing selection from genomic data is challenging, as its signatures are qualitatively similar to those left by ongoing positive selection. In this study, we developed and implemented two deep neural networks and tested their performance in predicting loci under recent selection, due to either balancing selection or incomplete sweep, from population genomic data. Specifically, we generated forward-in-time simulations to train and test an artificial neural network (ANN) and a convolutional neural network (CNN). The ANN received as input multiple summary statistics calculated on the locus of interest, while the CNN was applied directly to the matrix of haplotypes. We found that both architectures identify loci under recent selection with high accuracy. The CNN generally outperformed the ANN in distinguishing between signals of balancing selection and incomplete sweep, and was less affected by incorrect training data. We deployed both trained networks on neutral genomic regions in European populations and demonstrated a lower false-positive rate for the CNN than for the ANN. Finally, we deployed the CNN within the MEFV gene region and identified several common variants predicted to be under incomplete sweep in a European population. Notably, two of these variants are functional changes and could modulate susceptibility to familial Mediterranean fever, possibly as a consequence of past adaptation to pathogens. In conclusion, deep neural networks were able to characterize signals of selection on intermediate-frequency variants, an analysis currently inaccessible to commonly used strategies.
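
The two input representations described above can be illustrated with a minimal sketch. The abstract does not list which summary statistics were fed to the ANN or how the haplotype matrix was prepared for the CNN, so everything below is an assumption for illustration only: the function names `summary_statistics` and `sort_haplotypes` are hypothetical, the statistics (number of segregating sites, nucleotide diversity, Watterson's theta, Tajima's D) are common textbook choices, and the row ordering by haplotype frequency is one frequently used preprocessing step, not necessarily the authors'.

```python
import math
from collections import Counter


def summary_statistics(haplotypes):
    """Compute a few classic statistics from a 0/1 haplotype matrix.

    `haplotypes` is a list of rows (one per sampled haplotype); each row is a
    list of alleles (0 = ancestral, 1 = derived), one per site.
    These statistics are illustrative assumptions, not the paper's exact set.
    """
    n = len(haplotypes)
    # Derived-allele count at each site (column sums).
    counts = [sum(column) for column in zip(*haplotypes)]
    # Keep only polymorphic (segregating) sites.
    segregating = [k for k in counts if 0 < k < n]
    s = len(segregating)

    # Nucleotide diversity: mean number of pairwise differences.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in segregating)

    # Watterson's estimator.
    a1 = sum(1.0 / i for i in range(1, n))
    theta_w = s / a1

    # Tajima's D (Tajima 1989); left at 0.0 when there is no variation.
    tajima_d = 0.0
    if s > 0:
        a2 = sum(1.0 / i ** 2 for i in range(1, n))
        b1 = (n + 1) / (3.0 * (n - 1))
        b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
        c1 = b1 - 1.0 / a1
        c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
        e1 = c1 / a1
        e2 = c2 / (a1 ** 2 + a2)
        tajima_d = (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))

    return {"S": s, "pi": pi, "theta_w": theta_w, "tajima_d": tajima_d}


def sort_haplotypes(haplotypes):
    """Order rows so the most frequent haplotypes come first.

    Row order in a sample is arbitrary, so image-style inputs are often
    sorted; this particular ordering is an assumption for illustration.
    """
    freq = Counter(tuple(row) for row in haplotypes)
    return sorted(haplotypes, key=lambda row: (-freq[tuple(row)], row))


if __name__ == "__main__":
    sample = [[0, 1, 0], [1, 1, 0], [0, 0, 0], [1, 0, 0]]
    print(summary_statistics(sample))  # feature vector for an ANN-style model
    print(sort_haplotypes(sample))     # matrix for a CNN-style model
```

In this toy sample both polymorphic sites sit at intermediate frequency, which gives a positive Tajima's D. A single statistic of this kind cannot separate balancing selection from an incomplete sweep, which is the ambiguity the abstract describes.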
