Article

From genotype to phenotype in Arabidopsis thaliana: in-silico genome interpretation predicts 288 phenotypes from sequencing data

Journal

NUCLEIC ACIDS RESEARCH
Volume 50, Issue 3, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/nar/gkab1099

Funding

  1. FWO post-doctoral fellowship

This paper proposes a novel Genome Interpretation paradigm that directly models the genotype-to-phenotype relationship, implemented in a model called Galiana. The model is trained on Whole Genome sequencing data to predict Arabidopsis thaliana phenotypes; the best-predicted phenotypes are mostly related to flowering traits. Galiana achieves better performance and larger phenotype coverage than models that predict single phenotypes from GWAS-derived known associated genes, and it is interpretable through Saliency Maps gradient-based approaches. Following this interpretation approach, the authors identify 36 novel genes that are likely to be associated with flowering traits.
In many cases, the unprecedented availability of data provided by high-throughput sequencing has shifted the bottleneck from data availability to data interpretation. This delays the promised breakthroughs in genetics and precision medicine in human genetics and, in plant sciences, the phenotype prediction needed to improve plant adaptation to climate change and resistance to bioaggressors. In this paper, we propose a novel Genome Interpretation paradigm that aims to model the genotype-to-phenotype relationship directly, and we focus on A. thaliana since it is the best-studied model organism in plant genetics. Our model, called Galiana, is the first end-to-end Neural Network (NN) approach following the genomes in/phenotypes out paradigm, and it is trained to predict 288 real-valued Arabidopsis thaliana phenotypes from Whole Genome sequencing data. We show that 75 of these phenotypes are predicted with a Pearson correlation >= 0.4 and are mostly related to flowering traits. We show that our end-to-end NN approach achieves better performance and larger phenotype coverage than models predicting single phenotypes from the GWAS-derived known associated genes. Galiana is also fully interpretable, thanks to Saliency Maps gradient-based approaches. We followed this interpretation approach to identify 36 novel genes that are likely to be associated with flowering traits, finding evidence for 6 of them in the existing literature.
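The evaluation criterion mentioned in the abstract (counting phenotypes predicted with a Pearson correlation >= 0.4) can be illustrated with a minimal sketch. This is not the authors' code: the function names `pearson` and `well_predicted`, the trait names, and the toy values are all hypothetical, and the abstract does not say how the correlation was computed in practice (for example, on which data split).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant sequence
    return cov / math.sqrt(var_x * var_y)


def well_predicted(observed, predicted, threshold=0.4):
    """Return the names of phenotypes whose Pearson r reaches the threshold.

    `observed` and `predicted` map a phenotype name to a list of real values,
    one per plant line (hypothetical layout, chosen for illustration).
    """
    kept = []
    for name, obs in observed.items():
        r = pearson(obs, predicted[name])
        if r >= threshold:  # NaN compares False, so undefined cases are skipped
            kept.append(name)
    return kept


# Toy data: made-up values, not taken from the paper.
observed = {
    "trait_a": [1.0, 2.0, 3.0, 4.0],
    "trait_b": [1.0, 2.0, 3.0, 4.0],
}
predicted = {
    "trait_a": [1.1, 1.9, 3.2, 3.8],
    "trait_b": [4.0, 1.0, 3.0, 2.0],
}

print(well_predicted(observed, predicted))
```

In this toy case only `trait_a` passes the threshold; `trait_b` has a negative correlation and is excluded. Applied to all 288 phenotypes, the same kind of count would correspond to the 75 well-predicted phenotypes reported in the abstract.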