Article

Analysis of Plant Pan-Genomes and Transcriptomes with GET_HOMOLOGUES-EST, a Clustering Solution for Sequences of the Same Species

Journal

FRONTIERS IN PLANT SCIENCE
Volume 8, Issue -, Pages -

Publisher

FRONTIERS MEDIA SA
DOI: 10.3389/fpls.2017.00184

Keywords

comparative genomics; pan-genome; RNA-seq; core-genome; accessory genome; Arabidopsis thaliana; barley

Funding

  1. DGA - Obra Social La Caixa [GA-LC-059-2011]
  2. Spanish MINECO [AGL2013-48756-R, CSIC13-4E-2490, BES-2011-045905, AGL2010-21929]
  3. CONACyT-Mexico [179133]
  4. DGAPA-PAPIIT/UNAM [IN211814]
  5. Fundación ARAID
  6. U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility [DE-AC02-05CH11231]


The pan-genome of a species is defined as the union of all the genes and noncoding sequences found in all its individuals. However, constructing a pan-genome for plants with large genomes is daunting, both in sequencing cost and in the scale of the required computational analysis. A more affordable alternative is to focus on the genic repertoire by using transcriptomic data. Here, the software GET_HOMOLOGUES-EST was benchmarked with genomic and RNA-seq data of 19 Arabidopsis thaliana ecotypes and then applied to the analysis of transcripts from 16 Hordeum vulgare genotypes. The goal was to sample their pan-genomes and classify sequences as core, if detected in all accessions, or accessory, when absent in some of them. The resulting sequence clusters were used to simulate pan-genome growth and to compile Average Nucleotide Identity matrices that summarize intra-species variation. Although transcripts were found to underestimate pan-genome size by at least 10%, we concluded that clusters of expressed sequences can recapitulate phylogeny and reproduce two properties observed in A. thaliana gene models: accessory loci show lower expression and higher non-synonymous substitution rates than core genes. Finally, accessory sequences were observed to preferentially encode transposon components in both species, plus disease resistance genes in cultivated barleys, and a variety of protein domains from other families that appear frequently associated with presence/absence variation in the literature. These results demonstrate that pan-genome analyses are useful for exploring germplasm diversity.
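The core/accessory rule and the pan-genome growth simulation described in the abstract can be illustrated with a minimal Python sketch. It is not the GET_HOMOLOGUES-EST implementation: it assumes sequence clusters have already been computed and are supplied as a hypothetical presence/absence mapping from cluster IDs to the accessions in which each cluster was detected. The function names `classify_clusters` and `pan_genome_growth`, the permutation count, and the toy data are illustrative assumptions only.

```python
import random
from typing import Dict, List, Set, Tuple


def classify_clusters(presence: Dict[str, Set[str]],
                      accessions: List[str]) -> Dict[str, str]:
    """Label each cluster 'core' if detected in every accession, else 'accessory'."""
    all_accessions = set(accessions)
    return {
        cluster_id: "core" if all_accessions <= found else "accessory"
        for cluster_id, found in presence.items()
    }


def pan_genome_growth(presence: Dict[str, Set[str]],
                      accessions: List[str],
                      n_perm: int = 100,
                      seed: int = 1) -> Tuple[List[float], List[float]]:
    """Mean pan- and core-genome sizes as accessions are added one at a time.

    For each random ordering of the accessions, the pan-genome after k
    accessions counts clusters seen in at least one of them, and the
    core-genome counts clusters seen in all of them.
    """
    rng = random.Random(seed)
    k = len(accessions)
    pan_totals = [0] * k
    core_totals = [0] * k
    for _ in range(n_perm):
        order = list(accessions)
        rng.shuffle(order)
        for i in range(k):
            sampled = set(order[: i + 1])
            # Pan-genome: clusters found in at least one sampled accession.
            pan_totals[i] += sum(1 for found in presence.values() if found & sampled)
            # Core-genome: clusters found in every sampled accession.
            core_totals[i] += sum(1 for found in presence.values() if sampled <= found)
    return ([t / n_perm for t in pan_totals],
            [t / n_perm for t in core_totals])


if __name__ == "__main__":
    # Toy, made-up data: 3 accessions and 4 clusters.
    accessions = ["A", "B", "C"]
    presence = {
        "c1": {"A", "B", "C"},
        "c2": {"A", "B", "C"},
        "c3": {"A", "B"},
        "c4": {"C"},
    }
    print(classify_clusters(presence, accessions))
    # {'c1': 'core', 'c2': 'core', 'c3': 'accessory', 'c4': 'accessory'}
    pan, core = pan_genome_growth(presence, accessions)
    print(pan[-1], core[-1])  # 4.0 2.0
```

Averaging over random orderings removes the dependence on the order in which accessions are added. With transcript data, a cluster missing from one accession may reflect lack of expression in the sampled tissue rather than true gene absence, which is consistent with transcripts underestimating pan-genome size.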
