4.5 Article

Genome-wide identification of dominant polyadenylation hexamers for use in variant classification

Journal

HUMAN MOLECULAR GENETICS
Volume -, Issue -, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/hmg/ddad136

Keywords

polyadenylation; bioinformatics; variant classification; RNA seq

Ask authors/readers for more resources

Polyadenylation is a crucial process for mRNA stabilization and export, with hexamers playing a key role. However, only a few disorders have been associated with hexamer variants, possibly due to lack of defined hexamers in molecular analysis. To address this, the study aimed to define functionally important hexamers genome-wide and identified predominant polyA sites and putative predominant hexamers. Variants in these predominant hexamers showed potential pathogenic impact and could be used for interrogating exome and genome data in human diseases.
Polyadenylation is an essential process for the stabilization and export of mRNAs to the cytoplasm and the polyadenylation signal hexamer (herein referred to as hexamer) plays a key role in this process. Yet, only 14 Mendelian disorders have been associated with hexamer variants. This is likely an under-ascertainment as hexamers are not well defined and not routinely examined in molecular analysis. To facilitate the interrogation of putatively pathogenic hexamer variants, we set out to define functionally important hexamers genome-wide as a resource for research and clinical testing interrogation. We identified predominant polyA sites (herein referred to as pPAS) and putative predominant hexamers across protein coding genes (PAS usage >50% per gene). As a measure of the validity of these sites, the population constraint of 4532 predominant hexamers were measured. The predominant hexamers had fewer observed variants compared to non-predominant hexamers and trimer controls, and CADD scores for variants in these hexamers were significantly higher than controls. Exome data for 1477 individuals were interrogated for hexamer variants and transcriptome data were generated for 76 individuals with 65 variants in predominant hexamers. 3 & PRIME; RNA-seq data showed these variants resulted in alternate polyadenylation events (38%) and in elongated mRNA transcripts (12%). Our list of pPAS and predominant hexamers are available in the UCSC genome browser and on GitHub. We suggest this list of predominant hexamers can be used to interrogate exome and genome data. Variants in these predominant hexamers should be considered candidates for pathogenic variation in human disease, and to that end we suggest pathogenicity criteria for classifying hexamer variants.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.5
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available