4.8 Article

LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons

Journal

PLANT PHYSIOLOGY
Volume 176, Issue 2, Pages 1410-1422

Publisher

AMER SOC PLANT BIOLOGISTS
DOI: 10.1104/pp.17.01310

Keywords

-

Categories

Funding

  1. National Science Foundation [MCB-1121650, IOS-1126998]
  2. United States Department of Agriculture National Institute of Food and Agriculture
  3. AgBioResearch at Michigan State University [MICL02408]
  4. Div Of Molecular and Cellular Bioscience
  5. Direct For Biological Sciences [1121650] Funding Source: National Science Foundation

Ask authors/readers for more resources

Long terminal repeat retrotransposons (LTR-RTs) are prevalent in plant genomes. The identification of LTR-RTs is critical for achieving high-quality gene annotation. Based on the well-conserved structure, multiple programs were developed for the de novo identification of LTR-RTs; however, these programs are associated with low specificity and high false discovery rates. Here, we report LTR_retriever, a multithreading-empowered Perl program that identifies LTR-RTs and generates high-quality LTR libraries from genomic sequences. LTR_retriever demonstrated significant improvements by achieving high levels of sensitivity (91%), specificity (97%), accuracy (96%), and precision (90%) in rice (Oryza sativa). LTR_retriever is also compatible with long sequencing reads. With 40k self-corrected PacBio reads equivalent to 4.5x genome coverage in Arabidopsis (Arabidopsis thaliana), the constructed LTR library showed excellent sensitivity and specificity. In addition to canonical LTR-RTs with 5'-TG ... CA-3' termini, LTR_retriever also identifies noncanonical LTR-RTs (non-TGCA), which have been largely ignored in genome-wide studies. We identified seven types of noncanonical LTRs from 42 out of 50 plant genomes. The majority of noncanonical LTRs are Copia elements, with which the LTR is four times shorter than that of other Copia elements, which may be a result of their target specificity. Strikingly, non-TGCA Copia elements are often located in genic regions and preferentially insert nearby or within genes, indicating their impact on the evolution of genes and their potential as mutagenesis tools.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.8
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available