Direct mapping and alignment of protein sequences onto genomic sequence

Journal

BIOINFORMATICS
Volume 24, Issue 21, Pages 2438-2444

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btn460

Funding

  1. Ministry of Education, Culture, Sports, Science and Technology of Japan [18017017, 20017018]
  2. Grants-in-Aid for Scientific Research [18017017, 20017018] Funding Source: KAKEN

Motivation: Finding protein-coding genes in a newly determined genomic sequence is the first step toward understanding the content written in the genome. Sequences of transcripts of homologous genes, if available, can considerably improve the accuracy of predicting genes and their structures, compared with prediction without such knowledge. As protein sequences are generally better conserved than nucleotide sequences, remote homologs can be used as templates, extending the applicability of evidence-based gene recognition methods. However, no tool seems to have been developed so far to simultaneously map and align a number of protein sequences onto a mammalian-sized genomic sequence.

Results: We have extended our computer program Spaln to accept protein sequences, as well as cDNA sequences, as queries. When the query and the target sequences are reasonably similar, e.g. between mammalian orthologs, Spaln runs one to two orders of magnitude faster than conventional approaches that rely on a Blast search followed by dynamic-programming-based spliced alignment. The exon-level and gene-level accuracies of Spaln are significantly higher than those obtained by the best available methods of the same type, particularly when the query and the target are distantly related.
