An Integrated Approach for Microprotein Identification and Sequence Analysis

Journal

JOVE-JOURNAL OF VISUALIZED EXPERIMENTS
Issue 185

Publisher

JOURNAL OF VISUALIZED EXPERIMENTS
DOI: 10.3791/63841

Funding

  1. National Institutes of Health [HL-141630, HL-160569]
  2. Cincinnati Children's Research Foundation

This article presents a detailed protocol for using bioinformatic tools to query genomic regions for microprotein-coding potential and provides methods for multiple species alignments and microprotein characteristic analysis. These tools can help identify microprotein-coding sequences in noncanonical genomic regions or rule out the presence of a conserved coding sequence in noncoding transcripts of interest.

Next-generation sequencing (NGS) has propelled the field of genomics forward and produced whole genome sequences for numerous animal species and model organisms. However, despite this wealth of sequence information, comprehensive gene annotation efforts have proven challenging, especially for small proteins. Notably, conventional protein annotation methods were designed to exclude putative proteins encoded by short open reading frames (sORFs) shorter than 300 nucleotides, in order to filter out the exponentially higher number of spurious noncoding sORFs throughout the genome. As a result, hundreds of functional small proteins called microproteins (<100 amino acids in length) have been incorrectly classified as noncoding RNAs or overlooked entirely. Here, we provide a detailed protocol to leverage free, publicly available bioinformatic tools to query genomic regions for microprotein-coding potential based on evolutionary conservation. Specifically, we provide step-by-step instructions on how to examine sequence conservation and coding potential using Phylogenetic Codon Substitution Frequencies (PhyloCSF) on the user-friendly University of California Santa Cruz (UCSC) Genome Browser. Additionally, we detail steps to efficiently generate multiple species alignments of identified microprotein sequences to visualize amino acid sequence conservation and recommend resources to analyze microprotein characteristics, including predicted domain structures. These powerful tools can be used to help identify putative microprotein-coding sequences in noncanonical genomic regions or to rule out the presence of a conserved coding sequence with translational potential in a noncoding transcript of interest.
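
To make the 300-nucleotide cutoff concrete, the following minimal Python sketch scans the three forward reading frames of a DNA sequence and reports ATG-initiated ORFs shorter than that cutoff. It is an illustration only and not the protocol's method: it does not assess evolutionary conservation or run PhyloCSF. The function name `find_sorfs`, its parameters, and the lower bound `min_nt` are hypothetical, and ORF length is assumed here to be counted from the start codon up to, but not including, the stop codon.

```python
# Illustrative sketch only: a naive forward-strand sORF scanner.
# It does NOT evaluate conservation or coding potential (no PhyloCSF).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq, max_nt=300, min_nt=30):
    """Return (start, end, frame, orf_seq) tuples for ATG-initiated ORFs.

    Assumptions (not from the article): length is measured from the ATG
    up to, but excluding, the stop codon; min_nt is an arbitrary lower
    bound; only the three forward frames are scanned. `end` is the
    index just past the stop codon.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None  # index of the first ATG since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                length = i - start
                if min_nt <= length < max_nt:
                    hits.append((start, i + 3, frame, seq[start:i]))
                start = None
    return hits


if __name__ == "__main__":
    example = "CC" + "ATG" + "GCT" * 10 + "TAA" + "GG"
    for start, end, frame, orf in find_sorfs(example):
        print(start, end, frame, len(orf) // 3, "codons")
```

A scanner like this returns a large number of candidates on any real genome, most of them spurious, which is why a conservation-based filter such as the one described in the abstract is needed to prioritize them.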
