4.7 Review

A simple guide to de novo transcriptome assembly and annotation

Related references

Note: Only some of the references are listed.
Article Biochemical Research Methods

CodAn: predictive models for precise identification of coding regions in eukaryotic transcripts

Pedro G. Nachtigall et al.

Summary: CodAn is a new approach for predicting confident CDS and UTR regions in full-length or partial transcript sequences from eukaryotic species. The analysis showed that CodAn makes reliable predictions on both full-length and partial transcripts, especially in correctly identifying CDSs, which makes it a solid choice for projects involving transcriptomic data.

BRIEFINGS IN BIOINFORMATICS (2021)

Article Biochemistry & Molecular Biology

Error, noise and bias in de novo transcriptome assemblies

Adam H. Freedman et al.

Summary: De novo transcriptome assembly is a powerful tool for evolutionary inference, but assemblies are often affected by bias and noise, which can lead to underestimated diversity and distorted gene expression estimates. Researchers should look for ways to minimize the impact of bias in transcriptome assemblies.

MOLECULAR ECOLOGY RESOURCES (2021)

Article Biochemistry & Molecular Biology

TOA: A software package for automated functional annotation in non-model plant species

Fernando Mora-Marquez et al.

Summary: TOA is a user-friendly open source application designed for functional annotation in non-model plant species, outperforming other software in terms of number of annotated sequences and accuracy of the annotation.

MOLECULAR ECOLOGY RESOURCES (2021)

Review Cell Biology

Gene regulation by long non-coding RNAs and its biological functions

Luisa Statello et al.

Summary: Recent studies have shed new light on the biogenesis and functions of long non-coding RNAs (lncRNAs), showcasing their diverse roles in gene regulation and signaling pathways, particularly in the contexts of neuronal disorders, immune responses, and cancer. The discovery of unique biogenesis pathways and subcellular localizations of lncRNAs has opened up potential therapeutic avenues for targeting lncRNAs in various biological and pathophysiological conditions.

NATURE REVIEWS MOLECULAR CELL BIOLOGY (2021)

Article Biochemistry & Molecular Biology

The Gene Ontology resource: enriching a GOld mine

Seth Carbon et al.

Summary: The Gene Ontology Consortium has made advancements in the last two years, such as improving the GO-CAM annotation framework, increasing the number of annotations and annotated gene products, and reviewing older annotations for consistency.

NUCLEIC ACIDS RESEARCH (2021)

Article Biochemistry & Molecular Biology

UniProt: the universal protein knowledgebase in 2021

Alex Bateman et al.

Summary: The UniProt Knowledgebase aims to provide users with a comprehensive, high-quality set of protein sequences annotated with functional information. Updates over the past two years have increased the number of sequences to approximately 190 million, with new methods to assess proteome completeness and quality. UniProtKB has responded to the COVID-19 pandemic by expertly curating relevant entries and making them rapidly available through a dedicated portal.

NUCLEIC ACIDS RESEARCH (2021)

Article Biochemistry & Molecular Biology

Rfam 14: expanded coverage of metagenomic, viral and microRNA families

Ioanna Kalvari et al.

Summary: Rfam is a database of RNA families with 3444 families, each represented by a multiple sequence alignment of known RNA sequences and a covariance model for searching additional members. Recent developments focused on improving data quality and coverage, adding new families like microRNAs, viral and bacterial RNAs through expert collaborations. The database saw significant growth with 759 new families added in Rfam 14, along with new features such as the Rfam Cloud family curation system.

NUCLEIC ACIDS RESEARCH (2021)

Article Biochemistry & Molecular Biology

OrthoDB in 2020: evolutionary and functional annotations of orthologs

Evgeny M. Zdobnov et al.

Summary: OrthoDB provides evolutionary and functional annotations of orthologs for a vast number of organisms, with a wide coverage of species and rich data content. The user interface has been enhanced, and features three views on the data, as well as online tools for interactive exploration and data download.

NUCLEIC ACIDS RESEARCH (2021)

Article Biochemistry & Molecular Biology

FlyBase: updates to the Drosophila melanogaster knowledge base

Aoife Larkin et al.

Summary: FlyBase is an essential online database for researchers using Drosophila melanogaster, offering a wide range of genetic, molecular, genomic resources. New features include Pathway Reports, paralog information, disease models based on orthology, customizable tables, and expression and disease data overview displays. Recent updates include developmental proteome incorporation, GAL4 search tab upgrades, additional Experimental Tool Reports, migration to JBrowse for genome browsing, and improvements to batch queries/downloads and the Fast-Track Your Paper tool.

NUCLEIC ACIDS RESEARCH (2021)

Article Biochemistry & Molecular Biology

CATH: increased structural coverage of functional space

Ian Sillitoe et al.

Summary: CATH identifies protein domains in structures and classifies them into evolutionary superfamilies, providing structural and functional annotations. The latest release significantly increases coverage of structural and sequence data, with additional derived data such as predicted sequence domains and functionally coherent sequence subsets. The FunFam generation pipeline has been re-engineered to capture more sequences with increased functional purity and information content.

NUCLEIC ACIDS RESEARCH (2021)

Review Biology

Streamlining data-intensive biology with workflow systems

Taylor Reiter et al.

Summary: With the increasing scale of biological data generation, the bottleneck of research has shifted from data generation to analysis. Data-centric workflow systems are reshaping the landscape of biological data analysis, empowering researchers to conduct reproducible analyses at scale, but knowledge of these techniques is still lacking.

GIGASCIENCE (2021)

Article Entomology

A De Novo Transcriptomics Approach Reveals Genes Involved in Thrips tabaci Resistance to Spinosad

Ran Rosen et al.

Summary: The study explored the molecular and biological basis of resistance to spinosad in onion thrips populations, highlighting metabolic resistance mechanisms and increased fecundity in resistant populations. The research provides valuable genetic resources and insights for future studies on insect pest resistance in agriculture.

INSECTS (2021)

Article Biochemical Research Methods

Using prototyping to choose a bioinformatics workflow management system

Michael Jackson et al.

Summary: Data analysis involves multiple steps, and workflow management systems can help scientists process data more efficiently while enhancing reproducibility and supporting portability. The authors chose a workflow management system for their project by prototyping candidate systems, and they present prototyping as a cost-effective way to make this decision.

PLOS COMPUTATIONAL BIOLOGY (2021)

Review Genetics & Heredity

Temporal Dynamic Methods for Bulk RNA-Seq Time Series Data

Vera-Khlara S. Oh et al.

Summary: This article extensively reviews the applications of dynamic studies in time course experimental designs and clinical approaches, highlighting their significance and challenges in the biomedical field, as well as providing recommendations for future research directions.

GENES (2021)

Article Biology

Transcriptome annotation in the cloud: complexity, best practices, and cost

Roberto Vera Alvarez et al.

Summary: The researchers compared multiple BLAST sequence alignments using AWS and GCP and found that public cloud providers are a practical and cost-effective alternative for conducting advanced computational biology experiments. Their study aimed to establish an estimate of the cost and compute time needed for the execution of multiple BLAST runs in a cloud environment.

GIGASCIENCE (2021)

Article Biochemical Research Methods

Fast and sensitive taxonomic assignment to metagenomic contigs

M. Mirdita et al.

Summary: MMseqs2 taxonomy is a new tool for assigning taxonomic labels to metagenomic contigs. It extracts protein fragments from each contig, retains those relevant for taxonomic annotation, and determines the taxonomic identity using weighted voting. MMseqs2 is 2-18 times faster than existing tools and includes modules for creating and manipulating taxonomic reference databases.

BIOINFORMATICS (2021)

Article Biotechnology & Applied Microbiology

Strategies for detecting and identifying biological signals amidst the variation commonly found in RNA sequencing data

William W. Wilfinger et al.

Summary: This paper presents a three-step process for evaluating biological variability within a group in RNA sequencing data, including scaling gene counts, rank-ordering genes to detect potentially divergent trendlines, and testing with STRING database for statistically significant pathway associations. The analysis revealed that a majority of sequenced genes displayed minimal variation and dispersion across the sample group, while smaller subsets of genes showed markedly skewed trendlines, wide dispersion and variability. STRING database analysis identified interferon-mediated response networks in a percentage of individuals sampled, showcasing the importance of identifying highly variable genes and their network associations within specific individuals.

BMC GENOMICS (2021)

Article Biochemical Research Methods

Sensitive protein alignments at tree-of-life scale using DIAMOND

Benjamin Buchfink et al.

Summary: We are at the beginning of a genomic revolution where all known species are planned to be sequenced. The improved version of DIAMOND allows for quick tree-of-life scale protein alignments.

NATURE METHODS (2021)

Article Biochemistry & Molecular Biology

TRAPID 2.0: a web application for taxonomic and functional analysis of de novo transcriptomes

Francois Bucchini et al.

Summary: Advances in high-throughput sequencing have led to a massive increase in RNA-Seq transcriptome data, but also present new computational challenges. TRAPID 2.0 is a web application designed for fast and efficient processing of assembled transcriptome data, providing global characterization and multi-layer annotation for downstream analysis.

NUCLEIC ACIDS RESEARCH (2021)

Article Genetics & Heredity

Pincho: A Modular Approach to High Quality De Novo Transcriptomics

Randy Ortiz et al.

Summary: Transcriptomic reconstructions without reference are common for non-model biological systems, but the lack of standardization and customization in bioinformatic workflows poses challenges. The increasing number of transcriptome assembly software further complicates the issue, highlighting the need for studies on assembler synergy and the development of customizable management workflows.

GENES (2021)

Article Ecology

Signal, bias, and the role of transcriptome assembly quality in phylogenomic inference

Jennifer L. Spillane et al.

Summary: The study demonstrates that high-quality transcriptome assemblies generate more diverse and accurate phylogenomic datasets with a greater number of unique partitions and stronger phylogenetic signal compared to low-quality assemblies. This highlights the importance of transcriptome assembly quality in phylogenomic analyses and suggests that improving assembly quality can alleviate some uncertainties observed in such studies.

BMC ECOLOGY AND EVOLUTION (2021)

Article Biotechnology & Applied Microbiology

3′-5′ crosstalk contributes to transcriptional bursting

Massimo Cavallaro et al.

Summary: This study focuses on the contributions of processes at the 3′ and 5′ ends of a gene to transcriptional noise, and measures transcriptional bursting using Bayesian methodology. The results show that perturbation of polymerase shuttling typically reduces burst size, increases burst frequency, and thus limits transcriptional noise.

GENOME BIOLOGY (2021)

Review Biochemistry & Molecular Biology

Alternative splicing and cancer: a systematic review

Yuanjiao Zhang et al.

Summary: The study systematically describes the abnormal regulation and functions of alternative splicing in tumors, as well as introduces therapeutic strategies targeting splicing catalysis and regulatory proteins. Further research is needed to fully understand the association between alternative splicing and cancer.

SIGNAL TRANSDUCTION AND TARGETED THERAPY (2021)

Article Biochemistry & Molecular Biology

OMA orthology in 2021: website overhaul, conserved isoforms, ancestral gene order and more

Adrian M. Altenhoff et al.

Summary: OMA is a resource that elucidates evolutionary relationships among genes from 2326 genomes, providing pairwise and groupwise orthologs as well as functional annotations. The updated OMA database has been reorganized into gene-, group-, and genome-centric pages, with new features and improvements added.

NUCLEIC ACIDS RESEARCH (2021)

Article Biochemistry & Molecular Biology

KEGG: integrating viruses and cellular organisms

Minoru Kanehisa et al.

Summary: KEGG is a curated resource integrating eighteen databases categorized into systems, genomic, chemical and health information, providing mapping tools for understanding cellular and organism-level functions from genome sequences and other molecular datasets. The network variation maps in the KEGG database show how different pathogens and environmental factors influence cellular signaling pathways.

NUCLEIC ACIDS RESEARCH (2021)

Article Biochemistry & Molecular Biology

Pfam: The protein families database in 2021

Jaina Mistry et al.

Summary: The Pfam database has recently added a large number of protein families and domains, made revisions for COVID-19 research, and introduced Pfam-B as a supplement. These updates and improvements can help researchers classify protein sequences more effectively and conduct related studies.

NUCLEIC ACIDS RESEARCH (2021)

Article Biochemical Research Methods

IsoTree: A New Framework for de novo Transcriptome Assembly from RNA-seq Reads

Jin Zhao et al.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2020)

Article Biochemistry & Molecular Biology

EnTAP: Bringing faster and smarter functional annotation to non-model eukaryotic transcriptomes

Alexander J. Hart et al.

MOLECULAR ECOLOGY RESOURCES (2020)

Article Biochemistry & Molecular Biology

CDD/SPARCLE: the conserved domain database in 2020

Shennan Lu et al.

NUCLEIC ACIDS RESEARCH (2020)

Article Biotechnology & Applied Microbiology

Pan-tissue transcriptome analysis of long noncoding RNAs in the American beaver Castor canadensis

Amita Kashyap et al.

BMC GENOMICS (2020)

Article Biotechnology & Applied Microbiology

Compacta: a fast contig clustering tool for de novo assembled transcriptomes

Fernando G. Razo-Mendivil et al.

BMC GENOMICS (2020)

Article Biochemistry & Molecular Biology

Evaluation of Seven Different RNA-Seq Alignment Tools Based on Experimental Data from the Model Plant Arabidopsis thaliana

Stephanie Schaarschmidt et al.

INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES (2020)

Letter Biotechnology & Applied Microbiology

The nf-core framework for community-curated bioinformatics pipelines

Philip A. Ewels et al.

NATURE BIOTECHNOLOGY (2020)

Article Multidisciplinary Sciences

The Rhinella arenarum transcriptome: de novo assembly, annotation and gene prediction

Danilo Guillermo Ceschin et al.

SCIENTIFIC REPORTS (2020)

Article Multidisciplinary Sciences

De novo transcriptome assembly and annotation for gene discovery in avocado, macadamia and mango

Tinashe G. Chabikwa et al.

SCIENTIFIC DATA (2020)

Article Biotechnology & Applied Microbiology

Expanding the Chinese hamster ovary cell long noncoding RNA transcriptome using RNASeq

Krishna Motheramgari et al.

BIOTECHNOLOGY AND BIOENGINEERING (2020)

Article Biochemical Research Methods

ELIXIR-IT HPC@CINECA: high performance computing resources for the bioinformatics community

Tiziana Castrigano et al.

BMC BIOINFORMATICS (2020)

Review Genetics & Heredity

mRNAs, proteins and the emerging principles of gene expression control

Christopher Buccitelli et al.

NATURE REVIEWS GENETICS (2020)

Article Biochemistry & Molecular Biology

RNA-Bloom enables reference-free and reference-guided sequence assembly for single-cell transcriptomes

Ka Ming Nip et al.

GENOME RESEARCH (2020)

Review Biotechnology & Applied Microbiology

Opportunities and challenges in long-read sequencing data analysis

Shanika L. Amarasinghe et al.

GENOME BIOLOGY (2020)

Review Biochemical Research Methods

Interpretation of differential gene expression results of RNA-seq data: review and integration

Adam McDermaid et al.

BRIEFINGS IN BIOINFORMATICS (2019)

Article Biochemical Research Methods

JustOrthologs: a fast, accurate and user-friendly ortholog identification algorithm

Justin B. Miller et al.

BIOINFORMATICS (2019)

Article Biotechnology & Applied Microbiology

SignalP 5.0 improves signal peptide predictions using deep neural networks

Jose Juan Almagro Armenteros et al.

NATURE BIOTECHNOLOGY (2019)

Review Pharmacology & Pharmacy

Alternative splicing, RNA-seq and drug discovery

Shanrong Zhao

DRUG DISCOVERY TODAY (2019)

Review Genetics & Heredity

Single-Cell RNA-Seq Technologies and Related Computational Data Analysis

Geng Chen et al.

FRONTIERS IN GENETICS (2019)

Review Genetics & Heredity

Coding or Noncoding, the Converging Concepts of RNAs

Jing Li et al.

FRONTIERS IN GENETICS (2019)

Article Biochemistry & Molecular Biology

OMA standalone: orthology inference among public and custom genomes and transcriptomes

Adrian M. Altenhoff et al.

GENOME RESEARCH (2019)

Article Biochemical Research Methods

Protein-level assembly increases protein sequence recovery from metagenomic samples manyfold

Martin Steinegger et al.

NATURE METHODS (2019)

Review Genetics & Heredity

RNA sequencing: the teenage years

Rory Stark et al.

NATURE REVIEWS GENETICS (2019)

Article Biochemistry & Molecular Biology

The EMBL-EBI search and sequence analysis tools APIs in 2019

Fabio Madeira et al.

NUCLEIC ACIDS RESEARCH (2019)

Article Multidisciplinary Sciences

A comprehensive examination of Nanopore native RNA sequencing for characterization of complex transcriptomes

Charlotte Soneson et al.

NATURE COMMUNICATIONS (2019)

Article Multidisciplinary Sciences

De Novo Transcriptome Assembly and Functional Annotation in Five Species of Bats

Diana D. Moreno-Santillan et al.

SCIENTIFIC REPORTS (2019)

Editorial Material Multidisciplinary Sciences

THAT'S THE WAY WE FLOW

Jeffrey M. Perkel

NATURE (2019)

Review Biochemistry & Molecular Biology

Toward understanding the origin and evolution of cellular organisms

Minoru Kanehisa

PROTEIN SCIENCE (2019)

Article Ecology

The Bellerophon pipeline, improving de novo transcriptomes and removing chimeras

Jesse Kerkvliet et al.

ECOLOGY AND EVOLUTION (2019)

Review Biology

Realizing the potential of full-length transcriptome sequencing

Ashley Byrne et al.

PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES (2019)

Article Biochemical Research Methods

TPMCalculator: one-step software to quantify mRNA abundance of genomic features

Roberto Vera Alvarez et al.

BIOINFORMATICS (2019)

Article Biochemical Research Methods

MMseqs2 desktop and local web server app for fast, interactive sequence searches

Milot Mirdita et al.

BIOINFORMATICS (2019)

Article Biochemical Research Methods

DTA-SiST: de novo transcriptome assembly by using simplified suffix trees

Jin Zhao et al.

BMC BIOINFORMATICS (2019)

Article Biotechnology & Applied Microbiology

OrthoFinder: phylogenetic orthology inference for comparative genomics

David M. Emms et al.

GENOME BIOLOGY (2019)

Article Biotechnology & Applied Microbiology

TransLiG: a de novo transcriptome assembler that uses line graph iteration

Juntao Liu et al.

GENOME BIOLOGY (2019)

Article Multidisciplinary Sciences

Improving in-silico normalization using read weights

Dilip A. Durai et al.

SCIENTIFIC REPORTS (2019)

Article Biotechnology & Applied Microbiology

Improved metagenomic analysis with Kraken 2

Derrick E. Wood et al.

GENOME BIOLOGY (2019)

Review Cell Biology

The emerging complexity of the tRNA world: mammalian tRNAs beyond protein synthesis

Paul Schimmel

NATURE REVIEWS MOLECULAR CELL BIOLOGY (2018)

Article Biochemistry & Molecular Biology

Database resources of the National Center for Biotechnology Information

Richa Agarwala et al.

NUCLEIC ACIDS RESEARCH (2018)

Article Biochemistry & Molecular Biology

PLAZA 4.0: an integrative resource for functional, evolutionary and comparative plant genomics

Michiel Van Bel et al.

NUCLEIC ACIDS RESEARCH (2018)

Article Biochemistry & Molecular Biology

Gene3D: Extensive prediction of globular domains in proteins

Tony E. Lewis et al.

NUCLEIC ACIDS RESEARCH (2018)

Article Biochemical Research Methods

Grouper: graph-based clustering and annotation for improved de novo transcriptome analysis

Laraib Malik et al.

BIOINFORMATICS (2018)

Article Biochemical Research Methods

fastp: an ultra-fast all-in-one FASTQ preprocessor

Shifu Chen et al.

BIOINFORMATICS (2018)

Article Biotechnology & Applied Microbiology

De novo transcriptome assembly, annotation and comparison of four ecological and evolutionary model salmonid fish species

Madeleine Carruthers et al.

BMC GENOMICS (2018)

Article Biotechnology & Applied Microbiology

Limitations of alignment-free tools in total RNA-seq quantification

Douglas C. Wu et al.

BMC GENOMICS (2018)

Letter Biochemical Research Methods

Bioconda: sustainable and comprehensive software distribution for the life sciences

Bjoern Gruening et al.

NATURE METHODS (2018)

Article Biochemistry & Molecular Biology

PANNZER2: a rapid functional annotation web server

Petri Toronen et al.

NUCLEIC ACIDS RESEARCH (2018)

Article Biochemistry & Molecular Biology

The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update

Enis Afgan et al.

NUCLEIC ACIDS RESEARCH (2018)

Article Multidisciplinary Sciences

Clustering huge protein sequence sets in linear time

Martin Steinegger et al.

NATURE COMMUNICATIONS (2018)

Editorial Material Biochemistry & Molecular Biology

How complete are complete genome assemblies?-An avian perspective

Valentina Peona et al.

MOLECULAR ECOLOGY RESOURCES (2018)

Article Multidisciplinary Sciences

The axolotl genome and the evolution of key tissue formation regulators

Sergej Nowoshilow et al.

NATURE (2018)

Article Multidisciplinary Sciences

Improving nanopore read accuracy with the R2C2 method enables the sequencing of highly multiplexed full-length single-cell cDNA

Roger Volden et al.

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2018)

Review Mathematical & Computational Biology

Modeling and analysis of RNA-seq data: a review from a statistical perspective

Wei Vivian Li et al.

QUANTITATIVE BIOLOGY (2018)

Article Information Science & Library Science

High-performance computing service for bioinformatics and data science

Jean-Paul Courneya et al.

JOURNAL OF THE MEDICAL LIBRARY ASSOCIATION (2018)

Review Biochemical Research Methods

A review of bioinformatic pipeline frameworks

Jeremy Leipzig

BRIEFINGS IN BIOINFORMATICS (2017)

Review Cell Biology

RNA-Seq methods for transcriptome analysis

Radmila Hrdlickova et al.

WILEY INTERDISCIPLINARY REVIEWS-RNA (2017)

Article Biochemistry & Molecular Biology

Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper

Jaime Huerta-Cepas et al.

MOLECULAR BIOLOGY AND EVOLUTION (2017)

Letter Biotechnology & Applied Microbiology

Nextflow enables reproducible computational workflows

Paolo Di Tommaso et al.

NATURE BIOTECHNOLOGY (2017)

Letter Biotechnology & Applied Microbiology

MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets

Martin Steinegger et al.

NATURE BIOTECHNOLOGY (2017)

Article Biochemical Research Methods

Salmon provides fast and bias-aware quantification of transcript expression

Rob Patro et al.

NATURE METHODS (2017)

Article Biochemistry & Molecular Biology

CPC2: a fast and accurate coding potential calculator based on sequence intrinsic features

Yu-Jian Kang et al.

NUCLEIC ACIDS RESEARCH (2017)

Article Biochemical Research Methods

Sma3s: A universal tool for easy functional annotation of proteomes and transcriptomes

Carlos S. Casimiro-Soriguer et al.

PROTEOMICS (2017)

Article Multidisciplinary Sciences

BBMerge - Accurate paired shotgun read merging via overlap

Brian Bushnell et al.

PLOS ONE (2017)

Article Multidisciplinary Sciences

Benchmarking of RNA-sequencing analysis workflows using whole-transcriptome RT-qPCR expression data

Celine Everaert et al.

SCIENTIFIC REPORTS (2017)

Article Biochemical Research Methods

fLPS: Fast discovery of compositional biases for the protein universe

Paul M. Harrison

BMC BIOINFORMATICS (2017)

Article Biochemical Research Methods

An improved filtering algorithm for big read datasets and its application to single-cell assembly

Axel Wedemeyer et al.

BMC BIOINFORMATICS (2017)

Article Biochemical Research Methods

MISA-web: a web server for microsatellite prediction

Sebastian Beier et al.

BIOINFORMATICS (2017)

Article Biotechnology & Applied Microbiology

Evaluation and comparison of computational tools for RNA-seq isoform quantification

Chi Zhang et al.

BMC GENOMICS (2017)

Software Review Biochemical Research Methods

Multiple sequence alignment modeling: methods and applications

Maria Chatzou et al.

BRIEFINGS IN BIOINFORMATICS (2016)

Article Biochemistry & Molecular Biology

Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

Nuala A. O'Leary et al.

NUCLEIC ACIDS RESEARCH (2016)

Article Biochemical Research Methods

DOGMA: domain-based transcriptome and proteome quality assessment

Elias Dohmen et al.

BIOINFORMATICS (2016)

Article Biochemical Research Methods

MetaCycle: an integrated R package to evaluate periodicity in large scale data

Gang Wu et al.

BIOINFORMATICS (2016)

Article Biochemical Research Methods

MultiQC: summarize analysis results for multiple tools and samples in a single report

Philip Ewels et al.

BIOINFORMATICS (2016)

Article Biochemical Research Methods

rnaQUAST: a quality assessment tool for de novo transcriptome assemblies

Elena Bushmanova et al.

BIOINFORMATICS (2016)

Article Biochemistry & Molecular Biology

Centrifuge: rapid and sensitive classification of metagenomic sequences

Daehwan Kim et al.

GENOME RESEARCH (2016)

Article Biochemistry & Molecular Biology

TransRate: reference-free quality assessment of de novo transcriptome assemblies

Richard Smith-Unna et al.

GENOME RESEARCH (2016)

Review Biochemistry & Molecular Biology

BlastKOALA and GhostKOALA: KEGG Tools for Functional Characterization of Genome and Metagenome Sequences

Minoru Kanehisa et al.

JOURNAL OF MOLECULAR BIOLOGY (2016)

Review Biochemistry & Molecular Biology

The power and promise of RNA-seq in ecology and evolution

Erica V. Todd et al.

MOLECULAR ECOLOGY (2016)

Article Biotechnology & Applied Microbiology

Near-optimal probabilistic RNA-seq quantification

Nicolas L. Bray et al.

NATURE BIOTECHNOLOGY (2016)

Article Biochemical Research Methods

BinPacker: Packing-Based De Novo Transcriptome Assembly from RNA-seq Data

Juntao Liu et al.

PLOS COMPUTATIONAL BIOLOGY (2016)

Review Mathematical & Computational Biology

Visual programming for next-generation sequencing data analytics

Franco Milicchio et al.

BIODATA MINING (2016)

Article Multidisciplinary Sciences

SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation

Wei Shen et al.

PLOS ONE (2016)

Article Multidisciplinary Sciences

FAMSA: Fast and accurate multiple sequence alignment of huge protein families

Sebastian Deorowicz et al.

SCIENTIFIC REPORTS (2016)

Article Biotechnology & Applied Microbiology

CIDANE: comprehensive isoform discovery and abundance estimation

Stefan Canzar et al.

GENOME BIOLOGY (2016)

Review Biotechnology & Applied Microbiology

A survey of best practices for RNA-seq data analysis

Ana Conesa et al.

GENOME BIOLOGY (2016)

Review Genetics & Heredity

RNA-mediated epigenetic regulation of gene expression

Daniel Holoch et al.

NATURE REVIEWS GENETICS (2015)

Article Biochemistry & Molecular Biology

Identification of protein coding regions in RNA transcripts

Shiyuyun Tang et al.

NUCLEIC ACIDS RESEARCH (2015)

Article Biochemistry & Molecular Biology

limma powers differential expression analyses for RNA-sequencing and microarray studies

Matthew E. Ritchie et al.

NUCLEIC ACIDS RESEARCH (2015)

Article Biotechnology & Applied Microbiology

RNAseq by Total RNA Library Identifies Additional RNAs Compared to Poly(A) RNA Library

Yan Guo et al.

BIOMED RESEARCH INTERNATIONAL (2015)

Article Biotechnology & Applied Microbiology

Bridger: a new framework for de novo transcriptome assembly using RNA-seq data

Zheng Chang et al.

GENOME BIOLOGY (2015)

Article Biochemical Research Methods

UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches

Baris E. Suzek et al.

BIOINFORMATICS (2015)

Article Biotechnology & Applied Microbiology

Measuring differential gene expression with RNA-seq: challenges and strategies for data analysis

Francesca Finotello et al.

BRIEFINGS IN FUNCTIONAL GENOMICS (2015)

Article Biochemical Research Methods

SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads

Yinlong Xie et al.

BIOINFORMATICS (2014)

Article Biochemical Research Methods

Trimmomatic: a flexible trimmer for Illumina sequence data

Anthony M. Bolger et al.

BIOINFORMATICS (2014)

Article Biochemical Research Methods

InterProScan 5: genome-scale protein function classification

Philip Jones et al.

BIOINFORMATICS (2014)

Article Biochemical Research Methods

RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies

Alexandros Stamatakis

BIOINFORMATICS (2014)

Article Biochemical Research Methods

NeatFreq: reference-free data reduction and coverage normalization for De Novo sequence assembly

Jamison M. McCorrison et al.

BMC BIOINFORMATICS (2014)

Editorial Material Biochemistry & Molecular Biology

A first look at the Oxford Nanopore MinION sequencer

Alexander S. Mikheyev et al.

MOLECULAR ECOLOGY RESOURCES (2014)

Article Biotechnology & Applied Microbiology

Normalization of RNA-seq data using factor analysis of control genes or samples

Davide Risso et al.

NATURE BIOTECHNOLOGY (2014)

Article Biotechnology & Applied Microbiology

Dissemination of scientific software with Galaxy ToolShed

Daniel Blankenberg et al.

GENOME BIOLOGY (2014)

Article Medicine, Research & Experimental

Comparing Bioinformatic Gene Expression Profiling Methods: Microarray and RNA-Seq

Kirk J. Mantione et al.

MEDICAL SCIENCE MONITOR BASIC RESEARCH (2014)

Article Biotechnology & Applied Microbiology

Evaluation of de novo transcriptome assemblies from RNA-Seq data

Bo Li et al.

GENOME BIOLOGY (2014)

Article Biotechnology & Applied Microbiology

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

Michael I. Love et al.

GENOME BIOLOGY (2014)

Article Biochemical Research Methods

STAR: ultrafast universal RNA-seq aligner

Alexander Dobin et al.

BIOINFORMATICS (2013)

Article Biochemical Research Methods

Infernal 1.1: 100-fold faster RNA homology searches

Eric P. Nawrocki et al.

BIOINFORMATICS (2013)

Article Biotechnology & Applied Microbiology

Non-coding RNAs in homeostasis, disease and stress responses: an evolutionary perspective

Paulo P. Amaral et al.

BRIEFINGS IN FUNCTIONAL GENOMICS (2013)

Article Microbiology

pico-PLAZA, a genome database of microbial photosynthetic eukaryotes

Klaas Vandepoele et al.

ENVIRONMENTAL MICROBIOLOGY (2013)

Article Biochemistry & Molecular Biology

MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability

Kazutaka Katoh et al.

MOLECULAR BIOLOGY AND EVOLUTION (2013)

Article Biochemistry & Molecular Biology

CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model

Liguo Wang et al.

NUCLEIC ACIDS RESEARCH (2013)

Article Multidisciplinary Sciences

TCW: Transcriptome Computational Workbench

Carol Soderlund et al.

PLOS ONE (2013)

Article Biotechnology & Applied Microbiology

TRAPID: an efficient online tool for the functional and comparative analysis of de novo RNA-Seq transcriptomes

Michiel Van Bel et al.

GENOME BIOLOGY (2013)

Article Biochemical Research Methods

Unipro UGENE: a unified bioinformatics toolkit

Konstantin Okonechnikov et al.

BIOINFORMATICS (2012)

Article Biochemical Research Methods

Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels

Marcel H. Schulz et al.

BIOINFORMATICS (2012)

Article Biochemical Research Methods

CD-HIT: accelerated for clustering the next-generation sequencing data

Limin Fu et al.

BIOINFORMATICS (2012)

Article Biochemical Research Methods

SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data

Evguenia Kopylova et al.

BIOINFORMATICS (2012)

Article Biochemical Research Methods

Snakemake - a scalable bioinformatics workflow engine

Johannes Koester et al.

BIOINFORMATICS (2012)

Article Biochemistry & Molecular Biology

Effects of short read quality and quantity on a de novo vertebrate transcriptome assembly

T. I. Garcia et al.

COMPARATIVE BIOCHEMISTRY AND PHYSIOLOGY C-TOXICOLOGY & PHARMACOLOGY (2012)

Article Multidisciplinary Sciences

An integrated encyclopedia of DNA elements in the human genome

Ian Dunham et al.

NATURE (2012)

Article Biochemical Research Methods

Fast gapped-read alignment with Bowtie 2

Ben Langmead et al.

NATURE METHODS (2012)

Article Multidisciplinary Sciences

Selective Depletion of rRNA Enables Whole Transcriptome Profiling of Archival Fixed Tissue

John D. Morlan et al.

PLOS ONE (2012)

Article Biochemical Research Methods

Resolving the Ortholog Conjecture: Orthologs Tend to Be Weakly, but Significantly, More Similar in Function than Paralogs

Adrian M. Altenhoff et al.

PLOS COMPUTATIONAL BIOLOGY (2012)

Article Biochemical Research Methods

Evaluation of the coverage and depth of transcriptome by RNA-Seq in chickens

Ying Wang et al.

BMC BIOINFORMATICS (2011)

Article Biochemical Research Methods

RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

Bo Li et al.

BMC BIOINFORMATICS (2011)

Article Biotechnology & Applied Microbiology

WebMGA: a customizable web server for fast metagenomic sequence analysis

Sitao Wu et al.

BMC GENOMICS (2011)

Review Cell Biology

RNA structure and the mechanisms of alternative splicing

C. Joel McManus et al.

CURRENT OPINION IN GENETICS & DEVELOPMENT (2011)

Article Biochemistry & Molecular Biology

Differential expression in RNA-seq: A matter of depth

Sonia Tarazona et al.

GENOME RESEARCH (2011)

Article Biotechnology & Applied Microbiology

Full-length transcriptome assembly from RNA-Seq data without a reference genome

Manfred G. Grabherr et al.

NATURE BIOTECHNOLOGY (2011)

Review Genetics & Heredity

RNA sequencing: advances, challenges and opportunities

Fatih Ozsolak et al.

NATURE REVIEWS GENETICS (2011)

Review Genetics & Heredity

Next-generation transcriptome assembly

Jeffrey A. Martin et al.

NATURE REVIEWS GENETICS (2011)

Article Biochemistry & Molecular Biology

The Sequence Read Archive

Rasko Leinonen et al.

NUCLEIC ACIDS RESEARCH (2011)

Article Biochemical Research Methods

Accelerated Profile HMM Searches

Sean R. Eddy

PLOS COMPUTATIONAL BIOLOGY (2011)

Article Biochemical Research Methods

edgeR: a Bioconductor package for differential expression analysis of digital gene expression data

Mark D. Robinson et al.

BIOINFORMATICS (2010)

Article Biochemical Research Methods

Prodigal: prokaryotic gene recognition and translation initiation site identification

Doug Hyatt et al.

BMC BIOINFORMATICS (2010)

Article Biochemical Research Methods

De novo assembly and analysis of RNA-seq data

Gordon Robertson et al.

NATURE METHODS (2010)

Article Biochemistry & Molecular Biology

Biases in Illumina transcriptome sequencing caused by random hexamer priming

Kasper D. Hansen et al.

NUCLEIC ACIDS RESEARCH (2010)

Review Biotechnology & Applied Microbiology

From RNA-seq reads to differential expression results

Alicia Oshlack et al.

GENOME BIOLOGY (2010)

Article Biochemical Research Methods

The Sequence Alignment/Map format and SAMtools

Heng Li et al.

BIOINFORMATICS (2009)

Article Biochemical Research Methods

BLAST+: architecture and applications

Christiam Camacho et al.

BMC BIOINFORMATICS (2009)

Review Genetics & Heredity

RNA-Seq: a revolutionary tool for transcriptomics

Zhong Wang et al.

NATURE REVIEWS GENETICS (2009)

Article Multidisciplinary Sciences

Real-Time DNA Sequencing from Single Polymerase Molecules

John Eid et al.

SCIENCE (2009)

Article Biochemistry & Molecular Biology

High-throughput functional annotation and data mining with the Blast2GO suite

Stefan Götz et al.

NUCLEIC ACIDS RESEARCH (2008)

Article Multidisciplinary Sciences

RNA maps reveal new RNA classes and a possible function for pervasive transcription

Philipp Kapranov et al.

SCIENCE (2007)

Article Biochemical Research Methods

UniRef: comprehensive and non-redundant UniProt reference clusters

Baris E. Suzek et al.

BIOINFORMATICS (2007)

Article Biochemistry & Molecular Biology

RNAmmer: consistent and rapid annotation of ribosomal RNA genes

Karin Lagesen et al.

NUCLEIC ACIDS RESEARCH (2007)

Article Biochemistry & Molecular Biology

Transcriptional noise and the fidelity of initiation by RNA polymerase II

Kevin Struhl

NATURE STRUCTURAL & MOLECULAR BIOLOGY (2007)

Article Biochemical Research Methods

Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences

Weizhong Li et al.

BIOINFORMATICS (2006)

Letter Genetics & Heredity

GenePattern 2.0

M Reich et al.

NATURE GENETICS (2006)

Article Computer Science, Hardware & Architecture

Rule-based workflow management for bioinformatics

JS Conery et al.

VLDB JOURNAL (2005)

Review Multidisciplinary Sciences

Initial sequencing and analysis of the human genome

ES Lander et al.

NATURE (2001)

Article Biochemistry & Molecular Biology

Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes

A Krogh et al.

JOURNAL OF MOLECULAR BIOLOGY (2001)

Article Biochemistry & Molecular Biology

KEGG: Kyoto Encyclopedia of Genes and Genomes

M Kanehisa et al.

NUCLEIC ACIDS RESEARCH (2000)