Sequencing artifacts in the type A influenza databases and attempts to correct them

INFLUENZA AND OTHER RESPIRATORY VIRUSES (2014)

Journal

INFLUENZA AND OTHER RESPIRATORY VIRUSES

Volume 8, Issue 4, Pages 499-505

Publisher

WILEY

DOI: 10.1111/irv.12239

Keywords

databases; errors; influenza; sequence

Funding

United States Department of Agriculture, Agricultural Research Service CRIS [6612-32000-063]

Abstract

Background: There are over 276 000 influenza gene sequences in public databases, with the quality of the sequences determined by the contributor.

Objective: As part of a high school class project, influenza sequences with possible errors were identified in the public databases on the basis of the gene being longer than expected, with the hypothesis that these sequences would contain an error. Students contacted sequence submitters to alert them to the possible sequence issue(s) and requested that the suspect sequence(s) be corrected as appropriate.

Methods: Type A influenza viruses were screened, and gene segments longer than the accepted size were identified for further analysis. Attention was placed on sequences with additional nucleotides upstream or downstream of the highly conserved non-coding ends of the viral segments.

Results and Conclusions: A total of 1081 sequences were identified that met this criterion. Three types of errors were commonly observed: non-influenza primer sequence was not removed from the sequence; PCR product was cloned and plasmid sequence was included in the sequence; and Taq polymerase added an adenine at the end of the PCR product. Internal insertions of nucleotide sequence were also commonly observed, but in many cases it was unclear whether the sequence was correct or actually contained an error. A total of 215 sequences, or 22.8% of the suspect sequences, were corrected in the public databases in the first year of the student project. Unfortunately, 138 additional sequences with possible errors were added to the databases in the second year. Greater awareness of the need for data integrity in sequences submitted to public databases is required to fully reap the benefits of these large data sets.
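
The screening idea described in the Methods (flag segments that are longer than expected, then look for nucleotides lying outside the conserved segment ends) could be sketched roughly as follows. This is an illustrative sketch only, not the procedure used in the study: the terminal motifs are the commonly cited conserved influenza A ends in cDNA orientation and are treated here as an assumption, the function names `terminal_extras` and `is_suspect` are hypothetical, and `max_len` must be supplied by the caller because accepted segment sizes differ by segment and subtype.

```python
import re

# Assumed conserved termini in cDNA (mRNA-sense) orientation.
# These motifs are an assumption of this sketch, not values from the abstract.
START_MOTIF = re.compile(r"AGC[AG]AAAGCAGG")
END_MOTIF = "CCTTGTTTCTACT"


def terminal_extras(seq):
    """Return (upstream, downstream) nucleotides lying outside the conserved ends.

    An element is None when the corresponding motif is not found at all,
    for example in a partial sequence that lacks the non-coding end.
    """
    seq = seq.upper()
    start = START_MOTIF.search(seq)   # first occurrence of the 5' motif
    end = seq.rfind(END_MOTIF)        # last occurrence of the 3' motif
    upstream = seq[:start.start()] if start else None
    downstream = seq[end + len(END_MOTIF):] if end != -1 else None
    return upstream, downstream


def is_suspect(seq, max_len):
    """Flag a sequence that exceeds max_len or carries extra terminal bases."""
    upstream, downstream = terminal_extras(seq)
    return len(seq) > max_len or bool(upstream) or bool(downstream)


# Toy example (not a real influenza segment): conserved ends around a short insert.
core = "AGCAAAAGCAGG" + "ATGAAGGCA" + "CCTTGTTTCTACT"
print(terminal_extras(core))             # no extra bases at either end
print(terminal_extras(core + "A"))       # a single trailing A, as Taq can add
print(terminal_extras("GGATCC" + core))  # leftover non-influenza leading bases
```

A sketch like this would only catch the terminal error types listed in the Results (leftover primer or plasmid sequence and a terminal adenine). Internal insertions would need an alignment against trusted reference sequences, and any flagged record would still need manual review before contacting a submitter.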
