SOAPBarcode: revealing arthropod biodiversity through assembly of Illumina shotgun sequences of PCR amplicons

Journal

METHODS IN ECOLOGY AND EVOLUTION
Volume 4, Issue 12, Pages 1142-1150

Publisher

WILEY
DOI: 10.1111/2041-210X.12120

Keywords

high-throughput sequencing; metabarcoding; next-generation-sequencing; operational taxonomic units; phylogenetic diversity; species richness; standard barcode

Funding

  1. National High-tech Research and Development Project (863) of China [2012AA021601]
  2. BGI
  3. Yunnan Province [20080A001]
  4. Chinese Academy of Sciences [0902281081, KSCX2-YW-Z-1027]
  5. National Natural Science Foundation of China [31170498]
  6. Ministry of Science and Technology of China [2012FY110800]
  7. University of East Anglia
  8. State Key Laboratory of Genetic Resources and Evolution at the Kunming Institute of Zoology

Metabarcoding of mixed arthropod samples for biodiversity assessment has mostly been carried out on the 454 GS FLX sequencer (Roche, Branford, Connecticut, USA) because it produces long reads (400 bp) that are believed to allow higher taxonomic resolution. The Illumina sequencing platforms, with their much higher throughput, could potentially reduce sequencing costs and improve sequence quality, but their shorter read length (typically <150 bp) has deterred their use in next-generation-sequencing (NGS)-based analyses of eukaryotic biodiversity, which often rely on standard barcode markers (e.g. COI, rbcL, matK, ITS) that are hundreds of nucleotides long.

We present a new Illumina-based pipeline to recover full-length COI barcodes from mixed arthropod samples. Our new assembly program, SOAPBarcode, a variant of the genome assembly program SOAPdenovo, uses paired-end reads of the standard COI barcode region as anchors to extract the correct pathways (sequences) from de Bruijn graphs that are otherwise chaotic because of the large numbers of highly similar COI homologs present in a sample.

Two bulk insect samples of known species composition, previously analysed in a recently published 454 metabarcoding study (Yu et al. 2012), are re-analysed here with our pipeline. Compared with the Roche 454 results (c. 400-bp reads), our pipeline recovered full-length COI barcodes (658 bp) and 17-31% more species-level operational taxonomic units (OTUs) from the bulk insect samples, with fewer untraceable (novel) OTUs. On the other hand, our PCR-based pipeline also revealed higher rates of contamination across samples, owing to the greater sequencing depth of the Illumina platform. On balance, the assembled full-length barcodes and increased OTU recovery rates resulted in more resolved taxonomic assignments and more accurate beta diversity estimation.

Together, the HiSeq 2000 and the SOAPBarcode pipeline can achieve more accurate biodiversity assessment at a much reduced sequencing cost in metabarcoding analyses. However, greater precaution is needed to prevent cross-sample contamination during field preparation and laboratory operation, because the approach is better able to detect non-target DNA amplicons present in low copy numbers.
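
The anchoring idea described above can be illustrated with a toy example. The sketch below is not SOAPBarcode's implementation and makes no claim about its internals; it is a minimal illustration, under stated assumptions, of why end anchors help. Two invented 14-bp "barcodes" share an identical middle region, so a de Bruijn graph built from short reads contains four start-to-end paths, two of which are chimeras. Restricting the search to paths that connect the two reads of one pair leaves only the true sequence. All names (`build_graph`, `paths_between`, `assemble_anchored`), the sequences, the read length and `k = 4` are hypothetical choices made for this illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the k-mers of seq in order of occurrence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_graph(reads, k):
    """Map each k-mer to the set of k-mers that directly follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        ks = kmers(read, k)
        for a, b in zip(ks, ks[1:]):
            graph[a].add(b)
    return graph


def spell(path):
    """Turn a path of overlapping k-mers back into a sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])


def paths_between(graph, start, end, max_nodes):
    """All simple paths from start to end using at most max_nodes k-mers."""
    found = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == end:
            found.append(spell(path))
            continue
        if len(path) >= max_nodes:
            continue
        for nxt in sorted(graph.get(node, ())):
            if nxt not in path:  # keep paths simple (no revisits)
                stack.append(path + [nxt])
    return sorted(found)


def assemble_anchored(graph, anchor_pairs, k, expected_len):
    """Keep a sequence only when exactly one path links the two anchors of a pair."""
    max_nodes = expected_len - k + 1
    assembled = []
    for left, right in anchor_pairs:
        candidates = paths_between(graph, left[:k], right[-k:], max_nodes)
        if len(candidates) == 1:
            assembled.append(candidates[0])
    return sorted(assembled)


# Invented toy data: two sequences with a shared middle ("GTCAGCTA").
SEQ_A = "AACGTCAGCTACCA"
SEQ_B = "TTGGTCAGCTAGAT"
K = 4
READ_LEN = 6

# Short "shotgun" reads tiling both sequences, plus one end-anchor pair each.
shotgun = [s[i:i + READ_LEN] for s in (SEQ_A, SEQ_B)
           for i in range(len(s) - READ_LEN + 1)]
anchors = [(s[:READ_LEN], s[-READ_LEN:]) for s in (SEQ_A, SEQ_B)]

graph = build_graph(shotgun, K)
print(assemble_anchored(graph, anchors, K, expected_len=len(SEQ_A)))
```

In this toy graph, searching from either start k-mer to either end k-mer without pairing information also returns the two chimeric sequences; the pairing constraint is what removes them. Real data would additionally need handling of sequencing errors, coverage differences and cycles in the graph, none of which this sketch attempts.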
