4.7 Article

Mitochondrial DNA Consensus Calling and Quality Filtering for Constructing Ancient Human Mitogenomes: Comparison of Two Widely Applied Methods

Journal

Publisher

MDPI
DOI: 10.3390/ijms23094651

Keywords

archaeogenetics; mtDNA; consensus sequence calling

Ask authors/readers for more resources

This study evaluates the use of two tools (ANGSD and schmutzi) for reconstructing high-quality ancient mitogenomes from ancient human genomic data. The results show that filtering improves mtDNA consensus calling and haplogroup prediction accuracy. The study also finds a strong correlation between the number of sequence reads and bases and overall sample quality.
Retrieving high-quality endogenous ancient DNA (aDNA) poses several challenges, including low molecular copy number, high rates of fragmentation, damage at read termini, and potential presence of exogenous contaminant DNA. All these factors complicate a reliable reconstruction of consensus aDNA sequences in reads from high-throughput sequencing platforms. Here, we report findings from a thorough evaluation of two alternative tools (ANGSD and schmutzi) aimed at overcoming these issues and constructing high-quality ancient mitogenomes. Raw genomic data (BAM/FASTQ) from a total of 17 previously published whole ancient human genomes ranging from the 14th to the 7th millennium BCE were retrieved and mitochondrial consensus sequences were reconstructed using different quality filters, with their accuracy measured and compared. Moreover, the influence of different sequence parameters (number of reads, sequenced bases, mean coverage, and rate of deamination and contamination) as predictors of derived sequence quality was evaluated. Complete mitogenomes were successfully reconstructed for all ancient samples, and for the majority of them, filtering substantially improved mtDNA consensus calling and haplogroup prediction. Overall, the schmutzi pipeline, which estimates and takes into consideration exogenous contamination, appeared to have the edge over the much faster and user-friendly alternative method (ANGSD) in moderate to high coverage samples (>1,000,000 reads). ANGSD, however, through its read termini trimming filter, showed better capabilities in calling the consensus sequence from low-quality samples. Among all the predictors of overall sample quality examined, the strongest correlation was found for the available number of sequence reads and bases. In the process, we report a previously unassigned haplogroup (U3b) for an Early Chalcolithic individual from Southern Anatolia/Northern Levant.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available