Article

ARHap: Association Rule Haplotype Phasing

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TCBB.2021.3119955

Keywords

DNA; Sequential analysis; Bioinformatics; Partitioning algorithms; Organisms; Measurement uncertainty; Linear programming; Computational genetics; chromosome reconstruction; DNA phasing; haplotype assembly


This article proposes a novel approach to individual human phasing based on discovering interesting hidden relations among single variant sites. The proposed framework, called ARHap, learns strong association rules among variant loci on the genome and develops a combinatorial approach for fast and accurate haplotype phasing based on the discovered associations. ARHap consists of two main modules, or processing phases. In the first phase, association rule learning, ARHap identifies quantitative association rules from a collection of DNA reads of the organism under study, yielding a set of strong rules that reveal the interdependency of alleles. In the second phase, haplotype reconstruction, algorithms use the learned rules to construct highly reliable haplotypes at individual single nucleotide polymorphism (SNP) sites.

Several features of ARHap contribute to both fast and accurate haplotyping. It reconstructs haplotypes incrementally, so that in each round of the algorithm it generates association rules only for the SNP sites that remain unreconstructed. During each round, the association rule learning module constrains the length of the rules and keeps only rules that contribute to the reconstruction of unreconstructed sites. The framework starts with short, highly strong rules; if some SNP sites remain unreconstructed, the permitted rule length is increased and/or the rule-strength criteria are gradually relaxed in subsequent rounds. This adaptive approach, which uses feedback from the haplotype reconstruction module, avoids generating rules that do not contribute to haplotype reconstruction as well as weak rules that may introduce errors into the final haplotypes.

Extensive experimental analyses on datasets representing diploid organisms demonstrate the superiority of ARHap in diploid haplotyping over state-of-the-art algorithms. In particular, we show that this novel approach to haplotype phasing is not only fast but also achieves significantly higher accuracy than other read-based computational approaches.
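The two-phase, feedback-driven loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not ARHap's published implementation: it assumes binary allele encoding (0/1) at heterozygous SNP sites, treats haplotype 2 as the complement of haplotype 1, mines only rules with a single consequent site whose antecedent sites are already phased, and reconstructs sites by confidence-weighted majority voting. The functions `mine_rules`, `apply_rules`, and `phase`, and the thresholds `max_len_limit`, `conf_schedule`, and `min_support`, are hypothetical names chosen for this sketch.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical encoding: a read maps SNP index -> observed allele (0 or 1) at
# heterozygous sites; haplotype 2 is taken to be the complement of haplotype 1.
Read = dict[int, int]
Rule = tuple[tuple[tuple[int, int], ...], int, int, float]  # (antecedent, site, allele, confidence)


def mine_rules(reads: list[Read], phased: dict[int, int], max_len: int,
               min_support: int, min_conf: float) -> list[Rule]:
    """Mine rules 'antecedent alleles -> allele at one site' whose antecedent
    sites are already phased and whose consequent site is not yet phased."""
    covered = defaultdict(int)  # (antecedent, site) -> reads covering both
    joint = defaultdict(int)    # (antecedent, (site, allele)) -> reads matching both
    for read in reads:
        known = sorted((s, a) for s, a in read.items() if s in phased)
        targets = [(s, a) for s, a in read.items() if s not in phased]
        for k in range(1, max_len + 1):
            for ante in combinations(known, k):
                for site, allele in targets:
                    covered[(ante, site)] += 1
                    joint[(ante, (site, allele))] += 1
    rules = []
    for (ante, (site, allele)), support in joint.items():
        confidence = support / covered[(ante, site)]
        if support >= min_support and confidence >= min_conf:
            rules.append((ante, site, allele, confidence))
    return rules


def apply_rules(rules: list[Rule], hap1: dict[int, int]) -> dict[int, int]:
    """Let each rule cast a confidence-weighted vote for an unphased site of hap1."""
    votes = defaultdict(float)  # site -> score; positive favours allele 1 on hap1
    for ante, site, allele, conf in rules:
        if all(hap1[s] == a for s, a in ante):    # antecedent lies on hap1
            target = allele
        elif all(hap1[s] != a for s, a in ante):  # antecedent lies on hap2
            target = 1 - allele
        else:                                     # mixed: no single haplotype carries it
            continue
        votes[site] += conf if target == 1 else -conf
    return {site: int(v > 0) for site, v in votes.items() if v != 0}


def phase(reads: list[Read], max_len_limit: int = 3,
          conf_schedule: tuple[float, ...] = (0.95, 0.85, 0.75),
          min_support: int = 2) -> tuple[dict[int, int], dict[int, int]]:
    """Phase sites incrementally, relaxing rule length and then rule strength
    only when the current thresholds reconstruct nothing new."""
    sites = sorted({s for r in reads for s in r})
    hap1 = {sites[0]: 0}  # arbitrary phase for the first site
    max_len, level = 1, 0
    while len(hap1) < len(sites):
        rules = mine_rules(reads, hap1, max_len, min_support, conf_schedule[level])
        new = apply_rules(rules, hap1)
        if new:
            hap1.update(new)
            max_len, level = 1, 0           # return to short, strong rules after progress
        elif max_len < max_len_limit:
            max_len += 1                    # allow longer antecedents
        elif level + 1 < len(conf_schedule):
            max_len, level = 1, level + 1   # accept weaker rules
        else:
            break                           # leave the remaining sites unphased
    hap2 = {s: 1 - a for s, a in hap1.items()}
    return hap1, hap2


if __name__ == "__main__":
    # Toy reads drawn from hap1 = 0101 and hap2 = 1010, each observed twice.
    base = [{0: 0, 1: 1}, {0: 1, 1: 0}, {1: 1, 2: 0},
            {1: 0, 2: 1}, {2: 0, 3: 1}, {2: 1, 3: 0}]
    hap1, hap2 = phase(base * 2)
    print(hap1)  # {0: 0, 1: 1, 2: 0, 3: 1}
    print(hap2)  # {0: 1, 1: 0, 2: 1, 3: 0}
```

In this sketch, appending an erroneous read such as `{0: 0, 1: 0}` to the toy data lowers the confidence of the rule `0=0 -> 1=1` below the strict first threshold, so site 1 is phased from the remaining fully consistent rule instead; this illustrates why starting with strong rules can keep weak evidence out of the haplotypes. Relaxing rule length before rule strength is one possible ordering; the abstract states only that either or both may be adjusted when sites remain unreconstructed.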
