Article

The solution surface of the Li-Stephens haplotype copying model

Journal

ALGORITHMS FOR MOLECULAR BIOLOGY
Volume 18, Issue 1

Publisher

BMC
DOI: 10.1186/s13015-023-00237-z

Keywords

Li-Stephens model; Haplotype copying model; Solution path

The Li-Stephens (LS) haplotype copying model underlies many important statistical inference procedures in genetics. It assumes that a sampled chromosome is an imperfect mosaic of other chromosomes in the population. The behavior of LS depends on two user-specified parameters, θ and ρ, interpreted as the mutation and recombination rates, respectively. In this study, the authors instead treat θ and ρ as tuning parameters in order to better understand their impact on the LS output. They develop an algorithm that, for a given dataset, efficiently partitions the (θ, ρ) plane into regions where the output is constant, thereby enumerating all possible solutions to the LS model. The findings suggest that the conventional (population-scaled) values of θ and ρ give near-optimal results for imputation but may systematically inflate switch error in diploid genotype phasing.
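A short calculation shows why the solutions form regions. It uses a simplified haploid LS model that is assumed here for illustration and is not necessarily the authors' parameterisation. The setup has H reference haplotypes and L sites. At each site the target copies the allele of its current haplotype with probability 1 - θ and mismatches with probability θ. Between sites it jumps to a specific other haplotype with probability ρ/H, or stays with probability 1 - ρ + ρ/H. For a target x and a copying path π with m(π) mismatches and s(π) switches, the joint log-probability is:

```latex
\log P(\pi, x) = -\log H + L\log(1-\theta) + (L-1)\log\!\left(1-\rho+\tfrac{\rho}{H}\right)
  + m(\pi)\,\log\frac{\theta}{1-\theta}
  + s(\pi)\,\log\frac{\rho/H}{1-\rho+\rho/H}
```

Every term except the last two is shared by all paths. The most probable path therefore depends on (θ, ρ) only through the two log-odds weights. Under this simplified model, a given path remains optimal on a set cut out by finitely many linear inequalities in those weights, one for each competing pair of counts (m, s).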
The Li-Stephens (LS) haplotype copying model forms the basis of a number of important statistical inference procedures in genetics. LS is a probabilistic generative model which supposes that a sampled chromosome is an imperfect mosaic of other chromosomes found in a population. In the frequentist setting, which is the focus of this paper, the output of LS is a 'copying path' through chromosome space. The behavior of LS depends crucially on two user-specified parameters, θ and ρ, which are respectively interpreted as the rates of mutation and recombination. However, because LS is not based on a realistic model of ancestry, the precise connection between these parameters and the biological phenomena they represent is unclear. Here, we offer an alternative perspective, which treats θ and ρ as tuning parameters and seeks to understand their impact on the LS output. We derive an algorithm which, for a given dataset, efficiently partitions the (θ, ρ) plane into regions where the output of the algorithm is constant, thereby enumerating all possible solutions to the LS model in one go. We extend this approach to the 'diploid LS' model commonly used for phasing. We demonstrate the usefulness of our method by studying the effects of changing θ and ρ when using LS for common bioinformatic tasks. Our findings indicate that using the conventional (i.e., population-scaled) values for θ and ρ produces near-optimal results for imputation, but may systematically inflate switch error in the case of phasing diploid genotypes.
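The Python sketch below makes the dependence on θ and ρ concrete. It assumes a simplified haploid LS hidden Markov model: mismatch probability θ per site, and a jump to a specific other haplotype with probability ρ/H. The function names `ls_viterbi`, `count_events` and `scan_grid` are hypothetical and do not come from the authors' code. The sketch computes the Viterbi (most probable) copying path, one natural reading of the frequentist LS output, and then evaluates it on a coarse (θ, ρ) grid. The grid scan is only a brute-force stand-in. It can miss regions that fall between grid points, whereas the method described in the abstract enumerates every region.

```python
import math


def ls_viterbi(panel, target, theta, rho):
    """Most probable copying path for `target` under a simplified haploid LS HMM.

    Assumed (illustrative) parameterisation, not necessarily the paper's:
      * panel: H reference haplotypes, each a list of 0/1 alleles of length L
      * emission: copy the allele with prob 1 - theta, mismatch with prob theta
      * transition: stay with prob 1 - rho + rho/H, jump to a given other
        haplotype with prob rho/H (requires 0 < theta < 1, 0 < rho < 1)
    Returns one panel index per site. Ties favour staying on the current
    haplotype, otherwise the lowest index.
    """
    H, L = len(panel), len(target)
    log_stay = math.log(1 - rho + rho / H)
    log_jump = math.log(rho / H)

    def log_emit(h, l):
        return math.log(1 - theta) if panel[h][l] == target[l] else math.log(theta)

    # Uniform start over the H haplotypes.
    score = [-math.log(H) + log_emit(h, 0) for h in range(H)]
    back = []  # back[l - 1][h] = best predecessor of state h at site l
    for l in range(1, L):
        # Any jump should come from the best previous state, so each site costs O(H).
        best = max(range(H), key=lambda k: score[k])
        new_score, ptr = [], []
        for h in range(H):
            stay = score[h] + log_stay
            jump = score[best] + log_jump
            prev, s = (h, stay) if stay >= jump else (best, jump)
            new_score.append(s + log_emit(h, l))
            ptr.append(prev)
        score = new_score
        back.append(ptr)

    # Trace back from the best final state.
    h = max(range(H), key=lambda k: score[k])
    path = [h]
    for ptr in reversed(back):
        h = ptr[h]
        path.append(h)
    return path[::-1]


def count_events(panel, target, path):
    """Return (mismatches, switches) implied by a copying path."""
    mismatches = sum(panel[h][l] != target[l] for l, h in enumerate(path))
    switches = sum(a != b for a, b in zip(path, path[1:]))
    return mismatches, switches


def scan_grid(panel, target, thetas, rhos):
    """Brute-force map from each (theta, rho) grid point to its path's event counts."""
    return {
        (t, r): count_events(panel, target, ls_viterbi(panel, target, t, r))
        for t in thetas
        for r in rhos
    }


if __name__ == "__main__":
    panel = [[0] * 6, [1] * 6]
    target = [0, 0, 0, 1, 1, 1]  # an exact mosaic of the two haplotypes
    grid = scan_grid(panel, target, thetas=[0.01, 0.4], rhos=[1e-4, 0.01])
    print(sorted(set(grid.values())))  # [(0, 1), (3, 0)]
```

On this toy panel, small θ makes mismatches expensive, so the target is explained by a single switch. With θ = 0.4, three mismatches cost less than one switch at either ρ value, so the path never leaves one haplotype. Even a 2×2 grid therefore crosses a region boundary.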
