4.8 Article

Reconstruction of evolving gene variants and fitness from short sequencing reads

Journal

NATURE CHEMICAL BIOLOGY
Volume 17, Issue 11, Pages 1188-1198

Publisher

NATURE PORTFOLIO
DOI: 10.1038/s41589-021-00876-6

Keywords

-

Funding

  1. NIH [R01 EB031172, R01 EB027793, R35 GM118062]
  2. HHMI
  3. NSF Graduate Research Fellowship

Evoracle is a machine learning method that accurately reconstructs full-length genotypes and calculates fitness in directed evolution experiments, with substantial improvements over related methods. It performs well on data with linkage loss and large measurement noise, broadening accessibility to training machine learning models on gene variant fitness.
Directed evolution can generate proteins with tailor-made activities. However, full-length genotypes, their frequencies and fitnesses are difficult to measure for evolving gene-length biomolecules using most high-throughput DNA sequencing methods, as short read lengths can lose mutation linkages in haplotypes. Here we present Evoracle, a machine learning method that accurately reconstructs full-length genotypes (R² = 0.94) and fitness using short-read data from directed evolution experiments, with substantial improvements over related methods. We validate Evoracle on phage-assisted continuous evolution (PACE) and phage-assisted non-continuous evolution (PANCE) of adenine base editors and OrthoRep evolution of drug-resistant enzymes. Evoracle retains strong performance (R² = 0.86) on data with complete linkage loss between neighboring nucleotides and large measurement noise, such as pooled Sanger sequencing data (~US$10 per timepoint), and broadens the accessibility of training machine learning models on gene variant fitnesses. Evoracle can also identify high-fitness variants, including low-frequency 'rising stars', well before they are identifiable from consensus mutations.
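
The linkage-loss problem described in the abstract can be illustrated with a small sketch. It is not Evoracle's algorithm, only a toy demonstration under stated assumptions: the two-position genotypes, their frequencies, and the helper name `site_frequencies` are all hypothetical. It shows that two different populations of full-length genotypes can produce identical per-position allele frequencies, so per-position data from a single sample cannot by itself tell them apart.

```python
from collections import defaultdict


def site_frequencies(genotype_freqs):
    """Collapse full-length genotype frequencies into per-position allele
    frequencies, i.e. what remains when linkage between positions is lost."""
    length = len(next(iter(genotype_freqs)))
    sites = [defaultdict(float) for _ in range(length)]
    for genotype, freq in genotype_freqs.items():
        for pos, allele in enumerate(genotype):
            sites[pos][allele] += freq
    return [dict(site) for site in sites]


# Hypothetical populations over two positions (uppercase = mutant allele).
# In `linked`, the two mutations always occur together.
linked = {"AB": 0.5, "ab": 0.5}
# In `unlinked`, every combination is equally common.
unlinked = {"AB": 0.25, "Ab": 0.25, "aB": 0.25, "ab": 0.25}

# Both populations give the same per-position frequencies.
same_marginals = site_frequencies(linked) == site_frequencies(unlinked)
print(same_marginals)
```

In this toy case both populations show each mutant allele at a frequency of 0.5, even though the underlying genotypes differ. Recovering full-length genotypes therefore needs information beyond one set of per-position frequencies.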
