Journal
NATURE COMMUNICATIONS
Volume 11, Issue 1
Publisher
NATURE PUBLISHING GROUP
DOI: 10.1038/s41467-020-15512-5
Funding
- NIH [1R35GM133613-01]
- Alfred P. Sloan Research Fellowship
- Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory
Massively parallel phenotyping assays have provided unprecedented insight into how multiple mutations combine to determine biological function. While such assays can measure phenotypes for thousands to millions of genotypes in a single experiment, in practice these measurements are not exhaustive, so techniques are needed to impute values for genotypes whose phenotypes have not been directly assayed. Here, we present an imputation method based on inferring the least epistatic possible sequence-function relationship compatible with the data. In particular, we infer the reconstruction in which mutational effects change as little as possible across adjacent genetic backgrounds. The resulting models can capture complex higher-order genetic interactions near the data, but approach additivity where data are sparse or absent. We apply the method to high-throughput transcription factor binding assays and use it to explore a fitness landscape for protein G.

High-throughput combinatorial mutagenesis assays are useful for screening the function of many different sequences, but they are not exhaustive. Here, Zhou and McCandlish develop a method to impute such missing genotype-phenotype data based on inferring the least epistatic sequence-function relationship.
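The idea of keeping mutational effects as constant as possible across adjacent genetic backgrounds can be illustrated with a toy calculation. The sketch below is not the authors' implementation. It assumes biallelic sequences short enough to enumerate, treats observed phenotypes as exact constraints, and assumes that "least epistatic" means minimizing the sum of squared local epistatic coefficients f(ab) - f(Ab) - f(aB) + f(AB) over every pair of sites and every background. The function names `epistasis_matrix` and `impute_min_epistasis` are hypothetical.

```python
import itertools
import numpy as np

def epistasis_matrix(L):
    """Return all 0/1 genotypes of length L and a matrix whose rows are
    the local epistatic coefficients (one per site pair and background)."""
    genotypes = list(itertools.product([0, 1], repeat=L))
    index = {g: k for k, g in enumerate(genotypes)}
    rows = []
    for i, j in itertools.combinations(range(L), 2):
        others = [s for s in range(L) if s not in (i, j)]
        for bg in itertools.product([0, 1], repeat=L - 2):
            base = [0] * L
            for s, allele in zip(others, bg):
                base[s] = allele
            row = np.zeros(len(genotypes))
            # Signs: +1 for the double wild type and double mutant,
            # -1 for the two single mutants, on this background.
            for a, b in itertools.product([0, 1], repeat=2):
                g = list(base)
                g[i], g[j] = a, b
                row[index[tuple(g)]] = 1.0 if a == b else -1.0
            rows.append(row)
    return genotypes, np.array(rows)

def impute_min_epistasis(L, observed):
    """Fill in unobserved genotypes by minimizing the sum of squared
    local epistatic coefficients, holding observed values fixed.
    (Assumed objective; a sketch, not the published method's code.)"""
    genotypes, E = epistasis_matrix(L)
    known = [k for k, g in enumerate(genotypes) if g in observed]
    unknown = [k for k, g in enumerate(genotypes) if g not in observed]
    f = np.zeros(len(genotypes))
    f[known] = [observed[genotypes[k]] for k in known]
    if unknown:
        # Minimize ||E_unknown f_unknown + E_known f_known||^2.
        rhs = -E[:, known] @ f[known]
        solution = np.linalg.lstsq(E[:, unknown], rhs, rcond=None)[0]
        f[unknown] = solution
    return dict(zip(genotypes, f))

# Usage: a purely additive landscape with two genotypes hidden.
truth = {g: 1.0 + 0.5 * g[0] + 2.0 * g[1] - 1.0 * g[2]
         for g in itertools.product([0, 1], repeat=3)}
hidden = {(1, 1, 1), (0, 1, 1)}
observed = {g: v for g, v in truth.items() if g not in hidden}
imputed = impute_min_epistasis(3, observed)
```

Because the toy landscape is additive, every local epistatic coefficient can be driven to zero, so the hidden genotypes are recovered at their additive values. This mirrors the abstract's statement that the models approach additivity where data are sparse or absent. Enumerating all genotypes is only feasible for very short sequences, so a realistic data set would need a sparse formulation.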