Article

A machine learning toolkit for genetic engineering attribution to facilitate biosecurity

Journal

NATURE COMMUNICATIONS
Volume 11, Issue 1

Publisher

NATURE PORTFOLIO
DOI: 10.1038/s41467-020-19612-0

Funding

  1. Center for Effective Altruism
  2. Open Philanthropy Project
  3. NIH [2T32HG002295-16]
  4. Lambda Labs, Inc.

The promise of biotechnology is tempered by its potential for accidental or deliberate misuse. Reliably identifying telltale signatures characteristic to different genetic designers, termed 'genetic engineering attribution', would deter misuse, yet is still considered unsolved. Here, we show that recurrent neural networks trained on DNA motifs and basic phenotype data can reach 70% attribution accuracy in distinguishing between over 1,300 labs. To make these models usable in practice, we introduce a framework for weighing predictions against other investigative evidence using calibration, and bring our model to within 1.6% of perfect calibration. Additionally, we demonstrate that simple models can accurately predict both the nation-state-of-origin and ancestor labs, forming the foundation of an integrated attribution toolkit which should promote responsible innovation and international security alike.

The potential for accidental or deliberate misuse of biotechnology is of concern for international biosecurity. Here the authors apply machine learning to DNA sequences and associated phenotypic data to facilitate genetic engineering attribution and identify country-of-origin and ancestral lab of engineered DNA sequences.
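The abstract reports calibration as a percentage gap from "perfect calibration" but does not say here how that gap is measured. As an illustration only, the sketch below computes expected calibration error (ECE), one common way to quantify such a gap: predictions are grouped into confidence bins, and the difference between mean confidence and observed accuracy is averaged across bins, weighted by bin size. The function name, the equal-width binning, and the choice of ECE itself are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE: weighted mean |accuracy - confidence| over bins.

    confidences: top-class predicted probabilities, each in [0, 1].
    correct:     1 if the top-class prediction was right, else 0.
    n_bins:      number of equal-width confidence bins (an assumption).
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Interior bin edges; right-closed bins, so a confidence of 1.0
    # falls in the last bin and 0.0 falls in the first.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in bin
    return float(ece)


# Toy data: four predictions at 90% confidence, three of them correct.
toy_ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
```

In the toy case the model claims 90% confidence but is right 75% of the time, so the gap is 0.15. A perfectly calibrated model would score 0 under this measure.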
