☆ 4.4 Article

Nucleic Acid Sequence Design via Efficient Ensemble Defect Optimization

JOURNAL OF COMPUTATIONAL CHEMISTRY (2011)

Journal

JOURNAL OF COMPUTATIONAL CHEMISTRY

Volume 32, Issue 3, Pages 439-452

Publisher

WILEY-BLACKWELL

DOI: 10.1002/jcc.21633

Keywords

RNA; DNA; secondary structure; sequence design; partition function

Funding

The National Science Foundation [NSF-CCF-0832824, NSF-CCF-CAREER-0448835]
The Beckman Institute at Caltech
The Ralph M. Parsons Foundation
Direct For Computer & Info Scie & Enginr
Division of Computing and Communication Foundations [832824] Funding Source: National Science Foundation

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Abstract

We describe an algorithm for designing the sequence of one or more interacting nucleic acid strands intended to adopt a target secondary structure at equilibrium. Sequence design is formulated as an optimization problem with the goal of reducing the ensemble defect below a user-specified stop condition. For a candidate sequence and a given target secondary structure, the ensemble defect is the average number of incorrectly paired nucleotides at equilibrium evaluated over the ensemble of unpseudoknotted secondary structures. To reduce the computational cost of accepting or rejecting mutations to a random initial sequence, candidate mutations are evaluated on the leaf nodes of a tree-decomposition of the target structure. During leaf optimization, defect-weighted mutation sampling is used to select each candidate mutation position with probability proportional to its contribution to the ensemble defect of the leaf. As subsequences are merged moving up the tree, emergent structural defects resulting from crosstalk between sibling sequences are eliminated via reoptimization within the defective subtree starting from new random subsequences. Using a Theta(N-3) dynamic program to evaluate the ensemble defect of a target structure with N nucleotides, this hierarchical approach implies an asymptotic optimality bound on design time: for sufficiently large N, the cost of sequence design is bounded below by 4/3 the cost of a single evaluation of the ensemble defect for the full sequence. Hence, the design algorithm has time complexity Omega(N-3). For target structures containing N is an element of {100, 200, 400, 800, 1600, 3200} nucleotides and duplex stems ranging from 1 to 30 base pairs, RNA sequence designs at 37 degrees C typically succeed in satisfying a stop condition with ensemble defect less than N/100. Empirically, the sequence design algorithm exhibits asymptotic optimality and the exponent in the time complexity bound is sharp. (C) 2010 Wiley Periodicals, Inc. J Comput Chem 32:439-452, 2011

Nucleic Acid Sequence Design via Efficient Ensemble Defect Optimization

Journal

JOURNAL OF COMPUTATIONAL CHEMISTRY

Publisher

WILEY-BLACKWELL

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Nucleic Acid Sequence Design via Efficient Ensemble Defect Optimization

Journal

JOURNAL OF COMPUTATIONAL CHEMISTRY

Publisher

WILEY-BLACKWELL

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper