Article

Codes With Run-Length and GC-Content Constraints for DNA-Based Data Storage

Journal

IEEE COMMUNICATIONS LETTERS
Volume 22, Issue 10, Pages 2004-2007

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/LCOMM.2018.2866566

Keywords

DNA data storage; constrained coding; channel coding

Funding

  1. SUTD-MIT IDC research grant
  2. Singapore Ministry of Education Academic Research Fund Tier 2 [MOE2016-T2-2-054]
  3. SUTD-ZJU grant [ZJURP1500102]
  4. NSFC [61750110529]

Abstract

We propose a coding method that transforms binary sequences into DNA base sequences (codewords), that is, sequences of the symbols A, T, C, and G, that satisfy the following two properties: 1) run-length constraint: the maximum run-length of each symbol in each codeword is at most three; and 2) GC-content constraint: the GC-content of each codeword is close to 0.5, say between 0.4 and 0.6. The proposed coding scheme is motivated by the problem of designing codes for DNA-based data storage systems, where binary digital data is stored in synthetic DNA base sequences. Existing schemes in the literature either achieve code rates of at most 1.78 bits per nucleotide or suffer from severe error propagation. Our method achieves a rate of 1.9 bits per DNA base with low encoding/decoding complexity and limited error propagation.
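The two constraints stated in the abstract can be checked mechanically. The following is a minimal sketch of such a check only; it is not the paper's encoder or decoder. The function names (`max_run_length`, `gc_content`, `satisfies_constraints`) are hypothetical, and the default thresholds are the values quoted in the abstract (runs of at most three, GC-content between 0.4 and 0.6).

```python
from itertools import groupby


def max_run_length(seq: str) -> int:
    """Length of the longest run of identical consecutive symbols."""
    # groupby yields one group per maximal run of equal symbols.
    return max((sum(1 for _ in run) for _, run in groupby(seq)), default=0)


def gc_content(seq: str) -> float:
    """Fraction of symbols in seq that are G or C."""
    if not seq:
        raise ValueError("GC-content is undefined for an empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def satisfies_constraints(seq: str, max_run: int = 3,
                          gc_low: float = 0.4, gc_high: float = 0.6) -> bool:
    """Check the run-length and GC-content constraints from the abstract.

    Assumption: seq is a string over the alphabet A, T, C, G; empty
    sequences and sequences with other symbols are rejected.
    """
    if not seq or set(seq) - set("ATCG"):
        return False
    return (max_run_length(seq) <= max_run
            and gc_low <= gc_content(seq) <= gc_high)


if __name__ == "__main__":
    for candidate in ("ACGTACGTAC", "AAAACGCGCG", "AAATTTAAAT"):
        print(candidate, satisfies_constraints(candidate))
```

For context on the quoted rates: with four symbols, an unconstrained DNA sequence carries at most log2(4) = 2 bits per nucleotide, so the reported 1.9 bits per base is close to that upper limit.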
