Article

Differential Privacy-Based Genetic Matching in Personalized Medicine

Journal

IEEE Transactions on Emerging Topics in Computing
Volume 9, Issue 3, Pages 1109-1125

Publisher

Institute of Electrical and Electronics Engineers (IEEE), Inc.
DOI: 10.1109/TETC.2020.2970094

Keywords

Genetics; Noise measurement; Diseases; Servers; Privacy; Differential privacy; Personalized medicine; genetic matching; privacy-preserving; data utility

Funding

  1. National Natural Science Foundation of China [61872131]


This article introduces a genetic matching scheme based on differential privacy that protects the privacy of genetic data while still supporting effective genetic matching. The scheme constructs noisy published and query sequences with differential privacy algorithms, then computes the longest common subsequence (LCS) between them with a dynamic programming algorithm to obtain the matching result.
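
The matching step rests on the classic dynamic-programming recurrence for the longest common subsequence. The following is a minimal Python sketch of that recurrence over arbitrary symbol sequences (nucleotides, genotype strings, and so on). The function names and the length-normalised `lcs_similarity` score are hypothetical illustrations, not the authors' exact matching criterion.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def _lcs_table(a: Sequence[T], b: Sequence[T]) -> List[List[int]]:
    """dp[i][j] holds the LCS length of the prefixes a[:i] and b[:j]."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                # Matching symbols extend the LCS of both shorter prefixes.
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # Otherwise drop the last symbol of one sequence or the other.
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp


def lcs(a: Sequence[T], b: Sequence[T]) -> List[T]:
    """Recover one longest common subsequence by backtracking through the table."""
    dp = _lcs_table(a, b)
    i, j, out = len(a), len(b), []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    out.reverse()
    return out


def lcs_length(a: Sequence[T], b: Sequence[T]) -> int:
    """Length of the LCS, read from the bottom-right cell of the table."""
    return _lcs_table(a, b)[len(a)][len(b)]


def lcs_similarity(published: Sequence[T], query: Sequence[T]) -> float:
    """Hypothetical match score: fraction of the query recovered as a common subsequence."""
    if len(query) == 0:
        return 0.0
    return lcs_length(published, query) / len(query)
```

For example, `"".join(lcs("AGGTAB", "GXTXAYB"))` returns `"GTAB"`. The table costs O(mn) time and memory for sequences of lengths m and n; a production matcher over long SNP sequences would likely keep only two rows when just the length is needed.
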
Genetic matching in personalized medicine is increasingly performed in the cloud, where a cloud server carries out genetic matching over the genetic data outsourced by a gene provider (e.g., a genetic lab) and an authorized party (e.g., a doctor) to help diagnose patients' diseases. Because genetic data are highly privacy-sensitive, they should be protected before being outsourced to the untrusted cloud. However, traditional differential privacy schemes do not support genetic matching, and ciphertext-based methods reduce data usability. In this article, we propose a differential privacy-based genetic matching (DPGM) scheme that achieves effective genetic matching while protecting genetic privacy. Specifically, DPGM first uses a DP-based EIGENSTRAT (DPE) algorithm to construct a published sequence containing noise-perturbed single-nucleotide polymorphisms (SNPs) that are significantly associated with diseases, thereby protecting the privacy of the outsourced genetic data. Second, DPGM adopts a DP-based N-order Markov (DPNM) algorithm to generate a noisy query sequence, balancing query privacy against the similarity between the noisy query and the actual query. Finally, DPGM computes the longest common subsequence (LCS) with a dynamic programming algorithm to obtain effective matching results. Detailed theoretical analysis proves that DPGM achieves ε-differential privacy. Extensive experiments on real genetic datasets demonstrate that the scheme achieves high efficiency and data utility.
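
The abstract does not spell out how DPNM works, so the following is only a generic sketch of the underlying idea, not the authors' algorithm: count the N-order transitions in the real query, perturb every count with Laplace noise (the standard Laplace mechanism), and sample a synthetic query of the same length from the noisy transition table. The nucleotide alphabet, the rule that neighbouring queries differ in one position, the assumption that query length is public, and the names `noisy_markov_query`, `laplace_noise` and `pick` are all assumptions made for illustration.

```python
import itertools
import random
from typing import Dict, List, Optional, Sequence, Tuple

ALPHABET = ("A", "C", "G", "T")  # assumption: the query is a nucleotide string


def laplace_noise(rng: random.Random, scale: float) -> float:
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def pick(rng: random.Random, options: Sequence, weights: Sequence[float]):
    """Weighted choice that falls back to uniform when every noisy weight was clamped to 0."""
    if sum(weights) <= 0:
        return rng.choice(options)
    return rng.choices(options, weights=weights, k=1)[0]


def noisy_markov_query(
    query: Sequence[str],
    order: int,
    epsilon: float,
    alphabet: Sequence[str] = ALPHABET,
    seed: Optional[int] = None,
) -> List[str]:
    """Hypothetical DPNM-style generator built on Laplace-perturbed N-order counts."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if order < 1 or len(query) <= order:
        raise ValueError("need 1 <= order < len(query)")
    rng = random.Random(seed)
    contexts = list(itertools.product(alphabet, repeat=order))
    # 1. Exact counts for every possible context, so the key set itself reveals nothing.
    counts: Dict[Tuple[str, ...], Dict[str, float]] = {
        ctx: {s: 0.0 for s in alphabet} for ctx in contexts
    }
    for i in range(order, len(query)):
        counts[tuple(query[i - order:i])][query[i]] += 1.0
    # 2. Laplace mechanism. Assumption: neighbouring queries differ in one position,
    #    which touches at most order + 1 transitions, each moving one count between
    #    two cells, so the L1 sensitivity of the count table is at most 2 * (order + 1).
    scale = 2 * (order + 1) / epsilon
    noisy = {
        ctx: {s: max(0.0, c + laplace_noise(rng, scale)) for s, c in row.items()}
        for ctx, row in counts.items()
    }
    # 3. Post-processing only (no extra privacy cost): sample a query of the same length.
    start_weights = [sum(noisy[ctx].values()) for ctx in contexts]
    out = list(pick(rng, contexts, start_weights))
    while len(out) < len(query):
        row = noisy[tuple(out[-order:])]
        out.append(pick(rng, list(row.keys()), list(row.values())))
    return out
```

With a fixed `seed` the output is reproducible. A smaller `epsilon` means larger noise and a synthetic query that drifts further from the real one, which illustrates the privacy/similarity trade-off the abstract attributes to the noisy query sequence.
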
