Article

Research on Data Routing Strategy of Deduplication in Cloud Environment

Journal

IEEE ACCESS
Volume 10, Pages 9529-9542

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2021.3139757

Keywords

Routing; Fingerprint recognition; Load management; Distributed databases; Cloud computing; Throughput; Data compression; Cloud; deduplication; data routing; load balancing

Funding

  1. National Natural Science Foundation of China [61872284, 72071153]
  2. Industrial Field of General Projects of Science and Technology Department of Shaanxi Province [2020GY-012]
  3. Industrialization Project of Shaanxi Provincial Department of Education [21JC017]
  4. "Thirteenth Five-Year" National Key Research and Development Program Project [2019YFD1100901]
  5. Natural Science Foundation of Shaanxi Province, China [2014JM2-6127]
  6. Talent Fund Project of the Xi'an University of Architecture and Technology [RC1707]
  7. Youth Fund Project of the Xi'an University of Architecture and Technology [QN1726]
  8. Natural Science Project of Xi'an University of Architecture and Technology [ZR18050]


This paper investigates a data routing strategy based on a distributed Bloom Filter. Using superchunks as the basic unit of data routing improves system throughput. Experiments validate the feasibility of the strategy and demonstrate its advantages over other routing strategies.
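The strategy summarized here rests on a distributed Bloom Filter, in which each storage node keeps a filter of its chunk fingerprints in memory. A Bloom filter answers membership queries with no false negatives but occasional false positives, which makes it a cheap in-memory probe for "does this node probably hold this fingerprint?". The following is a minimal sketch of one node's filter. The 8192-bit array, the four hash positions per item, and the salted SHA-256 position derivation are illustrative assumptions, not parameters from the paper.

```python
from collections.abc import Iterator
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-bit array probed at k positions per item.

    Sizes are illustrative assumptions, not the paper's settings.
    """

    def __init__(self, m_bits: int = 8192, k_hashes: int = 4) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item: bytes) -> Iterator[int]:
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    node_filter = BloomFilter()
    node_filter.add(b"fingerprint-of-chunk-1")
    print(b"fingerprint-of-chunk-1" in node_filter)  # prints True
```

Because the filter never forgets an inserted item, a miss can safely be read as "not stored here", while a hit only means "probably stored here".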
The application of data deduplication technology reduces the demand for data storage and improves resource utilization. Compared with the limited storage and computing capacity of a single node, cluster data deduplication technology has great advantages. However, cluster deduplication also introduces new problems: a reduced deduplication rate and load imbalance among storage nodes. A data routing strategy can balance the trade-off between deduplication rate and load balancing. Therefore, this paper proposes a data routing strategy based on a distributed Bloom Filter. 1) The superchunk is used as the basic unit of data routing to improve system throughput. According to Broder's theorem, the k smallest fingerprints are selected as the superchunk's features and sent to the storage nodes. The optimal node is selected as the routing node by matching these features against the Bloom Filter maintained in each storage node's memory, taking the node's storage capacity into account. 2) A system prototype is designed and implemented. The specific parameters of the various routing strategies are obtained through experiments, and the routing strategy proposed in this paper is tested. The theoretical analysis and experimental results prove the feasibility of the proposed strategy. Compared with the other routing strategies, our method improves the deduplication rate by 3%, reduces the communication query overhead by more than 36%, and improves the load balancing degree of the storage system.
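The routing step in 1) can be sketched as follows. Broder's theorem states that for two sets A and B and a min-wise independent hash h, Pr[min h(A) = min h(B)] = |A ∩ B| / |A ∪ B|, so superchunks that share their smallest fingerprints are likely to share many chunks. This is a hypothetical illustration, not the authors' implementation: chunk fingerprints are SHA-1 digests, each node's in-memory Bloom Filter is modelled as a plain Python set (so there are no false positives), and the way storage capacity enters the decision (skip nodes that would overflow, break ties by lowest utilization) is an assumption, because the abstract does not give the exact rule.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


def fingerprint(chunk: bytes) -> int:
    # SHA-1 chunk fingerprint as an integer (hash choice is an assumption).
    return int.from_bytes(hashlib.sha1(chunk).digest(), "big")


def superchunk_features(chunks: list[bytes], k: int) -> list[int]:
    # Broder-style sketch: keep the k smallest distinct fingerprints as features.
    return sorted({fingerprint(c) for c in chunks})[:k]


@dataclass
class StorageNode:
    name: str
    capacity: int  # hypothetical limit, counted in chunks
    stored: set[int] = field(default_factory=set)  # stand-in for the node's Bloom Filter

    def used(self) -> int:
        return len(self.stored)


def route(chunks: list[bytes], nodes: list[StorageNode], k: int = 4) -> StorageNode:
    features = superchunk_features(chunks, k)
    # Assumed capacity rule: skip nodes the superchunk would overflow (fall back to all).
    candidates = [n for n in nodes if n.used() + len(chunks) <= n.capacity] or nodes
    # Most feature hits wins; ties go to the node with the lowest utilization.
    return max(
        candidates,
        key=lambda n: (sum(f in n.stored for f in features), -n.used() / n.capacity),
    )


def store(chunks: list[bytes], node: StorageNode) -> None:
    # Record the superchunk's fingerprints on the chosen node.
    node.stored.update(fingerprint(c) for c in chunks)


if __name__ == "__main__":
    nodes = [StorageNode("A", 1000), StorageNode("B", 1000)]
    first = [f"chunk-{i}".encode() for i in range(32)]
    store(first, nodes[0])
    # 16 of the 18 chunks already live on A, so at least 2 of the 4 features hit A.
    print(route(first[:16] + [b"new-1", b"new-2"], nodes).name)  # prints A
```

A superchunk that repeats node A's data is routed to A, preserving deduplication, while a superchunk with no matching features falls through to the least-utilized node, spreading load. `StorageNode`, `route`, `store`, and k = 4 are hypothetical names and values chosen for illustration.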

