Article

DrTM+B: Replication-Driven Live Reconfiguration for Fast and General Distributed Transaction Processing

Journal

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
Volume 33, Issue 10, Pages 2628-2643

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPDS.2022.3148251

Keywords

Throughput; Fault tolerant systems; Fault tolerance; Protocols; Low latency communication; Optimization; Distributed databases; Distributed transactions; load balance; live reconfiguration; data replication; RDMA

Funding

  1. National Key Research & Development Program [2020AAA0108500]
  2. National Natural Science Foundation of China [61732010, 61925206, 62172272]

This article presents DrTM+B, a live reconfiguration approach for in-memory database systems that seamlessly repartitions data with little disruption to running transactions. It leverages pre-copy and post-copy schemes, a cooperative commit protocol, and a log forwarding mechanism to provide high performance and high availability for distributed transaction processing systems.
Recent in-memory database systems leverage advanced hardware features like RDMA to process millions of transactions per second. Distributed transaction processing systems can scale to even higher rates, especially for partitionable workloads. Unfortunately, it is challenging to sustain such high rates during live reconfiguration of partitions. In this article, we observe that state-of-the-art approaches cause notable performance disruption under fast transaction processing. To this end, this article presents DrTM+B, a live reconfiguration approach that seamlessly repartitions data with little performance disruption to running transactions. DrTM+B uses a pre-copy-based mechanism and avoids excessive data transfer by leveraging common properties of recent transactional systems. DrTM+B's reconfiguration plans reduce data movement by preferring existing data replicas, while copying data from multiple replicas asynchronously and in parallel. It further reuses the log forwarding mechanism of primary-backup replication to seamlessly track and forward dirty database tuples, thereby avoiding the cost of iterative copying. To commit a reconfiguration plan in a transactionally safe way, DrTM+B designs a cooperative commit protocol that synchronizes data and state among replicas. To boost performance during data migration, DrTM+B combines the pre-copy and post-copy schemes into a hybrid copy scheme. The live reconfiguration approach can also coexist with the fault-tolerance mechanisms of primary-backup replication to provide high availability. Evaluation on a working system based on DrTM+R with 3-way replication, using typical OLTP workloads like TPC-C and SmallBank, shows that DrTM+B incurs only very small performance degradation during live reconfiguration and provides high availability. Both the reconfiguration time and the downtime are also minimal.
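
The idea of preferring existing replicas when planning a reconfiguration can be illustrated with a minimal sketch. This is not DrTM+B's planner. The names `Move`, `pick_target` and `plan_move`, the load figures, and the rule of picking the least-loaded node are all hypothetical assumptions made for illustration. The sketch only shows that moving a partition to a node that already holds a backup needs no bulk copy, while a move to any other node can draw on every existing replica as a copy source.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Move:
    partition: str
    target: str
    action: str               # "promote" (no bulk copy) or "copy"
    sources: Tuple[str, ...]  # replica holders to copy from; empty when promoting


def pick_target(partition: str, replicas: Dict[str, List[str]],
                load: Dict[str, int]) -> str:
    """Prefer the least-loaded node that already holds a backup of the
    partition; fall back to the least-loaded other node (assumed policy)."""
    primary, *backups = replicas[partition]  # first entry is the primary
    candidates = backups or [n for n in load if n != primary]
    return min(candidates, key=lambda n: (load[n], n))


def plan_move(partition: str, target: str,
              replicas: Dict[str, List[str]]) -> Move:
    """Decide how the partition reaches the target node."""
    holders = replicas[partition]
    if target in holders:
        # The target already has the data: promote its replica in place.
        return Move(partition, target, "promote", ())
    # Otherwise every existing replica can serve part of the copy in parallel.
    return Move(partition, target, "copy", tuple(holders))


# Hypothetical cluster: p0 is 3-way replicated, p1 has no backups.
replicas = {"p0": ["n0", "n1", "n2"], "p1": ["n0"]}
load = {"n0": 90, "n1": 40, "n2": 20, "n3": 10}

for part in ("p0", "p1"):
    print(plan_move(part, pick_target(part, replicas, load), replicas))
```

In this toy cluster, `p0` is handed to a node that already holds a backup, so the plan is a promotion with no copy sources, whereas `p1` has no backup and must be copied from its primary. A real planner would also have to keep the replication factor intact after the move, which this sketch ignores.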
