Journal
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
Volume 30, Issue 11, Pages 2582-2594
Publisher
IEEE COMPUTER SOC
DOI: 10.1109/TPDS.2019.2911944
Keywords
Distributed debugging; distributed applications; tracing; query languages
Funding
- National Science Foundation (NSF) [XPS-1533870, XPS-1533802]
Retroscope is a comprehensive, lightweight distributed monitoring tool that enables users to query and reconstruct past consistent global states of the system. Retroscope achieves this by augmenting the system with Hybrid Logical Clocks (HLC) and by streaming HLC-stamped event logs for storage and processing; these HLC timestamps are then used to construct global (or nonlocal) snapshots upon request. Retroscope provides a rich query language (RQL) to facilitate searching for global predicates across past consistent states. The search advances through global states in small incremental steps, which greatly reduces the computation needed to construct each consistent state. The Retroscope search algorithm is embarrassingly parallel and can employ many worker processes (each processing up to 150,000 consistent snapshots per second) to handle a single query. We evaluate Retroscope's monitoring capabilities in two case studies: Chord and Apache ZooKeeper.
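To make the role of HLC timestamps concrete, the following is a minimal Python sketch of the commonly published HLC update rules together with a naive snapshot cut over HLC-stamped logs. It is an illustration under stated assumptions, not Retroscope's implementation: the names `HLC`, `snapshot_at`, the injected `physical_clock` callable, and the `(timestamp, key, value)` log layout are all hypothetical, and the snapshot function rebuilds state from scratch rather than advancing incrementally as the abstract describes.

```python
from typing import Any, Callable, Dict, List, Tuple

# (logical time l, counter c); Python tuples compare lexicographically.
Timestamp = Tuple[int, int]
Event = Tuple[Timestamp, str, Any]  # hypothetical log entry: (HLC stamp, key, value)


class HLC:
    """Hybrid Logical Clock using the standard send/receive update rules."""

    def __init__(self, physical_clock: Callable[[], int]):
        self.pt = physical_clock  # assumption: injected physical-time source
        self.l = 0
        self.c = 0

    def tick(self) -> Timestamp:
        """Stamp a local or send event."""
        prev = self.l
        self.l = max(prev, self.pt())
        # Reuse the counter only when physical time did not move l forward.
        self.c = self.c + 1 if self.l == prev else 0
        return (self.l, self.c)

    def receive(self, remote: Timestamp) -> Timestamp:
        """Stamp a receive event, merging the sender's timestamp."""
        rl, rc = remote
        prev = self.l
        self.l = max(prev, rl, self.pt())
        if self.l == prev and self.l == rl:
            self.c = max(self.c, rc) + 1
        elif self.l == prev:
            self.c += 1
        elif self.l == rl:
            self.c = rc + 1
        else:
            self.c = 0
        return (self.l, self.c)


def snapshot_at(logs: Dict[str, List[Event]], cut: Timestamp) -> Dict[str, Dict[str, Any]]:
    """Replay each node's log (assumed sorted by timestamp) up to and including `cut`."""
    state: Dict[str, Dict[str, Any]] = {}
    for node, events in logs.items():
        node_state: Dict[str, Any] = {}
        for ts, key, value in events:
            if ts <= cut:
                node_state[key] = value
        state[node] = node_state
    return state
```

Because a receive event always gets a timestamp strictly greater than the sender's, cutting every node's log at the same HLC value never includes a receive without its matching send, which is what lets a single timestamp identify a consistent cut in this sketch.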