Explicit Data Correlations-Directed Metadata Prefetching Method in Distributed File Systems

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2019)

Journal

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS

Volume 30, Issue 12, Pages 2692-2705

Publisher

IEEE COMPUTER SOC

DOI: 10.1109/TPDS.2019.2921760

Keywords

Distributed file system; metadata performance; prefetching; data correlations

Funding

National Key R&D Program of China [2018YFB1003204]
National Natural Science Foundation of China [61772486, 61832011, 61802358, 61772484]

Abstract

Metadata performance in distributed file systems (DFS) is critical because of two trends: (a) modern storage systems are expected to grow beyond billions of files, most of which are small; (b) over half of file accesses are metadata operations. In this work, we present SMeta, a metadata prefetching method that is seamlessly integrated into DFS for ease of use and significantly scales metadata performance. Previous prefetching proposals primarily focus on mining, from the access history, groups of files that tend to be accessed together. Nevertheless, our study found that these solutions are likely to miss a large number of correlated files whose co-occurrence frequency is not high enough. Rather than relying on access correlations, we take a different approach and explore explicit data correlations by understanding the reference relationships between files that are encoded as some form of hyperlink, which naturally exist in many applications. To embrace this concept, SMeta discovers correlations when files are written via a lightweight pattern matching algorithm, stores the correlations in the reserved extended attributes of file metadata to avoid changes to DFS APIs, and collapses the multiple I/O rounds needed to access the metadata of the target file and its data-correlated files into one round. A cost-efficient adaptive feedback mechanism is introduced to improve prefetching accuracy. We implemented SMeta atop Ceph and evaluated it using synthetic and real system workloads. Compared to baselines, SMeta provides better metadata performance in terms of latency, throughput and scalability.
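The three steps named in the abstract (pattern matching at write time, storing correlations in extended attributes, and fetching the target plus its correlated files in one round) can be illustrated with a small in-memory sketch. This is not the authors' implementation and does not use Ceph: the `ToyMetadataServer` class, the `user.smeta.correlated` attribute name, and the HTML-style `href`/`src` link pattern are all assumptions made for illustration only.

```python
import re
from dataclasses import dataclass, field

# Hypothetical attribute name; the abstract only says "reserved extended attributes".
CORR_XATTR = "user.smeta.correlated"

# Assumed link syntax: href/src attributes, as found in HTML-like content.
LINK_RE = re.compile(r'(?:href|src)\s*=\s*["\']([^"\'#?]+)["\']', re.IGNORECASE)


@dataclass
class Inode:
    size: int
    xattrs: dict = field(default_factory=dict)


class ToyMetadataServer:
    """In-memory stand-in for a metadata server (illustrative only)."""

    def __init__(self):
        self.inodes = {}
        self.round_trips = 0  # counts simulated client/server metadata rounds

    def write(self, path, content):
        # Step 1: extract explicit references from the content at write time.
        links = sorted(set(LINK_RE.findall(content)))
        inode = Inode(size=len(content))
        # Step 2: keep the correlations in an extended attribute of the file.
        if links:
            inode.xattrs[CORR_XATTR] = "\n".join(links)
        self.inodes[path] = inode

    def stat(self, path):
        # Baseline: one round trip per file.
        self.round_trips += 1
        return self.inodes[path]

    def stat_with_prefetch(self, path):
        # Step 3: return the target's metadata together with the metadata of
        # its correlated files in a single round trip.
        self.round_trips += 1
        target = self.inodes[path]
        result = {path: target}
        for linked in target.xattrs.get(CORR_XATTR, "").splitlines():
            if linked in self.inodes:  # skip references to files that do not exist
                result[linked] = self.inodes[linked]
        return result
```

In this toy model, writing `index.html` with references to `style.css` and `logo.png` and then calling `stat_with_prefetch("index.html")` returns all three metadata entries in one simulated round, where the baseline `stat` would need three. The sketch treats each link as a full path and omits the adaptive feedback mechanism, which the abstract mentions without describing.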
