Article

Hadoop Perfect File: A fast and memory-efficient metadata access archive file to face small files problem in HDFS

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING (2021)

Journal

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING

Volume 156, Issue -, Pages 119-130

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE

DOI: 10.1016/j.jpdc.2021.05.011

Keywords

Distributed file system; Massive small files; Fast access; HDFS

Category

Computer Science, Theory & Methods

Funding

National Natural Science Foundation of China [61602037]

Summary

This paper introduces a new archive file named Hadoop Perfect File (HPF), which minimizes access overhead by reading metadata directly from the index file and thereby improves access efficiency for archive files.

Abstract

HDFS faces several issues when handling a large number of small files. These issues are well addressed by archive systems, which combine small files into larger ones and use index files to hold the information needed to retrieve a small file's content from the large archive file. However, existing archive-based solutions incur significant overhead when retrieving file content, because additional processing and I/O are needed to obtain the retrieval information before the actual content can be accessed, which degrades access efficiency. This paper presents a new archive file named Hadoop Perfect File (HPF). HPF minimizes access overhead by directly accessing metadata from the part of the index file that contains it. It consequently reduces the additional processing and I/O needed and improves access efficiency for archive files. Our index system uses two hash functions. Metadata records are distributed across index files using a dynamic hash function. We further build an order-preserving perfect hash function that records the position of a small file's metadata record within the index file. © 2021 Elsevier Inc. All rights reserved.
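
The two-level lookup described in the abstract can be illustrated with a toy, in-memory sketch. This is not the HPF implementation, and nothing below comes from the paper beyond the general idea. The names `ToyIndex`, `RECORD`, `BUCKET_CAPACITY`, and `stable_hash` are hypothetical. The fixed-size `(offset, length)` record layout is an assumption. The dynamic hash is simplified to "use more low-order hash bits until every index file fits its capacity", and a plain dictionary of ranks stands in for a real order-preserving perfect hash function.

```python
import hashlib
import struct

# Assumption: each metadata record is a fixed-size (offset, length) pair.
RECORD = struct.Struct("<QQ")
# Assumption: hypothetical maximum number of records per index file.
BUCKET_CAPACITY = 4


def stable_hash(name: str) -> int:
    """Deterministic 64-bit hash of a file name (Python's hash() is salted)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:8], "big")


class ToyIndex:
    """Toy two-level index: pick an index file, then jump to a record."""

    def __init__(self, entries):
        # entries: dict mapping file name -> (offset, length) in the archive.
        # Simplified dynamic hash: keep adding one low-order hash bit
        # (doubling the number of index files) until none is over capacity.
        self.depth = 0
        while True:
            buckets = {}
            for name in entries:
                buckets.setdefault(self._bucket_id(name), []).append(name)
            if all(len(names) <= BUCKET_CAPACITY for names in buckets.values()):
                break
            self.depth += 1

        self.index_files = {}  # bucket id -> packed records (bytes)
        self.rank = {}         # bucket id -> {name: position in stored order}
        for bucket_id, names in buckets.items():
            names.sort()  # the stored order that the rank map preserves
            # Stand-in for an order-preserving perfect hash function.
            self.rank[bucket_id] = {n: i for i, n in enumerate(names)}
            self.index_files[bucket_id] = b"".join(
                RECORD.pack(*entries[n]) for n in names
            )

    def _bucket_id(self, name: str) -> int:
        return stable_hash(name) & ((1 << self.depth) - 1)

    def lookup(self, name: str):
        """Return (offset, length) by reading exactly one record."""
        bucket_id = self._bucket_id(name)
        position = self.rank[bucket_id][name] * RECORD.size
        return RECORD.unpack_from(self.index_files[bucket_id], position)


index = ToyIndex({"a.txt": (0, 10), "b.txt": (10, 20)})
print(index.lookup("b.txt"))  # (10, 20)
```

The point of the sketch is that a lookup touches only one index file and reads only one record from it, rather than scanning or loading the whole index. A real perfect hash function would compute the rank from the key without storing the keys themselves, which is where the memory saving would come from.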
