Moving small files in a networked environment

Publisher

ELSEVIER
DOI: 10.1016/j.future.2022.09.016

Keywords

Data movement; File transfer; Small files; High-speed networking; Distributed computing; Data transfer pipeline; GridFTP

Globally distributed computing infrastructures generate and exchange a large volume of data. While the transfer of large files has been optimized, transferring small files still faces challenges. This transfer is constrained by file system throughput. By building a data transfer pipeline model and extending existing solutions, we propose several engineering approaches to improve the efficiency of small file transfer.
Globally distributed computing infrastructures, such as clouds and supercomputers, are used to manage data that is generated at unprecedented speed from a variety of sources. As a result, the volume of data exchanged between distant sites has increased substantially. High-speed networks connect remote sites to accelerate data transfer, and most existing data movement solutions are optimized for moving large files. Transferring a large number of small files across networks, however, remains challenging. This weakness not only lowers data transfer performance but also decreases overall system utilization. We find that moving small files is constrained mainly by degraded file system throughput, not just by network performance as might be suspected. We have built a data transfer pipeline model to analyze the impact of small network I/O and storage I/O on data movement. Extending GridFTP, a widely used open source data movement solution, we demonstrate several engineering approaches that mitigate the bottleneck and increase data transfer efficiency. Our optimizations improve data transfer performance by more than 5 times. Compared with existing solutions, our approaches save a significant amount of system resources when moving many small files. Crown Copyright (c) 2022 Published by Elsevier B.V. All rights reserved.
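To make the bottleneck argument concrete, the sketch below models a transfer as a chain of stages, each with a sustained bandwidth and a fixed per-file cost. It is an illustration only, not the authors' pipeline model: the `Stage` class, the stage names, and every number in `STAGES` are assumptions chosen for the example, not figures from the paper.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One stage of a hypothetical transfer pipeline."""
    name: str
    bandwidth_bytes_per_s: float  # sustained rate for large sequential I/O
    per_file_overhead_s: float    # fixed cost per file (open, metadata, close)

    def throughput(self, file_size_bytes: float) -> float:
        """Effective bytes/s when every file has the given size."""
        time_per_file = (self.per_file_overhead_s
                         + file_size_bytes / self.bandwidth_bytes_per_s)
        return file_size_bytes / time_per_file


def pipeline_throughput(stages, file_size_bytes):
    """A pipeline cannot run faster than its slowest stage."""
    return min(s.throughput(file_size_bytes) for s in stages)


def bottleneck(stages, file_size_bytes):
    """Name of the stage that limits the pipeline at this file size."""
    return min(stages, key=lambda s: s.throughput(file_size_bytes)).name


# Assumed, illustrative parameters (not measurements from the paper).
STAGES = [
    Stage("source storage read", 2.0e9, 1e-3),
    Stage("network", 1.25e9, 1e-5),
    Stage("destination storage write", 2.0e9, 2e-3),
]

if __name__ == "__main__":
    for size in (4 * 1024, 1024 ** 3):  # 4 KiB and 1 GiB files
        rate = pipeline_throughput(STAGES, size) / 1e6
        print(f"{size:>12} B files: {rate:10.1f} MB/s, "
              f"limited by {bottleneck(STAGES, size)}")
```

Under these assumed parameters the network limits the pipeline for large files, while for small files the fixed per-file storage cost dominates and a storage stage becomes the limit. This matches the qualitative claim in the abstract that small-file transfer is bounded by file system throughput rather than by the network alone.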
