4.7 Article

Multithreaded runtime framework for parallel and adaptive applications

Journal

ENGINEERING WITH COMPUTERS
Volume 38, Issue 5, Pages 4675-4695

Publisher

SPRINGER
DOI: 10.1007/s00366-022-01713-7

Funding

  1. Dominion Fellowship
  2. Richard T. Cheng Endowment at Old Dominion University
  3. NSF [CNS-1828593]

The PREMA framework introduces a new design for large-scale applications, providing one-sided communication, remote method invocations, and transparent object migrations for load balancing and improved performance. It adds multi-threading support and load monitoring, preserves correctness through access privileges assigned to parallel tasks, and supports concurrent migrations.
This paper presents a new design of the Parallel Runtime Environment for Multi-computer Applications (PREMA). The framework targets exascale-era platforms and, through an easy-to-use interface, provides large-scale applications with one-sided communication, remote method invocations, and a global namespace on top of transparent object migrations for implicit load balancing, scheduling, and latency hiding. The framework has been augmented with multi-threading, separating communication and execution into different threads to provide asynchronous message reception and immediate execution of computations. It allows for implicit parallel shared and distributed memory computations and guarantees correctness through an interface for assigning access privileges to parallel tasks while monitoring the load of the system and performing migrations. Scheduling and load balancing are enhanced by introducing custom intra-node schedulers and the ability to perform concurrent migrations. The motivation for developing the runtime system is to provide a dynamic runtime for adaptive and irregular parallel applications such as adaptive mesh refinement. Evaluating the system on such an application indicates an overall performance improvement of up to 50% compared to static load balancing, with an overhead of less than 1% when using up to 190 computing nodes (i.e., 5600 cores). The improvement is achieved by maintaining a better workload distribution among the execution units. Evaluations with a communication-intensive application with static load balancing reveal that no significant overhead is added despite the additional bookkeeping needed to monitor the load of each processing element.
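
The separation of communication and execution into different threads, as described in the abstract, can be illustrated with a minimal sketch. This is not PREMA's API: the function names (`communication_thread`, `execution_thread`, `run`), the message format of `(handler, payload)` pairs, and the use of in-process queues in place of a network layer are all assumptions made for illustration only.

```python
import queue
import threading


def communication_thread(incoming, tasks):
    """Receive messages and enqueue them as tasks (hypothetical receiver loop)."""
    while True:
        msg = incoming.get()
        if msg is None:          # sentinel: no more messages will arrive
            tasks.put(None)      # forward the shutdown signal to the executor
            return
        tasks.put(msg)           # hand the (handler, payload) pair to the executor


def execution_thread(tasks, results):
    """Run queued tasks in arrival order, independently of message reception."""
    while True:
        task = tasks.get()
        if task is None:         # shutdown signal from the communication thread
            return
        handler, payload = task
        results.append(handler(payload))


def run(messages):
    """Deliver messages through a receiver thread and execute them on a worker thread."""
    incoming = queue.Queue()     # stands in for the network (assumption)
    tasks = queue.Queue()        # decouples reception from execution
    results = []
    comm = threading.Thread(target=communication_thread, args=(incoming, tasks))
    worker = threading.Thread(target=execution_thread, args=(tasks, results))
    comm.start()
    worker.start()
    for msg in messages:
        incoming.put(msg)
    incoming.put(None)
    comm.join()
    worker.join()
    return results


if __name__ == "__main__":
    # Two "remote invocations": square a number, measure a string.
    print(run([(lambda x: x * x, 3), (len, "abc")]))
```

Because the receiver never runs a handler itself, a long-running computation does not delay the reception of later messages. A real runtime would additionally need the load monitoring, access privileges, and object migration that the abstract mentions, none of which this sketch attempts.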
