Journal
50TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO)
Volume -, Issue -, Pages 600-611
Publisher
ASSOC COMPUTING MACHINERY
DOI: 10.1145/3123939.3123976
Keywords
GPGPU; SIMD; Data Dependency; Thread Block Scheduling; Dataflow
Funding
- NSF [CCF-1423108, CCF-1513201]
GPUs lack fundamental support for data-dependent parallelism and synchronization. While CUDA Dynamic Parallelism signals progress in this direction, many limitations and challenges remain. This paper introduces Wireframe, a hardware-software solution that enables generalized support for data-dependent parallelism and synchronization. Wireframe enables applications to naturally express execution dependencies across different thread blocks through a dependency graph abstraction at run-time, which is sent to the GPU hardware at kernel launch. At run-time, the hardware enforces the dependencies specified in the dependency graph through a dependency-aware thread block scheduler. Overall, Wireframe improves total execution time by up to 65.20%, with an average of 45.07%.
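The idea of enforcing a dependency graph across thread blocks can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions, not the paper's hardware design: the function name `schedule_thread_blocks`, the `(parent, child)` edge format, and the FIFO ready queue are all hypothetical choices made for illustration. Each block is issued only after every block it depends on has completed.

```python
from collections import deque


def schedule_thread_blocks(num_blocks, deps):
    """Return one valid issue order for thread blocks 0..num_blocks-1.

    deps is an iterable of (parent, child) pairs, meaning `child` may
    only be issued after `parent` has finished. Hypothetical model: the
    abstract does not describe the real scheduler at this level.
    """
    children = [[] for _ in range(num_blocks)]
    pending = [0] * num_blocks  # unresolved parent count per block
    for parent, child in deps:
        children[parent].append(child)
        pending[child] += 1

    # Blocks with no parents are ready at kernel launch.
    ready = deque(b for b in range(num_blocks) if pending[b] == 0)
    order = []
    while ready:
        block = ready.popleft()
        order.append(block)  # "execute" the block
        for child in children[block]:
            pending[child] -= 1
            if pending[child] == 0:  # all parents done, child becomes ready
                ready.append(child)

    if len(order) != num_blocks:
        raise ValueError("dependency graph contains a cycle")
    return order


# A 2x2 wavefront: block 0 feeds blocks 1 and 2, which both feed block 3.
wavefront = [(0, 1), (0, 2), (1, 3), (2, 3)]
print(schedule_thread_blocks(4, wavefront))
```

In this model, blocks 1 and 2 become ready at the same time and could run concurrently; the FIFO queue simply fixes one legal order among them.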