Article

Agglomerative Memory and Thread Scheduling for High-Performance Ray-Tracing on GPUs

Publisher

IEEE (Institute of Electrical and Electronics Engineers, Inc.)
DOI: 10.1109/TCAD.2021.3058910

Keywords

Ray tracing; Graphics processing units; Instruction sets; Microarchitecture; Acceleration; Hardware; Rendering (computer graphics); Graphics processing unit (GPU); irregularity; memory; ray-tracing; scheduling

Funding

  1. Key Scientific Instrument and Equipment Development Project of China National Science Foundation [61527812]

Abstract

This paper proposes a scheduling mechanism that unleashes parallelism in the ray-tracing process on GPUs; combined with a tile-based ray-tracing framework, it significantly improves memory efficiency and outperforms a traditional GPU architecture.
Ray-tracing rendering has long been considered a promising technology for enabling a higher level of visual experience. The democratization of ray-tracing rendering to consumer platforms, however, poses significant challenges to rendering hardware and software due to its highly irregular computing patterns. Modern ray-tracing techniques typically depend on a tree-based acceleration structure to reduce the computational complexity of intersection testing between rays and graphics primitives. Traversal of this structure by a massive number of rays on a graphics processing unit (GPU) incurs a significant amount of irregular memory traffic, which turns out to be a major stumbling block for real-time performance. In this work, a scheduling mechanism called Agglomerative Memory and Thread Scheduling is proposed to unleash the inherent parallelism in the ray-tracing process on GPUs. It is coupled with a tile-based ray-tracing framework in which the acceleration structure (a KD-tree in this work) is partitioned into subtrees that can be completely loaded into the on-chip L1 cache of a streaming multiprocessor. An effective scheduling mechanism collects threads according to the subtrees hit by their respective rays and regroups them into warps for dispatch. In addition, subtrees are dynamically preloaded into the L1 cache of multiprocessors in an on-demand fashion. The proposed scheduler can be integrated into today's high-end GPUs with only minor overhead. Microarchitecture simulation results show that the proposed framework significantly improves memory efficiency and outperforms a traditional GPU microarchitecture by 47.4% on average.
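The core regrouping step described above can be illustrated with a minimal software sketch. It assumes each in-flight thread is tagged with the ID of the subtree its ray will traverse next; threads sharing a subtree are collected and sliced into warps so that every warp touches a single cache-resident subtree. The function name `regroup_into_warps` and the flat list-of-IDs input are hypothetical illustrations; the paper's actual scheduler is a hardware mechanism inside the GPU, not host code.

```python
from collections import defaultdict

WARP_SIZE = 32  # threads per warp on current NVIDIA-style GPUs

def regroup_into_warps(thread_subtree_ids):
    """Collect thread indices by the subtree their ray currently targets,
    then slice each group into warps of up to WARP_SIZE threads.

    thread_subtree_ids[i] is the subtree ID hit by thread i's ray.
    Returns a list of (subtree_id, [thread indices]) warps, so each
    dispatched warp traverses exactly one L1-resident subtree.
    """
    groups = defaultdict(list)
    for tid, subtree in enumerate(thread_subtree_ids):
        groups[subtree].append(tid)

    warps = []
    for subtree, tids in groups.items():
        for i in range(0, len(tids), WARP_SIZE):
            warps.append((subtree, tids[i:i + WARP_SIZE]))
    return warps
```

Because every warp now references one subtree, a subtree can be fetched into the L1 cache once (on demand) and reused by all warps scheduled against it, which is the source of the memory-efficiency gain the paper reports.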

