3.8 Proceedings Paper

Iris: Automatic Generation of Efficient Data Layouts for High Bandwidth Utilization

Related references

Note: Only a subset of the references is listed.
Article Computer Science, Hardware & Architecture

Automatic Creation of High-bandwidth Memory Architectures from Domain-specific Languages: The Case of Computational Fluid Dynamics

Stephanie Soldavini et al.

Summary: This article proposes an automated tool flow for generating massively parallel accelerators on high-bandwidth-memory-equipped FPGAs from a domain-specific language. The flow lets designers integrate and evaluate various compiler and hardware optimizations. Experimental results show that the approach enables efficient data movement and processing, reaching up to 103 GFLOPS with a single compute unit on a Xilinx Alveo U280 while being up to 25x more energy efficient than expert-crafted Intel CPU implementations. (A generic data-layout sketch follows this entry.)

ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS (2023)
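
To make the layout question behind the entry above (and the Iris paper itself) concrete, the following is a generic sketch of the array-of-structures to structure-of-arrays transformation that such layout-generation flows automate. It is an illustration only, not output of the tool; the particle record and its field names are hypothetical.

```python
import numpy as np

# Hypothetical record type; the field names are illustrative only.
particle = np.dtype([("x", np.float32), ("y", np.float32),
                     ("z", np.float32), ("mass", np.float32)])

n = 1_000_000
aos = np.zeros(n, dtype=particle)        # array of structures: fields interleaved
aos["mass"] = np.random.default_rng(0).random(n, dtype=np.float32)

# Structure of arrays: each field stored contiguously, so a kernel that
# touches only "mass" streams one dense array instead of striding past x, y, z.
soa = {name: np.ascontiguousarray(aos[name]) for name in particle.names}

total_aos = aos["mass"].sum()   # strided gather: 4 of every 16 bytes useful
total_soa = soa["mass"].sum()   # unit-stride stream: every fetched byte useful
assert np.isclose(total_aos, total_soa)
```

On an HBM device the same idea lets each field map to its own memory channel, so a kernel reading one field can saturate its channel rather than wasting bandwidth on unused fields.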

Proceedings Paper Computer Science, Artificial Intelligence

EVEREST: A design environment for extreme-scale big data analytics on heterogeneous platforms

Christian Pilato et al.

Summary: High-performance data analytics (HPDA) applications deal with large volumes of distributed and heterogeneous data and therefore require efficient computation. The EVEREST project aims to develop a comprehensive environment for the co-design of HPDA applications on heterogeneous, distributed, and secure platforms, focusing on programmability through a data-driven design approach, hardware-accelerated AI, and efficient runtime monitoring with virtualization support. The project combines state-of-the-art programming models, emerging communication standards, and novel domain-specific extensions to drive its research and development.

PROCEEDINGS OF THE 2021 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2021) (2021)

Proceedings Paper Computer Science, Theory & Methods

Accelerating Neural Network Training using Arbitrary Precision Approximating Matrix Multiplication Algorithms

Grey Ballard et al.

Summary: The study proposes Arbitrary Precision Approximating (APA) algorithms to speed up matrix multiplication in deep neural network training. The authors implement and parallelize these algorithms efficiently on multicore CPUs and achieve significant performance improvements when training deep neural networks. (A Strassen-style sketch follows this entry.)

50TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOP PROCEEDINGS - ICPP WORKSHOPS '21 (2021)
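
As background for the APA entry above: APA algorithms generalize Strassen-style recursion, trading exactness for even fewer multiplications per recursion level. Below is a minimal sketch of plain Strassen recursion in NumPy, the exact scheme that APA algorithms approximate; it is an illustration, not the authors' parallel implementation.

```python
import numpy as np

def strassen(A, B):
    """Multiply square matrices whose size is a power of two using
    Strassen's recursion: 7 multiplications per level instead of 8."""
    n = A.shape[0]
    if n <= 64:                      # fall back to BLAS for small blocks
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4, M1 - M2 + M3 + M6]])
```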

Article Multidisciplinary Sciences

The future of computing beyond Moore's Law

John Shalf

PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES (2020)

Proceedings Paper Computer Science, Hardware & Architecture

StreamBox-HBM: Stream Analytics on High Bandwidth Hybrid Memory

Hongyu Miao et al.

TWENTY-FOURTH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS (ASPLOS XXIV) (2019)

Article Social Sciences, Mathematical Methods

Apportionment methods

Ulrich Kohler et al.

STATA JOURNAL (2012)
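
For readers unfamiliar with the last entry: an apportionment method divides a fixed number of integer seats among parties (or states) in proportion to vote or population counts. A minimal sketch of one such method, largest-remainder (Hamilton) apportionment, is given below; the vote counts are made up for illustration, and this is not the Stata package's implementation.

```python
from math import floor

def hamilton(votes, seats):
    """Largest-remainder (Hamilton) apportionment: give each party the
    integer part of its exact quota, then hand the remaining seats to the
    parties with the largest fractional remainders."""
    total = sum(votes)
    quotas = [v * seats / total for v in votes]
    alloc = [floor(q) for q in quotas]
    leftover = seats - sum(alloc)
    # indices sorted by descending fractional remainder
    order = sorted(range(len(votes)),
                   key=lambda i: quotas[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc

print(hamilton([720, 310, 170], 10))  # -> [6, 3, 1]
```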