HDNN: a cross-platform MLIR dialect for deep neural networks

Related references

Note: Only some of the references are listed.
Article Computer Science, Software Engineering

Applying Intel's oneAPI to a machine learning case study

Pablo Antonio Martinez et al.

Summary: This article discusses different technologies and approaches for addressing the performance portability problem, with a focus on Intel's oneAPI solution. It uses the machine learning framework Caffe as a case study to explore the feasibility and advantages of developing with oneAPI.

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE (2022)

Article Computer Science, Hardware & Architecture

Domain-Specific Multi-Level IR Rewriting for GPU: The Open Earth Compiler for GPU-accelerated Climate Simulation

Tobias Gysi et al.

Summary: The study shows that multilevel rewriting is effective for compiler optimization, especially in the weather and climate domain. With domain-specific dialects and optimizations, the compiler achieves better results than existing solutions.

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION (2021)

Article Computer Science, Hardware & Architecture

KERNELFARER: Replacing Native-Code Idioms with High-Performance Library Calls

Joao P. L. De Carvalho et al.

Summary: The article introduces KERNELFARER, an idiom recognizer implemented in the existing LLVM compiler framework. It recognizes idioms more robustly than alternative solutions and with lower compilation overhead. It can match and replace linear algebra idioms, yielding significant performance gains.

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION (2021)

Proceedings Paper Quantum Science & Technology

A MLIR Dialect for Quantum Assembly Languages

Alexander McCaskey et al.

Summary: The study demonstrates the utility of MLIR in quantum computing by extending it to express and compile common quantum assembly languages. By adhering to the QIR specification, a retargetable compiler workflow is implemented, mapping quantum languages to executable binaries and object code. The effectiveness of this novel compiler workflow is validated by using OpenQASM 2.0 to write quantum programs.

2021 IEEE INTERNATIONAL CONFERENCE ON QUANTUM COMPUTING AND ENGINEERING (QCE 2021) / QUANTUM WEEK 2021 (2021)

Proceedings Paper Computer Science, Hardware & Architecture

Ten Lessons From Three Generations Shaped Google's TPUv4i Industrial Product

Norman P. Jouppi et al.

Summary: The paper summarizes the lessons learned from Google's deployment of several TPU generations since 2015. It highlights the importance of semiconductor technology advancement, compiler compatibility, cost considerations, and multi-tenancy support, and describes how these lessons culminated in the development of TPUv4i, deployed since 2020.

2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2021) (2021)

Article Computer Science, Interdisciplinary Applications

Performant, Portable, and Productive Parallel Programming With Standard Languages

Michael Wolfe

Summary: Finding a perfect solution to the P3 problem of balancing performance, portability, and productivity remains a challenge. A proposed machine model could aid in designing algorithms and data structures that achieve performance portability, with a focus on the parallel features of existing and future standard languages advocated by the community.

COMPUTING IN SCIENCE & ENGINEERING (2021)

Article Computer Science, Interdisciplinary Applications

Navigating Performance, Portability, and Productivity

S. John Pennycook et al.

Summary: This article discusses a methodology for quantifying, summarizing, visualizing, and understanding application performance portability and programmer productivity. The methodology helps in defining goals, designing experiments, and making forward progress.

COMPUTING IN SCIENCE & ENGINEERING (2021)

Article Computer Science, Hardware & Architecture

A Domain-Specific Supercomputer for Training Deep Neural Networks

Norman P. Jouppi et al.

COMMUNICATIONS OF THE ACM (2020)

Article Computer Science, Hardware & Architecture

Domain-Specific Hardware Accelerators

William J. Dally et al.

COMMUNICATIONS OF THE ACM (2020)

Proceedings Paper Automation & Control Systems

PET-to-MLIR: A polyhedral front-end for MLIR

Konrad Komisarczyk et al.

2020 23RD EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD 2020) (2020)

Article Computer Science, Theory & Methods

PHAST - A Portable High-Level Modern C++ Programming Library for GPUs and Multi-Cores

Biagio Peccerillo et al.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2019)

Article Computer Science, Hardware & Architecture

A New Golden Age for Computer Architecture

John L. Hennessy et al.

COMMUNICATIONS OF THE ACM (2019)

Article Computer Science, Theory & Methods

Demystifying Parallel and Distributed Deep Learning: An In-depth Concurrency Analysis

Tal Ben-Nun et al.

ACM COMPUTING SURVEYS (2019)

Proceedings Paper Computer Science, Software Engineering

HPVM: Heterogeneous Parallel Virtual Machine

Maria Kotsifakou et al.

ACM SIGPLAN NOTICES (2018)

Article Engineering, Electrical & Electronic

Efficient Processing of Deep Neural Networks: A Tutorial and Survey

Vivienne Sze et al.

PROCEEDINGS OF THE IEEE (2017)

Proceedings Paper Computer Science, Artificial Intelligence

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi et al.

44TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2017) (2017)

Proceedings Paper Computer Science, Theory & Methods

Kokkos: Enabling performance portability across manycore architectures

H. Carter Edwards et al.

2013 EXTREME SCALING WORKSHOP (XSW 2013) (2014)

Editorial Material Computer Science, Interdisciplinary Applications

OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems

John E. Stone et al.

COMPUTING IN SCIENCE & ENGINEERING (2010)