Related references
Note: Only part of the references are listed.
Applying Intel's oneAPI to a machine learning case study
Pablo Antonio Martinez et al.
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE (2022)
Domain-Specific Multi-Level IR Rewriting for GPU: The Open Earth Compiler for GPU-accelerated Climate Simulation
Tobias Gysi et al.
ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION (2021)
KERNELFARER: Replacing Native-Code Idioms with High-Performance Library Calls
Joao P. L. De Carvalho et al.
ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION (2021)
A MLIR Dialect for Quantum Assembly Languages
Alexander McCaskey et al.
2021 IEEE INTERNATIONAL CONFERENCE ON QUANTUM COMPUTING AND ENGINEERING (QCE 2021) / QUANTUM WEEK 2021 (2021)
Ten Lessons From Three Generations Shaped Google's TPUv4i: Industrial Product
Norman P. Jouppi et al.
2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2021) (2021)
Performant, Portable, and Productive Parallel Programming With Standard Languages
Michael Wolfe
COMPUTING IN SCIENCE & ENGINEERING (2021)
Navigating Performance, Portability, and Productivity
S. John Pennycook et al.
COMPUTING IN SCIENCE & ENGINEERING (2021)
A Domain-Specific Supercomputer for Training Deep Neural Networks
Norman P. Jouppi et al.
COMMUNICATIONS OF THE ACM (2020)
Domain-Specific Hardware Accelerators
William J. Dally et al.
COMMUNICATIONS OF THE ACM (2020)
PET-to-MLIR: A polyhedral front-end for MLIR
Konrad Komisarczyk et al.
2020 23RD EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD 2020) (2020)
PHAST - A Portable High-Level Modern C++ Programming Library for GPUs and Multi-Cores
Biagio Peccerillo et al.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2019)
A New Golden Age for Computer Architecture
John L. Hennessy et al.
COMMUNICATIONS OF THE ACM (2019)
Demystifying Parallel and Distributed Deep Learning: An In-depth Concurrency Analysis
Tal Ben-Nun et al.
ACM COMPUTING SURVEYS (2019)
HPVM: Heterogeneous Parallel Virtual Machine
Maria Kotsifakou et al.
ACM SIGPLAN NOTICES (2018)
Efficient Processing of Deep Neural Networks: A Tutorial and Survey
Vivienne Sze et al.
PROCEEDINGS OF THE IEEE (2017)
In-Datacenter Performance Analysis of a Tensor Processing Unit
Norman P. Jouppi et al.
44TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2017) (2017)
Kokkos: Enabling performance portability across manycore architectures
H. Carter Edwards et al.
2013 EXTREME SCALING WORKSHOP (XSW 2013) (2014)
OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems
John E. Stone et al.
COMPUTING IN SCIENCE & ENGINEERING (2010)