4.6 Article

Benchmarking a New Paradigm: Experimental Analysis and Characterization of a Real Processing-in-Memory System

Related References

Note: Only some of the references are listed; download the original article for the complete reference information.
Article Computer Science, Hardware & Architecture

Near-Memory Processing in Action: Accelerating Personalized Recommendation With AxDIMM

Liu Ke et al.

Summary: Near-memory processing (NMP) is a computing paradigm that improves the performance of memory-constrained workloads by moving the compute capability next to the main memory. Experimental results show that using a versatile FPGA-enabled NMP platform can significantly enhance the performance of recommendation inference serving.

IEEE MICRO (2022)

Article Computer Science, Theory & Methods

GIRAF: General Purpose In-Storage Resistive Associative Framework

Leonid Yavits et al.

Summary: GIRAF is a general framework that combines storage and parallel associative processing using resistive content addressable memory. It improves performance by conducting computations within the storage arrays and addresses bandwidth limitations.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2022)

Article Computer Science, Hardware & Architecture

SparseP: Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Architectures

Christina Giannoula et al.

Summary: This paper presents the first comprehensive analysis of Sparse Matrix Vector Multiplication (SpMV) on a real-world Processing-In-Memory (PIM) architecture. The study shows that executing SpMV on a PIM system can significantly improve performance and energy efficiency.

PROCEEDINGS OF THE ACM ON MEASUREMENT AND ANALYSIS OF COMPUTING SYSTEMS (2022)

Review Engineering, Electrical & Electronic

In-memory Learning with Analog Resistive Switching Memory: A Review and Perspective

Yue Xi et al.

Summary: This article reviews analog resistive switching memory (RSM) devices and their hardware technologies for in-memory learning, discussing the impact of different FoM levels on system functionality and efficiency, analyzing hardware optimization methods, and discussing the challenges and prospects from the device to system level.

PROCEEDINGS OF THE IEEE (2021)

Article Computer Science, Hardware & Architecture

Enabling fast and energy-efficient FM-index exact matching using processing-near-memory

Jose M. Herruzo et al.

Summary: This paper evaluates the performance and energy consumption of two classes of processor architectures when executing the FM-index exact matching algorithm, demonstrating that a PNM solution can significantly improve performance and reduce energy consumption.

JOURNAL OF SUPERCOMPUTING (2021)

Article Engineering, Electrical & Electronic

A Survey of Test and Reliability Solutions for Magnetic Random Access Memories

Patrick Girard et al.

Summary: Memories are a significant part of system-on-chips and contribute to the system power consumption. This article discusses the potential of magnetic random access memories to mitigate Flash shortcomings and be used as replacements for DRAM and SRAM. It provides an up-to-date coverage of MRAM test and reliability solutions in the literature, focusing on defectiveness and reliability issues.

PROCEEDINGS OF THE IEEE (2021)

Article Computer Science, Hardware & Architecture

FPGA-Based Near-Memory Acceleration of Modern Data-Intensive Applications

Gagandeep Singh et al.

Summary: Modern data-intensive applications require high computational capabilities but are limited by strict power constraints. The development of FPGAs with HBM provides a solution to alleviate the bottleneck of data movement, improving efficiency and energy savings in computing systems.

IEEE MICRO (2021)

Proceedings Paper Computer Science, Hardware & Architecture

SIMDRAM: A Framework for Bit-Serial SIMD Processing using DRAM

Nastaran Hajinazar et al.

Summary: SIMDRAM is a flexible general-purpose processing-with-DRAM framework that supports the efficient implementation of complex operations and user-defined operations. By utilizing a control unit inside the memory controller, SIMDRAM manages the computation of operations from start to end, providing efficiency and flexibility.

ASPLOS XXVI: TWENTY-SIXTH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS (2021)

Proceedings Paper Computer Science, Hardware & Architecture

Google Neural Network Models for Edge Devices: Analyzing and Mitigating Machine Learning Inference Bottlenecks

Amirali Boroumand et al.

Summary: The study identifies three major shortcomings of the Google Edge TPU: suboptimal computational throughput, inefficient energy usage, and a memory system bottleneck. A new acceleration framework called Mensa is proposed to address these issues by utilizing multiple heterogeneous edge ML accelerators tailored to specific subsets of NN models and layers, resulting in significant improvements in energy efficiency and throughput compared to existing state-of-the-art accelerators.

30TH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT 2021) (2021)

Proceedings Paper Computer Science, Hardware & Architecture

Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology

Sukhan Lee et al.

Summary: Emerging applications require high off-chip memory bandwidth, but it is costly to increase the bandwidth and transferring data across the memory hierarchy consumes a large amount of energy. To improve efficiency, researchers are revisiting past processing-in-memory architectures, especially leveraging recent integration technologies.

2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2021) (2021)

Proceedings Paper Computer Science, Hardware & Architecture

QUAC-TRNG: High Throughput True Random Number Generation Using Quadruple Row Activation in Commodity DRAM Chips

Ataberk Olgun et al.

Summary: QUAC-TRNG is a new high-throughput true random number generator that can be fully implemented in commodity DRAM chips, helping computing systems without dedicated TRNG hardware to obtain security guarantees. By exploiting a carefully engineered sequence of DRAM commands to activate four consecutive DRAM rows, QUAC-TRNG causes the bitline sense amplifiers to converge non-deterministically to random values. This approach allows QUAC-TRNG to reliably generate true random numbers with a high throughput, surpassing the state-of-the-art DRAM-based TRNG in both basic and throughput-optimized versions.

2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2021) (2021)

Article Computer Science, Information Systems

DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks

Geraldo F. Oliveira et al.

Summary: Data movement between the CPU and main memory is a major bottleneck for improving performance, scalability, and energy efficiency in modern computer systems. Various techniques have been employed to reduce this overhead, from traditional cache hierarchies to emerging Near-Data Processing (NDP) methods. However, there is still a lack of understanding regarding the key metrics for identifying data movement bottlenecks and their relation to different mitigation mechanisms.

IEEE ACCESS (2021)

Proceedings Paper Computer Science, Hardware & Architecture

Mixed Precision Quantization for ReRAM-based DNN Inference Accelerators

Sitao Huang et al.

Summary: A mixed precision quantization scheme for ReRAM-based DNN inference accelerators was proposed in this study, reducing inference latency and energy consumption significantly while only losing a small amount of accuracy. It jointly applies weight quantization, input quantization, and partial sum quantization for each DNN layer, and includes an automated quantization flow powered by deep reinforcement learning to search for the optimal configuration.

2021 26TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC) (2021)

Proceedings Paper Computer Science, Hardware & Architecture

BlockHammer: Preventing RowHammer at Low Cost by Blacklisting Rapidly-Accessed DRAM Rows

A. Giray Yaglikci et al.

Summary: Aggressive memory density scaling has made modern DRAM devices susceptible to RowHammer attacks, with current mitigation mechanisms facing challenges in performance and design modifications. BlockHammer offers a low-cost, effective, and easy-to-adopt solution that efficiently prevents all RowHammer bit-flips without needing knowledge of or changes to DRAM internals, thus significantly improving system security and performance.

2021 27TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2021) (2021)

Proceedings Paper Computer Science, Hardware & Architecture

SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures

Christina Giannoula et al.

Summary: Near-Data-Processing architectures support multiple NDP units, each containing multiple simple cores close to memory. Efficient synchronization among the NDP cores is necessary for high performance of parallel workloads.

2021 27TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2021) (2021)

Proceedings Paper Computer Science, Hardware & Architecture

FAFNIR: Accelerating Sparse Gathering by Using Efficient Near-Memory Intelligent Reduction

Bahar Asgari et al.

Summary: The study introduces an efficient solution for sparse data gathering called Fafnir, which minimizes data movement by using an intelligent reduction tree in memory and maximizes parallel memory accesses in near-data processing. Fafnir does not rely on spatial locality, offering higher efficiency and faster performance compared to existing NDP proposals.

2021 27TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2021) (2021)

Article Engineering, Electrical & Electronic

In-Memory Low-Cost Bit-Serial Addition Using Commodity DRAM Technology

Mustafa E. Ali et al.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS (2020)

Article Computer Science, Hardware & Architecture

RowHammer: A Retrospective

Onur Mutlu et al.

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS (2020)

Article Computer Science, Hardware & Architecture

PANTHER: A Programmable Architecture for Neural Network Training Harnessing Energy-Efficient ReRAM

Aayush Ankit et al.

IEEE TRANSACTIONS ON COMPUTERS (2020)

Article Computer Science, Hardware & Architecture

Accelerating Genome Analysis: A Primer on an Ongoing Journey

Mohammed Alser et al.

IEEE MICRO (2020)

Proceedings Paper Computer Science, Hardware & Architecture

NERO: A Near High-Bandwidth Memory Stencil Accelerator for Weather Prediction Modeling

Gagandeep Singh et al.

2020 30TH INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS (FPL) (2020)

Proceedings Paper Biochemical Research Methods

Variant Calling Parallelization on Processor-in-Memory Architecture

Dominique Lavenier et al.

2020 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (2020)

Proceedings Paper Computer Science, Hardware & Architecture

NATSA: A Near-Data Processing Accelerator for Time Series Analysis

Ivan Fernandez et al.

2020 IEEE 38TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD 2020) (2020)

Proceedings Paper Computer Science, Hardware & Architecture

A Heterogeneous PIM Hardware-Software Co-Design for Energy-Efficient Graph Processing

Yu Huang et al.

2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM IPDPS 2020 (2020)

Proceedings Paper Computer Science, Hardware & Architecture

The Virtual Block Interface: A Flexible Alternative to the Conventional Virtual Memory Framework

Nastaran Hajinazar et al.

2020 ACM/IEEE 47TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2020) (2020)

Proceedings Paper Computer Science, Information Systems

TRRespass: Exploiting the Many Sides of Target Row Refresh

Pietro Frigo et al.

2020 IEEE SYMPOSIUM ON SECURITY AND PRIVACY (SP 2020) (2020)

Proceedings Paper Computer Science, Information Systems

Are We Susceptible to Rowhammer? An End-to-End Methodology for Cloud Providers

Lucian Cojocar et al.

2020 IEEE SYMPOSIUM ON SECURITY AND PRIVACY (SP 2020) (2020)

Proceedings Paper Computer Science, Hardware & Architecture

Revisiting RowHammer: An Experimental Analysis of Modern DRAM Devices and Mitigation Techniques

Jeremie S. Kim et al.

2020 ACM/IEEE 47TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2020) (2020)

Article Computer Science, Hardware & Architecture

NoM: Network-on-Memory for Inter-Bank Data Transfer in Highly-Banked Memories

Seyyed Hossein SeyyedAghaei Rezaei et al.

IEEE COMPUTER ARCHITECTURE LETTERS (2020)

Article Computer Science, Hardware & Architecture

GraphH: A Processing-in-Memory Architecture for Large-Scale Graph Processing

Guohao Dai et al.

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS (2019)

Article Computer Science, Hardware & Architecture

Processing data where it makes sense: Enabling in-memory computation

Onur Mutlu et al.

MICROPROCESSORS AND MICROSYSTEMS (2019)

Article Computer Science, Hardware & Architecture

Processing-in-memory: A workload-driven perspective

S. Ghose et al.

IBM JOURNAL OF RESEARCH AND DEVELOPMENT (2019)

Proceedings Paper Computer Science, Hardware & Architecture

D-RaNGe: Using Commodity DRAM Devices to Generate True Random Numbers with Low Latency and High Throughput

Jeremie S. Kim et al.

2019 25TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA) (2019)

Proceedings Paper Computer Science, Theory & Methods

Towards a Scatter-Gather Architecture: Hardware and Software Issues

Arun Rodrigues et al.

MEMSYS 2019: PROCEEDINGS OF THE INTERNATIONAL SYMPOSIUM ON MEMORY SYSTEMS (2019)

Proceedings Paper Computer Science, Hardware & Architecture

CoNDA: Efficient Cache Coherence Support for Near-Data Accelerators

Amirali Boroumand et al.

PROCEEDINGS OF THE 2019 46TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA '19) (2019)

Proceedings Paper Computer Science, Hardware & Architecture

Duality Cache for Data Parallel Acceleration

Daichi Fujiki et al.

PROCEEDINGS OF THE 2019 46TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA '19) (2019)

Proceedings Paper Computer Science, Hardware & Architecture

GraphQ: Scalable PIM-Based Graph Processing

Youwei Zhuo et al.

MICRO'52: THE 52ND ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (2019)

Proceedings Paper Computer Science, Hardware & Architecture

SMASH: Co-designing Software Compression and Hardware-Accelerated Indexing for Efficient Sparse Matrix Operations

Konstantinos Kanellopoulos et al.

MICRO'52: THE 52ND ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (2019)

Proceedings Paper Computer Science, Hardware & Architecture

MEDAL: Scalable DIMM based Near Data Processing Accelerator for DNA Seeding Algorithm

Wenqin Huangfu et al.

MICRO'52: THE 52ND ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (2019)

Proceedings Paper Computer Science, Software Engineering

NAPEL: Near-Memory Computing Application Performance Prediction via Ensemble Learning

Gagandeep Singh et al.

PROCEEDINGS OF THE 2019 56TH ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC) (2019)

Proceedings Paper Computer Science, Software Engineering

INVITED: Enabling Practical Processing in and near Memory for Data-Intensive Computing

Onur Mutlu et al.

PROCEEDINGS OF THE 2019 56TH ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC) (2019)

Proceedings Paper Computer Science, Software Engineering

AlignS: A Processing-In-Memory Accelerator for DNA Short Read Alignment Leveraging SOT-MRAM

Shaahin Angizi et al.

PROCEEDINGS OF THE 2019 56TH ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC) (2019)

Proceedings Paper Computer Science, Software Engineering

Automatic Generation of Warp-Level Primitives and Atomic Instructions for Fast and Portable Parallel Reduction on GPUs

Simon Garcia De Gonzalo et al.

PROCEEDINGS OF THE 2019 IEEE/ACM INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION (CGO '19) (2019)

Proceedings Paper Computer Science, Theory & Methods

GraphiDe: A Graph Processing Accelerator leveraging In-DRAM-Computing

Shaahin Angizi et al.

GLSVLSI '19 - PROCEEDINGS OF THE 2019 ON GREAT LAKES SYMPOSIUM ON VLSI (2019)

Article Biotechnology & Applied Microbiology

GRIM-Filter: Fast seed location filtering in DNA read mapping using processing-in-memory technologies

Jeremie S. Kim et al.

BMC GENOMICS (2018)

Article Computer Science, Hardware & Architecture

McDRAM: Low Latency and Energy-Efficient Matrix Computations in DRAM

Hyunsung Shin et al.

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS (2018)

Proceedings Paper Computer Science, Artificial Intelligence

CMP-PIM: An Energy-Efficient Comparator-based Processing-In-Memory Neural Network Accelerator

Shaahin Angizi et al.

2018 55TH ACM/ESDA/IEEE DESIGN AUTOMATION CONFERENCE (DAC) (2018)

Proceedings Paper Computer Science, Hardware & Architecture

Massively Parallel Skyline Computation For Processing-In-Memory Architectures

Vasileios Zois et al.

27TH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT 2018) (2018)

Proceedings Paper Computer Science, Theory & Methods

Design Space Exploration of Near Memory Accelerators

Scott Lloyd et al.

PROCEEDINGS OF THE INTERNATIONAL SYMPOSIUM ON MEMORY SYSTEMS (MEMSYS 2018) (2018)

Proceedings Paper Computer Science, Artificial Intelligence

Matrix Profile XI: SCRIMP++: Time Series Motif Discovery at Interactive Speeds

Yan Zhu et al.

2018 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM) (2018)

Proceedings Paper Computer Science, Hardware & Architecture

Solar-DRAM: Reducing DRAM Access Latency by Exploiting the Variation in Local Bitlines

Jeremie S. Kim et al.

2018 IEEE 36TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD) (2018)

Proceedings Paper Computer Science, Hardware & Architecture

GraphP: Reducing Communication for PIM-based Graph Processing with Efficient Data Partition

Mingxing Zhang et al.

2018 24TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA) (2018)

Proceedings Paper Computer Science, Hardware & Architecture

The DRAM Latency PUF: Quickly Evaluating Physical Unclonable Functions by Exploiting the Latency-Reliability Tradeoff in Modern Commodity DRAM Devices

Jeremie S. Kim et al.

2018 24TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA) (2018)

Proceedings Paper Computer Science, Software Engineering

Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks

Amirali Boroumand et al.

ACM SIGPLAN NOTICES (2018)

Article Computer Science, Hardware & Architecture

LazyPIM: An Efficient Cache Coherence Mechanism for Processing-in-Memory

Amirali Boroumand et al.

IEEE COMPUTER ARCHITECTURE LETTERS (2017)

Proceedings Paper Computer Science, Hardware & Architecture

Concurrent Data Structures for Near-Memory Computing

Zhiyu Liu et al.

PROCEEDINGS OF THE 29TH ACM SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES (SPAA'17) (2017)

Proceedings Paper Computer Science, Artificial Intelligence

The Mondrian Data Engine

Mario Drumond et al.

44TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2017) (2017)

Article Computer Science, Hardware & Architecture

CAIRO: A Compiler-Assisted Technique for Enabling Instruction-Level Offloading of Processing-In-Memory

Ramyad Hadidi et al.

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION (2017)

Proceedings Paper Computer Science, Hardware & Architecture

GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks

Lifeng Nai et al.

2017 23RD IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA) (2017)

Proceedings Paper Computer Science, Software Engineering

TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory

Mingyu Gao et al.

OPERATING SYSTEMS REVIEW (2017)

Proceedings Paper Computer Science, Theory & Methods

Toward Standardized Near-Data Processing with Unrestricted Data Placement for GPUs

Gwangsun Kim et al.

SC'17: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (2017)

Proceedings Paper Computer Science, Theory & Methods

Detecting and Mitigating Data-Dependent DRAM Failures by Exploiting Current Memory Content

Samira Khan et al.

50TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO) (2017)

Proceedings Paper Computer Science, Hardware & Architecture

TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory

Mingyu Gao et al.

TWENTY-SECOND INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS (ASPLOS XXII) (2017)

Proceedings Paper Computer Science, Hardware & Architecture

Compute Caches

Shaizeen Aga et al.

2017 23RD IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA) (2017)

Article Computer Science, Hardware & Architecture

Simultaneous Multi-Layer Access: Improving 3D-Stacked Memory Bandwidth at Low Cost

Donghyuk Lee et al.

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION (2016)

Article Computer Science, Theory & Methods

In-Place Matrix Transposition on GPUs

Juan Gomez-Luna et al.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2016)

Article Computer Science, Hardware & Architecture

Ramulator: A Fast and Extensible DRAM Simulator

Yoongu Kim et al.

IEEE COMPUTER ARCHITECTURE LETTERS (2016)

Proceedings Paper Computer Science, Hardware & Architecture

Accelerating Dependent Cache Misses with an Enhanced Memory Controller

Milad Hashemi et al.

2016 ACM/IEEE 43RD ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA) (2016)

Proceedings Paper Computer Science, Hardware & Architecture

Pinatubo: A Processing-in-Memory Architecture for Bulk Bitwise Operations in Emerging Non-volatile Memories

Shuangchen Li et al.

2016 ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC) (2016)

Article Computer Science, Hardware & Architecture

In-Memory Data Rearrangement for Irregular, Data-Intensive Computing

Scott Lloyd et al.

COMPUTER (2015)

Article Computer Science, Hardware & Architecture

Active Memory Cube: A processing-in-memory architecture for exascale systems

R. Nair et al.

IBM JOURNAL OF RESEARCH AND DEVELOPMENT (2015)

Article Engineering, Electrical & Electronic

Evolution of Memory Architecture

Ravi Nair

PROCEEDINGS OF THE IEEE (2015)

Article Computer Science, Hardware & Architecture

Fast Bulk Bitwise AND and OR in DRAM

Vivek Seshadri et al.

IEEE COMPUTER ARCHITECTURE LETTERS (2015)

Proceedings Paper Computer Science, Hardware & Architecture

Data Reorganization in Memory Using 3D-stacked DRAM

Berkin Akin et al.

2015 ACM/IEEE 42ND ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA) (2015)

Proceedings Paper Computer Science, Theory & Methods

Revisiting Memory Errors in Large-Scale Production Data Centers: Analysis and Modeling of New Trends from the Field

Justin Meza et al.

2015 45TH ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS (2015)

Proceedings Paper Computer Science, Hardware & Architecture

BSSync: Processing Near Memory for Machine Learning Workloads with Bounded Staleness Consistency Models

Joo Hwan Lee et al.

2015 INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURE AND COMPILATION (PACT) (2015)

Proceedings Paper Computer Science, Hardware & Architecture

Practical Near-Data Processing for In-memory Analytics Frameworks

Mingyu Gao et al.

2015 INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURE AND COMPILATION (PACT) (2015)

Proceedings Paper Computer Science, Hardware & Architecture

In-Place Data Sliding Algorithms for Many-Core Architectures

Juan Gomez-Luna et al.

2015 44TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP) (2015)

Proceedings Paper Engineering, Electrical & Electronic

Design, Packaging, and Architectural Policy Co-optimization for DC Power Integrity in 3D DRAM

Yarui Peng et al.

2015 52ND ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC) (2015)

Proceedings Paper Computer Science, Hardware & Architecture

A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing

Junwhan Ahn et al.

2015 ACM/IEEE 42ND ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA) (2015)

Proceedings Paper Computer Science, Hardware & Architecture

PIM-Enabled Instructions: A Low-Overhead, Locality-Aware Processing-in-Memory Architecture

Junwhan Ahn et al.

2015 ACM/IEEE 42ND ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA) (2015)

Article Computer Science, Hardware & Architecture

GP-SIMD Processing-in-Memory

Amir Morad et al.

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION (2014)

Article Computer Science, Hardware & Architecture

Efficient Data Mapping and Buffering Techniques for Multilevel Cell Phase-Change Memories

Hanbin Yoon et al.

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION (2014)

Article Computer Science, Hardware & Architecture

Near-Data Processing: Insights from a MICRO-46 Workshop

Rajeev Balasubramonian et al.

IEEE MICRO (2014)

Article Engineering, Electrical & Electronic

MAGIC-Memristor-Aided Logic

Shahar Kvatinsky et al.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS (2014)

Article Computer Science, Hardware & Architecture

Memristor-Based Material Implication (IMPLY) Logic: Design Principles and Methodologies

Shahar Kvatinsky et al.

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS (2014)

Article Engineering, Electrical & Electronic

Logic operations in memory using a memristive Akers array

Yifat Levy et al.

MICROELECTRONICS JOURNAL (2014)

Proceedings Paper Computer Science, Hardware & Architecture

Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors

Yoongu Kim et al.

2014 ACM/IEEE 41ST ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA) (2014)

Article Computer Science, Theory & Methods

Performance Modeling of Atomic Additions on GPU Scratchpad Memory

Juan Gomez-Luna et al.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2013)

Article Computer Science, Artificial Intelligence

An optimized approach to histogram computation on GPU

Juan Gomez-Luna et al.

MACHINE VISION AND APPLICATIONS (2013)

Article Engineering, Electrical & Electronic

Metal-Oxide RRAM

H.-S. Philip Wong et al.

PROCEEDINGS OF THE IEEE (2012)

Article Computer Science, Hardware & Architecture

Phase Change Memory Architecture and the Quest for Scalability

Benjamin C. Lee et al.

COMMUNICATIONS OF THE ACM (2010)

Article Computer Science, Hardware & Architecture

Phase-Change Technology and the Future of Main Memory

Benjamin C. Lee et al.

IEEE MICRO (2010)

Article Engineering, Electrical & Electronic

Phase Change Memory

H.-S. Philip Wong et al.

PROCEEDINGS OF THE IEEE (2010)

Article Computer Science, Hardware & Architecture

Roofline: An Insightful Visual Performance Model for Multicore Architectures

Samuel Williams et al.

COMMUNICATIONS OF THE ACM (2009)

Article Multidisciplinary Sciences

The missing memristor found

Dmitri B. Strukov et al.

NATURE (2008)

Article Computer Science, Software Engineering

An updated set of Basic Linear Algebra Subprograms (BLAS)

LS Blackford et al.

ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE (2002)

Article Computer Science, Hardware & Architecture

Challenges and future directions for the scaling of dynamic random-access memory (DRAM)

JA Mandelman et al.

IBM JOURNAL OF RESEARCH AND DEVELOPMENT (2002)