Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud

Oliveira, Geraldo F.; Gómez-Luna, Juan; Ghose, Saugata; Boroumand, Amirali; Mutlu, Onur

arXiv:2209.08938 (cs)

[Submitted on 19 Sep 2022 (v1), last revised 27 Mar 2023 (this version, v2)]

Title: Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud

Authors: Geraldo F. Oliveira, Juan Gómez-Luna, Saugata Ghose, Amirali Boroumand, Onur Mutlu

Abstract: Neural networks (NNs) are growing in importance and complexity. A neural network's performance (and energy efficiency) can be bounded by either computation or memory resources. The processing-in-memory (PIM) paradigm, where computation is placed near or within memory arrays, is a viable solution to accelerate memory-bound NNs. However, PIM architectures vary in form, and different PIM approaches lead to different trade-offs. Our goal is to analyze, discuss, and contrast DRAM-based PIM architectures in terms of NN performance and energy efficiency. To do so, we analyze three state-of-the-art PIM architectures: (1) UPMEM, which integrates processors and DRAM arrays into a single 2D chip; (2) Mensa, a 3D-stack-based PIM architecture tailored for edge devices; and (3) SIMDRAM, which uses the analog principles of DRAM to execute bit-serial operations. Our analysis reveals that PIM greatly benefits memory-bound NNs: (1) UPMEM provides 23x the performance of a high-end GPU when the GPU requires memory oversubscription for a general matrix-vector multiplication kernel; (2) Mensa improves energy efficiency and throughput by 3.0x and 3.1x over the Google Edge TPU for 24 Google edge NN models; and (3) SIMDRAM outperforms a CPU/GPU by 16.7x/1.4x for three binary NNs. We conclude that the ideal PIM architecture for NN models depends on a model's distinct attributes, due to each architecture's inherent design choices.
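The distinction between compute-bound and memory-bound NNs can be made concrete with the roofline model. The Python sketch below is illustrative and is not taken from the paper. It assumes FP32 operands, counts each operand as crossing the memory bus exactly once, and uses hypothetical processor figures (10,000 GFLOP/s peak compute, 1,000 GB/s DRAM bandwidth) rather than measurements of any system the paper evaluates. Under these assumptions, a general matrix-vector multiplication (GEMV) kernel performs just under 0.5 FLOP per byte, far below the hypothetical processor's ridge point, so its throughput is capped by memory bandwidth. This is the kind of kernel the abstract identifies as benefiting from PIM.

```python
"""Roofline sketch: why GEMV tends to be memory-bound while GEMM is compute-bound.

All hardware figures below are hypothetical placeholders, not measurements.
"""

BYTES_PER_ELEM = 4  # assumption: FP32 operands


def gemv_intensity(m: int, n: int) -> float:
    """FLOPs per DRAM byte for y = A @ x, with A of shape (m, n).

    Assumes A, x, and y each cross the memory bus exactly once and counts one
    multiply plus one add per element of A.
    """
    flops = 2 * m * n
    bytes_moved = BYTES_PER_ELEM * (m * n + n + m)
    return flops / bytes_moved


def gemm_intensity(m: int, n: int, k: int) -> float:
    """FLOPs per DRAM byte for C = A @ B, A (m, k), B (k, n), ideal reuse assumed."""
    flops = 2 * m * n * k
    bytes_moved = BYTES_PER_ELEM * (m * k + k * n + m * n)
    return flops / bytes_moved


def attainable_gflops(intensity: float, peak_gflops: float, bw_gbs: float) -> float:
    """Roofline: performance is capped by peak compute or by bandwidth * intensity."""
    return min(peak_gflops, bw_gbs * intensity)


def is_memory_bound(intensity: float, peak_gflops: float, bw_gbs: float) -> bool:
    """Memory-bound when intensity falls below the ridge point (peak / bandwidth)."""
    return intensity < peak_gflops / bw_gbs


if __name__ == "__main__":
    PEAK_GFLOPS = 10_000.0  # hypothetical processor peak compute
    BW_GBS = 1_000.0        # hypothetical DRAM bandwidth

    for name, ai in [("GEMV 4096x4096", gemv_intensity(4096, 4096)),
                     ("GEMM 4096^3", gemm_intensity(4096, 4096, 4096))]:
        bound = "memory" if is_memory_bound(ai, PEAK_GFLOPS, BW_GBS) else "compute"
        perf = attainable_gflops(ai, PEAK_GFLOPS, BW_GBS)
        print(f"{name}: {ai:.2f} FLOP/B, {perf:.0f} GFLOP/s, {bound}-bound")
```

With these placeholder numbers, the GEMV lands near 0.5 FLOP/B and sits on the bandwidth slope of the roofline, while a large square GEMM lands near 683 FLOP/B and reaches the compute roof. Raising effective bandwidth, which PIM aims to do by computing next to the DRAM arrays, lifts the sloped part of the roof and so helps the GEMV case but not the GEMM case.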

Comments:	This is an extended and updated version of a paper published in IEEE Micro, pp. 1-14, 29 Aug. 2022. arXiv admin note: text overlap with arXiv:2109.14320
Subjects:	Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Cite as:	arXiv:2209.08938 [cs.AR]
	(or arXiv:2209.08938v2 [cs.AR] for this version)
	https://doi.org/10.48550/arXiv.2209.08938

Submission history

From: Geraldo Francisco De Oliveira Junior
[v1] Mon, 19 Sep 2022 11:46:05 UTC (1,332 KB)
[v2] Mon, 27 Mar 2023 17:16:03 UTC (966 KB)