DORY: Lightweight memory hierarchy management for deep NN inference on IoT endnodes: work-in-progress

A Burrello, F Conti, A Garofalo, D Rossi… - Proceedings of the International Conference on Hardware/Software Codesign …, 2019 - dl.acm.org
IoT endnodes often couple a small, fast L1 scratchpad memory with a higher-capacity but lower-bandwidth and slower L2 background memory. The absence of a coherent hardware cache hierarchy saves energy but comes at the cost of labor-intensive explicit memory management, complicating the deployment of algorithms with a large data memory footprint, such as Deep Neural Network (DNN) inference. In this work, we present DORY, a lightweight software cache dedicated to DNN Deployment Oriented to memoRY. DORY leverages static data tiling and DMA-based double buffering to hide the complexity of manual L1-L2 memory traffic management. DORY enables storage of activations and weights in L2 with less than 4% performance overhead with respect to direct execution in L1. We show that a 142 kB DNN achieving 79.9% accuracy on CIFAR-10 runs 3.2X faster compared to its execution directly from L2 memory while consuming 1.9X less energy.