Maximizing CNN Throughput on FPGA Clusters

R Li, K Liu, M Zhao, Z Shen, X Cai, Z Jia - Proceedings of the 2020 ACM …, 2020 - dl.acm.org
Proceedings of the 2020 ACM/SIGDA International Symposium on Field …, 2020 - dl.acm.org
The Field Programmable Gate Array (FPGA) platform has been a popular choice for deploying Convolutional Neural Networks (CNNs) because of its high parallelism and low energy consumption. Because on-chip resources on a single board are limited, FPGA clusters have become a promising way to improve CNN throughput. In this paper, we first put forward strategies to optimize resource allocation within and across FPGA boards. We then model the multi-board cluster problem and design algorithms based on the knapsack problem and dynamic programming to calculate the optimal topology of the FPGA cluster. We also give a quantitative analysis of the inter-board data transmission bandwidth requirement. To make our design accommodate more situations, we provide solutions for deploying fully connected layers and special convolution layers with large memory requirements. Experimental results show that typical well-known CNNs deployed with the proposed FPGA cluster topology achieve higher throughput per board than single-board solutions and other multi-board solutions.