TRAM: An open-source template-based reconfigurable architecture modeling framework

Y Qiu, Y Cao, Y Dai, W Yin, L Wang
2022 32nd International Conference on Field-Programmable Logic and …, 2022 - ieeexplore.ieee.org
Coarse-grained reconfigurable architectures (CGRAs) are a promising accelerator design choice because of their high performance and power efficiency in computation- and data-intensive application domains such as security, multimedia, digital signal processing, machine learning, and high-performance computing. A CGRA consists of coarse-grained processing elements (PEs) and interconnects, which determine the architecture's flexibility to support different applications and also significantly affect performance and power efficiency. Although multiple types of interconnects have been proposed, a parameterized unified model is still lacking. In this paper, we propose a flexible and scalable CGRA template with a novel interconnect model that unifies the typical neighbor-to-neighbor, switch-based, and FPGA-like interconnects. Furthermore, we present TRAM, an open-source template-based reconfigurable architecture modeling framework that integrates Chisel-based CGRA modeling, architecture intermediate representation (IR) and Verilog generation, dataflow graph (DFG) mapping, simulation, and evaluation. The mapping flow comprises graph-based placement and routing, critical-path-driven data synchronization, and simulated-annealing-based optimization. We evaluate the impact of the template's rich design parameters; the results demonstrate the value of such a flexible template for architecture optimization. Compared with related work, TRAM achieves 4.1× lower DFG latency and faster mapping for both 8×8 and 16×16 CGRAs. Moreover, with architecture tuning, TRAM attains a very high average PE utilization of 94.4%.
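To make the idea of simulated-annealing-based placement concrete, the following is a minimal Python sketch that places DFG nodes onto a rectangular grid of PEs. It is an illustration under stated assumptions, not TRAM's actual mapper: the cost function (total Manhattan distance between connected nodes), the swap move, the cooling schedule, and all names (`wirelength`, `anneal_placement`, `t0`, `alpha`) are hypothetical, and routing and data synchronization are not modeled.

```python
import math
import random


def wirelength(edges, place):
    """Sum of Manhattan distances between the PEs of connected DFG nodes."""
    return sum(abs(place[u][0] - place[v][0]) + abs(place[u][1] - place[v][1])
               for u, v in edges)


def anneal_placement(nodes, edges, rows, cols,
                     steps=5000, t0=2.0, alpha=0.999, seed=0):
    """Place each DFG node on a distinct PE of a rows x cols grid.

    Assumed cost: total Manhattan wirelength (a stand-in for a real
    latency or routability model). Returns (placement, cost, initial_cost).
    """
    rng = random.Random(seed)
    n_slots = rows * cols
    if len(nodes) > n_slots:
        raise ValueError("more DFG nodes than PEs")

    # One entry per PE slot: a node name, or None for an unused PE.
    grid = list(nodes) + [None] * (n_slots - len(nodes))
    rng.shuffle(grid)

    def placement():
        return {n: divmod(i, cols) for i, n in enumerate(grid) if n is not None}

    cost = wirelength(edges, placement())
    initial_cost = cost
    best, best_cost = placement(), cost
    temp = t0

    for _ in range(steps):
        # Move: swap the contents of two PE slots (node/node or node/empty).
        i, j = rng.randrange(n_slots), rng.randrange(n_slots)
        grid[i], grid[j] = grid[j], grid[i]
        new_cost = wirelength(edges, placement())
        delta = new_cost - cost
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = placement(), cost
        else:
            grid[i], grid[j] = grid[j], grid[i]  # reject: undo the swap
        temp *= alpha  # geometric cooling

    return best, best_cost, initial_cost


# Usage: a small hypothetical DFG mapped onto a 4x4 PE array.
nodes = ["ld0", "ld1", "mul", "add", "shl", "st"]
edges = [("ld0", "mul"), ("ld1", "mul"), ("mul", "add"),
         ("ld0", "add"), ("add", "shl"), ("shl", "st")]
best, best_cost, initial_cost = anneal_placement(nodes, edges, rows=4, cols=4)
```

Recomputing the full cost after every move keeps the sketch short; a practical mapper would update the cost incrementally and would fold routing congestion and path latency into the objective.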