TPSS: A Flexible Hardware Support for Unicast and Multicast on Network-on-Chip

W Hu, Z Lu, H Liu, A Jantsch - J. Comput., 2012 - jcomputers.us
Abstract
Multicast is an important traffic mode on multi-core systems, and efficient hardware support for multicast can greatly improve the performance of the whole system. Most multicast solutions use dimension-order routing to generate the multicast trees, which are neither bandwidth-efficient nor power-efficient. This article presents a synthesizable router for network-on-chip (NoC) that supports arbitrarily shaped multicast paths on a mesh topology. In our scheme, incremental setup is adopted to simplify the process of multicast tree construction. For each sub-path setup, we present a novel scheme called two period sub-path setup (TPSS). TPSS is divided into two periods: routing to a predetermined intermediate router, and updating lookup tables from the intermediate router to the destination. This novel setup makes it feasible to support arbitrarily shaped path setup. In our case study, the Optimized tree algorithm (OPT) and the Left-XY-Right-Optimized tree algorithm (LXYROPT) are proposed for power-efficient path searching, but they need to be pre-configured because of their high computation cost. Moreover, Virtual Circuit Tree Multicasting (VCTM) is also supported in our scheme for dynamic construction of multicast paths, which needs no computation in path searching. The performance is evaluated using a cycle-accurate simulator developed in SystemC, and the hardware overhead is estimated using a synthesizable HDL model. Compared to VCTM (without FIFO, multicast table and network adapter), the area overhead of implementing our router is negligible (less than 0.5%).
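The two periods described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the router design from the paper: the abstract does not say how the first period routes to the intermediate router, so dimension-order (XY) routing is assumed here, and the names `xy_route`, `setup_sub_path`, `tables` and `mcast_id`, as well as the shape of the lookup table, are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Coord = Tuple[int, int]  # (x, y) position of a router in the mesh
# Hypothetical lookup-table model: router -> multicast id -> set of next hops.
Tables = Dict[Coord, Dict[int, Set[Coord]]]


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Dimension-order (XY) path from src to dst, including both ends."""
    x, y = src
    path = [src]
    while x != dst[0]:          # travel along X first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:          # then along Y
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def setup_sub_path(tables: Tables, mcast_id: int, src: Coord,
                   intermediate: Coord, hops: List[Coord]) -> List[Coord]:
    """Model one sub-path setup in two periods.

    Period 1 (assumption: XY routing, no table writes): the setup request
    travels from src to the intermediate router.
    Period 2: starting at the intermediate router, follow the given hops
    and record the next hop in each visited router's lookup table.
    Returns the routers traversed during period 1.
    """
    period1 = xy_route(src, intermediate)

    current = intermediate
    for nxt in hops:
        # Each hop must be a direct mesh neighbour of the current router.
        if abs(nxt[0] - current[0]) + abs(nxt[1] - current[1]) != 1:
            raise ValueError(f"{nxt} is not adjacent to {current}")
        tables.setdefault(current, {}).setdefault(mcast_id, set()).add(nxt)
        current = nxt
    return period1
```

In this model the hop list for the second period is free-form, so the branch written into the tables may take any shape on the mesh. Calling `setup_sub_path` repeatedly with the same `mcast_id` adds one branch at a time, which mirrors the incremental setup mentioned in the abstract.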