Linear pseudosystolic array for partitioned matrix algorithms

JH Moreno, ME Figueroa, T Lang - … of VLSI signal processing systems for …, 1991 - Springer
Journal of VLSI signal processing systems for signal, image and video technology, 1991, Springer
Abstract
We describe a class-specific linear pseudosystolic array, with K processing elements, suitable for partitioned execution of matrix algorithms. This array achieves high efficiency, exploits pipelining within cells in a simple manner, has an off-cell communication rate lower than its computation rate, requires only a small amount of storage inside each cell (whose size is independent of the problem size), and uses external storage. This array has been derived from the application of the multimesh graph (MMG) method to a large class of matrix algorithms.
Processing elements (cells) use the decoupled access/execute model of computation, which requires two programs in each cell: one controlling the execution of operations and the other the data transfers. All storage modules in the array are accessed as FIFO queues, without the need for addressing mechanisms. We describe the proposed instruction set, which includes single-instruction loops with no overhead, and block-loops with just one extra instruction. Moreover, cells can nest up to three loops with no added overhead. These features are needed for mapping algorithms with the MMG method.
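The decoupled access/execute idea with FIFO-only storage can be illustrated with a minimal sketch. This is not the instruction set proposed in the paper: the function names (`access_program`, `execute_program`, `cell_inner_product`) and the inner-product workload are assumptions chosen for illustration, and the two programs run one after the other here, whereas in a real cell they would run concurrently.

```python
from collections import deque


def access_program(a, b, queue_a, queue_b):
    """Hypothetical data-transfer program: pushes operands into FIFO
    queues in exactly the order the execute program will consume them."""
    for x, y in zip(a, b):
        queue_a.append(x)
        queue_b.append(y)


def execute_program(queue_a, queue_b, n):
    """Hypothetical execute program: pops operands from the queue heads.
    It never computes an address; ordering alone identifies the data."""
    acc = 0
    for _ in range(n):
        acc += queue_a.popleft() * queue_b.popleft()
    return acc


def cell_inner_product(a, b):
    """Models one cell: two separate programs coupled only by FIFO queues."""
    queue_a, queue_b = deque(), deque()
    access_program(a, b, queue_a, queue_b)
    return execute_program(queue_a, queue_b, len(a))
```

For example, `cell_inner_product([1, 2, 3], [4, 5, 6])` evaluates the inner product of the two vectors. The point of the sketch is that the execute side contains only arithmetic and queue pops, so all knowledge of data layout lives in the transfer side.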
Mapping onto this array is illustrated using the LU-decomposition algorithm, and results obtained with other algorithms are also given. Estimates of performance indicate that it is possible to achieve over 85% efficiency, with low requirements in communication bandwidth and storage.
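For reference, the sequential LU-decomposition algorithm can be sketched as follows. This is the textbook Doolittle form without pivoting, shown only to make the loop structure concrete; it is not the partitioned mapping described in the paper, and the name `lu_decompose` is an assumption of this sketch.

```python
def lu_decompose(a):
    """Textbook LU decomposition without pivoting (assumes nonzero pivots).
    Returns (l, u) with l unit lower triangular and u upper triangular."""
    n = len(a)
    u = [row[:] for row in a]  # working copy that becomes U
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):                 # pivot step
        for i in range(k + 1, n):      # rows below the pivot
            l[i][k] = u[i][k] / u[k][k]
            for j in range(k, n):      # columns from the pivot onward
                u[i][j] -= l[i][k] * u[k][j]
    return l, u
```

The computation is a nest of three loops over the indices `k`, `i`, and `j`; how that nest is partitioned across the K cells is the subject of the paper and is not reproduced here.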