The SNAP project: Design of floating point arithmetic units
SF Oberman, H Al-Twaijry… - Proceedings 13th IEEE …, 1997 - ieeexplore.ieee.org
SF Oberman, H Al-Twaijry, MJ Flynn
Proceedings 13th IEEE Symposium on Computer Arithmetic, 1997 • ieeexplore.ieee.org
In recent years computer applications have increased in their computational complexity. The industry-wide usage of performance benchmarks, such as SPECmarks, and the popularity of 3D graphics applications force processor designers to pay particular attention to the implementation of the floating point unit, or FPU. The paper presents results of the Stanford subnanosecond arithmetic processor (SNAP) research effort in the design of hardware for floating point addition, multiplication and division. We show that one-cycle FP addition is achievable 32% of the time using a variable-latency algorithm. For multiplication, a binary tree is often inferior to a Wallace tree designed using an algorithmic layout approach for contemporary feature sizes (0.3 μm). Further, in most cases two-bit Booth encoding of the multiplier is preferable to non-Booth encoding for partial product generation. It appears that for division, optimum area-performance is achieved using functional iteration, and we present two techniques to further reduce average division latency.