

A NOVEL RADIX-2 PIPELINE ARCHITECTURE FOR THE COMPUTATION OF THE DFT

Rainer Storn
Institut fuer Netzwerk- und Systemtheorie, University of Stuttgart
Pfaffenwaldring 47, D-7000 Stuttgart

ABSTRACT

A new radix-2 DFT algorithm will be derived that supports a pipeline architecture realization. The number of multipliers needed by the new pipeline structure can be reduced by a factor of up to four compared with the conventional and most often used radix-2 pipeline-FFT. This reduction is obtained without sacrificing computational speed.

INTRODUCTION

Several applications of digital signal processing require dedicated hardware to compute the DFT and/or its inverse in order to cope with the high processing speeds that are needed. Examples are high-speed convolution or correlation in the radar- and image-processing fields [1]...[4]. The pipeline-FFT transformer is a well-known solution that satisfies these speed requirements and has been described by many authors (cf. the references in [4]). In this paper, however, a new cascade Fourier transformer will be introduced whose most prominent feature is a reduced number of multipliers. Indeed, if the input sequence to be processed is real, the number of multipliers amounts to merely one fourth of the number required by the classical pipeline-FFT.

THE ALGORITHM

Consider the DFT of an $N$-point sequence $f_n$,

$$F_{N,m} = \sum_{n=0}^{N-1} f_n\, W_N^{nm}, \qquad m = 0, 1, \ldots, N-1, \tag{1}$$

where $W_N = e^{-j2\pi/N}$. Since $W_N^{mN} = 1$, we can recast eq. (1) after multiplication with $W_N^{-mN}$ to obtain

$$F_{N,m} = W_N^{-m} \sum_{n=0}^{N-1} f_n\, W_N^{-m[(N-1)-n]}, \tag{2}$$

which represents the $(N-1)$st output value of a cyclic convolution of $f_n$ with the impulse response $g_n = W_N^{-m(n+1)}$ of the filter with the transfer function

$$G_m(z) = \sum_{n=0}^{N-1} W_N^{-m(n+1)} z^{-n}. \tag{3}$$

Summing the geometric series in eq. (3) and using $W_N^{-mN} = 1$ gives

$$G_m(z) = \frac{X^N - 1}{X - W_N^m} \tag{4}$$

with $X = z^{-1}$. Let us assume that $N = 2^u$ and concentrate first on the numerator of eq. (4), which can be factored according to

$$X^{2^u} - 1 = \left(X^{2^{u-1}} - 1\right)\left(X^{2^{u-1}} + 1\right). \tag{5}$$

It can be verified that the second term on the right-hand side of eq. (5) can be decomposed into

$$X^{2^{u-1}} + 1 = \prod_{i=0}^{2^{s-2}-1} \left( X^{2^{u-s+1}} - 2\cos\!\left(\frac{\pi(2i+1)}{2^{s-1}}\right) X^{2^{u-s}} + 1 \right), \qquad s = 2, 3, \ldots, u, \tag{6}$$

ISCAS '88   1899   CH2458-8/88/0000-1899 $1.00 © 1988 IEEE

whereas the first term can be split up again following eq. (5) itself. The findings that have emerged so far can be applied to compose a tree-like structure depicting the different representations of $X^N - 1$ by using eqs. (5) and (6). An example for $N = 2^4 = 16$ is given in figure 1. The next step is crucial for understanding the construction of the DFT filter tree that will be inferred from figure 1, a simplified version of which can be seen in figure 2a. If we swap the positions of the terms $(X^8 - 1)$ and $(X^8 + 1)$, the equal-signs to their right have to be replaced by multiplication-signs in order to maintain the validity of the equation for $(X^{16} - 1)$, which now actually corresponds to two "horizontal" right-hand sides (figure 2b) instead of a single "vertical" one (figure 2a). It is easy to see that this swapping procedure can be applied to every term pair from the root up to the leaves of the tree in figure 1, which renders 16 "horizontal" equations for $X^{16} - 1$, i.e. $N$ representations of the numerator $X^N - 1$. Now all filters $G_m(z)$ emerge from this modified "swapped" tree by canceling the term $(X - W_N^m)$ (see eq. (4)). The appropriate
final result in this example is the DFT filter tree shown in figure 3.

THE PIPELINE ARCHITECTURE FOR REAL INPUT SIGNALS

Figure 4 shows a flowgraph representation of the new algorithm, which is more convenient than a filter tree for deriving an architecture. Note that the correspondence

$$\cos\left(\pi\,\frac{2^{u+1}-k}{2^{u+1}}\right) = -\cos\left(\pi\,\frac{k}{2^{u+1}}\right)$$

has been made use of in order to decrease the number of distinct multipliers. It appears that the new algorithm leads to essentially the same flowgraph as the algorithm invented by Bruun [5]. Nevertheless, the new derivation reveals Bruun's complicated coefficients to be simple cosine values and allows for the formulation of a relatively simple unscrambling procedure, which will not be elaborated upon here. The most important novelty, however, is the transposition of this flowgraph into a pipeline architecture, an example of which is depicted in figure 5 for the case $N = 16$. The boxes SW1, SW2 and SW3 represent commutator switches [4] that are capable of providing either straight-through or crisscross connections. The switches $\delta_{ik}$ are closed for $i = k$ and open otherwise; the switches $\bar{\delta}_{ik}$ behave in the opposite way. Where $\delta_{ik}$ is part of a multiplier coefficient, it denotes the Kronecker symbol. It is recommended to refer to figure 6 in order to follow the processing sequence of the architecture. Compared with a conventional pipeline-FFT, the new processor needs merely one fourth of the number of multipliers, since neither the complex radix-2 FFT nor its intricate derivatives tailored to real input sequences [6], [7] lend themselves to a pipeline architecture with a reduced number of multipliers.

INVERSE DFT

Fast convolutions performed via the DFT call for an inverse transform, which should also provide for the unscrambling of the output data. This can be obtained in two steps that are valid for any FFT algorithm: first, the flowgraph of the complex version of the forward transform has to be transposed; then all multipliers have to be replaced by their complex conjugates. Apart from the scale factor $1/N$, this yields the inverse DFT. By applying these steps to the new algorithm, the pipeline architecture for the inverse transform can be designed.

COMPLEX DFT

If complex sequences have to be transformed, two pipelines are needed, one for the real and one for the imaginary part. In this case the new architecture still needs only half the number of multipliers required by the conventional pipeline-FFT.

REFERENCES

[1] Martinson, L.W. and Smith, R.J., "Digital Matched Filtering with Pipelined Floating Point FFTs", IEEE Trans. ASSP, ASSP-23, 1975, pp. 222-234.
[2] Dudgeon, D.E. and Mersereau, R.M., "Multidimensional Signal Processing", Prentice-Hall, NJ, 1984.
[3] Johnston, J.A., "Parallel Pipeline Fast Fourier Transformer", IEE Proc., Vol. 130, Part F, No. 6, 1983, pp. 564-572.
[4] Oppenheim, A.V., Ed., "Applications of Digital Signal Processing", Prentice-Hall, NJ, 1978.
[5] Bruun, G., "Z-Transform DFT-Filters and FFTs", IEEE Trans. ASSP, ASSP-26, 1978, pp. 1047-1057.
[6] Sorensen, H.V. et alii, "Real-valued FFT-Algorithms", IEEE Trans. ASSP, ASSP-35, 1987, pp. 849-863.
[7] Storn, R., "Fast Algorithms for the Discrete Hartley Transform", AEÜ, Vol. 40, No. 4, July 1986.
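Eqs. (2) to (4) state that $F_{N,m}$ is the $(N-1)$st output sample of a filter with impulse response $g_n = W_N^{-m(n+1)}$. The Python sketch below checks this numerically for one bin at a time. It assumes $W_N = e^{-j2\pi/N}$ and eq. (2) in the form $F_{N,m} = W_N^{-m}\sum_n f_n W_N^{-m[(N-1)-n]}$; the function names (`dft_direct`, `bin_by_fir`, `bin_by_recursion`) are hypothetical. The bin is computed both as a direct FIR convolution and as a one-pole recursion. The recursion is valid because truncating the filter to $N$ taps only affects outputs from sample $N$ on.

```python
import cmath
import math


def dft_direct(f):
    """Eq. (1) evaluated directly, assuming W_N = exp(-2j*pi/N)."""
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * math.pi * n * m / N) for n in range(N))
            for m in range(N)]


def bin_by_fir(f, m):
    """F_{N,m} as output sample N-1 of the FIR filter g_n = W_N^(-m(n+1)), n = 0..N-1."""
    N = len(f)
    w_inv = cmath.exp(2j * math.pi * m / N)          # W_N^(-m)
    g = [w_inv ** (n + 1) for n in range(N)]         # impulse response g_n
    # convolution output at time N-1: sum over k of g_k * f_(N-1-k)
    return sum(g[k] * f[N - 1 - k] for k in range(N))


def bin_by_recursion(f, m):
    """Same sample via the one-pole recursion y_n = W_N^(-m) * (y_(n-1) + f_n).

    The one-pole filter has the untruncated response W_N^(-m(n+1)); truncating
    it to N taps (the factor 1 - z^(-N)) only changes outputs from sample N on,
    so sample N-1 is unaffected.
    """
    w_inv = cmath.exp(2j * math.pi * m / len(f))     # W_N^(-m)
    y = 0j
    for x in f:
        y = w_inv * (y + x)
    return y
```

Computing one bin this way costs about $N$ complex multiplications, so all $N$ bins cost on the order of $N^2$. The factorization of the numerator $X^N - 1$ in eqs. (5) and (6) is what lets the bins share work.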


Figure 1. A decomposition-tree for the term $(X^{16} - 1)$. Two symbols in the figure denote the mathematical multiplication- and equal-sign, respectively.

Figure 2a. Simplified representation of figure 1. The content of the rectangles No. I and II is not shown any more.

Figure 2b. If the terms $(X^8 - 1)$ and $(X^8 + 1)$ in figure 2a are swapped, the mathematical signs have to be altered to maintain the validity of the equation for $(X^{16} - 1)$, which now equals two right-hand sides.

Figure 3. DFT filter tree for $N = 16$ with the outputs $F_{16,0}, \ldots, F_{16,15}$ at its leaves. The depicted filter tree can be inferred from figure 1 by successive application of the swapping procedure from figure 2 and division by the appropriate terms $(X - W_{16}^m)$.
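The filter tree of figure 3 delivers every $F_{16,m}$ from the factors of $X^{16} - 1$. As a software illustration only, and not the paper's pipeline hardware, the Python sketch below uses the equivalent polynomial-remainder view associated with Bruun's algorithm [5]. Since $F_{N,m} = f(W_N^m)$ for $f(X) = \sum_n f_n X^n$, reducing $f(X)$ modulo the factors of eqs. (5) and (6) one tree level at a time leaves, at each leaf, a remainder whose value at the leaf's roots is a DFT output. The sketch assumes $W_N = e^{-j2\pi/N}$, and the names `tree_dft`, `poly_mod`, `binomial` and `trinomial` are hypothetical.

```python
import cmath
import math


def poly_mod(p, d):
    """Remainder of p modulo the monic polynomial d (coefficient lists, lowest degree first)."""
    r = list(p)
    n = len(d) - 1                       # degree of the divisor
    for i in range(len(r) - 1, n - 1, -1):
        c = r[i]
        if c:
            for j in range(n + 1):       # subtract c * X^(i-n) * d
                r[i - n + j] -= c * d[j]
    return r[:n]


def binomial(M, sign):
    """X^M - 1 (sign = -1) or X^M + 1 (sign = +1)."""
    d = [0.0] * (M + 1)
    d[0], d[M] = float(sign), 1.0
    return d


def trinomial(K, theta):
    """X^(2K) - 2 cos(theta) X^K + 1; theta = pi/2 gives X^(2K) + 1."""
    d = [0.0] * (2 * K + 1)
    d[0] = d[2 * K] = 1.0
    d[K] = -2.0 * math.cos(theta)
    return d


def tree_dft(f):
    """DFT of f (length N = 2^u) by reducing f(X) = sum_n f_n X^n modulo the
    factors of X^N - 1, level by level; F_m is the remainder evaluated at W_N^m."""
    N = len(f)
    assert N >= 2 and N & (N - 1) == 0, "N must be a power of two"
    F = [0j] * N

    def minus_node(rem, M):              # node X^M - 1
        if M == 1:                       # leaf X - 1: root X = 1, i.e. m = 0
            F[0] = complex(rem[0])
            return
        h = M // 2                       # eq. (5): X^M - 1 = (X^h - 1)(X^h + 1)
        minus_node(poly_mod(rem, binomial(h, -1)), h)
        plus = poly_mod(rem, binomial(h, +1))
        if h == 1:                       # leaf X + 1: root X = -1, i.e. m = N/2
            F[N // 2] = complex(plus[0])
        else:                            # X^h + 1 as a trinomial with theta = pi/2
            tri_node(plus, h // 2, math.pi / 2)

    def tri_node(rem, K, theta):         # node X^(2K) - 2 cos(theta) X^K + 1
        if K == 1:                       # leaf: remainder a + b X, roots exp(-/+ j theta)
            a, b = (list(rem) + [0.0, 0.0])[:2]
            m = round(theta * N / (2 * math.pi)) % N
            F[m] = a + b * cmath.exp(-1j * theta)
            F[-m % N] = a + b * cmath.exp(1j * theta)
            return
        # split into two trinomials whose middle coefficients are -/+ 2 cos(theta/2)
        for phi in (theta / 2, math.pi - theta / 2):
            tri_node(poly_mod(rem, trinomial(K // 2, phi)), K // 2, phi)

    minus_node(list(f), N)   # deg f < N, so f is its own remainder mod X^N - 1
    return F
```

The two children of a node $X^{2K} - 2\cos\theta\,X^K + 1$ are $X^K \mp 2\cos(\theta/2)\,X^{K/2} + 1$. They share one multiplier value up to its sign, which matches the cosine correspondence used for the flowgraph. For real $f_n$ all reductions stay in real arithmetic, and complex numbers appear only in the final evaluation at the leaves.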


Figure 5. Pipeline structure to evaluate the DFT of figure 4.

Figure 6. Timing-diagram for the DFT-processor of figure 5.
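The INVERSE DFT and COMPLEX DFT rules can be checked at the matrix level. The Python sketch below is a small illustration rather than the pipeline itself. `dft_matrix` stands in for any forward algorithm (assuming $W_N = e^{-j2\pi/N}$), and all function names are hypothetical. By the transposition principle, reversing every branch of a flowgraph realizes the transposed matrix, so transposing and conjugating the DFT matrix mirrors the two flowgraph steps. The result is $N$ times the inverse, hence the explicit $1/N$.

```python
import cmath
import math


def dft_matrix(N):
    """Forward DFT matrix D[m][n] = W_N^(mn), assuming W_N = exp(-2j*pi/N)."""
    return [[cmath.exp(-2j * math.pi * m * n / N) for n in range(N)] for m in range(N)]


def transpose_and_conjugate(A):
    """Step 1: transpose (reverse the flowgraph); step 2: conjugate every multiplier."""
    rows, cols = len(A), len(A[0])
    return [[A[r][c].conjugate() for r in range(rows)] for c in range(cols)]


def matvec(A, x):
    """Matrix-vector product A x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def inverse_dft(F):
    """Inverse DFT from the transposed, conjugated forward transform plus the 1/N scaling."""
    N = len(F)
    return [v / N for v in matvec(transpose_and_conjugate(dft_matrix(N)), F)]


def complex_dft_via_real(x, real_dft):
    """Complex DFT from two real-input transforms (one for Re{x}, one for Im{x}),
    combined by linearity: DFT(x) = DFT(Re x) + j * DFT(Im x)."""
    F_re = real_dft([v.real for v in x])
    F_im = real_dft([v.imag for v in x])
    return [a + 1j * b for a, b in zip(F_re, F_im)]
```

In hardware terms, `real_dft` would be the real-input pipeline instantiated twice. The final combination step needs only additions, since $\mathrm{Re}\,F = \mathrm{Re}\,F_{re} - \mathrm{Im}\,F_{im}$ and $\mathrm{Im}\,F = \mathrm{Im}\,F_{re} + \mathrm{Re}\,F_{im}$.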
