A Thesis
by
BHAVISHYA MURUKUTLA
MASTER OF SCIENCE
DECEMBER 2013
(December 2013)
ABSTRACT
In most communication systems, the Fourier transform is the main concept used to process the signals in the system. The FFT/IFFT is used for fast signal processing, but FFT/IFFT implementations have significant delay and occupy a very large area [1]. We therefore need an architecture that is optimized in terms of both delay and area.
The main reason for the delay and complexity of the architecture is the complex multiplications required in the Fast Fourier Transform by the twiddle factors ($w_N^r$). The proposed pipeline architecture reduces the number of these complex multiplications.
Here we propose a pipeline architecture that avoids most of the complex multiplications by the twiddle factors. The pipeline architecture uses delay elements, switch elements and a basic butterfly structure. The input data stream is divided into two half streams that are processed at the same time, and the output data stream is also produced as two half streams. The number of complex multiplications is greatly reduced, which reduces the cost of implementation.
ACKNOWLEDGEMENT
I take this opportunity to express my gratitude to my advisor and chair, Dr. Reza
Nekovei, for his invaluable guidance throughout this thesis work and my career choices, and for
Dr. Lifford McLauchlan and Dr. Sung-won Park. This thesis would not have been possible or
I would like to thank all the faculty and staff of Texas A&M University-Kingsville for their timely response and the help I received during my course of study here.
I would like to thank my sister and all my friends for being there for me with constant encouragement.
I wish to express my deep love and thankfulness to my dear parents. Their selfless and
Finally, I thank God for pouring out his wisdom and knowledge on me.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENT
REFERENCES
APPENDIX A CODE FOR IMPLEMENTATION OF DIT FFT AND PIPELINED FFT
VITA
LIST OF FIGURES
Fig. 2.8. Input to Output sequence generation of Decimation in Frequency FFT
Fig. 4.4. Worst Case Delay
LIST OF TABLES
CHAPTER I
INTRODUCTION
In most communication systems [1] and control systems, the frequency spectrum of a signal is important for calculating the frequency range of the signal, to know whether the system
Most signals are in the time domain, in which the variation is represented with respect to time. To obtain the frequency-domain signal, that is, the variation of the signal with respect to frequency [2], we need a transformation from the time domain to the frequency domain, which is done by
1) Fourier Series
2) Fourier Transform (FT)
3) Laplace Transform
4) Z-transform
The Fourier series applies only to repetitive (periodic) signals, so we use the Fourier Transform, which can be applied to both periodic and non-periodic signals [3].
components. The FT can be computed using the Discrete Fourier Transform (DFT), in which equally
[4].
$$F(k) = \sum_{n=0}^{N-1} f(n)\, e^{-j2\pi nk/N}, \qquad k = 0, 1, \dots, N-1 \qquad (1.1)$$
where F(k) is the frequency-domain signal.
The frequency-domain output F(k) is a discrete signal, since the input considered is also discrete.
The Fourier Transform can also be computed using the DTFT, in which the input signal should be
$$F(e^{j\omega}) = \sum_{n=-\infty}^{\infty} f(n)\, e^{-j\omega n} \qquad (1.2)$$
where $F(e^{j\omega})$ is the frequency-domain signal, which is continuous and periodic, and ω is the frequency.
If the input is a continuous signal, we need to sample it to obtain a discrete signal and then apply the DTFT to get the frequency-domain signal.
Applying the DFT or the DTFT to a time-domain signal gives the frequency-domain representation of the signal.
For an N-point sequence, the conversion can be done using the following [3] [4]
domain:
64 complex multiplications
56 complex additions
If we increase the number of samples in the input sequence, the number of multiplications increases very rapidly (in proportion to $N^2$).
Let us consider a 16-point sequence; the conversion requires the following:
To reduce the number of complex multiplications and additions [4], we use the FFT technique to calculate the frequency-domain signal; the conversion requires the following:
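As a worked illustration (standard operation counts, not measurements from this work), a direct N-point DFT needs $N^2$ complex multiplications and $N(N-1)$ complex additions, while a radix-2 FFT needs $\frac{N}{2}\log_2 N$ complex multiplications (counting the trivial twiddle factors) and $N\log_2 N$ complex additions:

```latex
\begin{aligned}
\text{DFT},\ N=8:&\quad N^2 = 64 \text{ multiplications},\quad N(N-1) = 56 \text{ additions}\\
\text{DFT},\ N=16:&\quad N^2 = 256 \text{ multiplications},\quad N(N-1) = 240 \text{ additions}\\
\text{FFT},\ N=8:&\quad \tfrac{8}{2}\cdot 3 = 12 \text{ multiplications},\quad 8\cdot 3 = 24 \text{ additions}\\
\text{FFT},\ N=16:&\quad \tfrac{16}{2}\cdot 4 = 32 \text{ multiplications},\quad 16\cdot 4 = 64 \text{ additions}
\end{aligned}
```

The 8-point values agree with the DFT/FFT comparison table in Chapter III.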
The Fast Fourier Transform is computed using the Cooley-Tukey [6] algorithm, which is
Basically, the Fast Fourier Transform can be computed using a radix algorithm of type radix-r, where r is an integer, and the N-point FFT can be calculated using different radices such as radix-2, radix-4, radix-8 and so on. The FFT can be implemented using two different
To increase speed, the pipeline architecture [6] [7] is used in the computation of the FFT, and in particular the Multi-path Delay Commutator (MDC) [6] [8] [9] [10] architecture is used in communication systems. The pipeline architecture also uses a butterfly element, and the butterfly can use different radices such as radix-2 and radix-4; in this the elements
The 8-point FFT using the DIT FFT is implemented in Verilog and synthesized in Quartus-II [14], and the pipeline architecture for 64 points is done
1.2 Radix:
The radix is the number of elements taken in at a time and processed by the butterfly. For radix-2, the butterfly takes 2 input elements, performs the addition and multiplication operations, and produces 2 outputs. For radix-4, there are 4 input elements and 4 output elements at a time.
1.3 Verilog:
Hardware description languages differ from software programming languages. The most widely used hardware description languages are:
1) VHDL
2) Verilog HDL
1.4 Quartus-II
The designed code is synthesized using Quartus-II. Before synthesis, the design is first simulated in ModelSim; then synthesis and implementation are done, the design is placed on the integrated circuit, pins are allocated, and the timing analysis will be
After synthesis, we get different views of the designed circuit:
1) RTL view
The program can also be downloaded into the hardware using Quartus-II
In the first step, the coding is done using a hardware description language. Here we use Verilog, but VHDL could also be used, and synthesis and implementation can also be done using Xilinx software.
[Figure: Design flow: Design (Verilog coding) → Functional simulation using ModelSim → Synthesis and implementation → Simulation → Programming and configuration]
The thesis is divided into chapters and subsections:
1) Chapter II: deals with theoretical description of different types of FFT algorithms
a) Cooley-Tukey Method
b) DIT FFT
c) DIF FFT
a) Simulation Results
b) RTL view
4) Chapter V: Conclusion
CHAPTER II
This method is the most widely used in the computation of the FFT. In it, the DFT of N points is
$$N = N_1 N_2$$
point.
Of $N_1$ and $N_2$, one is small compared with the other. If $N_1$ is the radix, the FFT is computed using the Decimation in Time FFT, and if $N_2$ is the radix, the FFT is computed using
The operation is done recursively using radix-2 DFTs. In the radix-2 DIT, the odd transform is multiplied by the phase factor, called the twiddle factor, after which the addition and subtraction operations are performed; the butterfly of the even and odd transforms is
The Fast Fourier Transform can be computed using two different methods [4]:
This is done by dividing the computation into a number of stages, calculated as:
$$v = \log_2 N$$
An N-point DFT with even N is calculated with two (N/2)-point DFTs; each (N/2)-point DFT is again computed using two (N/4)-point DFTs, and so on, until only 2-point DFTs remain.
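Basically, the Fast Fourier Transform is computed using a butterfly structure, and the operation
There are two forms of the butterfly: one is used in the DIT FFT and the other in the DIF FFT.
[Figure: Basic butterfly with inputs a and b, outputs c and d, and twiddle factors $w_N^r$ and $w_N^{(r+N/2)}$]
$$c = a + b\,w_N^{r} \qquad (2.1)$$
$$d = a + b\,w_N^{(r+N/2)} \qquad (2.2)$$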
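Twiddle Factor:
The twiddle factor is a complex root of unity used in the butterfly operation to compute the discrete Fourier transform:
$$w_N^{r} = e^{-j2\pi r/N} \qquad (2.3)$$
The butterfly requires two complex multiplications and two complex additions; we can
Consider $w_N^{N/2}$: its value is $e^{-j\pi}$. From Euler's formula, $e^{-j\theta} = \cos\theta - j\sin\theta$, so the value can be calculated as
$$w_N^{N/2} = e^{-j\pi} = \cos\pi - j\sin\pi = -1$$
Since the value is −1, the twiddle factor $w_N^{(r+N/2)} = w_N^{r}\,w_N^{N/2}$ becomes equal to $-w_N^{r}$. The butterfly therefore needs only one complex multiplication, $b\,w_N^{r}$, whose result is added to a to give c and subtracted from a to give d.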
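The twiddle factors can be tabulated for a fixed-point datapath with a small simulation-only helper. The sketch below is not part of the thesis code: the module name twiddle_gen, the scaling by $2^{FRAC}$ (FRAC = 8 gives 0.707 ≈ 181/256) and round-to-nearest are assumptions. It uses the Verilog-2005 real math functions $cos and $sin, so it is intended for simulation or table generation, not synthesis.

```verilog
// Simulation-only sketch: fixed-point twiddle factor
//   w_N^r = e^(-j*2*pi*r/N) = cos(2*pi*r/N) - j*sin(2*pi*r/N),
// scaled by 2^FRAC and rounded. Hypothetical helper, not a thesis module.
module twiddle_gen #(
  parameter integer N    = 8,  // transform size
  parameter integer FRAC = 8   // fractional bits: 1.0 is represented as 2^FRAC
) (
  input      [7:0]             r,     // twiddle index
  output reg signed [FRAC+1:0] w_re,  // round( cos(2*pi*r/N) * 2^FRAC)
  output reg signed [FRAC+1:0] w_im   // round(-sin(2*pi*r/N) * 2^FRAC)
);
  localparam real PI = 3.14159265358979;

  // Round a real number to the nearest integer (halves away from zero).
  function integer round_real(input real x);
    begin
      if (x >= 0.0) round_real = $rtoi(x + 0.5);
      else          round_real = -$rtoi(-x + 0.5);
    end
  endfunction

  always @* begin
    w_re = round_real( $cos(2.0 * PI * r / N) * (1 << FRAC));
    w_im = round_real(-$sin(2.0 * PI * r / N) * (1 << FRAC));
  end
endmodule
```

For N = 8 this gives 181 − j181 for $w_8^1$ and 0 − j256 for $w_8^2$. The entry for r + N/2 is the negative of the entry for r, which is consistent with $w_N^{(r+N/2)} = -w_N^{r}$.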
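[Figure: Simplified DIT butterfly: b is multiplied by $w_N^r$; the product is added to a to give c and, after multiplication by −1, added to a to give d]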
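A minimal sketch of the simplified radix-2 DIT butterfly, c = a + b·w and d = a − b·w, built around a single complex multiplication. The module name, the 16-bit signed samples and the Q8 twiddle format are illustrative assumptions; the thesis modules bfly1 to bfly4 use 8-bit operands instead. There is no overflow or rounding control.

```verilog
// Simplified radix-2 DIT butterfly (sketch): c = a + b*w, d = a - b*w.
// Assumed formats: WIDTH-bit signed samples, twiddle w scaled by 2^FRAC.
module dit_butterfly #(
  parameter WIDTH = 16,
  parameter FRAC  = 8
) (
  input  signed [WIDTH-1:0] a_re, a_im,   // input a
  input  signed [WIDTH-1:0] b_re, b_im,   // input b
  input  signed [WIDTH-1:0] w_re, w_im,   // twiddle factor w_N^r in Q(FRAC)
  output signed [WIDTH-1:0] c_re, c_im,   // c = a + b*w
  output signed [WIDTH-1:0] d_re, d_im    // d = a - b*w
);
  // The only complex multiplication:
  //   b*w = (b_re*w_re - b_im*w_im) + j(b_re*w_im + b_im*w_re)
  wire signed [2*WIDTH-1:0] p_re = b_re * w_re - b_im * w_im;
  wire signed [2*WIDTH-1:0] p_im = b_re * w_im + b_im * w_re;

  // Remove the 2^FRAC scale with an arithmetic shift (rounds toward -inf).
  wire signed [WIDTH-1:0] bw_re = p_re >>> FRAC;
  wire signed [WIDTH-1:0] bw_im = p_im >>> FRAC;

  // w_N^(r+N/2) = -w_N^r, so the lower output reuses the same product.
  assign c_re = a_re + bw_re;
  assign c_im = a_im + bw_im;
  assign d_re = a_re - bw_re;
  assign d_im = a_im - bw_im;
endmodule
```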
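Butterfly used in DIF FFT [1]:
[Figure: DIF butterfly: c is the sum of a and b; the difference a − b is multiplied by $w_N^r$ to give d]
Fig. 2.3. Butterfly structure used in the DIF FFT
The results from the butterfly structure are given by
$$c = a + b \qquad (2.8)$$
$$d = (a - b)\,w_N^{r} \qquad (2.9)$$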
In the DIT FFT, the input is given in bit-reversed order and the output is in the normal order.
In the DIF FFT, the input is in the normal order and the output is in bit-reversed order.
The bit-reversed order is generated by writing each index in binary and reversing its bits: the first and last bits are exchanged, then the second and second-to-last bits, and so on.
For the bit-reversal order, consider an 8-point input; the bit-reversed order is as shown below:
Original sample   Binary representation   Bit-reversed binary   Reordered sample
X(0)              000                     000                   X(0)
X(1)              001                     100                   X(4)
X(2)              010                     010                   X(2)
X(3)              011                     110                   X(6)
X(4)              100                     001                   X(1)
X(5)              101                     101                   X(5)
X(6)              110                     011                   X(3)
X(7)              111                     111                   X(7)
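In hardware, the bit-reversed index needs only wiring, because it is the address bits read in the opposite order. The module below is a hypothetical sketch (its name and the BITS parameter are assumptions). For BITS = 3 it reproduces the 8-point ordering 0, 4, 2, 6, 1, 5, 3, 7 shown in the table.

```verilog
// Bit-reversal of a BITS-bit index: output bit i is input bit (BITS-1-i).
// For BITS = 3, index 6 = 110 maps to 011 = 3. Pure wiring, no logic gates.
module bit_reverse #(parameter BITS = 3) (
  input  [BITS-1:0] idx,
  output [BITS-1:0] rev
);
  genvar i;
  generate
    for (i = 0; i < BITS; i = i + 1) begin : g_rev
      assign rev[i] = idx[BITS-1-i];
    end
  endgenerate
endmodule
```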
In this algorithm, x(n) is broken down into smaller subsequences. The principle of the decimation-in-time FFT can be explained by considering a number of input points
$$N = 2^{v}$$
x(n) is broken down into two parts: one contains only the even-indexed samples and the other the odd-indexed samples.
The frequency-domain signal is obtained from the time-domain signal using the formula:
$$X(k) = \sum_{n=0}^{N-1} x(n)\, w_N^{nk} \qquad (2.10)$$
Breaking the signal into two subsequences gives the frequency-domain signal as:
$$X(k) = \sum_{n\ \mathrm{even}} x(n)\, w_N^{nk} + \sum_{n\ \mathrm{odd}} x(n)\, w_N^{nk} \qquad (2.11)$$
Here n is replaced by 2r for the even terms (and by 2r + 1 for the odd terms), where r varies from 0 to (N/2) − 1; the above equation can be
By the symmetry property we can break the twiddle factor, and the frequency domain is the sum of
[Figure: First decomposition of the 8-point DIT FFT: the even-indexed samples feed an (N/2)-point DFT producing G(0) to G(3), the odd-indexed samples feed an (N/2)-point DFT producing H(0) to H(3), and the two are combined with twiddle factors $w_N^0$ to $w_N^7$ to give the output frequency responses]
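For clarity, the even/odd split can be carried through explicitly. This is the standard derivation; G(k) and H(k) denote the (N/2)-point DFTs of the even- and odd-indexed samples, and the step uses $w_N^{2} = w_{N/2}$:

```latex
\begin{aligned}
X(k) &= \sum_{r=0}^{N/2-1} x(2r)\, w_N^{2rk} + \sum_{r=0}^{N/2-1} x(2r+1)\, w_N^{(2r+1)k}\\
     &= \sum_{r=0}^{N/2-1} x(2r)\, w_{N/2}^{rk} + w_N^{k} \sum_{r=0}^{N/2-1} x(2r+1)\, w_{N/2}^{rk}
      = G(k) + w_N^{k} H(k),\\
X\!\left(k+\tfrac{N}{2}\right) &= G(k) - w_N^{k} H(k), \qquad k = 0, 1, \dots, \tfrac{N}{2}-1
\end{aligned}
```

The last line follows from the N/2-periodicity of G and H and from $w_N^{k+N/2} = -w_N^{k}$. Together, the two lines form one radix-2 butterfly per value of k.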
Dividing the input sequence into odd and even samples can be done by giving the input in bit-reversed
Again, each (N/2)-point DFT is divided into two (N/4)-point DFTs, and so on; the process is repeated until only 2-point DFTs remain.
1) Radix-2
2) Radix-4
Radix-2 DIT FFT:
[Figure: Radix-2 DIT decomposition of an 8-point sequence]
8-point: 0 1 2 3 4 5 6 7
4-point: 0 2 4 6 | 1 3 5 7
2-point: 0 4 | 2 6 | 1 5 | 3 7
Radix-4 FFT:
The radix-4 basic butterfly diagram is as shown below:
[Figure: Radix-4 butterfly with twiddle factors $w_N^0$, $w_N^q$, $w_N^{2q}$ and $w_N^{3q}$ and internal multiplications by 1, −1 and −j]
2.3 DIF FFT [1]:
The DIF FFT takes the input in normal order and produces the output in bit-reversed order.
In this, the N-point sequence is divided into two (N/2)-point sequences, as shown below:
The first sequence is x(n), where 0 ≤ n ≤ (N/2) − 1.
The second sequence is x(n + N/2), where 0 ≤ n ≤ (N/2) − 1.
1) Radix-2
2) Radix-4
Radix-2 DIF FFT:
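In this, the N-point sequence is divided into two parts, and the two parts are individually divided as shown below:
[Figure: Radix-2 DIF decomposition of an 8-point sequence: input in order 0 1 2 3 4 5 6 7; butterfly computation into two 4-point groups (0 1 2 3 | 0 1 2 3), then four 2-point groups (0 1 | 0 1 | 0 1 | 0 1), giving the output order 0 4 2 6 1 5 3 7]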
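Computing the DFT of the N-point input sequence x(n):
$$X(k) = \sum_{n=0}^{N-1} x(n)\, w_N^{nk} \qquad (2.13)$$
$$X(k) = \sum_{n=0}^{N/2-1} x(n)\, w_N^{nk} + \sum_{n=N/2}^{N-1} x(n)\, w_N^{nk} = \sum_{n=0}^{N/2-1} x(n)\, w_N^{nk} + \sum_{n=0}^{N/2-1} x\!\left(n+\tfrac{N}{2}\right) w_N^{(n+N/2)k} \qquad (2.14)$$
Because $w_N^{(N/2)k} = e^{-j\pi k} = (-1)^k$, the two sums combine into
$$X(k) = \sum_{n=0}^{N/2-1} \left[x(n) + (-1)^k\, x\!\left(n+\tfrac{N}{2}\right)\right] w_N^{nk}, \qquad k = 0, 1, \dots, N-1 \qquad (2.15)$$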
The butterfly structure used in the DIF FFT differs from the one used in the DIT FFT. The basic difference is that in the DIT FFT butterfly the multiplication is done before the additions, whereas in the DIF FFT butterfly the multiplication is done after the additions.
[Figure: DIF butterfly: inputs x(n) and x(n + N/2); outputs [x(n) + x(n + N/2)] and [x(n) − x(n + N/2)] $w_N^n$]
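A matching sketch of the radix-2 DIF butterfly, where the sum and difference are formed first and only the difference is multiplied by the twiddle factor. As with the DIT sketch, the module name, the 16-bit signed samples and the Q8 twiddle format are assumptions made for illustration, and overflow is not handled.

```verilog
// Radix-2 DIF butterfly (sketch): c = a + b, d = (a - b)*w.
// Addition/subtraction first, then one complex multiplication on the
// difference. Assumed: WIDTH-bit signed samples, twiddle scaled by 2^FRAC.
module dif_butterfly #(
  parameter WIDTH = 16,
  parameter FRAC  = 8
) (
  input  signed [WIDTH-1:0] a_re, a_im,   // x(n)
  input  signed [WIDTH-1:0] b_re, b_im,   // x(n + N/2)
  input  signed [WIDTH-1:0] w_re, w_im,   // w_N^n in Q(FRAC)
  output signed [WIDTH-1:0] c_re, c_im,   // x(n) + x(n + N/2)
  output signed [WIDTH-1:0] d_re, d_im    // [x(n) - x(n + N/2)] * w_N^n
);
  wire signed [WIDTH-1:0]   t_re = a_re - b_re;   // difference a - b
  wire signed [WIDTH-1:0]   t_im = a_im - b_im;
  wire signed [2*WIDTH-1:0] p_re = t_re * w_re - t_im * w_im;
  wire signed [2*WIDTH-1:0] p_im = t_re * w_im + t_im * w_re;

  assign c_re = a_re + b_re;
  assign c_im = a_im + b_im;
  assign d_re = p_re >>> FRAC;   // remove the 2^FRAC twiddle scale
  assign d_im = p_im >>> FRAC;
endmodule
```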
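8-Point DIF FFT:
[Figure: 8-point DIF FFT: a first butterfly stage feeds two 4-point DFTs, each computed with 2-point DFTs; the input (X(0), X(1), X(2), X(3), ...) is in normal order and the output (X(0), X(4), X(2), X(6), ...) is in bit-reversed order]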
Radix-4 FFT:
[Figure: Radix-4 DIF butterfly: inputs x(n), x(n + N/4), x(n + N/2) and x(n + 3N/4); outputs X(4r), X(4r + 1), X(4r + 2) and X(4r + 3), with twiddle factors $w^n$, $w^{2n}$ and $w^{3n}$]
In this FFT, the length is $N = 4^{v}$, where v is the number of stages, and
$$w_N^{(N/2)k} = (\cos\pi - j\sin\pi)^k = (-1)^k \qquad (2.17)$$
$$w_N^{(3N/4)k} = \left(\cos\tfrac{3\pi}{2} - j\sin\tfrac{3\pi}{2}\right)^k = (j)^k \qquad (2.18)$$
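As a quick check of why the radix-4 butterfly needs no general multipliers internally: the remaining trivial factor is $w_N^{(N/4)k} = (\cos\frac{\pi}{2} - j\sin\frac{\pi}{2})^k = (-j)^k$, so a 4-point DFT uses only the values 1, −j, −1 and j. This is the standard result, shown here as a sketch:

```latex
\begin{bmatrix} X(0)\\ X(1)\\ X(2)\\ X(3) \end{bmatrix}
=
\begin{bmatrix}
1 & 1  & 1  & 1\\
1 & -j & -1 & j\\
1 & -1 & 1  & -1\\
1 & j  & -1 & -j
\end{bmatrix}
\begin{bmatrix} x(0)\\ x(1)\\ x(2)\\ x(3) \end{bmatrix}
```

Multiplying by −1 or ±j only negates values or swaps the real and imaginary parts. The nontrivial twiddle factors $w^{n}$, $w^{2n}$ and $w^{3n}$ are therefore applied outside the 4-point core.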
For the hardware architecture of the FFT, there are three different types of architectures:
Of all these, the pipeline architecture [8] is very attractive in multimedia
To further reduce the complex multiplications, we propose a pipeline architecture that offers low latency, low power consumption and high throughput, and occupies less area.
In this, the MDC architecture takes multiple input data streams because of its high
Single-path Delay Feedback (SDF) is the best solution for a single input data stream and is used when less memory is required. In the Single-path Delay Commutator (SDC) architecture [18], adder usage is very low, but the memory requirement is higher and the output is in reversed order, so we need to get
In this, the input data stream is divided into two parallel data streams, and the processing is done using delay elements, butterfly elements and processing elements. In MDC architectures, the utilization of the resources depends on the radix used.
If we use radix-r, where r can be any integer, the utilization of the resources is 1/r; for example, with radix-2 the utilization of the resources is 50%.
If we use radix-2, it is called the R2MDC pipeline architecture, and the architecture using
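[Figure: R2MDC pipeline architecture: input streams X3 X2 X1 X0 and X7 X6 X5 X4 pass through an input register (REG), butterflies (BF), delay elements (R) and switches (S), with a twiddle-factor multiplication and a multiplication by −j]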
R represents the delay elements and BF is the butterfly structure used in the FFT, and
The pipeline architecture can also be implemented with other radices; with radix-4 it is called R4MDC. By using the pipelined R2MDC architecture, the number of complex multipliers is reduced compared with the normal DIT FFT and DIF FFT.
CHAPTER III
The implementation of the DIT FFT and the pipelined FFT is done using the Verilog
We start with the implementation of the 8-point DIT FFT. Since it can be implemented with different radices, we start with the radix-2 butterfly. With radix-2, the number of stages is $\log_2 N$, where N is the number of points in the input sequence [21]. An 8-point sequence therefore uses 3 stages, and each stage uses different butterflies.
First Stage:
The first stage consists of four similar butterflies, one of which is shown below:
[Figure: First-stage butterfly: inputs X0 and X1, twiddle factor $w_N^r$, outputs X10 and X11]
The twiddle factor has both real and imaginary parts, and it can be expressed as
The outputs X10 and X11 also have real and imaginary parts, and they are obtained separately
If the real and imaginary parts are not separated, the output from the butterfly can be obtained as:
Considering the real and imaginary parts (r and i) of the twiddle factor, we modify the
X10 = X0 + (X1*(r + j*i))    (3.4)
Since the first-stage inputs X0 and X1 are real-valued samples, the real and imaginary parts of the outputs are:
X10r = X0 + (X1*r)    (3.6)
X10i = X1*i    (3.7)
X11 = X11r + j*X11i
X11r = X0 - (X1*r)    (3.9)
The negative is obtained by taking the two's complement; the two's complement of a binary number is calculated by taking its one's complement and adding '1' to it.
X11i = ~(X1*i) + 1    (3.12)
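The two's-complement negation used in (3.9) to (3.12) can be checked in isolation. This hypothetical module (its name and the WIDTH parameter are assumptions; WIDTH = 8 matches bfly1) inverts every bit and adds 1. For 8-bit values this matches ordinary negation modulo $2^8$, with the usual exception that −128 (8'h80) maps to itself.

```verilog
// Two's-complement negation: one's complement (~x) plus 1.
// Hypothetical helper module for illustration.
module twos_negate #(parameter WIDTH = 8) (
  input  [WIDTH-1:0] x,
  output [WIDTH-1:0] neg_x   // equals -x modulo 2^WIDTH
);
  assign neg_x = ~x + 1'b1;
endmodule
```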
The inputs to the first stage are given in bit-reversed order. Since there are four butterflies in the first stage, the inputs given to the four butterflies are shown below [22]:
In the first stage, the twiddle factor used is $w_8^0$, which has real part equal to '1' and
Second Stage:
The second stage uses the four-input butterfly shown below:
[Figure: Second-stage four-input butterfly: inputs X10, X11, X12 and X13; outputs X20, X21, X22 and X23]
Here the twiddle factors used are $w_8^0$ and $w_8^2$. In the second twiddle factor, the imaginary part
The above two equations are similar to the equations used in the first-stage butterfly, so the same butterfly is used here. The imaginary parts of X10 and X12 are not considered, because the imaginary part of X12 is equal to zero and the twiddle factor also
The other two outputs are multiplied by the twiddle factor whose imaginary part is equal to '−1' ($w_8^2 = -j$), which involves both imaginary and real parts, so we consider (X11r, X11i) and (X22r, X22i). The operation in this butterfly is done using only inversion of the real
Here the imaginary part is equal to '0' and the real part is X11r.
Here X11i is equal to '0', so the imaginary part is equal to −X13r, which is calculated using the two's complement, i.e. the one's complement plus one [25].
The other output is obtained from X11 and X13 as shown below:
The real part is equal to (X11r − X13i); here X13i is equal to zero, so the real part is
X23r = X11r
The imaginary part is equal to (X11i + X13r); here X11i is equal to zero, so the imaginary part is
X23i = X13r
The four-input butterfly is a combination of two 2-input butterflies, one using the twiddle factors and the other using the real and imaginary values of the inputs.
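Multiplying by $w_8^2 = -j$ needs no multiplier, because $(y_r + j y_i)(-j) = y_i - j y_r$: the real and imaginary parts are swapped and one of them is negated. The sketch below uses a hypothetical module name and an assumed 8-bit signed width. It generalizes the second-stage butterfly to complex inputs; with zero imaginary inputs it reduces to X21 = X11r − jX13r and X23 = X11r + jX13r, as in the text.

```verilog
// Butterfly with twiddle factor w_8^2 = -j (sketch): no multiplier needed.
//   p = x + (-j)*y,  q = x - (-j)*y = x + j*y
module bfly_minus_j #(parameter WIDTH = 8) (
  input  signed [WIDTH-1:0] x_re, x_im,   // upper input, e.g. X11
  input  signed [WIDTH-1:0] y_re, y_im,   // lower input, e.g. X13
  output signed [WIDTH-1:0] p_re, p_im,   // e.g. X21
  output signed [WIDTH-1:0] q_re, q_im    // e.g. X23
);
  // (-j)*y = y_im - j*y_re: swap the parts and negate the new imaginary part
  assign p_re = x_re + y_im;
  assign p_im = x_im - y_re;
  assign q_re = x_re - y_im;
  assign q_im = x_im + y_re;
endmodule
```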
Stage 3:
The butterflies using the twiddle factors $w_8^0$ and $w_8^2$ are already explained above, and the butterflies using $w_8^1$ and $w_8^3$, with values (0.707 − j0.707) and (−0.707 − j0.707), and the
[Figure: Third-stage butterflies: inputs X21 to X28, twiddle factors $w_8^0$, $w_8^1$, $w_8^2$ and $w_8^3$, outputs Y0 to Y7]
The outputs from the butterfly can be obtained as:
This can be done using the first-stage butterfly only. The outputs of the butterfly using the twiddle factor $w_8^2$ are obtained by taking the two's complement of the number, and the rest of the outputs are obtained using the shifting-operation butterflies.
X26 has real and imaginary parts, and the twiddle factor also consists of
The intermediate terms are obtained using partial products and by taking the two's complement
The real part is equal to
Y5r = X22r - ((X26r*0.707) + (X26i*0.707))    (3.26)
It needs four partial products, and the addition and subtraction are done using the two's complement.
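A fixed-point sketch of the $w_8^1$ butterfly, with 0.707 represented as 181/256 (an assumed Q8 constant). Both parts of $w_8^1 = 0.707 - j0.707$ have the same magnitude, so $w_8^1\,y = 0.707(y_r + y_i) + j\,0.707(y_i - y_r)$. This variant therefore uses two constant multiplications instead of the four partial products described above. It is a swapped-in optimization for illustration, not the thesis' bfly3. The module name is hypothetical, and results are truncated by an arithmetic shift.

```verilog
// Butterfly with w_8^1 = 0.707 - j0.707 (sketch, Q8: 0.707 ~ 181/256).
//   p = x + w*y,  q = x - w*y
module bfly_w81 #(parameter WIDTH = 16) (
  input  signed [WIDTH-1:0] x_re, x_im,   // upper input, e.g. X22
  input  signed [WIDTH-1:0] y_re, y_im,   // lower input, e.g. X26
  output signed [WIDTH-1:0] p_re, p_im,   // p = x + w*y
  output signed [WIDTH-1:0] q_re, q_im    // q = x - w*y
);
  localparam signed [9:0] C707 = 10'sd181;   // round(0.7071 * 256)

  // w*y = 0.707*(y_re + y_im) + j*0.707*(y_im - y_re)
  wire signed [WIDTH:0]    s     = y_re + y_im;   // one guard bit for growth
  wire signed [WIDTH:0]    d     = y_im - y_re;
  wire signed [WIDTH+10:0] m_re  = s * C707;
  wire signed [WIDTH+10:0] m_im  = d * C707;
  wire signed [WIDTH-1:0]  wy_re = m_re >>> 8;    // undo the 256 scale
  wire signed [WIDTH-1:0]  wy_im = m_im >>> 8;

  assign p_re = x_re + wy_re;
  assign p_im = x_im + wy_im;
  assign q_re = x_re - wy_re;   // e.g. Y5r = X22r - 0.707*(X26r + X26i)
  assign q_im = x_im - wy_im;
endmodule
```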
The other butterfly uses the twiddle factor $w_8^3$, which is equal to −0.707 − j0.707. It is also implemented using partial products and two's complements, as shown below:
After simplification, we get the values for the real and imaginary parts as shown below:
It is obtained using two partial products and, for the subtraction, the two's complement
                             Complex multiplications   Complex additions
Normal DFT of 8-point        64                        56
DFT of 8-point using FFT     12                        24
The pipelined FFT is implemented using delay elements and switches. An input buffer stores all the values that need to be given as input, and the number of delay elements depends on the size N of the sequence; as shown above for the
In this, we implement the 64-point R2MDC using delay elements of length 16, 8, 4 and 2 and switches of size 16, 8, 4 and 2, and the input buffer has memory
First, the input is divided into two parallel half streams, which are passed through the delay elements, and the switch operation is performed. Half of the samples in the data stream are delayed by the delay elements while the processing is done on the second half of the data stream.
First, the second half of the data stream is delayed: in the 8-point case it is delayed by 2 delay elements, but in the 64-point case the 32-point half is delayed by 16 elements. After the switch operation, the delay operation is done again and the butterfly processing is done using 8 delay elements, and then delays of 4, 2 and 1 are used.
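The delay-and-switch idea can be sketched with two small building blocks: a D-sample shift-register delay line and a 2×2 commutator that either passes the two streams straight through or crosses them. These are hypothetical modules written for illustration, not the thesis' delay16/switch16 modules. The names, the ports and the active-high synchronous reset rst are assumptions. In an R2MDC stage of length D, sel would be driven by one bit of the sample counter, so the switch toggles every D samples.

```verilog
// D-sample delay line: y is x passed through D registers.
// Hypothetical module; active-high synchronous reset assumed.
module delay_line #(
  parameter WIDTH = 10,
  parameter D     = 16
) (
  input              clk,
  input              rst,
  input  [WIDTH-1:0] x,
  output [WIDTH-1:0] y
);
  reg [WIDTH-1:0] taps [0:D-1];
  integer i;

  always @(posedge clk) begin
    if (rst) begin
      for (i = 0; i < D; i = i + 1)
        taps[i] <= {WIDTH{1'b0}};
    end else begin
      taps[0] <= x;
      for (i = 1; i < D; i = i + 1)
        taps[i] <= taps[i-1];          // shift one position per clock
    end
  end

  assign y = taps[D-1];
endmodule

// 2x2 commutator: straight through when sel = 0, crossed when sel = 1.
module commutator #(parameter WIDTH = 10) (
  input              sel,
  input  [WIDTH-1:0] in0, in1,
  output [WIDTH-1:0] out0, out1
);
  assign out0 = sel ? in1 : in0;
  assign out1 = sel ? in0 : in1;
endmodule
```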
The output is also obtained as two half sequences. Inside the butterfly, the complex multiplications can be done using Booth multiplication and addition operations; the
Here the twiddle factors are precomputed, stored and supplied synchronously with the operation. The 64-point R2MDC [21] [22] is shown below:
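[Figure: 64-point R2MDC: input streams X31, X30, ..., X0 and X63, X62, ..., X32 enter through the input register (REG) and pass through butterflies (BF), switches (S) and delay elements from 16 down to 1, with a multiplication by −j before the last delay-1 element]
Here the input to delay element '1' is multiplied by −j, the switch operation is performed, and the butterfly operation is done at the end; also the butterfly used in the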
CHAPTER IV
RESULTS
The FFT is implemented in Verilog and simulated using ModelSim PE
The clock (clk) is specified by its duty cycle and period, and the selection is done using the force
Here Y0r, Y0i, Y1r, Y1i, ..., Y7r, Y7i are the outputs, and one of them at a time is displayed at the output using sel.
This is synthesized using Altera Quartus-II for the Cyclone II EP2C35F672C6 device
[Figure: RTL view of the 8-point DIT FFT synthesized in Quartus-II: instances of bfly1, bfly2 and bfly3 with constant twiddle-factor inputs feed multiplexers Mux0 to Mux15, selected by sel[2..0], which drive the registered outputs yr[7..0] and yi[7..0] clocked by clk]
By synthesizing the FFT in Quartus-II, the resource utilization is as shown below:
Fig. 4.3. Resource Utilization summary
The pipeline FFT is implemented in Verilog and simulated using ModelSim
Fig. 4.5. Simulation Output for 64-Point Pipeline FFT
To get the output, reset should have the value '1' and din_valid should be '1'. When the output
CHAPTER V
CONCLUSION
This thesis presents the implementation of an 8-point FFT and of a 64-point pipeline FFT in the Verilog hardware description language. The 8-point FFT was synthesized using Quartus-II, where its RTL view can be observed, and the timing analysis shows that the FFT takes less time compared with the other
Mandeep Singh and Balwinder Singh the computation of DFT using the DIT FFT will have less
The implementation in the base paper is done using VHDL, but in this thesis it is done
language.
The implementation shows that the number of multiplications and additions is reduced compared with the normal DFT. Because of the reduced multiplications and additions, the worst-case delay is reduced, which makes the FFT suitable for most communication systems that require computation of the FT.
Description Language; here the delay elements are used, which indicates an increase in the delay, but the number of complex multipliers is reduced. According to the paper published by Mounir Arioua, the complex multipliers are reduced compared with the multipliers in the FFT implementation.
In this thesis, the RTL view of the FFT implementation in Verilog is shown
The RTL view shown in the results is obtained by synthesizing the 8-point FFT in the
The pipeline architecture can be used when fewer resources are required, but when time delay is the priority, the general FFT architecture can be used, because the pipeline architecture uses delay elements. This study can be extended by further reducing the resource usage and the number of complex multiplications and additions required for the
REFERENCES
[2] Weidong Li, "Studies on implementation of lower power FFT processors", Linkoping
Studies in Science and Technology, Thesis No. 1030, ISBN 91-7373-692-9, Linkoping,
the 10th International Parallel Processing Symposium (IPPS), pp. 766-770, April 1996.
[4] Pawan Verma, Harpeet Kaur and Mandeep Singh, "VHDL implementation of FFT/IFFT
[5] L. Johnson, "Conflict Free Memory Addressing for Dedicated FFT Hardware", IEEE
[6] J.W. Cooley and J.W. Tukey, "An algorithm for the machine calculation of complex
[7] W. Li and L. Wanhammar, "A Pipeline FFT Processor", IEEE Workshop on Signal
[8] E.H. Wold and A.M. Despain, "Pipeline and Parallel Pipeline FFT Processors for VLSI
[9] R. Storn, "Radix-2 FFT Pipeline architecture with reduced noise to signal ratio", IEEE
[10] D. Cohen, "Simplified Control of FFT Hardware", IEEE Transactions on Signal and
Science and Technology, Thesis No. 619, Linkoping University, Sweden, 1997.
[12] L.R. Rabiner and B. Gold, "Theory and Application of Digital Signal Processing",
[13] M. Petrov and M. Glesner, "Optimal FFT Architecture Selection for OFDM Receivers on
[14] J. Viejo, A. Millan and M.J. Bellido, "Design of a FFT/IFFT module as an IP core suitable
for embedded systems", IEEE Transactions on Industrial Embedded Systems, pp. 337-340, 2007.
[15] J. Melander, "Design of SIC FFT Architectures", Linkoping Studies in Science and
[16] U. Meyer-Baese, Digital Signal Processing with Field Programmable Gate Arrays, 3rd edition,
Springer, 2007.
[17] Weidong Li, Mark Vesterbacka and Lars Wanhammar, "An FFT Processor Based On 16
[18] Y. Ma, "A VLSI oriented Parallel FFT algorithm", IEEE Transactions on Signal
[19] E.E. Swartzlander, W.K.W. Young and S.J. Joseph, "A radix-4 delay commutator for
fast Fourier transforms processor implementation", IEEE J. Solid-State Circuits, SC-
[20] Yunho Jung, Hongil Yoon and Jaeseok Kim, "New Efficient FFT Algorithm and Pipeline
[22] Hsin-Lei Lin, Hongchin Lin, Yu Chuan Chen and Robert C. Chang, "A Novel Pipelined
Fast Fourier Transform Architecture for Double Rate OFDM Systems", IEEE
[23] Shousheng He and Mats Torkelson, "Design and Implementation of a 1024-Point
Pipeline FFT Processor", Custom Integrated Circuits Conference, IEEE, pp. 131-134,
1998.
[24] H.L. Lin, H. Lin, Y.C. Chen and R.C. Chang, "A Novel Pipelined Fast Fourier Transform
Architecture for Double Rate OFDM Systems", IEEE Workshop on Signal Processing
[25] K. Maharatna, E. Grass and U. Jagdhold, "A Low Power 64 Point FFT/IFFT Architecture
Hamburg, 2000.
APPENDIX A
always @(posedge clk)     // register the output pair selected by sel
  case (sel)
    0: begin yr <= y0r; yi <= y0i; end
    1: begin yr <= y1r; yi <= y1i; end
    2: begin yr <= y2r; yi <= y2i; end
    3: begin yr <= y3r; yi <= y3i; end
    4: begin yr <= y4r; yi <= y4i; end
    5: begin yr <= y5r; yi <= y5i; end
    6: begin yr <= y6r; yi <= y6i; end
    7: begin yr <= y7r; yi <= y7i; end
  endcase
endmodule
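module bfly1(x,y,wr,wi,x0r,x0i,x1r,x1i); // first-stage butterfly: real inputs x, y; twiddle wr + j*wi
input [7:0] x,y,wr,wi;
output [7:0] x1r,x1i,x0r,x0i;
assign x0r=x+(y*wr);          // X10r = X0 + X1*r   (3.6)
assign x0i=y*wi;              // X10i = X1*i        (3.7)
assign x1r=x+(~(y*wr)+1);     // X11r = X0 - X1*r   (3.9), via two's complement
assign x1i=~(y*wi)+1;         // X11i = -(X1*i)     (3.12)
endmodule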
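module bfly2(xr,xi,yr,yi,x0r,x0i,x1r,x1i); // second-stage butterfly with w_8^2 = -j (no multiplier)
input [7:0] xr,xi,yr,yi;
output [7:0] x0r,x0i,x1r,x1i;
assign x0r=xr;                // X21r = X11r
assign x0i=~yr+1;             // X21i = -X13r (two's complement)
assign x1r=xr;                // X23r = X11r
assign x1i=yr;                // X23i = X13r
endmodule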
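module bfly3(xr,xi,yr,yi,wr,wi,x0r,x0i,x1r,x1i); // butterfly using four partial products, as for w_8^1 in eq. (3.26)
input [7:0] xr,xi,yr,yi,wr,wi;
output [7:0] x0r,x0i,x1r,x1i;
wire [15:0] p1,p2,p3,p4;
wire [7:0] win,yrn,yin;
wire [8:0] ywr,ywi;
parameter sht=8'b1000;        // shift by 8: twiddle parts are scaled by 2^8
assign yrn=~yr+1;             // -yr (two's complement)
assign yin=yi;
assign win=~wi+1;             // -wi (two's complement)
assign p1=(yrn*wr)>>sht;      // partial products, rescaled by 2^-8
assign p2=(yin*win)>>sht;
assign p3=(yrn*win)>>sht;
assign p4=(yin*wr)>>sht;
assign ywr=(~p1+1)+p2;        // intended: real part of y*w = yr*wr - yi*wi
assign ywi=p3+p4;             // intended: imaginary part of y*w = yr*wi + yi*wr
assign x0r=xr+ywr;            // x0 = x + y*w
assign x0i=xi+ywi;
assign x1r=xr+(~ywr+1);       // x1 = x - y*w
assign x1i=xi+(~ywi+1);
endmodule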
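module bfly4(xr,xi,yr,yi,wr,wi,x0r,x0i,x1r,x1i); // butterfly assuming wr = wi (e.g. w_8^3), two partial products; wr unused
input [7:0] xr,xi,yr,yi,wr,wi;
output [7:0] x0r,x0i,x1r,x1i;
wire [15:0] p1,p2;
wire [7:0] win,yrn,yin;
wire [8:0] ywr,ywi;
parameter sht=8'b1000;        // shift by 8: twiddle parts are scaled by 2^8
assign yrn=~yr+1;             // -yr
assign yin=~yi+1;             // -yi
assign win=~wi+1;             // -wi
assign p1=(yrn*win)>>sht;     // yr*wi
assign p2=(yin*win)>>sht;     // yi*wi
assign ywr=p1+(~p2+1);        // intended: wi*(yr - yi)
assign ywi=p1+p2;             // intended: wi*(yr + yi)
assign x0r=xr+ywr;            // x0 = x + y*w
assign x0i=xi+ywi;
assign x1r=xr+(~ywr+1);       // x1 = x - y*w
assign x1i=xi+(~ywi+1);
endmodule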
Implementation of 64 Point Pipeline FFT:
`timescale 1ns/1ns // main module
module tb_fft64;
reg clk,reset,din_valid;
reg [9:0] din_re,din_im;
wire [9:0] dout_re,dout_im;
wire dout_valid;
fft64 f1(
.clk(clk),.reset(reset),.din_valid(din_valid),.din_re(din_re),.din_im(din_im),
.dout_re(dout_re),.dout_im(dout_im),.dout_valid(dout_valid) );
always #20 clk=~clk;          // 40 ns clock period
integer file;
initial begin
clk=0;
reset=0;                      // hold the design in reset (released by reset=1)
din_valid=0;
din_re=10'b0;din_im=10'b0;
file=$fopen("result_out.txt") | 1;   // open once; bit 0 also echoes to stdout
#80 reset=1;din_valid=1;
din_re=10'b0010110100;din_im=10'b1000010101;
repeat(200)begin
#40 din_re=din_re+1;din_im=din_im+1;
$fdisplay(file, "(%d) + (%d )*j ;", dout_re, dout_im );
end
$finish;                      // stop the free-running clock
end
endmodule
//submodule of fft64
`timescale 1ns/1ns
module fft64(clk,reset, din_valid, din_re,din_im, //first_r,first_i,last_r,last_i,
dout_re,dout_im,dout_valid);
parameter IN_WIDTH=10;
input clk,reset,din_valid;
input [IN_WIDTH-1:0] din_re,din_im;
output [IN_WIDTH-1:0] dout_re,dout_im;
output dout_valid;
wire dout_valid;
wire [IN_WIDTH-1:0] first_r,first_i,last_r,last_i;
wire [IN_WIDTH-1:0] r0_0,i0_0,r32_0,i32_0;//the output signals of buffer
wire [IN_WIDTH-1:0] br0_1,bi0_1,br32_1,bi32_1;
wire [IN_WIDTH-1:0] dr32_1,di32_1,sr0_1,si0_1;
wire [IN_WIDTH-1:0] r0_1,i0_1,r32_1,i32_1;//the output signals of first stage
wire [IN_WIDTH-1:0] br0_2,bi0_2,br32_2,bi32_2;
wire [IN_WIDTH-1:0] dr32_2,di32_2,sr0_2,si0_2;
wire [IN_WIDTH-1:0] r0_2,i0_2,r32_2,i32_2;//the output signals of second stage
wire [IN_WIDTH-1:0] br0_3,bi0_3,br32_3,bi32_3;
wire [IN_WIDTH-1:0] dr32_3,di32_3,sr0_3,si0_3;
wire [IN_WIDTH-1:0] r0_3,i0_3,r32_3,i32_3;//the output signals of thirdstage
wire [IN_WIDTH-1:0] br0_4,bi0_4,br32_4,bi32_4;
wire [IN_WIDTH-1:0] dr32_4,di32_4,sr0_4,si0_4;
wire [IN_WIDTH-1:0] r0_4,i0_4,r32_4,i32_4;//the output signals of four stage
wire [IN_WIDTH-1:0] br0_5,bi0_5,br32_5,bi32_5,ir32_5,ii32_5;
wire [IN_WIDTH-1:0] dr32_5,di32_5,sr0_5,si0_5;
wire [IN_WIDTH-1:0] r0_5,i0_5,r32_5,i32_5;//the output signals of five stage
wire [IN_WIDTH-1:0] r0_6,i0_6,r32_6,i32_6;
wire clk_in;
wire [4:0]count;
///***************** control *********************************************///
clk_div clk_div(.clk(clk),.reset(reset),.hclk(clk_in));
control control(.clk(clk_in),.reset(reset),.count(count));
input_buffer i1(.wclk(clk),.rclk(clk_in),.reset(reset),.din_valid(din_valid),
.indata_r(din_re),.indata_i(din_im), .first_r(r0_0),.first_i(i0_0),.last_r(r32_0),.last_i(i32_0));
//*****************first stage*********************************************///
bm b1(.clk(clk_in),.reset(reset),.address(count),.ar(r0_0),.ai(i0_0),.br(r32_0),.bi(i32_0),
.r0(br0_1),.i0(bi0_1),.r16(br32_1),.i16(bi32_1));
delay16 d32_1(.clk(clk_in),.reset(reset),.x_r(br32_1),.x_i(bi32_1), .y_r(dr32_1),.y_i(di32_1));
switch16 s1(.count(count),.x0_r(br0_1),.x0_i(bi0_1),.x1_r(dr32_1),.x1_i(di32_1),
.y0_r(sr0_1),.y0_i(si0_1),.y1_r(r32_1),.y1_i(i32_1));
delay16 d0_1(.clk(clk_in),.reset(reset),.x_r(sr0_1),.x_i(si0_1),.y_r(r0_1),.y_i(i0_1));
///*****************second stage*********************************************///
bm b2(.clk(clk_in),.reset(reset),.address({count[3:0],1'b0}),.ar(r0_1),.ai(i0_1),.br(r32_1),.bi(i32_1),
.r0(br0_2),.i0(bi0_2),.r16(br32_2),.i16(bi32_2));
delay8 d32_2(.clk(clk_in),.reset(reset),.x_r(br32_2),.x_i(bi32_2), .y_r(dr32_2),.y_i(di32_2));
switch8 s2(.count(count[3:0]),.x0_r(br0_2),.x0_i(bi0_2),.x1_r(dr32_2),.x1_i(di32_2),
.y0_r(sr0_2),.y0_i(si0_2),.y1_r(r32_2),.y1_i(i32_2));
delay8 d0_2(.clk(clk_in),.reset(reset),.x_r(sr0_2),.x_i(si0_2),.y_r(r0_2),.y_i(i0_2));
///*************************third stage *********************************///
bm b3(.clk(clk_in),.reset(reset),.address({count[2:0],2'b0}),.ar(r0_2),.ai(i0_2),.br(r32_2),.bi(i32_2),
.r0(br0_3),.i0(bi0_3),.r16(br32_3),.i16(bi32_3));
delay4 d32_3(.clk(clk_in),.reset(reset),.x_r(br32_3),.x_i(bi32_3), .y_r(dr32_3),.y_i(di32_3));
switch4 s3(.count(count[2:0]),.x0_r(br0_3),.x0_i(bi0_3),.x1_r(dr32_3),.x1_i(di32_3),
.y0_r(sr0_3),.y0_i(si0_3),.y1_r(r32_3),.y1_i(i32_3));
delay4 d0_3(.clk(clk_in),.reset(reset),.x_r(sr0_3),.x_i(si0_3),.y_r(r0_3),.y_i(i0_3));
///*************************fourth stage *********************************///
bm b4(.clk(clk_in),.reset(reset),.address({count[1:0],3'b0}),.ar(r0_3),.ai(i0_3),.br(r32_3),.bi(i32_3),
.r0(br0_4),.i0(bi0_4),.r16(br32_4),.i16(bi32_4));
delay2 d32_4(.clk(clk_in),.reset(reset),.x_r(br32_4),.x_i(bi32_4), .y_r(dr32_4),.y_i(di32_4));
switch2 s4(.count(count[1:0]),.x0_r(br0_4),.x0_i(bi0_4),.x1_r(dr32_4),.x1_i(di32_4),
.y0_r(sr0_4),.y0_i(si0_4),.y1_r(r32_4),.y1_i(i32_4));
delay2 d0_4(.clk(clk_in),.reset(reset),.x_r(sr0_4),.x_i(si0_4),.y_r(r0_4),.y_i(i0_4));
///*************************fifth stage *********************************///
butterfly b5(.a_r(r0_4),.a_i(i0_4),.b_r(r32_4),.b_i(i32_4),
.a1_r(br0_5),.a1_i(bi0_5),.b1_r(br32_5),.b1_i(bi32_5));
inverter i5(.count(count[0]),.a_r(br32_5),.a_i(bi32_5),.a1_r(ir32_5),.a1_i(ii32_5));
delay1 d32_5(.clk(clk_in),.reset(reset),.x_r(ir32_5),.x_i(ii32_5), .y_r(dr32_5),.y_i(di32_5));
switch1 s5(.count(count[0]),.x0_r(br0_5),.x0_i(bi0_5),.x1_r(dr32_5),.x1_i(di32_5),
.y0_r(sr0_5),.y0_i(si0_5),.y1_r(r32_5),.y1_i(i32_5));
delay1 d0_5(.clk(clk_in),.reset(reset),.x_r(sr0_5),.x_i(si0_5),.y_r(r0_5),.y_i(i0_5));
///*************************sixth stage *********************************///
butterfly b6(.a_r(r0_5),.a_i(i0_5),.b_r(r32_5),.b_i(i32_5),
.a1_r(r0_6),.a1_i(i0_6),.b1_r(r32_6),.b1_i(i32_6));
dataout dataout(.clk(clk),.reset(reset),.first_r(r0_6),.first_i(i0_6),.last_r(r32_6),.last_i(i32_6),
.dout_re(dout_re),.dout_im(dout_im),.dout_valid(dout_valid));
//always @(posedge clk_in)
//begin
//end
endmodule
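The stage-specific twiddle addresses in the top-level module follow one rule: stage s (s = 1 to 4) presents the control count shifted left by s-1 bits, that is {count[3:0],1'b0}, {count[2:0],2'b0} and {count[1:0],3'b0} for stages 2 to 4. This works because W_64^(2^(s-1)n) = W_(64/2^(s-1))^n, so each later stage reads every second, fourth or eighth entry of the 64-point twiddle table. A minimal sketch of that rule as one parameterized module is given below; the module name stage_addr and the parameter STAGE are assumptions and do not appear in the original design.

```verilog
`timescale 1ns/1ns
// Hypothetical helper (not part of the original design): twiddle-ROM address
// for pipeline stage STAGE (1..4) of the 64-point FFT.
// Shifting the 5-bit counter left by STAGE-1 keeps its low 6-STAGE bits and
// appends STAGE-1 zeros, e.g. STAGE=2 gives {count[3:0],1'b0}.
module stage_addr #(parameter STAGE = 1) (
  input  [4:0] count,
  output [4:0] address
);
  // Evaluated in a 5-bit context, so bits shifted past bit 4 are dropped.
  assign address = count << (STAGE - 1);
endmodule
```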
For dividing the clock:
`timescale 1ns/1ns
module clk_div(clk,reset,hclk);
input clk,reset;
output hclk;
reg hclk;
//reg count;
always @(posedge clk or negedge reset)
begin
if (!reset)
hclk<=0;
else
hclk<=hclk+1;
end
endmodule
// control block implementation
`timescale 1ns/1ns
module control( clk,reset, count);
input clk,reset;
output [4:0] count;
reg [4:0] count;
always @(posedge clk or negedge reset)
begin
if (!reset) begin
count<=5'b11111;
end
else begin
count<=count+1;
end
end
endmodule
// input buffer
`timescale 1ns/1ns
module input_buffer(wclk,rclk,reset,din_valid, indata_r, indata_i,first_r, last_r,first_i, last_i);
parameter IN_WIDTH=10;
input wclk,rclk,reset,din_valid;
input [IN_WIDTH-1:0] indata_r,indata_i;
output [IN_WIDTH-1:0] first_r,last_r,first_i,last_i;
// clear the remaining buffer entries, 8 to 127
begin : clear_mem
integer k;
for (k=8; k<128; k=k+1) begin
mem_r[k]=10'b0;
mem_i[k]=10'b0;
end
end
end
else if(din_valid==1)begin
mem_r[count1]=indata_r;
mem_i[count1]=indata_i;
end
end
always @(posedge rclk or negedge reset )
begin
if(!reset)
count2<=6'b111111;
else if(din_valid==1)
count2<=count2+1;
end
always @(posedge rclk or negedge reset)
if(!reset) begin
first_r<=10'b0;
last_r<=10'b0;
first_i<=10'b0;
last_i<=10'b0;
end
else begin
if (count2<32)begin
first_r<=mem_r[count2+64];
last_r<=mem_r[count2+96];
first_i<=mem_i[count2+64];
last_i<=mem_i[count2+96];
end
else begin
first_r<=mem_r[count2-32];
last_r<=mem_r[count2];
first_i<=mem_i[count2-32];
last_i<=mem_i[count2];
end
end
endmodule
`timescale 1ns/1ns
module tb_buffer;
reg wclk,rclk,reset,din_valid;
reg [9:0] indata_r,indata_i;
wire [9:0] first_r,first_i,last_r,last_i;
always #5 wclk=~wclk;
always #10 rclk=~rclk;
initial begin
wclk=0;
rclk=0;
reset=0;
din_valid=0;
indata_r=0;indata_i=0;
#10 reset=1;din_valid=1;
indata_r=0;indata_i=0;
repeat(200)begin
#10 indata_r=indata_r+1;indata_i=indata_i+1;
end
end
input_buffer i1(.wclk( wclk),.rclk(rclk),.reset(reset),.din_valid(din_valid),
.indata_r(indata_r),.indata_i(indata_i),
.first_r(first_r),.first_i(first_i),.last_r(last_r),.last_i(last_i));
endmodule
`timescale 1ns/1ns
module dff(clk,reset,d,y);
parameter IN_WIDTH=10;
input clk,reset;
input [IN_WIDTH-1:0]d;
output [IN_WIDTH-1:0]y;
wire [IN_WIDTH-1:0]y;
reg [IN_WIDTH-1:0]r;
assign y=r;
always @(posedge clk or negedge reset)
begin
if(!reset)begin
r<=10'b0;
end
else begin
r<=d;
end
end
endmodule
//butterfly
`timescale 1ns/1ns
module butterfly( a_r,a_i,b_r,b_i, a1_r,a1_i,b1_r,b1_i);
parameter IN_WIDTH=10;
input [IN_WIDTH-1:0] a_r,a_i,b_r,b_i;
output [IN_WIDTH-1:0] a1_r,a1_i,b1_r,b1_i;
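wire [IN_WIDTH:0] a0_r,a0_i,b0_r,b0_i;
// sign-extend the two's-complement inputs by one bit so that the
// sum and difference are formed without overflow
assign a0_r={a_r[IN_WIDTH-1],a_r}+{b_r[IN_WIDTH-1],b_r};
assign b0_r={a_r[IN_WIDTH-1],a_r}-{b_r[IN_WIDTH-1],b_r};
assign a0_i={a_i[IN_WIDTH-1],a_i}+{b_i[IN_WIDTH-1],b_i};
assign b0_i={a_i[IN_WIDTH-1],a_i}-{b_i[IN_WIDTH-1],b_i};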
assign a1_r=a0_r[IN_WIDTH:1];
assign b1_r=b0_r[IN_WIDTH:1];
assign a1_i=a0_i[IN_WIDTH:1];
assign b1_i=b0_i[IN_WIDTH:1];
endmodule
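The butterfly forms the scaled outputs (a+b)/2 and (a-b)/2 by keeping bits [IN_WIDTH:1] of an (IN_WIDTH+1)-bit result. For two's-complement samples, that extra bit is only correct if both operands are sign-extended first. The sketch below writes the same scaled butterfly with Verilog-2001 signed types, which makes the tool perform the sign extension; the module name butterfly_signed is an assumption, and its port names simply mirror the butterfly module above.

```verilog
`timescale 1ns/1ns
// Sketch (assumed name, not the thesis module): scaled radix-2 butterfly
//   a1 = (a + b) / 2,  b1 = (a - b) / 2   (two's complement, rounds toward -inf)
module butterfly_signed #(parameter IN_WIDTH = 10) (
  input  signed [IN_WIDTH-1:0] a_r, a_i, b_r, b_i,
  output signed [IN_WIDTH-1:0] a1_r, a1_i, b1_r, b1_i
);
  // One guard bit: signed operands are sign-extended to IN_WIDTH+1 bits,
  // so the full-precision sum and difference cannot overflow.
  wire signed [IN_WIDTH:0] s_r = a_r + b_r;
  wire signed [IN_WIDTH:0] s_i = a_i + b_i;
  wire signed [IN_WIDTH:0] d_r = a_r - b_r;
  wire signed [IN_WIDTH:0] d_i = a_i - b_i;
  // Dropping the LSB divides by two.
  assign a1_r = s_r[IN_WIDTH:1];
  assign a1_i = s_i[IN_WIDTH:1];
  assign b1_r = d_r[IN_WIDTH:1];
  assign b1_i = d_i[IN_WIDTH:1];
endmodule
```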
//delay 16
`timescale 1ns/1ns
module delay16 (clk,reset,x_r,x_i,y_r,y_i);
parameter IN_WIDTH=10;
input clk,reset;
input [IN_WIDTH-1:0]x_r,x_i;
output[IN_WIDTH-1:0]y_r,y_i;
reg [IN_WIDTH-1:0]y_r,y_i;
wire [IN_WIDTH-1:0]x0_r,x0_i;
wire [IN_WIDTH-1:0]x1_r,x1_i;
wire [IN_WIDTH-1:0]x2_r,x2_i;
wire [IN_WIDTH-1:0]x3_r,x3_i;
wire [IN_WIDTH-1:0]x4_r,x4_i;
wire [IN_WIDTH-1:0]x5_r,x5_i;
wire [IN_WIDTH-1:0]x6_r,x6_i;
wire [IN_WIDTH-1:0]x7_r,x7_i;
wire [IN_WIDTH-1:0]x8_r,x8_i;
wire [IN_WIDTH-1:0]x9_r,x9_i;
wire [IN_WIDTH-1:0]x10_r,x10_i;
wire [IN_WIDTH-1:0]x11_r,x11_i;
wire [IN_WIDTH-1:0]x12_r,x12_i;
wire [IN_WIDTH-1:0]x13_r,x13_i;
wire [IN_WIDTH-1:0]x14_r,x14_i;
//wire [IN_WIDTH-1:0]x15_r,x15_i;
dff d1(.clk(clk),.reset(reset),.d(x_r),.y(x0_r));
dff d2(.clk(clk),.reset(reset),.d(x0_r),.y(x1_r));
dff d3(.clk(clk),.reset(reset),.d(x1_r),.y(x2_r));
dff d4(.clk(clk),.reset(reset),.d(x2_r),.y(x3_r));
dff d5(.clk(clk),.reset(reset),.d(x3_r),.y(x4_r));
dff d6(.clk(clk),.reset(reset),.d(x4_r),.y(x5_r));
dff d7(.clk(clk),.reset(reset),.d(x5_r),.y(x6_r));
dff d8(.clk(clk),.reset(reset),.d(x6_r),.y(x7_r));
dff d9(.clk(clk),.reset(reset),.d(x7_r),.y(x8_r));
dff d10(.clk(clk),.reset(reset),.d(x8_r),.y(x9_r));
dff d11(.clk(clk),.reset(reset),.d(x9_r),.y(x10_r));
dff d12(.clk(clk),.reset(reset),.d(x10_r),.y(x11_r));
dff d13(.clk(clk),.reset(reset),.d(x11_r),.y(x12_r));
dff d14(.clk(clk),.reset(reset),.d(x12_r),.y(x13_r));
dff d15(.clk(clk),.reset(reset),.d(x13_r),.y(x14_r));
//dff d16(.clk(clk),.reset(reset),.d(x14_r),.y(x15_r));
dff d17(.clk(clk),.reset(reset),.d(x_i),.y(x0_i));
dff d18(.clk(clk),.reset(reset),.d(x0_i),.y(x1_i));
dff d19(.clk(clk),.reset(reset),.d(x1_i),.y(x2_i));
dff d20(.clk(clk),.reset(reset),.d(x2_i),.y(x3_i));
dff d21(.clk(clk),.reset(reset),.d(x3_i),.y(x4_i));
dff d22(.clk(clk),.reset(reset),.d(x4_i),.y(x5_i));
dff d23(.clk(clk),.reset(reset),.d(x5_i),.y(x6_i));
dff d24(.clk(clk),.reset(reset),.d(x6_i),.y(x7_i));
dff d25(.clk(clk),.reset(reset),.d(x7_i),.y(x8_i));
dff d26(.clk(clk),.reset(reset),.d(x8_i),.y(x9_i));
dff d27(.clk(clk),.reset(reset),.d(x9_i),.y(x10_i));
dff d28(.clk(clk),.reset(reset),.d(x10_i),.y(x11_i));
dff d29(.clk(clk),.reset(reset),.d(x11_i),.y(x12_i));
dff d30(.clk(clk),.reset(reset),.d(x12_i),.y(x13_i));
dff d31(.clk(clk),.reset(reset),.d(x13_i),.y(x14_i));
//dff d32(.clk(clk),.reset(reset),.d(x14_i),.y(x15_i));
always @(posedge clk or negedge reset )
begin
if(!reset)
begin
y_r<=10'b0;
y_i<=10'b0;
end
else
begin
y_r<=x14_r;
y_i<=x14_i;
end
end
endmodule
//delay8
`timescale 1ns/1ns
module delay8 ( clk,reset, x_r,x_i,y_r,y_i);
parameter IN_WIDTH=10;
input clk,reset;
input [IN_WIDTH-1:0]x_r,x_i;
output[IN_WIDTH-1:0]y_r,y_i;
reg[IN_WIDTH-1:0]y_r,y_i;
wire [IN_WIDTH-1:0]x0_r,x0_i;
wire [IN_WIDTH-1:0]x1_r,x1_i;
wire [IN_WIDTH-1:0]x2_r,x2_i;
wire [IN_WIDTH-1:0]x3_r,x3_i;
wire [IN_WIDTH-1:0]x4_r,x4_i;
wire [IN_WIDTH-1:0]x5_r,x5_i;
wire [IN_WIDTH-1:0]x6_r,x6_i;
//wire [IN_WIDTH-1:0]x7_r,x7_i;
dff d1(.clk(clk),.reset(reset),.d(x_r),.y(x0_r));
dff d2(.clk(clk),.reset(reset),.d(x0_r),.y(x1_r));
dff d3(.clk(clk),.reset(reset),.d(x1_r),.y(x2_r));
dff d4(.clk(clk),.reset(reset),.d(x2_r),.y(x3_r));
dff d5(.clk(clk),.reset(reset),.d(x3_r),.y(x4_r));
dff d6(.clk(clk),.reset(reset),.d(x4_r),.y(x5_r));
dff d7(.clk(clk),.reset(reset),.d(x5_r),.y(x6_r));
//dff d8(.clk(clk),.reset(reset),.d(x6_r),.y(x7_r));
dff d9(.clk(clk),.reset(reset),.d(x_i),.y(x0_i));
dff d10(.clk(clk),.reset(reset),.d(x0_i),.y(x1_i));
dff d11(.clk(clk),.reset(reset),.d(x1_i),.y(x2_i));
dff d12(.clk(clk),.reset(reset),.d(x2_i),.y(x3_i));
dff d13(.clk(clk),.reset(reset),.d(x3_i),.y(x4_i));
dff d14(.clk(clk),.reset(reset),.d(x4_i),.y(x5_i));
dff d15(.clk(clk),.reset(reset),.d(x5_i),.y(x6_i));
//dff d16(.clk(clk),.reset(reset),.d(x6_i),.y(x7_i));
always @(posedge clk or negedge reset)
begin if(!reset)
begin
y_r<=10'b0;
y_i<=10'b0;
end
else begin
y_r<=x6_r;
y_i<=x6_i;
end
end
endmodule
//delay4
`timescale 1ns/1ns
module delay4 (
clk,reset,
x_r,x_i,
y_r,y_i
);
parameter IN_WIDTH=10;
input clk,reset;
input [IN_WIDTH-1:0]x_r,x_i;
output[IN_WIDTH-1:0]y_r,y_i;
reg [IN_WIDTH-1:0]y_r,y_i;
wire [IN_WIDTH-1:0]x0_r,x0_i;
wire [IN_WIDTH-1:0]x1_r,x1_i;
wire [IN_WIDTH-1:0]x2_r,x2_i;
//wire [IN_WIDTH-1:0]x3_r,x3_i;
dff d1(.clk(clk),.reset(reset),.d(x_r),.y(x0_r));
dff d2(.clk(clk),.reset(reset),.d(x0_r),.y(x1_r));
dff d3(.clk(clk),.reset(reset),.d(x1_r),.y(x2_r));
//dff d4(.clk(clk),.reset(reset),.d(x2_r),.y(x3_r));
dff d5(.clk(clk),.reset(reset),.d(x_i),.y(x0_i));
dff d6(.clk(clk),.reset(reset),.d(x0_i),.y(x1_i));
dff d7(.clk(clk),.reset(reset),.d(x1_i),.y(x2_i));
//dff d8(.clk(clk),.reset(reset),.d(x2_i),.y(x3_i));
always @(posedge clk or negedge reset )
begin
if(!reset)
begin
y_r<=10'b0;
y_i<=10'b0;
end
else begin
y_r<=x2_r;
y_i<=x2_i;
end
end
endmodule
//delay2
`timescale 1ns/1ns
module delay2 ( clk,reset,x_r,x_i, y_r,y_i);
parameter IN_WIDTH=10;
input clk,reset;
input [IN_WIDTH-1:0]x_r,x_i;
output[IN_WIDTH-1:0]y_r,y_i;
reg [IN_WIDTH-1:0]y_r,y_i;
wire [IN_WIDTH-1:0]x0_r,x0_i;
//wire [IN_WIDTH-1:0]x1_r,x1_i;
dff d1(.clk(clk),.reset(reset),.d(x_r),.y(x0_r));
//dff d2(.clk(clk),.reset(reset),.d(x0_r),.y(x1_r));
dff d3(.clk(clk),.reset(reset),.d(x_i),.y(x0_i));
//dff d4(.clk(clk),.reset(reset),.d(x0_i),.y(x1_i));
always @(posedge clk or negedge reset)
begin
if(!reset)
begin
y_r<=10'b0;
y_i<=10'b0;
end
else begin
y_r<=x0_r;
y_i<=x0_i;
end
end
endmodule
//delay1
`timescale 1ns/1ns
module delay1 ( clk,reset, x_r,x_i, y_r,y_i);
parameter IN_WIDTH=10;
input clk,reset;
input [IN_WIDTH-1:0]x_r,x_i;
output[IN_WIDTH-1:0]y_r,y_i;
reg[IN_WIDTH-1:0]y_r,y_i;
always @(posedge clk or negedge reset )
begin
if(!reset)
begin
y_r<=10'b0;
y_i<=10'b0;
end
else begin
y_r<=x_r;
y_i<=x_i;
end
end
endmodule
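delay16, delay8, delay4, delay2 and delay1 differ only in their number of register stages (a chain of DEPTH-1 dff instances followed by one output register). A single parameterized delay line could replace all five. The sketch below assumes a module name delay_n and a parameter DEPTH (neither is in the original code) and keeps the same ports, the same active-low asynchronous reset and the same DEPTH-cycle latency.

```verilog
`timescale 1ns/1ns
// Sketch (assumed names): DEPTH-cycle delay line for one complex sample,
// intended to behave like delay16/8/4/2/1 for DEPTH = 16/8/4/2/1.
module delay_n #(parameter IN_WIDTH = 10, parameter DEPTH = 16) (
  input                 clk, reset,   // reset is active low, asynchronous
  input  [IN_WIDTH-1:0] x_r, x_i,
  output [IN_WIDTH-1:0] y_r, y_i
);
  // One register per stage for the real and the imaginary part.
  reg [IN_WIDTH-1:0] sr_r [0:DEPTH-1];
  reg [IN_WIDTH-1:0] sr_i [0:DEPTH-1];
  integer k;
  always @(posedge clk or negedge reset) begin
    if (!reset) begin
      for (k = 0; k < DEPTH; k = k + 1) begin
        sr_r[k] <= {IN_WIDTH{1'b0}};
        sr_i[k] <= {IN_WIDTH{1'b0}};
      end
    end else begin
      // Shift by one position per clock; stage 0 takes the new input.
      sr_r[0] <= x_r;
      sr_i[0] <= x_i;
      for (k = 1; k < DEPTH; k = k + 1) begin
        sr_r[k] <= sr_r[k-1];
        sr_i[k] <= sr_i[k-1];
      end
    end
  end
  assign y_r = sr_r[DEPTH-1];
  assign y_i = sr_i[DEPTH-1];
endmodule
```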
//switch16
`timescale 1ns/1ns
module switch16(count, x0_r,x1_r,x0_i,x1_i, y0_r,y1_r,y0_i,y1_i);
parameter IN_WIDTH=10;
input [4:0] count;
input [IN_WIDTH-1:0] x0_r,x1_r,x0_i,x1_i;
output[IN_WIDTH-1:0] y0_r,y1_r,y0_i,y1_i;
reg[IN_WIDTH-1:0] y0_r,y1_r,y0_i,y1_i;
always @(count or x0_r or x1_r or x0_i or x1_i)
begin
if(count>15)
begin
y0_r=x1_r;
y0_i=x1_i;
y1_r=x0_r;
y1_i=x0_i;
end
else
begin
y0_r=x0_r;
y0_i=x0_i;
y1_r=x1_r;
y1_i=x1_i;
end
end
endmodule
// switch 8
`timescale 1ns/1ns
module switch8( count,x0_r,x1_r,x0_i,x1_i, y0_r,y1_r,y0_i,y1_i);
parameter IN_WIDTH=10;
input [3:0] count;
input [IN_WIDTH-1:0] x0_r,x1_r,x0_i,x1_i;
output[IN_WIDTH-1:0] y0_r,y1_r,y0_i,y1_i;
reg[IN_WIDTH-1:0] y0_r,y1_r,y0_i,y1_i;
always @(count or x0_r or x1_r or x0_i or x1_i)
begin
if(count>7)
begin
y0_r=x1_r;
y0_i=x1_i;
y1_r=x0_r;
y1_i=x0_i;
end
else
begin
y0_r=x0_r;
y0_i=x0_i;
y1_r=x1_r;
y1_i=x1_i;
end
end
endmodule
// switch 4
`timescale 1ns/1ns
module switch4(
count,
x0_r,x1_r,x0_i,x1_i,
y0_r,y1_r,y0_i,y1_i
);
parameter IN_WIDTH=10;
input [2:0] count;
input [IN_WIDTH-1:0] x0_r,x1_r,x0_i,x1_i;
output[IN_WIDTH-1:0] y0_r,y1_r,y0_i,y1_i;
reg[IN_WIDTH-1:0] y0_r,y1_r,y0_i,y1_i;
always @(count or x0_r or x1_r or x0_i or x1_i)
begin
if(count>3)
begin
y0_r=x1_r;
y0_i=x1_i;
y1_r=x0_r;
y1_i=x0_i;
end
else
begin
y0_r=x0_r;
y0_i=x0_i;
y1_r=x1_r;
y1_i=x1_i;
end
end
endmodule
// switch 2
`timescale 1ns/1ns
module switch2( count, x0_r,x1_r,x0_i,x1_i,y0_r,y1_r,y0_i,y1_i);
parameter IN_WIDTH=10;
input [1:0] count;
input [IN_WIDTH-1:0] x0_r,x1_r,x0_i,x1_i;
output[IN_WIDTH-1:0] y0_r,y1_r,y0_i,y1_i;
reg[IN_WIDTH-1:0] y0_r,y1_r,y0_i,y1_i;
always @(count or x0_r or x1_r or x0_i or x1_i)
begin
if(count>1)
begin
y0_r=x1_r;
y0_i=x1_i;
y1_r=x0_r;
y1_i=x0_i;
end
else
begin
y0_r=x0_r;
y0_i=x0_i;
y1_r=x1_r;
y1_i=x1_i;
end
end
endmodule
// switch1
`timescale 1ns/1ns
module switch1( count, x0_r,x1_r,x0_i,x1_i,y0_r,y1_r,y0_i,y1_i);
parameter IN_WIDTH=10;
input count;
input [IN_WIDTH-1:0] x0_r,x1_r,x0_i,x1_i;
output[IN_WIDTH-1:0] y0_r,y1_r,y0_i,y1_i;
reg[IN_WIDTH-1:0] y0_r,y1_r,y0_i,y1_i;
always @(count or x0_r or x1_r or x0_i or x1_i)
begin
if(count==1)
begin
y0_r=x1_r;
y0_i=x1_i;
y1_r=x0_r;
y1_i=x0_i;
end
else
begin
y0_r=x0_r;
y0_i=x0_i;
y1_r=x1_r;
y1_i=x1_i;
end
end
endmodule
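switch16 through switch1 implement one rule: the two data paths are exchanged when the most significant bit of the counter slice is 1 (count>15 on 5 bits, count>7 on 4 bits, count>3 on 3 bits, count>1 on 2 bits and count==1 on 1 bit are all tests of that bit). A sketch of one parameterized switch follows; the module name switch_n and the counter-width parameter CW are assumptions.

```verilog
`timescale 1ns/1ns
// Sketch (assumed names): commutator switch driven by a CW-bit counter slice.
// CW = 5, 4, 3, 2, 1 corresponds to switch16, switch8, switch4, switch2, switch1.
module switch_n #(parameter IN_WIDTH = 10, parameter CW = 5) (
  input  [CW-1:0]       count,
  input  [IN_WIDTH-1:0] x0_r, x1_r, x0_i, x1_i,
  output [IN_WIDTH-1:0] y0_r, y1_r, y0_i, y1_i
);
  // Exchange the two paths during the second half of each counter period.
  wire cross = count[CW-1];
  assign y0_r = cross ? x1_r : x0_r;
  assign y0_i = cross ? x1_i : x0_i;
  assign y1_r = cross ? x0_r : x1_r;
  assign y1_i = cross ? x0_i : x1_i;
endmodule
```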
// cla20
`timescale 1ns/1ns
module cla20 (a,b,ci,s,co);
input [19:0]a,b;
input ci;
output [19:0] s;
output co;
wire [19:0] a,b,s;
wire ci,co;
wire [19:0]c;
wire [19:0] p,g,ps;
wire [18:0] p_1,g_1;
wire [15:0] p_2,g_2;
wire [3:0] p_3,g_3;
assign p=a|b;
assign g=a&b;
assign c[0]=ci;
assign c[1]=g[0]|(p[0]&ci);
//first line
opo2 l101(.p2(p[1]),.g2(g[1]),.p1(p[0]),.g1(g[0]),.p3(p_1[0]),.g3(g_1[0]));
opo3 l102(.p3(p[2]),.g3(g[2]),.p2(p[1]),.g2(g[1]),.p1(p[0]),.g1(g[0]),.p4(p_1[1]),.g4(g_1[1]));
opo4 l103(.p4(p[3]),.g4(g[3]),.p3(p[2]),.g3(g[2]),.p2(p[1]),.g2(g[1]),.p1(p[0]),.g1(g[0]),
.p5(p_1[2]),.g5(g_1[2]));//1
opo4 l104(.p4(p[4]),.g4(g[4]),.p3(p[3]),.g3(g[3]),.p2(p[2]),.g2(g[2]),.p1(p[1]),.g1(g[1]),
.p5(p_1[3]),.g5(g_1[3]));//2
opo4 l105(.p4(p[5]),.g4(g[5]),.p3(p[4]),.g3(g[4]),.p2(p[3]),.g2(g[3]),.p1(p[2]),.g1(g[2]),
.p5(p_1[4]),.g5(g_1[4]));//3
opo4 l106(.p4(p[6]),.g4(g[6]),.p3(p[5]),.g3(g[5]),.p2(p[4]),.g2(g[4]),.p1(p[3]),.g1(g[3]),
.p5(p_1[5]),.g5(g_1[5]));//4
opo4 l107(.p4(p[7]),.g4(g[7]),.p3(p[6]),.g3(g[6]),.p2(p[5]),.g2(g[5]),.p1(p[4]),.g1(g[4]),
.p5(p_1[6]),.g5(g_1[6]));//5
opo4 l108(.p4(p[8]),.g4(g[8]),.p3(p[7]),.g3(g[7]),.p2(p[6]),.g2(g[6]),.p1(p[5]),.g1(g[5]),
.p5(p_1[7]),.g5(g_1[7]));//6
opo4 l109(.p4(p[9]),.g4(g[9]),.p3(p[8]),.g3(g[8]),.p2(p[7]),.g2(g[7]),.p1(p[6]),.g1(g[6]),
.p5(p_1[8]),.g5(g_1[8]));//7
opo4 l110(.p4(p[10]),.g4(g[10]),.p3(p[9]),.g3(g[9]),.p2(p[8]),.g2(g[8]),.p1(p[7]),.g1(g[7]),
.p5(p_1[9]),.g5(g_1[9]));//8
opo4 l111(.p4(p[11]),.g4(g[11]),.p3(p[10]),.g3(g[10]),.p2(p[9]),.g2(g[9]),.p1(p[8]),.g1(g[8]),
.p5(p_1[10]),.g5(g_1[10]));//9
opo4 l112(.p4(p[12]),.g4(g[12]),.p3(p[11]),.g3(g[11]),.p2(p[10]),.g2(g[10]),.p1(p[9]),.g1(g[9]),
.p5(p_1[11]),.g5(g_1[11]));//10
opo4 l113(.p4(p[13]),.g4(g[13]),.p3(p[12]),.g3(g[12]),.p2(p[11]),.g2(g[11]),.p1(p[10]),.g1(g[10]),
.p5(p_1[12]),.g5(g_1[12]));//11
opo4 l114(.p4(p[14]),.g4(g[14]),.p3(p[13]),.g3(g[13]),.p2(p[12]),.g2(g[12]),.p1(p[11]),.g1(g[11]),
.p5(p_1[13]),.g5(g_1[13]));//12
opo4 l115(.p4(p[15]),.g4(g[15]),.p3(p[14]),.g3(g[14]),.p2(p[13]),.g2(g[13]),.p1(p[12]),.g1(g[12]),
.p5(p_1[14]),.g5(g_1[14]));//13
opo4 l116(.p4(p[16]),.g4(g[16]),.p3(p[15]),.g3(g[15]),.p2(p[14]),.g2(g[14]),.p1(p[13]),.g1(g[13]),
.p5(p_1[15]),.g5(g_1[15]));//14
opo4 l117(.p4(p[17]),.g4(g[17]),.p3(p[16]),.g3(g[16]),.p2(p[15]),.g2(g[15]),.p1(p[14]),.g1(g[14]),
.p5(p_1[16]),.g5(g_1[16]));//15
opo4 l118(.p4(p[18]),.g4(g[18]),.p3(p[17]),.g3(g[17]),.p2(p[16]),.g2(g[16]),.p1(p[15]),.g1(g[15]),
.p5(p_1[17]),.g5(g_1[17]));//16
opo4 l119(.p4(p[19]),.g4(g[19]),.p3(p[18]),.g3(g[18]),.p2(p[17]),.g2(g[17]),.p1(p[16]),.g1(g[16]),
.p5(p_1[18]),.g5(g_1[18]));//17
assign c[2]=g_1[0]|(p_1[0]&ci);
assign c[3]=g_1[1]|(p_1[1]&ci);
//second line
opo2 l201(.p2(p_1[3]),.g2(g_1[3]),.p1(p[0]),.g1(g[0]),.p3(p_2[0]),.g3(g_2[0]));
opo2 l202(.p2(p_1[4]),.g2(g_1[4]),.p1(p_1[0]),.g1(g_1[0]),.p3(p_2[1]),.g3(g_2[1]));
opo2 l203(.p2(p_1[5]),.g2(g_1[5]),.p1(p_1[1]),.g1(g_1[1]),.p3(p_2[2]),.g3(g_2[2]));
opo2 l204(.p2(p_1[6]),.g2(g_1[6]),.p1(p_1[2]),.g1(g_1[2]),.p3(p_2[3]),.g3(g_2[3]));
opo3 l205(.p3(p_1[7]),.g3(g_1[7]),.p2(p_1[3]),.g2(g_1[3]),.p1(p[0]),.g1(g[0]),.p4(p_2[4]),.g4(g_2[4]));
opo3 l206(.p3(p_1[8]),.g3(g_1[8]),.p2(p_1[4]),.g2(g_1[4]),.p1(p_1[0]),.g1(g_1[0]),.p4(p_2[5]),.g4(g_2[5]));
opo3 l207(.p3(p_1[9]),.g3(g_1[9]),.p2(p_1[5]),.g2(g_1[5]),.p1(p_1[1]),.g1(g_1[1]),.p4(p_2[6]),.g4(g_2[6]));
opo3 l208(.p3(p_1[10]),.g3(g_1[10]),.p2(p_1[6]),.g2(g_1[6]),.p1(p_1[2]),.g1(g_1[2]),.p4(p_2[7]),.g4(g_2[7]));
opo4 l209(.p4(p_1[11]),.g4(g_1[11]),.p3(p_1[7]),.g3(g_1[7]),.p2(p_1[3]),.g2(g_1[3]),.p1(p[0]),.g1(g[0]),
.p5(p_2[8]),.g5(g_2[8]));//1
opo4 l210(.p4(p_1[12]),.g4(g_1[12]),.p3(p_1[8]),.g3(g_1[8]),.p2(p_1[4]),.g2(g_1[4]),.p1(p_1[0]),.g1(g_1[0]),
.p5(p_2[9]),.g5(g_2[9]));//2
opo4 l211(.p4(p_1[13]),.g4(g_1[13]),.p3(p_1[9]),.g3(g_1[9]),.p2(p_1[5]),.g2(g_1[5]),.p1(p_1[1]),.g1(g_1[1]),
.p5(p_2[10]),.g5(g_2[10]));//3
opo4 l212(.p4(p_1[14]),.g4(g_1[14]),.p3(p_1[10]),.g3(g_1[10]),.p2(p_1[6]),.g2(g_1[6]),.p1(p_1[2]),.g1(g_1[2]),
.p5(p_2[11]),.g5(g_2[11]));//4
opo4 l213(.p4(p_1[15]),.g4(g_1[15]),.p3(p_1[11]),.g3(g_1[11]),.p2(p_1[7]),.g2(g_1[7]),.p1(p_1[3]),.g1(g_1[3]),
.p5(p_2[12]),.g5(g_2[12]));//5
opo4 l214(.p4(p_1[16]),.g4(g_1[16]),.p3(p_1[12]),.g3(g_1[12]),.p2(p_1[8]),.g2(g_1[8]),.p1(p_1[4]),.g1(g_1[4]),
.p5(p_2[13]),.g5(g_2[13]));//6
opo4 l215(.p4(p_1[17]),.g4(g_1[17]),.p3(p_1[13]),.g3(g_1[13]),.p2(p_1[9]),.g2(g_1[9]),.p1(p_1[5]),.g1(g_1[5]),
.p5(p_2[14]),.g5(g_2[14]));//7
opo4 l216(.p4(p_1[18]),.g4(g_1[18]),.p3(p_1[14]),.g3(g_1[14]),.p2(p_1[10]),.g2(g_1[10]),.p1(p_1[6]),.g1(g_1[6]),
.p5(p_2[15]),.g5(g_2[15]));//8
assign c[4]=g_1[2]|(p_1[2]&ci);
assign c[5]=g_2[0]|(p_2[0]&ci);
assign c[6]=g_2[1]|(p_2[1]&ci);
assign c[7]=g_2[2]|(p_2[2]&ci);
assign c[8]=g_2[3]|(p_2[3]&ci);
assign c[9]=g_2[4]|(p_2[4]&ci);
assign c[10]=g_2[5]|(p_2[5]&ci);
assign c[11]=g_2[6]|(p_2[6]&ci);
assign c[12]=g_2[7]|(p_2[7]&ci);
assign c[13]=g_2[8]|(p_2[8]&ci);
assign c[14]=g_2[9]|(p_2[9]&ci);
assign c[15]=g_2[10]|(p_2[10]&ci);
//third line
opo2 l301(.p2(p_2[12]),.g2(g_2[12]),.p1(p_2[4]),.g1(g_2[4]),.p3(p_3[0]),.g3(g_3[0]));
opo2 l302(.p2(p_2[13]),.g2(g_2[13]),.p1(p_2[5]),.g1(g_2[5]),.p3(p_3[1]),.g3(g_3[1]));
opo2 l303(.p2(p_2[14]),.g2(g_2[14]),.p1(p_2[6]),.g1(g_2[6]),.p3(p_3[2]),.g3(g_3[2]));
opo2 l304(.p2(p_2[15]),.g2(g_2[15]),.p1(p_2[7]),.g1(g_2[7]),.p3(p_3[3]),.g3(g_3[3]));
//result
assign c[16]=g_2[11]|(p_2[11]&ci);
assign c[17]=g_3[0]|(p_3[0]&ci);
assign c[18]=g_3[1]|(p_3[1]&ci);
assign c[19]=g_3[2]|(p_3[2]&ci);
assign co=g_3[3]|p_3[3]&ci;
assign s=(p&(~g))^c;
endmodule
//booth
`timescale 1ns/1ns
module booth (a,b,out,signal);
input [9:0] a;
input [2:0] b;
output [10:0] out;
output signal;
wire [9:0] a;
wire [2:0] b;
reg [10:0] out;
reg signal;
always @(a or b)
begin
case (b)
3'b000: begin
out=11'b0;
signal=0;
end
3'b001: begin
out={a[9],a};
signal=0;
end
3'b010: begin
out={a[9],a};
signal=0;
end
3'b011: begin
out[10:0]=a<<1;
signal=0;
end
3'b100: begin
out[10:0]=(~(a<<1));
signal=1;
end
3'b101: begin
out[10]=~a[9];
out[9:0]=~a;
signal=1;
end
3'b110: begin
out[10]=~a[9];
out[9:0]=~a;
signal=1;
end
3'b111: begin
out=11'b0;
signal=0;
end
endcase
end
endmodule
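The booth module encodes each overlapping group {b[2i+1], b[2i], b[2i-1]} of the multiplier as a radix-4 digit d = -2b[2i+1] + b[2i] + b[2i-1] in {-2, -1, 0, +1, +2} and outputs the partial product d·a. Negative digits are produced as the one's complement of a or 2a with signal=1, so that out + signal equals d·a modulo 2^11; the missing +1 is injected later in the compressor tree through the x0 to x4 signals. The behavioral reference below computes d·a directly with the signed * operator and could be used to check the case table; the module name booth_pp_ref is an assumption.

```verilog
`timescale 1ns/1ns
// Behavioral reference (assumed name): radix-4 Booth partial product d*a.
module booth_pp_ref (
  input  signed [9:0]  a,      // multiplicand
  input         [2:0]  grp,    // {b[2i+1], b[2i], b[2i-1]}
  output signed [10:0] pp      // d*a as an 11-bit two's-complement value
);
  // Booth digit d = -2*b[2i+1] + b[2i] + b[2i-1], range -2..+2 (fits 3-bit signed).
  wire signed [2:0] d = $signed({2'b00, grp[1]}) + $signed({2'b00, grp[0]})
                      - $signed({1'b0, grp[2], 1'b0});
  // Both operands are signed, so d is sign-extended to 11 bits before multiplying.
  assign pp = d * a;
endmodule
```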
//complex_mul
`timescale 1ns/1ns
module complex_mul( a,b,c,d, yr,yi);
parameter IN_WIDTH=10;
input [IN_WIDTH-1:0]a,b,c,d;
output [IN_WIDTH*2-1:0]yr,yi;
wire [IN_WIDTH:0] a1,c2,c3;
wire [IN_WIDTH-1:0] a0,c0,c1;
wire [IN_WIDTH*2-1:0] y0,y1,y2;
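// sign-extend the two's-complement operands before the pre-additions
assign a1={a[IN_WIDTH-1],a}-{b[IN_WIDTH-1],b};
assign c2={c[IN_WIDTH-1],c}-{d[IN_WIDTH-1],d};
assign c3={c[IN_WIDTH-1],c}+{d[IN_WIDTH-1],d};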
assign a0=a1[IN_WIDTH:1];
assign c0=c2[IN_WIDTH:1];
assign c1= c3[IN_WIDTH:1];
multiplier m0(.x(a0),.y(d),.result(y0));
multiplier m1(.x(c0),.y(a),.result(y1));
multiplier m2(.x(c1),.y(b),.result(y2));
assign yr=y0+y1;
assign yi=y0+y2;
endmodule
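complex_mul computes the product (a+jb)(c+jd) of a data sample and a twiddle factor with three real multiplications instead of four: with y0 = (a-b)d, y1 = (c-d)a and y2 = (c+d)b, the sums y0+y1 = ac-bd and y0+y2 = ad+bc are the real and imaginary parts. Each pre-added factor is halved (bits [IN_WIDTH:1]), so both outputs carry an extra factor of 1/2. The sketch below is a behavioral reference that replaces the Booth multiplier with the built-in signed * operator; the module name complex_mul_ref is an assumption, and it is intended as a simulation cross-check.

```verilog
`timescale 1ns/1ns
// Behavioral reference (assumed name): 3-multiplier complex product
//   yr = ((a-b)/2)*d + ((c-d)/2)*a  ~= (a*c - b*d)/2
//   yi = ((a-b)/2)*d + ((c+d)/2)*b  ~= (a*d + b*c)/2
// a+jb is the data sample, c+jd the twiddle factor.
module complex_mul_ref #(parameter IN_WIDTH = 10) (
  input  signed [IN_WIDTH-1:0]   a, b, c, d,
  output signed [2*IN_WIDTH-1:0] yr, yi
);
  // Pre-adders with one guard bit, then drop the LSB (halve).
  wire signed [IN_WIDTH:0]   a1 = a - b;
  wire signed [IN_WIDTH:0]   c2 = c - d;
  wire signed [IN_WIDTH:0]   c3 = c + d;
  wire signed [IN_WIDTH-1:0] a0 = a1[IN_WIDTH:1];
  wire signed [IN_WIDTH-1:0] c0 = c2[IN_WIDTH:1];
  wire signed [IN_WIDTH-1:0] c1 = c3[IN_WIDTH:1];
  // Three real multiplications; y0 is shared by both outputs.
  wire signed [2*IN_WIDTH-1:0] y0 = a0 * d;
  wire signed [2*IN_WIDTH-1:0] y1 = c0 * a;
  wire signed [2*IN_WIDTH-1:0] y2 = c1 * b;
  assign yr = y0 + y1;
  assign yi = y0 + y2;
endmodule
```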
//tbcla
`timescale 1ns/1ns
module tbcla;
reg clk;
reg ci;
reg [19:0] a,b;
wire [19:0] s;
wire co;
reg [20:0] check;
cla20 c1(.a(a),.b(b),.ci(ci),.s(s),.co(co));
always #5 clk=~clk;
initial begin
clk=1'b0;
a=20'b0;
b=20'b0;
ci=1'b0;
repeat(100) begin
a=$random;b=$random;ci=1'b0;
check=a+b+ci;
#10 $display ($time, " %d+%d+%d=%d(%d)",a,b,ci,{co,s},check);
end
end
endmodule
//cl42_20
module cl42_20(a,b,c,d,ci,s,cr);
input [19:0]a,b,c,d;
input ci;
output [20:0]s;
output [20:0]cr;
wire [19:0] txr,tao,toa;
assign txr=(a^b)^(c^d);
assign tao=(a&b)|(c&d);
assign toa=(a|b)&(c|d);
assign s={txr[19],txr}^{toa,ci};
assign cr=({txr[19],txr}&{toa,ci})|((~{txr[19],txr})&{tao[19],tao});
endmodule
//multiplier
`timescale 1ns/1ns
module multiplier ( x,y, result );
input [9:0]x,y;
output [19:0]result;
wire [19:0] result;
wire [9:0] a,b;
wire [10:0] w0,w1,w2,w3,w4;
wire x0,x1,x2,x3,x4;
wire [14:0] s1,s2;
wire [12:0] s3,s4;
wire [20:0] s5,s6;
wire [19:0] s7;
wire co;
assign a=x;
assign b=y;
assign result=s7;
//booth coding
booth b0(.a(a),.b({b[1:0],1'b0}),.out(w0),.signal(x0));
booth b1(.a(a),.b(b[3:1]),.out(w1),.signal(x1));
booth b2(.a(a),.b(b[5:3]),.out(w2),.signal(x2));
booth b3(.a(a),.b(b[7:5]),.out(w3),.signal(x3));
booth b4(.a(a),.b(b[9:7]),.out(w4),.signal(x4));
//******************first line with 3:2 compressor w0_w1_w2 w3_w4_x4************//
csa_15 c1(.a({{4{w0[10]}},w0}),.b({{2{w1[10]}},w1,1'b0,x0}),.ci({w2,1'b0,x1,2'b0}),.s(s1),.co(s2));
csa_13 c2(.a({{2{w3[10]}},w3}),.b({w4,1'b0,x3}),.ci({10'b0,x4,2'b0}),.s(s3),.co(s4));
//********************second line with 4:2 compressor******************//
cl42_20 c3(.a({{5{s1[14]}},s1}),.b({{4{s2[14]}},s2,1'b0}),.c({{s3[12]},s3,1'b0,x2,4'b0}),.d({s4,7'b0}),
.ci(1'b0),.s(s5),.cr(s6));
//******************** leading carry adder**********************************//
cla20 cla(.a({s5[19:0]}),.b({s6[18:0],1'b0}),.ci(1'b0),.s(s7),.co(co));
endmodule
//inverter
`timescale 1ns/1ns
module inverter( count, a_r,a_i, a1_r,a1_i);
parameter IN_WIDTH=10;
input count;
input [IN_WIDTH-1:0] a_r,a_i;
output [IN_WIDTH-1:0] a1_r,a1_i;
wire[IN_WIDTH-1:0] a1_r,a1_i;
assign a1_r=(count)?a_i:a_r;
assign a1_i=(count)?(-a_r):a_i;
endmodule
//bm
`timescale 1ns/1ns
module bm(clk,reset,address, ar,ai,br,bi, r16,i16, r0,i0);
parameter IN_WIDTH=10;
input clk,reset;
input [4:0]address;
input [IN_WIDTH-1:0] ar,ai;
input [IN_WIDTH-1:0] br,bi;
output [IN_WIDTH-1:0] r16,i16;
output [IN_WIDTH-1:0] r0,i0;
wire [IN_WIDTH-1:0] r16,i16;
wire [IN_WIDTH-1:0] r0,i0;
wire [IN_WIDTH-1:0] yr0,yi0,yr16,yi16;
wire [IN_WIDTH*2-1:0] yr,yi;
wire [IN_WIDTH-1:0] wr,wi;
butterfly b1(.a_r(ar),.a_i(ai),.b_r(br),.b_i(bi),.a1_r(yr0),.a1_i(yi0),.b1_r(yr16),.b1_i(yi16));
twiddle1 t1(.clk(clk),.reset(reset),.address(address),.wr(wr),.wi(wi));
complex_mul m1(.a(yr16),.b(yi16),.c(wr),.d(wi),.yr(yr),.yi(yi));
assign r0=yr0;
assign i0=yi0;
assign r16=yr[IN_WIDTH*2-1:IN_WIDTH];
assign i16=yi[IN_WIDTH*2-1:IN_WIDTH];
endmodule
// opo2
`timescale 1ns/1ns
module opo2(p2,g2,p1,g1,p3,g3);
input p1,p2,g1,g2;
output p3,g3;
assign p3=p2&p1;
assign g3=g2|(g1&p2);
endmodule
//opo3
`timescale 1ns/1ns
module opo3(p3,p2,p1,g3,g2,g1,p4,g4);
input p1,p2,p3,g1,g2,g3;
output p4,g4;
assign p4=p3&p2&p1;
assign g4=g3|(p3&g2)|(p3&p2&g1);
endmodule
//opo4
`timescale 1ns/1ns
module opo4(p4,p3,p2,p1,g4,g3,g2,g1,p5,g5);
input p4,p3,p2,p1,g4,g3,g2,g1;
output p5,g5;
assign p5=p4&p3&p2&p1;
assign g5=g4|p4&g3|p4&p3&g2|p4&p3&p2&g1;
endmodule
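opo2, opo3 and opo4 implement the group generate/propagate ("dot") operator of the parallel-prefix carry network in cla20 over two, three and four bit positions: (g_hi, p_hi) o (g_lo, p_lo) = (g_hi | p_hi&g_lo, p_hi&p_lo). Because this operator is associative, opo4 can equally be built from three two-input dot operations. The sketch below does that, using the same port meaning as opo4; the module name dot4_from_dot2 is an assumption.

```verilog
`timescale 1ns/1ns
// Sketch (assumed name): 4-position group (g5,p5) built from the 2-input dot operator
//   (g_hi,p_hi) o (g_lo,p_lo) = (g_hi | (p_hi & g_lo), p_hi & p_lo)
// Same port meaning as opo4: index 4 is the most significant position of the group.
module dot4_from_dot2 (
  input  p4, p3, p2, p1, g4, g3, g2, g1,
  output p5, g5
);
  // Combine the upper pair and the lower pair, then combine the two results.
  wire g43 = g4 | (p4 & g3), p43 = p4 & p3;
  wire g21 = g2 | (p2 & g1), p21 = p2 & p1;
  assign g5 = g43 | (p43 & g21);
  assign p5 = p43 & p21;
endmodule
```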
//csa13
module csa_13(a,b,ci,s,co);
input[12:0] a,b,ci;
output[12:0] s,co;
assign s=a^b^ci;
assign co=(a&b)|(a&ci)|(b&ci);
endmodule
//dataout
`timescale 1ns/1ns
module dataout ( clk,reset,first_r,first_i, last_r,last_i, dout_re,dout_im,dout_valid);
input clk,reset;
input [9:0]first_r,first_i,last_r,last_i;
output [9:0]dout_re,dout_im;
output dout_valid;
reg [9:0]dout_re,dout_im;
reg dout_valid;
reg flag;
reg [6:0]count2;
reg count;
always @(posedge clk or negedge reset)
begin
if (!reset)
count2<=7'b1111111;
else if(flag==0)
count2<=count2+1;
end
always @(posedge clk or negedge reset)
begin
if (!reset)
flag<=0;
else if(count2==7'b1111101)
flag<=1;
end
always @(posedge clk or negedge reset)
begin
if (!reset)
count<=1;
else
count<=count+1;
end
always @(posedge clk or negedge reset)
begin
if(!reset) begin
dout_re<=10'b0; dout_im<=10'b0;
end
else begin
if(count==0) begin
dout_re<=first_r; dout_im<=first_i;
end
else begin
dout_re<=last_r;dout_im<=last_i;
end
end
end
always @(posedge clk or negedge reset)
begin
if(!reset)
dout_valid<=0;
else if(flag==1)
dout_valid<=1;
else
dout_valid<=0;
end
endmodule
//twiddle
`timescale 1ns/1ns
module twiddle1(clk,reset,address,wr,wi);
parameter IN_WIDTH=10;
parameter mem0=10'b0111111111;
parameter mem1=10'b0111111101;
parameter mem2=10'b0111110110;
parameter mem3=10'b0111101001;
parameter mem4=10'b0111011001;
parameter mem5=10'b0111000011;
parameter mem6=10'b0110101001;
parameter mem7=10'b0110001011;
parameter mem8=10'b0101101010;
parameter mem9=10'b0101000100;
parameter mem10=10'b0100011100;
parameter mem11=10'b0011110001;
parameter mem12=10'b0011000011;
parameter mem13=10'b0010010100;
parameter mem14=10'b0001100011;
parameter mem15=10'b0000110010;
input clk,reset;
input [4:0]address;
output [IN_WIDTH-1:0] wr,wi;
reg [IN_WIDTH-1:0] wr,wi;
always @(posedge clk or negedge reset )
if (!reset) begin
wr<=0;wi<=0;
end
else
begin
case(address)
5'd0 : begin wr<=mem0;wi<=0; end
5'd1 : begin wr<=mem1;wi<=-mem15;end
5'd2 : begin wr<=mem2;wi<=-mem14; end
5'd3 : begin wr<=mem3;wi<=-mem13; end
5'd4 : begin wr<=mem4;wi<=-mem12;end
5'd5 : begin wr<=mem5;wi<=-mem11;end
5'd6 : begin wr<=mem6;wi<=-mem10;end
5'd7 : begin wr<=mem7;wi<=-mem9;end
5'd8 : begin wr<=mem8;wi<=-mem8;end
5'd9 : begin wr<=mem9;wi<=-mem7;end
5'd10 : begin wr<=mem10;wi<=-mem6;end
5'd11 : begin wr<=mem11;wi<=-mem5;end
5'd12 : begin wr<=mem12;wi<=-mem4;end
5'd13 : begin wr<=mem13;wi<=-mem3;end
5'd14 : begin wr<=mem14;wi<=-mem2;end
5'd15 : begin wr<=mem15;wi<=-mem1;end
5'd16 : begin wr<=0;wi<=-mem0; end
// n = 16+m: W_64^n = -sin(2*pi*m/64) - j*cos(2*pi*m/64), i.e. wr = -mem(16-m), wi = -mem(m)
5'd17 : begin wr<=-mem15;wi<=-mem1;end
5'd18 : begin wr<=-mem14;wi<=-mem2; end
5'd19 : begin wr<=-mem13;wi<=-mem3; end
5'd20 : begin wr<=-mem12;wi<=-mem4;end
5'd21 : begin wr<=-mem11;wi<=-mem5;end
5'd22 : begin wr<=-mem10;wi<=-mem6;end
5'd23 : begin wr<=-mem9;wi<=-mem7;end
5'd24 : begin wr<=-mem8;wi<=-mem8;end
5'd25 : begin wr<=-mem7;wi<=-mem9;end
5'd26 : begin wr<=-mem6;wi<=-mem10;end
5'd27 : begin wr<=-mem5;wi<=-mem11;end
5'd28 : begin wr<=-mem4;wi<=-mem12;end
5'd29 : begin wr<=-mem3;wi<=-mem13;end
5'd30 : begin wr<=-mem2;wi<=-mem14;end
5'd31 : begin wr<=-mem1;wi<=-mem15;end
endcase
end
endmodule
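The sixteen parameters mem0 to mem15 of twiddle1 agree with floor(512·cos(2πk/64)) for k = 0 to 15, with mem0 saturated to 511 so that +1.0 fits a 10-bit signed word. The remaining twiddle factors W_64^n = cos(2πn/64) - j·sin(2πn/64), n = 0 to 31, follow from quarter-wave symmetry. The simulation-only sketch below recomputes the table with the Verilog-2005 real math functions $cos and $floor and could be used to regenerate or check the parameters; the module name twiddle_gen is an assumption.

```verilog
`timescale 1ns/1ns
// Simulation-only sketch (assumed name, not part of the thesis design).
//   mem[k] = min(511, floor(512*cos(2*pi*k/64))), k = 0..16
// W_64^n = wr[n] + j*wi[n], n = 0..31, from quarter-wave symmetry:
//   n <  16: wr =  mem[n],    wi = -mem[16-n]
//   n >= 16: wr = -mem[32-n], wi = -mem[n-16]
module twiddle_gen;
  real    pi;
  integer mem [0:16];
  integer wr  [0:31];
  integer wi  [0:31];
  integer k;
  initial begin
    pi = 3.14159265358979;   // slightly below pi, so cos(pi/2) stays >= 0
    for (k = 0; k <= 16; k = k + 1) begin
      mem[k] = $floor(512.0 * $cos(2.0 * pi * k / 64.0));
      if (mem[k] > 511) mem[k] = 511;   // +1.0 does not fit a 10-bit signed word
    end
    for (k = 0; k < 32; k = k + 1) begin
      if (k < 16) begin
        wr[k] = mem[k];
        wi[k] = -mem[16-k];
      end else begin
        wr[k] = -mem[32-k];
        wi[k] = -mem[k-16];
      end
    end
    // Print the sixteen ROM parameters in decimal.
    for (k = 0; k < 16; k = k + 1)
      $display("parameter mem%0d=10'd%0d;", k, mem[k]);
  end
endmodule
```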
//tbmul
`timescale 1ns/1ns
module tbmul;
reg clk,reset;
reg [9:0] x,y;
wire [19:0] result;
reg [19:0] check;
multiplier m0(
.x(x),
.y(y),
.result(result)
);
always #20 clk=~clk;
initial begin
clk=0;
reset=1;
x=-10'd15;
y=10'd30;
#5 reset=0;
#20 reset=1;
#15;
//check=x*y;
repeat(100) begin
x=x+20;y=y+30;
check=x*y;
#40;
end
end
endmodule
VITA
Bhavishya Murukutla was born in Guntur, Andhra Pradesh, India. She has graduated with a
University, Kakinada, India in May 2012. After completion of her Bachelor's degree, she moved
to the United States of America in August 2012 to pursue her Master of Science in Electrical
2013.