
PIPELINE FFT ARCHITECTURE IMPLEMENTATION USING VERILOG HDL
A Thesis
by
BHAVISHYA MURUKUTLA
Submitted to the College of Graduate Studies

Texas A&M University-Kingsville
in partial fulfillment of the requirements for the degree of
MASTER OF SCIENCE
DECEMBER 2013
Major: Electrical Engineering

ABSTRACT
PIPELINE FFT Architecture Implementation Using VERILOG HDL
(December 2013)
Bhavishya Murukutla, Bachelor of Technology, JNTU University, Kakinada, INDIA
Chairman of Advisory Committee: Dr. Reza Nekovei
In most communication systems the Fourier transform is the central tool for processing signals. The FFT/IFFT is used to compute it quickly, but an FFT/IFFT implementation still introduces delay and occupies a large area [1]. We therefore need an architecture that is optimized for both delay and area.
The main source of delay and complexity is the complex multiplications by the twiddle factors ($W_N^r$) in the Fast Fourier Transform. A pipeline architecture reduces the number of complex multiplications [1], although some delay remains in that architecture as well.
Here we propose a pipeline architecture that reduces the complex multiplications by the twiddle factors. The architecture uses delay elements, switch elements and the basic butterfly structure. The input data stream is divided into two half streams that are processed at the same time, and the output is likewise produced as two half streams. The number of complex multiplications is reduced substantially, which lowers the cost of implementation.
ACKNOWLEDGEMENT
I take this opportunity to express my gratitude to my advisory committee chair, Dr. Reza
Nekovei, for his invaluable guidance throughout this thesis work and my career choices, and for
answering every question very patiently.
I also extend my appreciation to the members of the supervisory committee,
Dr. Lifford McLauchlan and Dr. Sung-won Park. This thesis would not have been possible or
successful without their invaluable instructions.
I would like to thank all the faculty members and staff of Texas A&M University-Kingsville
for the timely response and help I received during my course of study here.
I would like to thank my sister and all my friends for being there for me with constant
encouragement.
I wish to deliver my deep love and thankfulness to my dear parents. Their selfless and
great love is a significant impetus throughout my study life.
Finally, I thank God for pouring out his wisdom and knowledge on me.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER I. INTRODUCTION
CHAPTER II. DIFFERENT FFT ALGORITHMS
CHAPTER III. IMPLEMENTATION OF DIT FFT AND PIPELINED FFT
CHAPTER IV. RESULTS
CHAPTER V. CONCLUSION
REFERENCES
APPENDIX A CODE FOR IMPLEMENTATION OF DIT FFT AND PIPELINED FFT
VITA
LIST OF FIGURES
Fig. 1.1. Quartus-II Work Flow
Fig. 2.1. Basic Butterfly Structure
Fig. 2.2. Modified Butterfly Structure
Fig. 2.3. Butterfly Structure used in the DIF FFT
Fig. 2.4. Basic Decimation in Time FFT
Fig. 2.5. 8-Point Decimation in Time FFT
Fig. 2.6. Sequence of input in DIT FFT
Fig. 2.7. Radix-4 DIT FFT
Fig. 2.8. Input to Output sequence generation of Decimation in Frequency FFT
Fig. 2.9. Butterfly structure used in DIF FFT
Fig. 2.10. 8-Point Radix-2 Decimation in Frequency FFT
Fig. 2.11. Radix-4 DIF FFT
Fig. 2.12. Basic Pipeline architecture
Fig. 2.13. R2MDC Pipeline architecture of 8-Point
Fig. 3.1. Stage-1 Butterfly
Fig. 3.2. Second Stage Butterfly
Fig. 3.3. Butterfly used in third stage
Fig. 3.4. Pipeline architecture of 64-Point FFT using R2MDC
Fig. 4.1. Simulation Output for 8-Point DIT FFT
Fig. 4.2. RTL View of 8-Point DIT FFT
Fig. 4.3. Resource Utilization Summary
Fig. 4.4. Worst Case Delay
Fig. 4.5. Simulation Output for 64-Point Pipeline FFT
LIST OF TABLES
Table 2.1. Bit Reversal Order
Table 3.1. Comparison between Normal DFT and FFT
CHAPTER I
INTRODUCTION
In most communication systems [1] and control systems, the frequency spectrum of a
signal is needed to determine its frequency range and hence whether the system can use the
signal for further processing.
Most signals are given in the time domain, where the variation is represented with respect to time.
To obtain the frequency-domain signal, that is, the variation of the signal with respect to
frequency [2], we need a transformation from the time domain to the frequency domain. Several
transformation techniques are available [3]:
1) Fourier series [3]
2) Fourier Transform(FT)
3) Laplace Transform
4) Z-transform
The Fourier series applies only to repetitive (periodic) signals, so we turn to the
Fourier Transform, which can be applied to both periodic and non-periodic signals [3].
The FT of a signal decomposes the signal into a sum of sinusoidal components. It can be
computed using the Discrete Fourier Transform, in which equally spaced samples of a function
are transformed into a finite combination of complex sinusoids [4].
The Discrete Fourier Transform can be expressed as [4]:
$F(k) = \sum_{n=0}^{N-1} f(n)\, e^{-j(2\pi/N)kn}$ ……………….. (1.1)
Here N is the number of samples in the signal,
F(k) is the frequency-domain signal, and
f(n) is the time-domain signal.
The frequency-domain output F(k) is a discrete signal, and the input must also be discrete.
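Equation (1.1) can be evaluated directly, which is a convenient reference when checking faster implementations. The following is a minimal Python sketch, not the thesis's Verilog code; the function name `dft` and the test input are illustrative assumptions.

```python
import cmath

def dft(f):
    """Direct N-point DFT following equation (1.1).

    Each of the N outputs needs N complex multiplications,
    so the total cost grows as N*N.
    """
    N = len(f)
    return [
        sum(f[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]

# A unit impulse has a flat spectrum: every F(k) equals 1.
F = dft([1, 0, 0, 0])
```

A constant input behaves the opposite way: all of its energy lands in F(0).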
The Fourier Transform can also be computed using the DTFT, in which the input signal is
discrete and the output frequency-domain signal is continuous.
The DTFT is expressed as [4]:
$F(e^{j\omega}) = \sum_{n=-\infty}^{\infty} f(n)\, e^{-j\omega n}$ ………………. (1.2)
where $F(e^{j\omega})$ is the frequency-domain signal, which is continuous and periodic,
f(n) is the time-domain signal, which is discrete, and
ω is the frequency.
If the input is a continuous signal, we first sample it to obtain a discrete
signal and then apply the DTFT to get the frequency-domain signal.
Applying the DFT or DTFT to a time-domain signal yields its frequency-domain representation.
For an N-point sequence the DFT requires the following [3] [4]:
Number of complex multiplications in the DFT: $N^2$
Number of complex additions in the DFT: $N(N-1)$
For an 8-point input sequence, converting to the frequency domain requires:
64 complex multiplications
56 complex additions
As the number of samples in the input sequence increases, the number of multiplications grows
very rapidly.
For a 16-point sequence the conversion requires:
Number of complex multiplications: 256
Number of complex additions: 240
To reduce the number of complex multiplications and additions [4], we use the FFT
technique to compute the frequency domain of the signal. The conversion then requires:
Number of complex multiplications: $(N/2)\log_2 N$ [5]
Number of complex additions: $N \log_2 N$ [5]
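The operation counts quoted above follow directly from the formulas, and a few lines of Python make the growth easy to tabulate. This is only an illustrative sketch; the helper names `dft_ops` and `fft_ops` are assumptions introduced here.

```python
from math import log2

def dft_ops(N):
    """(complex multiplications, complex additions) for a direct N-point DFT."""
    return N * N, N * (N - 1)

def fft_ops(N):
    """(complex multiplications, complex additions) for a radix-2 FFT.

    N is assumed to be a power of two.
    """
    stages = int(log2(N))
    return (N // 2) * stages, N * stages

for N in (8, 16, 64):
    print(N, dft_ops(N), fft_ops(N))
```

For N = 8 the direct DFT needs 64 multiplications and the FFT needs 12, which is the comparison the later chapters rely on.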
The Fast Fourier Transform is computed using the Cooley-Tukey algorithm [6].
The FFT can be computed with a radix-r algorithm, where r is an integer; the N-point
FFT can be calculated using different radices such as radix-2, radix-4 and radix-8. The FFT
can be implemented by two different methods, the DIT FFT and the DIF FFT.
To increase the speed, a pipeline architecture [6] [7] is used in the computation of the FFT;
in particular, the Multi-path Delay Commutator [6] [8] [9] [10] architecture is used in
communication systems. The pipeline architecture also uses a butterfly element, which can be
built with different radices such as radix-2 and radix-4, and the elements are retrieved
using memory addressing [11] [12] [13].
In this work the 8-point DIT FFT is implemented in Verilog and synthesized in
Quartus-II [14], and the 64-point pipeline architecture is implemented in Verilog and
simulated in ModelSim [14].
1.2 Radix:
Here the radix is the number of elements taken in at a time and processed by one
butterfly. For radix-2 there are '2' input elements; additions and multiplications are
performed on them and '2' outputs are obtained. For radix-4 there are '4' input elements
and '4' output elements at a time.
1.3 Verilog:
Hardware description languages differ from software programming languages.
The most widely used hardware description languages are:
1) VHDL
2) Verilog HDL
Verilog is a hardware description language with a syntax similar to that of C, and it is
standardized as IEEE 1364. Unlike software languages, hardware description languages must
describe signal propagation and time.
1.4 Quartus-II
The designed code is synthesized using Quartus-II. Before synthesis, the design is first
simulated in ModelSim. Synthesis and implementation then follow, including placement,
routing and pin allocation, and finally timing analysis is performed to determine the
worst-case delay of the circuit.
After synthesis, different views of the designed circuit are available:
1) RTL view
2) STATE MACHINE view
Quartus-II can also be used to program the design into the hardware.
In the first step, the design is coded in a hardware description language. Here we use
Verilog, but VHDL could also be used, and synthesis and implementation could also be done
with Xilinx software.
The flow of the synthesis steps is shown below [14]:
[Figure: flow chart with the following steps in order]
1) Design: Verilog coding
2) Functional simulation using ModelSim
3) Synthesis and implementation
4) Place and route
5) Timing analysis and timing closure
6) Simulation
7) Programming and configuration
Fig. 1.1. Quartus-II Work Flow [14]
The rest of the thesis is divided into the following chapters and subsections:
1) Chapter II: deals with theoretical description of different types of FFT algorithms
a) Cooley-Tukey Method
b) DIT FFT
c) DIF FFT
d) Pipelined FFT architecture
2) Chapter III: deals with Implementation of algorithms
a) Implementation of DIT FFT of 8-Point input
b) Implementation of Pipelined FFT of 64-Point
3) Chapter IV: deals with Results
a) Simulation Results
b) RTL view
4) Chapter V: Conclusion
CHAPTER II
DIFFERENT FFT ALGORITHMS
2.1 Cooley-Tukey Method:
This method is the most widely used for computing the FFT. The length N of the DFT is
expressed as a product of N1 and N2 [3]:
N=N1*N2
The DFT is computed by breaking it into N1 DFTs of size N2 or into N2 DFTs of size N1.
Usually one of N1 and N2 is small compared with the other. If N1 is the radix, the FFT is a
Decimation in Time FFT; if N2 is the radix, it is a Decimation in Frequency FFT.
The operation is applied recursively using radix-2 DFTs. In the radix-2 DIT, the odd
transform is multiplied by a phase factor called the twiddle factor, after which addition
and subtraction are performed. This butterfly of the even and odd transforms is called a
size-2 DFT.
The Fast Fourier Transform can be done by using two different methods[4]:
1) DIT Fast Fourier Transform
2) DIF Fast Fourier Transform
The computation is divided into a number of stages, given by:
$v = \log_2 N$
where N is the number of input samples in the time domain.
An N-point DFT with even N is computed from two (N/2)-point DFTs; each (N/2)-point DFT is in
turn computed from two (N/4)-point DFTs, and so on until only 2-point DFTs remain.
The Fast Fourier Transform is built from butterfly structures. The butterfly takes one of
two forms: one is used in the DIT FFT and the other in the DIF FFT.
Butterfly used in DIT FFT:
[Figure: butterfly with inputs a and b and outputs c and d; input b is multiplied by $W_N^r$ on the path to c and by $W_N^{(r+N/2)}$ on the path to d]
Fig. 2.1. Basic Butterfly Structure
Here a and b are the input samples of the butterfly, and $W_N^r$ and $W_N^{(r+N/2)}$ are the
twiddle factors.
The results from the butterfly structure are:
$c = a + b\,W_N^r$ ……………………….. (2.1)
$d = a + b\,W_N^{(r+N/2)}$ ………………….... (2.2)
Twiddle Factor:
A twiddle factor is a complex root of unity used in the butterfly operation to compute the
discrete Fourier transform:
$W_N^r = e^{-j2\pi r/N}$ ……………………… (2.3)
The butterfly requires two complex multiplications and two complex additions. We can
reduce the number of complex multiplications by using the symmetry property.
The symmetry property is
$W_N^{(r+N/2)} = W_N^r \cdot W_N^{(N/2)}$ ………………………. (2.4)
The factor $W_N^{(N/2)}$ is equal to $e^{-j\pi}$.
From Euler's formula, $e^{-j\theta} = \cos\theta - j\sin\theta$, so
$e^{-j\pi} = \cos\pi - j\sin\pi$ ……………………. (2.5)
This value is equal to '-1', so the twiddle factor $W_N^{(r+N/2)}$ becomes $-W_N^r$.
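From this the butterfly can be modified as shown below:
[Figure: butterfly with inputs a and b and outputs c and d; b is multiplied once by $W_N^r$, and the product is added to a to form c and subtracted from a (factor -1) to form d]
Fig. 2.2. Modified Butterfly Structure
The results from the modified butterfly are
$c = a + b\,W_N^r$ …………………. (2.6)
$d = a - b\,W_N^r$ …………………. (2.7)
This requires only '1' complex multiplication and '2' complex additions.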
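The saving from the symmetry property can be checked numerically. The Python sketch below (an illustrative model, not the thesis's hardware description; `twiddle` and `dit_butterfly` are names assumed here) computes the modified butterfly with one multiplication and compares its second output with the two-multiplication form of equation (2.2).

```python
import cmath

def twiddle(N, r):
    """Twiddle factor W_N^r = exp(-j*2*pi*r/N), equation (2.3)."""
    return cmath.exp(-2j * cmath.pi * r / N)

def dit_butterfly(a, b, w):
    """Modified DIT butterfly, equations (2.6) and (2.7)."""
    t = b * w              # the single complex multiplication
    return a + t, a - t    # two complex additions

N, r = 8, 1
a, b = 1 + 2j, 3 - 1j
c, d = dit_butterfly(a, b, twiddle(N, r))

# Second output computed the long way, with its own twiddle factor (2.2).
d_long = a + b * twiddle(N, r + N // 2)
```

Both forms of d agree to rounding error, which is why the second multiplier can be dropped.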
Butterfly used in DIF FFT [1]:
The butterfly structure used in the DIF FFT is shown below:
[Figure: butterfly with inputs a and b and outputs c and d; c is the sum of a and b, and d is the difference of a and b (factor -1) multiplied by $W_N^r$]
Fig. 2.3. Butterfly structure used in the DIF FFT
The results from the butterfly structure are given by
$c = a + b$ ……………………. (2.8)
$d = (a - b)\,W_N^r$ ……………….. (2.9)
This requires '2' complex additions and '1' complex multiplication.
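For comparison with the DIT form, the DIF butterfly of equations (2.8) and (2.9) can be modelled in the same way. This is a small Python sketch with an assumed function name, `dif_butterfly`; the multiplication is applied after the subtraction.

```python
def dif_butterfly(a, b, w):
    """DIF butterfly: add and subtract first, then multiply the difference."""
    return a + b, (a - b) * w

# With w = 1 the butterfly is a plain 2-point DFT of [3, 1].
c, d = dif_butterfly(3, 1, 1)

# With w = W_4^1 = -j only the difference path is rotated.
c2, d2 = dif_butterfly(1 + 0j, 0j, -1j)
```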
Decimation in Time FFT:
In the DIT FFT the input is given in bit-reversed order and the output is in
natural order.
Decimation in Frequency FFT:
In the DIF FFT the input is in natural order and the output is in bit-reversed
order.
Bit reversal order:
The bit-reversed order is generated by exchanging the first and last bits of the index,
then the second bit and the second-to-last bit, and so on.
X (b0 b1 b2 b3 b4) ------------ original order of bits
To get the bit-reversed order:
1) First exchange the bits b4 and b0
X (b0 b1 b2 b3 b4) → X (b4 b1 b2 b3 b0)
2) Then exchange the bits b3 and b1
X (b4 b1 b2 b3 b0) → X (b4 b3 b2 b1 b0)
3) The result, X (b4 b3 b2 b1 b0), is the bit-reversed order
For an 8-point input the bit-reversed order is obtained as follows.
The input samples are {x(0), x(1), x(2), x(3), x(4), x(5), x(6), x(7)}
The bit reversal order can be obtained as:
Original sample    Binary representation    Bit reversal order
X(0)               X(000)                   X(000) = X(0)
X(1)               X(001)                   X(100) = X(4)
X(2)               X(010)                   X(010) = X(2)
X(3)               X(011)                   X(110) = X(6)
X(4)               X(100)                   X(001) = X(1)
X(5)               X(101)                   X(101) = X(5)
X(6)               X(110)                   X(011) = X(3)
X(7)               X(111)                   X(111) = X(7)
Table 2.1. Bit Reversal Order
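The bit-reversed ordering in the table can be generated programmatically. The following Python sketch is illustrative only; the function name `bit_reverse` and its `bits` parameter are assumptions, and in hardware the same permutation is normally just a rewiring of address lines.

```python
def bit_reverse(index, bits):
    """Return `index` with its lowest `bits` bits written in reverse order."""
    rev = 0
    for _ in range(bits):
        rev = (rev << 1) | (index & 1)   # move the lowest bit of index into rev
        index >>= 1
    return rev

# 8-point case: 3 address bits.
order = [bit_reverse(i, 3) for i in range(8)]
print(order)  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Applying the function twice returns the original index, so the same routine serves for reordering inputs (DIT) or outputs (DIF).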

2.2 DIT FFT:
The DIT FFT is the algorithm in which x(n) is broken down into successively smaller
subsequences. For the decimation-in-time principle to apply, the number of input points in
the FFT must be a power of 2:
$N = 2^v$
x(n) is broken down into two parts, one containing only the even-indexed samples and the
other the odd-indexed samples.
The frequency domain is obtained from the time domain using:
$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}$ ………………………. (2.10)
Here X(k) is the frequency-domain representation of the signal x(n).
Breaking the signal into two subsequences gives:
$X(k) = \sum_{n\,\mathrm{even}} x(n)\, W_N^{nk} + \sum_{n\,\mathrm{odd}} x(n)\, W_N^{nk}$ ………………. (2.11)
Replacing n by 2r in the even sum and by 2r+1 in the odd sum, where r varies from 0 to
(N/2)-1, the equation becomes:
$X(k) = \sum_{r=0}^{N/2-1} x(2r)\, W_N^{2rk} + \sum_{r=0}^{N/2-1} x(2r+1)\, W_N^{(2r+1)k}$ ………….(2.12)
Since $W_N^{2rk} = W_{N/2}^{rk}$ and $W_N^{(2r+1)k} = W_N^k\, W_{N/2}^{rk}$, the twiddle factor can be
split, and the frequency-domain output is the sum of the (N/2)-point DFT of the even sequence
and the (N/2)-point DFT of the odd sequence multiplied by $W_N^k$.
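The even/odd decomposition can be written as a short recursive routine and checked against the direct DFT. The Python sketch below is a software reference model under stated assumptions (input length a power of two; names `dft`, `fft_dit`, G and H chosen here for illustration), not the staged Verilog implementation described later.

```python
import cmath

def dft(x):
    """Direct DFT of equation (2.10), used only as a reference."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
        for k in range(N)
    ]

def fft_dit(x):
    """Recursive radix-2 decimation-in-time FFT (len(x) must be a power of 2)."""
    N = len(x)
    if N == 1:
        return list(x)
    G = fft_dit(x[0::2])   # (N/2)-point DFT of the even-indexed samples
    H = fft_dit(x[1::2])   # (N/2)-point DFT of the odd-indexed samples
    X = [0] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * H[k]   # W_N^k * H(k)
        X[k] = G[k] + t
        X[k + N // 2] = G[k] - t                       # uses W_N^(k+N/2) = -W_N^k
    return X

x = [1, 2, 3, 4, 5, 6, 7, 8]
max_err = max(abs(a - b) for a, b in zip(fft_dit(x), dft(x)))
```

The slicing `x[0::2]` and `x[1::2]` performs the even/odd split at every level, which is what the bit-reversed input ordering achieves in the hardware flow graph.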
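The decimation in time FFT process is shown below:
[Figure: the even-indexed samples feed an (N/2)-point DFT with outputs G(0) to G(3), and the odd-indexed samples feed an (N/2)-point DFT with outputs H(0) to H(3); the outputs are combined through the twiddle factors $W_N^0$ to $W_N^7$ to give the output frequency responses]
Fig. 2.4. Basic Decimation in Time FFT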
Dividing the input sequence into odd and even parts is achieved by giving the input in
bit-reversed order; the output frequency responses are then in natural order, X(0), X(1), X(2), ..., X(7).
Each (N/2)-point DFT is again divided into two (N/4)-point DFTs, and so on, until only
2-point DFTs remain.
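The complete decimation in time for an 8-point sequence is shown below:
[Figure: 8-point DIT signal flow graph. Inputs in bit-reversed order X[0], X[4], X[2], X[6], X[1], X[5], X[3], X[7]; outputs in natural order X[0] to X[7]. The first stage uses $W_8^0$, the second stage uses $W_8^0$ and $W_8^2$, and the third stage uses $W_8^0$, $W_8^1$, $W_8^2$ and $W_8^3$, each with a -1 branch for the subtraction]
Fig. 2.5. 8-Point DIT FFT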
The DIT FFT can be done using different radices:
1) Radix-2
2) Radix-4
Radix-2 DIT FFT:
In the radix-2 DIT FFT the input sequence is divided as shown below:
8-Point:  0 1 2 3 4 5 6 7
4-Point:  0 2 4 6 | 1 3 5 7
2-Point:  0 4 | 2 6 | 1 5 | 3 7
Fig. 2.6. Sequence of Input in DIT FFT
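Radix-4 FFT:
The Radix-4 basic butterfly diagram is shown below:
[Figure: radix-4 butterfly with four inputs and four outputs; the inputs are weighted by the twiddle factors $W_N^0$, $W_N^q$, $W_N^{2q}$ and $W_N^{3q}$ and combined using the factors 1, -1 and -j]
Fig. 2.7. Radix-4 DIT FFT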
2.3 DIF FFT [1]:
The DIF FFT takes its input in natural order and produces its output in bit-reversed order.
Here the N-point sequence is divided into two (N/2)-point sequences:
The first half sequence is x(n), where 0 ≤ n ≤ (N/2)-1, and
the second half sequence is x(n + N/2), where 0 ≤ n ≤ (N/2)-1.
The decimation in frequency FFT can be done using different radices:
1) Radix-2
2) Radix-4
Radix-2 DIF FFT:
Here the N-point sequence is divided into two parts, and each part is in turn divided, as
shown below:
[Figure: the indices 0 to 7 pass through three levels of butterfly computation, first over all 8 points, then over two groups of 4 points, then over four groups of 2 points; the outputs appear in the order 0 4 2 6 1 5 3 7]
Fig. 2.8. Input to output sequence generation of DIF FFT
Computing the DFT of the N-point input sequence x(n):
$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}$ ……………………… (2.13)
$X(k) = \sum_{n=0}^{N/2-1} x(n)\, W_N^{nk} + \sum_{n=N/2}^{N-1} x(n)\, W_N^{nk}$ …… (2.14)
Substituting n + N/2 for n in the second sum and using $W_N^{(N/2)k} = (-1)^k$, the above
equation becomes:
$X(k) = \sum_{n=0}^{N/2-1} \left[ x(n) + (-1)^k\, x(n + N/2) \right] W_N^{nk}$ … (2.15) where $k = 0, 1, 2, \ldots, (N-1)$
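Equation (2.15) splits naturally by the parity of k: for even k the bracket is a sum, and for odd k it is a difference that picks up an extra factor $W_N^n$. The Python sketch below is a reference model only (names `dft` and `fft_dif` are assumptions, and the input length must be a power of two); it applies that split recursively and compares the result with the direct DFT.

```python
import cmath

def dft(x):
    """Direct DFT of equation (2.13), used only as a reference."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
        for k in range(N)
    ]

def fft_dif(x):
    """Recursive radix-2 decimation-in-frequency FFT, output in natural order."""
    N = len(x)
    if N == 1:
        return list(x)
    half = N // 2
    # Even-indexed outputs come from the sums x(n) + x(n + N/2).
    top = [x[n] + x[n + half] for n in range(half)]
    # Odd-indexed outputs come from the differences, multiplied by W_N^n.
    bot = [(x[n] - x[n + half]) * cmath.exp(-2j * cmath.pi * n / N)
           for n in range(half)]
    X = [0] * N
    X[0::2] = fft_dif(top)
    X[1::2] = fft_dif(bot)
    return X

x = [1, 2, 3, 4, 5, 6, 7, 8]
max_err = max(abs(a - b) for a, b in zip(fft_dif(x), dft(x)))
```

The interleaving step `X[0::2]`, `X[1::2]` undoes the bit-reversed ordering that an in-place hardware flow graph would leave at its output.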
Basic Butterfly structure used in DIF FFT of Radix-2:
The butterfly structure used in the DIF FFT differs from the one used in the
DIT FFT. The basic difference is that in the DIT FFT butterfly the multiplication is done
before the additions, whereas in the DIF FFT butterfly it is done after the additions.
[Figure: butterfly with inputs x(n) and x(n + N/2); the upper output is $x(n) + x(n + N/2)$ and the lower output is $[x(n) - x(n + N/2)]\, W_N^n$]
Fig. 2.9. Butterfly structure used in DIF FFT
This involves two complex additions and one complex multiplication.
8-Point DIF FFT:
The FFT is performed using decimation in frequency as shown below:
[Figure: 8-point radix-2 DIF signal flow graph. Inputs in natural order X(0) to X(7); the first stage multiplies the difference terms by $W_8^0$, $W_8^1$, $W_8^2$ and $W_8^3$ and feeds two 4-point DFTs, each of which is split into two 2-point DFTs; outputs in bit-reversed order X(0), X(4), X(2), X(6), X(1), X(5), X(3), X(7)]
Fig. 2.10. 8-Point Radix-2 DIF FFT
Radix-4 FFT:
The Radix-4 FFT basic butterfly diagram is shown below:
[Figure: radix-4 DIF butterfly with inputs X(n), X(n + N/4), X(n + N/2) and X(n + 3N/4), twiddle factors $W^n$, $W^{2n}$ and $W^{3n}$, and outputs X(4r), X(4r + 1), X(4r + 2) and X(4r + 3)]
Fig. 2.11. Radix-4 DIF FFT
Here the FFT length is $N = 4^v$, where v is the number of stages, and
the twiddle factors take the following values:
$W_N^{k(N/4)} = \left(\cos\frac{\pi}{2} - j\sin\frac{\pi}{2}\right)^k = (-j)^k$ ……………… (2.16)
$W_N^{k(N/2)} = (\cos\pi - j\sin\pi)^k = (-1)^k$ ………………... (2.17)
$W_N^{k(3N/4)} = \left(\cos\frac{3\pi}{2} - j\sin\frac{3\pi}{2}\right)^k = (j)^k$ ………..….... (2.18)
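The three special twiddle values are what make the radix-4 butterfly multiplier-free internally, since multiplying by ±1 or ±j needs only sign changes and swaps. They can be confirmed numerically with the short Python sketch below; the helper name `w` and the choice N = 16 are illustrative assumptions.

```python
import cmath

def w(N, e):
    """Twiddle factor W_N^e = exp(-j*2*pi*e/N)."""
    return cmath.exp(-2j * cmath.pi * e / N)

N = 16
errors = []
for k in range(4):
    errors.append(abs(w(N, k * N // 4) - (-1j) ** k))      # (2.16)
    errors.append(abs(w(N, k * N // 2) - (-1) ** k))       # (2.17)
    errors.append(abs(w(N, 3 * k * N // 4) - (1j) ** k))   # (2.18)
worst = max(errors)
```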
2.4 Pipelined R2MDC:
For the hardware architecture of the FFT there are three different types of architecture:
1) Single Butterfly Architecture
2) Pipeline Architecture [15] [16]
3) Parallel Architecture [15] [16] [17]
Of these, the pipeline architecture [8] is very attractive for multimedia
communication systems that use an FFT processor.
To reduce the complex multiplications further, we propose a pipeline architecture that
offers low latency, low power consumption, high throughput and small area.
The pipeline architecture can be realized in different forms:
1) Multi Path Delay Commutator (MDC)
2) Single Path Delay Commutator (SDC)
3) Single Path Delay Feedback (SDF)
The MDC architecture accepts multiple input data streams, which gives it high
throughput, but its hardware utilization is low.
The Single Path Delay Feedback architecture is the best solution for a single input data
stream and is used when the memory requirement must be small. In the SDC architecture [18] the
number of adders is very low, but the memory requirement is higher and the output is in
reversed order, so additional operations are needed to restore the normal order.
The basic pipeline architecture is shown below:
[Figure: chain of blocks: Butterfly Computation → Delay Commutator → Butterfly Computation → Delay Commutator → Butterfly Computation]
Fig. 2.12. Basic Pipeline architecture
Here the input data stream is divided into two parallel data streams, and the processing
is done using delay elements, butterfly elements and processing elements. In MDC
architectures the utilization of the resources depends on the radix used.
With radix-r the utilization of the resources is 1/r, where r is an integer; with
radix-2 the utilization is therefore 50%.
The radix-2 version is called the R2MDC pipeline architecture and is
shown below [19] [20]:
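[Figure: 8-point R2MDC pipeline. The two input streams X3 X2 X1 X0 and X7 X6 X5 X4 pass through delay elements (R), butterflies (BF) and switches (S), with multiplications by -j and by the twiddle factors between stages]
Fig. 2.13. R2MDC Pipeline architecture of 8-Point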
R represents the delay elements, BF is the radix-2 butterfly structure used in the FFT,
and S is the switch.
The pipeline architecture can also be implemented with other radices; with radix-4 it is
called R4MDC. With the pipelined R2MDC architecture the number of complex multipliers is
reduced compared with the normal DIT FFT and DIF FFT.
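The role of the delay elements and the switch between two butterflies can be illustrated with a small behavioural model. The Python sketch below assumes one common arrangement of an MDC delay commutator (delay the lower stream by d, cross the streams on alternate blocks of d cycles, then delay the upper stream by d); the exact switch timing of the thesis's design may differ, and the function name `delay_commutator` is hypothetical.

```python
def delay_commutator(upper, lower, d, fill=None):
    """Re-pair two parallel streams so that samples d apart meet at the next butterfly."""
    n = len(upper)
    lower_d = [fill] * d + list(lower)   # lower stream delayed by d cycles
    upper_p = list(upper) + [fill] * d   # pad upper stream to the same length
    out_u, out_l = [], []
    for t in range(n + d):
        u, l = upper_p[t], lower_d[t]
        if (t // d) % 2 == 1:            # switch crosses on alternate blocks of d cycles
            u, l = l, u
        out_u.append(u)
        out_l.append(l)
    out_u = [fill] * d + out_u           # second delay, on the upper output
    out_l = out_l + [fill] * d
    # Keep only the cycles in which both outputs carry valid data.
    return [(u, l) for u, l in zip(out_u, out_l)
            if u is not fill and l is not fill]

# First-stage outputs of an 8-point design: a0..a3 on top, b0..b3 below.
pairs = delay_commutator(["a0", "a1", "a2", "a3"],
                         ["b0", "b1", "b2", "b3"], d=2)
```

With d = 2 the second butterfly receives (a0, a2), (a1, a3), (b0, b2), (b1, b3), which are the pairings a 4-point radix-2 stage needs.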
CHAPTER III
IMPLEMENTATION OF DIT FFT AND PIPELINED FFT
The DIT FFT and the pipelined FFT are implemented in the Verilog
Hardware Description Language and simulated in ModelSim.
3.1 Implementation of DIT FFT:
We start with the 8-point DIT FFT. It can be implemented using different radices; we use
the radix-2 butterfly. With radix-2 the number of stages is $\log_2 N$, where N is the number
of points in the input sequence [21]. An 8-point sequence therefore uses 3 stages, and each
stage uses different butterflies.
First Stage:
The first stage consists of four identical butterflies of the form shown below:
[Figure: butterfly with inputs X0 and X1 and outputs X10 and X11; X1 is multiplied by $W_N^r$ and subtracted (factor -1) on the path to X11]
Fig. 3.1. Stage-1 Butterfly
The twiddle factor consists of real and imaginary parts and can be expressed as
$W_N^0 = r + ji$ …………….. (3.1)
where r and i denote the real and imaginary parts of the twiddle factor (this r is not the
exponent r used in $W_N^r$).
The outputs X10 and X11 also have real and imaginary parts, which are obtained separately:
X10 = X10r + jX10i ………….. (3.2)
If the real and imaginary parts are not separated, the output from the butterfly is:
X10 = X0 + (X1*$W_N^r$) …………… (3.3)
Writing the twiddle factor in terms of its real and imaginary parts, the equation becomes:
X10 = (X0 + (X1*(r+ji))) …………. (3.4)
It can be expressed as:
X10 = (X0 + (X1*r) + (j*X1*i)) ……… (3.5)
Since X10 = X10r + jX10i, for real-valued inputs X0 and X1:
X10r = (X0 + (X1*r)) ……………….. (3.6)
X10i = (X1*i) ……………………… (3.7)
The second output, X11 = X11r + jX11i, is obtained from:
X11 = (X0 - (X1*$W_N^r$)) ……………….…. (3.8)
X11r = (X0 - (X1*r)) …………………… (3.9)
X11i = (-X1*i) ……………………... (3.10)
The negative is obtained by taking the two's complement; the two's complement of a
binary number is calculated by taking the one's complement and adding '1' to it:
X11r = (X0 + (~(X1*r) + 1)) ………… (3.11)
X11i = (~(X1*i) + 1) ……………… (3.12)
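The "invert and add one" rule of equations (3.11) and (3.12) can be tried out on fixed-width integers. The Python sketch below assumes a 16-bit word, which is an illustrative choice and not a width stated in the text; the helper names are likewise hypothetical.

```python
def twos_complement_negate(value, width=16):
    """Negate a fixed-width value: one's complement plus one, truncated to `width` bits."""
    mask = (1 << width) - 1
    return (~value + 1) & mask

def to_signed(value, width=16):
    """Interpret a `width`-bit pattern as a signed two's-complement number."""
    sign = 1 << (width - 1)
    return (value & (sign - 1)) - (value & sign)

# Subtraction done as addition of the two's complement, as in (3.11):
x0 = 20
product = 7                      # stands for the product X1*r
x11r = (x0 + twos_complement_negate(product)) & 0xFFFF
```

Reading `x11r` back as a signed value gives 13, the same as computing 20 - 7 directly.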
The inputs to the first stage are given in bit-reversed order. Since there are four
butterflies in the first stage, the inputs are assigned as follows [22]:
For the first butterfly the inputs are {X(0) and X(4)}
For the second butterfly the inputs are {X(2) and X(6)}
For the third butterfly the inputs are {X(1) and X(5)}
For the fourth butterfly the inputs are {X(3) and X(7)}
In the first stage the twiddle factor used is $W_8^0$, which has real part equal to '1' and
imaginary part equal to '0'.
Second Stage:
The second stage uses a four-input butterfly, shown below:
[Figure: four-input butterfly with inputs X10, X11, X12 and X13 and outputs X20, X21, X22 and X23; X12 is multiplied by $W_N^0$ and X13 by $W_N^2$, each with a -1 branch for the subtraction]
Fig. 3.2. Second Stage Butterfly
Here the twiddle factors used are $W_8^0$ and $W_8^2$. For the second twiddle factor the
imaginary part is equal to '-1' and the real part is equal to '0'.
In the butterfly above the output values are obtained as
X20 = (X10 + (X12*$W_8^0$)) …………….. (3.13)
X22 = (X10 - (X12*$W_8^0$)) ……………… (3.14)
These two equations have the same form as those of the first-stage butterfly, so the same
butterfly is reused. The imaginary parts of X10 and X12 are not considered here, because the
imaginary part of X12 is zero and the imaginary part of the twiddle factor is also
zero [23] [24].
The other two outputs involve multiplication by the twiddle factor whose imaginary part is
'-1', so both the real and imaginary parts must be considered: (X11r, X11i) and
(X13r, X13i). The operation in this butterfly requires only swapping and inverting the real
and imaginary parts.
X21 = (X11 + (X13*$W_8^2$)) ……………….. (3.15)
X21 = (X11r + jX11i) + ((X13r + jX13i)*(-j)) …….. (3.16)
The real part is (X11r + X13i). Here X13i is equal to '0', so the real part is X11r.
The imaginary part is (X11i - X13r). Here X11i is equal to '0', so the imaginary part is
-X13r, which is calculated using the two's complement, that is, the one's complement plus
one [25].
The other output obtained from X11 and X13 is:
X23 = (X11 - (X13*$W_8^2$)) ………….. (3.17)
X23 = (X11r + jX11i) - ((X13r + jX13i)*(-j)) …………. (3.18)
The real part is (X11r - X13i). Here X13i is equal to zero, so
X23r = X11r
The imaginary part is (X11i + X13r). Here X11i is equal to zero, so
X23i = X13r
The four-input butterfly is thus a combination of two 2-input butterflies, one using the
twiddle factors and the other using only the real and imaginary values of the inputs.
Stage 3:
This stage uses four types of butterflies:
1) Butterfly using the twiddle factor $W_8^0$
2) Butterfly using the twiddle factor $W_8^2$
3) Butterfly using the twiddle factor $W_8^1$
4) Butterfly using the twiddle factor $W_8^3$
The butterflies using the twiddle factors $W_8^0$ and $W_8^2$ are explained above.
The twiddle factors $W_8^1$ and $W_8^3$ have the values (0.707 - j(0.707)) and
(-0.707 - j(0.707)), and multiplication by 0.707 can be carried out using right-shift operations.
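One way a multiplication by 0.707 can be reduced to right shifts is to approximate the constant by a sum of powers of two. The Python sketch below uses 1/2 + 1/8 + 1/16 + 1/64 + 1/256 = 0.70703125; this particular set of shifts is an assumption for illustration and is not taken from the thesis's code.

```python
def mul_0707(x):
    """Approximate x * 0.707 for integer x using shifts and adds only.

    0.707 ~= 2**-1 + 2**-3 + 2**-4 + 2**-6 + 2**-8 = 0.70703125
    """
    return (x >> 1) + (x >> 3) + (x >> 4) + (x >> 6) + (x >> 8)

approx = mul_0707(1024)   # exact product is about 724.08
```

Each right shift discards low-order bits, so small inputs lose precision; a hardware design would normally scale the data first.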
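[Figure: third-stage butterflies with inputs X21 to X28 and outputs Y0 to Y7; X25, X26, X27 and X28 are multiplied by $W_8^0$, $W_8^1$, $W_8^2$ and $W_8^3$ respectively, each with a -1 branch for the subtraction]
Fig. 3.3. Butterfly used in third stage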
The outputs from the butterfly can be obtained as:
Y0=(X21+(X25*𝑤80 ))………………....….. (3.19)
Y4=(X21-(X25*𝑤80 ))……………………… (3.20)
This can be done by using the first stage butterfly only and the outputs from the butterfly
using the Twiddle Factor𝑤82 is done by using taking the two‟s complement of number and the
rest of the outputs can be obtained by using the shifting operation butterflies.
The output Y1 can be obtained by using the
Y1=(X22+(X26*𝑤81 ))…………………. (3.21)
The X26 is having the real and imaginary parts and the Twiddle Factor also consists of
the real and imaginary values and it can be shown as
Y1= (X22r+jX22i) + ((X26r+jX26i)*(0.707-j (0.707)))………… (3.22)
Y1= (X22r+jX22i) + ((X26r*0.707) +(X26i*0.707)) +j (X26i*0.707-(X26r*0.707)))
Y1r= (X22r+ ((X26r*0.707) + (X26i * 0.707))…………. (3.23)
Y1i= (X22i+ ((X26i*0.707) - (X26r*0.707)))………….. (3.24).
The internal ones are obtained by using the internal products and taking the two‟s complement
the other output can be expressed as:
Y5=(X22-(X26*(𝑤83 ))………….. (3.25)
It can be modified as Y5= (X22r+jX22i)-((X26r+jX26i)*(0.707-j (0.707)))
Y5= (X22r+jX22i)-((X26r*0.707) + (X26i*0.707)) +j ((X26r*0.707) + (X26i*0.707))
29
The real part is equal to Y5r= (X22r-((X26r*0.707) + (X26i*0.707))…….. (3.26)
The imaginary part is equal to Y5i= (X22i- (X26r*0.707) - (X26i*0.707))….. (3.27).
They need four internal products and addition and subtraction is done using the two‟s
complement.
The other butterfly uses the twiddle factor $w_8^3$, which is equal to -0.707 - j0.707. It is also
implemented using partial products and two's complements, as shown below.
The two outputs are
$$Y_3 = (X_{24r} + jX_{24i}) + (X_{28r} + jX_{28i})(-0.707 - j0.707) \tag{3.28}$$
$$Y_7 = (X_{24r} + jX_{24i}) - (X_{28r} + jX_{28i})(-0.707 - j0.707) \tag{3.29}$$
After simplification, the real and imaginary parts are:
$$Y_{3r} = X_{24r} + (-0.707X_{28r} + 0.707X_{28i}) \tag{3.30}$$
$$Y_{3i} = X_{24i} + (-0.707X_{28i} - 0.707X_{28r}) \tag{3.31}$$
$$Y_{7r} = X_{24r} + (0.707X_{28r} - 0.707X_{28i}) \tag{3.32}$$
$$Y_{7i} = X_{24i} + (0.707X_{28r} + 0.707X_{28i}) \tag{3.33}$$
These are obtained using two internal products, and subtraction is performed with the two's
complement operation.
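The multiplication by 0.707 and the two's complement negation can be illustrated in integer arithmetic. The sketch below assumes the 8-bit constant 10110101 (decimal 181) and the right shift by 8 bits that appear in the Appendix A butterflies; the helper names are hypothetical, and the inputs are assumed to be non-negative integers.

```python
W_FIXED = 0b10110101  # 181, so 181 / 256 is about 0.707 (constant taken from Appendix A)
SHIFT = 8             # right shift by 8 bits divides the product by 256


def scale_0707(x):
    """Approximate 0.707 * x for a non-negative integer x using multiply and shift."""
    return (x * W_FIXED) >> SHIFT


def negate_8bit(v):
    """Two's complement negation in 8 bits: invert all bits and add one."""
    return (~v + 1) & 0xFF


if __name__ == "__main__":
    assert scale_0707(256) == 181
    assert scale_0707(100) == 70           # exact value would be 70.7
    assert (5 + negate_8bit(5)) & 0xFF == 0
```

The truncation in the shift always rounds down, so the result can be low by up to one unit in the last place.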
                              Complex            Complex
                              multiplications    additions
Normal DFT of 8-Point         64                 56
DFT of 8-Point using FFT      12                 24
Table 3.1. Comparison between Normal DFT and FFT
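The entries of Table 3.1 follow from the standard operation counts: N^2 multiplications and N(N-1) additions for the direct DFT, and (N/2)log2(N) multiplications and N log2(N) additions for the radix-2 FFT. A short Python sketch (function names are illustrative) reproduces them:

```python
def dft_counts(n):
    """Complex multiplications and additions of a direct n-point DFT."""
    return n * n, n * (n - 1)


def fft_counts(n):
    """Complex multiplications and additions of a radix-2 n-point FFT."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two")
    stages = n.bit_length() - 1      # log2(n)
    return (n // 2) * stages, n * stages


if __name__ == "__main__":
    assert dft_counts(8) == (64, 56)
    assert fft_counts(8) == (12, 24)
```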
3.2 Implementation of Pipelined FFT:
The pipelined FFT is implemented using delay elements and switches. An input buffer stores
all the values that are to be given as input, and the number of delay elements needed depends
on the length N of the sequence. As shown above, the 8-point case uses delay elements of
length 2 first and then of length 1.
Here we implement the 64-point R2MDC. It uses delay elements of lengths 16, 8, 4 and 2, and
switches of sizes 16, 8, 4 and 2. The input buffer is a memory whose elements are accessed by
address.
The implementation of the 64-point FFT is as follows:
First the input is divided into two parallel half streams, which are passed through the
delay elements, and then the switch operation is performed. Half of the samples in the data
stream are delayed by the delay elements so that they can be processed together with the second
half of the data stream.
The second half of the data stream is delayed first. In the 8-point case it is delayed by 2
delay elements, whereas in the 64-point case the 32-sample half stream is delayed by 16
elements. After the switch operation the delay is applied again and the butterfly processing is
done. The 8-element delays are then used in the same way, followed by the delays of 4, 2 and 1.
In total the delay is equal to 16+8+8+4+4+2+2+1+1=46.
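The interaction of the delay elements and the switch can be sketched in Python. This is a behavioural model under stated assumptions, not the thesis RTL: the lower stream is delayed by d, the switch swaps its inputs during the second half of every 2d-sample period, and the upper switch output is delayed by d again. The function name `delay_commutator` and the list `DELAYS_64` are hypothetical.

```python
DELAYS_64 = [16, 8, 8, 4, 4, 2, 2, 1, 1]  # delay lengths listed in the text


def _at(seq, t):
    """Sample of seq at time t, or None outside the stream."""
    return seq[t] if 0 <= t < len(seq) else None


def delay_commutator(upper, lower, d):
    """Return the (top, bottom) output pair for every clock cycle."""
    y0_history, out = [], []
    for t in range(len(upper) + 2 * d):
        u = _at(upper, t)
        dl = _at(lower, t - d)            # lower stream delayed by d
        if (t % (2 * d)) >= d:            # switch in the "cross" position
            y0, y1 = dl, u
        else:                             # switch in the "straight" position
            y0, y1 = u, dl
        y0_history.append(y0)
        top = y0_history[t - d] if t >= d else None  # second delay of d
        out.append((top, y1))
    return out


if __name__ == "__main__":
    pairs = [p for p in delay_commutator(["a0", "a1", "a2", "a3"],
                                         ["b0", "b1", "b2", "b3"], 2)
             if None not in p]
    assert pairs == [("a0", "a2"), ("a1", "a3"), ("b0", "b2"), ("b1", "b3")]
    assert sum(DELAYS_64) == 46
```

With d = 2 the stage pairs samples that are two positions apart within each stream, which is the reordering the next butterfly stage needs.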
The output is also obtained as two half sequences. Inside the butterfly, the complex
multiplications are carried out using Booth multiplication and addition; the further
processing consists of addition and subtraction operations.
Here the twiddle factors are precomputed and stored, and they are supplied in synchronization
with the operation. The R2MDC of 64-point [21] [22] is as shown below:
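[Figure: 64-point R2MDC pipeline. An input register (REG) supplies two parallel streams, X31 X30 X29 ... X0 and X63 X62 X61 ... X32, to a chain of butterfly (BF) and switch (S) stages separated by delay elements from 16 down to 1; the lower branch into the last delay element is multiplied by -j.]
Fig. 3.4. Pipeline architecture of 64-point FFT using R2MDC [23]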
Here the input to delay element '1' is multiplied by -j, the switch operation is performed,
and the butterfly operation is done at the end. The butterfly used in the R2MDC architecture is
the radix-2 butterfly.
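Multiplication by -j needs no multiplier, since (a + jb)(-j) = b - ja: the real and imaginary parts are swapped and the new imaginary part is negated. A minimal Python sketch (the function name is illustrative):

```python
def mul_minus_j(re, im):
    """Multiply re + j*im by -j using only a swap and a negation."""
    return im, -re


if __name__ == "__main__":
    r, i = mul_minus_j(3, 4)
    assert complex(r, i) == complex(3, 4) * (-1j)
```

This is the same operation that the inverter module in Appendix A selects when its count input is 1.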
CHAPTER IV
RESULTS
The FFT is implemented in Verilog and simulated using MODELSIM PE 10.2c. The results
obtained are shown below:
Fig. 4.1. Simulation output for 8-Point DIT FFT
The clock clk is driven with a fixed duty cycle and period, and sel is set with a force
value to select the output to be displayed.
Here Y0r, Y0i, Y1r, Y1i, ..., Y7r, Y7i are the outputs, and one of them at a time is displayed
using sel.
The design is synthesized with Altera Quartus-II for the Cyclone II EP2C35F672C6 device,
and the RTL view is as shown:
[Figure: RTL schematic of the 8-point FFT. Twelve butterfly instances (bfly1: s11, s12, s13, s14, s21, s23, s31; bfly2: s22, s24, s33; bfly3: s32; bfly4: s34) are driven by constant inputs and twiddle values; sixteen multiplexers (Mux0 to Mux15) controlled by sel[2..0] feed the clocked output registers yr[7..0] and yi[7..0].]
Fig. 4.2. RTL View of 8-Point DIT FFT
After compiling the FFT in Quartus-II, the resource utilization is as shown below:
Fig. 4.3. Resource Utilization summary
The timing analysis is as shown below:
Fig. 4.4. Worst case delay
The pipeline FFT is implemented in Verilog and simulated using MODELSIM 10.2c. The
output is as shown below:
Fig. 4.5. Simulation Output for 64-Point Pipeline FFT
To get the output, reset should have the value '1' and din_valid should be '1'. When the output
dout_valid is equal to '1', the output is displayed.
CHAPTER V
CONCLUSION
This thesis presents the implementation of an 8-point FFT and of a 64-point pipeline FFT in
the Verilog hardware description language. The 8-point FFT was synthesized using Quartus-II,
where the RTL view of the design can be observed, and the timing analysis indicates that the
FFT takes less time than other Fourier transform techniques. According to the paper published
by Pawan Verma, Harpeet Kaur, Mandeep Singh and Balwinder Singh, computing the DFT with the
DIT FFT requires fewer multiplications and additions.
The implementation in the base paper is done in VHDL, whereas in this thesis it is done in
Verilog, a hardware description language whose syntax is similar to the C programming
language.
The implementation itself shows that the number of multiplications and additions is reduced
compared with the direct DFT. Because of the reduced multiplications and additions the worst
case delay is reduced, which is why the FFT is used in most communication systems that
compute the Fourier transform.
The FFT is also implemented with the pipeline architecture in Verilog. This architecture uses
delay elements, which increases the delay, but the number of complex multipliers is reduced.
According to the paper published by Mounir Arioua, the complex multipliers are reduced
compared with the multipliers in the FFT implementation.
The RTL view of the Verilog FFT implementation is shown above in the results. It was obtained
by synthesizing the 8-point FFT in Quartus-II for the Cyclone II EP2C35F672C6 device.
The pipeline architecture can be used when low resource usage is required, but when time
delay is the main concern the general FFT architecture can be used instead, because of the
delay elements in the pipeline architecture. This study can be extended by further reducing the
resource usage and the number of complex multiplications and additions required to calculate
the FFT, which is used in most communication systems.
REFERENCES
[1] Mounir A. and Moha Hassant, "VHDL implementation of Optimized 8-point FFT in
Pipelined Architecture for OFDM applications", International Conference on Multimedia
Communication Systems, IEEE, pp. 1-5, 2010.
[2] Weidong Li, "Studies on implementation of lower power FFT processors", Linkoping
Studies in Science and Technology, Thesis No. 1030, ISBN 91-7373-692-9, Linkoping,
Sweden, June 2003.
[3] S. He and M. Torkelson, "A new approach to Pipeline FFT processor", in Proceedings of
the 10th International Parallel Processing Symposium (IPPS), pp. 766-770, April 1996.
[4] Pawan Verma, Harpeet Kaur and Mandeep Singh, "VHDL implementation of FFT/IFFT
Blocks for OFDM", International Conference on Advances in Recent Technologies in
Communication and Computing, pp. 186-188, 2009.
[5] L. Johnson, "Conflict Free Memory Addressing for Dedicated FFT Hardware", IEEE
Transactions on Analog and Digital Signal Processing, pp. 312-316, 1992.
[6] J.W. Cooley and J.W. Tukey, "An algorithm for the machine calculation of complex
Fourier series", IEEE Transactions on Math Computation, Vol. 19, pp. 297-301, 1965.
[7] W. Li and L. Wanhammar, "A Pipeline FFT Processor", IEEE Workshop on Signal
Processing Systems (SiPS), Taipei, China, pp. 1982-1985, Oct. 1999.
[8] E.H. Wold and A.M. Despain, "Pipeline and Parallel Pipeline FFT processors for VLSI
implementation", IEEE Transactions on Computers, pp. 414-426, May 1984.
[9] R. Stron, "Radix-2 FFT Pipeline architecture with reduced noise to signal ratio", IEEE
Proceeding on Image Signal Processing, pp. 81-86, Apr. 1984.
[10] D. Cohen, "Simplified Control of FFT Hardware", IEEE Transactions on Signal and
Speech Processing, pp. 577-579, Dec. 1976.
[11] T. Widhe, "Efficient Implementation of FFT processing Elements", Linkoping Studies in
Science and Technology, Thesis No. 619, Linkoping University, Sweden, 1997.
[12] L.R. Rabiner and B. Gold, "Theory and Application of Digital Signal Processing",
Prentice-Hall, pp. 23-27, 1975.
[13] M. Petrov and M. Glesner, "Optimal FFT Architecture Selection for OFDM Receivers on
FPGA", in Proc. of 2005 IEEE International Conference on Field Programmable
Technology, pp. 313-314, 2005.
[14] J. Viejo, A. Millan and M.J. Bellido, "Design of a FFT/IFFT module as an IP core suitable
for embedded systems", IEEE Transactions on Industrial Embedded Systems, pp. 337-340,
2007.
[15] J. Melander, "Design of SIC FFT Architectures", Linkoping Studies in Science and
Technology, Thesis No. 618, Linkoping University, Sweden, 1997.
[16] U.M. Baese, Digital Signal Processing with Field Programmable Gate Arrays, 3rd edition,
Springer, 2007.
[17] Weidong Li, Mark Vesterbacka and Lars Wanhammar, "An FFT Processor Based On 16
POINT Module", Electronics Systems, Dept. of EE., Linkoping University, pp. 1-8, 1996.
[18] Y. Ma, "A VLSI oriented Parallel FFT algorithm", IEEE Transactions on Signal
Processing, Vol. 44, No. 2, pp. 445-448, Feb. 1996.
[19] E.E. Swatzlander, W.K.W. Young and S.J. Joseph, "A radix-4 delay commutator for
fast Fourier transforms processor implementation", IEEE J. Solid-State Circuits,
SC-19(5), pp. 702-709, Oct. 1984.
[20] Yunho Jung, Hongil Yoon and Jaeseok Kim, "New Efficient FFT Algorithm and Pipeline
Implementation Results for OFDM/DMT Applications", IEEE Transactions on
Consumer Electronics, Vol. 49, No. 1, pp. 14-17, Feb. 2003.
[21] W. Li and L. Wanhammar, "Complex Multiplication reduction in FFT processor", IEEE
Workshop on Signal Processing Systems, Sweden, pp. 654-662, Mar. 2002.
[22] Hsin-Lei Lin, Hongchin Lin, Yu Chuan Chen and Robert C. Chang, "A Novel Pipelined
Fast Fourier Transform Architecture for Double Rate OFDM Systems", IEEE
Transactions on Signal Processing Systems, pp. 7-11, 2004.
[23] Shousheng He and Mats Torkelson, "Design and Implementation of a 1024-Point
Pipeline FFT processor", Custom Integrated Circuits Conference, IEEE, pp. 131-134,
1998.
[24] H.L. Lin, H. Lin, Y.C. Chen and R.C. Chang, "A Novel Pipelined Fast Fourier Transform
Architecture for Double Rate OFDM Systems", IEEE Workshop on Signal Processing
Systems Design and Implementation, pp. 7-11, 2004.
[25] K. Maharatna, E. Grass and U. Jaghold, "A Low power 64 Point FFT/IFFT Architecture
for wireless Broadband Communication", in 5th International OFDM Workshop,
Hamburg, 2000.
APPENDIX A
Code for implementation of 8-Point FFT using Verilog:
module fft(clk,sel,yr,yi); //main module
input clk;
input [2:0]sel;
output reg [7:0]yr,yi;
wire [7:0]y0r,y1r,y2r,y3r,y4r,y5r,y6r,y7r,y0i,y1i,y2i,y3i,y4i,y5i,y6i,y7i;
wire [7:0]x20r,x20i,x21r,x21i,x22r,x22i,x23r,x23i,x24r,x24i,x25r,x25i,x26r,x26i,x27r,x27i;
wire [7:0]x10r,x10i,x11r,x11i,x12r,x12i,x13r,x13i,x14r,x14i,x15r,x15i,x16r,x16i,x17r,x17i;
wire [7:0]x0,x1,x2,x3,x4,x5,x6,x7;
parameter w0r=8'b1;
parameter w0i=8'b0;
parameter w1r=8'b10110101;
parameter w1i=8'b01001011;
parameter w2r=8'b0;
parameter w3r=8'b01001011;
parameter w3i=8'b01001011;
//stage1
bfly1 s11(x0,x4,w0r,w0i,x10r,x10i,x11r,x11i);
//stage2
bfly1 s21(x10r,x12r,w0r,w0i,x20r,x20i,x22r,x22i);
bfly2 s22(x11r,x11i,x13r,x13i,x21r,x21i,x23r,x23i);
bfly1 s23(x14r,x16r,w0r,w0i,x24r,x24i,x26r,x26i);
bfly2 s24(x15r,x15i,x17r,x17i,x25r,x25i,x27r,x27i);
//stage3
bfly1 s31(x20r,x24r,w0r,w0i,y0r,y0i,y4r,y4i);
bfly3 s32(x21r,x21i,x25r,x25i,w1r,w1i,y1r,y1i,y5r,y5i);
bfly2 s33(x22r,x22i,x26r,x26i,y2r,y2i,y6r,y6i);
bfly4 s34(x23r,x23i,x27r,x27i,w3r,w3i,y3r,y3i,y7r,y7i);
always @(posedge clk)
case(sel)
0: begin yr<=y0r; yi<=y0i; end
endcase
endmodule
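As a floating-point reference for the three-stage structure instantiated above, the following Python sketch evaluates an 8-point radix-2 DIT FFT with bit-reversed input pairs (x0, x4), (x2, x6), (x1, x5), (x3, x7) and compares it with the direct DFT. It is an assumed reference model for checking values, not a translation of the 8-bit Verilog; the function names are hypothetical.

```python
import cmath


def dft(x):
    """Direct DFT used as the reference."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
            for m in range(n)]


def fft8_dit(x):
    """8-point radix-2 decimation-in-time FFT in three butterfly stages."""
    assert len(x) == 8
    w = [cmath.exp(-2j * cmath.pi * r / 8) for r in range(4)]  # w_8^0 .. w_8^3
    a = [x[i] for i in (0, 4, 2, 6, 1, 5, 3, 7)]               # bit-reversed order
    # Stage 1: four 2-point butterflies, twiddle w_8^0 only.
    s1 = []
    for i in range(0, 8, 2):
        s1 += [a[i] + a[i + 1], a[i] - a[i + 1]]
    # Stage 2: two 4-point groups, twiddles w_8^0 and w_8^2.
    s2 = [0] * 8
    for g in (0, 4):
        for k in range(2):
            t = s1[g + k + 2] * w[2 * k]
            s2[g + k] = s1[g + k] + t
            s2[g + k + 2] = s1[g + k] - t
    # Stage 3: one 8-point group, twiddles w_8^0 .. w_8^3.
    y = [0] * 8
    for k in range(4):
        t = s2[k + 4] * w[k]
        y[k] = s2[k] + t
        y[k + 4] = s2[k] - t
    return y


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    assert all(abs(p - q) < 1e-9 for p, q in zip(fft8_dit(x), dft(x)))
```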
module bfly1(x,y,wr,wi,x0r,x0i,x1r,x1i);// sub module
input [7:0]x,y,wr,wi;
output[7:0]x1r,x1i,x0r,x0i;
assign x0r=x+(y*wr);
assign x0i=y*wi;
assign x1r=x+(~(y*wr)+1);
assign x1i=~(y*wi)+1;
endmodule
module bfly2(xr,xi,yr,yi,x0r,x0i,x1r,x1i); // sub module
input [7:0]xr,xi,yr,yi;
output [7:0]x0r,x0i,x1r,x1i;
assign x0r=xr;
assign x0i=~yr+1;
assign x1r=xr;
assign x1i=yr;
endmodule
module bfly3(xr,xi,yr,yi,wr,wi,x0r,x0i,x1r,x1i); // sub module
input [7:0]xr,xi,yr,yi,wr,wi;
output [7:0]x0r,x0i,x1r,x1i;
wire [15:0]p1,p2,p3,p4;
wire [7:0]win,yrn,yin;
wire [8:0]ywr,ywi;
parameter sht=8'b1000;
assign yrn=~yr+1;
assign yin=yi;
assign win=~wi+1;
assign p1=(yrn*wr)>>sht;
assign p2=(yin*win)>>sht;
assign p3=(yrn*win)>>sht;
assign p4=(yin*wr)>>sht;
assign ywr=(~p1+1)+p2;
assign ywi=p3+p4;
assign x0r=xr+ywr;
assign x0i=xi+ywi;
assign x1r=xr+(~ywr+1);
assign x1i=xi+(~ywi+1);
endmodule
module bfly4(xr,xi,yr,yi,wr,wi,x0r,x0i,x1r,x1i); // sub module
input [7:0]xr,xi,yr,yi,wr,wi;
output [7:0]x0r,x0i,x1r,x1i;
wire [15:0]p1,p2;
wire [7:0]win,yrn,yin;
wire [8:0]ywr,ywi;
parameter sht=8'b1000;
assign yrn=~yr+1;
assign yin=~yi+1;
assign win=~wi+1;
assign p1=(yrn*win)>>sht;
assign p2=(yin*win)>>sht;
assign ywr=p1+(~p2+1);
assign ywi=p1+p2;
assign x0r=xr+ywr;
assign x0i=xi+ywi;
assign x1r=xr+(~ywr+1);
assign x1i=xi+(~ywi+1);
endmodule
Implementation of 64 Point Pipeline FFT:
`timescale 1ns/1ns // main module
module tb_fft64;
reg clk,reset,din_valid;
reg [9:0] din_re,din_im;
wire [9:0] dout_re,dout_im;
wire dout_valid;
fft64 f1(
.clk(clk),.reset(reset),.din_valid(din_valid),.din_re(din_re),.din_im(din_im),.dout_re(dout_re),
.dout_im(dout_im),.dout_valid(dout_valid) );
always #20 clk=~clk;
integer file;
initial begin
clk=0;
reset=0;
din_valid=0;
din_re=10'b0;din_im=10'b0;
#80 reset=1;din_valid=1;
din_re=10'b0010110100;din_im=10'b1000010101;
file=$fopen("result_out.txt") | 1; // open once; bit 0 also echoes to standard output
repeat(200)begin
#40 din_re=din_re+1;din_im=din_im+1;
$fdisplay(file, "(%d) + (%d )*j ;", dout_re, dout_im );
end
end
endmodule
//submodule of fft64
`timescale 1ns/1ns
module fft64(clk,reset, din_valid, din_re,din_im, //first_r,first_i,last_r,last_i,
dout_re,dout_im,dout_valid);
parameter IN_WIDTH=10;
input clk,reset,din_valid;
input [IN_WIDTH-1:0] din_re,din_im;
output [IN_WIDTH-1:0] dout_re,dout_im;
output dout_valid;
wire dout_valid;
wire [IN_WIDTH-1:0] first_r,first_i,last_r,last_i;
wire [IN_WIDTH-1:0] r0_0,i0_0,r32_0,i32_0;//the output signals of buffer
wire [IN_WIDTH-1:0] br0_1,bi0_1,br32_1,bi32_1;
wire [IN_WIDTH-1:0] dr32_1,di32_1,sr0_1,si0_1;
wire [IN_WIDTH-1:0] r0_1,i0_1,r32_1,i32_1;//the output signals of first stage
wire [IN_WIDTH-1:0] r0_2,i0_2,r32_2,i32_2;//the output signals of second stage
wire [IN_WIDTH-1:0] r0_3,i0_3,r32_3,i32_3;//the output signals of thirdstage
wire [IN_WIDTH-1:0] r0_4,i0_4,r32_4,i32_4;//the output signals of four stage
wire [IN_WIDTH-1:0] br0_5,bi0_5,br32_5,bi32_5,ir32_5,ii32_5;
wire [IN_WIDTH-1:0] r0_5,i0_5,r32_5,i32_5;//the output signals of five stage
wire [IN_WIDTH-1:0] r0_6,i0_6,r32_6,i32_6;
wire clk_in;
wire [4:0]count;
///***************** control *********************************************///
clk_div clk_div(.clk(clk),.reset(reset),.hclk(clk_in));
control control(.clk(clk_in),.reset(reset),.count(count));
input_buffer i1(.wclk(clk),.rclk(clk_in),.reset(reset),.din_valid(din_valid),
.indata_r(din_re),.indata_i(din_im), .first_r(r0_0),.first_i(i0_0),.last_r(r32_0),.last_i(i32_0));
//*****************first stage*********************************************///
bm b1(.clk(clk_in),.reset(reset),.address(count),.ar(r0_0),.ai(i0_0),.br(r32_0),.bi(i32_0),
.r0(br0_1),.i0(bi0_1),.r16(br32_1),.i16(bi32_1));
delay16 d32_1(.clk(clk_in),.reset(reset),.x_r(br32_1),.x_i(bi32_1), .y_r(dr32_1),.y_i(di32_1));
switch16 s1(.count(count),.x0_r(br0_1),.x0_i(bi0_1),.x1_r(dr32_1),.x1_i(di32_1),
.y0_r(sr0_1),.y0_i(si0_1),.y1_r(r32_1),.y1_i(i32_1));
delay16 d0_1(.clk(clk_in),.reset(reset),.x_r(sr0_1),.x_i(si0_1),.y_r(r0_1),.y_i(i0_1));
///*****************second stage*********************************************///
bm b2(.clk(clk_in),.reset(reset),.address({count[3:0],1'b0}),.ar(r0_1),.ai(i0_1),.br(r32_1),.bi(i32_1),
.r0(br0_2),.i0(bi0_2),.r16(br32_2),.i16(bi32_2));
switch8 s2(.count(count[3:0]),.x0_r(br0_2),.x0_i(bi0_2),.x1_r(dr32_2),.x1_i(di32_2),
.y0_r(sr0_2),.y0_i(si0_2),.y1_r(r32_2),.y1_i(i32_2));
///*************************third stage *********************************///
bm
.r0(br0_3),.i0(bi0_3),.r16(br32_3),.i16(bi32_3));
switch4 s3(.count(count[2:0]),.x0_r(br0_3),.x0_i(bi0_3),.x1_r(dr32_3),.x1_i(di32_3),
.y0_r(sr0_3),.y0_i(si0_3),.y1_r(r32_3),.y1_i(i32_3));
///*************************four stage *********************************///
bm
.r0(br0_4),.i0(bi0_4),.r16(br32_4),.i16(bi32_4));
switch2 s4(.count(count[1:0]),.x0_r(br0_4),.x0_i(bi0_4),.x1_r(dr32_4),.x1_i(di32_4),
.y0_r(sr0_4),.y0_i(si0_4),.y1_r(r32_4),.y1_i(i32_4));
///*************************five stage *********************************///
butterfly b5(.a_r(r0_4),.a_i(i0_4),.b_r(r32_4),.b_i(i32_4),
.a1_r(br0_5),.a1_i(bi0_5),.b1_r(br32_5),.b1_i(bi32_5));
inverter i5(.count(count[0]),.a_r(br32_5),.a_i(bi32_5),.a1_r(ir32_5),.a1_i(ii32_5));
delay1 d32_5(.clk(clk_in),.reset(reset),.x_r(ir32_5),.x_i(ii32_5), .y_r(dr32_5),.y_i(di32_5));
switch1 s5(.count(count[0]),.x0_r(br0_5),.x0_i(bi0_5),.x1_r(dr32_5),.x1_i(di32_5),
.y0_r(sr0_5),.y0_i(si0_5),.y1_r(r32_5),.y1_i(i32_5));
///*************************six stage *********************************///
butterfly b6(.a_r(r0_5),.a_i(i0_5),.b_r(r32_5),.b_i(i32_5),
.a1_r(r0_6),.a1_i(i0_6),.b1_r(r32_6),.b1_i(i32_6));
dataout dataout(.clk(clk),.reset(reset),.first_r(r0_6),.first_i(i0_6),.last_r(r32_6),.last_i(i32_6),
.dout_re(dout_re),.dout_im(dout_im),.dout_valid(dout_valid));
//always @(posedge clk_in)
//begin
//end
endmodule
For dividing the clock:
`timescale 1ns/1ns
module clk_div(clk,reset, hclk);
input clk,reset;
output hclk;
reg hclk;
//reg count;
always @(posedge clk or negedge reset)
begin
if (!reset)
hclk<=0;
else
hclk<=hclk+1;
end
endmodule
// control block implementation
`timescale 1ns/1ns
module control( clk,reset, count);
input clk,reset;
output [4:0] count;
reg [4:0] count;
always @(posedge clk or negedge reset)
begin
if (!reset) begin
count<=5'b11111;
end
else begin
count<=count+1;
end
end
endmodule
// input buffer
`timescale 1ns/1ns
module input_buffer(wclk,rclk,reset,din_valid, indata_r, indata_i,first_r, last_r,first_i, last_i);
input wclk,rclk,reset,din_valid;
input [IN_WIDTH-1:0] indata_r,indata_i;
output [IN_WIDTH-1:0] first_r,last_r,first_i,last_i;
reg [IN_WIDTH-1:0] mem_r [127:0];

reg [IN_WIDTH-1:0] mem_i [127:0];
reg [6:0] count1;
reg [5:0] count2;
reg [IN_WIDTH-1:0] first_r,last_r,first_i,last_i;
always @(posedge wclk or negedge reset )
begin
if(!reset)
count1<=7'b1111111;
else if(din_valid==1)
count1<=count1+1;
end
always @(posedge wclk or negedge reset )
begin
if (!reset)
begin
mem_r[0]=10'b0; mem_i[0]=10'b0;mem_r[1]=10'b0; mem_i[1]=10'b0;mem_r[2]=10'b0;
mem_i[2]=10'b0;mem_r[3]=10'b0;mem_i[3]=10'b0;
mem_r[100]=10'b0; mem_i[100]=10'b0;mem_r[101]=10'b0;
mem_i[101]=10'b0;mem_r[102]=10'b0;
mem_i[105]=10'b0;mem_r[106]=10'b0;
mem_i[109]=10'b0;mem_r[110]=10'b0;
mem_i[113]=10'b0;mem_r[114]=10'b0;
mem_i[117]=10'b0;mem_r[118]=10'b0;
mem_i[121]=10'b0;mem_r[122]=10'b0;
mem_i[125]=10'b0;mem_r[126]=10'b0;
end
else if(din_valid==1)begin
mem_r[count1]=indata_r;
mem_i[count1]=indata_i;
end
end
always @(posedge rclk or negedge reset )
begin
if(!reset)
count2<=6'b111111;
else if(din_valid==1)
count2<=count2+1;
end
always @(posedge rclk or negedge reset)
if(!reset) begin
first_r<=10'b0;
last_r<=10'b0;
first_i<=10'b0;
last_i<=10'b0;
end
else begin
if (count2<32)begin
first_r<=mem_r[count2+64];
last_r<=mem_r[count2+96];
first_i<=mem_i[count2+64];
last_i<=mem_i[count2+96];
end
else begin
first_r<=mem_r[count2-32];
last_r<=mem_r[count2];
first_i<=mem_i[count2-32];
last_i<=mem_i[count2];
end
end
endmodule
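The read addressing of the input buffer can be summarised with a small Python model. It assumes the 128-word memory and the 6-bit read counter count2 shown in the listing above; the function name `read_addresses` is hypothetical.

```python
def read_addresses(count2):
    """Addresses of the (first, last) samples read for a 6-bit counter value."""
    if not 0 <= count2 < 64:
        raise ValueError("count2 is a 6-bit counter")
    if count2 < 32:
        return count2 + 64, count2 + 96   # read the upper 64-word half
    return count2 - 32, count2            # read the lower 64-word half


if __name__ == "__main__":
    assert read_addresses(0) == (64, 96)
    assert read_addresses(32) == (0, 32)
    # Every pair is 32 addresses apart, i.e. samples n and n + 32 of one block.
    assert all(b - a == 32 for a, b in map(read_addresses, range(64)))
```

Each read therefore delivers samples n and n + 32 of a 64-sample block, which are the two inputs of a first-stage butterfly.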
`timescale 1ns/1ns
module tb_buffer;
reg wclk,rclk,reset,din_valid;
reg [9:0] indata_r,indata_i;
wire [9:0] first_r,first_i,last_r,last_i;
always #5 wclk=~wclk;
always #10 rclk=~rclk;
initial begin
wclk=0;
rclk=0;
reset=0;
din_valid=0;
indata_r=0;indata_i=0;
#10 reset=1;din_valid=1;
indata_r=0;indata_i=0;
repeat(200)begin
#10 indata_r=indata_r+1;indata_i=indata_i+1;
end
end
input_buffer i1(.wclk( wclk),.rclk(rclk),.reset(reset),.din_valid(din_valid),
.indata_r(indata_r),.indata_i(indata_i),
.first_r(first_r),.first_i(first_i),.last_r(last_r),.last_i(last_i));
endmodule
`timescale 1ns/1ns
module dff( clk,reset, d, y );
input clk,reset;
input [IN_WIDTH-1:0]d;
output [IN_WIDTH-1:0]y;
wire [IN_WIDTH-1:0]y;
reg [IN_WIDTH-1:0]r;
assign y=r;
always @(posedge clk or negedge reset)
begin
if(!reset)begin
r<=10'b0;
end
else begin
r<=d;
end
end
endmodule
//butterfly
`timescale 1ns/1ns
module butterfly( a_r,a_i,b_r,b_i, a1_r,a1_i,b1_r,b1_i);
input [IN_WIDTH-1:0] a_r,a_i,b_r,b_i;
output [IN_WIDTH-1:0] a1_r,a1_i,b1_r,b1_i;
wire [IN_WIDTH:0] a0_r,a0_i,b0_r,b0_i;
assign a0_r=a_r+b_r;
assign b0_r=a_r-b_r;
assign a0_i=a_i+b_i;
assign b0_i=a_i-b_i;
assign a1_r=a0_r[IN_WIDTH:1];
assign b1_r=b0_r[IN_WIDTH:1];
assign a1_i=a0_i[IN_WIDTH:1];
assign b1_i=b0_i[IN_WIDTH:1];
endmodule
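The butterfly module keeps the upper bits of each (IN_WIDTH+1)-bit sum and difference, which halves the results so that they fit the original word length again. A minimal Python sketch of that scaling, assuming signed integer samples and floor rounding (the function name is illustrative):

```python
def scaled_butterfly(a, b):
    """Radix-2 butterfly with a one-bit right shift: ((a + b) / 2, (a - b) / 2)."""
    return (a + b) >> 1, (a - b) >> 1


if __name__ == "__main__":
    assert scaled_butterfly(6, 2) == (4, 2)
    # Six stages of halving give an overall gain of 1/64 for a 64-point FFT.
    assert 2 ** 6 == 64
```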
//delay 16
`timescale 1ns/1ns
module delay16 (clk,reset,x_r,x_i,y_r,y_i);
input clk,reset;
input [IN_WIDTH-1:0]x_r,x_i;
output[IN_WIDTH-1:0]y_r,y_i;
reg [IN_WIDTH-1:0]y_r,y_i;
wire [IN_WIDTH-1:0]x0_r,x0_i;
//wire [IN_WIDTH-1:0]x15_r,x15_i;
dff d1(.clk(clk),.reset(reset),.d(x_r),.y(x0_r));
dff d2(.clk(clk),.reset(reset),.d(x0_r),.y(x1_r));
//dff d16(.clk(clk),.reset(reset),.d(x14_r),.y(x15_r));
dff d17(.clk(clk),.reset(reset),.d(x_i),.y(x0_i));
dff d18(.clk(clk),.reset(reset),.d(x0_i),.y(x1_i));
//dff d32(.clk(clk),.reset(reset),.d(x14_i),.y(x15_i));
always @(posedge clk or negedge reset )
begin
if(!reset)
begin
y_r<=10'b0;
y_i<=10'b0;
end
else
begin
y_r<=x14_r;
y_i<=x14_i;
end
end
endmodule
//delay8
`timescale 1ns/1ns
module delay8 ( clk,reset, x_r,x_i,y_r,y_i);
input clk,reset;
reg[IN_WIDTH-1:0]y_r,y_i;
always @(posedge clk or negedge reset)
begin if(!reset)
begin
y_r<=10'b0;
y_i<=10'b0;
end
else begin
y_r<=x6_r;
y_i<=x6_i;
end
end
endmodule
//delay4
`timescale 1ns/1ns
module delay4 (
clk,reset,
x_r,x_i,
y_r,y_i
);
input clk,reset;
always @(posedge clk or negedge reset)
begin
if(!reset)
begin
y_r<=10'b0;
y_i<=10'b0;
end
else begin
y_r<=x2_r;
y_i<=x2_i;
end
end
endmodule
//delay2
`timescale 1ns/1ns
module delay2 ( clk,reset,x_r,x_i, y_r,y_i);
input clk,reset;
always @(posedge clk or negedge reset)
begin
if(!reset)
begin
y_r<=10'b0;
y_i<=10'b0;
end
else begin
y_r<=x0_r;
y_i<=x0_i;
end
end
endmodule
//delay1
`timescale 1ns/1ns
module delay1 ( clk,reset, x_r,x_i, y_r,y_i);
input clk,reset;
reg[IN_WIDTH-1:0]y_r,y_i;
always @(posedge clk or negedge reset)
begin
if(!reset)
begin
y_r<=10'b0;
y_i<=10'b0;
end
else begin
y_r<=x_r;
y_i<=x_i;
end
end
endmodule
//switch16
`timescale 1ns/1ns
module switch16(count, x0_r,x1_r,x0_i,x1_i, y0_r,y1_r,y0_i,y1_i);
input [4:0] count;
input [IN_WIDTH-1:0] x0_r,x1_r,x0_i,x1_i;
output[IN_WIDTH-1:0] y0_r,y1_r,y0_i,y1_i;
reg[IN_WIDTH-1:0] y0_r,y1_r,y0_i,y1_i;
always @(count or x0_r or x1_r or x0_i or x1_i)
begin
if(count>15)
begin
y0_r=x1_r;
y0_i=x1_i;
y1_r=x0_r;
y1_i=x0_i;
end
else
begin
y0_r=x0_r;
y0_i=x0_i;
y1_r=x1_r;
y1_i=x1_i;
end
end
endmodule
// switch 8
`timescale 1ns/1ns
module switch8( count,x0_r,x1_r,x0_i,x1_i, y0_r,y1_r,y0_i,y1_i);
input [3:0] count;
always @(count or x0_r or x1_r or x0_i or x1_i)
begin
if(count>7)
begin
y0_r=x1_r;
y0_i=x1_i;
y1_r=x0_r;
y1_i=x0_i;
end
else
begin
y0_r=x0_r;
y0_i=x0_i;
y1_r=x1_r;
y1_i=x1_i;
end
end
endmodule
// switch 4
`timescale 1ns/1ns
module switch4(
count,
x0_r,x1_r,x0_i,x1_i,
y0_r,y1_r,y0_i,y1_i
);
input [2:0] count;
always @(count or x0_r or x1_r or x0_i or x1_i)
begin
if(count>3)
begin
y0_r=x1_r;
y0_i=x1_i;
y1_r=x0_r;
y1_i=x0_i;
end
else
begin
y0_r=x0_r;
y0_i=x0_i;
y1_r=x1_r;
y1_i=x1_i;
end
end
endmodule
// switch 2
`timescale 1ns/1ns
module switch2( count, x0_r,x1_r,x0_i,x1_i,y0_r,y1_r,y0_i,y1_i);
input [1:0] count;
always @(count or x0_r or x1_r or x0_i or x1_i)
begin
if(count>1)
begin
y0_r=x1_r;
y0_i=x1_i;
y1_r=x0_r;
y1_i=x0_i;
end
else
begin
y0_r=x0_r;
y0_i=x0_i;
y1_r=x1_r;
y1_i=x1_i;
end
end
endmodule
// switch1
`timescale 1ns/1ns
module switch1( count, x0_r,x1_r,x0_i,x1_i,y0_r,y1_r,y0_i,y1_i);
input count;
always @(count or x0_r or x1_r or x0_i or x1_i)
begin
if(count==1)
begin
y0_r=x1_r;
y0_i=x1_i;
y1_r=x0_r;
y1_i=x0_i;
end
else
begin
y0_r=x0_r;
y0_i=x0_i;
y1_r=x1_r;
y1_i=x1_i;
end
end
endmodule
// cla20
`timescale 1ns/1ns
module cla20 (a,b,ci,s,co);
input [19:0]a,b;
input ci;
output [19:0] s;
output co;
wire [19:0] a,b,s;
wire ci,co;
wire [19:0]c;
wire [19:0] p,g,ps;
wire [18:0] p_1,g_1;
wire [15:0] p_2,g_2;
wire [3:0] p_3,g_3;
assign p=a|b;
assign g=a&b;
assign c[0]=ci;
assign c[1]=g[0]|(p[0]&ci);
//first line
opo2 l101(.p2(p[1]),.g2(g[1]),.p1(p[0]),.g1(g[0]),.p3(p_1[0]),.g3(g_1[0]));
opo3 l102(.p3(p[2]),.g3(g[2]),.p2(p[1]),.g2(g[1]),.p1(p[0]),.g1(g[0]),.p4(p_1[1]),.g4(g_1[1]));
opo4 l103(.p4(p[3]),.g4(g[3]),.p3(p[2]),.g3(g[2]),.p2(p[1]),.g2(g[1]),.p1(p[0]),.g1(g[0]),
.p5(p_1[2]),.g5(g_1[2]));//1
.p5(p_1[3]),.g5(g_1[3]));//2
opo4
l105(.p4(p[5]),.g4(g[5]),.p3(p[4]),.g3(g[4]),.p2(p[3]),.g2(g[3]),.p1(p[2]),.g1(g[2]),.p5(p_1[4]),.g5
(g_1[4]));//3
.p5(p_1[5]),.g5(g_1[5]));//4
opo4
l107(.p4(p[7]),.g4(g[7]),.p3(p[6]),.g3(g[6]),.p2(p[5]),.g2(g[5]),.p1(p[4]),.g1(g[4]),.p5(p_1[6]),.g5
(g_1[6]));//5
.p5(p_1[7]),.g5(g_1[7]));//6
.p5(p_1[8]),.g5(g_1[8]));//7
.p5(p_1[9]),.g5(g_1[9]));//8
.p5(p_1[10]),.g5(g_1[10]));//9
.p5(p_1[11]),.g5(g_1[11]));//10
opo4
l113(.p4(p[13]),.g4(g[13]),.p3(p[12]),.g3(g[12]),.p2(p[11]),.g2(g[11]),.p1(p[10]),.g1(g[10]),
.p5(p_1[12]),.g5(g_1[12]));//11
opo4
.p5(p_1[13]),.g5(g_1[13]));//12
opo4
.p5(p_1[14]),.g5(g_1[14]));//13
opo4
.p5(p_1[15]),.g5(g_1[15]));//14
opo4
.p5(p_1[16]),.g5(g_1[16]));//15
opo4
.p5(p_1[17]),.g5(g_1[17]));//16
opo4 l119(.p4(p[19]),.g4(g[19]),.p3(p[18]),.g3(g[18]),.p2(p[17]),.g2(g[17]),.p1(p[16]),.g1(g[16]),
.p5(p_1[18]),.g5(g_1[18]));//17
assign c[2]=g_1[0]|(p_1[0]&ci);
assign c[3]=g_1[1]|(p_1[1]&ci);
//second line
opo2 l201(.p2(p_1[3]),.g2(g_1[3]),.p1(p[0]),.g1(g[0]),.p3(p_2[0]),.g3(g_2[0]));
opo2 l202(.p2(p_1[4]),.g2(g_1[4]),.p1(p_1[0]),.g1(g_1[0]),.p3(p_2[1]),.g3(g_2[1]));
opo3
l205(.p3(p_1[7]),.g3(g_1[7]),.p2(p_1[3]),.g2(g_1[3]),.p1(p[0]),.g1(g[0]),.p4(p_2[4]),.g4(g_2[4]))
;
opo3 l206(.p3(p_1[8]),.g3(g_1[8]),.p2(p_1[4]),.g2(g_1[4]),.p1(p_1[0]),.g1(g_1[0]),
.p4(p_2[5]),.g4(g_2[5]));
opo3 l207(.p3(p_1[9]),.g3(g_1[9]),.p2(p_1[5]),.g2(g_1[5]),.p1(p_1[1]),.g1(g_1[1]),
.p4(p_2[6]),.g4(g_2[6]));
opo3
l208(.p3(p_1[10]),.g3(g_1[10]),.p2(p_1[6]),.g2(g_1[6]),.p1(p_1[2]),.g1(g_1[2]),.p4(p_2[7]),.g4(
g_2[7]));
opo4
l209(.p4(p_1[11]),.g4(g_1[11]),.p3(p_1[7]),.g3(g_1[7]),.p2(p_1[3]),.g2(g_1[3]),.p1(p[0]),.g1(g[0
]), .p5(p_2[8]),.g5(g_2[8]));//1
opo4
g_1[0]), .p5(p_2[9]),.g5(g_2[9]));//2
opo4
g_1[1]), .p5(p_2[10]),.g5(g_2[10]));//3
opo4 l212(.p4(p_1[14]),.g4(g_1[14]),.p3(p_1[10]),.g3(g_1[10]),.p2(p_1[6]),.g2(g_1[6]),
.p1(p_1[2]),.g1(g_1[2]), .p5(p_2[11]),.g5(g_2[11]));//4
opo4
1(g_1[3]), .p5(p_2[12]),.g5(g_2[12]));//5
opo4
1(g_1[4]), .p5(p_2[13]),.g5(g_2[13]));//6
opo4
1(g_1[5]), .p5(p_2[14]),.g5(g_2[14]));//7
opo4 l216(.p4(p_1[18]),.g4(g_1[18]),.p3(p_1[14]),.g3(g_1[14]),.p2(p_1[10]),.g2(g_1[10]),
.p1(p_1[6]),.g1(g_1[6]), .p5(p_2[15]),.g5(g_2[15]));//8
assign c[4]=g_1[2]|(p_1[2]&ci);
assign c[5]=g_2[0]|(p_2[0]&ci);
assign c[6]=g_2[1]|(p_2[1]&ci);
assign c[7]=g_2[2]|(p_2[2]&ci);
assign c[8]=g_2[3]|(p_2[3]&ci);
assign c[9]=g_2[4]|(p_2[4]&ci);
assign c[10]=g_2[5]|(p_2[5]&ci);
assign c[11]=g_2[6]|(p_2[6]&ci);
assign c[12]=g_2[7]|(p_2[7]&ci);
assign c[13]=g_2[8]|(p_2[8]&ci);
assign c[14]=g_2[9]|(p_2[9]&ci);
assign c[15]=g_2[10]|(p_2[10]&ci);
//third line
//result
assign c[16]=g_2[11]|(p_2[11]&ci);
assign c[17]=g_3[0]|(p_3[0]&ci);
assign c[18]=g_3[1]|(p_3[1]&ci);
assign c[19]=g_3[2]|(p_3[2]&ci);
assign co=g_3[3]|p_3[3]&ci;
assign s=(p&(~g))^c;
endmodule
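The adder above is built on the propagate and generate signals p = a | b and g = a & b, with every carry obeying c[i+1] = g[i] | (p[i] & c[i]) and the sum formed as (p & ~g) ^ c. The Python sketch below evaluates the same recurrence bit by bit as a check of the arithmetic; the lookahead tree in the Verilog computes these carries in parallel instead. The function name `cla_add` is hypothetical.

```python
WIDTH = 20


def cla_add(a, b, ci=0, width=WIDTH):
    """Return (sum, carry_out) from the propagate/generate recurrence."""
    p = a | b                       # propagate
    g = a & b                       # generate
    carries = 0
    c = ci                          # carry into bit 0
    for i in range(width):
        carries |= c << i
        c = ((g >> i) & 1) | (((p >> i) & 1) & c)
    s = ((p & ~g) ^ carries) & ((1 << width) - 1)
    return s, c


if __name__ == "__main__":
    s, co = cla_add(0xFFFFF, 1)
    assert (co << WIDTH) | s == 0xFFFFF + 1
```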
//booth
`timescale 1ns/1ns
module booth (a,b,out,signal);
input [9:0] a;
input [2:0] b;
output [10:0] out;
output signal;
wire [9:0] a;
wire [2:0] b;
reg [10:0] out;
reg signal;
always @(a or b)
begin
case (b)
3'b000: begin
out=11'b0;
signal=0;
end
3'b001: begin
out={a[9],a};
signal=0;
end
3'b010: begin
out={a[9],a};
signal=0;
end
3'b011: begin
out[10:0]=a<<1;
signal=0;
end
3'b100: begin
out[10:0]=(~(a<<1));
signal=1;
end
3'b101: begin
out[10]=~a[9];
out[9:0]=~a;
signal=1;
end
3'b110: begin
out[10]=~a[9];
out[9:0]=~a;
signal=1;
end
3'b111: begin
out=11'b0;
signal=0;
end
endcase
end
endmodule
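The case table of the booth module corresponds to radix-4 Booth recoding, in which each overlapping group of three multiplier bits selects one of the multiples 0, +a, +2a, -2a or -a. The following Python sketch shows the recoding on signed integers and checks that the recoded digits reproduce the product. It is an arithmetic illustration under the assumption of a 10-bit two's complement multiplier, not a bit-accurate model of the partial-product wiring; the function names are hypothetical.

```python
BOOTH_TABLE = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
               0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}


def booth_digits(y, bits=10):
    """Radix-4 Booth digits of a signed `bits`-bit value, least significant first."""
    padded = (y & ((1 << bits) - 1)) << 1     # append the implicit 0 below bit 0
    return [BOOTH_TABLE[(padded >> (2 * i)) & 0b111] for i in range(bits // 2)]


def booth_multiply(x, y, bits=10):
    """Multiply x by y by summing the Booth-selected multiples of x."""
    return sum(d * x * 4 ** i for i, d in enumerate(booth_digits(y, bits)))


if __name__ == "__main__":
    assert booth_digits(3) == [-1, 1, 0, 0, 0]     # 3 = -1 + 4
    assert booth_multiply(7, -5) == -35
```

A 10-bit multiplier yields five digits, which matches the five partial products w0 to w4 used by the multiplier module.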
//complex_mul
`timescale 1ns/1ns
module complex_mul( a,b,c,d, yr,yi);
input [IN_WIDTH-1:0]a,b,c,d;
output [IN_WIDTH*2-1:0]yr,yi;
wire [IN_WIDTH:0] a1,c2,c3;
wire [IN_WIDTH-1:0] a0,c0,c1;
wire [IN_WIDTH*2-1:0] y0,y1,y2;
assign a1=a-b;
assign c2=c-d;
assign c3=c+d;
assign a0=a1[IN_WIDTH:1];
assign c0=c2[IN_WIDTH:1];
assign c1= c3[IN_WIDTH:1];
multiplier m0(.x(a0),.y(d),.result(y0));
multiplier m1(.x(c0),.y(a),.result(y1));
multiplier m2(.x(c1),.y(b),.result(y2));
assign yr=y0+y1;
assign yi=y0+y2;
endmodule
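The complex_mul module forms the product (a + jb)(c + jd) with three real multipliers instead of four. The identity behind it can be checked with a short Python sketch; unlike the Verilog, the sketch omits the division by two that is applied to the sums and differences before multiplying, and the function name is illustrative.

```python
def complex_mul_3(a, b, c, d):
    """Return (real, imag) of (a + jb)(c + jd) using three multiplications."""
    y0 = (a - b) * d
    y1 = (c - d) * a
    y2 = (c + d) * b
    return y0 + y1, y0 + y2       # (ac - bd, ad + bc)


if __name__ == "__main__":
    assert complex_mul_3(3, 4, 5, 6) == (-9, 38)   # (3 + 4j)(5 + 6j) = -9 + 38j
```

Trading one multiplier for three extra additions is attractive in hardware because adders cost far less area than multipliers.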
//tbcla
`timescale 1ns/1ns
module tbcla;
reg clk;
reg ci;
reg [19:0] a,b;
wire [19:0] s;
wire co;
reg [20:0] check;
cla20 c1(.a(a),.b(b),.ci(ci),.s(s),.co(co));
always #5 clk=~clk;
initial begin
clk=1'b0;
a=20'b0;
b=20'b0;
ci=1'b0;
repeat(100) begin
a=$random;b=$random;ci=1'b0;
check=a+b+ci;
#10 $display ($time, " %d+%d+%d=%d(%d)",a,b,ci,{co,s},check);
end
end
endmodule
//cl42_20
module cl42_20(a,b,c,d,ci,s,cr);
input [19:0]a,b,c,d;
input ci;
output [20:0]s;
output [20:0]cr;
wire [19:0] txr,tao,toa;
assign txr=(a^b)^(c^d);
assign tao=(a&b)|(c&d);
assign toa=(a|b)&(c|d);
assign s={txr[19],txr}^{toa,ci};
assign cr=({txr[19],txr}&{toa,ci})|((~{txr[19],txr})&{tao[19],tao});
endmodule
//multiplier
`timescale 1ns/1ns
module multiplier ( x,y, result );
input [9:0]x,y;
output [19:0]result;
wire [19:0] result;
wire [9:0] a,b;
wire [10:0] w0,w1,w2,w3,w4;
wire x0,x1,x2,x3,x4;
wire [14:0] s1,s2;
wire [12:0] s3,s4;
wire [20:0] s5,s6;
wire [19:0] s7;
wire co;
assign a=x;
assign b=y;
assign result=s7;
//booth coding
booth b0(.a(a),.b({b[1:0],1'b0}),.out(w0),.signal(x0));
booth b1(.a(a),.b(b[3:1]),.out(w1),.signal(x1));
//******************first line with 3:2 compressor w0_w1_w2 w3_w4_x4************//
csa_15 c1(.a({{4{w0[10]}},w0}),.b({{2{w1[10]}},w1,1'b0,x0}),.ci({w2,1'b0,x1,2'b0}),.s(s1),.co(s2));
csa_13 c2(.a({{2{w3[10]}},w3}),.b({w4,1'b0,x3}),.ci({10'b0,x4,2'b0}),.s(s3),.co(s4));
//********************second line with 4:2 compressor******************//
cl42_20 c3(.a({{5{s1[14]}},s1}),.b({{4{s2[14]}},s2,1'b0}),.c({{s3[12]},s3,1'b0,x2,4'b0}),.d({s4,7'b0}),
.ci(1'b0),.s(s5),.cr(s6));
//******************** leading carry adder**********************************//
cla20 cla(.a({s5[19:0]}),.b({s6[18:0],1'b0}),.ci(1'b0),.s(s7),.co(co));
endmodule
//inverter
`timescale 1ns/1ns
module inverter( count, a_r,a_i, a1_r,a1_i);
input count;
input [IN_WIDTH-1:0] a_r,a_i;
output [IN_WIDTH-1:0] a1_r,a1_i;
wire[IN_WIDTH-1:0] a1_r,a1_i;
assign a1_r=(count)?a_i:a_r;
assign a1_i=(count)?(-a_r):a_i;
endmodule
//bm
`timescale 1ns/1ns
module bm(clk,reset,address, ar,ai,br,bi, r16,i16, r0,i0);
input clk,reset;
input [4:0]address;
input [IN_WIDTH-1:0] ar,ai;
input [IN_WIDTH-1:0] br,bi;
output [IN_WIDTH-1:0] r16,i16;
output [IN_WIDTH-1:0] r0,i0;
wire [IN_WIDTH-1:0] r16,i16;
wire [IN_WIDTH-1:0] r0,i0;
wire [IN_WIDTH-1:0] yr0,yi0,yr16,yi16;
wire [IN_WIDTH*2-1:0] yr,yi;
wire [IN_WIDTH-1:0] wr,wi;
butterfly b1(.a_r(ar),.a_i(ai),.b_r(br),.b_i(bi),.a1_r(yr0),.a1_i(yi0),.b1_r(yr16),.b1_i(yi16));
twiddle1 t1(.clk(clk),.reset(reset),.address(address),.wr(wr),.wi(wi));
complex_mul m1(.a(yr16),.b(yi16),.c(wr),.d(wi),.yr(yr),.yi(yi));
assign r0=yr0;
assign i0=yi0;
assign r16=yr[IN_WIDTH*2-1:IN_WIDTH];
assign i16=yi[IN_WIDTH*2-1:IN_WIDTH];
endmodule
// opo2
`timescale 1ns/1ns
module opo2(p2,g2,p1,g1,p3,g3);
input p1,p2,g1,g2;
output p3,g3;
assign p3=p2&p1;
assign g3=g2|(g1&p2);
endmodule
//opo3
`timescale 1ns/1ns
module opo3(p3,p2,p1,g3,g2,g1,p4,g4);
input p1,p2,p3,g1,g2,g3;
output p4,g4;
assign p4=p3&p2&p1;
assign g4=g3|(p3&g2)|(p3&p2&g1);
endmodule
//opo4
`timescale 1ns/1ns
module opo4(p4,p3,p2,p1,g4,g3,g2,g1,p5,g5);
input p4,p3,p2,p1,g4,g3,g2,g1;
output p5,g5;
assign p5=p4&p3&p2&p1;
assign g5=g4|p4&g3|p4&p3&g2|p4&p3&p2&g1;
endmodule
//csa13
module csa_13(a,b,ci,s,co);
input[12:0] a,b,ci;
output[12:0] s,co;
assign s=a^b^ci;
assign co=(a&b)|(a&ci)|(b&ci);
endmodule
//dataout
`timescale 1ns/1ns
module dataout ( clk,reset,first_r,first_i, last_r,last_i, dout_re,dout_im,dout_valid);
input clk,reset;
input [9:0]first_r,first_i,last_r,last_i;
output [9:0]dout_re,dout_im;
output dout_valid;
reg [9:0]dout_re,dout_im;
reg dout_valid;
reg flag;
reg [6:0]count2;
reg count;
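// The sensitivity lists are missing from the extracted listing; a rising clock
// edge with an asynchronous active-low reset is assumed for every block below.
always @(posedge clk or negedge reset)
begin
if (!reset)
count2<=7'b1111111;
else if(flag==0)
count2<=count2+1;
end
always @(posedge clk or negedge reset)
begin
if (!reset)
flag<=0;
else if(count2==7'b1111101)
flag<=1;
end
always @(posedge clk or negedge reset)
begin
if (!reset)
count<=1;
else
count<=count+1;
end
always @(posedge clk or negedge reset)
begin
if(!reset) begin
dout_re<=10'b0; dout_im<=10'b0;
end
else begin
if(count==0) begin
dout_re<=first_r; dout_im<=first_i;
end
else begin
dout_re<=last_r;dout_im<=last_i;
end
end
end
always @(posedge clk or negedge reset)
begin
if(!reset)
dout_valid<=0;
else if(flag==1)
dout_valid<=1;
else
dout_valid<=0;
end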
endmodule
//twiddle
`timescale 1ns/1ns
module twiddle1(clk,reset,address,wr,wi);
parameter mem0=10'b0111111111;
input clk,reset;
input [4:0]address;
output [IN_WIDTH-1:0] wr,wi;
reg [IN_WIDTH-1:0] wr,wi;
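// Sensitivity list missing from the extracted listing; a rising clock edge
// with an asynchronous active-low reset is assumed.
always @(posedge clk or negedge reset)
if (!reset) begin
wr<=0;wi<=0;
end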
else
begin
case(address)
5'd0 : begin wr<=mem0;wi<=0; end
5'd1 : begin wr<=mem1;wi<=-mem15; end
5'd2 : begin wr<=mem2;wi<=-mem14; end
5'd3 : begin wr<=mem3;wi<=-mem13; end
5'd10 : begin wr<=mem10;wi<=-mem6; end
5'd16 : begin wr<=0;wi<=-mem0; end
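// Assuming memk = cos(2*pi*k/64), as entries 0..16 imply: W^(16+k) has wr=-mem(16-k), wi=-memk
5'd17 : begin wr<=-mem15;wi<=-mem1; end
5'd18 : begin wr<=-mem14;wi<=-mem2; end
5'd19 : begin wr<=-mem13;wi<=-mem3; end
5'd20 : begin wr<=-mem12;wi<=-mem4; end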
endcase
end
endmodule
//tbmul
`timescale 1ns/1ns
module tbmul;
reg clk,reset;
reg [9:0] x,y;
wire [19:0] result;
reg [19:0] check;
multiplier m0(
.x(x),
.y(y),
.result(result)
);
always #20 clk=~clk;
initial begin
clk=0;
reset=1;
x=-10'd15;
y=10'd30;
#5 reset=0;
#20 reset=1;
#15;
//check=x*y;
repeat(100) begin
x=x+20;y=y+30;
check=x*y;
#40;
end
end
endmodule
VITA
Bhavishya Murukutla was born in Guntur, Andhra Pradesh, India. She graduated with a
Bachelor's degree in Electronics and Communication Engineering from JNTU Kakinada
University, Kakinada, India, in May 2012. After completing her Bachelor's degree, she moved
to the United States of America in August 2012 to pursue a Master of Science in Electrical
Engineering at Texas A&M University–Kingsville. She is scheduled to graduate in December
2013.