VLSI Architecture for R2B, R4B and R8B
CONTENTS:
CHAPTER 1
INTRODUCTION.
CHAPTER 2
LITERATURE SURVEY.
2.1. A HIGH FLEXIBLE LOW LATENCY MEMORY BASED FFT
PROCESSOR FOR 4G, WLAN AND FUTURE 5G.
CHAPTER 7
RESULTS AND SIMULATION.
CHAPTER 8
CONCLUSION.
REFERENCE.
LIST OF FIGURES:
FIG 1.1. QUARTUS-II WORKFLOW
FIG 3.1. BASIC BUTTERFLY STRUCTURE
FIG 3.2. MODIFIED BUTTERFLY STRUCTURE
FIG 3.3. BUTTERFLY STRUCTURE USED IN THE DITFFT
FIG 3.4. BASIC DECIMATION IN TIME FFT
FIG 3.5. 8 POINT DITFFT
FIG 3.6. SEQUENCE OF INPUT IN DITFFT
FIG 3.7. RADIX-4 DITFFT
FIG 3.8. INPUT TO OUTPUT SEQUENCE GENERATION OF DITFFT
FIG 3.9. BUTTERFLY STRUCTURE USED IN DITFFT
FIG 3.10. 8-POINT RADIX-2 DITFFT
FIG 3.11. RADIX-4 DITFFT
FIG 3.12. BASIC PIPELINE ARCHITECTURE
FIG 3.13. R2MDC PIPELINE ARCHITECTURE OF 8-POINT
FIG 4.1. 4 BIT BINARY EXCESS CONVERTER
FIG 4.2. 16-BIT REGULAR CSLA
FIG 4.3. GROUP 2
FIG 4.4. GROUP 3
FIG 4.5. GROUP 4
LIST OF TABLES:
TABLE 3.1. BIT REVERSAL ORDER
ABSTRACT:
The FFT computes the DFT, and when the DFT is computed in a sequential manner with data fed
continuously through the processor, it supports real-time applications with continuous
processing. In this work, radix-2 butterfly (R2B), R4B and R8B elements are combined in a
single-path delay feedback (SDF) architecture with a modified carry select adder (MCSLA) to
reduce the number of computational stages and the hardware utilization relative to the R2B and
R4B FFTs. The implemented SDF technique uses a single delay commutator per element. The
proposed technique uses fewer multipliers, fewer computational stages and fewer butterfly
elements than the radix-2 and radix-4 FFTs.
CHAPTER 1
1. INTRODUCTION:
At the start of the twenty-first century, it was difficult to build machines and their
equipment without applying the principles of electronics. Likewise, the communication boom
would not have been possible without advances in the electronics industry and the use of
integrated circuits (ICs) [1]. Integrated circuit design has developed remarkably, mainly
because of advances in VLSI technology and system design. Owing to progress in CMOS process
technology, the feature size of the transistor has shrunk, and with it the integrated
circuit. The improvements in VLSI technology have stimulated great interest in building
special-purpose parallel processor arrays to support real-time signal processing.
Shashidhara et al. [4] introduced a multimode memory-based FFT design for a medical
system. The proposed design also supported wireless displays based on MIMO-OFDM.
The FFT processor enabled the use of 2-stream 4096/2048/1024-point FFTs and 1- to 4-stream
128/64-point FFTs for FD-OCT and OFDM applications respectively. The proposed design provided
data access for up to sixteen memory paths using cost-effective four-bank single-port SRAM
operating with a four-word data width. The proposed design used a scheme based on
hardware-efficient multiplication and storage units. A test chip was designed using TSMC 0.18
CMOS technology with a core size of 4.8. Post-layout simulation was performed for the
4096-point FFT at 80 MHz and the 128-point FFT at 40 MHz, which gave throughputs of 152 MS/s
and 160 MS/s respectively. In addition, the 4096-point FFT at 80 MHz and the 128-point FFT at
40 MHz consumed 156.2 mW and 69.9 mW respectively. Further, system-level verification
based FFT. This design is suitable for performing the computation for real-valued
cycles are minimized and the utilization of the butterfly processing element (PE) is increased
by the stage partition of the real FFT (RFFT). The PE can process four inputs in parallel using
two radix-2 butterflies and only two multiplexers. The proposed memory addressing sequence
and the control signals of the multiplexers are generated by counter logic according to the
RFFT computation stage. In addition, the proposed RFFT architecture supports more processing
elements. The proposed architecture reduced the computation cycles by 17.5% for a 32-point
RFFT computation while maintaining lower hardware utilization and
Fahad Qureshi et al. [6] introduced a 128/256/512/1024/1536/2048-point SDF pipeline FFT
architecture for Long Term Evolution (LTE) and mobile Worldwide Interoperability for Microwave
Access (WiMAX) systems. The proposed SDF architecture required a low-cost computation scheme
to enable the 1536-point FFT, which significantly reduces hardware cost as well as power
consumption. To implement a radix-3 FFT, the proposed design included an efficient three-stage
SDF pipeline architecture.
In most communication systems [1] and control systems the frequency spectrum of the signal
is important: the frequency range of the signal is calculated to know whether the
Most signals are in the time domain, in which the variation is represented with respect
to time. To obtain the frequency-domain signal, that is, the signal variation with respect
to frequency [2], a transformation from the time domain to the frequency domain is needed,
which is done using different transformation techniques [3]. They are the following:
1) Fourier Series
2) Fourier Transform (FT)
3) Laplace Transform
4) Z-transform
The Fourier series applies only to repetitive (periodic) signals, so we use the Fourier
Transform, which can be applied to both periodic and non-periodic signals [3].
components. The FT can be done by using the Discrete Fourier Transform (DFT) in which
Here N is the number of samples present in the signal
F(k) is the frequency domain signal
The frequency-domain output F(k) is a discrete signal, and the input considered should also be
discrete.
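For reference, the standard N-point DFT that matches the symbols described above can be written as follows. The input symbol x(n) is an assumption taken from the later chapters, since the original equation is not reproduced at this point.

```latex
F(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \dots, N-1
```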
The Fourier Transform can also be done by using the DTFT in which the input signal should be
Where F(e^{jω}) is the frequency-domain signal, which is continuous and periodic
ω is the frequency
If the input is a continuous signal, we need to sample it to obtain a discrete signal and then
apply the DTFT to get the frequency-domain signal.
Applying the DFT or DTFT to a time-domain signal yields its frequency-domain representation.
For an N-point sequence the conversion can be done by using the following [3] [4]
domain:
64 complex multiplications (N^2 for N = 8)
56 complex additions (N(N-1) for N = 8)
If we increase the number of samples in the input sequence, the number of multiplications
increases very rapidly.
Let us consider a 16-point sequence; the conversion requires the following:
To reduce the number of complex multiplications and additions [4], we use the FFT
technique to calculate the frequency domain of the signal; the conversion requires the following:
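The operation counts discussed here can be tabulated with a short sketch. It assumes the usual textbook formulas, N^2 multiplications and N(N-1) additions for a direct DFT, and (N/2)log2(N) multiplications and N*log2(N) additions for a radix-2 FFT. The function names are illustrative and do not come from the thesis.

```python
def dft_op_counts(n):
    """Complex operation counts for a direct N-point DFT."""
    return {"mults": n * n, "adds": n * (n - 1)}


def fft_op_counts(n):
    """Complex operation counts for a radix-2 FFT; N must be a power of two."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two and at least 2")
    stages = n.bit_length() - 1          # v = log2(N)
    return {"mults": (n // 2) * stages, "adds": n * stages}


if __name__ == "__main__":
    for n in (8, 16, 64):
        print(n, dft_op_counts(n), fft_op_counts(n))
```

For N = 8 the direct DFT figures are 64 and 56, matching the counts quoted above, while the radix-2 FFT needs 12 multiplications and 24 additions under these formulas.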
The Fast Fourier Transform is done using the Cooley-Tukey [6] algorithm which is
Basically, the Fast Fourier Transform can be done using a radix algorithm of type radix-r,
where r can be any integer, and the N-point FFT can be calculated using different radices
such as radix-2, radix-4, radix-8 and so on. The FFT can be implemented by using two different
To increase the speed, a pipeline architecture [6] [7] is used in the computation of the FFT,
and in particular the Multi-path Delay Commutator (MDC) [6] [8] [9] [10] architecture is used in
communication systems. In the pipeline architecture we also use a butterfly element, and the
butterfly can be realized using different radices such as radix-2 and radix-4, and in this the elements
The implementation of the FFT using the DIT FFT for an 8-point sequence is done in
Verilog and synthesized in Quartus-II [14], and the pipeline architecture for 64 points is done
Radix:
Here, radix means the number of elements taken in at a time and processed by the
butterfly. For radix-2, the butterfly takes 2 input elements, performs the addition and
multiplication operations, and produces the output. For radix-4, there are 4 input elements
and 4 output elements at a time.
Verilog:
Hardware description languages are different from software programming languages,
and the most widely used hardware description languages are as follows:
1) VHDL
2) Verilog HDL
Quartus-II
The synthesis of the designed code is done using Quartus-II. Before synthesis, the design
is first simulated in ModelSim; synthesis and implementation are then carried out, followed
by placement and pin allocation, and the timing analysis will be
After the synthesis we get different views of the designed circuit. They are:
1) RTL view
The program can also be downloaded to the hardware by using Quartus-II
In the first step, coding can be done using different hardware description languages.
Here we use Verilog, but VHDL can also be used, and the synthesis and implementation can
also be done using Xilinx software.
Fig. 1.1. Quartus-II workflow: Design (Verilog coding), Functional Simulation using ModelSim, Synthesis and Implementation, Simulation, Programming and Configuration
The thesis is divided into chapters and subsections:
1) Chapter II: deals with the theoretical description of different types of FFT algorithms
a) Cooley-Tukey Method
b) DIT FFT
c) DIF FFT
a) Simulation Results
b) RTL view
4) Chapter V: Conclusion
CHAPTER 2
2. LITERATURE REVIEW:
This paper presents a novel runtime-reconfigurable, mixed-radix core for computation of 2-, 3- and 4-point fast
Fourier transforms (FFT). The proposed architecture is based on the radix-3 Winograd Fourier transform; its
complexity is equal to that of a multiplierless 3-point FFT in terms of adders/subtractors, with the exception
of a few additional multiplexers. The proposed architecture supports all FFT sizes that can be factorized into
2-, 3- and 4-point transforms. It is also explained that the accuracy of the proposed architecture is not
affected by constant multiplication.
2.3. Power Efficient Radix-2 DIT FFT using Folding Technique and DKG
Reversible Gate:
The FFT is commonly used in digital signal processing algorithms. 4G communication and other wireless
communication systems are currently hot topics of research and development in the wireless communication
and networking field. The FFT is an algorithm that speeds up the calculation of the DFT. In the first
stage, a low-complexity Radix-2 Multi-path Delay Commutator (R2MDC) FFT frequency transformation
technique is developed in a Very Large Scale Integration (VLSI) system design environment. Low power
consumption, small area and high speed are the main VLSI parameters. The conventional R2MDC FFT structure
has higher hardware complexity because of its intensive computational elements. Two methods are used to
design the radix-2 FFT algorithm. The first method designs the radix-2 FFT with the help of the reversible
Peres gate and TR gate. The second method designs the radix-2 FFT with the help of the reversible DKG gate.
All the structures are implemented on the Virtex-4 device family using Xilinx software and compared with the
previous algorithm.
CHAPTER 3
Cooley-Tukey Method:
This method is the most widely used in the computation of the FFT. In this the DFT of N point is
N=N1*N2
point.
Of N1 and N2, one is small compared with the other. If N1 is the radix, the FFT can
be done by using Decimation in Time FFT, and if N2 is the radix, the FFT can be done by using
The operation is done recursively using radix-2 DFTs. In the radix-2 DIT, the odd transform is
multiplied by the phase factor, called the twiddle factor, after which the addition and
subtraction operations are performed; the butterfly of the even and odd transforms is
The Fast Fourier Transform can be done by using two different methods [4]:
This is done by dividing into a number of stages, and the number of stages can be calculated as:
v = log2(N)
An N-point DFT with even N is calculated with two (N/2)-point DFTs. Each (N/2)-point DFT is
again computed using (N/4)-point DFTs, and so on until only 2-point DFTs remain.
Basically the Fast Fourier Transform can be done by using the butterfly structure and the operation
Of the two forms, one is used in the DIT FFT and the other is used in the DIF FFT.
Here a and b are the input samples for the butterfly, and W_N^r and W_N^{r+N/2} are the twiddle
factors.
Twiddle Factor:
The twiddle factor is a complex root of unity used in the butterfly operation to compute the discrete
Fourier transform, W_N^r = e^{-j2πr/N}.
The butterfly requires two complex multiplications and two complex additions. We can reduce the number of
complex multiplications by using the symmetry property.
The symmetry property is
W_N^{r+N/2} = W_N^r · W_N^{N/2} = -W_N^r
Since W_N^{N/2} is equal to -1, the twiddle factor W_N^{r+N/2} becomes equal to -W_N^r.
From this the butterfly can be modified as shown below:
This requires only 1 complex multiplication and 2 complex additions.
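As an illustration of the modified butterfly, the sketch below computes both outputs with a single complex multiplication. It assumes the usual convention W_N^r = exp(-j2πr/N); the function names are hypothetical and chosen only for this example.

```python
import cmath


def twiddle(r, n):
    """Twiddle factor W_N^r = exp(-j*2*pi*r/N) (assumed sign convention)."""
    return cmath.exp(-2j * cmath.pi * r / n)


def dit_butterfly(a, b, r, n):
    """Modified DIT butterfly: one complex multiplication, two additions.

    Uses the symmetry W_N^(r + N/2) = -W_N^r, so the product W_N^r * b
    is computed once and then added to and subtracted from a.
    """
    t = twiddle(r, n) * b      # the only complex multiplication
    return a + t, a - t


if __name__ == "__main__":
    print(dit_butterfly(1 + 0j, 2 + 0j, 0, 2))
```

The unmodified form would compute a + W_N^r * b and a + W_N^{r+N/2} * b separately, which costs two multiplications for the same pair of outputs.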
Butterfly used in DIF FFT [1]:
In the DIT FFT the input is given in bit-reversed order and the output is in normal order.
In the DIF FFT the input is in normal order and the output is in bit-reversed order.
The bit-reversed order is generated by exchanging the first and last bits of the index, then
the second bit with the second-to-last bit, and so on.
To obtain the bit-reversed order, consider an 8-point input; the bit reversal order is as shown below:
Original sample    Binary Representation    Bit reversal Order
X(0)               X(000)                   X(000) = X(0)
X(1)               X(001)                   X(100) = X(4)
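The bit-reversal rule described above can be sketched in a few lines. This is only an illustration with hypothetical helper names, assuming the sequence length is a power of two.

```python
def bit_reverse(index, num_bits):
    """Reverse the lowest num_bits bits of index."""
    result = 0
    for _ in range(num_bits):
        result = (result << 1) | (index & 1)   # shift in the current LSB
        index >>= 1
    return result


def bit_reversed_order(n):
    """Bit-reversed index order for an n-point sequence (n a power of two)."""
    num_bits = n.bit_length() - 1
    return [bit_reverse(i, num_bits) for i in range(n)]


if __name__ == "__main__":
    print(bit_reversed_order(8))
```

For an 8-point input this gives the order 0, 4, 2, 6, 1, 5, 3, 7, which agrees with the two table rows shown above (X(0) stays at X(0) and X(1) maps to X(4)).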
In this algorithm x(n) is broken down into smaller subsequences, and the
principle of the decimation-in-time FFT can be explained by considering the number of input points as
N = 2^r
x(n) is broken down into two parts, one containing only the even-indexed samples and the other the
odd-indexed samples.
The frequency domain can be obtained from the time domain by using the below formula:
And breaking the signal into two subsequences leads to the frequency domain as
represented below:
Here n is replaced by 2r, where r varies from 0 to (N/2)-1, and the above equation can be
By the symmetry property we can break up the twiddle factor, and the frequency domain is the sum of
the even sequence and the odd sequence multiplied by W_N^k.
Dividing the input sequence into odd and even parts can be done by giving the input in bit reversal
Again each N/2-point DFT is divided into two N/4-point DFTs, and so on; the process continues
down to the 2-point DFT.
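The even/odd splitting described above can be sketched as a recursive radix-2 decimation-in-time FFT. This is a minimal software model written for illustration, not the Verilog design of the thesis; it assumes the input length is a power of two and stops at the 2-point DFT.

```python
import cmath


def dit_fft(x):
    """Recursive radix-2 DIT FFT; len(x) must be a power of two, at least 2."""
    n = len(x)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two and at least 2")
    if n == 2:
        # 2-point DFT: a single butterfly with twiddle factor 1
        return [x[0] + x[1], x[0] - x[1]]
    even = dit_fft(x[0::2])   # N/2-point DFT of the even-indexed samples
    odd = dit_fft(x[1::2])    # N/2-point DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # W_N^k times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


if __name__ == "__main__":
    print(dit_fft([1, 2, 3, 4, 5, 6, 7, 8]))
```

The slicing x[0::2] and x[1::2] plays the role of the bit-reversed input ordering: applied recursively, it delivers the samples to the 2-point butterflies in bit-reversed order.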
1) Radix-2
2) Radix-4
Radix-2 DIT FFT:
8-Point:  0 1 2 3 4 5 6 7
4-Point:  0 2 4 6 | 1 3 5 7
2-Point:  0 4 | 2 6 | 1 5 | 3 7
Radix-4 FFT:
The Radix-4 basic butterfly diagram is as shown below:
DIF FFT [1]:
The DIF FFT takes the input in normal order and produces the output in bit-reversed order.
The first half of the sequence is x(n), where 0 ≤ n ≤ (N/2)-1, and
1) Radix-2
2) Radix-4
Radix-2 DIF FFT:
In this the N-point sequence is divided into two parts, and the two parts are individually divided as
shown below:
The above equation is modified as shown below:
The butterfly structure used in the DIF FFT is different from the butterfly structure used in
the DIT FFT. The basic difference is that in the DIT FFT butterfly the multiplication is done
before the additions, whereas in the DIF FFT butterfly the multiplication is done after the additions.
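To make the DIF ordering concrete, the following sketch implements an iterative radix-2 decimation-in-frequency FFT in which each butterfly adds and subtracts first and then multiplies the difference by the twiddle factor. It is an illustrative software model under the assumption of a power-of-two length; the function name is hypothetical.

```python
import cmath


def dif_fft_bit_reversed(x):
    """Radix-2 DIF FFT: natural-order input, bit-reversed-order output."""
    data = [complex(v) for v in x]
    n = len(data)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two and at least 2")
    span = n // 2                      # distance between butterfly inputs
    while span >= 1:
        for start in range(0, n, 2 * span):
            for j in range(span):
                a = data[start + j]
                b = data[start + j + span]
                w = cmath.exp(-2j * cmath.pi * j / (2 * span))
                data[start + j] = a + b               # addition first
                data[start + j + span] = (a - b) * w  # multiply afterwards
        span //= 2
    return data


if __name__ == "__main__":
    print(dif_fft_bit_reversed([1, 2, 3, 4]))
```

The returned list holds X(k) at the bit-reversed position of k, so a final bit-reversal permutation is needed to read the spectrum in normal order.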
Fig. 3.10. 8-Point Radix-2 DIF FFT
Radix-4 FFT:
In this the FFT length is given by 4^v, where v is the number of stages, and the
Pipelined R2MDC:
For the hardware architecture of the FFT there are three different types of architectures. They are:
Of all these, the pipeline architecture [8] is very attractive in the multimedia
To reduce the complex multiplications further, we propose a pipeline architecture which
provides low latency, low power consumption and high throughput, and occupies
less area.
The MDC architecture handles multiple input data streams because of its high
throughput, and the hardware utilization of the MDC is low.
Single-path Delay Feedback (SDF) is the best solution for a single input data stream and is used when
the memory requirement is low. In the SDC architecture [18] the usage of adders is very low, but
the memory requirement is higher and the output is in reversed order, and we need to get
In this the input data stream is divided into two parallel data streams and the processing
is done using delay elements, butterfly elements and processing elements. In MDC
architectures the utilization of the resources depends on the radix used.
If we use radix-r, the utilization of the resources is 1/r, where r can be any integer;
if we use radix-2 for the FFT computation, the utilization of the resources is 50%.
If we use radix-2 it is called the R2MDC pipeline architecture, and the architecture using
Fig. 3.13. R2MDC pipeline architecture of 8-point (input streams X3 X2 X1 X0 and X7 X6 X5 X4; blocks labelled R, REG, S, BF, -j and Twiddle Factor)
The pipeline architecture can be implemented using different radices; with radix-4 it is
called R4MDC. By using the pipelined R2MDC architecture, the number of complex multipliers is
reduced compared with the normal DIT FFT and DIF FFT.
CHAPTER 4
One of the most crucial areas of research in VLSI system design is designing circuits that work faster
by performing quick calculations while remaining efficient. Numerous combinations can be designed
to optimize parameters such as power, area and delay. In conventional digital
adders, the sum and carry are generated after a normalized delay. When an arithmetic operation is
performed on larger bit sizes this delay grows, hence fast adders such as the carry select adder (CSLA)
and CSA are used to minimize the delay. The structure of the ripple carry adder (RCA) is simple and
requires fewer gates than other adders, but its computational speed is significantly affected.
The problem of carry propagation delay in conventional adders can be avoided by using a
carry select adder structure, which independently produces more than one carry and then selects a
carry in order to get the sum. However, it is understood from the structure of the CSLA
that it is not efficient in terms of area, as it uses more than one pair of ripple carry adders (RCAs)
to generate the partial sum and carry by considering the carry input to the RCAs as Cin=0 and
Cin=1; the final sum and carry are then selected by a multiplexer. Since pairs of RCAs are used in each
stage of the regular CSLA, it occupies a larger area and consumes more power. Hence the
idea of the work that has been put forward is to make use of n-bit binary to excess-1 code
converters (BECs) to enhance the speed of the operation. The BEC can be used to replace the RCA
with Cin=1 to improve speed and delay further. It can then be observed that the power-delay
product and area-delay product also decrease. One of the major
advantages of using a BEC is its structure. The BEC uses far fewer logic gates than
the full adders in the RCA, so there is a significant decrease in area when the RCA is replaced
with a BEC and in turn lower power consumption to drive the logic block [1]. Arithmetic units are
often the workhorse of any computational circuit, and addition is the backbone of any such logic
block, so a faster means of performing this operation is essential. CSLAs are one of
the best options for this, but the excessive area overhead makes them unattractive. Several
techniques have been devised to overcome this problem. One of them is the introduction of an add-one
circuit: BECs are used instead of RCAs in the CSLA structure, which reduces area
without compromising delay even as the input length grows [2]. VLSI hardware
implementation is an area where different domains such as image processing, neural networks and
so on are integrated. The CSLA architecture with BEC is used extensively in high-speed VLSI
hardware implementation, for the simple reason that the number of gates is
reduced and in turn an area reduction is achieved by replacing the RCA with a BEC in the existing
structure. It is noteworthy that this architecture is also used as an alternative adder
to achieve a trade-off between cost and performance in any carry propagation adder design,
because it has the edge of logarithmic gate depth, as is the case with any other structure of
the distant-carry adder family. The higher power consumption due to the amount of circuitry
hinders the direct use of the architecture. Gate depth is as important as gate output load from
a power perspective in today's deep sub-micron technology. Hence the need to reduce the
transistor count and to simplify the layout is justified. Area and power reduction
follow in turn from the reduction in the number of transistors. The SQRT CSLA
is found to have outperformed the rest of the CSLAs with fewer transistors and the least PDP and ADP
[4]. All these developments have contributed immensely to low power applications, versatile
The area and delay are evaluated in terms of AND, OR and NOT gates, each of
which has an area of 1 unit and a delay of 1 unit. For a particular logic block, the
number of gates employed in it determines the total area of the block, and the gates along the
longest path of the block are added up to give the maximum delay. Based
on this approach, the basic blocks of the CSLA, namely the 2:1 multiplexer (Mux),
half adder (HA) and full adder (FA), are evaluated and listed in Table I. As mentioned
above, the prime idea is to use a block that is better than the RCA with Cin=1 in order to
reduce the area and power of the regular carry select adder. The n-bit ripple carry adder
is replaced by an (n+1)-bit BEC. The architecture and working of the BEC are explained using Fig. 1
When the ripple carry adder is replaced with a BEC, the reduction in area and power can be observed
for larger CSLA structures. The Boolean representation of the 4-bit BEC
is given below:
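Since the Boolean expressions themselves are not reproduced here, the sketch below uses the commonly cited form of the 4-bit BEC (X0 = NOT B0, X1 = B0 XOR B1, X2 = B2 XOR (B0 AND B1), X3 = B3 XOR (B0 AND B1 AND B2)). Treat these expressions and the helper names as assumptions made for illustration; the check simply confirms that they add one to the 4-bit input.

```python
def bec4(bits):
    """4-bit binary to excess-1 converter; bits = [b0, b1, b2, b3], LSB first."""
    b0, b1, b2, b3 = bits
    x0 = b0 ^ 1                  # NOT b0
    x1 = b0 ^ b1
    x2 = b2 ^ (b0 & b1)
    x3 = b3 ^ (b0 & b1 & b2)
    return [x0, x1, x2, x3]


def to_bits(value):
    """Integer (0..15) to a 4-element bit list, LSB first."""
    return [(value >> i) & 1 for i in range(4)]


def from_bits(bits):
    """Bit list (LSB first) back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


if __name__ == "__main__":
    for value in range(16):
        print(value, from_bits(bec4(to_bits(value))))
```

In a CSLA group, the BEC output stands in for the result of the RCA with Cin=1, and a multiplexer chooses between the plain RCA output and the BEC output according to the incoming carry.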
4.2. REGULAR 16-B SQRT CSLA EVALUATION-AREA AND DELAY
The fundamental structure of the 16-b regular SQRT CSLA is shown in Fig. 2 [1]. The structure
consists of five modules or groups of ripple carry adders of various sizes. Each module is shown in
Fig. 3, 4 and 5.
Here a 4-bit input is provided to the BEC, which results in an (n+1)-bit output. When a 4-b input
TABLE II
Function Table of 4-b BEC
The evaluation of area and delay pertaining to each module is shown below. The numerals
enclosed within [ ] denote the delay. The evaluation is done as follows:
1) Fig. 3 has two sets of 2-bit RCAs. Considering the delay values in the above table, the
arrival time of the carry c1 (t=7) at the 6:3 Mux is earlier than that of the sum S3 from the full
adder and later than that of S2 (t=6). The outputs from the RCAs, that is C3, S3 and S2, are given
as inputs to the mux. The mux adds a delay of 3, which gives C3 (t=10),
Sum3 (t=11) and Sum2 (t=10); this delay is caused by the operation time of the mux. Where
2) Except for group 2, the arrival time of the multiplexer selection input is later than the arrival
time of the data outputs from the RCAs for every group shown in Fig. 4, 5 and 6. Because of the
delay introduced by the multiplexer selection in each stage, there is a slight overall increase in
delay. The computation of the RCAs in each stage of the CSLA happens in parallel and
mitigates the problem of carry propagation delay by independently producing more than
one carry and then choosing a carry to generate the sum. Thus, the delay caused from group3
3) Based on the area count and delay count from the above tables, the maximum
area and delay of each module of the regular SQRT CSLA are evaluated and listed in Table III.
Fig. 4.3. Group 2
Fig. 4.4. Group 3
TABLE III
TOTAL 59 408
CHAPTER 5
PROPOSED METHODOLOGY
The combined R2B, R4B and R8B based SDF FFT has been designed in this proposed work. The
combined-radix FFT design has a shorter computational path and also improves the performance
of the FFT processor. In the SDF design, the input data sequences pass through one single path.
The butterfly processing element performs the computation on the data. The addition and
subtraction operations are done in the butterfly elements. The modified carry select adder
(MCSLA) circuit is used for the adder operation in this architecture. This adder structure is
very efficient in this design. The structure of the combined R2B, R4B and R8B FFT is shown in
Fig. 5.1, which shows the 16-point SDF architecture. The procedural flow of this architecture is
as follows: first, the input data with indices 0 to 7 is stored in the shift
register. The R2B elements operate on this data and the remaining input data with
indices 8 to 15.
Fig. 5.1. Proposed Architecture of pipelined 16-point Combined Radix FFT using SDF and
MCSLA
The resulting data from the butterfly addition operation is passed to the next stage, and the
subtraction results are fed back to the shift register. After that, the 8-point addition data is
passed to the next stage, which is computed using the radix-8 butterfly element, and the
subtraction data from the registers is passed to the next stage, which is computed by an R2B with the
butterfly-processing twiddle factor coefficient. The following stages are computed using R2B and R4B FFTs.
In the conventional R2SDF FFT, inputs are fed in sequentially, and the four inputs
are processed with the help of a single butterfly (processing element) unit.
However, this design has higher hardware utilization and power consumption because it uses or
stores many unnecessary intermediate signals. To overcome this problem, the
R2B, R4B and R8B based SDF FFT designs are combined to reduce the hardware
utilization of the processor. The proposed method significantly reduces area, delay and
power consumption. The combined R2B, R4B and R8B FFT is proposed in this architecture to
reduce the number of computational stages. For example, for a 64-point FFT, the R2B FFT has 6 stages to
compute the FFT output and the R4B FFT has only 3 stages, while the R8B FFT has
only 2 stages. So the R2B, R4B and R8B are combined to improve the performance of the architecture. In
the proposed method, the 16-point FFT is divided into two halves; for the first 8 points
the output is obtained directly using the R8B FFT. The number of stages is reduced and
the processing time is also reduced. The next 8 points use the R4B and R2B FFTs.
In the conventional 16-point R2B FFT, 15 stages of R2B FFT are used. In the proposed combined
R2B, R4B and R8B FFT, only 5 stages of R2B FFT are used. Compared with the
conventional R2B FFT, the combined R2B, R4B and R8B FFT has a shorter computational path than the
existing method.
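The 64-point stage counts quoted above follow from the fact that a single-radix FFT of length N = r^v needs v stages. A minimal sketch, with an illustrative function name, is:

```python
def fft_stages(n_points, radix):
    """Number of stages of a single-radix FFT; n_points must be a power of radix."""
    if radix < 2 or n_points < radix:
        raise ValueError("radix must be >= 2 and n_points >= radix")
    stages = 0
    remaining = n_points
    while remaining > 1:
        if remaining % radix:
            raise ValueError("n_points is not a power of the radix")
        remaining //= radix
        stages += 1
    return stages


if __name__ == "__main__":
    for radix in (2, 4, 8):
        print(radix, fft_stages(64, radix))
```

For 64 points this gives 6, 3 and 2 stages for radix-2, radix-4 and radix-8 respectively. A 16-point transform is not a power of 8, which is one reason a mixed-radix combination is needed at that size.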
5.1. SDF:
The input data sequences are fed through a single-path delay feedback.
The butterfly processing element performs the calculations on the data. The delay units are
used more efficiently by sharing the same storage between the input and output of the butterfly
unit. The butterfly units and multipliers are utilized only 50% because they are bypassed half the time.
The SDF FFT is a pipelined frequency transformation technique. Its structure resembles
"stream-like" processing of a block-based algorithm. The representation of sequential data flow
in the SDF FFT is shown in Fig. 6. It has a single butterfly processor for performing the signed
addition and signed subtraction functions. Single-path delay elements are used in the feedback structure.
5.2. MCSLA:
The MCSLA consists of reduced full adders (RFAs). The RFA, which is built around a multiplexer, is shown.
Fig. 5.2. RFA Circuit.
The RFA circuit has been designed using a minimal number of logic gates. Also, a multiplexer
(MUX) based RFA circuit has been designed in this work to further improve the performance of
Gate Count of Reduced Full Adder = Gate Count [(2*AND) + (1*OR) + (2* NOT) + (1*MUX)]
Gate Count of Reduced Full Adder = [(2*1) + (1*1) + (2*1) + (1*4)] = 2+1+2+4 = 9
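The gate-count arithmetic above can be reproduced with a small sketch. The per-gate areas are taken from the expression itself (AND, OR and NOT count as 1 and the MUX as 4); the dictionary and function names are hypothetical.

```python
# Area units per gate type, as used in the gate-count expression above.
GATE_AREA = {"AND": 1, "OR": 1, "NOT": 1, "MUX": 4}

# Gate usage of the reduced full adder (RFA) according to the expression above.
RFA_GATES = {"AND": 2, "OR": 1, "NOT": 2, "MUX": 1}


def gate_count(gates, area=GATE_AREA):
    """Total area of a block as the sum of (gate count * area per gate)."""
    return sum(area[name] * count for name, count in gates.items())


if __name__ == "__main__":
    print(gate_count(RFA_GATES))
```

With these values the total is 2 + 1 + 2 + 4 = 9, matching the figure given for the reduced full adder.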
In the MCSLA, the full adder circuit is greatly reduced: the gate count of the FA circuit is
lowered. The RFA circuit is incorporated into the CSLA circuit, which is then called the MCSLA.
In the MCSLA the area consumption is lower than in the CSLA.
CHAPTER 6
XILINX Software
Xilinx Tools is a suite of software tools used for the design of digital circuits implemented
using Xilinx Field Programmable Gate Array (FPGA) or Complex Programmable Logic
Device (CPLD). The design procedure consists of (a) design entry, (b) synthesis and
implementation of the design, (c) functional simulation and (d) testing and verification. Digital
designs can be entered in various ways using the above CAD tools: using a schematic entry tool,
using a hardware description language (HDL) – Verilog or VHDL or a combination of both. In
this lab we will only use the design flow that involves the use of Verilog HDL.
The CAD tools enable you to design combinational and sequential circuits starting with Verilog
HDL design specifications. The steps of this design procedure are listed below:
A Verilog input file in the Xilinx software environment consists of the following segments:
All your designs for this lab must be specified in the above Verilog input format. Note that the
state diagram segment does not exist for combinational logic designs.
In this lab digital designs will be implemented on the Basys2 board, which has a Xilinx Spartan-3E
XC3S250E FPGA in a CP132 package. This FPGA part belongs to the Spartan family of
FPGAs. These devices come in a variety of packages. We will be using devices that are
packaged in a 132-pin package with the following part number: XC3S250E-CP132. This FPGA is
a device with about 250K gates. Detailed information on this device is available at the Xilinx
website.
3. Creating a New Project
Xilinx Tools can be started by clicking on the Project Navigator Icon on the Windows desktop.
This should open up the Project Navigator window on your screen. This window shows (see
Figure 1) the last accessed project.
Figure 1: Xilinx Project Navigator window (snapshot from Xilinx ISE software)
Select File->New Project to create a new project. This will bring up a new project window
(Figure 2) on the desktop. Fill up the necessary entries as follows:
Figure 2: New Project Initiation window (snapshot from Xilinx ISE software)
Example: If the project name were “o_gate”, enter “o_gate” as the project name and then click
“Next”.
Figure 3: Device and Design Flow of Project (snapshot from Xilinx ISE software)
For each of the properties given below, click on the ‘value’ area and select from the list of
values that appear.
o Device Family: Family of the FPGA/CPLD used. In this laboratory we will be
using the Spartan3E FPGAs.
o Device: The number of the actual device. For this lab you may enter XC3S250E
(this can be found on the attached prototyping board)
o Package: The type of package with the number of pins. The Spartan FPGA used in this lab
is packaged in CP132 package.
o Speed Grade: The Speed grade is “-4”.
o Synthesis Tool: XST [VHDL/Verilog]
o Simulator: The tool used to simulate and verify the functionality of the design.
Modelsim simulator is integrated in the Xilinx ISE. Hence choose “Modelsim-XE
Verilog” as the simulator or even Xilinx ISE Simulator can be used.
o Then click on NEXT to save the entries.
All project files such as schematics, netlists, Verilog files, VHDL files, etc., will be stored in a
subdirectory with the project name. A project can only have one top level HDL source file (or
schematic). Modules can be added to the project to create a modular, hierarchical design (see
Section 9).
In order to open an existing project in Xilinx Tools, select File->Open Project to show the list
of projects on the machine. Choose the project you want and click OK.
Figure 4: Create New source window (snapshot from Xilinx ISE software)
In this lab we will enter a design using a structural or RTL description using the Verilog HDL.
You can create a Verilog HDL input file (.v file) using the HDL Editor available in the Xilinx
ISE Tools (or any text editor).
A window pops up as shown in Figure 4. (Note: “Add to project” option is selected by default. If
you do not select it then you will have to add the new source file to the project manually.)
Figure 5: Creating Verilog-HDL source file (snapshot from Xilinx ISE software)
Select Verilog Module and in the “File Name:” area, enter the name of the Verilog source file
you are going to create. Also make sure that the option Add to project is selected so that the
source need not be added to the project again. Then click on Next to accept the entries. This pops
up the following window (Figure 5).
Figure 6: Define Verilog Source window (snapshot from Xilinx ISE software)
In the Port Name column, enter the names of all input and output pins and specify the Direction
accordingly. A Vector/Bus can be defined by entering appropriate bit numbers in the MSB/LSB
columns. Then click on Next> to get a window showing all the new source information (Figure 6). If
any changes are to be made, just click on <Back to go back and make changes. If everything is
acceptable, click on Finish > Next > Next > Finish to continue.
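The MSB/LSB columns correspond to a vector range in the port declaration of the Verilog source. A minimal sketch, assuming a hypothetical module named bus_or with 4-bit ports (MSB = 3, LSB = 0); the module name and the bus width are illustrative assumptions and are not part of this project:

```verilog
// Hypothetical example: the module name and the 4-bit width are assumptions.
module bus_or (
    input  wire [3:0] a,   // Vector/Bus with MSB = 3, LSB = 0
    input  wire [3:0] b,
    output wire [3:0] z
);
    // Bitwise OR of the two buses, one OR gate per bit position
    assign z = a | b;
endmodule
```

A scalar port simply leaves the MSB/LSB columns empty, in which case no range appears in the declaration.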
Figure 7: New Project Information window (snapshot from Xilinx ISE software)
Once you click on Finish, the source file will be displayed in the sources window in the
Project Navigator (Figure 1).
If a source has to be removed, just right click on the source file in the Sources in Project
window in the Project Navigator and select Remove. Then select Project -> Delete
Implementation Data from the Project Navigator menu bar to remove any related files.
The source file will now be displayed in the Project Navigator window (Figure 8). The source
file window can be used as a text editor to make any necessary changes to the source file. All
the input/output pins will be displayed. Save your Verilog program periodically by selecting
File->Save from the menu. You can also edit Verilog programs in any text editor and add them to the
project directory using “Add Copy Source”.
Figure 8: Verilog Source code editor window in the Project Navigator (from Xilinx ISE
software)
A brief Verilog tutorial is available in Appendix-A; refer to it for the language syntax and
the construction of logic equations.
The Verilog source code template generated shows the module name, the list of ports and
also the declarations (input/output) for each port. Combinational logic code can be added
to the Verilog code after the declarations and before the endmodule line.
For example, an output z in an OR gate with inputs a and b can be described as,
assign z = a | b;
Remember that the names are case sensitive.
A given logic function can be modeled in many ways in Verilog. Here is another example
in which the logic function is implemented as a truth table using a case statement:
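module or_gate(a, b, z);
  input a;
  input b;
  output z;
  reg z;

  // Truth table of the 2-input OR gate. The case items are sized binary
  // literals so that each one matches the 2-bit value {a,b}.
  always @(a or b) begin
    case ({a, b})
      2'b00: z = 1'b0;
      2'b01: z = 1'b1;
      2'b10: z = 1'b1;
      2'b11: z = 1'b1;
    endcase
  end
endmodule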
Suppose we want to describe an OR gate. It can be done using the logic equation as shown in
Figure 9 or using the case statement (describing the truth table) as shown in Figure 10. These
are just two example constructs to design a logic function. Verilog offers numerous such
constructs to efficiently model designs. A brief tutorial of Verilog is available in Appendix-A.
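As one more illustration of these constructs, the same OR function can be described structurally with a built-in gate primitive. This is only a sketch; the module name or_gate_prim and the instance name g1 are assumptions introduced here for illustration:

```verilog
// Hypothetical module name; same OR function as the assign and case versions.
module or_gate_prim (
    input  wire a,
    input  wire b,
    output wire z
);
    // Built-in gate primitive: the output is listed first, then the inputs
    or g1 (z, a, b);
endmodule
```

All three descriptions (assign, case and primitive) express the same logic function, so the choice between them is mostly a matter of readability.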
Figure 9: OR gate description using assign statement (snapshot from Xilinx ISE
software)
Figure 10: OR gate description using case statement (from Xilinx ISE software)
The design has to be synthesized and implemented before it can be checked for correctness by running
functional simulation, or downloaded onto the prototyping board. With the top-level Verilog file opened
in the HDL editor window in the right half of the Project Navigator (double-click the file to open it),
and the project in the Module view, the Implement Design option can be seen in the Process view. The
Design Entry Utilities and Generate Programming File options can also be seen in the Process view. The
former can be used to include user constraints, if any, and the latter will be discussed later.
To synthesize the design, double click on the Synthesize Design option in the Processes window.
To implement the design, double click the Implement Design option in the Processes window. It will
go through steps like Translate, Map and Place & Route. If any of these steps could not be done or
was done with errors, an X mark is placed in front of it; otherwise a tick mark is placed after each
of them to indicate successful completion. If everything is done successfully, a tick mark is
placed before the Implement Design option. If there are warnings, a warning mark appears in front of
the option. The warnings or errors can be viewed in the Console window at the bottom of the Navigator
window. Every time the design file is saved, all these marks disappear, asking for a fresh compilation.
Figure 11: Implementing the Design (snapshot from Xilinx ISE software)
The schematic diagram of the synthesized Verilog code can be viewed by double clicking View RTL
Schematic under the Synthesize-XST menu in the Process window. This is a handy way to debug
the code if the output does not meet the specifications on the prototype board.
Double clicking it opens the top-level module showing only the input(s) and output(s), as shown below.
Double clicking the rectangle opens the realized internal logic, as shown
below.
Figure 13: Realized logic by the Xilinx ISE for the Verilog code
To check the functionality of a design, we have to apply test vectors and simulate the circuit. In order
to apply test vectors, a test bench file is written. Essentially it will supply all the inputs to the module
designed and will check the outputs of the module. Example: For the 2-input OR gate, the steps to
generate the test bench are as follows:
In the Sources window (top left corner), right click on the file that you want to generate the test
bench for and select ‘New Source’.
Provide a name for the test bench in the file name text box and select ‘Verilog test fixture’
among the file types in the list on the right side, as shown in Figure 14.
Figure 14: Adding test vectors to the design (snapshot from Xilinx ISE software)
Click on ‘Next’ to proceed. In the next window select the source file with which you want to
associate the test bench.
Figure 15: Associating a module to a testbench (snapshot from Xilinx ISE software)
Click on Next to proceed. In the next window click on Finish. You will now be provided with a
template for your test bench. If it does not open automatically, click the radio button next to
Simulation.
You should now be able to view your test bench template. The code generated would be something like
this:
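module o_gate_tb_v;
  // Inputs
  reg a;
  reg b;
  // Outputs
  wire z;
  // Instantiate the Unit Under Test (UUT)
  o_gate uut (
    .a(a),
    .b(b),
    .z(z)
  );
  initial begin
    // Initialize Inputs
    a = 0;
    b = 0;
    // Wait 100 ns for global reset to finish
    #100;
    // Add stimulus here
  end
endmodule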
The Xilinx tool detects the inputs and outputs of the module that you are going to test and assigns them
initial values. In order to test the gate completely we shall provide all the different input combinations.
‘#100’ is the time delay for which the input has to maintain the current value. After 100 units of time have
elapsed, the next set of values can be assigned to the inputs.
Complete the test bench as shown below:
module o_gate_tb_v;
  // Inputs
  reg a;
  reg b;
  // Outputs
  wire z;
  // Instantiate the Unit Under Test (UUT)
  o_gate uut (
    .a(a),
    .b(b),
    .z(z)
  );
  initial begin
    // Initialize Inputs
    a = 0;
    b = 0;
    // Wait 100 ns for global reset to finish
    #100;
    // Apply the remaining input combinations, holding each for 100 time units
    a = 0;
    b = 1;
    #100;
    a = 1;
    b = 0;
    #100;
    a = 1;
    b = 1;
    #100;
  end
endmodule
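The test bench above only applies the input combinations; the output z still has to be inspected by eye in the waveform window. A minimal sketch of a self-checking variant follows. It assumes that o_gate is the 2-input OR gate with ports a, b and z described earlier (a stand-in definition is included so the example is self-contained); the test bench name o_gate_selfcheck_tb, the loop and the messages are illustrative additions and are not generated by the Xilinx tool:

```verilog
`timescale 1ns / 1ps

// Assumed unit under test: stand-in for the 2-input OR gate described earlier.
module o_gate (input wire a, input wire b, output wire z);
    assign z = a | b;
endmodule

// Hypothetical self-checking test bench (the name is an assumption).
module o_gate_selfcheck_tb;
    reg  a, b;
    wire z;
    reg  [2:0] i;      // loop counter, wide enough to reach 4
    integer errors;

    o_gate uut (.a(a), .b(b), .z(z));

    initial begin
        errors = 0;
        // Walk through all four input combinations
        for (i = 0; i < 4; i = i + 1) begin
            {a, b} = i[1:0];
            #100;      // hold each combination for 100 time units
            if (z !== (a | b)) begin
                errors = errors + 1;
                $display("Mismatch: a=%b b=%b z=%b", a, b, z);
            end
        end
        if (errors == 0)
            $display("All input combinations passed");
        $finish;
    end
endmodule
```

With this approach the simulator console reports a mismatch directly, so the waveform is only needed when a check fails.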
Now, under the Processes window (making sure that the test bench file in the Sources window is
selected), expand the ModelSim Simulator tab by clicking on the plus sign next to it. Double click on
Simulate Behavioral Model. You will probably receive a compiler error. This is nothing to worry
about; answer “No” when asked if you wish to abort simulation. This should cause ModelSim to open. Wait for
it to complete execution. If you wish to not receive the compiler error, right click on Simulate Behavioral
Model and select Process Properties. Mark the
checkbox next to “Ignore Pre-Compiled Library Warning Check”.
Figure 16: Simulating the design (snapshot from Xilinx ISE software)
To save the simulation results, go to the waveform window of the ModelSim simulator, click on
File -> Print to Postscript, and give the desired filename and location.
Alternatively, a normal print screen option can be used on the waveform window and the capture
subsequently stored in Paint.
Figure 18: Changing Waveform Background in ModelSim
CHAPTER 7
RESULTS AND SIMULATION
CHAPTER 8
CONCLUSION
A combined R2B, R4B and R8B based SDF FFT has been designed in this work. The radix-2
SDF FFT has higher hardware usage and a larger number of computational stages. To overcome this,
a combined R2B, R4B and R8B butterfly structure based SDF FFT was developed in this work. The
proposed design offers a 53.38% reduction in slices, a 56.27% reduction in LUTs, a 32.35% reduction
in delay and a 27.36% reduction in power consumption compared with the conventional method. The
proposed method also needs fewer computational stages and gives better performance than the
conventional one.
REFERENCES
Fahad Qureshi, Muazam Ali, and Jarmo Takala, “Multiplier-less Reconfigurable Processing Element
for Mixed Radix-2/3/4/5 FFTs”, International Conference on 2017 IEEE.
Arathi Ajay and Dr. R. Mary Lourde, “VLSI Implementation of an improved multiplier for FFT
Computation in Biomedical Applications”, Computer Society Annual Symposium on VLSI, PP.
68-73, IEEE 2015.
V. Arunachalam and Alex Noel Joseph Raj, “Efficient VLSI implementation of FFT for orthogonal
frequency division multiplexing applications”, Published in IET Circuits, Devices & Systems, Vol.
8, No. 06, PP. 526-531, IET 2014.
Jienan Chen, Jianhao Hu and Shuyang Lee, “Hardware Efficient Mixed Radix-25/16/9 FFT for LTE
Systems”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, PP. 01-09, IEEE
2014.
Sidinei Ghissoni, Eduardo A. C. da Costa and Angelo Gonçalves da Luz, “Implementation of Power
Efficient Multicore FFT Data paths by Reordering the Twiddle Factors”, PP. 4562-4568, IEEE
2014.
Naman Govil and Shubhajit Roy Chowdhury, “High Performance and Low Cost Implementation of
Fast Fourier Transform Algorithm based on Hardware Software Co-design”, IEEE Region 10
Symposium, PP. 403-407, IEEE 2014.
Harpreet Singh Dhillon and Abhijit Mitra, “A Reduced-bit Multiplication Algorithm for Digital
Arithmetic”, International Journal of Computational and Mathematical Sciences, February 2008,
pp.6469.
Sumit Vaidya and Depak Dandekar. “Delay-power performance comparison of multipliers in VLSI
circuit design”. International Journal of Computer Networks & Communications (IJCNC), Vol.2,
No.4, July 2010.