Vlsi Architecture For r2b r4b r8b


VLSI Architecture for Combined R2B, R4B and R8B FFT

using SDF and Modified CSLA

CONTENTS:
CHAPTER 1
INTRODUCTION.
CHAPTER 2
LITERATURE SURVEY.
2.1. A HIGH FLEXIBLE LOW LATENCY MEMORY BASED FFT PROCESSOR FOR 4G, WLAN AND FUTURE 5G.
2.2. MULTIPLIERLESS UNIFIED ARCHITECTURE FOR MIXED RADIX-2/3/4 FFTS.
2.3. POWER EFFICIENT RADIX-2 DIT FFT USING FOLDING TECHNIQUE AND DKG REVERSIBLE GATE.
CHAPTER 3
DIFFERENT FFT ALGORITHMS.
CHAPTER 4
MODIFIED CARRY SELECT ADDER FOR POWER AND AREA
REDUCTION.
4.1. BASIC ADDER BLOCK EVALUATION DELAY AND AREA.
4.2. REGULAR 16-B SQRT CSLA EVALUATION AREA AND DELAY.
CHAPTER 5
PROPOSED METHODOLOGY.
5.1. SDF.
5.2. MCSLA.
CHAPTER 6
XILINX SOFTWARE.

CHAPTER 7
RESULTS AND SIMULATION.
CHAPTER 8
CONCLUSION.
REFERENCE.

LIST OF FIGURES:
FIG 1.1. QUARTUS-II WORKFLOW
FIG 3.1. BASIC BUTTERFLY STRUCTURE
FIG 3.2. MODIFIED BUTTERFLY STRUCTURE
FIG 3.3. BUTTERFLY STRUCTURE USED IN THE DIF FFT
FIG 3.4. BASIC DECIMATION IN TIME FFT
FIG 3.5. 8-POINT DIT FFT
FIG 3.6. SEQUENCE OF INPUT IN DIT FFT
FIG 3.7. RADIX-4 DIT FFT
FIG 3.8. INPUT TO OUTPUT SEQUENCE GENERATION OF DIF FFT
FIG 3.9. BUTTERFLY STRUCTURE USED IN DIF FFT
FIG 3.10. 8-POINT RADIX-2 DIF FFT
FIG 3.11. RADIX-4 DIF FFT
FIG 3.12. BASIC PIPELINE ARCHITECTURE
FIG 3.13. R2MDC PIPELINE ARCHITECTURE OF 8-POINT
FIG 4.1. 4-BIT BINARY EXCESS CONVERTER
FIG 4.2. 16-BIT REGULAR CSLA
FIG 4.3. GROUP 2
FIG 4.4. GROUP 3
FIG 4.5. GROUP 4

FIG 5.1. REPRESENTATION OF SEQUENTIAL DATAFLOW IN SDF FFT

FIG. RFA CIRCUIT

FIG 5.2. ARCHITECTURE OF MCSLA STRUCTURE

LIST OF TABLES:
TABLE 3.1. BIT REVERSAL ORDER

ABSTRACT:

The FFT is an efficient way of computing the DFT. When the DFT is computed sequentially in a pipeline, it supports real-time applications with continuous processing, as long as the data is fed continuously through the processor. In this work, radix-2 butterfly (R2B), R4B and R8B elements are combined with the single path delay feedback (SDF) technique and a modified carry select adder (MCSLA) to reduce the number of computational stages and the hardware usage compared with the R2B and R4B FFTs. The implemented SDF technique has a single delay commutator at every stage without exception. N/2 points are controlled sequentially by means of the delay element. The proposed technique uses fewer multipliers, fewer computational stages and fewer butterfly elements than the radix-2 and radix-4 FFTs.

CHAPTER 1

1. INTRODUCTION:

At the start of the twenty-first century, it was difficult to build machines and their components without applying the principles of electronics. Likewise, the communication boom would not have been possible without the progress of the electronics industry and the use of ICs [1]. Integrated circuit design has developed at an exceptional pace, mainly because of advances in VLSI technology and system design. Owing to progress in CMOS process technology, the feature size of the transistor has shrunk, and the number of components on an IC has grown rapidly [2].

It is therefore possible to integrate more functionality in a single integrated circuit. Improvements in VLSI technology have stimulated great interest in building special-purpose parallel processor arrays for real-time signal processing. The DFT is widely used in OFDM-based communications, image processing and other signal processing applications. Many fundamental computations such as convolution, spectrum estimation and correlation are realized with the DFT [3].

Shashidhara et al. [4] presented a multimode memory-based FFT design for a medical system. The proposed design also supported wireless displays based on MIMO-OFDM. The FFT processor enabled the use of 2-stream 4096/2048/1024-point FFTs and 1- to 4-stream 128/64-point FFTs for FD-OCT and OFDM applications respectively. The proposed design provided data access for up to sixteen memory paths using practical four-bank single-port SRAM operating with a four-word data width. It delivered high-throughput multimode FFT operations in an energy- and area-efficient configuration using hardware-efficient multiplication and storage units. A test chip was designed in TSMC 0.18 CMOS technology with a core size of 4.8. Post-layout simulation was performed for the 4096-point FFT at 80 MHz and the 128-point FFT at 40 MHz, which gave throughputs of 152 MS/s and 160 MS/s respectively and consumed 156.2 mW and 69.9 mW respectively. Further, system-level verification for practical OCT imaging was performed on an FPGA platform.

Fahad Qureshi et al. [5] presented an architecture for a memory-based FFT. The design is suited to computing the FFT of real-valued signals using the radix-2 decimation-in-frequency algorithm. The computation clock cycles are minimized and the utilization of the butterfly processing element (PE) is increased by the stage partitioning of the real FFT (RFFT). The PE can process four inputs in parallel using two radix-2 butterflies and just two multiplexers. The memory addressing sequence and the control signals of the multiplexers are generated by counter logic according to the RFFT computation stage. In addition, the proposed RFFT architecture supports more processing elements. The proposed architecture reduced the computation cycles by 17.5% for a 32-point RFFT computation while keeping hardware usage and PE design complexity low.

Fahad Qureshi et al. [6] presented a 128/256/512/1024/1536/2048-point SDF pipeline FFT architecture for Long Term Evolution and mobile Worldwide Interoperability for Microwave Access systems. The proposed SDF architecture required a low-cost computation scheme to enable the 1536-point FFT, which significantly reduces hardware cost as well as power consumption. To implement a radix-3 FFT, the proposed design included an efficient three-stage SDF pipeline architecture.

In most communication systems [1] and control systems, the frequency spectrum of a signal is important: the frequency range of the signal is calculated to decide whether the system can use the signal for further processing.

Most signals are given in the time domain, in which the variation is represented with respect to time. To obtain the frequency domain signal, that is, the variation of the signal with respect to frequency [2], we need a transformation from the time domain to the frequency domain. Different transformation techniques are available [3]:

1) Fourier series [3]

2) Fourier Transform (FT)

3) Laplace Transform

4) Z-transform

The Fourier series applies only to repetitive (periodic) signals, so we turn to the Fourier Transform, which can be applied to both periodic and non-periodic signals [3].

The FT decomposes a signal into a sum of sinusoidal components. For sampled data it is computed with the Discrete Fourier Transform, in which equally spaced samples of a function are transformed into a finite combination of complex sinusoids [4].

The Discrete Fourier Transform can be expressed as [4]:

$$F(k) = \sum_{n=0}^{N-1} f(n)\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \ldots, N-1$$

Here N is the number of samples present in the signal

F(k) is the frequency domain signal

f(n) is the time domain signal

The frequency domain output F(k) is a discrete signal, and the input considered is also discrete.
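As a minimal sketch of the DFT definition, the sum can be evaluated directly in Python. The function name `dft` is only illustrative and is not taken from the thesis; the code assumes the usual convention with a negative exponent.

```python
import cmath

def dft(f):
    """Direct N-point DFT: F(k) = sum over n of f(n) * exp(-j*2*pi*k*n/N)."""
    n_points = len(f)
    return [
        sum(f[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]

# A constant input has all of its energy in the k = 0 bin.
print(dft([1, 1, 1, 1]))
```

Each of the N outputs needs N complex multiplications, which is where the N^2 cost of the direct method comes from.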

The Fourier Transform can also be computed by using the DTFT, in which the input signal is discrete and the output frequency domain signal is continuous.

The DTFT is expressed as [4]:

$$F(e^{j\omega}) = \sum_{n=-\infty}^{\infty} f(n)\, e^{-j\omega n}$$

Where $F(e^{j\omega})$ is the frequency domain signal, which is continuous and periodic

f(n) is the time domain signal, which is discrete

ω is the frequency

If the input is a continuous signal, we first sample it to get a discrete signal and then apply the DTFT to get the frequency domain signal.

Applying the DFT or DTFT to a signal in the time domain gives its frequency domain representation.

For an N-point sequence the direct DFT requires the following [3] [4]:

No of complex multiplications present in DFT: N^2

No of complex additions present in DFT: N(N − 1)

For an 8-point input sequence, the conversion to the frequency domain requires:

64 complex multiplications

56 complex additions

If we increase the number of samples in the input sequence, the number of multiplications increases very rapidly.

For a 16-point sequence the conversion requires:

No of complex multiplications: 256

No of complex additions: 240

To reduce the number of complex multiplications and additions [4], we use the FFT technique to calculate the frequency domain of the signal. The conversion then requires:

No of complex multiplications: (N/2) log2 N [5]

No of complex additions: N log2 N [5]
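The operation counts quoted above can be tabulated with a few lines of Python. This is only an illustrative sketch; the helper names `dft_ops` and `fft_ops` are assumptions, and the FFT counts assume a radix-2 FFT with N a power of 2.

```python
def dft_ops(n):
    """Complex operation counts for a direct N-point DFT."""
    return {"mults": n * n, "adds": n * (n - 1)}

def fft_ops(n):
    """Complex operation counts for a radix-2 FFT (N must be a power of 2)."""
    stages = n.bit_length() - 1  # log2(N) for a power of 2
    return {"mults": (n // 2) * stages, "adds": n * stages}

for n in (8, 16, 64):
    print(n, "DFT:", dft_ops(n), "FFT:", fft_ops(n))
```

For N = 8 this gives 64 multiplications for the direct DFT against 12 for the FFT, and the gap widens quickly as N grows.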

The Fast Fourier Transform is most often computed with the Cooley-Tukey [6] algorithm, which should not be confused with the related prime factor algorithm.

Basically, the Fast Fourier Transform can be computed with a radix-r algorithm, where r is an integer greater than 1; an N-point FFT can be calculated using different radices such as radix-2, radix-4, radix-8 and so on. The FFT can be implemented by two different methods, DIT FFT and DIF FFT.

To increase the speed, a pipeline architecture [6] [7] is used in the computation of the FFT, and in particular the Multi-path Delay Commutator [6] [8] [9] [10] architecture is used in communication systems. The pipeline architecture also uses a butterfly element, which can be built with different radices such as radix-2 and radix-4, and the elements are retrieved by using memory addressing [11] [12] [13].

The DIT FFT for an 8-point sequence is implemented in Verilog and synthesized in Quartus-II [14], and the pipeline architecture for 64 points is implemented in Verilog and simulated in ModelSim [14].

Radix:

Here the radix is the number of elements taken in at a time and processed by one butterfly. For radix-2, the butterfly takes 2 input elements, performs the addition and multiplication operations, and produces 2 outputs. For radix-4, the butterfly takes 4 input elements and produces 4 output elements at a time.

Verilog:

Hardware description languages differ from software programming languages, and the most widely used hardware description languages are:

1) VHDL

2) Verilog HDL

Verilog is a hardware description language with a syntax similar to the C language, and it is standardized as IEEE 1364. Unlike software languages, hardware description languages must model signal propagation and time.

Quartus-II

The designed code is synthesized using Quartus-II. Before synthesis, the design is first simulated in ModelSim. Synthesis and implementation follow, including placement of the circuit elements and pin allocation, and then timing analysis is carried out to find the worst-case delay in the circuit.

After synthesis we get different views of the designed circuit:

1) RTL view

2) STATE MACHINE view

Programming the design into the hardware can also be done using Quartus-II.

In the first step, the design is coded in a hardware description language. Here we use Verilog, but VHDL could also be used, and synthesis and implementation can also be done with Xilinx software.

The flow of the synthesis steps is shown below [14]:

1) Design

2) Verilog coding

3) Functional simulation using ModelSim

4) Synthesis and implementation

5) Place and route

6) Timing analysis and timing closure

7) Simulation

8) Programming and configuration

Fig. 1.1. Quartus-II Work Flow

The thesis is divided into chapters and subsections:

1) Chapter II: deals with theoretical description of different types of FFT algorithms

a) Cooley-Tukey Method

b) DIT FFT

c) DIF FFT

d) Pipelined FFT architecture

2) Chapter III: deals with Implementation of algorithms

a) Implementation of DIT FFT of 8-Point input

b) Implementation of Pipelined FFT of 64-Point

3) Chapter IV: deals with Results

a) Simulation Results

b) RTL view

4) Chapter V: Conclusion

CHAPTER 2

2. LITERATURE REVIEW:

2.1. A High-Flexible Low-Latency Memory-Based FFT Processor for 4G, WLAN, and Future 5G:
A high-throughput programmable fast Fourier transform (FFT) processor is designed supporting 16- to
4096-point FFTs and 12- to 2400-point discrete Fourier transforms (DFTs) for 4G, wireless local area
network, and future 5G. A 16-path data parallel memory-based architecture is selected as a tradeoff
between throughput and cost. To implement a hardware-efficient high-speed processor, several
improvements are provided. To maximally reuse the hardware resource, a reconfigurable butterfly unit is
proposed to support computing including eight radix-2 in parallel, four radix-3/4 in parallel, two radix-5/8
in parallel, and a radix-16 in one clock cycle. Twiddle factor multipliers using different schemes are
optimized and compared, wherein modified coordinate rotation digital computer scheme is finally
implemented to minimize the hardware cost while supporting both FFTs and DFTs. An optimized conflict-
free data access scheme is also proposed to support multiple butterflies at any radices. The processor is
designed as a general IP and can be implemented using a processor synthesizer (application-specific
instruction-set processor designer). The electronic design automation synthesis result based on a 65-nm
technology shows that the processor area is 1.46 mm^2. The processor supports 972 MS/s 4096-point FFT
at 250 MHz with a power consumption of 68.64 mW and a signal-to-quantization-noise ratio of 66.1 dB.
The proposed processor has better-normalized throughput per area unit than the state-of-the-art available
designs.

2.2. Multiplierless Unified Architecture for Mixed Radix−2/3/4 FFTs:

This paper presents a novel runtime-reconfigurable, mixed radix core for the computation of 2-, 3- and 4-point fast Fourier transforms (FFT). The proposed architecture is based on the radix-3 Winograd Fourier transform; however, multiplication is performed by constant multiplication instead of a general multiplier. The complexity is equal to that of a multiplierless 3-point FFT in terms of adders/subtractors, with the exception of a few additional multiplexers. The proposed architecture supports all FFT sizes that can be factorized into 2, 3 and 4 points. It is also explained that the accuracy of the proposed architecture is not affected by the constant multiplication.

2.3. Power Efficient Radix-2 DIT FFT using Folding Technique and DKG Reversible Gate:

The FFT is commonly used in digital signal processing algorithms. 4G communication and other wireless communication systems are currently active topics of research and development in the wireless communication and networking field. The FFT is an algorithm that speeds up the calculation of the DFT. In the first stage, a low complexity Radix-2 Multi-path Delay Commutator (R2MDC) FFT frequency transformation technique is developed in a Very Large Scale Integration (VLSI) system design environment. Low power consumption, small area and high speed are the main VLSI parameters. The conventional R2MDC FFT architecture has high hardware complexity because of its intensive computational elements. Two methods are used to design the radix-2 FFT algorithm. The first method designs the radix-2 FFT with the help of the reversible Peres gate and TR gate. The second method designs the radix-2 FFT with the help of the reversible DKG gate. All the structures are implemented on the Virtex-4 device family using Xilinx software and compared with the previous algorithm.

CHAPTER 3

DIFFERENT FFT ALGORITHMS:

Cooley-Tukey Method:

This method is the most widely used for computing the FFT. The length N of the DFT is expressed as a product of two factors N1 and N2 [3]:

N = N1*N2

The computation is then broken into N1 DFTs of size N2 or into N2 DFTs of size N1.

Usually one of N1 and N2 is small compared with the other. If N1 is the radix, the FFT is computed by Decimation in Time; if N2 is the radix, the FFT is computed by Decimation in Frequency.

The operation is carried out recursively using radix-2 DFTs. In radix-2 DIT, the transform of the odd-indexed samples is multiplied by a phase factor called the Twiddle factor, after which the addition and subtraction operations are performed. The butterfly that combines the even and odd transforms is a size-2 DFT.

The Fast Fourier Transform can be done by using two different methods[4]:

1) DIT Fast Fourier Transform

2) DIF Fast Fourier Transform

The computation is divided into a number of stages, which is calculated as:

v = log2 N

Where N is the number of input samples present in the time domain.

An N-point DFT with even N is calculated from two (N/2)-point DFTs; each (N/2)-point DFT is in turn calculated from two (N/4)-point DFTs, and so on until only 2-point DFTs remain.
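The repeated halving described above maps naturally onto a recursive function. The following is a minimal software sketch of a radix-2 decimation-in-time FFT, assuming the input length is a power of 2 and at least 2; the name `fft_dit` is illustrative and this is not the hardware design of the thesis.

```python
import cmath

def fft_dit(x):
    """Recursive radix-2 DIT FFT; len(x) must be a power of 2 and >= 2."""
    n = len(x)
    if n == 2:
        # The recursion stops at the 2-point DFT.
        return [x[0] + x[1], x[0] - x[1]]
    even = fft_dit(x[0::2])  # (N/2)-point DFT of even-indexed samples
    odd = fft_dit(x[1::2])   # (N/2)-point DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

print(fft_dit([1, 0, 0, 0, 0, 0, 0, 0]))  # an impulse gives a flat spectrum
```

For N = 8 the recursion depth corresponds to the v = log2 N = 3 stages mentioned above.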

The Fast Fourier Transform is built from the butterfly structure, and the butterfly operation can be done in two ways: one is used in the DIT FFT and the other is used in the DIF FFT.

Butterfly used in DIT FFT:

Fig.3.1. Basic Butterfly Structure

Here a and b are the input samples for the butterfly and $W_N^{r}$, $W_N^{r+N/2}$ are the Twiddle Factors.

The results from the butterfly structure are as below:

$$A = a + W_N^{r}\, b, \qquad B = a + W_N^{r+N/2}\, b$$

Twiddle Factor:

The Twiddle Factor is a complex root of unity used in the butterfly operation to compute the discrete Fourier transform:

$$W_N = e^{-j 2\pi / N}$$

The butterfly requires two complex multiplications and two complex additions. We can reduce the number of complex multiplications by using the symmetry property.

The symmetry property is

$$W_N^{r+N/2} = W_N^{r}\, W_N^{N/2} = -W_N^{r}$$

From the trigonometric form, the value of $W_N^{N/2}$ can be calculated as

$$W_N^{N/2} = e^{-j\pi} = \cos\pi - j\sin\pi = -1$$

Since this value is equal to −1, the Twiddle Factor $W_N^{r+N/2}$ becomes equal to $-W_N^{r}$. From this the butterfly can be modified as shown below:

Fig. 3.2. Modified Butterfly Structure

The results from the modified butterfly will be equal to

$$A = a + W_N^{r}\, b, \qquad B = a - W_N^{r}\, b$$

This requires only 1 complex multiplication and 2 complex additions.
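The saving can be checked numerically. The sketch below, with illustrative function names that are not from the thesis, computes the basic butterfly with two twiddle multiplications and the modified butterfly with one, and the two give the same outputs because the twiddle factor for r + N/2 is the negative of the twiddle factor for r.

```python
import cmath

def twiddle(r, n):
    """Twiddle factor W_N^r = exp(-j*2*pi*r/N)."""
    return cmath.exp(-2j * cmath.pi * r / n)

def butterfly_basic(a, b, r, n):
    """Two complex multiplications, two complex additions."""
    return a + twiddle(r, n) * b, a + twiddle(r + n // 2, n) * b

def butterfly_modified(a, b, r, n):
    """One complex multiplication, reused with opposite sign."""
    t = twiddle(r, n) * b
    return a + t, a - t

print(butterfly_basic(1 + 2j, 3 - 1j, 1, 8))
print(butterfly_modified(1 + 2j, 3 - 1j, 1, 8))
```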

Butterfly used in DIF FFT [1]:

The Butterfly structure used in DIF FFT is as shown below:

Fig. 3.3. Butterfly structure used in the DIF FFT

The results from the butterfly structure are given by

$$A = a + b, \qquad B = (a - b)\, W_N^{r}$$

This requires 2 complex additions and 1 complex multiplication.

Decimation in Time FFT:

In the DIT FFT the input is given in bit reversal order and the output is in natural order.

Decimation in Frequency FFT:

In the DIF FFT the input will be in the correct order and the output will be in the bit

reversal order.

Bit reversal order:

The bit reversal order is generated by exchanging the first and last bits, then the second bit and the second-to-last bit, and so on through the sequence.

X (b0 b1 b2 b3 b4) -------------- original order of bits

For getting the bit reversal order

1) First exchange the bits b4 and b0

2) Exchange the bits b3 and b1

3) The result is bit reversal order

X (b4 b3 b2 b1 b0) is the bit reversal order

Let us consider an 8-point input; the bit reversal order can be obtained as shown below.

The input samples are {x(0), x(1), x(2), x(3), x(4), x(5), x(6), x(7)}. The bit reversal order is obtained as:

Original sample    Binary representation    Bit reversal order

X(0)               X(000)                   X(000) = X(0)
X(1)               X(001)                   X(100) = X(4)
X(2)               X(010)                   X(010) = X(2)
X(3)               X(011)                   X(110) = X(6)
X(4)               X(100)                   X(001) = X(1)
X(5)               X(101)                   X(101) = X(5)
X(6)               X(110)                   X(011) = X(3)
X(7)               X(111)                   X(111) = X(7)

Table.3.1. Bit Reversal order
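The bit reversal order can also be generated programmatically. This is a small sketch with an illustrative function name `bit_reverse`; it reverses the lowest `bits` bits of an index, which for 3 bits gives the order of an 8-point transform.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # move the lowest bit of index into result
        index >>= 1
    return result

order = [bit_reverse(i, 3) for i in range(8)]
print(order)  # [0, 4, 2, 6, 1, 5, 3, 7]
```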


DIT FFT:

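The DIT FFT is the algorithm in which x(n) is broken down into smaller subsequences. The principle of the decimation in time FFT can be explained by considering the number of input points N to be a power of 2:

N = 2^v

where v is the number of stages. The sequence x(n) is broken down into two parts, one containing only the even-indexed samples and the other the odd-indexed samples.

The frequency domain can be obtained from the time domain by using the below formula:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1$$

Here X(k) is the representation of the signal x(n) in the frequency domain.

Breaking the signal into two subsequences leads to the frequency domain as represented below:

$$X(k) = \sum_{n\ \mathrm{even}} x(n)\, W_N^{nk} + \sum_{n\ \mathrm{odd}} x(n)\, W_N^{nk}$$

Here n is replaced by 2r in the first sum and by 2r+1 in the second, where r varies from 0 to (N/2)-1, and the above equation can be modified as shown below:

$$X(k) = \sum_{r=0}^{N/2-1} x(2r)\, W_N^{2rk} + W_N^{k} \sum_{r=0}^{N/2-1} x(2r+1)\, W_N^{2rk}$$

By the property $W_N^{2} = W_{N/2}$ we can rewrite the Twiddle Factor, and the frequency domain is the sum of the DFT of the even sequence and the DFT of the odd sequence multiplied by $W_N^{k}$:

$$X(k) = \sum_{r=0}^{N/2-1} x(2r)\, W_{N/2}^{rk} + W_N^{k} \sum_{r=0}^{N/2-1} x(2r+1)\, W_{N/2}^{rk}$$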
The decimation in time FFT process as shown below:

Fig.3.4. Basic Decimation in Time FFT

Dividing the input sequence into odd and even parts is done by giving the input in bit reversal order, and the output frequency samples will be in order as X(0), X(1), X(2), ..., X(7).

Again each N/2-point DFT is divided into two N/4-point DFTs, and so on; the process is repeated down to the 2-point DFT.

The total decimation in time for 8-Point Sequence is as shown below:

Fig. 3.5. 8-Point DIT FFT

The DIT FFT can be done using different radices:

1) Radix-2

2) Radix-4

Radix-2 DIT FFT:

In the radix-2 DIT FFT the input sequence is divided as shown below:

8-Point: 0 1 2 3 4 5 6 7

4-Point: 0 2 4 6 | 1 3 5 7

2-Point: 0 4 | 2 6 | 1 5 | 3 7

Fig.3.6. Sequence of Input in DIT FFT

Radix-4 FFT:
The Radix-4 basic butterfly diagram is as shown below:

Fig. 3.7.Radix-4 DIT FFT

DIF FFT [1]:

The DIF FFT takes its input in normal order and produces its output in bit reversal order.

Here the N-point sequence is divided into two N/2-point sequences, as shown below:

The first half sequence is x(n), where 0 ≤ n ≤ (N/2)-1, and

the second half sequence is x(n + N/2), where 0 ≤ n ≤ (N/2)-1.

The decimation in Frequency FFT can be done by using different Radices:

1) Radix-2

2) Radix-4

Radix-2 DIF FFT:

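Here the N-point sequence is divided into two parts, and the two parts are individually divided again as shown below:

Fig. 3.8. Input to output sequence generation of DIF FFT

Computing the DFT of the N-point input sequence x(n):

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk} = \sum_{n=0}^{N/2-1} x(n)\, W_N^{nk} + \sum_{n=0}^{N/2-1} x(n + N/2)\, W_N^{(n+N/2)k}$$

The above equation is modified as shown below, using $W_N^{(N/2)k} = (-1)^k$:

$$X(k) = \sum_{n=0}^{N/2-1} \left[ x(n) + (-1)^{k}\, x(n + N/2) \right] W_N^{nk}$$

$$X(2r) = \sum_{n=0}^{N/2-1} \left[ x(n) + x(n + N/2) \right] W_{N/2}^{nr}, \qquad X(2r+1) = \sum_{n=0}^{N/2-1} \left[ x(n) - x(n + N/2) \right] W_N^{n}\, W_{N/2}^{nr}$$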

Basic Butterfly structure used in DIF FFT of Radix-2:

The butterfly structure used in DIF FFT is different from the Butterfly structure used in

DIT FFT. The Basic difference is in the DIT FFT Butterfly the multiplication is done before

additions but in the DIF FFT Butterfly the multiplication is done after additions.

Fig.3.9. Butterfly structure used in DIF FFT

This involves two complex additions and one complex multiplication.
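To illustrate how the DIF butterfly (add and subtract first, then multiply the difference by the twiddle factor) builds a complete transform, here is a minimal in-place radix-2 DIF FFT in Python. It is a software sketch under the assumption that N is a power of 2; the name `fft_dif` is illustrative. The input is in natural order and the output comes out in bit reversal order.

```python
import cmath

def fft_dif(x):
    """Radix-2 DIF FFT. Returns X(k) stored in bit-reversed index order."""
    data = [complex(v) for v in x]
    n = len(data)
    span = n // 2  # distance between the two butterfly inputs
    while span >= 1:
        for start in range(0, n, 2 * span):
            for i in range(span):
                a = data[start + i]
                b = data[start + i + span]
                w = cmath.exp(-2j * cmath.pi * i / (2 * span))
                data[start + i] = a + b               # sum output
                data[start + i + span] = (a - b) * w  # multiply after subtracting
        span //= 2
    return data

print(fft_dif([1, 0, 0, 0]))  # an impulse gives a flat spectrum
```

For a 4-point input the returned list holds X(0), X(2), X(1), X(3), so a final bit-reversal permutation restores natural order.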


8-Point DIF FFT:
The FFT is performed using the decimation in frequency as shown below:

Fig. 3.10. 8-Point Radix-2 DIF FFT

Radix- 4 FFT:

The Radix-4 FFT basic butterfly diagram is as shown below:

Fig. 3.11. Radix-4 DIF FFT

Here the FFT length is N = 4^v, where v is the number of stages, and the Twiddle Factors are powers of $W_N = e^{-j 2\pi / N}$.
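As a rough illustration of why radix-4 is attractive, the core of a radix-4 butterfly is a 4-point DFT whose internal factors are only 1, -1, j and -j, so it needs no general multiplier. The sketch below uses an illustrative function name and leaves out the inter-stage twiddle multiplications, which depend on the stage and on whether DIT or DIF is used.

```python
def radix4_butterfly(x0, x1, x2, x3):
    """4-point DFT using only additions and multiplication by -j."""
    a = x0 + x2
    b = x0 - x2
    c = x1 + x3
    d = (x1 - x3) * -1j  # multiplying by -j only swaps and negates parts
    return a + c, b + d, a - c, b - d  # X(0), X(1), X(2), X(3)

print(radix4_butterfly(1, 2, 3, 4))
```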

Pipelined R2MDC:

For the hardware architecture of the FFT there are three different types of architectures:

1) Single Butterfly Architecture

2) Pipeline Architecture [15] [16]

3) Parallel Architecture [15] [16] [17]

Of these, the pipeline architecture [8] is very attractive for multimedia communication systems that include an FFT processor.

To reduce the complex multiplications further, we propose a pipeline architecture that offers low latency, low power consumption, high throughput and small area.

The pipeline architecture can be realized in different types as below:

1) Multi Path Delay Commutator (MDC)

2) Single Path Delay Commutator (SDC)

3) Single Path Delay Feedback (SDF)

The MDC architecture accepts multiple input data streams and so achieves high throughput, but its hardware utilization is low.

Single Path Delay Feedback is the best solution for a single input data stream and is used when the memory requirement must be small. In the SDC architecture [18] the number of adders is very low, but the memory requirement is larger; its output is in reversed order, and restoring the normal order requires additional operations.

The Basic Pipeline architecture is as shown below:

Butterfly Computation -> Delay Commutator -> Butterfly Computation -> Delay Commutator -> Butterfly Computation

Fig. 3.12 Basic Pipeline architecture

Here the input data stream is divided into two parallel data streams, and the processing is done using delay elements, butterfly elements and processing elements. In MDC architectures the utilization of the resources depends on the radix used.

With radix-r, where r is an integer greater than 1, the utilization of the resources is 1/r; with radix-2 the utilization of the resources for the FFT computation is 50%.

An MDC pipeline that uses radix-2 is called the R2MDC pipeline architecture, and it is shown below [19] [20]:

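[Figure: 8-point R2MDC pipeline. The two input streams x3 x2 x1 x0 and x7 x6 x5 x4 pass through a register (REG), butterflies (BF), delay elements (R), switches (S), a multiplication by -j and a twiddle factor multiplier.]

Fig.3.13. R2MDC Pipeline architecture of 8-Point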


R represents the delay elements, BF is the radix-2 butterfly structure used in the FFT, and S is the switch.

The pipeline architecture can also be implemented with a different radix; with radix-4 it is called R4MDC. With the pipelined R2MDC architecture, the number of complex multipliers is reduced compared with the normal DIT FFT and DIF FFT.

CHAPTER 4

MODIFIED CARRY SELECT ADDER FOR POWER AND AREA REDUCTION

One of the most crucial areas of research in VLSI system design is designing circuits that compute quickly and efficiently. Numerous design combinations are possible to optimize parameters such as power, area and delay. In conventional digital adders, the sum and carry are generated after a normalized delay. When an arithmetic operation is performed on a larger bit size this delay grows, hence fast adders such as CSLA and CSA are used to minimize the delay. The structure of the RCA is simple and requires fewer gates than other adders, but its computational speed is significantly lower.

The problem of carry propagation delay in conventional adders can be avoided by using a carry select adder structure, which independently produces more than one carry and then selects a carry in order to get the sum. However, the structure of the CSLA is not efficient in terms of area, because it uses pairs of ripple carry adders (RCAs) to generate the partial sum and carry for carry inputs Cin=0 and Cin=1; the final sum and carry are then selected by a multiplexer. Because pairs of RCAs are used in each stage of the regular CSLA, it occupies a larger area and consumes more power. To overcome this problem, the fundamental idea of the work is to use an n-bit binary to excess-1 code converter (BEC) to enhance the speed of the operation. The BEC can be used to replace the RCA with Cin=1 to improve speed and delay further, and it can be observed that the power delay product and area delay product also decrease. One of the major advantages of using a BEC is its structure: the BEC needs far fewer logic gates than the full adders in an RCA, so replacing the RCA with a BEC gives a significant decrease in area and, in turn, lower power consumption to drive the logic block [1].

Arithmetic units are often the workhorse of any computational circuit, and addition is the backbone of any such logic block, so a faster means of performing this operation is essential. CSLAs are one of the best options, but their excessive area overhead makes them unattractive. Several techniques have been found to overcome this problem. One of them is the introduction of an add-one circuit: using BECs instead of RCAs in the CSLA structure reduces area without compromising delay even as the input length grows [2]. VLSI hardware implementation is an area where different domains such as image processing, neural networks and so on are integrated. The CSLA architecture with BEC is used extensively in high speed VLSI hardware implementation, for the simple reason that replacing the RCA with a BEC in the existing structure reduces the number of gates and therefore the area. It is noteworthy that this architecture is also used as an alternative for adder implementations in many processors [3].

The carry select method is considered a good way to achieve a trade-off between cost and performance in carry propagation adder design, because it has the advantage of logarithmic gate depth, as is the case with any other structure of the distant-carry adder family. The higher power consumption due to the amount of circuitry hinders the direct usage of the architecture. Gate depth is as important as gate output load from the power perspective in today's deep sub-micron technology. Hence the need to reduce the transistor count and to simplify the layout is justified. Area and power reduction follow in turn from the reduction in the number of transistors. The SQRT CSLA is found to outperform the rest of the CSLAs, with fewer transistors and the least PDP and ADP [4]. All these developments have contributed immensely to low power applications and to versatile mobile and portable electronics.
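The selection mechanism can be modelled behaviourally in a few lines of Python. This is only a functional sketch, not a gate-level or Verilog description: the function names are illustrative, and the group widths (2, 2, 3, 4, 5) are an assumption taken from the 16-bit SQRT grouping used later in this chapter. Each group adds with Cin=0, derives the Cin=1 result by adding one (the role of the BEC), and a multiplexer picks one of the two according to the incoming carry.

```python
def ripple_add(a, b, cin, width):
    """Behavioural ripple carry adder: returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width

def csla_bec_add(a, b, group_widths=(2, 2, 3, 4, 5)):
    """Carry select addition where the Cin=1 path is an add-one (BEC) block."""
    result, carry, shift = 0, 0, 0
    for width in group_widths:
        mask = (1 << width) - 1
        sum0, carry0 = ripple_add((a >> shift) & mask, (b >> shift) & mask, 0, width)
        # BEC: add one to the (width+1)-bit word {carry0, sum0}
        bec = ((carry0 << width) | sum0) + 1
        sum1, carry1 = bec & mask, (bec >> width) & 1
        # Multiplexer: the incoming carry selects the Cin=0 or Cin=1 result
        part, carry = (sum1, carry1) if carry else (sum0, carry0)
        result |= part << shift
        shift += width
    return result, carry

print(csla_bec_add(0x1234, 0x0FF0))
```

In real hardware the first group is a plain RCA driven by the actual carry input; the model treats it like the other groups with an incoming carry of 0, which gives the same result.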

4.1. BASIC ADDER BLOCK EVALUATION- DELAY AND AREA.

The area and delay are evaluated in terms of basic gates, namely AND, OR and NOT, each of which has an area of 1 unit and a delay of 1 unit. For a particular logic block, the number of gates it contains determines its total area, and the number of gates on its longest path determines its maximum delay. Based on this approach, the basic blocks of the CSLA, namely the 2:1 multiplexer (Mux), half adder (HA) and full adder (FA), are evaluated and listed in Table I. As mentioned above, the prime idea is to replace the RCA with Cin=1 by a block that is better than the RCA in area and power. The n-bit ripple carry adder is replaced by an (n+1)-bit BEC. The architecture and working of the BEC are explained using Fig. 4.1 and Table II.

When the ripple carry adder is replaced with a BEC, the reduction in area and power becomes more visible for larger CSLA structures. The Boolean representation of the 4-bit BEC is given below:

X0 = NOT (B0)

X1 = B0 XOR B1

X2 = B2 XOR (B0 AND B1)

X3 = B3 XOR (B0 AND B1 AND B2)
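The four Boolean expressions can be checked bit by bit. The following sketch, with the illustrative name `bec4`, evaluates them for every 4-bit input and prints the resulting function table; each output equals the input plus one, modulo 16.

```python
def bec4(b):
    """4-bit binary to excess-1 converter built from the gate equations."""
    b0, b1, b2, b3 = [(b >> i) & 1 for i in range(4)]
    x0 = 1 - b0               # NOT B0
    x1 = b0 ^ b1              # B0 XOR B1
    x2 = b2 ^ (b0 & b1)       # B2 XOR (B0 AND B1)
    x3 = b3 ^ (b0 & b1 & b2)  # B3 XOR (B0 AND B1 AND B2)
    return x0 | (x1 << 1) | (x2 << 2) | (x3 << 3)

for value in range(16):
    print(f"{value:04b} -> {bec4(value):04b}")
```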

4.2. REGULAR 16-B SQRT CSLA EVALUATION-AREA AND DELAY

The fundamental structure of the 16-b regular SQRT CSLA is shown in Fig. 4.2 [1]. The structure consists of five modules or groups of ripple carry adders of various sizes. Individual modules are shown in Fig. 4.3, 4.4 and 4.5.

Here an input is provided to the BEC, which produces the (n+1)-bit output. For a 4-b input to the BEC, the output is shown in Table II.

Fig 4.1. 4-Bit Binary Excess Converter

TABLE II
Function Table of 4-b BEC

Fig.4.2. 16-Bit Regular CSLA

The evaluation of area and delay pertaining toeach module are shown below. The numerals
enclosed within [ ]denote the delay. The evaluation is done as follows:

1) Group 2 (Fig. 4.3) has two sets of 2-bit RCAs. Based on the delay values in the above table, the selection input c1 (t=7) of the 6:3 Mux arrives earlier than the sum S3 from the full adder and later than S2 (t=6). The RCA outputs C3, S3 and S2 are applied as inputs to the mux. The mux adds a delay of 3, which gives C3 (t=10), Sum3 (t=11) and Sum2 (t=10). C3 then acts as the selection input of the multiplexer in the next group.

2) Except for group 2, the multiplexer selection input arrives later than the data outputs of the RCAs in every group (Figs. 4.4 and 4.5). The multiplexer selection in each stage therefore adds a small delay to the overall path. The RCAs in each stage of the CSLA operate in parallel, which mitigates the carry propagation delay: each stage independently produces the results for both possible carries and then selects one of them to generate the sum. Thus, the delays of group 3 to group 5 are determined as follows:

{c6, Sum[6:4]} = c3 (t=10) + Mux

{c10, Sum[10:7]} = c6 (t=13) + Mux

{Cout, Sum[15:11]} = c10 (t=16) + Mux

3) Based on the area and delay counts from the above tables, the maximum area and delay of each group of the regular SQRT CSLA are evaluated and listed in Table III.
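
The carry chain described in steps 1) and 2) can be reproduced with a few lines of arithmetic. The following Python sketch is a simplified model that assumes, as stated above, a mux delay of 3 units and a first selection input c1 arriving at t=7; it tracks only the carry (selection) path, so it does not cover Sum3 of group 2, which is set by the later arrival of S3. The function name is hypothetical.

```python
MUX_DELAY = 3  # delay of one mux in gate-delay units, as used in the evaluation


def carry_arrival_times(first_select_arrival, n_groups):
    """Arrival time of the carry leaving each group (hypothetical helper).

    Each group's output carry is ready one mux delay after its
    selection input (the previous group's carry) arrives.
    """
    times = []
    t = first_select_arrival
    for _ in range(n_groups):
        t += MUX_DELAY
        times.append(t)
    return times


# c1 arrives at t=7; the four values are c3, c6, c10 and Cout.
print(carry_arrival_times(7, 4))  # [10, 13, 16, 19]
```

The first three values match c3 (t=10), c6 (t=13) and c10 (t=16) used in the expressions for groups 3 to 5; the last value is the resulting arrival time of Cout under the same assumption.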

Fig.4.3. Group 2

Fig. 4.4. Group 3

Fig 4.5. Group 4

TABLE III

TOTAL 59 408
CHAPTER 5

PROPOSED METHODOLOGY

The combined R2B, R4B & R8B based SDF FFT is designed in this proposed work. The combined-radix FFT architecture has a shorter computational path and also improves the performance of the FFT processor. In the SDF architecture, the input data sequence passes through one single path. The butterfly processing element performs the computation on the data, and the addition and subtraction operations are carried out in the butterfly elements. The modified carry select adder circuit is used for the adder operation in this architecture, and this adder structure is very efficient in this design. The structure of the combined R2B, R4B & R8B FFT is shown in Fig. 5. The architecture of the 16-point SDF FFT is shown in Fig. 8.

Fig. 5 shows the 16-point SDF architecture. The processing flow of this architecture is as follows: first, the input data with indices 0 to 7 are stored in the shift register. The R2B elements then operate on this data and the remaining input data with indices 8 to 15.

Fig. 5.1. Proposed Architecture of pipelined 16-point Combined Radix FFT using SDF and MCSLA

The resulting data from the butterfly addition operation are passed to the next stage, and the subtraction results are fed back to the shift register. After that, the 8-point addition data are passed to the next stage, which is computed using the Radix-8 butterfly element, and the subtraction data from the registers are passed to the next stage, which is computed by an R2B with butterfly-processing twiddle factor coefficients. The following stages are computed using R2B & R4B FFT.

In the conventional R2SDF FFT, the inputs are applied sequentially, and the four inputs are processed with the help of a single butterfly (Processing Element) unit. However, this architecture has higher hardware utilization and power consumption because it uses or stores many unnecessary intermediate digital signals. To overcome this problem, the R2B, R4B and R8B based SDF FFT architectures are combined to reduce the hardware utilization of the processor. The proposed method significantly reduces the area, delay, and power consumption. The combined R2B, R4B and R8B FFT is proposed in this architecture to reduce the number of computational stages. For example, for a 64-point FFT, the R2B FFT needs 6 stages to compute the FFT output, the R4B FFT needs only 3 stages, and the R8B FFT needs only 2 stages. The R2B, R4B and R8B are therefore combined to improve the performance of the architecture. In the proposed method, the 16-point FFT is split into two halves; the output of the first 8 points is obtained directly using the R8B FFT, which reduces the number of stages and the processing time. The next 8 points use R4B and R2B FFT. In the conventional 16-point R2B FFT, 15 stages of R2B FFT are used. In the proposed combined R2B, R4B and R8B FFT, only 5 stages of R2B FFT are used. Compared with the conventional R2B FFT, the combined R2B, R4B and R8B FFT therefore has a shorter computational path.
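
The stage counts quoted for the 64-point FFT follow from the fact that a single-radix FFT of N points needs log_r(N) stages. The short Python sketch below is an illustration only; the function name `fft_stages` is an assumption introduced here, and it covers the single-radix case, not the mixed-radix split of the proposed 16-point design.

```python
def fft_stages(n_points, radix):
    """Number of butterfly stages of a single-radix FFT (hypothetical helper)."""
    stages = 0
    remaining = n_points
    while remaining > 1:
        if remaining % radix != 0:
            raise ValueError("n_points must be a power of the radix")
        remaining //= radix  # each stage removes one factor of the radix
        stages += 1
    return stages


for radix in (2, 4, 8):
    print("64-point, radix", radix, "->", fft_stages(64, radix), "stages")
```

For 64 points this gives 6, 3 and 2 stages for radix 2, 4 and 8, which is the motivation for using the higher-radix butterflies where the transform length allows it.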

5.1. SDF:

The input data sequence is fed through one single-path delay feedback.

Fig. 5.1. Representation of Sequential Dataflow in SDF FFT

The butterfly processing element performs the computations on the data. The delay units are used more efficiently by sharing the same storage between the input and the output of the butterfly unit. The butterfly units and multipliers are utilized only 50% of the time because they are bypassed half the time.

SDF FFT is a pipelined frequency transformation technique. Its structure resembles "stream-like" processing of a block-based algorithm. The sequential data flow in the SDF FFT is shown in Fig. 6. It has a single butterfly processor for performing signed addition and signed subtraction. Single-path delay elements are used in the feedback structure.
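
The data flow of one SDF stage can be illustrated with a behavioral model. The Python sketch below is a simplified assumption-based illustration: it models a single radix-2 stage with a delay line of `depth` samples, omits the twiddle factor multiplication, and uses a hypothetical function name. During the first half of a block the inputs are only stored (the butterfly is bypassed); during the second half the butterfly outputs the sums and feeds the differences back into the same delay line.

```python
from collections import deque


def sdf_stage(samples, depth):
    """One radix-2 SDF stage without twiddle factors (hypothetical helper)."""
    delay = deque([0] * depth)      # single-path delay feedback register
    out = []
    for i, x in enumerate(samples):
        oldest = delay.popleft()
        if (i // depth) % 2 == 0:
            # Bypass phase: emit what the delay line held, store the new input.
            out.append(oldest)
            delay.append(x)
        else:
            # Butterfly phase: the sum goes forward, the difference is fed back.
            out.append(oldest + x)
            delay.append(oldest - x)
    out.extend(delay)               # flush the differences still in the register
    return out[depth:]              # drop the initial pipeline latency


result = sdf_stage(list(range(16)), depth=8)
print(result[:8])   # sums of samples k and k+8
print(result[8:])   # differences of samples k and k+8
```

With the inputs 0 to 15 and a delay of 8, samples 0 to 7 are stored first and are then combined with samples 8 to 15, which mirrors the 16-point flow described for the proposed architecture. The bypass phase is also why the butterfly is active only half of the time.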

5.2. MCSLA:
The MCSLA consists of reduced full adders (RFA). The multiplexer-based RFA circuit is shown below.

Fig. 5.2. RFA Circuit.

The RFA circuit is designed using a minimal number of logic gates. A Multiplexer (MUX) based RFA circuit is also designed in this work to further improve the performance of digital adder circuits.

Gate Count of Full Adder is determined as follows,

Gate Count of FA = Gate Count [(2*XOR) + (2*AND) + (1*OR)]

Gate Count of FA = [(2*5) + (2*1) + (1*1)] = 10+2+1 = 13

Gate Count of Reduced Full Adder = Gate Count [(2*AND) + (1*OR) + (2* NOT) + (1*MUX)]

Gate Count of Reduced Full Adder = [(2*1) + (1*1) + (2*1) + (1*4)] = 2+1+2+4 = 9
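
The two gate counts above can be reproduced, and the saving per adder cell quantified, with the following Python sketch. The per-gate costs (XOR = 5, AND = OR = NOT = 1, MUX = 4) are taken from the expressions above; the dictionary and function names are assumptions introduced for illustration.

```python
# Cost of each cell in basic gate units, as used in the expressions above.
GATE_COST = {"XOR": 5, "AND": 1, "OR": 1, "NOT": 1, "MUX": 4}


def gate_count(cells):
    """Total gate count of a circuit given as {cell name: quantity}."""
    return sum(GATE_COST[name] * qty for name, qty in cells.items())


FULL_ADDER = {"XOR": 2, "AND": 2, "OR": 1}
REDUCED_FULL_ADDER = {"AND": 2, "OR": 1, "NOT": 2, "MUX": 1}

fa_gates = gate_count(FULL_ADDER)
rfa_gates = gate_count(REDUCED_FULL_ADDER)
saving_percent = 100 * (fa_gates - rfa_gates) / fa_gates

print(fa_gates, rfa_gates, round(saving_percent, 1))
```

Under these unit costs the RFA needs 4 fewer gates than the FA, a reduction of roughly 31% per adder cell.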

Fig. 5.2. Architecture of MCSLA Structure

In the MCSLA, the gate count of the full adder circuit is greatly reduced. The RFA circuit is incorporated into the CSLA circuit, and the resulting adder is called the MCSLA. The MCSLA therefore consumes less area than the CSLA.

CHAPTER 6

XILINX Software
Xilinx Tools is a suite of software tools used for the design of digital circuits implemented
using Xilinx Field Programmable Gate Array (FPGA) or Complex Programmable Logic
Device (CPLD). The design procedure consists of (a) design entry, (b) synthesis and
implementation of the design, (c) functional simulation and (d) testing and verification. Digital
designs can be entered in various ways using the above CAD tools: using a schematic entry tool,
using a hardware description language (HDL) such as Verilog or VHDL, or a combination of both. In
this lab we will only use the design flow that involves the use of Verilog HDL.

The CAD tools enable you to design combinational and sequential circuits starting with Verilog
HDL design specifications. The steps of this design procedure are listed below:

1. Create Verilog design input file(s) using the template-driven editor.
2. Compile and implement the Verilog design file(s).
3. Create the test vectors and simulate the design (functional simulation) without using a
PLD (FPGA or CPLD).
4. Assign input/output pins to implement the design on a target device.
5. Download the bitstream to an FPGA or CPLD device.
6. Test the design on the FPGA/CPLD device.

A Verilog input file in the Xilinx software environment consists of the following segments:

Header: module name, list of input and output ports.


Declarations: input and output ports, registers and wires.
Logic Descriptions: equations, state machines and logic functions.
End: endmodule

All your designs for this lab must be specified in the above Verilog input format. Note that the
state diagram segment does not exist for combinational logic designs.

2. Programmable Logic Device: FPGA

In this lab digital designs will be implemented in the Basys2 board which has a Xilinx Spartan-3E
XC3S250E FPGA in a CP132 package. This FPGA part belongs to the Spartan family of
FPGAs. These devices come in a variety of packages. We will be using devices that are
packaged in a 132-pin package with the following part number: XC3S250E-CP132. This FPGA is
a device with about 250K system gates. Detailed information on this device is available at the Xilinx
website.

3. Creating a New Project
Xilinx Tools can be started by clicking on the Project Navigator Icon on the Windows desktop.
This should open up the Project Navigator window on your screen. This window shows (see
Figure 1) the last accessed project.

Figure 1: Xilinx Project Navigator window (snapshot from Xilinx ISE software)

3.1 Opening a project

Select File->New Project to create a new project. This will bring up a new project window
(Figure 2) on the desktop. Fill up the necessary entries as follows:

Figure 2: New Project Initiation window (snapshot from Xilinx ISE software)

Project Name: Write the name of your new project


Project Location: The directory where you want to store the new project (Note: DO NOT
specify the project location as a folder on Desktop or a folder in the Xilinx\bin directory.
Your H: drive is the best place to put it. The project location path is NOT to have any spaces
in it eg: C:\Nivash\TA\new lab\sample exercises\o_gate is NOT to be used)

Leave the top level module type as HDL.

Example: If the project name were “o_gate”, enter “o_gate” as the project name and then click
“Next”.

Clicking on NEXT should bring up the following window:

Figure 3: Device and Design Flow of Project (snapshot from Xilinx ISE software)

For each of the properties given below, click on the ‘value’ area and select from the list of
values that appear.
o Device Family: Family of the FPGA/CPLD used. In this laboratory we will be
using the Spartan3E FPGAs.
o Device: The number of the actual device. For this lab you may enter XC3S250E
(this can be found on the attached prototyping board).
o Package: The type of package with the number of pins. The Spartan FPGA used in
this lab is packaged in the CP132 package.
o Speed Grade: The speed grade is “-4”.
o Synthesis Tool: XST [VHDL/Verilog]
o Simulator: The tool used to simulate and verify the functionality of the design.
The Modelsim simulator is integrated in the Xilinx ISE. Hence choose “Modelsim-XE
Verilog” as the simulator; the Xilinx ISE Simulator can also be used.
o Then click on NEXT to save the entries.

All project files such as schematics, netlists, Verilog files, VHDL files, etc., will be stored in a
subdirectory with the project name. A project can only have one top level HDL source file (or
schematic). Modules can be added to the project to create a modular, hierarchical design (see
Section 9).

In order to open an existing project in Xilinx Tools, select File->Open Project to show the list
of projects on the machine. Choose the project you want and click OK.

Clicking on NEXT on the above window brings up the following window:

Figure 4: Create New source window (snapshot from Xilinx ISE software)

If creating a new source file, Click on the NEW SOURCE.

3.2 Creating a Verilog HDL input file for a combinational logic design

In this lab we will enter a design using a structural or RTL description using the Verilog HDL.
You can create a Verilog HDL input file (.v file) using the HDL Editor available in the Xilinx
ISE Tools (or any text editor).

In the previous window, click on the NEW SOURCE

A window pops up as shown in Figure 5. (Note: the “Add to project” option is selected by default. If
you do not select it then you will have to add the new source file to the project manually.)

Figure 5: Creating Verilog-HDL source file (snapshot from Xilinx ISE software)

Select Verilog Module and in the “File Name:” area, enter the name of the Verilog source file
you are going to create. Also make sure that the option Add to project is selected so that the
source need not be added to the project again. Then click on Next to accept the entries. This pops
up the following window (Figure 6).

Figure 6: Define Verilog Source window (snapshot from Xilinx ISE software)

In the Port Name column, enter the names of all input and output pins and specify the Direction
accordingly. A Vector/Bus can be defined by entering appropriate bit numbers in the MSB/LSB
columns. Then click on Next> to get a window showing all the new source information (Figure 7). If
any changes are to be made, just click on <Back to go back and make changes. If everything is
acceptable, click on Finish > Next > Next > Finish to continue.

Figure 7: New Project Information window (snapshot from Xilinx ISE software)

Once you click on Finish, the source file will be displayed in the sources window in the
Project Navigator (Figure 1).

If a source has to be removed, just right click on the source file in the Sources in Project
window in the Project Navigator and select Remove. Then select Project -> Delete
Implementation Data from the Project Navigator menu bar to remove any related files.

3.3 Editing the Verilog source file

The source file will now be displayed in the Project Navigator window (Figure 8). The source
file window can be used as a text editor to make any necessary changes to the source file. All
the input/output pins will be displayed. Save your Verilog program periodically by selecting
File->Save from the menu. You can also edit Verilog programs in any text editor and add them to the
project directory using “Add Copy Source”.

Figure 8: Verilog Source code editor window in the Project Navigator (from Xilinx ISE
software)

Adding Logic in the generated Verilog Source code template:

A brief Verilog Tutorial is available in Appendix-A. Hence, the language syntax and
construction of logic equations can be referred to Appendix-A.

The Verilog source code template generated shows the module name, the list of ports and
also the declarations (input/output) for each port. Combinational logic code can be added
to the verilog code after the declarations and before the endmodule line.

For example, an output z in an OR gate with inputs a and b can be described as,
assign z = a | b;
Remember that the names are case sensitive.

Other constructs for modeling the logic function:

A given logic function can be modeled in many ways in verilog. Here is another example
in which the logic function, is implemented as a truth table using a case statement:

module or_gate(a, b, z);
  input a;
  input b;
  output z;
  reg z;

  // Truth table of the OR gate, selected by the 2-bit value {a, b}
  always @(a or b) begin
    case ({a, b})
      2'b00: z = 1'b0;
      2'b01: z = 1'b1;
      2'b10: z = 1'b1;
      2'b11: z = 1'b1;
    endcase
  end
endmodule

Suppose we want to describe an OR gate. It can be done using the logic equation as shown in
Figure 9 or using the case statement (describing the truth table) as shown in Figure 10. These
are just two example constructs to design a logic function. Verilog offers numerous such
constructs to efficiently model designs. A brief tutorial of Verilog is available in Appendix-A.

Figure 9: OR gate description using assign statement (snapshot from Xilinx ISE
software)

Figure 10: OR gate description using case statement (from Xilinx ISE software)

4. Synthesis and Implementation of the Design

The design has to be synthesized and implemented before it can be checked for correctness, by running
functional simulation or downloaded onto the prototyping board. With the top-level Verilog file opened
(can be done by double-clicking that file) in the HDL editor window in the right half of the Project
Navigator, and the view of the project being in the Module view , the implement design option can be
seen in the process view. Design entry utilities and Generate Programming File options can also be
seen in the process view. The former can be used to include user constraints, if any and the latter will
be discussed later.

To synthesize the design, double click on the Synthesize Design option in the Processes window.

To implement the design, double click the Implement Design option in the Processes window. It will
go through steps like Translate, Map and Place & Route. If any of these steps could not be completed or
completed with errors, an X mark is placed in front of it; otherwise a tick mark is placed after each
of them to indicate successful completion. If everything is done successfully, a tick mark will be
placed before the Implement Design option. If there are warnings, a warning mark appears in front of
the option. One can look at the warnings or errors in the Console window at the bottom of the
Navigator window. Every time the design file is saved, all these marks disappear, asking for a fresh
compilation.

Figure 11: Implementing the Design (snapshot from Xilinx ISE software)

The schematic diagram of the synthesized verilog code can be viewed by double clicking View RTL
Schematic under Synthesize-XST menu in the Process Window. This would be a handy way to debug
the code if the output does not meet our specifications on the prototype board.

By double clicking it opens the top level module showing only input(s) and output(s) as shown below.

Figure 12: Top Level Hierarchy of the design

By double clicking the rectangle, it opens the realized internal logic as shown
below.

Figure 13: Realized logic by the Xilinx ISE for the Verilog code

5. Functional Simulation of Combinational Designs


5.1 Adding the test vectors

To check the functionality of a design, we have to apply test vectors and simulate the circuit. In order
to apply test vectors, a test bench file is written. Essentially it will supply all the inputs to the module
designed and will check the outputs of the module. Example: For the 2 input OR Gate, the steps to
generate the test bench are as follows:

In the Sources window (top left corner) right click on the file that you want to generate the test
bench for and select ‘New Source’

Provide a name for the test bench in the file name text box and select ‘Verilog test fixture’
among the file types in the list on the right side as shown in Figure 14.

Figure 14: Adding test vectors to the design (snapshot from Xilinx ISE software)

Click on ‘Next’ to proceed. In the next window select the source file with which you want to
associate the test bench.

Figure 15: Associating a module to a testbench (snapshot from Xilinx ISE software)

Click on Next to proceed. In the next window click on Finish. You will now be provided with a
template for your test bench. If it does not open automatically click the radio button next to
Simulation .

You should now be able to view your test bench template. The code generated would be something like
this:
module o_gate_tb_v;
  // Inputs
  reg a;
  reg b;
  // Outputs
  wire z;

  // Instantiate the Unit Under Test (UUT)
  o_gate uut (
    .a(a),
    .b(b),
    .z(z)
  );

  initial begin
    // Initialize Inputs
    a = 0;
    b = 0;

    // Wait 100 ns for global reset to finish
    #100;

    // Add stimulus here

  end
endmodule

The Xilinx tool detects the inputs and outputs of the module that you are going to test and assigns them
initial values. In order to test the gate completely we shall provide all the different input combinations.
‘#100’ is the time delay for which the input has to maintain the current value. After 100 units of time have
elapsed the next set of values can be assigned to the inputs.
Complete the test bench as shown below:

module o_gate_tb_v;
  // Inputs
  reg a;
  reg b;
  // Outputs
  wire z;

  // Instantiate the Unit Under Test (UUT)
  o_gate uut (
    .a(a),
    .b(b),
    .z(z)
  );

  initial begin
    // Initialize Inputs
    a = 0;
    b = 0;
    // Wait 100 ns for global reset to finish
    #100;

    a = 0;
    b = 1;
    // Hold this input combination for 100 ns
    #100;

    a = 1;
    b = 0;
    // Hold this input combination for 100 ns
    #100;

    a = 1;
    b = 1;
    // Hold this input combination for 100 ns
    #100;
  end
endmodule
Save your test bench file using the File menu.

5.2 Simulating and Viewing the Output Waveforms

Now under the Processes window (making sure that the testbench file in the Sources window is
selected) expand the ModelSim simulator tab by clicking on the plus sign next to it. Double click on
Simulate Behavioral Model. You will probably receive a compiler error. This is nothing to worry
about; answer “No” when asked if you wish to abort simulation. This should cause ModelSim to open. Wait for
it to complete execution. If you do not wish to receive the compiler error, right click on Simulate Behavioral
Model and select Process Properties. Mark the
checkbox next to “Ignore Pre-Compiled Library Warning Check”.

Figure 16: Simulating the design (snapshot from Xilinx ISE software)

5.3 Saving the simulation results

To save the simulation results, go to the waveform window of the Modelsim simulator and click on File ->
Print to Postscript, then give the desired filename and location.

Note that by default, the waveform is “zoomed in” to the nanosecond level. Use the zoom
controls to display the entire waveform.

Alternatively, a normal print screen option can be used on the waveform window and the image
subsequently stored in Paint.

Figure 17: Behavioral Simulation output Waveform (Snapshot from ModelSim)


For taking printouts for the lab reports, convert the black background to white in Tools -> Edit
Preferences. Then click Wave Windows -> Wave Background attribute.

Figure 18: Changing Waveform Background in ModelSim

CHAPTER 7
RESULTS AND SIMULATION

CHAPTER 8

CONCLUSION

A combined R2B, R4B and R8B based SDF FFT has been designed in this research work. The Radix-2 SDF FFT has higher hardware utilization and more computational stages. To overcome this problem, a combined R2B, R4B and R8B butterfly structure based SDF FFT method is developed in this work. The proposed design offers a 53.38% reduction in slices, a 56.27% reduction in LUTs, a 32.35% reduction in delay and a 27.36% reduction in power consumption compared with the conventional method. Compared with the conventional method, the proposed method reduces the number of computational stages and gives better performance.

REFERENCES

Fahad Qureshi, Muazam Ali, and Jarmo Takala, “Multiplier-less Reconfigurable Processing Element
for Mixed Radix-2/3/4/5 FFTs”, International Conference on 2017 IEEE.

Arathi Ajay and Dr. R. Mary Lourde, “VLSI Implementation of an improved multiplier for FFT
Computation in Biomedical Applications”, Computer Society Annual Symposium on VLSI, PP.
68-73, IEEE 2015.

V. Arunachalam and Alex Noel Joseph Raj, “Efficient VLSI implementation of FFT for orthogonal
frequency division multiplexing applications”, Published in IET Circuits, Devices & Systems, Vol.
8, No. 06, PP. 526-531, IET 2014.

Jienan Chen, Jianhao Hu and Shuyang Lee, “Hardware Efficient Mixed Radix-25/16/9 FFT for LTE
Systems”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, PP. 01-09, IEEE
2014.

Sidinei Ghissoni, Eduardo A. C. da Costa and Angelo Gonçalves da Luz, “Implementation of Power
Efficient Multicore FFT Data paths by Reordering the Twiddle Factors”, PP. 4562-4568, IEEE
2014.

Naman Govil and Shubhajit Roy Chowdhury, “High Performance and Low Cost Implementation of
Fast Fourier Transform Algorithm based on Hardware Software Co-design”, IEEE Region 10
Symposium, PP. 403-407, IEEE 2014.

Harpreet Singh Dhillon and Abhijit Mitra, “A Reduced-bit Multiplication Algorithm for Digital
Arithmetic”, International Journal of Computational and Mathematical Sciences, February 2008,
pp. 64-69.

Sumit Vaidya and Depak Dandekar. “Delay-power performance comparison of multipliers in VLSI
circuit design”. International Journal of Computer Networks & Communications (IJCNC), Vol.2,
No.4, July 2010.
