Module 1

Uploaded by kgaddigoudar
Copyright © All Rights Reserved
Module 1:

Introduction to ASICs
TEXT 1:
APPLICATION-SPECIFIC INTEGRATED CIRCUITS
- MICHAEL JOHN SEBASTIAN SMITH
Introduction:

• ASIC (pronounced "a-sick") is an acronym for Application-Specific Integrated Circuit.
• As the name indicates, an ASIC is a nonstandard integrated circuit that is designed for a specific use or application.
• Generally an ASIC design is undertaken for a product that will have a large production run, and the ASIC may contain a very large part of the electronics needed on a single integrated circuit.
Examples of ASICs are:

1) A chip for a toy bear that talks.
2) A chip for a satellite.
3) A chip designed to handle the interface between memory and a microprocessor for a workstation CPU.
4) A chip containing a microprocessor as a cell together with other logic.
ASSPs:

• Two ICs that might or might not be considered ASICs are a controller chip for a PC and a chip for a modem.
• Both of these examples are specific to an application (shades of an ASIC) but are sold to many different system vendors (shades of a standard part).
• ASICs such as these are sometimes called application-specific standard products (ASSPs).
Types of ASICs:
The classification of ASICs is shown in the accompanying figure. ASICs are broadly classified into three types:
• I. Full-custom ASICs
• II. Semicustom ASICs
• III. Programmable ASICs
Full-Custom ASICs:

• A full-custom ASIC is one in which some (possibly all) logic cells are customized and all mask layers are customized.
• A microprocessor is an example of a full-custom IC. Designers spend many hours squeezing the most out of every last square micron of microprocessor chip space by hand.
• Customizing all of the IC features in this way allows designers to include analog circuits, optimized memory cells, or mechanical structures on an IC, for example.
Note: Full-custom ICs are the most expensive to manufacture and to design.
• The manufacturing lead time (the time required just to make an IC, not including design time) is typically eight weeks for a full-custom IC.
• These specialized full-custom ICs are often intended for a specific application, so we might call some of them full-custom ASICs.
• In a full-custom ASIC an engineer designs some or all of the logic cells, circuits, or layout specifically for one ASIC. This means the designer avoids using pretested and precharacterized cells for all or part of that design.
• This might be because existing cell libraries are not fast enough, or the logic cells are not small enough or consume too much power.
When to use full-custom ASIC design:
• One has to use full-custom design if the ASIC technology is new or so specialized that there are no existing cell libraries, or because the ASIC is so specialized that some circuits must be custom designed.
• Fewer and fewer full-custom ICs are being designed because of the problems with these special parts of the ASIC.
• The growing member of this family nowadays is the mixed analog/digital ASIC.
Semicustom ASICs:

• ASICs for which all of the logic cells are predesigned and some (possibly all) of the mask layers are customized are called semicustom ASICs.
• Using predesigned cells from a cell library makes the design much easier.
• There are two types of semicustom ASICs:
(i) Standard-cell-based ASICs
(ii) Gate-array-based ASICs
Standard-Cell-Based ASICs:
• A cell-based ASIC (cell-based IC, or CBIC, pronounced "sea-bick") uses predesigned logic cells (AND gates, OR gates, multiplexers, and flip-flops, for example) known as standard cells.
• One can apply the term CBIC to any IC that uses cells, but it is generally accepted that a cell-based ASIC or CBIC means a standard-cell-based ASIC.
• The standard-cell areas (also called flexible blocks) in a CBIC are built of rows of standard cells, like a wall built of bricks. The standard-cell areas may be used in combination with larger predesigned cells, such as microcontrollers or even microprocessors, known as megacells. Megacells are also called megafunctions, full-custom blocks, system-level macros (SLMs), fixed blocks, cores, or functional standard blocks (FSBs).
Figure: A cell-based ASIC (CBIC) die with a single standard-cell area (a flexible block) together with four fixed blocks.
• The ASIC designer defines only the placement of the standard cells and the interconnect in a CBIC area. However, the standard cells can be placed anywhere on the silicon; this means that all the mask layers of a CBIC are customized and are unique to a particular customer.
• The advantage of CBICs is that designers save time and money and reduce risk by using a predesigned, pretested, and precharacterized standard-cell library.
• In addition, each standard cell can be optimized individually. During the design of the cell library, each and every transistor in every standard cell can be chosen to maximize speed or minimize area.
DISADVANTAGES:
• The disadvantages are the time or expense of designing or buying the standard-cell library and the time needed to fabricate all layers of the ASIC for each new design.
Figure: Layout of a standard cell.
Figure: Routing in a CBIC.
Gate-Array-Based ASICs:
• In a gate array (sometimes abbreviated GA) or gate-array-based ASIC, the transistors are predefined on the silicon wafer.
• The predefined pattern of transistors on a gate array is the base array, and the smallest element that is replicated to make the base array is the base cell (sometimes called a primitive cell).
• Only the top few layers of metal, which define the interconnect between transistors, are defined by the designer using custom masks. To distinguish this type of gate array from other types of gate array, it is often called a masked gate array (MGA).
• The designer chooses from a gate-array library of predesigned and precharacterized logic cells.
• The logic cells in a gate-array library are often called macros. The reason for this is that the base-cell layout is the same for each logic cell, and only the interconnect (inside cells and between cells) is customized, which is similar to a software macro.
Features of MGA:
• Only the interconnect is customized.
• The interconnect uses predefined spaces for interconnect.
• Manufacturing lead time is between two days and two weeks.
Types of MGA or Gate-array based ASICs
There are three types of Gate Array based ASICs:
• Channeled gate arrays.
• Channelless gate arrays.
• Structured gate arrays.
Channeled gate arrays:
• The channeled gate array was the first to be developed. In a channeled gate array, space is left between the rows of transistors for wiring.
• A channeled gate array is similar to a CBIC. Both use rows of cells separated by channels used for interconnect. One difference is that the space for interconnect between rows of cells is fixed in height in a channeled gate array, whereas the space between rows of cells may be adjusted in a CBIC.
Features of a channeled gate array:
• Only the interconnect is customized.
• The interconnect uses predefined spaces between the rows of base cells.
• Manufacturing lead time is between two days and two weeks.
Channelless Gate Arrays:
• The channelless gate-array architecture is now more widely used. The routing on a channelless gate array uses rows of unused transistors.
• The key difference between a channelless gate array and a channeled gate array is that there are no predefined areas set aside for routing between cells on a channelless gate array. Instead, we route over the top of the gate-array devices. We can do this because we customize the contact layer that defines the connections between metal 1, the first layer of metal, and the transistors.
Features of a channelless gate array (sea-of-gates, SOG):

• Only the interconnect is customized.
• The interconnect uses predefined spaces on top of unused base cells.
• Manufacturing lead time is around two days to two weeks.
• When we use an area of transistors for routing in a channelless array, we do not make any contacts to the devices lying underneath; we simply leave the transistors unused.
• The logic density (the amount of logic that can be implemented in a given silicon area) is higher for channelless gate arrays than for channeled gate arrays. This is usually attributed to the difference in structure between the two types of array.
• In fact, the difference occurs because the contact mask is customized in a channelless gate array but is not usually customized in a channeled gate array.
• This leads to denser cells in the channelless architectures. Customizing the contact layer in a channelless gate array allows us to increase the density of gate-array cells because we can route over the top of unused contact sites.
Structured Gate Arrays:
• This design combines some of the features of CBICs and MGAs. It is also known as an embedded gate array or structured gate array (also called a master slice or master image).
• One of the limitations of the MGA is the fixed gate-array base cell. This makes the implementation of memory difficult and inefficient.
• In an embedded gate array, some of the IC area is set aside and dedicated to a specific function. This embedded area can contain either a different base cell that is more suitable for building memory cells, or a complete circuit block, such as a microcontroller.
Features of a structured gate array:
• Only the interconnect is customized.
• Custom blocks (the same for each design) can be embedded.
• Manufacturing lead time is between two days and two weeks.
• An embedded gate array gives the improved area efficiency and increased performance of a CBIC but with the lower cost and faster turnaround of an MGA.
• The disadvantage of an embedded gate array is that the embedded function is fixed.
• For example, if an embedded gate array contains an area set aside for a 32-kbit memory but we only need a 16-kbit memory, then we may have to waste half of the embedded memory function. However, this may still be more efficient and cheaper than implementing a 32-kbit memory using macros on an SOG array.
Programmable Logic Devices:
• Programmable logic devices (PLDs) are standard ICs that are available in standard configurations.
• However, PLDs may be configured or programmed to create a part customized to a specific application, and so they also belong to the family of ASICs.
• PLDs use different technologies to allow programming of the device.
Features of PLDs:
• No customized mask layers or logic cells.
• Fast design turnaround.
• A single large block of programmable interconnect.
• A matrix of logic macrocells that usually consist of programmable array logic followed by a flip-flop or latch.
• The simplest type of programmable IC is a read-only memory (ROM). The most common types of ROM use a metal fuse that can be blown permanently (a programmable ROM, or PROM).
• An electrically programmable ROM, or EPROM, uses programmable MOS transistors whose characteristics are altered by applying a high voltage.
• One can erase an EPROM either by using another high voltage (an electrically erasable PROM, or EEPROM) or by exposing the device to ultraviolet light (UV-erasable PROM, or UVPROM).
• There is another type of ROM that can be placed on any ASIC: a mask-programmable ROM (mask-programmed ROM or masked ROM). A masked ROM is a regular array of transistors permanently programmed using custom mask patterns. So an embedded masked ROM is a large, specialized logic cell.
• Depending on how they are programmed, PLDs can be grouped into:
◦ 1) Erasable PLDs
◦ 2) Mask-programmed PLDs
• Early PLAs, PALs, and PLDs used bipolar technology with programmable fuses or links.
• CMOS PLDs usually employ floating-gate transistors.
• The same programming technology can be used to build other flexible logic structures.
• By using programmable devices in a large array of AND gates and OR gates, we can create a family of flexible and programmable logic devices called logic arrays.
• Monolithic Memories was the first company to produce programmable array logic (PAL) devices, which can be used as transition decoders for state-machine design. PALs may also include flip-flops or registers, so that a complete state machine can be built in one device.
• Just as a masked ROM can be placed on a custom ASIC, we can place a logic array as a cell on a custom ASIC; this type of logic array is called a programmable logic array (PLA).
• The difference between a PAL and a PLA is that a PLA has both an AND plane and an OR plane that are programmable, whereas a PAL has only a programmable AND plane (its OR plane is fixed).
Field-Programmable Gate Arrays (FPGAs):
• FPGAs are the newest member of the ASIC family and are rapidly growing in importance, replacing TTL in microelectronic systems; they are also a step above PLDs in complexity. Even though an FPGA is a type of gate array, we do not consider the term gate-array-based ASICs to include FPGAs.
• There is very little difference between an FPGA and a PLD. An FPGA is usually just larger and more complex than a PLD. In fact, some vendors that manufacture programmable ASICs call their products FPGAs and some call them complex PLDs.
Characteristics of an FPGA:
• None of the mask layers are customized.
• There is a method for programming the basic logic
cells and the interconnect.
• The core is a regular array of programmable basic
logic cells that can implement combinational as well
as sequential logic (flip-flops).
• A matrix of programmable interconnect surrounds
the basic logic cells.
• Programmable I/O cells surround the core.
• Design turnaround is a few hours.
• The architecture consists of configurable logic blocks, configurable I/O blocks, and programmable interconnect. There is also clock circuitry for driving the clock signals to each logic block, and additional logic resources such as ALUs, memory, and decoders may be available.
• The two basic types of programmable elements for an FPGA are:
◦ 1) static RAM
◦ 2) antifuses
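In SRAM-programmed FPGAs the basic logic cell is commonly a lookup table (LUT): the configuration SRAM holds a truth table and the cell inputs select one bit of it. The Python sketch below assumes a 4-input LUT; `make_lut`, `lut_read` and the bit ordering are hypothetical choices for illustration, not a particular vendor's architecture. Antifuse FPGAs make permanent connections instead and are not modeled here.

```python
from typing import Callable


def make_lut(func: Callable[[int, int, int, int], int]) -> int:
    """'Program' a 4-input LUT: return its 16 configuration bits as an integer.

    Configuration bit k holds func evaluated at the input combination whose
    binary value is k (input a is the least significant select bit).
    """
    config = 0
    for k in range(16):
        a, b, c, d = k & 1, (k >> 1) & 1, (k >> 2) & 1, (k >> 3) & 1
        config |= (func(a, b, c, d) & 1) << k
    return config


def lut_read(config: int, a: int, b: int, c: int, d: int) -> int:
    """Evaluate the LUT: the four inputs address one of the 16 SRAM bits."""
    address = a | (b << 1) | (c << 2) | (d << 3)
    return (config >> address) & 1


# Reprogramming the cell changes only the stored bits, not the hardware.
AND4 = make_lut(lambda a, b, c, d: a & b & c & d)
XOR4 = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d)
print(hex(AND4), hex(XOR4))  # 0x8000 0x6996
```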
Design Flow:
• The sequence of steps to design an ASIC is known as the design flow. The various steps involved in the ASIC design flow are given below:
1. Design entry: the microarchitecture is implemented in a hardware description language (HDL) such as VHDL, Verilog, or SystemVerilog.
• In the early days, a schematic editor was used for design entry, and designers instantiated gates. The increased complexity of current designs requires HDLs for productivity. Another advantage is that HDLs are independent of process technology and hence can be reused over time.
2. Logic synthesis: use an HDL (VHDL or Verilog) and a logic synthesis tool to produce a netlist, a description of the logic cells and their connections.
3. System partitioning: divide a large system into ASIC-sized pieces.
4. Prelayout simulation: check to see whether the design functions correctly.
5. Floorplanning: arrange the blocks of the netlist on the chip.
6. Placement: decide the locations of cells in a block.
7. Routing: make the connections between cells and blocks.
8. Extraction: determine the resistance and capacitance of the interconnect.
9. Postlayout simulation: check to see whether the design still works with the added loads of the interconnect. The flow diagram is shown in the next slide.
• In the flow diagram, steps 1 to 4 are part of logical design, and steps 5 to 9 are part of physical design.
• When performing system partitioning, we have to consider both logical and physical factors.
ASIC CELL LIBRARIES:
• For a programmable ASIC, the FPGA company provides us with a library of logic cells in the form of a design kit (usually there is no option other than to buy it from them).
• For MGAs and CBICs we have three choices:
◦ 1) the ASIC vendor supplies the cell library;
◦ 2) we buy a cell library from a third-party library vendor;
◦ 3) we build our own library.
• The first choice requires us to use a set of design tools approved by the ASIC vendor to enter and simulate our design.
• We have to buy the tools, and the cost of the cell library is folded into the NRE (nonrecurring engineering) charge. Some ASIC vendors supply tools that they have developed in-house.
• An ASIC vendor library is usually a phantom library (the cells are empty boxes, or phantoms), but it contains enough information for layout (such as bounding boxes or abutment boxes).
• After completing the design, we hand over the netlist to the ASIC vendor, who fills in the empty boxes before manufacturing the chip.
• The second and third choices require us to make a buy-or-build decision.
• If we complete an ASIC design using a cell library that we bought, then we own the masks (the tooling) that are used to manufacture our ASICs; this is called customer-owned tooling (COT).
• A library vendor normally develops a library using information supplied by an ASIC foundry. An ASIC foundry only provides manufacturing, without design help. If the cell library meets the foundry specifications, it is called a qualified cell library. These libraries are normally expensive, but if a library is qualified at several foundries, buying an expensive library can be cheaper in the long run for high-volume production.
• The third option is to develop a library in-house. Large companies make this choice even though it is a complex and expensive process.
• However it is created, each cell in a library should have:
1) a physical layout
2) a behavioral model
3) a Verilog/VHDL model
4) a detailed timing model
5) a test strategy
6) a circuit schematic
7) a cell icon
8) a wire-load model
9) a routing model
• The ASIC designer needs a behavioral model because simulation at the detailed timing level takes too long.
• ASIC designers also need a detailed timing model for each cell to determine the performance of the critical components of an ASIC.
• It is too difficult, too expensive, and too time-consuming to build every cell in silicon and measure the cell delays.
• Hence library engineers simulate the design of each cell (characterization).
• Characterization of a standard cell or gate-array cell involves circuit extraction from the full-custom layout of each cell; the extracted schematic includes all parasitic resistance and capacitance.
• The library engineers perform a simulation of each cell, including the parasitic elements, to determine the switching delays.
• The simulation models for the transistors are derived from measurements on special chips included on a wafer, called process control monitors or drop-ins.
• Library engineers then use the results of circuit simulation to generate a detailed timing model.
• All ASICs need to be production tested. Simple blocks can be tested using automated techniques, but complex blocks such as RAMs or multipliers need a planned test strategy.
• The cell schematic describes each cell so that the cell designer can perform simulation for complex cells. At minimum, it needs enough information to run an LVS (layout versus schematic) check.
• For schematic entry, each cell needs an icon together with connector and naming information; these can be avoided by using logic synthesis. Synthesis also makes it easier to retarget a design and to move between cell libraries.
• A statistical estimate (often stored as a look-up table) of the interconnect parasitics, used before the actual wiring is known, is called a wire-load model.
• We also require a routing model for each cell.
• Large cells have far too much information for automated tools to handle, hence we need a phantom representation of large cells.
• These phantom versions of the physical layout do not have detailed layout information, but they still have enough information to tell the automated tools where to route and where not to route on the cell, along with the location and type of the connections to the cell.
Datapath logic cells:
What is a datapath?
The layout of bus-wide logic that operates on data signals is called a datapath.

What is the difference between datapath and standard cells?
• In a CBIC or MGA, cells are placed together in rows, but generally there is no regularity to the arrangement of the cells within the rows; we let software arrange the cells and complete the interconnect.
• Just as we do for a standard-cell library, we make all datapath cells in the library the same height so that we can abut other datapath cells on either side of a cell (the ADD module, for example) to create a more complex datapath.
• It is normally assumed that the datapath grows in height as the number of data bits increases and grows in width as the number of functions increases, but we can always rotate and position a completed datapath in any direction we want on a chip.
• Datapath layout automatically takes care of most of the interconnect between the cells, with the following advantages:
◦ - Regular layout, which produces predictable and equal delay for each bit.
◦ - Interconnect between cells can be built into each cell.
• Disadvantages of a datapath:
◦ - overhead (buffering and routing control signals)
◦ - harder design (cells must be predesigned for use in a wide range of datapath sizes)
◦ - more complex software
Suppose we want to design a full adder (FA):
◦ - SUM = A ⊕ B ⊕ CIN = PARITY(A, B, CIN)
◦ - COUT = A·B + A·CIN + B·CIN = MAJ(A, B, CIN)
• Combine the two functions into a single FA logic cell:
ADD(A[i], B[i], CIN, S[i], COUT)
• The module ADD here is a datapath cell or datapath element.
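The ADD cell and the way abutted cells form a ripple-carry datapath can be modeled bit by bit. This Python sketch is a behavioral model under the assumption that a bus is a list of bits with index 0 as the LSB; `add_cell`, `ripple_add`, `to_bits` and `from_bits` are hypothetical names chosen to mirror ADD(A[i], B[i], CIN, S[i], COUT).

```python
def parity(a: int, b: int, c: int) -> int:
    """PARITY(A, B, CIN) = A xor B xor CIN."""
    return a ^ b ^ c


def maj(a: int, b: int, c: int) -> int:
    """MAJ(A, B, CIN) = A.B + A.CIN + B.CIN."""
    return (a & b) | (a & c) | (b & c)


def add_cell(a: int, b: int, cin: int) -> tuple[int, int]:
    """One ADD datapath cell: returns (S[i], COUT)."""
    return parity(a, b, cin), maj(a, b, cin)


def ripple_add(a_bits: list[int], b_bits: list[int], cin: int = 0) -> tuple[list[int], int]:
    """Abut len(a_bits) ADD cells: COUT of cell i drives CIN of cell i + 1."""
    s_bits = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        s, carry = add_cell(a, b, carry)
        s_bits.append(s)
    return s_bits, carry


def to_bits(x: int, n: int) -> list[int]:
    """Integer -> list of n bits, index 0 = LSB."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits: list[int]) -> int:
    """List of bits (index 0 = LSB) -> integer."""
    return sum(bit << i for i, bit in enumerate(bits))


s_bits, cout = ripple_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s_bits), cout)  # 1 1  (11 + 6 = 17 = 16 + 1)
```

Because every bit position uses the identical cell, the loop body corresponds to one column of the datapath, and the carry variable corresponds to the wire that runs vertically between abutted cells.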
Figure: A 4-bit full adder in a datapath.
Datapath Elements:
Symbols for datapath elements (adder). Rule:
◦ - 0.5-point line for control signals/buses.
◦ - 1.5-point line with a stroke for data signals/buses.
ADDERS:
Generate and Propagate:
If A = 1 and B = 1, a carry is generated no matter what the value of the carry input (Ci) is, i.e. generate G = A · B.
If only A = 1 or only B = 1, the value of the carry input is propagated to the carry output, i.e. propagate P = A ⊕ B (or P = A + B).
Method 1:
G[i] = A[i] · B[i]
P[i] = A[i] ⊕ B[i] (MUX-based)
C[i] = G[i] + P[i] · C[i–1]
S[i] = P[i] ⊕ C[i–1]

Method 2:
G[i] = A[i] · B[i]
P[i] = A[i] + B[i] (AND/OR structure for the carry out)
C[i] = G[i] + P[i] · C[i–1]
S[i] = A[i] ⊕ B[i] ⊕ C[i–1]
Carry signal (critical path):
The carry out can be written as:
C[i] = A[i] · B[i] + P[i] · C[i–1] ≡ G[i] + P[i] · C[i–1]
or
C[i] = (A[i] + B[i]) · (P[i]' + C[i–1]),
where P[i]' = NOT(P[i]).
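Since the two methods and the two carry expressions are claimed to be equivalent, an exhaustive check over all eight input combinations is a quick sanity test. This is a Python sketch with hypothetical function names; bits are 0/1 integers. Note that it assumes P[i] = A[i] ⊕ B[i] in the product-of-sums carry form: with P[i] = A[i] + B[i] that form would give C[i] = C[i–1] when A[i] = B[i] = 1, which is wrong.

```python
def method1(a: int, b: int, c_prev: int) -> tuple[int, int]:
    """Method 1: G = A.B, P = A xor B, S = P xor C[i-1]. Returns (C[i], S[i])."""
    g, p = a & b, a ^ b
    return g | (p & c_prev), p ^ c_prev


def method2(a: int, b: int, c_prev: int) -> tuple[int, int]:
    """Method 2: G = A.B, P = A + B, S = A xor B xor C[i-1]. Returns (C[i], S[i])."""
    g, p = a & b, a | b
    return g | (p & c_prev), a ^ b ^ c_prev


def carry_product_of_sums(a: int, b: int, c_prev: int) -> int:
    """C[i] = (A[i] + B[i]) . (P[i]' + C[i-1]), with P[i] = A[i] xor B[i]."""
    p_bar = (a ^ b) ^ 1          # P[i]' for single-bit values
    return (a | b) & (p_bar | c_prev)


# Exhaustive check over all combinations of A[i], B[i], C[i-1].
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            total = a + b + c                 # arithmetic reference
            expected = (total >> 1, total & 1)
            assert method1(a, b, c) == method2(a, b, c) == expected
            assert carry_product_of_sums(a, b, c) == expected[0]
print("both methods and both carry forms agree")
```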
Alternative implementation of an RCA with a carry chain using two-input NAND gates, one per cell, and an extra connection.
Considering only the carry terms (critical path):
Even stages:
C1[i]' = P[i] · C3[i–1] · C4[i–1]
C2[i] = A[i] + B[i]
C[i] = C1[i] · C2[i]
Odd stages:
C3[i]' = P[i] · C1[i–1] · C2[i–1]
C4[i]' = A[i] · B[i]
C[i] = C3[i]' + C4[i]'
The carry inputs to stage zero of (b) are C3[–1] = C4[–1] = '0'. We can use this RCA in a datapath, with standard cells, or on a gate array.
Carry-save adder (CSA):
Instead of propagating the carries through each stage of an RCA, Figure 2.23 shows a different approach.
A carry-save adder (CSA) cell:
CSA(A1[i], A2[i], A3[i], CIN, S1[i], S2[i], COUT)
has three outputs:
S1[i] = CIN
S2[i] = A1[i] ⊕ A2[i] ⊕ A3[i] = PARITY(A1[i], A2[i], A3[i])
COUT = A1[i] · A2[i] + [(A1[i] + A2[i]) · A3[i]] = MAJ(A1[i], A2[i], A3[i])
• We can use CSAs to add multiple inputs.
• In the last stage we use an RCA or CPA (carry-propagate adder) to produce the final sum.
• We can register the CSA stages by adding vectors of flip-flops.
• This reduces the adder delay to that of the slowest stage of the adder (usually the RCA/CPA).
• By using registers between the stages of combinational logic, we use pipelining to increase the speed, at the price of increased area and added latency.
• It takes a few clock cycles to fill the pipeline, but once it is filled, answers emerge every clock cycle.
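A four-input carry-save adder can be sketched at the word level: each CSA level reduces three operands to a parity word and a majority (carry) word, the carry word moves up one bit position, and only the final stage propagates carries. This Python sketch assumes Python integers as bit vectors; `csa` and `add4_csa` are hypothetical names, the pipeline registers are not modeled, and the final `+` stands in for the RCA/CPA stage.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """One row of CSA cells applied to three operand words (bitwise).

    Returns (sum_word, carry_word): sum_word holds PARITY of each bit,
    carry_word holds MAJ of each bit moved up one place, because the carry
    out of bit i has weight 2**(i+1).
    """
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word


def add4_csa(a: int, b: int, c: int, d: int) -> int:
    """Add four operands: two CSA levels (no carry propagation), then one CPA."""
    s1, c1 = csa(a, b, c)      # first CSA level: three operands -> two
    s2, c2 = csa(s1, c1, d)    # second CSA level: three operands -> two
    return s2 + c2             # final carry-propagate adder (the RCA stage)


print(add4_csa(9, 5, 12, 7))  # 33
```

No carry travels more than one bit position inside a CSA level, which is why the CSA levels are fast and the final CPA dominates the delay.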
Figure: The carry-save adder (CSA). (a) A CSA cell. (b) A 4-bit CSA. (c) Symbol for a CSA. (d) A four-input CSA. (e) The datapath for a four-input, 4-bit adder using CSAs with a ripple-carry adder (RCA) as the final stage. (f) A pipelined adder. (g) The datapath for the pipelined version, showing the pipeline registers as well as the clock control lines that use m2.
Carry Bypass and Carry Skip Adders:
• We can also pipeline an RCA.
• The problem with an RCA is the delay associated with each stage.
• If we examine the P signals closely, we can bypass this critical path.
• Example: to bypass the carries for bits 4 to 7 of an adder, we compute a bypass signal:
BYPASS = P[4] · P[5] · P[6] · P[7]
• and then use a MUX as follows:
C[7] = (G[7] + P[7] · C[6]) · BYPASS' + C[3] · BYPASS
• Large custom adders use Manchester carry chains to compute the carries, and transmission gates (TGs) or bypass transistors for the bypass operation. These types of carry chain may be part of a predesigned ASIC adder cell but are not used by ASIC designers.
• Instead of checking the propagate signals, we can check the input signals to compute a SKIP signal, which is analogous to BYPASS:
SKIP[i] = (A[i–1] ⊕ B[i–1]) · (A[i] ⊕ B[i])
and then use a 2:1 MUX to select C[i]:
CSKIP[i] = (G[i] + P[i] · C[i–1]) · SKIP' + C[i–2] · SKIP
Carry-bypass and carry-skip adders may contain redundant logic, since we have to compute the results both with and without the carry.
We must also be careful that the redundant logic is not optimized away during synthesis.
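The bypass equation for bits 4 to 7 can be checked with a behavioral model. In this Python sketch (an illustration, not a gate-level design; `bypass_add8` is a hypothetical name) an 8-bit adder computes its ripple carries with P[i] = A[i] ⊕ B[i], and the final carry C[7] is taken through the 2:1 MUX controlled by BYPASS = P[4]·P[5]·P[6]·P[7]. The model checks only that the MUX selects the correct value; the speed benefit comes from the hardware not having to wait for the ripple through bits 4 to 6.

```python
def bypass_add8(a: int, b: int, cin: int = 0) -> tuple[int, int]:
    """8-bit adder whose carry out of bits 4..7 uses the BYPASS MUX.

    Returns (8-bit sum, C[7]). The ripple carries are computed as usual,
    then C[7] = (G[7] + P[7].C[6]).BYPASS' + C[3].BYPASS. In hardware the
    MUX lets C[3] reach C[7] without waiting for bits 4..6; this model
    only checks that the selected value is correct, not the timing.
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(8)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(8)]
    c = []                                    # c[i] = carry out of bit i
    carry = cin
    for i in range(8):
        carry = g[i] | (p[i] & carry)         # C[i] = G[i] + P[i].C[i-1]
        c.append(carry)
    bypass = p[4] & p[5] & p[6] & p[7]
    c7 = ((g[7] | (p[7] & c[6])) & (bypass ^ 1)) | (c[3] & bypass)
    carry_into = [cin] + c[:7]                # carry into each bit position
    s = sum((p[i] ^ carry_into[i]) << i for i in range(8))
    return s, c7


print(bypass_add8(0xF0, 0x0F, 1))  # (0, 1): the carry-in travels all 8 bits
```

The usage line is the worst case for a plain RCA: every P[i] is 1, so the carry-in must pass through all eight stages, while the bypass MUX hands C[3] straight to C[7].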
Carry look-ahead adder:
C[1] = G[1] + P[1] · C[0]
     = G[1] + P[1] · (G[0] + P[0] · C[–1])
     = G[1] + P[1] · G[0]   (taking C[–1] = 0)
C[2] = G[2] + P[2] · G[1] + P[2] · P[1] · G[0]
C[3] = G[3] + P[3] · G[2] + P[3] · P[2] · G[1] + P[3] · P[2] · P[1] · G[0]
where G[i] = A[i] · B[i] and
P[i] = A[i] + B[i].
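Expanding the recursion gives every carry directly from G, P and the carry-in, so all carries can be formed in parallel as two-level logic. The Python sketch below (a behavioral model; `cla_carries` is a hypothetical name) builds the sum-of-products for each C[i] in general rather than hard-coding C[1] to C[3], and it keeps the carry-in C[–1] as a parameter; the expansions above correspond to C[–1] = 0.

```python
def cla_carries(a: int, b: int, n: int, c_in: int = 0) -> list[int]:
    """Carry-lookahead carries C[0] .. C[n-1] for n-bit operands a and b.

    C[i] = G[i] + P[i].G[i-1] + ... + P[i]...P[1].G[0] + P[i]...P[0].C[-1],
    with G[i] = A[i].B[i] and P[i] = A[i] + B[i] as in the text and
    C[-1] = c_in. Each carry is a two-level sum of products of G, P and
    C[-1]; no carry depends on another carry.
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]
    carries = []
    for i in range(n):
        terms = []
        for j in range(i, -1, -1):            # term G[j].P[j+1]...P[i]
            t = g[j]
            for k in range(j + 1, i + 1):
                t &= p[k]
            terms.append(t)
        t = c_in                              # term P[0]...P[i].C[-1]
        for k in range(i + 1):
            t &= p[k]
        terms.append(t)
        carries.append(int(any(terms)))
    return carries


print(cla_carries(0b0111, 0b0001, 4))  # [1, 1, 1, 0]
```

The number of terms (and the AND-gate fan-in) grows with i, which is why practical CLAs are built from small blocks, typically 4 bits wide.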
Carry-select adder:
Here we duplicate two small adders (usually 4-bit or 8-bit CLAs) for the cases CIN = 0 and CIN = 1 and then use a MUX to select the case we need; this is wasteful but fast.
This adder is used in datapaths because of its regular layout.
Figure: With an equal number of adders in each block.

Figure: With an unequal number of adders in each block.
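A carry-select adder can be sketched by computing each block twice, once assuming a carry-in of 0 and once assuming 1, and letting the real incoming carry choose. This Python sketch assumes equal-size 4-bit blocks (the uniform case) and uses Python integer addition inside each block as a stand-in for the duplicated small CLA or RCA; `carry_select_add` is a hypothetical name.

```python
def carry_select_add(a: int, b: int, width: int = 16, block: int = 4,
                     cin: int = 0) -> tuple[int, int]:
    """Carry-select adder with equal-size blocks; returns (sum, carry_out).

    Assumes width is a multiple of block. Python '+' on each block stands
    in for the small duplicated CLA or RCA.
    """
    mask = (1 << block) - 1
    result = 0
    carry = cin
    for lsb in range(0, width, block):
        a_blk = (a >> lsb) & mask
        b_blk = (b >> lsb) & mask
        sum0 = a_blk + b_blk              # duplicated adder assuming CIN = 0
        sum1 = a_blk + b_blk + 1          # duplicated adder assuming CIN = 1
        chosen = sum1 if carry else sum0  # MUX driven by the real carry
        result |= (chosen & mask) << lsb
        carry = chosen >> block           # block carry out drives the next MUX
    return result, carry


s, c = carry_select_add(0x1234, 0x0FFF)
print(hex(s), c)  # 0x2233 0
```

In hardware both block sums are ready at the same time, so the critical path is one block adder plus a chain of MUXes rather than a full ripple.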


Conditional-sum adder:
• If we have an n-bit adder that generates two sums, one assuming carry = 1 and the other assuming carry = 0, we can split the n-bit adder into an i-bit adder for the i LSBs and an (n – i)-bit adder for the n – i MSBs.
• Both of the smaller adders generate two conditional sums as well as true and complementary carry signals.
• The two carry signals from the LSB adder are then used to select between the two (n – i + 1)-bit conditional sums from the MSB adder using 2(n – i + 1) two-input MUXes. This is how a conditional-sum adder functions.
• The conditional-sum adder is the fastest of all the adders discussed so far (as the number of bits n increases).
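The recursive split described above can be sketched directly: every range of bits returns both conditional results (for carry-in 0 and carry-in 1), and two halves are merged by letting each carry of the LSB half select one of the MSB half's two results. This Python sketch is a behavioral model; `cond_sum` and `cond_sum_add` are hypothetical names, and splitting each range in half (i = n/2) is an assumption, since the text allows any i.

```python
Cond = tuple[int, int]  # (sum, carry_out) for one assumed carry-in


def cond_sum(a: int, b: int, lo: int, n: int) -> tuple[Cond, Cond]:
    """Conditional results for bits lo .. lo+n-1 of a and b.

    Returns (result if carry-in = 0, result if carry-in = 1), where each
    result is (n-bit sum, carry_out).
    """
    if n == 1:
        x, y = (a >> lo) & 1, (b >> lo) & 1
        return (x ^ y, x & y), (x ^ y ^ 1, x | y)
    i = n // 2                                # i LSBs and n - i MSBs
    low = cond_sum(a, b, lo, i)
    high = cond_sum(a, b, lo + i, n - i)
    merged = []
    for s_low, c_low in low:                  # the two LSB-half cases
        s_high, c_high = high[c_low]          # MUX: LSB carry selects MSB result
        merged.append((s_low | (s_high << i), c_high))
    return merged[0], merged[1]


def cond_sum_add(a: int, b: int, n: int, cin: int = 0) -> Cond:
    """n-bit conditional-sum addition; returns (sum, carry_out)."""
    return cond_sum(a, b, 0, n)[cin]


print(cond_sum_add(45, 27, 6))  # (8, 1): 45 + 27 = 72 = 64 + 8
```

Each level of the recursion adds one MUX delay, so the delay grows with log2(n) rather than with n.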
Multipliers:
• Multipliers play an important role in today's digital signal processing and various other applications.
• Essential design targets for a multiplier include high speed, low power consumption, and a regular layout (and hence less area), or a combination of these, making them suitable for various VLSI implementations.
• The straightforward way to implement a multiplication is based on an iterative adder-accumulator for the generated partial products. This multiplier is called a serial multiplier.
• Mental arithmetic: 15 (multiplicand) × 19 (multiplier) = 15 × (20 – 1), i.e. the multiplier is written as the signed-digit number 2 1̄, where 1̄ denotes the digit –1.
• Suppose we want to multiply an 8-bit binary number A by B = 00010111 (decimal 16 + 4 + 2 + 1 = 23).
• Use the canonical signed-digit (CSD) vector representation of B: D = 00101̄001̄ (decimal 32 – 8 – 1 = 23).
• B has a weight of 4 (number of nonzero digits), but D has a weight of 3, so multiplying by D requires only three add/subtract operations instead of four, which saves hardware.
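The CSD vector can be computed with the non-adjacent-form recoding: scan from the LSB and, whenever the remaining value is odd, emit +1 or –1 so that the next bit becomes 0. This Python sketch uses hypothetical names (`csd_digits`, `weight`, `csd_multiply`) and returns digits LSB first; it reproduces D for B = 23 and multiplies with one shifted add or subtract per nonzero digit.

```python
def csd_digits(n: int) -> list[int]:
    """Canonical signed-digit (non-adjacent form) digits of n >= 0, LSB first.

    Every digit is -1, 0 or +1, no two adjacent digits are both nonzero,
    and sum(d * 2**i) == n.
    """
    digits = []
    while n:
        if n & 1:
            d = 2 - (n & 3)   # n mod 4 == 1 -> +1, n mod 4 == 3 -> -1
            n -= d            # the remaining value is now a multiple of 4
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def weight(digits: list[int]) -> int:
    """Number of nonzero digits, i.e. the number of add/subtract operations."""
    return sum(1 for d in digits if d)


def csd_multiply(a: int, digits: list[int]) -> int:
    """A times the CSD number: one shifted add or subtract per nonzero digit."""
    return sum(d * (a << i) for i, d in enumerate(digits) if d)


d = csd_digits(23)
print(d)                              # [-1, 0, 0, -1, 0, 1], i.e. 32 - 8 - 1
print(weight(d), bin(23).count("1"))  # 3 4
print(csd_multiply(10, d))            # 230
```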
Booth Multipliers:
• Booth's algorithm is a powerful algorithm for signed-number multiplication, which treats both positive and negative numbers uniformly.
• In the standard add-shift operation, each multiplier bit generates one multiple of the multiplicand to be added to the partial product. If the multiplier is very large, then a large number of multiples of the multiplicand have to be added. In this case the delay of the multiplier is determined mainly by the number of additions to be performed. If there is a way to reduce the number of additions, the performance will improve.
• Booth's algorithm is a method that reduces the number of multiples of the multiplicand. For a given range of numbers to be represented, a higher representation radix leads to fewer digits. Since a k-bit binary number can be interpreted as a k/2-digit radix-4 number, a k/3-digit radix-8 number, and so on, we can deal with more than one bit of the multiplier in each cycle by using high-radix multiplication. This is shown for radix 4 in the example below.
Booth's recoding table:
• As shown in the figure above, if multiplication is done in radix 4, in each step the partial-product term (B[i+1]B[i])₂ · A needs to be formed and added to the cumulative partial product. In radix-2 multiplication, by contrast, each row of dots in the partial-product matrix represents either 0 or a shifted version of A that must be included and added.
• Table 1 below is used to convert a binary number to a radix-4 number. Initially, a '0' is placed to the right of the least significant bit of the multiplier. Then groups of 3 bits of the multiplier are recoded according to the table, or according to the equation Z[i] = –2·x[i+1] + x[i] + x[i–1].
• Example: for the multiplier B = 0101110, a 0 is placed to the right of the least significant bit, which gives 01011100; 3 bits are then selected at a time, each group overlapping the leftmost bit of the previous group, as follows:
For example, a 16-bit two's-complement number can be converted into a radix-4 signed-digit number: (10 01 11 01 10 10 11 10)₂ = (–2 2 –1 2 –1 –1 0 –2)₄.
• Here –2 × multiplicand is the two's complement of the multiplicand shifted left by one bit position.
• Also, +2 × multiplicand is the multiplicand shifted left by one bit position, which is equivalent to multiplying by 2.
• To enter 2 × multiplicand into the adder, an (n+1)-bit adder is required.
• In this case the multiplicand is offset one bit to the left to enter the adder, and a 0 is added in the low-order multiplicand position.
• Each time, the partial product is shifted two bit positions to the right and the sign is extended to the left.
• During each add-shift cycle, a different version of the multiplicand is added to the new partial product, depending on the equation derived from the bit-pair recoding table above.
Booth encoding reduces the number of partial products by a factor of two, which reduces the area and increases the speed of the multiplier.
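The radix-4 recoding rule Z[i] = –2·x[i+1] + x[i] + x[i–1] and the resulting multiplication fit in a few lines. This Python sketch is a behavioral model that assumes an even number of two's-complement multiplier bits; `booth_radix4_digits` and `booth_multiply` are hypothetical names. It reproduces the 16-bit example above, and each digit selects one partial product (0, ±A or ±2A) shifted by two bit positions per digit.

```python
def booth_radix4_digits(x: int, nbits: int) -> list[int]:
    """Radix-4 Booth digits of an nbits-wide two's-complement pattern, LSB first.

    A 0 is placed to the right of the LSB (x[-1] = 0); overlapping triplets
    (x[i+1], x[i], x[i-1]) for i = 0, 2, 4, ... are recoded as
    Z = -2*x[i+1] + x[i] + x[i-1], giving digits in {-2, -1, 0, 1, 2}.
    """
    if nbits % 2:
        raise ValueError("this sketch assumes an even number of multiplier bits")

    def bit(k: int) -> int:
        return (x >> k) & 1 if k >= 0 else 0

    return [-2 * bit(i + 1) + bit(i) + bit(i - 1) for i in range(0, nbits, 2)]


def booth_multiply(a: int, x: int, nbits: int) -> int:
    """A times the two's-complement multiplier x, using nbits/2 partial products."""
    product = 0
    for j, z in enumerate(booth_radix4_digits(x, nbits)):
        # Partial product is 0, +-A or +-2A, weighted by 4**j (shift by 2j).
        product += (z * a) << (2 * j)
    return product


multiplier = 0b1001110110101110                 # the 16-bit example above
print(booth_radix4_digits(multiplier, 16)[::-1])  # [-2, 2, -1, 2, -1, -1, 0, -2]
print(booth_multiply(3, multiplier, 16))          # -75510, i.e. 3 * (-25170)
```

A 16-bit multiplier thus needs 8 partial products instead of 16, which is the factor-of-two reduction stated above.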
Other Datapath Operators:
Figure: The symbols for some other datapath elements.
• A bold line is used for datapath cells.
• A regular line is used for scalar symbols.

(a) An array/vector of flip-flops. (b) A 2-input NAND cell with databus inputs. (c) A 2-input NAND cell with control inputs. (d) A bus-wide MUX. (e) An incrementer/decrementer. (f) An all-zeros detector. (g) An all-ones detector. (h) An adder/subtractor.
A Subtractor:
• A subtractor is similar to an adder, except that in a full subtractor we have a borrow-in BIN, a borrow-out BOUT, and a difference signal DIFF.
• These equations are the same as for a full adder, except that the B inputs are inverted and the sense of the carry chain is inverted.

To build a subtractor that calculates (A – B), we invert the entire B input bus and connect the BIN[0] input to VDD (not to VSS). Example: A = 1001, B = 0011; we calculate A + B' + 1 = 1001 + 1100 + 1 = 0110 (9 – 3 = 6). As in an adder, overflow is calculated as XOR(BOUT[MSB], BOUT[MSB–1]).
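The invert-and-add scheme can be sketched at the bit level: invert every B bit, tie the LSB carry-in to 1, ripple, and take overflow from the two top carries. This Python sketch (`subtract` is a hypothetical name) models the chain as an adder carry chain; each carry here is the complement of the corresponding borrow, so XOR of the two top carries equals XOR(BOUT[MSB], BOUT[MSB–1]).

```python
def subtract(a: int, b: int, n: int) -> tuple[int, int]:
    """n-bit A - B computed as A + B' + 1 (n >= 2). Returns (difference, overflow).

    The B bus is inverted and the carry-in of bit 0 is tied to '1' (VDD).
    Each carry out here is the complement of the corresponding borrow, so
    XOR of the two top carries equals XOR(BOUT[MSB], BOUT[MSB-1]).
    """
    carry = 1                                  # LSB carry-in tied high
    carries = []
    diff = 0
    for i in range(n):
        x = (a >> i) & 1
        y = ((b >> i) & 1) ^ 1                 # inverted B input
        diff |= (x ^ y ^ carry) << i           # full-adder sum bit
        carry = (x & y) | (x & carry) | (y & carry)
        carries.append(carry)
    overflow = carries[n - 1] ^ carries[n - 2]
    return diff, overflow


diff, overflow = subtract(0b1001, 0b0011, 4)
print(format(diff, "04b"), overflow)  # 0110 1
```

For unsigned operands 9 – 3 = 6 is correct. The overflow flag is 1 because, read as two's complement, 1001 is –7, and –7 – 3 = –10 does not fit in 4 bits.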
• We can build a ripple-borrow subtractor, a borrow-save subtractor, and a borrow-select subtractor in the same way we built these adder architectures.
• An adder/subtractor has a control signal that gates the A input with an exclusive-OR cell to switch between an adder and a subtractor.
• Some adder/subtractors gate both inputs to allow us to compute (–A – B).
• We must be careful to connect the input to the LSB of the carry chain (CIN[0] or BIN[0]) correctly when changing between addition and subtraction.
A barrel shifter:
•A barrel shifter shifts or rotates an input bus by a
specified amount.
•A barrel shifter can shift either to the right or to the left.
•A barrel shifter can have an output width that is less
than the input bus width.
•These shifters are used extensively in floating-point
arithmetic to align floating-point numbers (with sign,
exponent and mantissa).
•A leading-one detector, together with a left barrel
shifter (a normalizing shifter), is used to align the mantissa
of a floating-point number (see example 2).
Example:
1) Input A=11110000 and a shift right by 3, with the shift select input 00010000 (3 encoded by bit
position, counting from 0 at the MSB). Output Z=00011110.
2) Normalizing: A=00000101; the leading-one detector output is 00000100 (bit position 5, counting
from 0 at the MSB).
If we feed the leading-one detector output to the shift select input of the normalizing barrel shifter,
the shifter normalizes the input A. The output is Z=10100000. Now that Z is aligned (with a 1 in the
MSB), it can be multiplied by another normalized number.
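Below is a Python sketch of a barrel shifter with a one-hot shift select, plus the leading-one detector and normalizing shifter of example 2. As in the examples, it assumes that shift-select bit positions are counted from 0 at the MSB. All function names are hypothetical. A hardware barrel shifter is a multiplexer array rather than a sequence of operations, so this models only the input/output behaviour.

```python
def onehot_to_amount(select: int, width: int) -> int:
    """Decode a one-hot shift select; bit positions counted from 0 at the MSB."""
    if select <= 0 or select & (select - 1) or select.bit_length() > width:
        raise ValueError("shift select must be one-hot within the bus width")
    return width - select.bit_length()


def barrel_shift(a: int, select: int, width: int = 8,
                 left: bool = False, rotate: bool = False) -> int:
    """Shift (or rotate) A by the amount encoded in the one-hot select."""
    mask = (1 << width) - 1
    k = onehot_to_amount(select, width)
    a &= mask
    if rotate:
        if left:
            return ((a << k) | (a >> (width - k))) & mask
        return ((a >> k) | (a << (width - k))) & mask
    return ((a << k) & mask) if left else (a >> k)


def leading_one_detector(a: int, width: int = 8) -> int:
    """Keep only the leading (most significant) one of A; 0 if A == 0."""
    a &= (1 << width) - 1
    return 1 << (a.bit_length() - 1) if a else 0


def normalize(a: int, width: int = 8) -> int:
    """Left-shift a nonzero A until its leading one reaches the MSB."""
    return barrel_shift(a, leading_one_detector(a, width), width, left=True)
```

With these definitions, `barrel_shift(0b11110000, 0b00010000)` reproduces example 1, and `normalize(0b00000101)` reproduces example 2.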
•The output of a priority encoder is the binary-encoded position of the
leading one in an input.
Example:
If A=00000101, with the leading one at position 3 (counting from 1 at the LSB), then the output of
the 4-bit priority encoder is Z=0011.
•Some cell libraries have reverse encoding, which yields Z=0101 (5); this kind of encoding is used in
floating-point arithmetic.
•If A is a mantissa and we normalize A to 10100000, we have to subtract 5 from the exponent; this
exponent correction is equal to the output of the reverse-encoding priority encoder.
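The next sketch shows the two priority-encoder conventions and the exponent correction, assuming an 8-bit mantissa. The function names and the exponent value used in the usage example are hypothetical. The reverse-encoded output is exactly the left-shift amount needed for normalization, so subtracting it from the exponent keeps mantissa × 2^exponent unchanged.

```python
def priority_encode(a: int, width: int = 8) -> int:
    """Position of the leading one, counting from 1 at the LSB (0 if A == 0)."""
    return (a & ((1 << width) - 1)).bit_length()


def priority_encode_reverse(a: int, width: int = 8) -> int:
    """Position of the leading one, counting from 0 at the MSB (A nonzero)."""
    a &= (1 << width) - 1
    if a == 0:
        raise ValueError("no leading one in an all-zeros input")
    return width - a.bit_length()


def normalize_float(mantissa: int, exponent: int, width: int = 8):
    """Normalize the mantissa and apply the exponent correction."""
    shift = priority_encode_reverse(mantissa, width)
    return (mantissa << shift) & ((1 << width) - 1), exponent - shift
```

For A = 00000101 the two encoders give 3 (0011) and 5 (0101). With a hypothetical exponent of 10, `normalize_float(0b00000101, 10)` returns `(0b10100000, 5)`, and 5 × 2^10 = 160 × 2^5.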
Accumulator:
•It is an adder/subtractor combined with a register.
•Sometimes it is combined with a multiplier to form a multiplier-accumulator (MAC).
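This is a behavioural sketch of a multiplier-accumulator. It assumes an unsigned accumulator register that wraps at its bit width; real MACs may use signed arithmetic, extra guard bits or saturation instead. The class name and its `clock` method are hypothetical.

```python
class MultiplierAccumulator:
    """A multiplier feeding an adder/subtractor whose result is stored
    back into an accumulator register."""

    def __init__(self, width: int = 16):
        self.mask = (1 << width) - 1
        self.acc = 0                  # contents of the accumulator register

    def clock(self, a: int, b: int, subtract: bool = False) -> int:
        """One clock edge: acc <- acc +/- a*b, wrapped to the register width."""
        product = a * b
        total = self.acc - product if subtract else self.acc + product
        self.acc = total & self.mask
        return self.acc
```

For example, clocking in the pairs (1, 4), (2, 5) and (3, 6) leaves the dot product 32 in the register.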

Incrementer:
•An incrementer adds one to the input bus, Z = A + 1. We can use this function, together with a register, to
negate a two's complement number. The logic is Z[i]=XOR(A[i],CIN[i]); COUT[i]=AND(A[i],CIN[i]).
•CIN[0] thus acts as a control input: if we set it to '0', the output is the same as the input.
•The implementation of arithmetic cells is more complicated than we have
explained. CMOS logic is normally inverting, so it is faster to implement an
incrementer as Z[i(even)]=XOR(A[i],CIN[i]) and COUT[i(even)]=NAND(A[i],CIN[i]).
•This inverts COUT, so in the following stage we must invert it again.
•If we push the inverting bubble to the input CIN, we get Z[i(odd)]=
XNOR(A[i],CIN[i]) and COUT[i(odd)]=NOR(NOT(A[i]),CIN[i]).
•In many datapath implementations all odd-bit cells operate on inverted carry
signals, and thus the odd and even bit datapath elements are different. Normally
this is hidden from the designer in the datapath assembly, and any
output control signals are inverted, if necessary, by inserting buffers.
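The alternating-polarity incrementer above can be checked with a small Python model. It assumes that even bit positions use the XOR/NAND cell and odd bit positions use the XNOR/NOR cell, exactly as in the equations. `incrementer` is a hypothetical name. Odd cells receive an inverted carry and produce a true-sense carry, so the inversions cancel every two bits and the result still equals A + CIN[0].

```python
def incrementer(a: int, cin0: int = 1, n: int = 8) -> int:
    """Z = A + CIN[0] (mod 2**n) built from alternating even/odd cells."""
    z = 0
    carry = cin0 & 1                        # carry into bit 0 (true sense)
    for i in range(n):
        ai = (a >> i) & 1
        if i % 2 == 0:
            # Even cell: true-sense CIN in, inverted COUT out.
            zi = ai ^ carry                 # XOR(A[i], CIN[i])
            carry = 1 - (ai & carry)        # NAND(A[i], CIN[i])
        else:
            # Odd cell: inverted CIN in, true-sense COUT out.
            zi = 1 - (ai ^ carry)           # XNOR(A[i], CIN[i])
            carry = 1 - ((1 - ai) | carry)  # NOR(NOT(A[i]), CIN[i])
        z |= zi << i
    return z
```

For negation, one possible use (assumed here) is to invert the bus and then increment. `incrementer(~a & 0xFF)` then gives the two's complement of an 8-bit `a`.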
Decrementer:
•A decrementer subtracts 1 from the input bus.
•The logical implementation is Z[i]=XOR(A[i],CIN[i]) and
COUT[i]=AND(NOT(A[i]),CIN[i]).
•The implementation may invert odd carry signals, with CIN[0] again acting as an
enable signal.
•An incrementer/decrementer has a second control signal that gates the input,
inverting the input to the carry chain. This selects either the
incrementer or the decrementer function.
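Below is a Python sketch of an incrementer/decrementer. It assumes the second control signal `dec` is XORed with A[i] before entering the carry chain. The chain then computes AND(A[i], CIN[i]) when incrementing and AND(NOT(A[i]), CIN[i]) when decrementing. CIN[0] is modelled as the `enable` input. The names are hypothetical, and the odd-bit carry inversion is omitted for clarity.

```python
def inc_dec(a: int, dec: int, enable: int = 1, n: int = 8) -> int:
    """Z = A + 1 (dec == 0) or A - 1 (dec == 1) when enabled, modulo 2**n."""
    z = 0
    c = enable & 1                    # CIN[0] acts as the enable
    for i in range(n):
        ai = (a >> i) & 1
        z |= (ai ^ c) << i            # Z[i] = XOR(A[i], CIN[i])
        # Gated carry: AND(A[i], CIN[i]) or AND(NOT(A[i]), CIN[i])
        c = (ai ^ dec) & c
    return z
```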
ALL ZERO DETECTOR and ALL ONE DETECTOR:
When using all-zeros and all-ones detectors we have to remember that:
•In one's complement arithmetic, zero is represented by both 1111 and 0000.
•In signed-magnitude arithmetic, zero is represented by both 1000 and 0000.
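Zero detection therefore depends on the number representation. Here is a minimal Python sketch, with hypothetical function names, of the all-zeros and all-ones detectors and of zero tests for one's complement and signed-magnitude numbers:

```python
def all_zeros(a: int, n: int) -> bool:
    """All-zeros detector: true when every bit of the n-bit bus is 0."""
    return (a & ((1 << n) - 1)) == 0


def all_ones(a: int, n: int) -> bool:
    """All-ones detector: true when every bit of the n-bit bus is 1."""
    mask = (1 << n) - 1
    return (a & mask) == mask


def is_zero_ones_complement(a: int, n: int) -> bool:
    # One's complement has two zeros: +0 = 00...0 and -0 = 11...1.
    return all_zeros(a, n) or all_ones(a, n)


def is_zero_sign_magnitude(a: int, n: int) -> bool:
    # Signed magnitude has +0 = 00...0 and -0 = 10...0: ignore the sign bit (MSB).
    return all_zeros(a, n - 1)
```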
A Register File (or Scratchpad Memory):

•A register file is a bank of flip-flops arranged across the bus.
•Sometimes these have multiple ports (multiport register files) for
reading and writing.
•Normally these register files are the densest logic and the hardest to fit in the datapath.
•For large register files it may be more appropriate to use a multiport memory.
•We can add control logic to a register file to create a FIFO or LIFO register.
•The standard-cell and gate-array macro versions of the sequential cells
each contain their own clock buffers, because we don't know where a standard
cell or a gate-array macro will be placed on a chip, nor do we know the
condition of the clock signals coming into the sequential cells.
•The ability to place the clock buffers outside the sequential cells in a datapath gives us more
flexibility and saves space.
•For example, we can place the clock buffers for all the clocked elements at the top of
the datapath (together with the buffers for the control signals) and
•river-route (in river routing the interconnect lines all flow in the same direction on the same
layer) the connections to the clock lines.
•This saves space and allows us to guarantee the clock skew and timing; however, there
is a fixed overhead associated with a datapath. Some tools allow us to design logic
using a portable netlist.
•After we complete the design we can decide whether to implement the portable
netlist in a datapath, standard cells or even a gate array, based on area, speed or power
considerations.
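Returning to the register file: control logic can turn it into a FIFO. The following is a behavioural Python sketch that assumes a circular buffer with separate read and write pointers. This is one common way to build such control logic; a LIFO would use a single stack pointer instead. The class and method names are hypothetical, and clocking and port timing are not modelled.

```python
class RegisterFileFIFO:
    """A register file of `depth` words of `width` bits plus read/write
    pointer control logic, behaving as a first-in first-out register."""

    def __init__(self, depth: int = 4, width: int = 8):
        self.regs = [0] * depth          # the register file itself
        self.mask = (1 << width) - 1
        self.wr = 0                      # write pointer
        self.rd = 0                      # read pointer
        self.count = 0                   # number of occupied words

    def push(self, value: int) -> None:
        if self.count == len(self.regs):
            raise OverflowError("FIFO full")
        self.regs[self.wr] = value & self.mask
        self.wr = (self.wr + 1) % len(self.regs)   # pointer wraps around
        self.count += 1

    def pop(self) -> int:
        if self.count == 0:
            raise IndexError("FIFO empty")
        value = self.regs[self.rd]
        self.rd = (self.rd + 1) % len(self.regs)
        self.count -= 1
        return value
```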
I/O Cells:
•The figure shows a three-state bidirectional
output buffer (TRI-STATE is a registered trademark
of National Semiconductor).
•When the OE signal is high, the circuit functions
as a noninverting buffer driving the value of
DATAout onto the I/O pad.
•When OE is low, the output transistors or
drivers, M1 and M2, are disconnected; this allows
multiple drivers to be connected on a bus.
•It is up to the designer to make sure that a bus
never has two drivers, a problem known as
contention.
•To prevent the problem opposite to bus contention (a bus floating to an
intermediate voltage when there are no bus drivers), we can use a bus-keeper or bus-hold
cell.
•A bus keeper normally acts like two weak cross-coupled inverters that act as a latch to
retain the last logic state on the bus, but the latch is weak enough that it may be driven
easily to the opposite state.
•Even though bus keepers act like latches, and will simulate like latches, they
should not be used as latches, since their drive strength is weak.
•Transistors M1 and M2 in the figure have to drive large off-chip loads. If we wish to change
the voltage on a C = 200 pF load by 5 V in 5 ns (a slew rate of 1 V/ns), we will require a
current in the output transistors of
$$I_{DS} = C\frac{dV}{dt} = (200\times10^{-12}\,\mathrm{F})\left(\frac{5\,\mathrm{V}}{5\times10^{-9}\,\mathrm{s}}\right) = 0.2\,\mathrm{A} = 200\,\mathrm{mA}.$$
•Such large currents flowing in the output transistors must also flow
in the power-supply bus and can cause problems.
•There is always some inductance in series with the power supply,
between the point at which the supply enters the ASIC package and
the point at which it reaches the power bus on chip. The inductance is due to the bond wire,
lead frame and package pin.
•If we have a power-supply inductance of 2 nH and a current changing
from 0 to 1 A (32 I/O cells on a bus switching at 30 mA each, 32 × 30 mA ≈ 1 A) in 5 ns,
we will have a voltage spike on the power supply (called power-supply bounce) of
$$L\frac{dI}{dt} = (2\times10^{-9}\,\mathrm{H})\left(\frac{1\,\mathrm{A}}{5\times10^{-9}\,\mathrm{s}}\right) = 0.4\,\mathrm{V}.$$
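The two worked calculations above, I = C dV/dt and V = L dI/dt, can be reproduced numerically in Python as a check. The function names are hypothetical. The last line uses the exact 0.96 A total instead of the rounded 1 A.

```python
def drive_current(c_farads: float, dv_volts: float, dt_seconds: float) -> float:
    """I = C dV/dt: current needed to slew a capacitive load."""
    return c_farads * dv_volts / dt_seconds


def supply_bounce(l_henries: float, di_amps: float, dt_seconds: float) -> float:
    """V = L dI/dt: voltage spike across the supply inductance."""
    return l_henries * di_amps / dt_seconds


i_out = drive_current(200e-12, 5.0, 5e-9)   # one output: about 0.2 A
i_bus = 32 * 30e-3                          # 32 outputs at 30 mA: 0.96 A
v_1a = supply_bounce(2e-9, 1.0, 5e-9)       # rounded 1 A step: about 0.4 V
v_bus = supply_bounce(2e-9, i_bus, 5e-9)    # exact 0.96 A step: about 0.38 V
```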
We do several things to alleviate these problems:
•Limit the number of simultaneously switching outputs (SSOs).
•Limit the number of I/O drivers that can be attached to any one VDD or GND
pad.
•Design the output buffer to limit the slew rate of the output (we call these
slew-rate-limited I/O pads).
•Use quiet-I/O cells, which use two separate power supplies and two sets of
I/O drivers: an AC supply (clean or quiet supply) with small AC drivers for the I/O circuits that
start and stop the output slewing at the beginning and end of the output transition, and
a DC supply (noisy or dirty supply) for the transistors that handle the large currents as
they slew the output.
•The three-state buffer allows us to employ the same pad for input and output
(bidirectional I/O): when we want to use the pad as an input, we just set OE low and
take the data from DATAin.
•It is not necessary to have all pads bidirectional; we can also build input-only
or output-only pads.
INPUT I/O PADS:
•We can also use many of these output-cell features for input cells that have to
drive large on-chip loads (a clock pad cell, for example).
•Some gate arrays simply turn an output buffer around to drive a large on-chip
grid of interconnect that supplies the clock.
•Example: with a typical interconnect capacitance of 0.2 pF/cm, a grid of
100 cm presents a load of 0.2 pF/cm × 100 cm = 20 pF to the clock buffer.
•Some libraries include I/O cells that have passive pull-ups or pull-downs
(resistors) instead of transistors M1 and M2.
•We can also omit transistor M1 or M2 to form open-drain outputs that
require an external pull-up or pull-down.
•We can design an output driver to produce TTL output levels rather than CMOS
logic levels.
•We may also add input hysteresis, using a Schmitt trigger, to the input buffer I1
to accept input signals that contain glitches or that are slow-rising.
•The input buffer can also include a level shifter to accept TTL input levels and
shift the input signal to CMOS levels.
ESD AND EOS:

•The gate oxide in CMOS is very thin (100 Å or less), which leaves the gate oxide of the
input transistors in the I/O cells susceptible to breakdown from static electricity (electrostatic
discharge, ESD).
•ESD arises when humans or machines handle the package leads.
•Sometimes this is also called EOS (electrical overstress), since most ESD-related
failures occur due to the thermal stress that occurs when an n-channel transistor in an
output driver overheats (melts) due to the large currents that can flow in the drain
diffusion connected to a pad during an ESD event.
Measures to overcome ESD and EOS:
•To protect the I/O cells from ESD, the input pads are normally tied to
device structures that clamp the input voltage to below the gate-oxide
breakdown voltage.
•Some I/O cells use transistors with a special ESD implant that increases the
breakdown voltage and provides protection to the driver transistors in the I/O
pads.
•I/O driver transistors can also use elongated drain structures (ladder
structures) and a large drain-to-gate spacing to help limit the current. The
salicide process lowers the drain resistance and makes this approach
difficult to manage, so I/O cells can be masked during the salicide step.
•Another solution is to use pnpn or npnp diffusion structures, called SCRs (silicon-controlled
rectifiers), to clamp voltages and divert current to protect I/O circuits from ESD.
Modeling an EOS for I/O cells:
•The human-body model (HBM) represents ESD generated by a human body charged
to a typical voltage of 2-4 kV. It is modeled by a 100 pF capacitor discharging
through a 1.5 kΩ resistor.
•The machine model (MM) represents ESD generated by automated machine
handlers. Typical MM parameters use a 200 pF capacitor discharging through a
25 Ω resistor, representing a peak initial current of about 10 A.
•The charged-device model (CDM) represents the problem when the IC
package is charged (in a shipping tube, for example) and then grounded. If
the maximum charge on the package is 3 nC and the package capacitance to
ground is 1.5 pF, we can simulate this event by charging a 1.5 pF capacitor
to 2 kV (V = Q/C = 3 nC / 1.5 pF) and discharging it through a 1 Ω resistor.
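Here is a small Python check of the ESD model parameters above. It computes the CDM charge voltage V = Q/C, the RC time constant of each model and an idealized peak current V/R. The MM charge voltage is not given above, so 200 V is used here as an assumption; it gives 8 A, close to the quoted figure of about 10 A. The V/R peaks ignore series inductance and other parasitics, so they are upper bounds rather than measured values. `EsdModel` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class EsdModel:
    """Capacitor C charged to v_charge and discharged through resistor R."""
    name: str
    c_farads: float
    r_ohms: float
    v_charge: float

    def time_constant(self) -> float:
        return self.r_ohms * self.c_farads     # RC, in seconds

    def peak_current(self) -> float:
        return self.v_charge / self.r_ohms     # idealized V/R, in amperes


# CDM: package charge Q = 3 nC on C = 1.5 pF gives V = Q/C.
cdm_voltage = 3e-9 / 1.5e-12                   # about 2000 V

models = [
    EsdModel("HBM (2 kV)", 100e-12, 1.5e3, 2e3),
    EsdModel("HBM (4 kV)", 100e-12, 1.5e3, 4e3),
    EsdModel("MM (assumed 200 V)", 200e-12, 25.0, 200.0),
    EsdModel("CDM", 1.5e-12, 1.0, cdm_voltage),
]

for m in models:
    print(f"{m.name}: tau = {m.time_constant() * 1e9:.4g} ns, "
          f"peak = {m.peak_current():.3g} A")
```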
Problems of not designing I/O pads with care:

•If the diffusion structures are not designed with care, it is possible to
construct an SCR unintentionally; instead of protecting the transistors, the
SCR can enter a mode where it is latched on and conducting currents large
enough to destroy the chip.
•This mode of failure is called latch-up. Latch-up can occur if the p-n
junctions on the chip become forward-biased and inject minority carriers into the
substrate. The source-substrate and drain-substrate diodes can become
forward-biased due to power-supply bounce or output undershoot (when an
output falls below VSS) or overshoot (when an output rises above VDD).
•These injected minority carriers can travel fairly long distances and
engage other transistors, causing latch-up.
Measures to overcome Latch-up:

•I/O cells surround the I/O transistors with guard rings, i.e. a continuous ring of n-diffusion
in an n-well connected to VDD and a ring of p-diffusion in a p-well connected to
VSS, to collect these minority carriers.
•This problem can also occur in the core cells, which is why we include
substrate and well connections to the power supplies in every cell.
Cell compilers:

•The process of handcrafting the circuit layout for a full- custom IC is a tedious,
time-consuming and error-prone task.
•There are two types of automated layout-assembly tools, often known as silicon
compilers.
•The first kind produces a specific kind of circuit: a RAM compiler or a multiplier
compiler, for example.
•The second kind of compiler is more flexible, usually providing a
programming language that assembles or tiles layout from an input
command file, but this is still full-custom IC design.
•We can build a register file from latches or flip-flops, but at 4.5-6.5
gates per bit of data storage it becomes too expensive.
•Another option is to use dynamic RAMs, which need only one transistor
to store each bit.
Static RAMs (SRAMs):
•Most SRAMs in ASICs use a six-transistor cell: four transistors form two cross-coupled inverters and two are read/write access transistors.
•RAM compilers allow us to produce single-port, dual-port and multiport RAMs.
•In a multiport RAM the compiler may or may not handle address contention.
•RAMs can be designed to be either synchronous or asynchronous.
•In addition to producing the layout, we also need a model compiler so that we can
verify the circuit at the behavioral level, and a netlist compiler so that we can simulate
the circuit and verify that it works correctly at the structural level.
•Silicon compilers are complex pieces of software. We assume that a silicon compiler will
produce working silicon even if every configuration has not been tested; in this sense compiled
circuits are correct by construction.
