COA Mod 3
Array Multiplier
Binary multiplication can be implemented in a combinational two-dimensional logic array
called an array multiplier.
The main component in each cell is a full adder, FA.
The AND gate in each cell determines whether a multiplicand bit, mj, is added to the
incoming partial product bit, based on the value of the multiplier bit, qi.
Each row i, where 0 <= i <= 3, adds the multiplicand (appropriately shifted) to the
incoming partial product, PPi, to generate the outgoing partial product, PP(i+1), if
qi = 1.
If qi = 0, PPi is passed vertically downward unchanged. PP0 is all 0s and PP4 is the
desired product. The multiplicand is shifted left one position per row by the diagonal
signal path.
Figure: (a) Array multiplication of positive binary operands (b) Multiplier cell
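The row-by-row behaviour described above can be sketched in Python. This is only a behavioural model, not a gate-level one: the function name array_multiply and the default n = 4 are assumptions made for illustration, and each loop iteration stands in for one row of AND gates and adders.

```python
def array_multiply(m: int, q: int, n: int = 4) -> int:
    """Behavioural model of an n x n array multiplier for unsigned operands."""
    assert 0 <= m < (1 << n) and 0 <= q < (1 << n)
    pp = 0  # PP0 is all 0s
    for i in range(n):
        q_i = (q >> i) & 1
        # The AND gates in row i pass the multiplicand only when q_i = 1.
        # The diagonal signal path supplies the left shift by i positions.
        addend = (m << i) if q_i else 0
        pp = pp + addend  # this is PP(i+1)
    return pp  # PP(n) is the 2n-bit product


print(array_multiply(13, 11))  # 1101 x 1011
```

When qi = 0 the addend is 0, so PPi passes to the next row unchanged, as in the text.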
Disadvantages:
(1) An n-bit by n-bit array multiplier requires n^2 AND gates, n(n-2) full adders and n
half adders. (A half adder is used where there are 2 inputs and a full adder where there are 3
inputs.)
(2) The longest path from input to output passes through n adders in the top row, n-1 adders in
the bottom row and n-3 adders in the middle rows. The longest path in a circuit is called the
critical path.
Sequential Circuit Multiplier
Multiplication is performed as a series of n conditional add-and-shift operations: if the
current bit of the multiplier is 0, only a shift operation is performed, while if the
current bit of the multiplier is 1, the multiplicand is added to the partial product and then
a shift operation is performed.
The combinational array multiplier uses a large number of logic gates for multiplying
numbers. Multiplication of two n-bit numbers can also be performed in a sequential circuit that
uses a single n bit adder.
The block diagram in the figure shows the hardware arrangement for sequential
multiplication. This circuit performs multiplication by using a single n-bit adder n times to
implement the spatial addition performed by the n rows of ripple-carry adders in the array
multiplier. Registers A and Q are shift registers, concatenated as shown. Together, they hold
partial product PPi while multiplier bit qi generates the signal Add/Noadd. This signal causes
the multiplexer MUX to select 0 when qi = 0, or to select the multiplicand M when qi = 1, to be
added to PPi to generate PP(i + 1). The product is computed in n cycles. The partial product
grows in length by one bit per cycle from the initial vector, PP0, of n 0s in register A. The
carry-out from the adder is stored in flip-flop C, shown at the left end of register A.
Algorithm:
(1) The multiplier and multiplicand are loaded into registers Q and M respectively. Register
A and flip-flop C are cleared to 0.
(2) In each cycle, the control sequencer examines the current multiplier bit qi (the LSB of Q):
(a) If qi = 1, it generates the Add signal, which adds the multiplicand M to
register A and stores the result in A.
(b) If qi = 0, it generates the Noadd signal, which leaves the value in register A unchanged.
(3) Right shift C, A and Q by 1 bit.
Steps (2) and (3) are repeated n times.
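The algorithm above can be sketched as a register-level simulation in Python. The function name sequential_multiply and the use of plain integers for C, A and Q are assumptions made for illustration; operands are taken to be unsigned n-bit values.

```python
def sequential_multiply(m: int, q: int, n: int) -> int:
    """Simulate the C, A, Q registers of the sequential multiplier."""
    mask = (1 << n) - 1
    assert 0 <= m <= mask and 0 <= q <= mask
    c, a = 0, 0  # step (1): A and C cleared, Q holds the multiplier
    for _ in range(n):
        if q & 1:  # Add: A <- A + M, carry-out goes to C
            total = a + m
            c = (total >> n) & 1
            a = total & mask
        # Noadd: A is left unchanged and C stays 0
        # step (3): shift C, A, Q right by one bit as a single register
        q = ((a & 1) << (n - 1)) | (q >> 1)
        a = (c << (n - 1)) | (a >> 1)
        c = 0
    return (a << n) | q  # the 2n-bit product is held in A and Q


print(sequential_multiply(13, 11, 4))  # 1101 x 1011
```

After n cycles the high half of the product is in A and the low half is in Q, so the multiplier bits have been shifted out of Q one per cycle.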
BR = 10111        BR'+1 = 01001

Qn Qn+1  Operation      AC     Q      Qn+1  SC
         Initial        00000  10011  0     101
1  0     SUB (+01001)   01001  10011  0     101
         ASHR           00100  11001  1     100
1  1     ASHR           00010  01100  1     011
0  1     ADD (+10111)   11001  01100  1     011
         ASHR           11100  10110  0     010
0  0     ASHR           11110  01011  0     001
1  0     SUB (+01001)   00111  01011  0     001
         ASHR           00011  10101  1     000
BR = 01101        BR'+1 = 10011

Qn Qn+1  Operation      AC     Q      Qn+1  SC
         Initial        00000  11010  0     101
0  0     ASHR           00000  01101  0     100
1  0     SUB (+10011)   10011  01101  0     100
         ASHR           11001  10110  1     011
0  1     ADD (+01101)   00110  10110  1     011
         ASHR           00011  01011  0     010
1  0     SUB (+10011)   10110  01011  0     010
         ASHR           11011  00101  1     001
1  1     ASHR           11101  10010  1     000
[13 x -6 gives a negative product, so the magnitude is found by taking the 2's complement
of the result.]
Resultant product in A and Q = 11101 10010
2's complement = 00010 01101 + 1 = 00010 01110
Magnitude = 2^6 + 2^3 + 2^2 + 2^1 = 78
Product = -78
Multiply -11 x 8 using Booth Algorithm
11 = 1011, so +11 = 01011
8 = 1000, so +8 = 01000 (Q)
-11 = 2's complement of 01011 = 10100 + 1 = 10101 (BR)
BR'+1 = 01010 + 1 = 01011
BR = 10101        BR'+1 = 01011

Qn Qn+1  Operation      AC     Q      Qn+1  SC
         Initial        00000  01000  0     101
0  0     ASHR           00000  00100  0     100
0  0     ASHR           00000  00010  0     011
0  0     ASHR           00000  00001  0     010
1  0     SUB (+01011)   01011  00001  0     010
         ASHR           00101  10000  1     001
0  1     ADD (+10101)   11010  10000  1     001
         ASHR           11101  01000  0     000
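[-11 x 8 gives a negative product, so the magnitude is found by taking the 2's complement
of the result.]
Resultant product in A and Q = 11101 01000
2's complement = 00010 10111 + 1 = 00010 11000
Magnitude = 2^6 + 2^4 + 2^3 = 88
Product = -88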
Answer:
A = 010111: the sign bit is 0, therefore it is a positive number.
The magnitude is 10111 = 23. Therefore, A = +23 [010111].
B = 110110: the sign bit is 1, therefore it is a negative number. Find the 2's complement.
2's complement of 10110 is 01001 + 1 = 01010 => 10. Therefore, B = -10 [110110].
BR = 010111       BR'+1 = 101001

Qn Qn+1  Operation       AC      Q       Qn+1  SC
         Initial         000000  110110  0     0110
0  0     ASHR            000000  011011  0     0101
1  0     SUB (+101001)   101001  011011  0     0101
         ASHR            110100  101101  1     0100
1  1     ASHR            111010  010110  1     0011
0  1     ADD (+010111)   010001  010110  1     0011
         ASHR            001000  101011  0     0010
1  0     SUB (+101001)   110001  101011  0     0010
         ASHR            111000  110101  1     0001
1  1     ASHR            111100  011010  1     0000
[+23 x -10 gives a negative product, so the magnitude is found by taking the 2's complement
of the result.]
Resultant product in A and Q = 111100 011010
2's complement = 000011 100101 + 1 = 000011 100110
Magnitude = 2^7 + 2^6 + 2^5 + 2^2 + 2^1 = 230
Product = -230
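The traces above can be checked with a short Python sketch of the same procedure: inspect Qn and Qn+1, add or subtract BR, then arithmetic shift right. The function name booth_multiply and its argument order are assumptions made for illustration. The sketch also assumes the multiplicand is not the most negative n-bit value, since its negation would not fit in n bits.

```python
def booth_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Booth multiplication of two n-bit 2's-complement integers."""
    mask = (1 << n) - 1
    sign_bit = 1 << (n - 1)
    br = multiplicand & mask          # BR
    neg_br = (-multiplicand) & mask   # BR' + 1
    ac, q, q_extra = 0, multiplier & mask, 0  # AC, Q, Qn+1
    for _ in range(n):                # SC counts n steps
        pair = (q & 1, q_extra)       # (Qn, Qn+1)
        if pair == (1, 0):
            ac = (ac + neg_br) & mask  # SUB: AC <- AC + BR' + 1
        elif pair == (0, 1):
            ac = (ac + br) & mask      # ADD: AC <- AC + BR
        # ASHR on AC, Q, Qn+1: the sign bit of AC is kept
        q_extra = q & 1
        q = ((ac & 1) << (n - 1)) | (q >> 1)
        ac = (ac & sign_bit) | (ac >> 1)
    product = (ac << n) | q
    if product >> (2 * n - 1):        # negative 2n-bit result
        product -= 1 << (2 * n)
    return product


print(booth_multiply(13, -6, 5))
print(booth_multiply(23, -10, 6))
```

The final subtraction of 2^(2n) does the same job as the manual 2's-complement step in the worked examples: it converts the 2n-bit pattern in AC and Q to a signed value.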
The Booth algorithm performs signed (2's-complement) multiplication and works equally well
for negative and positive multipliers.
It can also speed up the multiplication process by reducing the number of additions.
In general, in the Booth algorithm, −1 times the shifted multiplicand is selected when moving
from 0 to 1, and +1 times the shifted multiplicand is selected when moving from 1 to 0, as the
multiplier is scanned from right to left. When the LSB of the multiplier is 1, it is
handled by assuming that an implied 0 lies to its right.
In a worst-case multiplier, the number of addition and subtraction operations is large.
In an ordinary multiplier, a recoded 0 indicates no operation, but there are still several
addition and subtraction operations to be performed.
In a good multiplier, the Booth algorithm works well because the majority of recoded digits are 0s.
A good multiplier consists of a few blocks (sequences) of 1s.
The Booth algorithm achieves efficiency in the number of additions required when the multiplier has
a few large blocks of 1s. The speed gained by skipping over 1s depends on the data. On average,
the speed of doing multiplication with the Booth algorithm is the same as with the normal
multiplication algorithm.
• Best case – a long string of 1s (skipping over 1s)
• Worst case – 0s and 1s alternating
• The transformation 011...110 to 100...0-10 is called skipping over 1s.
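The recoding rule described above (scan right to left with an implied 0 to the right of the LSB) can be sketched as follows. The function name booth_recode and the list-of-digits output format are assumptions made for illustration. Each recoded digit is the bit to the right minus the current bit.

```python
def booth_recode(bits: str) -> list:
    """Return Booth digits (+1, 0, -1), MSB first, for a 2's-complement bit string."""
    digits = []
    right = 0  # implied 0 to the right of the LSB
    for ch in reversed(bits):
        b = int(ch)
        digits.append(right - b)  # 0->1 gives -1, 1->0 gives +1, no change gives 0
        right = b
    return digits[::-1]


print(booth_recode("0111110"))  # one block of 1s: few nonzero digits
print(booth_recode("0101"))     # alternating bits: every digit nonzero
```

Counting the nonzero digits gives the number of additions and subtractions the multiplier needs, which is why a single long block of 1s is the best case and alternating bits the worst.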
INTEGER DIVISION
The figure shows examples of decimal division and binary division of the same values.
Consider the decimal version first. The 2 in the quotient is determined by the following
reasoning: First, we try to divide 13 into 2, and it does not work. Next, we try to divide 13
into 27. We go through the trial exercise of multiplying 13 by 2 to get 26, and, observing that
27 − 26 = 1 is less than 13, we enter 2 as the quotient and perform the required subtraction.
Dividend = 274
Divisor = 13
Quotient=21
Remainder =1
The next digit of the dividend, 4, is brought down, and we finish by deciding that 13 goes
into 14 once and the remainder is 1. We can discuss binary division in a similar way, with the
simplification that the only possibilities for the quotient bits are 0 and 1.
A circuit that implements division by this longhand method operates as follows: It
positions the divisor appropriately with respect to the dividend and performs a subtraction. If the
remainder is zero or positive, a quotient bit of 1 is determined, the remainder is extended by
another bit of the dividend, the divisor is repositioned, and another subtraction is performed.
If the remainder is negative, a quotient bit of 0 is determined, the dividend is restored by
adding back the divisor, and the divisor is repositioned for another subtraction. This is called the
restoring division algorithm.
Restoring Division
Figure shows a logic circuit arrangement that implements the restoring division algorithm
just discussed. An n-bit positive divisor is loaded into register M and an n-bit positive dividend
is loaded into register Q at the start of the operation. Register A is set to 0. After the division is
complete, the n-bit quotient is in register Q and the remainder is in register A.
The required subtractions are facilitated by using 2’s-complement arithmetic. The extra
bit position at the left end of both A and M accommodates the sign bit during subtractions. The
following algorithm performs restoring division.
Do the following three steps n times:
1. Shift A and Q left one bit position.
2. Subtract M from A, i.e. compute A - M, and place the answer back in A.
3. If the sign of A is 1, set q0 to 0 and add M back to A (that is, restore A); otherwise, set
q0 to 1.
For example, with divisor M = 00011, the subtraction is performed by adding the 2's complement of M:
M = 00011
M'+1 = 11100 + 1 = 11101
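The three-step restoring division loop can be sketched in Python. The function name restoring_divide is an assumption made for illustration, and the dividend 8 in the usage line is chosen only as an example to go with the divisor M = 00011. The sketch models the sign of A with a negative Python integer rather than with an explicit extra sign-bit position.

```python
def restoring_divide(dividend: int, divisor: int, n: int):
    """Restoring division of n-bit positive integers; returns (quotient, remainder)."""
    mask = (1 << n) - 1
    assert 0 <= dividend <= mask and 0 < divisor <= mask
    a, q, m = 0, dividend, divisor  # A = 0, Q = dividend, M = divisor
    for _ in range(n):
        # 1. Shift A and Q left one bit position (MSB of Q moves into A).
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & mask
        # 2. Subtract M from A.
        a -= m
        # 3. Negative result: set q0 = 0 and restore A. Otherwise set q0 = 1.
        if a < 0:
            a += m
        else:
            q |= 1
    return q, a  # quotient in Q, remainder in A


print(restoring_divide(8, 3, 4))
```

The same function with n = 9 handles the decimal example above (274 divided by 13).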
Pipelining is a technique of decomposing a sequential process into sub operations, with each
sub process being executed in a special dedicated segment that operates concurrently with
all other segments. A pipeline can be visualized as a collection of processing segments
through which binary information flows. Each segment performs partial processing dictated
by the way the task is partitioned. The result obtained from the computation in each segment
is transferred to the next segment in the pipeline. The final result is obtained after the data
have passed through all segments.
A pipeline processor may process each instruction in 4 steps:
F Fetch: Read the instruction from the memory
D Decode: Decode the instruction and fetch the source operands
E Execute: Perform the operation specified by the instruction
W Write: Store the result in the destination location.
In figure (a), four instructions are in progress at any given time. This means that four distinct
hardware units are needed, as in figure (b). These units must be capable of performing their
tasks simultaneously without interfering with one another. Information is passed from one
unit to the next through a storage buffer. As an instruction progresses through the pipeline, all
the information needed by the stages downstream must be passed along.
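The overlap of the four steps F, D, E and W can be visualized with a small sketch that builds a space-time table, one row per instruction and one column per clock cycle. The function name space_time is an assumption made for illustration, and the model is an ideal pipeline with no stalls.

```python
def space_time(num_instructions: int, stages=("F", "D", "E", "W")):
    """Return one row per instruction showing which stage it occupies in each cycle."""
    total_cycles = len(stages) + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        row = ["."] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i enters stage s in cycle i + s
        rows.append(row)
    return rows


for number, row in enumerate(space_time(4), start=1):
    print(f"I{number}: " + " ".join(row))
```

Reading any column from the fourth cycle onward shows four different instructions, each in a different unit, which is why four distinct hardware units are required.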
Pipeline Organization
The simplest way of viewing the pipeline structure is to imagine that each segment
consists of an input register followed by a combinational circuit. The register holds the data
and the combinational circuit performs the sub operation in the particular segment. The
output of the combinational circuit is applied to the input register of the next segment. A
clock is applied to all registers after enough time has elapsed to perform all segment
activity. In this way the information flows through the pipeline one step at a time.
(Figure: example demonstrating the pipeline organization)
Consider the case where a k-segment pipeline with a clock cycle time tp is used to
execute n tasks. The first task T1 requires a time equal to k tp to complete its operation since
there are k segments in the pipe. The remaining n-1 tasks emerge from the pipe at the rate of
one task per clock cycle and they will be completed after a time equal to (n-1) tp. Therefore,
completing n tasks using a k-segment pipeline requires k + (n-1) clock cycles.
Consider a nonpipelined unit that performs the same operation and takes a time equal to
tn to complete each task. The total time required for n tasks is n tn. The speedup of
pipeline processing over equivalent nonpipelined processing is defined by the ratio
S = n tn / ((k + n - 1) tp)
As the number of tasks increases, n becomes much larger than k - 1, and k + n - 1 approaches the
value of n. Under this condition the speedup ratio becomes
S = tn / tp
If we assume that the time it takes to process a task is the same in the pipelined and
nonpipelined circuits, we will have tn = k tp. With this assumption the speedup ratio reduces to
S = k tp / tp = k
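The speedup ratio can be evaluated numerically with a short sketch. The function name pipeline_speedup and the sample values (k = 4 segments, tp = 20 time units, n = 100 tasks) are assumptions chosen for illustration. When tn is not given, the sketch uses the assumption tn = k tp from the text.

```python
def pipeline_speedup(n: int, k: int, tp: float, tn: float = None) -> float:
    """Speedup S = n*tn / ((k + n - 1)*tp) of a k-segment pipeline over n tasks."""
    if tn is None:
        tn = k * tp  # same total task time in pipelined and nonpipelined units
    return (n * tn) / ((k + n - 1) * tp)


print(pipeline_speedup(n=100, k=4, tp=20))    # below the limit k = 4
print(pipeline_speedup(n=10**6, k=4, tp=20))  # approaches k as n grows
```

For a single task the ratio is 1, so there is no gain. The speedup approaches k only when n is much larger than k - 1.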
1. Arithmetic Pipelining: The arithmetic logic units of a computer can be segmented for
pipeline operations in various data formats.
2. Instruction Pipelining: The execution of stream of instructions can be pipelined by
overlapping the execution of current instruction with the fetch, decode and execution of
subsequent instructions. This technique is known as instruction lookahead.
KTU - CST202 - Computer Organization and Architecture Module: 3
ARITHMETIC PIPELINES
A and B are two fractions that represent the mantissas, and a and b are the exponents. The
floating-point addition and subtraction can be performed in four segments. Registers
are placed between the segments to store intermediate results. The suboperations
that are performed in the four segments are:
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.
The exponents are compared by subtracting them to determine their difference. The
larger exponent is chosen as the exponent of the result. The exponent difference determines
how many times the mantissa associated with the smaller exponent must be shifted to the
right. This produces an alignment of the two mantissas.
The two mantissas are added or subtracted in segment 3. The result is normalized in
segment 4. When an overflow occurs, the mantissa of the sum or difference is shifted right
and the exponent is incremented by one. If an underflow occurs, the number of leading zeros
in the mantissa determines the number of left shifts of the mantissa and the number that must
be subtracted from the exponent.
[Overflow: the result of an arithmetic operation is finite but larger in magnitude than
the largest floating-point number that can be stored at the given precision. Underflow: the
result of an arithmetic operation is smaller in magnitude than the smallest floating-point
number that can be stored.]
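The four segments can be sketched as four sequential steps in Python. The function name fp_add, the choice of base 10, and the sample operands 0.9504 x 10^3 and 0.8200 x 10^2 are assumptions made for illustration. Exact fractions are used so that the mantissa shifts do not introduce rounding. In hardware, each step would run in its own segment on a different pair of operands.

```python
from fractions import Fraction


def fp_add(a_mant, a_exp, b_mant, b_exp, base=10):
    """Add A * base**a_exp and B * base**b_exp, where A and B are fractional mantissas."""
    # Segment 1: compare the exponents by subtraction; keep the larger one.
    diff = a_exp - b_exp
    exp = max(a_exp, b_exp)
    # Segment 2: align by shifting right the mantissa with the smaller exponent.
    if diff >= 0:
        b_mant = b_mant / Fraction(base) ** diff
    else:
        a_mant = a_mant / Fraction(base) ** (-diff)
    # Segment 3: add the mantissas.
    mant = a_mant + b_mant
    # Segment 4: normalize so that 1/base <= |mantissa| < 1.
    if mant != 0:
        while abs(mant) >= 1:                 # overflow: shift right, exponent + 1
            mant, exp = mant / base, exp + 1
        while abs(mant) < Fraction(1, base):  # underflow: shift left, exponent - 1
            mant, exp = mant * base, exp - 1
    return mant, exp


mant, exp = fp_add(Fraction(9504, 10000), 3, Fraction(8200, 10000), 2)
print(float(mant), exp)
```

Subtraction is obtained by passing a negative mantissa for the second operand.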
INSTRUCTION PIPELINE
An instruction pipeline operates on a stream of instructions by overlapping the
fetch, decode, and execute phases of the instruction cycle. An instruction pipeline reads
consecutive instructions from memory while previous instructions are being executed in
other segments. This causes the instruction fetch and execute phases to overlap and perform
simultaneous operations.
Consider a computer with an instruction fetch unit and an instruction execute unit
designed to provide a two-segment pipeline. The instruction fetch segment can be
implemented by means of a first-in, first-out (FIFO) buffer. Whenever the execution unit is
not using memory, the control increments the program counter and uses its address value to
read consecutive instructions from memory. The instructions are inserted into the FIFO buffer
so that they can be executed on a first-in, first-out basis. Thus an instruction stream can be
placed in a queue, waiting for decoding and processing by the execution segment.
In general the computer needs to process each instruction with the following sequence of
steps. (6 steps in 4 segments)
The figure shows the operation of the instruction pipeline. Time on the horizontal axis is
divided into steps of equal duration. The four segments are represented in the diagram with
an abbreviated symbol.
2. DA is the segment that decodes the instruction and calculates the effective address.
In the case of the third instruction, we see that it is a branch instruction. While it is being
decoded, the fourth instruction is fetched simultaneously. But because it is a branch instruction, it
may transfer control to some other instruction once it is decoded. Thus the fourth instruction is
kept on hold until the branch instruction is executed. When it has been executed, the fourth
instruction is copied back and the other phases continue as usual. In the absence of a branch
instruction, each segment operates on a different instruction.
PIPELINE CONFLICTS:
1. Resource Conflicts: These are caused by access to memory by two segments at the
same time. Most of these conflicts can be resolved by using separate instruction and
data memories.
2. Data Dependency: These conflicts arise when an instruction depends on the result of
a previous instruction, but this result is not yet available.
3. Branch Difficulties: These arise from branch and other instructions that change the
value of PC.
Pipeline hazards are caused by resource usage conflicts among various instructions in the
pipeline. Such hazards are triggered by inter-instruction dependencies. When successive
instructions overlap their fetch, decode and execution through a pipeline processor,
inter-instruction dependencies may arise that prevent the sequential data flow in the pipeline.
For example, an instruction may depend on the results of a previous instruction. Until the
completion of the previous instruction, the present instruction cannot be initiated into the
pipeline. In other instances, two stages of a pipeline may need to update the same memory
location. Hazards of this sort, if not properly detected and resolved, could result in an
interlock situation in the pipeline or produce unreliable results by overwriting.
There are three classes of data-dependent hazards, according to various data update
patterns: read after write (RAW), write after read (WAR) and write after write (WAW).
Note that read after read does not pose a problem because nothing is changed.
We use resource object to refer to working registers, memory locations and special flags. The
contents of these resource objects are called data objects. Each instruction can be considered
a mapping from a set of data objects to a set of data objects. The domain D(I) of an
instruction I is a set of resource objects whose data objects may affect the execution of
instruction I. The range of an instruction R(I) is the set of resource objects whose data objects
may be modified by the execution of instruction I. Obviously, the operands to be used in an
instruction execution are retrieved (read) from its domain and the results will be stored
(written) in its range.
Consider the execution of two instructions I and J in a program. Instruction J appears after
instruction I in the program. There may be zero or more other instructions between instructions I
and J. The latency between the two instructions is a very subtle matter. Instruction J may
enter the execution pipe before or after the completion of the execution of instruction I. The
improper timing and the data dependencies may create some hazardous situations.
1. A RAW hazard between the two instructions I and J may occur when J attempts to
read some data object that has been modified by I.
2. A WAR hazard may occur when J attempts to modify some data object that is read by I.
3. A WAW hazard may occur if both I and J attempt to modify the same data object.
Possible hazards are listed in the table. Recognizing the existence of possible hazards, computer
designers wish to detect the hazard and then resolve it efficiently. Hazard detection can be
done in the instruction fetch stage of a pipeline processor by comparing the domain and the
range of the incoming instruction with those of the instructions being processed in the pipe.
Should any of the conditions in equation 3.18 be detected, a warning signal can be generated
to prevent the hazard from taking place. Another approach is to allow the incoming
instruction through the pipe and distribute the detection to all the potential pipeline stages.
This distributed approach offers better flexibility at the expense of increased hardware
control. Note that the necessary conditions in equation 3.18 may not be sufficient
conditions.
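Comparing domains and ranges can be sketched with Python sets. Equation 3.18 is not reproduced in these notes, so the intersection tests below are an assumption based on the RAW, WAR and WAW definitions given earlier, not a transcription of that equation. The function name detect_hazards and the register names are hypothetical.

```python
def detect_hazards(i_domain, i_range, j_domain, j_range):
    """Possible hazards between instruction I and a later instruction J.

    Domains are the resource objects read; ranges are the resource objects written.
    """
    hazards = set()
    if i_range & j_domain:
        hazards.add("RAW")  # J reads an object that I modifies
    if i_domain & j_range:
        hazards.add("WAR")  # J modifies an object that I reads
    if i_range & j_range:
        hazards.add("WAW")  # both modify the same object
    return hazards


# I: R1 <- R2 + R3, followed by J: R4 <- R1 * R5
print(detect_hazards({"R2", "R3"}, {"R1"}, {"R1", "R5"}, {"R4"}))
```

A non-empty intersection only says that a hazard is possible. Whether it actually occurs depends on the timing of I and J in the pipe, which is why these conditions are necessary but may not be sufficient.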
Once the hazard is detected, the system should resolve the interlock situation. Consider the
instruction sequence {.. I, I+1,....J, J+1,...} in which a hazard has been detected between the
In order to avoid RAW hazards, IBM engineers developed a short-circuiting approach
which gives a copy of the data object to be written directly to the instruction waiting to read
the data. This concept was generalized into a technique known as data forwarding, which
forwards multiple copies of the data to as many waiting instructions as may wish to read it. A
data forwarding chain can be established in some cases. The internal forwarding and
register-tagging techniques are helpful in resolving logic hazards in pipelines.