Unit 2
Fixed-Point Representation
This representation reserves a fixed number of bits for the integer part and a fixed number of bits for the fractional part. For example, if the fixed-point format is IIII.FFFF, then the smallest positive value that can be stored is 0000.0001 and the largest value is 9999.9999. A fixed-point number has three parts: the sign field, the integer field, and the fractional field.
Example: Assume a number uses a 32-bit format that reserves 1 bit for the sign, 15 bits for the integer part, and 16 bits for the fractional part.
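To make this format concrete, here is a minimal Python sketch of encoding and decoding values in the 1-bit sign / 15-bit integer / 16-bit fraction format described above. The function names and the sign-magnitude packing are illustrative assumptions, not a standard API.

# Minimal sketch of the 1-bit sign / 15-bit integer / 16-bit fraction
# format described above. Names are illustrative, not a standard API.

FRAC_BITS = 16

def to_fixed(value: float) -> int:
    """Encode a real number as a 32-bit sign-magnitude fixed-point word."""
    sign = 1 if value < 0 else 0
    magnitude = round(abs(value) * (1 << FRAC_BITS))   # scale by 2^16
    assert magnitude < (1 << 31), "value out of range for the 15.16 format"
    return (sign << 31) | magnitude

def from_fixed(word: int) -> float:
    """Decode a 32-bit sign-magnitude fixed-point word back to a real number."""
    sign = -1.0 if (word >> 31) & 1 else 1.0
    return sign * (word & 0x7FFFFFFF) / (1 << FRAC_BITS)

print(hex(to_fixed(3.25)))        # 0x34000 -> integer part 3, fraction .25
print(from_fixed(to_fixed(-1.5))) # -1.5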
In a floating-point representation, by contrast, we can move the radix point left or right so that only a single 1 remains in the integer field; this is the idea behind the representation described next.
Floating-Point Representation
This representation does not reserve a specific number of bits for the integer part or the
fractional part. Instead it reserves a certain number of bits for the number (called the mantissa
or significand) and a certain number of bits to say where within that number the decimal place
sits (called the exponent).
The floating-point representation of a number has two parts: the first part represents a signed fixed-point number called the mantissa; the second part designates the position of the decimal (or binary) point and is called the exponent. The fixed-point mantissa may be a fraction or an integer. Only the mantissa m and the exponent e are physically represented in the register (including their signs). A floating-point binary number is represented in a similar manner except that it uses base 2 for the exponent. A floating-point binary number is said to be normalized if the most significant digit of the mantissa is 1.
So, the actual number is (-1)^s x (1 + m) x 2^(e - Bias), where s is the sign bit, m is the mantissa, e is the exponent value, and Bias is the bias number.
Note that signed integers and exponents can be represented using sign-magnitude representation, one's complement representation, or two's complement representation.
The floating-point representation is more flexible. Any non-zero number x can be written in the normalized form ±(1.b1b2b3 ...)two x 2^n.
Example: Suppose a number uses the 32-bit format: 1 sign bit, 8 bits for the signed exponent, and 23 bits for the fractional part. The leading bit 1 is not stored (as it is always 1 for a normalized number) and is referred to as the "hidden bit".
Note that the 8-bit exponent field is used to store integer exponents -126 ≤ n ≤ 127.
The precision of a floating-point format is the number of bit positions reserved for the fraction plus one (for the hidden bit). In the example considered here, the precision is 23 + 1 = 24.
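As an illustration of the hidden bit and the bias, the following Python sketch (using the standard struct module; the function name is illustrative) pulls apart the 1/8/23 bit fields of a single-precision number and reconstructs its value with the formula (-1)^s x (1 + m) x 2^(e - 127) for the normalized case.

import struct

def decode_float32(x: float):
    """Pull apart the IEEE 754 single-precision fields of x (normalized case)."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]   # raw 32-bit pattern
    s = (bits >> 31) & 0x1          # sign bit
    e = (bits >> 23) & 0xFF         # 8-bit biased exponent
    f = bits & 0x7FFFFF             # 23-bit fraction (hidden bit not stored)
    m = f / (1 << 23)               # fractional part of the significand
    value = (-1) ** s * (1 + m) * 2 ** (e - 127)
    return s, e, f, value

print(decode_float32(-6.25))   # (1, 129, 4718592, -6.25)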
The gap between 1 and the next normalized floating-point number is known as machine epsilon. The gap is (1 + 2^-23) - 1 = 2^-23 for the above example, but this is not the same as the smallest positive floating-point number, because the spacing of floating-point numbers is non-uniform, unlike in the fixed-point scenario.
Note that numbers whose binary expansion is non-terminating cannot be represented exactly in floating point; e.g., 1/3 = (0.010101 ...)two cannot be a floating-point number, as its binary representation is non-terminating.
According to the IEEE 754 standard, floating-point numbers are represented in the following formats:
Half Precision (16 bit): 1 sign bit, 5 bit exponent, and 10 bit mantissa
Single Precision (32 bit): 1 sign bit, 8 bit exponent, and 23 bit mantissa
Double Precision (64 bit): 1 sign bit, 11 bit exponent, and 52 bit mantissa
Quadruple Precision (128 bit): 1 sign bit, 15 bit exponent, and 112 bit mantissa
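The field widths and machine epsilon above can be checked with NumPy, assuming it is installed; quadruple precision is omitted here because NumPy's extended type is platform-dependent. This is only a quick sanity check, not part of the standard itself.

import numpy as np

# Machine epsilon and field widths for the IEEE 754 formats NumPy exposes.
for name, dtype in [('half', np.float16), ('single', np.float32), ('double', np.float64)]:
    info = np.finfo(dtype)
    print(f"{name:7s} mantissa bits={info.nmant:3d} exponent bits={info.nexp:3d} "
          f"eps={info.eps}  smallest normal={info.tiny}")

# For single precision: eps = 2**-23, while the smallest positive normalized
# number is 2**-126 -- they are not the same, as noted above.
assert np.finfo(np.float32).eps == 2.0 ** -23
assert np.finfo(np.float32).tiny == 2.0 ** -126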
Adders suffer from carry propagation delay, and this delay also slows other arithmetic operations such as multiplication and division, since they use several addition or subtraction steps. This is a major problem for the adder, and hence improving the speed of addition improves the speed of all other arithmetic operations. Reducing the carry propagation delay of adders is therefore of great importance. Different logic design approaches have been employed to overcome the carry propagation problem. One widely used approach is carry look-ahead, which solves this problem by calculating the carry signals in advance, based on the input signals. This type of adder circuit is called a carry look-ahead adder. Here a carry signal will be generated in two cases: when both input bits are 1 (carry generate), or when exactly one input bit is 1 and the incoming carry is 1 (carry propagate).
In ripple carry adders, for each adder block the two bits that are to be added are available instantly. However, each adder block must wait for the carry to arrive from the previous block, so it is not possible to generate the sum and carry of any block until its input carry is known. Each block waits for the previous block to produce its carry, so there is a considerable time delay, called the carry propagation delay.
Consider a 4-bit ripple carry adder. The sum of each stage is produced by the corresponding full adder as soon as the input signals are applied to it, but the carry input of a stage does not reach its final steady-state value until the carry output of the previous stage has settled. Therefore, the carry must propagate through all the stages before the output sum and carry settle to their final steady-state values.
The total propagation time is equal to the propagation delay of each adder block multiplied by the number of adder blocks in the circuit. For example, if each full adder stage has a propagation delay of 20 nanoseconds, the final carry will reach its correct value after 60 (20 × 3) nanoseconds. The situation gets worse as the number of stages is extended to add more bits.
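The linear growth of the delay can be seen in a small Python model of a ripple-carry adder. This is only a behavioural sketch, not a gate-level simulation; the 20 ns figure is the per-stage delay assumed above, and the function name is illustrative.

def ripple_carry_add(a_bits, b_bits, c_in=0, stage_delay_ns=20):
    """Behavioural model of a ripple-carry adder (bits given LSB first).

    Each full adder can only settle after the previous stage's carry has
    settled, so the worst-case delay grows linearly with the word length.
    """
    carry, sum_bits, delay = c_in, [], 0
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)          # full-adder sum
        carry = (a & b) | (carry & (a ^ b))     # full-adder carry out
        delay += stage_delay_ns                 # must wait for this stage
    return sum_bits, carry, delay

# 4-bit example: 0110 + 0011 (lists are LSB first), total delay 4 x 20 ns
print(ripple_carry_add([0, 1, 1, 0], [1, 1, 0, 0]))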
1. Explain Carry Look Ahead adders in detail
A carry-lookahead adder (CLA) is a type of adder used in digital logic. A carry-lookahead adder improves speed by reducing the amount of time required to determine the carry bits. It calculates one or more carry bits before the sum, which reduces the wait time to compute the result for the higher-order bits.
Operation Mechanism
1. Calculating, for each digit position, whether that position is going to propagate a carry if one comes in from the right.
2. Combining these calculated values to be able to deduce quickly whether, for each group of digits, that group is going to propagate a carry that comes in from the right.
CLA – Concept
To reduce the computation time, there are faster ways to add two binary numbers by using carry look-ahead adders.
They work by creating two signals, P and G, known as the carry propagate and carry generate signals.
The carry propagate signal passes the incoming carry on to the next level, whereas the carry generate signal is used to generate the output carry regardless of the input carry.
The block diagram of a 4-bit Carry Lookahead Adder is shown here below
The number of gate levels for the carry propagation can be found from the circuit of a full adder. The signal from the input carry Cin to the output carry Cout requires an AND gate and an OR gate, which constitutes two gate levels. So if there are four full adders in the parallel adder, the output carry C4 would pass through 2 × 4 = 8 gate levels from C0 to C4. For an n-bit parallel adder, there are 2n gate levels for the carry to propagate through.
The corresponding Boolean expressions for constructing a carry look-ahead adder are given below. In the carry-lookahead circuit we need to generate the two signals, carry propagate (P) and carry generate (G):
Pi = Ai ⊕ Bi
Gi = Ai · Bi
Having these, we can design the circuit. We can now write the Boolean function for the carry output of each stage and substitute for each Ci its value from the previous equations:
C1 = G0 + P0 · C0
C2 = G1 + P1 · C1 = G1 + P1 · G0 + P1 · P0 · C0
C3 = G2 + P2 · C2 = G2 + P2 · G1 + P2 · P1 · G0 + P2 · P1 · P0 · C0
C4 = G3 + P3 · C3 = G3 + P3 · G2 + P3 · P2 · G1 + P3 · P2 · P1 · G0 + P3 · P2 · P1 · P0 · C0
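A small Python sketch of the 4-bit carry-lookahead equations above (an illustrative model, not a gate-level design) shows that every carry can be computed directly from the Pi, Gi, and C0 signals.

def cla_4bit(a_bits, b_bits, c0=0):
    """4-bit carry-lookahead adder sketch (bits given LSB first).

    Carries follow the expansions above:
      C(i+1) = Gi + Pi*G(i-1) + ... + Pi*...*P0*C0
    so every carry depends only on the inputs, not on the previous carry chain.
    """
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # Pi = Ai xor Bi
    g = [a & b for a, b in zip(a_bits, b_bits)]   # Gi = Ai and Bi

    c = [c0]
    for i in range(4):
        # build C(i+1) = Gi + Pi*G(i-1) + ... + Pi*...*P0*C0 term by term
        carry = g[i]
        term = p[i]
        for j in range(i - 1, -1, -1):
            carry |= term & g[j]
            term &= p[j]
        carry |= term & c0
        c.append(carry)

    sum_bits = [p[i] ^ c[i] for i in range(4)]    # Si = Pi xor Ci
    return sum_bits, c[4]

# 0110 + 0011 (LSB first): sum 1001, carry out 0
print(cla_4bit([0, 1, 1, 0], [1, 1, 0, 0]))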
A number in scientific notation that has a single digit to the left of the decimal point and no leading 0s is called a normalized number.
For example, 1.0ten x 10^-9 is in normalized scientific notation, but 0.1ten x 10^-8 and 10.0ten x 10^-10 are not.
Binary numbers in scientific notation: 1.0two x 2^-1; here the decimal point is called the binary point for clarity.
The general form of a binary floating-point number is 1.xxxxxxxxxtwo x 2^yyyy.
Floating-Point Representation:
The representation has two parts: a fraction part and an exponent part. The size of the fraction part is used to increase or decrease precision, while the size of the exponent part is used to increase or decrease the range of the number.
The representation has 1 sign bit, where bit value 0 represents a positive value and bit value 1 represents a negative value.
(-1)^S x F x 2^E
Overflow here means that the exponent is too large to be represented in the exponent field. Underflow occurs when a non-zero fraction is too small to be stored, i.e., smaller than the smallest non-zero value that the fraction and exponent fields can represent.
Double precision representation:
To pack even more bits into the significand, IEEE 754 makes the leading 1-bit of normalized binary numbers
implicit.
Hence, the number is actually 24 bits long in single precision (implied 1 and a 23-bit fraction), and 53 bits long
in double precision (1 + 52).
The IEEE format uses NaN (Not a Number) to represent the results of undefined operations such as 0/0; values such as infinity have their own separate representation.
An exponent of -1 (as in 2^-1) could be stored in 2's complement format, but then, as in fig. (b), the value -1 would look as if a large exponent value were sitting in the exponent field; to avoid this confusion a bias value is used.
IEEE 754 uses a bias of 127 for single precision, so an exponent of -1 is represented by the bit pattern of the value -1 + 127ten, or 126ten, and +1 is represented by 1 + 127ten, or 128ten.
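A quick Python check (using the struct module; the helper name is illustrative) confirms that single precision stores an exponent of -1 as 126 and an exponent of +1 as 128, using 0.5 = 1.0two x 2^-1 and 2.0 = 1.0two x 2^+1 as probe values.

import struct

def biased_exponent(x: float) -> int:
    """Return the 8-bit stored exponent field of x in single precision."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    return (bits >> 23) & 0xFF

print(biased_exponent(0.5))   # 0.5 = 1.0 x 2^-1 -> stored as -1 + 127 = 126
print(biased_exponent(2.0))   # 2.0 = 1.0 x 2^+1 -> stored as  1 + 127 = 128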
Multiplying decimal numbers in scientific notation by hand: 1.110ten x 10^10 x 9.200ten x 10^-5.
Assume that we can store only four digits of the significand and two digits of the exponent.
Step 1: Unlike addition, we calculate the exponent of the product by simply adding the exponents of the operands together: new exponent = 10 + (-5) = 5.
Step 2: Multiply the significands: 1.110ten x 9.200ten = 10.212ten, so the product is 10.212ten x 10^5.
Step 3: Normalize the product: 10.212ten x 10^5 = 1.0212ten x 10^6.
Step 4: We assumed that the significand is only four digits long (excluding the sign), so we must round the number. The number 1.0212ten x 10^6 is rounded to four digits in the significand, giving 1.021ten x 10^6.
Step 5: The sign of the product depends on the signs of the original operands. If they are both the same, the sign is positive; otherwise, it is negative. Hence, the product is +1.021ten x 10^6.
Binary Floating-Point Multiplication (an example based on the above algorithm, using binary numbers)
Multiply the numbers 0.5ten and -0.4375ten in binary. First, convert the numbers to binary:
0.5ten = 0.1two = 0.1two x 2^0 = 1.000two x 2^-1
-0.4375ten = -0.0111two = -0.0111two x 2^0 = -1.110two x 2^-2
Step 1. Add the exponents without bias: -1 + (-2) = -3
Step 2. Multiply the significands:
      1.000two
    x 1.110two
    ----------
         0000
        1000
       1000
      1000
    ----------
      1110000two
Each operand has three digits to the right of the binary point, so the product has six: 1.110000two x 2^-3. Keeping only 4 bits of significand, the product is 1.110two x 2^-3.
Step 3. Check whether the product is normalized, and check for underflow or overflow. It is already normalized, so the product is 1.110two x 2^-3.
Step 4. Round the product to 4 bits; it already fits, so no action is needed: 1.110two x 2^-3.
Step 5. Since the signs of the operands differ, the product is negative:
-1.110two x 2^-3
Converting to decimal: -1.110two x 2^-3 = -0.001110two = -7ten x 2^-5 = -7/2^5 = -7/32ten = -0.21875ten
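The five steps can also be expressed as a short Python sketch that reproduces the 0.5 x -0.4375 example. The function and its (significand, exponent) interface are illustrative assumptions, and rounding is simplified to truncation.

def fp_multiply(sig_a, exp_a, sig_b, exp_b, sig_bits=4):
    """Multiply two binary floating-point numbers given as
    (signed significand, exponent) pairs, keeping sig_bits significand bits.

    Follows the five steps above: add exponents, multiply significands,
    normalize, round (truncate here for simplicity), and set the sign.
    """
    sign = -1 if (sig_a < 0) != (sig_b < 0) else 1          # Step 5
    exp = exp_a + exp_b                                      # Step 1
    sig = abs(sig_a) * abs(sig_b)                            # Step 2
    while sig >= 2.0:                                        # Step 3: normalize
        sig /= 2.0
        exp += 1
    scale = 2 ** (sig_bits - 1)
    sig = int(sig * scale) / scale                           # Step 4: keep 4 bits
    return sign * sig, exp

# 0.5     =  1.000two x 2^-1 -> significand  1.0,  exponent -1
# -0.4375 = -1.110two x 2^-2 -> significand -1.75, exponent -2
sig, exp = fp_multiply(1.0, -1, -1.75, -2)
print(sig, exp)                  # -1.75 -3, i.e. -1.110two x 2^-3
print(sig * 2 ** exp)            # -0.21875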
Booth Multiplier
Booth's algorithm gives a procedure for multiplying binary integers in signed 2's complement representation efficiently, i.e., with fewer additions/subtractions. It exploits the fact that a string of 0's in the multiplier requires no addition, only shifting, and that a string of 1's in the multiplier from bit weight 2^k down to weight 2^m can be treated as 2^(k+1) - 2^m. As in all multiplication schemes, Booth's algorithm requires examination of the multiplier bits and shifting of the partial product. Prior to the shifting, the multiplicand may be added to the partial product, subtracted from the partial product, or left unchanged according to the following rules:
1. The multiplicand is subtracted from the partial product upon encountering the first least
significant 1 in a string of 1’s in the multiplier
2. The multiplicand is added to the partial product upon encountering the first 0 (provided
that there was a previous ‘1’) in a string of 0’s in the multiplier.
3. The partial product does not change when the multiplier bit is identical to the previous
multiplier bit.
We name the registers A, B, and Q (also written AC, BR, and QR, respectively). Qn designates the least significant bit of the multiplier in register QR. An extra flip-flop Qn+1 is appended to QR to facilitate a double inspection of the multiplier. The flowchart for the Booth algorithm is shown below.
AC and the appended bit Qn+1 are initially cleared to 0, and the sequence counter SC is set to a number n equal to the number of bits in the multiplier. The two bits of the multiplier in Qn and Qn+1 are inspected. If the two bits are equal to 10, it means that the first 1 in a string of 1's has been encountered; this requires subtraction of the multiplicand from the partial product in AC. If the two bits are equal to 01, it means that the first 0 in a string of 0's has been encountered; this requires addition of the multiplicand to the partial product in AC. When the two bits are equal, the partial product does not change. An overflow cannot occur because the addition and subtraction of the multiplicand follow each other; as a consequence, the two numbers that are added always have opposite signs, a condition that excludes overflow. The next step is to shift the partial product and the multiplier (including Qn+1) to the right. This is an arithmetic shift right (ashr) operation, which shifts AC and QR to the right and leaves the sign bit in AC unchanged. The sequence counter is decremented and the computational loop is repeated n times. The product of negative numbers is handled naturally: rather than performing binary subtraction directly, we add the 2's complement of the multiplicand, which changes its sign. The product of two negative numbers is demonstrated below along with the 2's complement steps.
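The register-level description above can be modelled with the following Python sketch of Booth's algorithm. It is an illustrative model, not hardware; AC, BR, Q, Qn+1, and SC correspond to the registers named in the text, and operands are n-bit two's-complement integers.

def booth_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Booth's algorithm on n-bit two's-complement operands.

    AC holds the accumulated partial product, Q the multiplier, and
    q_extra is the appended flip-flop Qn+1; AC:Q:Qn+1 is arithmetic
    shifted right once per iteration.
    """
    mask = (1 << n) - 1
    BR = multiplicand & mask          # multiplicand register
    AC = 0                            # partial product
    Q = multiplier & mask             # multiplier register
    q_extra = 0                       # appended bit Qn+1

    for _ in range(n):                # sequence counter SC
        pair = (Q & 1, q_extra)
        if pair == (1, 0):            # first 1 of a string: AC = AC - BR
            AC = (AC - BR) & mask
        elif pair == (0, 1):          # first 0 after a string of 1's: AC = AC + BR
            AC = (AC + BR) & mask
        # arithmetic shift right of AC:Q:Qn+1, keeping AC's sign bit
        q_extra = Q & 1
        Q = ((Q >> 1) | ((AC & 1) << (n - 1))) & mask
        AC = ((AC >> 1) | (AC & (1 << (n - 1)))) & mask

    product = (AC << n) | Q           # 2n-bit two's-complement product
    if product & (1 << (2 * n - 1)):  # convert back to a signed Python int
        product -= 1 << (2 * n)
    return product

print(booth_multiply(-9, -13, 5))     # 117  (product of two negative numbers)
print(booth_multiply(7, -3, 5))       # -21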
Restoring Division Algorithm For Unsigned Integer
A division algorithm provides a quotient and a remainder when we divide two numbers. Division algorithms are generally of two types: slow and fast. Slow division algorithms include restoring, non-restoring, non-performing restoring, and SRT division, while fast algorithms include Newton-Raphson and Goldschmidt. In this section we perform the restoring algorithm for unsigned integers. The term "restoring" comes from the fact that the value of register A is restored after each iteration in which the trial subtraction goes negative.
Here, register Q contains the quotient and register A contains the remainder. The n-bit dividend is loaded into Q and the divisor is loaded into M. The value of register A is initially kept 0, and it is this register whose value is restored during the iterations, which is why the method is named restoring division.
Steps Involved
Step-1: First the registers are initialized with the corresponding values (Q = Dividend, M = Divisor, A = 0, n = number of bits in the dividend).
Step-2: Then the contents of registers A and Q are shifted left as if they were a single unit.
Step-3: Then the content of register M is subtracted from A and the result is stored in A.
Step-4: Then the most significant bit of A is checked: if it is 0, the least significant bit of Q is set to 1; otherwise, if it is 1, the least significant bit of Q is set to 0 and the value of register A is restored, i.e., set back to the value of A before the subtraction of M.
Step-5: The value of the counter n is decremented.
Step-6: If n is not equal to zero, go to Step-2; otherwise, go to the next step.
Step-7: Finally, register Q contains the quotient and A contains the remainder.
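A Python sketch of the restoring steps above is given below. It is illustrative only, using Python integers to stand in for the A, M, and Q registers.

def restoring_divide(dividend: int, divisor: int, n: int):
    """Restoring division of unsigned n-bit integers.

    Q holds the dividend (and accumulates the quotient), M the divisor,
    and A the partial remainder; A is 'restored' whenever A - M goes negative.
    """
    A, M, Q = 0, divisor, dividend
    for _ in range(n):
        # Step-2: shift A:Q left as a single unit
        A = ((A << 1) | ((Q >> (n - 1)) & 1)) & ((1 << (n + 1)) - 1)
        Q = (Q << 1) & ((1 << n) - 1)
        A -= M                         # Step-3: trial subtraction
        if A < 0:                      # Step-4: MSB of A is 1 -> Q[0] = 0, restore A
            A += M
        else:                          # MSB of A is 0 -> Q[0] = 1
            Q |= 1
    return Q, A                        # Step-7: quotient, remainder

print(restoring_divide(11, 3, 4))      # (3, 2) since 11 = 3*3 + 2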
Non-Restoring Division Algorithm For Unsigned Integer
Having seen restoring division above, we now perform non-restoring division. It is less complex than the restoring method because only simpler operations are involved, i.e., addition and subtraction, and no restoring step is performed. In this method we rely on the sign bit of the register named A, which initially contains zero. The flowchart is given below, followed by the steps.
Step-1: First the registers are initialized with corresponding values (Q = Dividend, M =
Divisor, A = 0, n = number of bits in dividend)
Step-2: Check the sign bit of register A.
Step-3: If it is 1, shift left the contents of AQ and perform A = A + M; otherwise shift left AQ and perform A = A - M (i.e., add the 2's complement of M to A and store it in A).
Step-4: Again check the sign bit of register A.
Step-5: If the sign bit is 1, Q[0] becomes 0; otherwise Q[0] becomes 1 (Q[0] means the least significant bit of register Q).
Step-6: Decrement the value of N by 1.
Step-7: If N is not equal to zero, go to Step-2; otherwise go to the next step.
Step-8: If sign bit of A is 1 then perform A = A+M
Step-9: Register Q contains quotient and A contains remainder.
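Similarly, here is an illustrative Python sketch of the non-restoring steps, with Python's signed integers standing in for the sign bit of register A.

def non_restoring_divide(dividend: int, divisor: int, n: int):
    """Non-restoring division of unsigned n-bit integers (steps above)."""
    A, M, Q = 0, divisor, dividend
    for _ in range(n):
        # Step-2/3: shift AQ left, then add or subtract M based on A's sign
        msb_q = (Q >> (n - 1)) & 1
        A = (A << 1) | msb_q
        Q = (Q << 1) & ((1 << n) - 1)
        A = A + M if A < 0 else A - M
        # Step-4/5: set Q[0] from the new sign of A
        Q |= 0 if A < 0 else 1
    # Step-8: final correction if the remainder is negative
    if A < 0:
        A += M
    return Q, A                         # Step-9: quotient, remainder

print(non_restoring_divide(11, 3, 4))   # (3, 2)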