FPGA Implementation of IEEE-754 Karatsuba Multiplier


Ravi Kishore Kodali and Satya Kesav


Department of Electronics and Communication Engineering
National Institute of Technology
Warangal, India.
Abstract: Floating point arithmetic is widely used in many signal processing applications and in most scientific computations, and floating point multiplication is the most frequently used of all the arithmetic operations. Multiplication of IEEE-754 single and double precision floating point numbers requires 24 x 24 and 53 x 53 bit mantissa multiplications, respectively. Hence, the mantissa multiplier dominates the hardware utilization. This paper presents the implementation of a floating point multiplier using two different multiplication algorithms, namely Booth's and Karatsuba (normal and recursive), on FPGA devices. A Xilinx Virtex-6 device has been used for the implementation, and the hardware resources and execution speeds of the designs are compared. The performance results have been summarized and compared, and a conclusion is presented.

Keywords: Floating Point Multiplication, FPGA, Single-precision, Double-precision, Karatsuba Multiplication, Booth's Multiplication.

I. INTRODUCTION

Embedded systems are designed for a variety of functions, which ultimately require the manipulation of real-valued data. These data are stored as floating point numbers in memory, which is limited; therefore, floating point computations must be approximated to fit within these memory limitations, and the rules governing such approximations are known as floating point arithmetic [?].

In floating point arithmetic, the multiplication operation occurs more frequently than the other operations. Thus, floating point multiplication plays a major role in the design and implementation of a floating point processor [?]. Computational speed and hardware utilization [?] are the two criteria that decide the choice of algorithm in the implementation of floating point multipliers [?].

FPGAs offer high performance and very high operating speeds with a limited amount of logic resources and IP cores available on the device. They are commonly applied in digital signal processing, communications engineering, and very high speed computing systems such as supercomputers. This work involves the efficient implementation of floating point arithmetic operations, namely single and double precision floating point multiplication, using different algorithms on an FPGA. The rest of the paper is organized as follows: Section II provides the literature survey, Section III presents an overview of floating point multiplication and the algorithms used, Section IV gives the hardware implementation, Section V presents the simulation and experimental results, and Section VI concludes the work.

II. LITERATURE REVIEW

Optimizing the operational speed of the multiplier is the main concern in the design of a floating point arithmetic processor. When the 24 x 24 and 53 x 53 bit mantissa multiplications are performed using the traditional multiplication approach, they utilize a large amount of hardware resources and incur a long computation delay. The hardware utilization and timing delay can be reduced by using Booth's algorithm for the mantissa multiplication [?], as detailed in [?] and [?]. The timing delay and power dissipation are further reduced by using a carry save adder scheme, a high-speed CMOS full adder and a modified carry select adder, as given in [?]. However, several other algorithms exist that serve the purpose of optimizing floating point multipliers.
The Karatsuba algorithm, defined for the multiplication of long integers, is one of the fastest and best known algorithms. A survey of the strengths and weaknesses of Booth's and Karatsuba's algorithms is presented in [?], which concludes that Karatsuba multiplication has a shorter signal propagation time and that long-number multiplication is more suitably implemented using Karatsuba's algorithm than Booth's [?].

The implementation of a floating point multiplier using the Karatsuba algorithm is very efficient, as presented in [?]. When this algorithm is applied recursively [?] until it reaches the multiplication of 2-bit or 3-bit numbers, the use of higher-order logical multiplier blocks is avoided, and hence the implementation becomes very simple and efficient in terms of area requirements [?]. Hence, the recursive Karatsuba algorithm is chosen for the implementation of the floating point multiplier on the FPGA platform, and a comparison is made with the aforementioned two algorithms.
III. OVERVIEW OF FLOATING POINT MULTIPLICATION AND THE ALGORITHMS USED

The format of a floating point number is as follows:


For single precision:

    Sign      Exponent    Mantissa
    (1 bit)   (8 bits)    (23 bits)

For double precision:

    Sign      Exponent    Mantissa
    (1 bit)   (11 bits)   (52 bits)
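As a quick illustration of these field widths, a minimal Python sketch (our own helper for exposition, not part of the paper's VHDL design) that splits a host float into its three fields might look as follows:

import struct

def decode_fp(x: float, double: bool = True) -> tuple[int, int, int]:
    """Split a float into its (sign, exponent, mantissa) bit fields.
    Field widths are 1/11/52 for double precision and 1/8/23 for single."""
    if double:
        bits = struct.unpack('>Q', struct.pack('>d', x))[0]
        exp_w, man_w = 11, 52
    else:
        bits = struct.unpack('>I', struct.pack('>f', x))[0]
        exp_w, man_w = 8, 23
    mantissa = bits & ((1 << man_w) - 1)
    exponent = (bits >> man_w) & ((1 << exp_w) - 1)
    sign = bits >> (exp_w + man_w)
    return sign, exponent, mantissa

# -6.5 = (-1)^1 * 2^(1025-1023) * 1.625, so:
assert decode_fp(-6.5) == (1, 1025, 0b101 << 49)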

In general, a floating point arithmetic implementation computes the sign, exponent and mantissa parts separately and then combines the three after rounding and normalization. A basic overview of the flow of the floating point multiplication operation is given below.

1) XOR the sign bits of both numbers to get the sign bit of the product.
2) Add the exponents of both operands and adjust the bias.
3) Perform the multiplication of both mantissas.
4) Normalize the mantissa product.
5) Finally, round the mantissa and adjust the bias of the exponent.

The first three steps can be stated by the following expression:

y = Operand1 × Operand2
  = (-1)^sign1 · 2^exp1 · 1.mant1 × (-1)^sign2 · 2^exp2 · 1.mant2
  = (-1)^(sign1 XOR sign2) · 2^(exp1 + exp2 - bias) · (1.mant1 × 1.mant2)

where sign1, exp1 and mant1 are the sign, exponent and mantissa of the first operand, and sign2, exp2 and mant2 are those of the second operand.
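To make the five steps and the expression above concrete, the following Python snippet is a minimal single-precision reference model (our own illustrative sketch, not the paper's VHDL implementation); it handles normal numbers only, ignores exponent overflow, and truncates instead of performing IEEE round-to-nearest:

import struct

def float_to_bits(x: float) -> int:
    return struct.unpack('>I', struct.pack('>f', x))[0]

def bits_to_float(b: int) -> float:
    return struct.unpack('>f', struct.pack('>I', b))[0]

def fp32_multiply(a: int, b: int) -> int:
    """Multiply two IEEE-754 single-precision bit patterns (normal numbers only)."""
    BIAS = 127
    sign = ((a >> 31) ^ (b >> 31)) & 1                      # step 1: XOR of sign bits
    exp = ((a >> 23) & 0xFF) + ((b >> 23) & 0xFF) - BIAS    # step 2: add exponents, adjust bias
    ma = (a & 0x7FFFFF) | (1 << 23)                         # 1.mant1 as a 24-bit integer
    mb = (b & 0x7FFFFF) | (1 << 23)                         # 1.mant2 as a 24-bit integer
    prod = ma * mb                                          # step 3: 24 x 24 mantissa multiplication
    if prod & (1 << 47):                                    # step 4: normalize when the product >= 2.0
        prod >>= 1
        exp += 1
    mant = (prod >> 23) & 0x7FFFFF                          # step 5: keep 23 fraction bits (truncated)
    return (sign << 31) | ((exp & 0xFF) << 23) | mant

# 3.25 * -1.5 is exactly representable, so the model agrees with the host FPU:
assert bits_to_float(fp32_multiply(float_to_bits(3.25), float_to_bits(-1.5))) == -4.875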

A. Booth's Multiplication Algorithm

Booth's algorithm provides a procedure for multiplying binary integers in signed 2's complement representation [13]. According to the multiplication procedure, strings of 0s in the multiplier require no addition, only shifting, and a string of 1s in the multiplier running from bit weight 2^k down to weight 2^m can be treated as 2^(k+1) - 2^m.

Booth's algorithm involves recoding the multiplier first. In the recoded format, each bit of the multiplier can take any of the three values: 0, +1 and -1. Suppose we want to multiply a number by 01110 (decimal 14). This number can be considered as the difference between 10000 (decimal 16) and 00010 (decimal 2), so the multiplication by 01110 can be achieved by summing the following products:

- 2^4 times the multiplicand (2^4 = 16)
- the 2's complement of 2^1 times the multiplicand (2^1 = 2)

In a standard multiplication, three additions are required due to the string of three 1s; these are replaced by one addition and one subtraction. This requirement is identified by recoding the multiplier 01110 using the rules summarized in Table I.

TABLE I: Booth's multiplier recoding rules

Qn   Qn+1   Recoded bit   Operation performed
0    0       0            Shift
0    1      +1            Add M
1    0      -1            Subtract M
1    1       0            Shift

To generate the recoded multiplier for radix-2, the following steps are performed:

- Append a zero to the LSB side of the given multiplier.
- Make groups of two bits in an overlapped way.
- Recode the number using the above table.

Consider an example with the 8-bit multiplicand 11011001 and the multiplier 01110001, which becomes 011100010 after appending the zero at the LSB:

Multiplicand           1 1 0 1 1 0 0 1
Multiplier             0 1 1 1 0 0 0 1 0
Recoded multiplier    +1 0 0 -1 0 0 +1 -1

The partial products, listed here from the least significant recoded digit upwards, are the sign-extended multiplicand 111011001 for a +1 digit, its 2's complement 000100111 for a -1 digit, and 000000000 for a 0 digit:

000100111
111011001
000000000
000000000
000100111
000000000
000000000
111011001

Product                0000001001001001
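A behavioural sketch of this recoding in Python (our own illustration; the paper's actual design is the VHDL described by Algorithm 1 in Section IV):

def booth_recode(multiplier: int, width: int) -> list[int]:
    """Radix-2 Booth recoding. Returns digits in {-1, 0, +1}, LSB first:
    digit i is bit(i-1) - bit(i), with a 0 appended below the LSB."""
    bits = [(multiplier >> i) & 1 for i in range(width)]
    digits, prev = [], 0          # prev is the zero appended to the right of the LSB
    for b in bits:
        digits.append(prev - b)   # 01 -> +1 (add M), 10 -> -1 (subtract M), 00/11 -> 0 (shift)
        prev = b
    return digits

def booth_multiply(multiplicand: int, multiplier: int, width: int) -> int:
    """Signed multiply: accumulate digit * multiplicand at each bit position."""
    return sum((d * multiplicand) << i
               for i, d in enumerate(booth_recode(multiplier, width)))

# The multiplier 01110001 from the example above recodes (MSB first) to +1 0 0 -1 0 0 +1 -1:
assert booth_recode(0b01110001, 8)[::-1] == [1, 0, 0, -1, 0, 0, 1, -1]
assert booth_multiply(-39, 113, 8) == -39 * 113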

B. Non-Recursive Karatsuba Multiplication Algorithm

Let p = (u1 u2 ... un)_b and q = (v1 v2 ... vn)_b, where n = 2^k. Then we can write p and q in the form

p = p1 · b^(n/2) + p0,   q = q1 · b^(n/2) + q0

so that

p · q = p1·q1·b^n + (p1·q0 + p0·q1)·b^(n/2) + p0·q0        (1)

In 1963, Karatsuba transformed formula (1) into the following formula (2):

p · q = r1·b^n + (r2 - r1 - r0)·b^(n/2) + r0               (2)

where r0 = p0·q0, r1 = p1·q1 and r2 = (p0 + p1)(q0 + q1).

If n = 2, equation (2) requires three multiplications and four addition/subtraction base operations. If n > 2, the same equation reduces the problem of multiplying two integers of length n (n = 2^k) to three multiplications of length-n/2 numbers, namely p0·q0, p1·q1 and (p1 + p0)(q1 + q0), plus two additions of length-n/2 numbers, two additions of length-n numbers and two subtractions of length-n numbers.
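A direct rendering of formula (2) in Python for a single splitting level, taking b = 2 so the "digits" are bits (an illustrative sketch; the function name is ours):

def karatsuba_one_level(p: int, q: int, n: int) -> int:
    """One level of Karatsuba (formula (2)) on two n-bit operands, n even.
    Uses three half-size multiplications instead of four."""
    half = n // 2
    mask = (1 << half) - 1
    p1, p0 = p >> half, p & mask      # p = p1 * 2^half + p0
    q1, q0 = q >> half, q & mask      # q = q1 * 2^half + q0
    r0 = p0 * q0
    r1 = p1 * q1
    r2 = (p0 + p1) * (q0 + q1)        # contains r0 + r1 + the cross terms
    return (r1 << n) + ((r2 - r1 - r0) << half) + r0

assert karatsuba_one_level(0xBEEF, 0x1234, 16) == 0xBEEF * 0x1234
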
C. Recursive Karatsuba Multiplication Algorithm

We can obtain the same product by applying the divide-and-conquer step recursively. Let T(n) be the computation time of the multiplication p · q. This gives the recurrence

T(n) = 7                  if n = 2,
T(n) = 3·T(n/2) + 5n      if n > 2.

Solving the recurrence (the homogeneous part contributes A·n^(log2 3), the particular solution is -10n, and T(2) = 7 fixes A = 9) gives

T(n) = 9·n^(log2 3) - 10n = O(n^(log2 3))

Hence recursive Karatsuba is more efficient than normal Karatsuba, which is confirmed by the hardware implementation on the FPGA.
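The recursion can be sketched in Python as follows (again an illustrative software model of the method in Algorithm 3 of Section IV, not the VHDL itself); it bottoms out at 2-bit operands:

def karatsuba(x: int, y: int, n: int) -> int:
    """Recursive Karatsuba multiplication of two n-bit operands, n a power of two."""
    if n <= 2:
        return x * y                          # base case: a small 2 x 2-bit multiplier
    half = n // 2
    mask = (1 << half) - 1
    x1, x0 = x >> half, x & mask
    y1, y0 = y >> half, y & mask
    p1 = karatsuba(x1, y1, half)              # product of the upper halves
    p2 = karatsuba(x0, y0, half)              # product of the lower halves
    p3 = karatsuba(x1 + x0, y1 + y0, half)    # the sums may carry into bit 'half'; exact in Python,
                                              # but a hardware datapath must keep the extra bit
    return (p1 << n) + ((p3 - p1 - p2) << half) + p2

# Sanity check against Python's built-in multiplication:
assert karatsuba(0xDEADBEEF, 0xCAFEBABE, 32) == 0xDEADBEEF * 0xCAFEBABE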

IV. ALGORITHM IMPLEMENTATION

In this section, the details of the floating point multiplier design using the Booth's, Karatsuba and recursive Karatsuba algorithms are discussed. The implementation has been targeted at the Xilinx Virtex-6 FPGA.
Algorithm 1 Booth's Multiplication
INPUTS: Two operands, Multiplier and Multiplicand (Mc), each of length L
OUTPUT: Product (P) with double the size of the operands
1. Consider a product register (Pr) with length equal to twice the operand length (L) plus one.
2. Append a zero to the right of the multiplier:
   Mq(L:0) <= Multiplier(L-1:0) & '0'
3. Initialize the lower L bits of the product register with the multiplicand:
   Pr(2L:0) <= (others => '0') & Mc(L-1:0)
4. Find the 2's complement of the multiplicand and store it in another register Mcc.
5. for i = 0 to L-1 do
       if Mq(1:0) = "01" then
           Pr(2L:L) <= Pr(2L:L) + Mc(L-1:0)
       else if Mq(1:0) = "10" then
           Pr(2L:L) <= Pr(2L:L) + Mcc(L-1:0)
       else
           do nothing
       end if
       Arithmetic shift Pr right by one bit; shift Mq right by one bit.
   end for
6. P = Pr(2L-1:0) is the product.

Algorithm 2 Karatsuba Multiplication

INPUTS: Two operands X and Y of length N.
OUTPUT: Product P with length 2N.
1. If the operand length is odd, prepend a zero to both operands to make their length even:
   L <= N + 1 if N is odd
   L <= N     if N is even
2. Break each operand into two sequences of length L/2:
   X1 <= X(L/2-1:0)
   X2 <= X(L-1:L/2)
   Y1 <= Y(L/2-1:0)
   Y2 <= Y(L-1:L/2)
3. Compute the product of the lower halves and the product of the upper halves:
   P1 <= X1 × Y1
   P2 <= X2 × Y2
4. Compute P3 as
   P3 <= (X1 + X2) × (Y1 + Y2) - P1 - P2
5. Assemble the product register Pr:
   Pr(2L-1:L) <= P2(L-1:0)
   Pr(L-1:0)  <= P1(L-1:0)
   Pr(3L/2-1:L/2) <= Pr(3L/2-1:L/2) + P3(L-1:0)
6. The final product is
   P = Pr(2L-3:0) if N is odd
   P = Pr         if N is even

Algorithm 3 Recursive Karatsuba Multiplication

INPUTS: Two operands X and Y of length L.
OUTPUT: Product P with length 2L.
1. Recursive Karatsuba multiplication requires the operand length to be a power of 2.
2. Prepend zeros to the two operands so that their length becomes a power of 2. Let the length be N after padding and let X, Y be the padded sequences.
3. The product P is obtained by
   P <= Recur(X, Y)
   where Recur is the recursive function used for the calculation of the product.
4. The recursive function for the multiplication is as follows:
   function Recur(Op1, Op2 : Bit_Vector)
   begin
       l <= length(Op1)
       if l = 2 then
           Pr(2l-1:0) <= Op1 × Op2
       else
           P1 <= Recur(Op1(l-1:l/2), Op2(l-1:l/2))
           P2 <= Recur(Op1(l/2-1:0), Op2(l/2-1:0))
           P3 <= Recur(Op1(l-1:l/2) + Op1(l/2-1:0), Op2(l-1:l/2) + Op2(l/2-1:0))
           Pr(2l-1:0) <= P1(l-1:0) & P2(l-1:0)
           Pr(3l/2-1:l/2) <= Pr(3l/2-1:l/2) + P3(l-1:0) - P1(l-1:0) - P2(l-1:0)
       end if
       return Pr
   end Recur

TABLE II: Comparison between algorithms for single precision format

Device Utilisation Summary                   Booth        Karatsuba    Recursive Karatsuba
Number of slices                             1282         1156         679
Number of 4-input LUTs                       2351         2165         1206
Number of bonded inputs                      64           64           64
Number of bonded outputs                     32           32           32
Macro statistics
Adders/Subtractors                           50           81           45
Multiplexers                                 528          462          -
Timing Summary
Maximum combinational path delay             98.837 ns    54.899 ns    31.123 ns
Maximum combinational path delay             91.679 ns    58.878 ns    21.892 ns
Maximum output required time after clock     4.114 ns     4.114 ns     4.114 ns

V. RESULTS AND SIMULATION

We have implemented the aforementioned algorithms for both single and double precision in VHDL. The designs have been synthesized and routed for a Virtex-6 FPGA target using Xilinx ISE, and the simulation results have been analyzed in ModelSim-SE. Hardware utilization and performance for all the proposed implementations and for the Xilinx core are shown in Table II (single precision) and Table III (double precision). All hardware resource estimates were obtained after the place and route stage of the FPGA flow. The simulation results for single and double precision are shown in Fig. 1 and Fig. 2.

TABLE III: Comparison between algorithms for double precision format

Device Utilisation Summary                   Booth        Karatsuba    Recursive Karatsuba
Number of slices                             6115         5285         1784
Number of 4-input LUTs                       11346        9662         3280
Number of bonded inputs                      128          128          128
Number of bonded outputs                     64           64           64
Macro statistics
Adders/Subtractors                           108          81           45
Multiplexers                                 2703         462          -
Timing Summary
Maximum combinational path delay             196.198 ns   54.899 ns    37.348 ns
Maximum combinational path delay             189.029 ns   58.878 ns    29.660 ns
Maximum output required time after clock     4.114 ns     4.114 ns     4.114 ns

VI. CONCLUSION

After implementing and comparing the three multiplication algorithms, Booth's, normal Karatsuba and recursive Karatsuba, a brief performance summary is reported in this paper. Two main criteria, FPGA resource utilization and processing speed, were adopted for evaluating the performance. The results show that the recursive Karatsuba algorithm performs better than both normal Karatsuba and Booth's algorithm: it utilizes the fewest FPGA resources while also achieving the highest speed. Hence, recursive Karatsuba is the best of the three algorithms.
