FPGA Implementation of IEEE-754 Karatsuba Multiplier


Ravi Kishore Kodali and Satya Kesav


Department of Electronics and Communication Engineering
National Institute of Technology
Warangal, India.
Abstract: Floating point arithmetic is widely used in many signal processing applications and in most scientific computations, and floating point multiplication is the most frequently used of all the arithmetic operations. Multiplication of IEEE-754 single and double precision floating point numbers requires 24 x 24 and 53 x 53 bit mantissa multiplications, respectively. Hence, the mantissa multiplier dominates the hardware utilization. This paper presents the implementation of a floating point multiplier using two different multiplication algorithms, namely Booth's and Karatsuba (normal and recursive), on FPGA devices. A Xilinx Virtex-6 device has been used for the implementation, and the hardware resources and execution speeds of the designs are compared. The performance results have been summarized and compared, and a conclusion is presented.

Keywords: Floating Point Multiplication, FPGA, Single-precision, Double-precision, Karatsuba Multiplication, Booth's Multiplication.

I. INTRODUCTION

Embedded systems are designed for a variety of functions, which ultimately require the manipulation of real-valued data. These data are stored as floating point numbers in memory, which is limited; therefore, floating point computations must be approximated to fit within these memory limitations, and the rules governing such approximations are known as floating point arithmetic [?].

In floating point arithmetic, the multiplication operation occurs more frequently than the other operations. Thus, floating point multiplication plays a major role in the design and implementation of a floating point processor [?]. Computational speed and hardware utilization [?] are the two criteria that decide the choice of algorithm in the implementation of floating point multipliers [?].

FPGAs offer high performance and very high operating speeds with a limited amount of logic resources and IP cores available on the device. They are commonly applied in digital signal processing, communications engineering, and very high speed computing systems such as supercomputers. This work involves the efficient implementation of floating point arithmetic operations, namely single and double precision floating point multiplication, using different algorithms on an FPGA. The rest of the paper is organized as follows: Section II provides the literature survey, Section III presents an overview of floating point multiplication and the algorithms used, Section IV gives the hardware implementation, Section V presents the simulation and experimental results, and Section VI concludes the work.

II. LITERATURE REVIEW

Optimizing the operational speed of the multiplier is the main concern in the design of a floating point arithmetic processor. When the 24 x 24 and 53 x 53 bit mantissa multiplications are performed using the traditional multiplication approach, they utilize a large amount of hardware resources and incur a long computation delay. The hardware utilization and timing delay can be reduced by using Booth's algorithm for the mantissa multiplication [?], as detailed in [?] and [?]. The timing delay and power dissipation are further reduced by using a carry save adder scheme, a high-speed CMOS full adder and a modified carry select adder, as given in [?]. However, several other algorithms exist that serve the purpose of optimizing floating point multipliers.
The Karatsuba algorithm, defined for the multiplication of long integers, is one of the fastest and best known algorithms. A survey of the strengths and weaknesses of Booth's and Karatsuba's algorithms is presented in [?], which concludes that Karatsuba multiplication has a shorter signal propagation time and that long-number multiplication is more suitably implemented using Karatsuba's algorithm than Booth's [?].

The implementation of a floating point multiplier using the Karatsuba algorithm is very efficient, as presented in [?]. When this algorithm is applied recursively [?] until it reaches the multiplication of 2-bit or 3-bit numbers, the use of higher-order logical multiplier blocks is avoided, and hence the implementation becomes very simple and efficient in terms of area requirements [?]. Hence, the recursive Karatsuba algorithm is chosen for the implementation of the floating point multiplier on the FPGA platform, and a comparison is made with the aforementioned two algorithms.
III. OVERVIEW OF FLOATING POINT MULTIPLICATION AND THE ALGORITHMS USED

The format of a floating point number is as follows:


For single precision:

    Sign      Exponent    Mantissa
    (1 bit)   (8 bits)    (23 bits)

For double precision:

    Sign      Exponent    Mantissa
    (1 bit)   (11 bits)   (52 bits)
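As a quick illustration of these field widths, a minimal Python sketch (our own helper for exposition, not part of the paper's VHDL design) that splits a host float into its three fields might look as follows:

import struct

def decode_fp(x: float, double: bool = True) -> tuple[int, int, int]:
    """Split a float into its (sign, exponent, mantissa) bit fields.
    Field widths are 1/11/52 for double precision and 1/8/23 for single."""
    if double:
        bits = struct.unpack('>Q', struct.pack('>d', x))[0]
        exp_w, man_w = 11, 52
    else:
        bits = struct.unpack('>I', struct.pack('>f', x))[0]
        exp_w, man_w = 8, 23
    mantissa = bits & ((1 << man_w) - 1)
    exponent = (bits >> man_w) & ((1 << exp_w) - 1)
    sign = bits >> (exp_w + man_w)
    return sign, exponent, mantissa

# -6.5 = (-1)^1 * 2^(1025-1023) * 1.625, so:
assert decode_fp(-6.5) == (1, 1025, 0b101 << 49)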

In general, a floating point arithmetic implementation computes the sign, exponent and mantissa parts separately and then combines the three after rounding and normalization. A basic overview of the flow of the floating point multiplication operation is given below.

1) XOR the sign bits of both numbers to get the sign bit of the product.
2) Add the exponents of both operands and adjust the bias.
3) Perform the multiplication of both mantissas.
4) Normalize the mantissa product.
5) Finally, round the mantissa and adjust the bias of the exponent.

The first three steps can be stated by the following expression:

y = Operand1 × Operand2
  = (-1)^sign1 · 2^exp1 · 1.mant1 × (-1)^sign2 · 2^exp2 · 1.mant2
  = (-1)^(sign1 XOR sign2) · 2^(exp1 + exp2 - bias) · (1.mant1 × 1.mant2)

where sign1, exp1 and mant1 are the sign, exponent and mantissa of the first operand, and sign2, exp2 and mant2 are those of the second operand.
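To make the five steps and the expression above concrete, the following Python snippet is a minimal single-precision reference model (our own illustrative sketch, not the paper's VHDL implementation); it handles normal numbers only, ignores exponent overflow, and truncates instead of performing IEEE round-to-nearest:

import struct

def float_to_bits(x: float) -> int:
    return struct.unpack('>I', struct.pack('>f', x))[0]

def bits_to_float(b: int) -> float:
    return struct.unpack('>f', struct.pack('>I', b))[0]

def fp32_multiply(a: int, b: int) -> int:
    """Multiply two IEEE-754 single-precision bit patterns (normal numbers only)."""
    BIAS = 127
    sign = ((a >> 31) ^ (b >> 31)) & 1                      # step 1: XOR of sign bits
    exp = ((a >> 23) & 0xFF) + ((b >> 23) & 0xFF) - BIAS    # step 2: add exponents, adjust bias
    ma = (a & 0x7FFFFF) | (1 << 23)                         # 1.mant1 as a 24-bit integer
    mb = (b & 0x7FFFFF) | (1 << 23)                         # 1.mant2 as a 24-bit integer
    prod = ma * mb                                          # step 3: 24 x 24 mantissa multiplication
    if prod & (1 << 47):                                    # step 4: normalize when the product >= 2.0
        prod >>= 1
        exp += 1
    mant = (prod >> 23) & 0x7FFFFF                          # step 5: keep 23 fraction bits (truncated)
    return (sign << 31) | ((exp & 0xFF) << 23) | mant

# 3.25 * -1.5 is exactly representable, so the model agrees with the host FPU:
assert bits_to_float(fp32_multiply(float_to_bits(3.25), float_to_bits(-1.5))) == -4.875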

A. Booth's Multiplication Algorithm

Booth's algorithm provides a procedure for multiplying binary integers in signed 2's complement representation [13]. According to the multiplication procedure, strings of 0s in the multiplier require no addition, only shifting, and a string of 1s in the multiplier running from bit weight 2^k down to weight 2^m can be treated as 2^(k+1) - 2^m.

Booth's algorithm involves recoding the multiplier first. In the recoded format, each bit of the multiplier can take any of the three values: 0, +1 and -1. Suppose we want to multiply a number by 01110 (decimal 14). This number can be considered as the difference between 10000 (decimal 16) and 00010 (decimal 2), so the multiplication by 01110 can be achieved by summing the following products:

- 2^4 times the multiplicand (2^4 = 16)
- the 2's complement of 2^1 times the multiplicand (2^1 = 2)

In a standard multiplication, three additions are required due to the string of three 1s; these are replaced by one addition and one subtraction. This requirement is identified by recoding the multiplier 01110 using the rules summarized in Table I.

TABLE I: Booth's multiplier recoding rules

Qn   Qn+1   Recoded bit   Operation performed
0    0       0            Shift
0    1      +1            Add M
1    0      -1            Subtract M
1    1       0            Shift

To generate the recoded multiplier for radix-2, the following steps are performed:

- Append a zero to the LSB side of the given multiplier.
- Make groups of two bits in an overlapped way.
- Recode the number using the above table.

Consider an example with the 8-bit multiplicand 11011001 and the multiplier 01110001, which becomes 011100010 after appending the zero at the LSB:

Multiplicand           1 1 0 1 1 0 0 1
Multiplier             0 1 1 1 0 0 0 1 0
Recoded multiplier    +1 0 0 -1 0 0 +1 -1

The partial products, listed here from the least significant recoded digit upwards, are the sign-extended multiplicand 111011001 for a +1 digit, its 2's complement 000100111 for a -1 digit, and 000000000 for a 0 digit:

000100111
111011001
000000000
000000000
000100111
000000000
000000000
111011001

Product                0000001001001001
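A behavioural sketch of this recoding in Python (our own illustration; the paper's actual design is the VHDL described by Algorithm 1 in Section IV):

def booth_recode(multiplier: int, width: int) -> list[int]:
    """Radix-2 Booth recoding. Returns digits in {-1, 0, +1}, LSB first:
    digit i is bit(i-1) - bit(i), with a 0 appended below the LSB."""
    bits = [(multiplier >> i) & 1 for i in range(width)]
    digits, prev = [], 0          # prev is the zero appended to the right of the LSB
    for b in bits:
        digits.append(prev - b)   # 01 -> +1 (add M), 10 -> -1 (subtract M), 00/11 -> 0 (shift)
        prev = b
    return digits

def booth_multiply(multiplicand: int, multiplier: int, width: int) -> int:
    """Signed multiply: accumulate digit * multiplicand at each bit position."""
    return sum((d * multiplicand) << i
               for i, d in enumerate(booth_recode(multiplier, width)))

# The multiplier 01110001 from the example above recodes (MSB first) to +1 0 0 -1 0 0 +1 -1:
assert booth_recode(0b01110001, 8)[::-1] == [1, 0, 0, -1, 0, 0, 1, -1]
assert booth_multiply(-39, 113, 8) == -39 * 113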

B. Non-Recursive Karatsuba Multiplication Algorithm

Let p = (u1 u2 ... un)_b and q = (v1 v2 ... vn)_b, where n = 2^k. Then we can write p and q in the form

p = p1 · b^(n/2) + p0,   q = q1 · b^(n/2) + q0

so that

p · q = p1·q1·b^n + (p1·q0 + p0·q1)·b^(n/2) + p0·q0        (1)

In 1963, Karatsuba transformed formula (1) into the following formula (2):

p · q = r1·b^n + (r2 - r1 - r0)·b^(n/2) + r0               (2)

where r0 = p0·q0, r1 = p1·q1 and r2 = (p0 + p1)(q0 + q1).

If n = 2, equation (2) requires three multiplications and four addition/subtraction base operations. If n > 2, the same equation reduces the problem of multiplying two integers of length n (n = 2^k) to three multiplications of length-n/2 numbers, namely p0·q0, p1·q1 and (p1 + p0)(q1 + q0), plus two additions of length-n/2 numbers, two additions of length-n numbers and two subtractions of length-n numbers.
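A direct rendering of formula (2) in Python for a single splitting level, taking b = 2 so the "digits" are bits (an illustrative sketch; the function name is ours):

def karatsuba_one_level(p: int, q: int, n: int) -> int:
    """One level of Karatsuba (formula (2)) on two n-bit operands, n even.
    Uses three half-size multiplications instead of four."""
    half = n // 2
    mask = (1 << half) - 1
    p1, p0 = p >> half, p & mask      # p = p1 * 2^half + p0
    q1, q0 = q >> half, q & mask      # q = q1 * 2^half + q0
    r0 = p0 * q0
    r1 = p1 * q1
    r2 = (p0 + p1) * (q0 + q1)        # contains r0 + r1 + the cross terms
    return (r1 << n) + ((r2 - r1 - r0) << half) + r0

assert karatsuba_one_level(0xBEEF, 0x1234, 16) == 0xBEEF * 0x1234
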
C. Recursive Karatsuba Multiplication Algorithm

We can obtain the same product by applying the divide-and-conquer step recursively. Let T(n) be the computation time of the multiplication p · q. This gives the recurrence

T(n) = 7                  if n = 2,
T(n) = 3·T(n/2) + 5n      if n > 2.

Solving the recurrence (the homogeneous part contributes A·n^(log2 3), the particular solution is -10n, and T(2) = 7 fixes A = 9) gives

T(n) = 9·n^(log2 3) - 10n = O(n^(log2 3))

Hence recursive Karatsuba is more efficient than normal Karatsuba, which is confirmed by the hardware implementation on the FPGA.
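The recursion can be sketched in Python as follows (again an illustrative software model of the method in Algorithm 3 of Section IV, not the VHDL itself); it bottoms out at 2-bit operands:

def karatsuba(x: int, y: int, n: int) -> int:
    """Recursive Karatsuba multiplication of two n-bit operands, n a power of two."""
    if n <= 2:
        return x * y                          # base case: a small 2 x 2-bit multiplier
    half = n // 2
    mask = (1 << half) - 1
    x1, x0 = x >> half, x & mask
    y1, y0 = y >> half, y & mask
    p1 = karatsuba(x1, y1, half)              # product of the upper halves
    p2 = karatsuba(x0, y0, half)              # product of the lower halves
    p3 = karatsuba(x1 + x0, y1 + y0, half)    # the sums may carry into bit 'half'; exact in Python,
                                              # but a hardware datapath must keep the extra bit
    return (p1 << n) + ((p3 - p1 - p2) << half) + p2

# Sanity check against Python's built-in multiplication:
assert karatsuba(0xDEADBEEF, 0xCAFEBABE, 32) == 0xDEADBEEF * 0xCAFEBABE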

IV. ALGORITHM IMPLEMENTATION

In this section, the details of the floating point multiplier design using the Booth's, Karatsuba and recursive Karatsuba algorithms are discussed. The implementation has been targeted at the Xilinx Virtex-6 FPGA.
Algorithm 1 Booth's Multiplication
INPUTS: Two operands, Multiplier and Multiplicand (Mc), each of length L
OUTPUT: Product (P) with double the size of the operands
1. Consider a product register (Pr) with length equal to twice the operand length (L) plus one.
2. Append a zero to the right of the multiplier:
   Mq(L:0) <= Multiplier(L-1:0) & '0'
3. Initialize the lower L bits of the product register with the multiplicand:
   Pr(2L:0) <= (others => '0') & Mc(L-1:0)
4. Find the 2's complement of the multiplicand and store it in another register Mcc.
5. for i = 0 to L-1 do
       if Mq(1:0) = "01" then
           Pr(2L:L) <= Pr(2L:L) + Mc(L-1:0)
       else if Mq(1:0) = "10" then
           Pr(2L:L) <= Pr(2L:L) + Mcc(L-1:0)
       else
           do nothing
       end if
       Arithmetic shift Pr right by one bit; shift Mq right by one bit.
   end for
6. P = Pr(2L-1:0) is the product.

Algorithm 2 Karatsuba Multiplication

INPUTS: Two operands X and Y of length N.
OUTPUT: Product P with length 2N.
1. If the operand length is odd, prepend a zero to both operands to make their length even:
   L <= N + 1 if N is odd
   L <= N     if N is even
2. Break each operand into two sequences of length L/2:
   X1 <= X(L/2-1:0)
   X2 <= X(L-1:L/2)
   Y1 <= Y(L/2-1:0)
   Y2 <= Y(L-1:L/2)
3. Compute the product of the lower halves and the product of the upper halves:
   P1 <= X1 × Y1
   P2 <= X2 × Y2
4. Compute P3 as
   P3 <= (X1 + X2) × (Y1 + Y2) - P1 - P2
5. Assemble the product register Pr:
   Pr(2L-1:L) <= P2(L-1:0)
   Pr(L-1:0)  <= P1(L-1:0)
   Pr(3L/2-1:L/2) <= Pr(3L/2-1:L/2) + P3(L-1:0)
6. The final product is
   P = Pr(2L-3:0) if N is odd
   P = Pr         if N is even

Algorithm 3 Recursive Karatsuba Multiplication

INPUTS: Two operands X and Y of length L.
OUTPUT: Product P with length 2L.
1. Recursive Karatsuba multiplication requires the operand length to be a power of 2.
2. Prepend zeros to the two operands so that their length becomes a power of 2. Let the length be N after padding and let X, Y be the padded sequences.
3. The product P is obtained by
   P <= Recur(X, Y)
   where Recur is the recursive function used for the calculation of the product.
4. The recursive function for the multiplication is as follows:
   function Recur(Op1, Op2 : Bit_Vector)
   begin
       l <= length(Op1)
       if l = 2 then
           Pr(2l-1:0) <= Op1 × Op2
       else
           P1 <= Recur(Op1(l-1:l/2), Op2(l-1:l/2))
           P2 <= Recur(Op1(l/2-1:0), Op2(l/2-1:0))
           P3 <= Recur(Op1(l-1:l/2) + Op1(l/2-1:0), Op2(l-1:l/2) + Op2(l/2-1:0))
           Pr(2l-1:0) <= P1(l-1:0) & P2(l-1:0)
           Pr(3l/2-1:l/2) <= Pr(3l/2-1:l/2) + P3(l-1:0) - P1(l-1:0) - P2(l-1:0)
       end if
       return Pr
   end Recur

TABLE II: Comparison between algorithms for single precision format

Device Utilisation Summary                   Booth        Karatsuba    Recursive Karatsuba
Number of slices                             1282         1156         679
Number of 4-input LUTs                       2351         2165         1206
Number of bonded inputs                      64           64           64
Number of bonded outputs                     32           32           32
Macro statistics
Adders/Subtractors                           50           81           45
Multiplexers                                 528          462          -
Timing Summary
Maximum combinational path delay             98.837 ns    54.899 ns    31.123 ns
Maximum combinational path delay             91.679 ns    58.878 ns    21.892 ns
Maximum output required time after clock     4.114 ns     4.114 ns     4.114 ns

V. RESULTS AND SIMULATION

We have implemented the aforementioned algorithms for both single and double precision in VHDL. The designs have been synthesized and routed for a Virtex-6 FPGA target using Xilinx ISE, and the simulation results have been analyzed in ModelSim-SE. Hardware utilization and performance for all the proposed implementations and for the Xilinx core are shown in Table II (single precision) and Table III (double precision). All hardware resource estimates were obtained after the place and route stage of the FPGA flow. The simulation results for single and double precision are shown in Fig. 1 and Fig. 2.

TABLE III: Comparison between algorithms for double precision format

Device Utilisation Summary                   Booth        Karatsuba    Recursive Karatsuba
Number of slices                             6115         5285         1784
Number of 4-input LUTs                       11346        9662         3280
Number of bonded inputs                      128          128          128
Number of bonded outputs                     64           64           64
Macro statistics
Adders/Subtractors                           108          81           45
Multiplexers                                 2703         462          -
Timing Summary
Maximum combinational path delay             196.198 ns   54.899 ns    37.348 ns
Maximum combinational path delay             189.029 ns   58.878 ns    29.660 ns
Maximum output required time after clock     4.114 ns     4.114 ns     4.114 ns

VI. CONCLUSION

After implementing and comparing the three multiplication algorithms, Booth's, normal Karatsuba and recursive Karatsuba, a brief performance summary is reported in this paper. Two main criteria, FPGA resource utilization and processing speed, were adopted for evaluating the performance. The results show that the recursive Karatsuba algorithm performs better than both normal Karatsuba and Booth's algorithm: it utilizes the fewest FPGA resources while also achieving the highest speed. Hence, recursive Karatsuba is the best of the three algorithms.
