Baugh-Wooley Multiplier
Required Reading
Multi-Operand Signed Addition
Adding Two's Complement Numbers: Avoiding or
Detecting Overflow
To avoid overflow, add two K.L binary two's complement numbers (K integer bits, L fraction bits) into a (K+1).L result.
To compute: sign-extend both operands by one bit (replicate the MSB), add, and ignore the carry out c_{K+1}.
Example (K = 4, L = 2): 00111.01 + ...
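The rule above can be sketched in Python. This is only an illustration: the helper names (`to_signed`, `add_fixed`, `fixed_value`) are hypothetical, bit patterns are held in plain integers, and the second operand in the usage line is an assumption, since the slide's own second operand is not shown.

```python
def to_signed(bits: int, width: int) -> int:
    """Interpret the low `width` bits as a two's complement integer."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits >> (width - 1) else bits

def add_fixed(x_bits: int, y_bits: int, K: int, L: int) -> int:
    """Add two K.L patterns and return the (K+1).L result pattern."""
    w = K + L
    mask = (1 << (w + 1)) - 1
    x_ext = to_signed(x_bits, w) & mask   # sign-extend by one bit
    y_ext = to_signed(y_bits, w) & mask
    return (x_ext + y_ext) & mask         # carry out of the extended MSB is dropped

def fixed_value(bits: int, int_bits: int, L: int) -> float:
    """Numeric value of an int_bits.L two's complement pattern."""
    return to_signed(bits, int_bits + L) / (1 << L)

# Assumed operands: 0111.01 (7.25) + 0111.01 (7.25), K = 4, L = 2
result = add_fixed(0b011101, 0b011101, K=4, L=2)
print(fixed_value(result, 5, 2))
```

Without the extra bit, 7.25 + 7.25 would not fit in a 4.2 format; with it, the sum is representable.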
Adding Multiple Two's Complement Numbers
When adding two numbers, sign-extend both operands to the final result's width, then add them using the usual adder techniques.
When adding multiple numbers, likewise sign-extend every operand to the final result's width before adding.
This can dramatically increase the size of the carry-save adders.
-2^4  2^3  2^2  2^1  2^0
  1    0    0    1    1     = -2^4 + 2^1 + 2^0 = -16 + 2 + 1 = -13
  0    1    0    0    1     = 2^3 + 2^0 = 8 + 1 = 9
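As a rough sketch of why the operands must be widened first, the two 5-bit values above (-13 and 9) can be added after sign extension. The function name `add_operands` and the choice of k + ceil(log2(n)) result bits for n operands are assumptions made for this illustration.

```python
from math import ceil, log2

def to_signed(bits: int, width: int) -> int:
    """Interpret the low `width` bits as a two's complement integer."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits >> (width - 1) else bits

def add_operands(patterns, k):
    """Sign-extend each k-bit pattern to the result width, then add."""
    out_width = k + ceil(log2(len(patterns)))
    mask = (1 << out_width) - 1
    total = 0
    for p in patterns:
        extended = to_signed(p, k) & mask   # replicate the sign bit
        total = (total + extended) & mask
    return total, out_width

total, width = add_operands([0b10011, 0b01001], k=5)   # -13 + 9
print(format(total, f"0{width}b"), to_signed(total, width))
```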
Two's Complement Sign Extension Using "Negative
Weight" Method
To sign-extend a k-bit number by one bit:
First step: Instead of giving the MSB a negative weight, consider that the MSB digit has a value of -1 (with a positive weight).
This digit does not "exist" in hardware; it is just for notational purposes.
Second step: Apply the equation x_i = (1 - |x_i|) + 1 - 2 = x_i' + 1 - 2, where x_i' = NOT |x_i|.
In the original representation, complement the sign bit, add +1 to the sign-bit column (column k-1), and add -1 to column k.
Third step: Now remap the new sign bit (column k) so that its weight is negative.
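Start (sign digit has value -1, positive weights):
 2^4  2^3  2^2  2^1  2^0
 -1    0    0    1    1     = -2^4 + 2^1 + 2^0 = -16 + 2 + 1 = -13

After the second step (sign bit complemented, +1 in column 4, -1 in column 5):
 2^5  2^4  2^3  2^2  2^1  2^0
 -1    0    0    0    1    1
      +1
 = -2^5 + 2^4 + 2^1 + 2^0 = -32 + 16 + 2 + 1 = -13

After the third step (new sign bit remapped to negative weight):
-2^5  2^4  2^3  2^2  2^1  2^0
  1    1    0    0    1    1     = -2^5 + 2^4 + 2^1 + 2^0 = -32 + 16 + 2 + 1 = -13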
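The three steps can be traced in a short Python sketch. The bit-list representation (MSB first) and the names `extend_one_bit` and `value` are assumptions made for this illustration only.

```python
def value(bits):
    """Value of an MSB-first bit list whose MSB has negative weight."""
    k = len(bits)
    low = sum(b << (k - 2 - i) for i, b in enumerate(bits[1:]))
    return -bits[0] * (1 << (k - 1)) + low

def extend_one_bit(bits):
    """Sign-extend by one bit using the negative-weight method."""
    sign, rest = bits[0], bits[1:]
    # Step 1: treat the sign position as a digit of value -sign.
    # Step 2: -sign = (1 - sign) + 1 - 2, i.e. complemented bit and +1
    #         in column k-1, and -1 in column k.
    column_k_minus_1 = (1 - sign) + 1        # 1 or 2
    low_bit = column_k_minus_1 & 1
    carry = column_k_minus_1 >> 1            # carry into column k
    column_k = -1 + carry                    # -1 or 0
    # Step 3: a digit value of -1 becomes a 1 with negative weight.
    new_msb = -column_k
    return [new_msb, low_bit] + rest

print(extend_one_bit([1, 0, 0, 1, 1]))   # the -13 example
```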
Multi-Operand Sign Extension using "Negative
Weight" Method
|------ Extended positions ------|   Sign     |---- Magnitude positions ----|
  k+4   k+3   k+2   k+1    k         k-1         k-2      k-3      k-4    ...
   1     1     1     1     0       x_{k-1}'    x_{k-2}  x_{k-3}  x_{k-4}  ...
                                   y_{k-1}'    y_{k-2}  y_{k-3}  y_{k-4}  ...
                                   z_{k-1}'    z_{k-2}  z_{k-3}  z_{k-4}  ...
                                      1
Apply the equation x_i = (1 - |x_i|) + 1 - 2 = x_i' + 1 - 2 to all three numbers
Complement the three sign bits
Add three +1 values to the k-1 column
Add three -1 values to the k column
One -1 value from the k column and two +1 values from the k-1 column eliminate each other
This leaves two -1 values in k column, one +1 value in the k-1 column
Two -1 values from the k column become
One -1 value in the k+1 column
One 0 value in the k column
One -1 value in the k+1 column becomes
One -1 value in the k+2 column
One +1 value in the k+1 column
Continue moving left until you reach the new MSB
When the final -1 value reaches the new MSB column (in this case, k+4), "re-map" the new sign bit so that its weight is negative
The -1 in the MSB now becomes a 1
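A quick way to check the result of this procedure is to add three k-bit operands with their sign bits complemented, plus the constant pattern 1 1 1 1 0 in columns k+4 down to k and a 1 in column k-1. The sketch below does that with plain integers; `sum3` and `to_signed` are hypothetical helper names.

```python
def to_signed(bits: int, width: int) -> int:
    """Interpret the low `width` bits as a two's complement integer."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits >> (width - 1) else bits

def sum3(x: int, y: int, z: int, k: int) -> int:
    """Add three k-bit two's complement patterns without sign extension."""
    width = k + 5                          # new MSB sits in column k+4
    sign = 1 << (k - 1)
    constant = (0b11110 << k) | sign       # 1 1 1 1 0 in k+4..k, 1 in k-1
    total = constant
    for v in (x, y, z):
        total += (v & ((1 << k) - 1)) ^ sign   # complement the sign bit only
    return to_signed(total, width)

print(sum3(0b1000, 0b1000, 0b1000, k=4))   # three copies of -8
```

Only one constant row is added, however many bits the result has, instead of replicating three sign bits across every extended column.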
Multiplication
Multiplication of signed and unsigned numbers
LIBRARY ieee;
USE ieee.std_logic_1164.all;
-- Since both signed and unsigned data types are used,
-- don't use std_logic_unsigned/signed. Do all conversions explicitly.
USE ieee.std_logic_arith.all;

entity multiply is
    port(
        a  : in  STD_LOGIC_VECTOR(15 downto 0);
        b  : in  STD_LOGIC_VECTOR(7 downto 0);
        cu : out STD_LOGIC_VECTOR(23 downto 0);
        cs : out STD_LOGIC_VECTOR(23 downto 0)
    );
end multiply;

architecture dataflow of multiply is
    -- VHDL and the hardware do not care about the binary point.
    -- It is up to the user to keep track of where the binary point
    -- is in the input and output.
    SIGNAL sa   : SIGNED(15 downto 0);
    SIGNAL sb   : SIGNED(7 downto 0);
    SIGNAL sres : SIGNED(23 downto 0);
    SIGNAL ua   : UNSIGNED(15 downto 0);
    SIGNAL ub   : UNSIGNED(7 downto 0);
    SIGNAL ures : UNSIGNED(23 downto 0);
begin
    -- signed multiplication
    sa   <= SIGNED(a);
    sb   <= SIGNED(b);
    sres <= sa * sb;
    cs   <= STD_LOGIC_VECTOR(sres);
    -- unsigned multiplication
    ua   <= UNSIGNED(a);
    ub   <= UNSIGNED(b);
    ures <= ua * ub;
    cu   <= STD_LOGIC_VECTOR(ures);
end dataflow;
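To see why the two outputs differ, the same 16-bit and 8-bit patterns can be multiplied under both interpretations. This Python sketch only models the arithmetic of the listing above; it is not a simulation of the VHDL, and the function name `multiply` simply mirrors the entity name.

```python
def to_signed(bits: int, width: int) -> int:
    """Interpret the low `width` bits as a two's complement integer."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits >> (width - 1) else bits

def multiply(a_bits: int, b_bits: int):
    """Return (cu, cs): 24-bit unsigned and signed product patterns."""
    a_bits &= 0xFFFF
    b_bits &= 0xFF
    cu = (a_bits * b_bits) & 0xFFFFFF
    cs = (to_signed(a_bits, 16) * to_signed(b_bits, 8)) & 0xFFFFFF
    return cu, cs

cu, cs = multiply(0xFFFF, 0xFF)   # 65535 * 255 unsigned, (-1) * (-1) signed
print(hex(cu), hex(cs))
```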
Notation
a   Multiplicand      a_{k-1} a_{k-2} . . . a_1 a_0
x   Multiplier        x_{k-1} x_{k-2} . . . x_1 x_0
p   Product (a × x)   p_{2k-1} p_{2k-2} . . . p_2 p_1 p_0
Multiplication of Two 4-bit Unsigned
Binary Numbers in Dot Notation
[Figure: dot-notation diagram with four rows, Partial Product 0 through Partial Product 3]
p = a x = \sum_{i=0}^{k-1} a x_i 2^i
  = x_0 a 2^0 + x_1 a 2^1 + x_2 a 2^2 + ... + x_{k-1} a 2^{k-1}
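The sum above translates directly into a loop over the multiplier bits. A minimal sketch, assuming unsigned operands held in Python integers; the name `multiply_unsigned` is hypothetical.

```python
def multiply_unsigned(a: int, x: int, k: int) -> int:
    """Sum the k partial products x_i * a * 2^i."""
    p = 0
    for i in range(k):
        x_i = (x >> i) & 1        # bit i of the multiplier
        p += (x_i * a) << i       # partial product i, shifted into place
    return p

print(multiply_unsigned(0b1011, 0b0110, k=4))
```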
Unsigned Multiplication
                    a4 a3 a2 a1 a0
                 ×  x4 x3 x2 x1 x0
     p9 p8 p7 p6 p5 p4 p3 p2 p1 p0
Tree Multiplication
Area
TODAY'S LECTURE
Full Tree Architecture
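Designs are distinguished by variations in three elements:
1. Multiple-forming circuits
2. Partial products reduction tree (multi-operand addition tree)
3. Redundant-to-binary converter
[Figure: full-tree multiplier. The multiplier bits and copies of the multiplicand a feed the multiple-forming circuits; their outputs enter the partial-products reduction tree, which produces a redundant result.]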
Tree Adder Components
1: Multiple-Forming Circuits
In binary multipliers, these are AND gates (i.e., a AND x_i)
In signed Booth multipliers, these are Booth recoding blocks
These circuits create the partial products, ready to be summed
2: Partial Products Reduction Tree
This is usually a carry-save tree (e.g., Wallace, Dadda)
Produces a "redundant" result (i.e., carry and save outputs)
Some lower bits are produced directly
3: Redundant-to-Binary Converter
This is usually a fast carry-propagate adder (i.e., the carry and save lines are added to give the final output sum)
1. Multiple Forming Circuits
Creates 5 partial products, each requiring 5 AND gates: 25 AND gates in total.
                    a4 a3 a2 a1 a0
                 ×  x4 x3 x2 x1 x0
     p9 p8 p7 p6 p5 p4 p3 p2 p1 p0
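The AND-gate array can be sketched as a k-by-k grid of bits, one AND per entry. The names `partial_products` and `sum_rows` are hypothetical, and summing the rows with ordinary integer addition stands in for the reduction tree discussed next.

```python
def partial_products(a: int, x: int, k: int = 5):
    """rows[i][j] = a_j AND x_i, the bit of weight 2^(i+j)."""
    return [[((a >> j) & 1) & ((x >> i) & 1) for j in range(k)]
            for i in range(k)]

def sum_rows(rows) -> int:
    """Add all partial-product bits at their weights."""
    return sum(bit << (i + j)
               for i, row in enumerate(rows)
               for j, bit in enumerate(row))

rows = partial_products(0b10110, 0b01101)
and_gates = sum(len(row) for row in rows)   # one AND gate per bit
print(and_gates, sum_rows(rows))
```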
2. Partial Products Tree Reduction (Wallace)
After partial products created, must sum them together
Wallace tree: reduce the number of operands as soon as possible. Two 4 x 4 examples are shown below (numbers are dots per column, MSB column on the left; FA/HA marks the column an adder is applied to).
     1  2  3  4  3  2  1                1  2  3  4  3  2  1
          FA FA FA HA                        FA FA
  ----------------------             ----------------------
     1  3  2  3  2  1  1                1  3  2  2  3  2  1
       FA HA FA HA                        FA HA HA FA
  ----------------------             ----------------------
     2  2  2  2  1  1  1                2  2  2  2  1  2  1
       4-Bit Adder                        6-Bit Adder
  ----------------------             ----------------------
  1  1  1  1  1  1  1  1             1  1  1  1  1  1  1  1
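The dot counts in the two reductions above can be reproduced by bookkeeping alone: a full adder removes three dots from its column and returns one there plus one carry in the next column, while a half adder does the same with two dots. The sketch below assumes column 0 is the LSB; `reduce_level` and the column choices passed to it are read off the diagram, not taken from any library.

```python
def reduce_level(counts, fa_cols, ha_cols):
    """counts[c] = dots in column c (c = 0 is the LSB). Returns new counts."""
    out = list(counts) + [0]          # room for a carry out of the top column
    for c in fa_cols:                 # full adder: 3 dots -> 1 here + 1 carry
        out[c] -= 2
        out[c + 1] += 1
    for c in ha_cols:                 # half adder: 2 dots -> 1 here + 1 carry
        out[c] -= 1
        out[c + 1] += 1
    if out[-1] == 0:
        out.pop()
    return out

start = [1, 2, 3, 4, 3, 2, 1]         # 4 x 4 partial products, LSB first
left_1 = reduce_level(start, fa_cols=[2, 3, 4], ha_cols=[1])
left_2 = reduce_level(left_1, fa_cols=[3, 5], ha_cols=[2, 4])
right_1 = reduce_level(start, fa_cols=[3, 4], ha_cols=[])
right_2 = reduce_level(right_1, fa_cols=[2, 5], ha_cols=[3, 4])
print(left_2[::-1], right_2[::-1])    # MSB first, as drawn on the slide
```

The left reduction uses 5 FAs and 3 HAs and leaves a 4-bit final addition; the right one uses 4 FAs and 2 HAs but needs a 6-bit final adder.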
7 x 7 Multiplier
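[Figure: CSA tree for the 7 x 7 multiplier; each [i, j] gives the range of bit positions spanned by an operand at that point in the tree]
[0, 6] [2, 8] [3, 9] [5, 11]
[1, 7] [4, 10] [6, 12]
[1, 6]
7-bit CSA 7-bit CSA
[2, 8] [1,8] [5, 11] [3, 11]
7-bit CSA
[6, 12] [3, 12]
[2, 8]
7-bit CSA
[3,9] [2,12]
[3,12]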
Slice of 11:2 Counter Reduction Tree
[Figure: in VLSI, one column (slice) of the reduction tree, built from full adders (FA), with the inputs at the top, level-1 through level-4 carries entering from and leaving to neighboring slices, and the outputs at the bottom]
11 + y1 = 2 y1 + 3
Therefore, y1 = 8 carries are needed
Binary Tree of 4-to-2 Reduction Modules
Example Multiplier with 4-to-2 Reduction Tree
[Figure: multiplier layout with multiple generation circuits along the multiplicand, the 4-to-2 reduction tree, and a redundant-to-binary converter]
Even if 4-to-2 reduction ...
Similarly, using Booth's recoding may not yield any advantage, because it introduces irregularity.
3. Redundant-to-Binary Converter
Signed Tree Multiplication
Remove redundant full adder cells for sign-extension
[Figure: three sign-extended operands with the columns marked "Signs" and "Sign extensions"; the five redundant copies of the full adders (FA) in the sign-extension columns are removed]
The difference in multiplication is the shifting sign positions.
Fig. 11.7 Sharing of full adders to reduce the CSA width in a signed tree multiplier.
Two's Complement Negative Weight
Representation
Negative-weight MSB:
  -2^4  2^3  2^2  2^1  2^0
    a4   a3   a2   a1   a0
 ×  x4   x3   x2   x1   x0

Equivalent form with negative-valued sign digits:
   2^4  2^3  2^2  2^1  2^0
   -a4   a3   a2   a1   a0
 × -x4   x3   x2   x1   x0
Two's Complement Multiplication
                             -a4   a3   a2   a1   a0
                          ×  -x4   x3   x2   x1   x0
 -p9   p8   p7   p6   p5   p4   p3   p2   p1   p0
 2^9  2^8  2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0
Two's Complement Multiplication
 -p9   p8   p7   p6   p5   p4   p3   p2   p1   p0
 2^9  2^8  2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0

  p9   p8   p7   p6   p5   p4   p3   p2   p1   p0
-2^9  2^8  2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0
Implementing Partial Products
z' = 1 - z
z = 1 - z'
-a_j x_i = -a_j (1 - x_i') = a_j x_i' - a_j = a_j x_i' + a_j - 2 a_j
-a_j x_i = -(1 - a_j') x_i = a_j' x_i - x_i = a_j' x_i + x_i - 2 x_i
-a_j x_i = -(1 - (a_j x_i)') = (a_j x_i)' - 1 = (a_j x_i)' + 1 - 2
-a_j = -(1 - a_j') = a_j' - 1 = a_j' + 1 - 2
-x_i = -(1 - x_i') = x_i' - 1 = x_i' + 1 - 2
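Since every variable here is a single bit, the identities can be confirmed by trying all four combinations. A small sketch, with the prime written as `1 - bit`; the function name `identities_hold` is hypothetical.

```python
from itertools import product

def identities_hold() -> bool:
    """Check the complement identities for all bit values of a_j and x_i."""
    for a, x in product((0, 1), repeat=2):
        a_n, x_n = 1 - a, 1 - x       # a_j' and x_i'
        nand = 1 - a * x              # (a_j x_i)'
        checks = [
            -a * x == a * x_n + a - 2 * a,
            -a * x == a_n * x + x - 2 * x,
            -a * x == nand + 1 - 2,
            -a == a_n + 1 - 2,
            -x == x_n + 1 - 2,
        ]
        if not all(checks):
            return False
    return True

print(identities_hold())
```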
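Terms -a4 x_i (columns 2^4 to 2^7):
   2^8     2^7     2^6     2^5     2^4
                                  -a4x0
                          -a4x1
                  -a4x2
    +     -a4x3

Apply -a4 x_i = a4 x_i' + a4 - 2 a4 to each term:
   2^8     2^7     2^6     2^5     2^4
                           -a4    a4x0'
                   -a4    a4x1'    a4
           -a4    a4x2'    a4
   -a4    a4x3'    a4
            a4

Sum the columns, then apply -a4 = a4' - 1 in column 2^8:
   2^8     2^7     2^6     2^5     2^4
   a4'    a4x3'   a4x2'   a4x1'   a4x0'
   -1                               a4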
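Terms -a_j x4 (columns 2^4 to 2^7):
   2^8     2^7     2^6     2^5     2^4
          -a3x4   -a2x4   -a1x4   -a0x4

Apply -a_j x4 = a_j' x4 + x4 - 2 x4 to each term:
   2^8     2^7     2^6     2^5     2^4
                           -x4    a0'x4
                   -x4    a1'x4    x4
           -x4    a2'x4    x4
   -x4    a3'x4    x4
            x4

Sum the columns, then apply -x4 = x4' - 1 in column 2^8:
   2^8     2^7     2^6     2^5     2^4
   x4'    a3'x4   a2'x4   a1'x4   a0'x4
   -1                               x4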
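   2^9     2^8     2^7     2^6     2^5     2^4
           a4'    a4x3'   a4x2'   a4x1'   a4x0'
           -1                               a4
           x4'    a3'x4   a2'x4   a1'x4   a0'x4
           -1                               x4

combine (the two -1 entries in column 2^8 equal one -1 in column 2^9):
   2^9     2^8     2^7     2^6     2^5     2^4
   -1      a4'    a4x3'   a4x2'   a4x1'   a4x0'
           x4'    a3'x4   a2'x4   a1'x4   a0'x4
                                            a4
                                            x4

remap sign bit to negative weight:
  -2^9     2^8     2^7     2^6     2^5     2^4
    1      a4'    a4x3'   a4x2'   a4x1'   a4x0'
           x4'    a3'x4   a2'x4   a1'x4   a0'x4
                                            a4
                                            x4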
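As a sanity check on the combined array above, the sketch below sums its entries for a 5 x 5 multiplier and compares the result with the signed product. It assumes the remaining partial products are the ordinary a_j x_i terms for i, j < 4 plus a4 x4 in column 2^8; the function name `baugh_wooley_5x5` is hypothetical.

```python
def to_signed(bits: int, width: int) -> int:
    """Interpret the low `width` bits as a two's complement integer."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits >> (width - 1) else bits

def baugh_wooley_5x5(a: int, x: int) -> int:
    """Signed 5 x 5 product using only positively weighted array entries."""
    A = [(a >> j) & 1 for j in range(5)]
    X = [(x >> i) & 1 for i in range(5)]
    total = 0
    for i in range(4):
        for j in range(4):
            total += (A[j] & X[i]) << (i + j)       # ordinary a_j x_i terms
    total += (A[4] & X[4]) << 8                     # a4 x4
    for i in range(4):
        total += (A[4] & (1 - X[i])) << (4 + i)     # a4 x_i'
        total += ((1 - A[i]) & X[4]) << (4 + i)     # a_i' x4
    total += ((1 - A[4]) + (1 - X[4])) << 8         # a4' and x4' in column 2^8
    total += (A[4] + X[4]) << 4                     # a4 and x4 in column 2^4
    total += 1 << 9                                 # the 1 in the sign column
    return to_signed(total, 10)                     # sign column has weight -2^9

print(baugh_wooley_5x5(0b11111, 0b00001))           # (-1) * 1
```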
Baugh-Wooley Two's Complement Multiplier
-a4 a3 a2 a1 a0
x -x4 x3 x2 x1 x0
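Terms -a4 x_i (columns 2^4 to 2^7):
   2^8      2^7      2^6      2^5      2^4
                                      -a4x0
                             -a4x1
                    -a4x2
    +      -a4x3

Apply -a4 x_i = (a4 x_i)' + 1 - 2 to each term:
   2^8      2^7      2^6      2^5      2^4
                              -1     (a4x0)'
                     -1     (a4x1)'     1
            -1     (a4x2)'     1
   -1     (a4x3)'     1
             1

Apply -a_j x4 = (a_j x4)' + 1 - 2 to the terms -a_j x4 in the same way:
   2^8      2^7      2^6      2^5      2^4
                              -1     (a0x4)'
                     -1     (a1x4)'     1
            -1     (a2x4)'     1
   -1     (a3x4)'     1
             1

Sum the columns:
   2^8      2^7      2^6      2^5      2^4
          (a3x4)'  (a2x4)'  (a1x4)'  (a0x4)'
   -1                                   1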
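   2^9     2^8      2^7      2^6      2^5      2^4
                  (a4x3)'  (a4x2)'  (a4x1)'  (a4x0)'
           -1                                   1
                  (a3x4)'  (a2x4)'  (a1x4)'  (a0x4)'
           -1                                   1
combine: the two -1 entries in column 2^8 equal one -1 in column 2^9, and the two 1 entries in column 2^4 equal one 1 in column 2^5.
Remapping the sign column to weight -2^9 turns the -1 into a 1.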
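The same kind of check works for this variant, where the border terms are complemented products (a4 x_i)' and (a_j x4)', with constant 1s in columns 2^5 and 2^9. Again a sketch under the assumption that the interior terms are the ordinary a_j x_i products plus a4 x4; `modified_bw_5x5` is a hypothetical name.

```python
def to_signed(bits: int, width: int) -> int:
    """Interpret the low `width` bits as a two's complement integer."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits >> (width - 1) else bits

def modified_bw_5x5(a: int, x: int) -> int:
    """Signed 5 x 5 product with NAND border terms and two constant 1s."""
    A = [(a >> j) & 1 for j in range(5)]
    X = [(x >> i) & 1 for i in range(5)]
    total = 0
    for i in range(4):
        for j in range(4):
            total += (A[j] & X[i]) << (i + j)        # ordinary a_j x_i terms
    total += (A[4] & X[4]) << 8                      # a4 x4
    for i in range(4):
        total += (1 - (A[4] & X[i])) << (4 + i)      # (a4 x_i)'
        total += (1 - (A[i] & X[4])) << (4 + i)      # (a_i x4)'
    total += 1 << 5                                  # two 1s from column 2^4
    total += 1 << 9                                  # remapped sign column
    return to_signed(total, 10)

print(modified_bw_5x5(0b10000, 0b01111))             # (-16) * 15
```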
Modified Baugh-Wooley Multiplier
-a4 a3 a2 a1 a0
x -x4 x3 x2 x1 x0
Basic 5 x 5 Unsigned Array Multiplier
5 x 5 Array Multiplier
[Figure: 5 x 5 array multiplier with the critical path marked (assuming sum and carry delays are the same)]
Array Multiplier Basic Cell
[Figure: basic cell, a full adder (FA) with inputs x, y, cin and outputs cout, s]
Baugh-Wooley Two's Complement Multiplier
-a4 a3 a2 a1 a0
x -x4 x3 x2 x1 x0
Array Multiplier Modified Basic Cell
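[Figure: modified basic cell. An AND gate forming a_m x_n is included in the basic cell in front of the full adder (FA); inputs a_m, x_n, s_{i-1}, c_i; outputs c_{i+1}, s_i]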
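A behavioral sketch of the modified cell, and of an unsigned 5 x 5 array built from 25 copies of it, is given below. The wiring is an assumption: here each row ripples its carry along the row into the running sum, which is one simple way to connect such cells and is not necessarily the wiring drawn on the slides. The names `cell` and `array_multiply` are hypothetical.

```python
def cell(a_m: int, x_n: int, s_in: int, c_in: int):
    """Modified basic cell: AND gate followed by a full adder."""
    pp = a_m & x_n                    # AND gate included in the cell
    total = pp + s_in + c_in          # full adder
    return total & 1, total >> 1      # (sum out, carry out)

def array_multiply(a: int, x: int, k: int = 5) -> int:
    """Unsigned k x k product from a k-by-k grid of cells."""
    A = [(a >> m) & 1 for m in range(k)]
    X = [(x >> n) & 1 for n in range(k)]
    s = [0] * (2 * k)                 # running sum bits, LSB first
    for n in range(k):                # one row of cells per multiplier bit
        carry = 0
        for m in range(k):
            s[n + m], carry = cell(A[m], X[n], s[n + m], carry)
        s[n + k] = carry              # carry out of the row
    return sum(bit << i for i, bit in enumerate(s))

print(array_multiply(0b10110, 0b01101))
```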
5 x 5 Array Multiplier with Modified Cells
Pipelined 5 x 5 Multiplier
Squaring
Optimizations for Squaring (2)
x_i x_j + x_j x_i = 2 x_i x_j   (two equal terms in one column become a single term one column to the left)
x_i x_i = x_i
x_i x_j + x_i = 2 x_i x_j - x_i x_j + x_i
              = 2 x_i x_j + x_i (1 - x_j)
              = 2 x_i x_j + x_i x_j'
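The first two identities are enough to build a squarer with roughly half the partial-product bits. A minimal sketch, assuming an unsigned k-bit input; `square_folded` and `third_identity_holds` are hypothetical names.

```python
def square_folded(x: int, k: int) -> int:
    """Square a k-bit unsigned value using folded partial products."""
    X = [(x >> i) & 1 for i in range(k)]
    total = 0
    for i in range(k):
        total += X[i] << (2 * i)                   # x_i x_i = x_i
        for j in range(i + 1, k):
            # x_i x_j + x_j x_i = 2 x_i x_j: one term, one column to the left
            total += (X[i] & X[j]) << (i + j + 1)
    return total

def third_identity_holds() -> bool:
    """x_i x_j + x_i = 2 x_i x_j + x_i x_j' for all bit values."""
    return all(xi * xj + xi == 2 * xi * xj + xi * (1 - xj)
               for xi in (0, 1) for xj in (0, 1))

print(square_folded(0b10110, 5))
```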
Squaring Using Lookup Tables
For relatively small values of k, use a table of 2^k words, 2k bits each:
input = a     output = a^2
0             0
1             1
2             4
3             9
4             16
...
i             i^2
...
2^k - 1       (2^k - 1)^2
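A table like the one above is easy to model in software. In this sketch a Python list stands in for the ROM; `build_square_table` is a hypothetical name.

```python
def build_square_table(k: int):
    """Table of 2^k words; entry a holds a^2, which fits in 2k bits."""
    return [a * a for a in range(1 << k)]

table = build_square_table(4)
print(len(table), table[3], table[-1])
```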
Multiplication Using Squaring
a x = [(a + x)^2 - (a - x)^2] / 4
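The identity holds because (a + x)^2 - (a - x)^2 expands to 4 a x, so the division by 4 is always exact. A minimal sketch with a hypothetical `square` helper standing in for a squaring unit or lookup table:

```python
def square(v: int) -> int:
    """Stand-in for a squaring unit or lookup table."""
    return v * v

def multiply_via_squares(a: int, x: int) -> int:
    """a * x from two squarings, a subtraction, and a shift by two bits."""
    return (square(a + x) - square(a - x)) // 4   # difference is exactly 4ax

print(multiply_via_squares(13, 11))
```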