Techniques For Implementing Multipliers in Stratix, Stratix GX & Cyclone Devices
August 2003, ver. 1.0 Application Note 306
Introduction
Stratix™, Stratix GX, and Cyclone™ FPGAs have dedicated architectural
features that make it easy to implement high-performance multipliers.
Stratix and Stratix GX devices feature embedded high-performance
multiplier-accumulators (MACs) in dedicated digital signal processing
(DSP) blocks. DSP blocks can operate at data rates above 300 million
samples per second (MSPS), making Stratix and Stratix GX FPGAs ideal
for high-speed DSP applications. In addition to the dedicated DSP blocks,
designers can also use the devices’ TriMatrix™ memory blocks to
implement variable depth/width, high-performance soft multipliers. For
example, designers can implement TriMatrix memory blocks as look-up
tables (LUTs) that contain partial results from multiplication of input data
with coefficients. Cyclone devices have M4K memory blocks which can
be used as LUTs to implement variable depth/width high-performance
soft multipliers for low cost, high volume DSP applications.
Stratix, Stratix GX, and Cyclone FPGAs can implement the multiplier
types shown in Table 1.
■ Parallel multiplication
■ Semi-parallel multiplication
■ Sum of multiplication
■ Hybrid multiplication
■ Fully variable multipliers
Multipliers using DSP blocks or logic elements (LEs): These multipliers
are implemented in dedicated DSP blocks or LEs using the lpm_mult,
altmult_add, or altmult_accum megafunctions.
(Stratix: yes, Stratix GX: yes, Cyclone: no)
Firm multiplier: These multipliers are implemented in a combination of
DSP blocks and LEs.
(Stratix: yes, Stratix GX: yes, Cyclone: no)
Notes to Table 2:
(1) Soft multipliers implemented in sum of multiplication mode. RAM blocks
configured with 18-bit data widths and sum of coefficients up to 18 bits.
(2) The number in parentheses represents the increase factor, which is the total
number of multipliers with soft multipliers divided by the number of 18 × 18
multipliers supported by DSP blocks only.
(3) The total number of multipliers may vary according to the multiplier mode used.
Notes to Table 3:
(1) Soft multipliers implemented in sum of multiplication mode. RAM blocks
configured with 18-bit data widths and sum of coefficients up to 18 bits.
(2) The number in parentheses represents the increase factor, which is the total
number of multipliers with soft multipliers divided by the number of 18 × 18
multipliers supported by DSP blocks only.
(3) The total number of multipliers may vary according to the multiplier mode used.
Notes to Table 4:
(1) Soft multipliers implemented in sum of multiplication mode. RAM blocks
configured with 18-bit data widths and sum of coefficients up to 18 bits.
(2) The total number of multipliers may vary according to the multiplier mode used.
This application note describes the dedicated memory and DSP blocks,
the supported multiplier types, and includes an example of each type.
Memory Blocks
The Stratix and Stratix GX TriMatrix memories consist of three types of
RAM blocks: M512, M4K, and M-RAM. The M512 and M4K RAM blocks
are memory blocks with a maximum width of 18 and 36 bits, respectively,
and a maximum performance of approximately 300 MHz, which is ideal
for implementing soft multipliers.
Tables 5 and 6 show the available TriMatrix memory blocks in Stratix and
Stratix GX devices, respectively.
The Cyclone M4K memory blocks have a maximum width of 36 bits and
a maximum performance of approximately 200 MHz. Table 7 shows the
number of Cyclone M4K memory blocks.
Table 8 shows the possible configurations of the M512, M4K, and M-RAM
blocks found in Stratix, Stratix GX, and Cyclone devices.
DSP Blocks
Stratix and Stratix GX devices contain dedicated DSP blocks for
implementing high-speed multiplication functions within the FPGA.
Tables 9 and 10 show the number of DSP blocks in Stratix and Stratix GX
devices, respectively.
Note to Table 9:
(1) Each device has either the number of 9 × 9-, 18 × 18-, or 36 × 36-bit multipliers shown. The total number of
multipliers for each device is not the sum of all the multipliers.
Table 10. Number of DSP Blocks in Stratix GX Devices Note (1)
Multiplication
The basis of many DSP algorithms is multiplication, in which a
multiplicand is multiplied by a multiplier. In this operation, each bit of
the multiplier is multiplied by each bit of the multiplicand. The partial
product of each multiplication is then accumulated according to its
weight, where the weight indicates the position of a bit relative to the
other bits. For example, if a partial product of bits 4 through 7 is added
to a partial product of bits 0 through 3, the partial product of bits 4
through 7 is shifted according to its weight and then accumulated with
the partial product of the previous stage. Figure 1 shows a simple 2 × 2
multiplication of multiplier a1a0 by multiplicand b1b0.
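[Figure 1: 2 × 2 multiplication of a1a0 by b1b0. The circuit takes inputs
a1, a0, b1, b0, and carry_in and produces c3 c2 c1 c0.]
              b1    b0
      x       a1    a0
      ----------------
            a0b1  a0b0
   +  a1b1  a1b0
      ----------------
  c3    c2    c1    c0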
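The weighting of partial products described above can be sketched in a few lines of Python. This is an illustrative model only, assuming unsigned operands; the function name multiply_by_partial_products is hypothetical and does not come from the application note.

```python
def multiply_by_partial_products(a: int, b: int, width: int = 2) -> int:
    """Multiply unsigned multiplier a by unsigned multiplicand b, bit by bit."""
    result = 0
    for i in range(width):
        a_bit = (a >> i) & 1         # bit i of the multiplier
        partial = b if a_bit else 0  # a_bit ANDed with every bit of b
        result += partial << i       # shift by the weight of bit i, then accumulate
    return result


# 2 x 2 example: a1a0 = 11 (3) and b1b0 = 10 (2)
product = multiply_by_partial_products(0b11, 0b10)
print(format(product, "04b"))  # the four result bits c3 c2 c1 c0
```

The left shift by i plays the role of the weight: the a1 row of partial products is moved one position to the left before it is added to the a0 row.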
Distributed Arithmetic
Distributed arithmetic is a method of performing multiplication by
distributing the operation over many LUTs. Figure 2 shows a four-product
MAC function that uses sequential shift and add to multiply four pairs of
operands and then sums their partial products to obtain a final result.
Each multiplier forms partial products by multiplying the multiplicand by
one bit of the input data (multiplier) at a time, using an AND gate.
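[Figure 2: four-product MAC function using sequential shift and add.
Inputs w, x, y, and z are held in shift registers (SREG) and combined with
coefficients c0 through c3. A clocked scaling accumulator (>> 1) produces
wc0 + xc1 + yc2 + zc3.]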
At the end of the process, each partial product result of each input bit is
summed prior to the final scaling accumulator stage, which performs a
shift-accumulate.
[Figure 3: inputs w, x, y, and z, paired with coefficients c0 through c3,
form the address of a LUT. Note (1)]
Addr  Data
0000  0
0001  c0
0010  c1
0011  c0 + c1
...
1110  c1 + c2 + c3
1111  c0 + c1 + c2 + c3
Note to Figure 3:
(1) c0 to c3 are constant coefficients.
The addressing method and data values stored in the LUT in Figure 3
apply to the sum of multiplication operation mode. The addressing
method and LUT data values vary depending on the multiplier
implementation mode.
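The LUT contents in Figure 3 and the bit-serial accumulation can be modeled as follows. This is a minimal sketch, assuming unsigned inputs and taking address bit 0 from w, bit 1 from x, and so on. The names build_sum_lut and da_mac are hypothetical. The hardware shifts the accumulator right by one bit each cycle, while this model applies the equivalent left shift to each partial result.

```python
def build_sum_lut(coeffs):
    """Entry at address addr is the sum of coeffs[k] for every set bit k of addr."""
    n = len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << n)
    ]


def da_mac(inputs, coeffs, width):
    """Bit-serial sum of products for unsigned inputs that are width bits wide."""
    lut = build_sum_lut(coeffs)
    acc = 0
    for bit in range(width):  # LSB first, one bit position per clock cycle
        addr = 0
        for k, value in enumerate(inputs):
            addr |= ((value >> bit) & 1) << k  # bit `bit` of input k
        acc += lut[addr] << bit  # weight the partial result by 2**bit
    return acc


coeffs = [3, 5, 7, 11]   # c0..c3, example values only
inputs = [1, 2, 3, 4]    # w, x, y, z, example values only
print(da_mac(inputs, coeffs, width=4))
```

Each LUT read replaces four AND gates and an adder tree for one bit position, which is why only one memory access per input bit is needed.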
Implementing Soft Multipliers Using Memory Blocks
You can use the Stratix and Stratix GX M512 or M4K RAM memory blocks
and Cyclone M4K RAM memory blocks as LUTs to implement
multiplication for DSP applications. Combinations of the coefficient
results are pre-calculated and stored in the M512 or M4K RAM blocks as
a LUT. The address port of the RAM block represents one of the
multiplication operands. The content of the RAM block at each address
represents a unique multiplication result calculated between the input
operand and a known coefficient value based on the multiplier mode
implemented.
Parallel Multiplication
Parallel multiplication involves multiplying all sections of a single input
bus or multiplier value with a single multiplicand or coefficient and
summing the partial product of each multiplication to obtain the final
result. All of the input bits are parallel-loaded into the RAM block address
port registers and a new multiplication is completed each clock cycle. For
example, a 16-bit input bus can be separated into two groups of eight bits
(one group of eight LSB bits and another group of eight MSB bits) and
simultaneously shifted into the address ports of two RAM blocks. The
output of each RAM block indicates the multiplication result for its
set of bits with the coefficient. Figure 4 represents the
decomposition of a 16-bit data input, 10-bit constant coefficient parallel
multiplier.
[Figure 4: decomposition of Input[15..0] multiplied by Coefficient[9..0].
The sign-extended partial products, including LSB Partial
Product[18..0], are summed to form Mult_Result[25..0].]
Figure 5. 16-Bit Input, 10-Bit Coefficient Parallel Multiplication Implementation Using M4K RAM Blocks as
LUTs Note (1)
[Figure 5: Input[15:0] is split into two 8-bit sections, each addressing
an M4K RAM block (LUT) in a 256 x 18 configuration. The 18-bit MSB result
is shifted left by 8 bits (<< 8) and summed with the 18-bit LSB result to
form the 26-bit Output[25..0].]
MSB LUT contents:
ADDRESS   MULT_RESULT
00000000  0
00000001  C
00000010  2*C
00000011  3*C
...
11111110  -2*C
11111111  -1*C
LSB LUT contents:
ADDRESS   MULT_RESULT
00000000  0
00000001  C
00000010  2*C
00000011  3*C
...
11111110  254*C
11111111  255*C
Note to Figure 5:
(1) Optional pipeline register to increase system performance.
Figure 5 shows an implementation for a 16-bit data input, split into two
8-bit sections implemented using two M4K RAM blocks, one for the MSB
section and the other for the LSB section. For signed input buses, the M4K
RAM block that accepts the MSB bits must contain precalculated
coefficient values for signed inputs because the eight MSB bits that feed
this RAM block are treated as signed values. The M4K RAM block that
accepts the LSB bits must contain precalculated coefficient values for
unsigned inputs because the eight LSB bits that feed this RAM blocks are
unsigned values.
Because the M4K RAM blocks are configured as 256 × 18 bits, the maximum
number of bits per section for each M4K RAM block for this coefficient
size is eight (2^8 = 256 addresses). The input bus and coefficient sizes
directly affect the number and configuration of RAM blocks used to
implement the multiplier. The parallel multiplication mode ensures
maximum data throughput (i.e., a new data value every clock cycle).
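The split into a signed MSB LUT and an unsigned LSB LUT can be checked with a short Python model. This is a sketch under stated assumptions: the input is a signed 16-bit two's complement value, the coefficient is the constant 5 used in the example, and the names msb_lut, lsb_lut, and parallel_multiply are hypothetical.

```python
C = 5  # example constant coefficient

# LSB LUT: the 8-bit address is treated as an unsigned value (0 to 255).
lsb_lut = [addr * C for addr in range(256)]

# MSB LUT: the 8-bit address is treated as a signed value (-128 to 127).
msb_lut = [(addr - 256 if addr >= 128 else addr) * C for addr in range(256)]


def parallel_multiply(x: int) -> int:
    """Multiply a signed 16-bit input by C using two 256-entry LUTs."""
    raw = x & 0xFFFF          # two's complement bit pattern of the input
    lsb = raw & 0xFF          # eight LSBs address the unsigned LUT
    msb = (raw >> 8) & 0xFF   # eight MSBs address the signed LUT
    return (msb_lut[msb] << 8) + lsb_lut[lsb]  # weight the MSB result by 2**8


print(parallel_multiply(297))  # both LUT reads happen in the same clock cycle
print(parallel_multiply(-2))
```

Only the MSB LUT needs signed contents because the sign of the whole input is carried by its top section; the lower section always contributes a non-negative amount.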
You can also implement the parallel fixed-coefficient multiplier using the
altmemmult Quartus II megafunction. You can use the MegaWizard®
Plug-In Manager to customize the altmemmult megafunction to specify
a parallel, fixed-coefficient soft multiplier in your design. The input and
coefficient bit width settings, as well as the RAM block type selected,
determine whether the altmemmult function implements a semi-parallel or
parallel mode soft multiplier, whichever is more efficient. Figures 6 and 7
show the settings required to implement the MSB and LSB M4K RAM
blocks, respectively, for the 16-bit input, 10-bit parallel
multiplier example shown in Figure 5. The coefficient implemented in
this example is a constant value of five.
Figure 6. altmemmult MegaWizard Settings for the MSB RAM Block for a 16-Bit
Input, 10-Bit Constant Coefficient Parallel Multiplier
Figure 7. altmemmult MegaWizard Settings for the LSB RAM Block for a 16-Bit
Input, 10-Bit Constant Coefficient Parallel Multiplier
The sload_data signal and the message located at the bottom right
hand corner of the MegaWizard window indicate whether the
altmemmult function chose to implement a semi-parallel or parallel
mode soft multiplier. A parallel soft multiplier does not have the
sload_data signal, and the megafunction can accept a new input every
clock cycle. The altmemmult megafunction can only implement small
parallel mode soft multipliers (e.g., 8-bit input, 10-bit coefficient
multipliers). Larger parallel multipliers require multiple altmemmult
megafunctions to generate partial product results. To obtain the final
multiplication result, these partial products must be summed in an
end-stage adder implemented externally to the altmemmult function.
Fixed-Coefficient Multiplication
Figure 8 shows the simulation results for the example shown in Figure 5.
This example multiplies the input, which has a decimal value of 297, with
a coefficient, which has a value of 5.
Table 11 shows the implementation result for the parallel fixed coefficient
multiplication example shown in Figure 5. The example is implemented
using the altmemmult megafunction.
Note to Table 11:
(1) Latency is the number of clock cycles required to complete a single multiplication
computation.
You can download the files (parallel_fixed.zip) for the design described
in Table 11 from the Design Examples section of the Altera web site at
www.altera.com.
Figure 9. 16-Bit Input, 10-Bit Variable Coefficient Parallel Multiplication Implementation Using M4K Single-
Port RAM Blocks as LUTs Note (1)
[Figure 9: Input[15:0] is split into 8-bit MSB and LSB sections, each
addressing a single-port M4K RAM block (LUT) in a 256 x 18 configuration.
The figure also shows the coefficient-loading signals Coefficient
Address[7:0], Coefficient Write Enable, MSB Coefficient Input[17:0], and
LSB Coefficient Input[17:0]. The 18-bit MSB result is shifted left by
8 bits (<< 8) and combined with the 18-bit LSB result to form the 26-bit
Output[25..0].]
MSB LUT contents:
ADDRESS   MULT_RESULT
00000000  0
00000001  C
00000010  2*C
00000011  3*C
...
11111110  -2*C
11111111  -1*C
LSB LUT contents:
ADDRESS   MULT_RESULT
00000000  0
00000001  C
00000010  2*C
00000011  3*C
...
11111110  254*C
11111111  255*C
Note to Figure 9:
(1) Optional pipeline register to increase system performance.
You can download the files (parallel_var.zip) for the design described in
Table 12 from the Design Examples section of the Altera web site at
www.altera.com.
Semi-Parallel Multiplication
Semi-parallel multiplication involves multiplying sections of a single
input bus or multiplier value with a single multiplicand or coefficient and
shift accumulating the partial product of each multiplication to obtain the
final result. For example, a 16-bit input bus can be separated into four
groups of four bits that are consecutively shifted into the address port of
the RAM block once every clock cycle, beginning with the first four LSB
bits. The output of the RAM block indicates the multiplication result for
a particular set of bits with the coefficient, every clock cycle. Figure 10
shows the decomposition of a 16-bit data input, 14-bit coefficient semi-
parallel multiplier.
[Figure 10: decomposition of Input[15..0] multiplied by
Coefficient[13..0]. The sign-extended partial products shown include
Partial Product[17..0], Partial Product[21..4] (shifted 4 bits), and
Partial Product[25..8] (shifted 8 bits). The results from each multiply
are accumulated to form Mult_Result[29..0].]
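Figure 11. 16-Bit Input, 14-Bit Coefficient Semi-Parallel Multiplication Implementation Using M512 RAM
Blocks as LUTs Note (1)
[Figure 11: the LUT output feeds a 30-bit shift (>> 4) accumulator. The
last LUT entries shown are address 1110 = 14*C and address 1111 = 15*C.]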
Figure 11 shows an implementation for a 16-bit data input, split into four
4-bit sections implemented using a single M512 RAM block. In this
example, for the same memory block utilization, factors such as the input
bus size help determine the output bit width and the latency of the
multiplier. Increasing the bit width of the sections (i.e., using sections
wider than 4 bits in this case) can reduce the latency of the multiplier.
Wider sections may require more M512 RAM blocks or the use of M4K RAM
blocks.
The sload_data signal and the message located at the bottom right
hand corner of the MegaWizard window indicate whether the
altmemmult function chose to implement a semi-parallel or parallel
mode soft multiplier. A semi-parallel soft multiplier has an sload_data
signal and can only accept a new input after more than one clock cycle.
The semi-parallel multiplier in Figure 11 indicates that the 16-bit input is
split into four groups of four bits each. Because it takes four clock cycles
to load the entire 16-bits into the RAM block, the current input must
remain stable for four clock cycles prior to loading the new input. A high
signal on sload_data for one clock cycle indicates the start of a new
block of input data.
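The section-by-section operation of the semi-parallel mode can be modeled as shown here. This is a rough sketch, assuming an unsigned input so that a single LUT with entries 0 through 15*C serves every section; the function name semi_parallel_multiply is hypothetical, and pipeline delays are not modeled.

```python
def semi_parallel_multiply(x, coeff, input_bits=16, section_bits=4):
    """Return (product, load_cycles) for an unsigned input processed in sections."""
    mask = (1 << section_bits) - 1
    lut = [addr * coeff for addr in range(1 << section_bits)]  # 0, C, 2*C, ...
    acc = 0
    cycles = 0
    for shift in range(0, input_bits, section_bits):  # LSB section first
        section = (x >> shift) & mask  # one section per clock cycle
        acc += lut[section] << shift   # shift-accumulate by section weight
        cycles += 1
    return acc, cycles


print(semi_parallel_multiply(297, 5))                  # four 4-bit sections
print(semi_parallel_multiply(297, 5, section_bits=8))  # two 8-bit sections
```

Doubling the section width halves the number of load cycles but squares the number of LUT addresses (16 becomes 256), which is the latency versus memory trade-off described above.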
You can download the files (semi_prl_var.zip) for the design described
in Table 14 from the Design Examples section of the Altera web site at
www.altera.com.
Sum of Multiplication
The sum of multiplication mode result is the weighted summation of
results produced by multiplying a set of input data (multipliers) by a set
of multiplicands. This sum forms the basis of a MAC function that is useful
in functions such as FIR filters, where each input data (multiplier) value
is multiplied by a particular coefficient (or multiplicand) and summed
to provide the final result.
In the sum of multiplication mode, each input bus shifts into the address
port of the memory block one bit per clock cycle, starting with the LSB. If
there are four inputs (called A, B, C, and D) to the multiplier block, at the
first clock cycle, the LSB of inputs A, B, C, and D forms the 4-bit address
value to the RAM block. The next clock cycle, the second LSB bit for each
input forms the next address value to the RAM block, and so on. For an
n-bit input data width, it takes n clock cycles to load in all of the data bits
required to compute the multiplication result. The RAM block output
indicates the multiplication result for a specific bit position at each clock
cycle.
Figure 14 shows the RAM LUT implementation of four 4-bit data inputs
and up to 16-bit constant coefficients. This fixed coefficient
implementation takes six clock cycles (four to load the input values into
the RAM block plus two pipeline delays) to complete the multiplication
operation by shift-accumulating the partial products obtained from the
RAM block once per clock cycle, according to their weights. Each shift-
accumulation of a partial product generates an extra carry bit. At the end
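Figure 14. 4-Input Sum of Multiplication Implementation Using M512 RAM Blocks as LUTs
[Figure 14: the inputs (input A is labeled) address the LUT, whose output
feeds a 22-bit shift (>> 1) accumulator that drives Output. The last LUT
entries shown are address 1110 = c1 + c2 + c3 and address
1111 = c0 + c1 + c2 + c3.]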
Figure 15 shows the simulation result for an example based on Figure 14.
This example has additional pipeline stages and multiplies input A,
which has a binary value of 0001, with the c0 coefficient, which has a
value of -3.
Table 15 shows the implementation results of the four input, 16-bit fixed
coefficient sum of multiplication example shown in Figure 14.
You can download the files (sum_mult_var.zip) for the design described
in Table 16 from the Design Examples section of the Altera web site at
www.altera.com.
You can combine multiple M512 blocks and/or M4K blocks to create
larger multiplier structures that are capable of multiplying more data
inputs and coefficients simultaneously. Figure 16 shows the
multiplication of eight 4-bit data inputs to eight 16-bit constant
coefficients in two M512 RAM blocks.
Figure 16. Using Multiple M512 RAM Blocks for an 8-Coefficient Multiplier
[Figure 16: the results of the two M512 RAM block LUTs are combined and
accumulated to produce the 23-bit Output[22..0].]
Note to Figure 16:
(1) Optional pipeline register to increase system performance.
You can also create similar implementations using M4K RAM blocks,
particularly if the coefficients are larger than 16 bits. Figure 17 shows
multiplication of seven 16-bit data inputs by 20-bit constant coefficients
in one M4K RAM block. The 128 addresses (2^7) of the M4K RAM block
correspond to the seven data inputs, each with its own coefficient.
Performing seven 16 × 20-bit multiplications generates a 23-bit output
from an M4K RAM block. It takes 18 clock cycles to complete accumulation
of the partial products (16 clock cycles to shift the input values into
the address port of the RAM block plus two pipeline delays). After each
partial product accumulation, one bit is added to the total number of
output bits, making the final output 39 bits wide.
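The bit widths and cycle counts quoted for the sum of multiplication examples can be reproduced with a small calculation. This is a sketch that assumes unsigned worst-case values and two pipeline delays, as in the examples above; the helper name sum_of_mult_widths is hypothetical.

```python
def sum_of_mult_widths(num_inputs, input_bits, coeff_bits, pipeline_delays=2):
    """Worst-case sizing for an unsigned sum of multiplication LUT multiplier."""
    max_coeff = (1 << coeff_bits) - 1
    max_input = (1 << input_bits) - 1
    max_lut_word = num_inputs * max_coeff  # every address bit set
    return {
        "addresses": 1 << num_inputs,  # one address bit per input
        "lut_bits": max_lut_word.bit_length(),
        "output_bits": (max_lut_word * max_input).bit_length(),
        "cycles": input_bits + pipeline_delays,
    }


# Seven 16-bit inputs with 20-bit coefficients in one M4K RAM block
print(sum_of_mult_widths(7, 16, 20))
# Four 4-bit inputs with 16-bit coefficients in one M512 RAM block
print(sum_of_mult_widths(4, 4, 16))
```

The LUT word grows with the number of inputs because its largest entry is the sum of all coefficients, while the accumulator grows by one bit for every input bit shifted in.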
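[Figure 17: inputs A through G address an M4K RAM block (LUT) in a
128 x 23 configuration. The 23-bit LUT output feeds a 39-bit shift (>> 1)
accumulator that produces Output[38..0].]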
Hybrid Multiplication
The hybrid multiplication mode is a combination of the semi-parallel and
sum of multiplication modes, where bit sections from two unique input
streams are multiplied by two different coefficient values. This mode
is useful in applications that require complex multiplication, such as
fast Fourier transforms (FFTs), where each signal generally has a real and
an imaginary component that could be multiplied by two unique coefficient
values. The partial products obtained from each bit section within the
components are shift-accumulated to obtain the final result.
block is four bits wide, each input contributes two bits to the partial
product calculation every clock cycle until the entire bit width of the
inputs have completely shifted into the RAM block. In this case, for an
input bus of 16-bits, it takes 8 clock cycles to shift in all of the data bits of
that particular input. The output of the RAM block indicates the sum of
multiplication result for a particular set of bits with the coefficients, every
clock cycle.
Figure 18 shows the RAM LUT implementation of two 16-bit inputs, each
labeled I Input and Q Input, respectively, and up to 15-bit constant
coefficients. This implementation takes 11 clock cycles (eight to load the
input values into the RAM block plus three pipeline delays) to complete
the multiplication operation by shift-accumulating the partial products
obtained from the RAM once per clock cycle, according to their weights.
Each shift-accumulation of a partial product generates two extra bits. At
the end of the last (eighth) partial product accumulation, the multiplier
generates a 32-bit output. The size of the input data helps determine the
output bit width and the latency of the multiplier.
Figure 18. Two-Input Hybrid Multiplication Implementation Using M512 RAM Blocks as LUTs
[Figure 18: two bits of Input Q[15..0] and two bits of Input I[15..0]
address an M512 RAM block (LUT) in a 32 x 18 configuration. The 18-bit LUT
output feeds a 32-bit shift (>> 2) accumulator that produces
Output[31..0]. (1)]
Hybrid Multiplications Table
ADDRESS  MULT_RESULT
0000     0
0001     Ci
0010     2*Ci
0011     3*Ci
...
Figure 19 shows the simulation results for an example based on Figure 18.
This example has additional pipeline stages and multiplies the I and Q
inputs, which have values of 300 and 55, respectively, with coefficients Ci
and Cq, which have values of 10 and 25, respectively (result = (input_I ×
Ci) + (input_Q × Cq) = (300 × 10) + (55 × 25) = 4375).
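The worked example can be reproduced with a short model of the hybrid mode. This is a sketch only: it assumes unsigned inputs, places the I section in the two low address bits (matching the table entries 0001 = Ci through 0011 = 3*Ci) and the Q section in the two high address bits, and ignores pipeline delays. The name hybrid_multiply is hypothetical.

```python
def hybrid_multiply(i_in, q_in, ci, cq, input_bits=16, section_bits=2):
    """Return (result, load_cycles) for i_in*ci + q_in*cq using one shared LUT."""
    mask = (1 << section_bits) - 1
    # Each address holds (I section * Ci) + (Q section * Cq).
    lut = [
        (addr & mask) * ci + (addr >> section_bits) * cq
        for addr in range(1 << (2 * section_bits))
    ]
    acc = 0
    cycles = 0
    for shift in range(0, input_bits, section_bits):  # LSB sections first
        i_sec = (i_in >> shift) & mask
        q_sec = (q_in >> shift) & mask
        addr = (q_sec << section_bits) | i_sec  # 4-bit address
        acc += lut[addr] << shift               # shift-accumulate by weight
        cycles += 1
    return acc, cycles


print(hybrid_multiply(300, 55, 10, 25))
```

With two bits taken from each 16-bit input per clock cycle, eight cycles are needed to load the data, which matches the eight load cycles described for Figure 18.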
[Figure 19 annotations:
Start of input data sequence indicated by pulse of sload_data on clock cycle 1
Input data held for 8 clock cycles
First partial product available on clock cycle 5
Final result available on clock cycle 13]
Table 17 shows the implementation results of the two 16-bit input, 15-bit
constant coefficient hybrid multiplication example shown in Figure 18.
You can download the files (hybrid_fixed.zip) for the design described
in Table 17 from the Design Examples section of the Altera web site at
www.altera.com.
You can download the files (hybrid_var.zip) for the design described in
Table 18 from the Design Examples section of the Altera web site at
www.altera.com.
$(a + b)^2 - (a - b)^2 = 4ab$
therefore:
$ab = \frac{(a + b)^2}{4} - \frac{(a - b)^2}{4}$
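Figure 20. 8-Bit Fully Variable Multiplier Implementation Using M4K RAM Blocks as LUTs
[Figure 20: Input A[7..0] and a second 8-bit input are combined to address
two LUTs, one holding ((a + b)^2)/4 and one holding ((a - b)^2)/4. The
two LUT results are combined to form Output[15..0].]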
Figure 20 shows an implementation for two 8-bit data inputs. The 8-bit
inputs result in 16-bit outputs and 9-bit addresses per partial product
RAM block. Therefore, each partial product requires two M4K RAM blocks
in a 256 × 16 configuration (2^9 = 512 addresses). In this multiplier
mode, the size of the inputs directly affects the total number of RAM
blocks required.
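The quarter-square scheme behind the fully variable multiplier can be verified with a brief model. This is a sketch, assuming unsigned 8-bit inputs, integer (floor) division when the LUT entries are computed, and a 9-bit two's complement address for the difference LUT; the names sum_lut, diff_lut, and fully_variable_multiply are hypothetical.

```python
def to_signed(addr, bits=9):
    """Interpret a bits-wide address as a two's complement value."""
    return addr - (1 << bits) if addr >> (bits - 1) else addr


# 512-entry LUTs, one per partial product, addressed by 9-bit values.
sum_lut = [(s * s) // 4 for s in range(512)]                # ((a + b)^2)/4
diff_lut = [(to_signed(d) ** 2) // 4 for d in range(512)]   # ((a - b)^2)/4


def fully_variable_multiply(a, b):
    """Multiply two unsigned 8-bit values with two LUT reads and one subtraction."""
    return sum_lut[(a + b) & 0x1FF] - diff_lut[(a - b) & 0x1FF]


print(fully_variable_multiply(200, 37))
print(max(sum_lut).bit_length())  # width needed for the largest LUT entry
```

Because a + b and a - b are always both even or both odd, the fractions discarded by the two floor divisions cancel, so the result is exact even though each LUT entry is rounded down.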
You can download the files (fully_var.zip) for the design described in
Table 19 from the Design Examples section of the Altera web site at
www.altera.com.
Firm Multipliers
Firm multipliers use a combination of DSP blocks and LEs, enabling you
to increase the utilization efficiency of the DSP blocks within your Stratix
or Stratix GX device. Stratix and Stratix GX DSP blocks support 9 × 9,
18 × 18, and 36 × 36 multipliers. If you implement a multiplier of a
different size, some DSP blocks may be only partially used. For example, a
12 × 9 multiplier uses two 9 × 9 DSP block multipliers because the 12-bit
input exceeds the maximum input width of a single 9 × 9 multiplier. The
first 9 × 9 multiplier is fully utilized, but the second 9 × 9 multiplier
is only partially used. Instead of using the partially utilized DSP block
multiplier for the remaining logic, you can use a firm multiplier to
implement it, freeing the DSP block for other use. This method is
particularly useful if your design requires many DSP blocks but has LE
resources available.
When deciding whether to select the 3-bit section from the MSB or the
LSB of the 12-bit input, keep in mind that an LE multiplier is more
resource efficient when implemented as a signed multiplier than as an
unsigned multiplier. If the 9-bit input is unsigned, the 3-bit section is
chosen from the MSB so that the LE multiplier performs signed
multiplication. If the 9-bit input is signed, you can choose the 3-bit section
from the MSB or LSB because either implementation results in a signed
multiplier implemented in LEs.
[Figure: decomposition of the 12 × 9 firm multiplier. Input A[11..0] is
multiplied by Input B[8..0]. The sign-extended Partial Product[17..0] and
the remaining partial product are accumulated to form Mult_Result[20..0].]
Based on this decomposition, you can build the circuit for the firm
multiplier using three main blocks:
■ DSP block multiplier
■ LE-based multiplier
■ End-stage adder
The DSP block multiplier multiplies the 9-bit input by the 9-bit LSB
section of the 12-bit input. The LE-based multiplier multiplies the 9-bit
input by the 3-bit MSB section of the 12-bit input. The results of both
multipliers are the partial products of the decomposition. The partial
products are weighted before being summed in the end-stage adder. This
weighting and addition restores the bit alignment of the partial products
to ensure proper result values. Based on Figure 22, the 9 × 3
multiplication partial product is weighted by a left shift of nine bits.
The 12-bit end-stage adder has to accommodate the 12-bit result of the
9 × 3 multiplication and the nine MSBs of the 9 × 9 multiplication,
sign extended.
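The 12 × 9 decomposition can be sketched as follows. The model assumes that both inputs are signed two's complement values, that the 9-bit LSB section of the 12-bit input is treated as unsigned, and that the 3-bit MSB section is treated as signed. The name firm_multiply_12x9 and the intermediate variable names are hypothetical.

```python
def firm_multiply_12x9(a, b):
    """Multiply signed 12-bit a by signed 9-bit b using two partial products."""
    a_lo = a & 0x1FF   # 9-bit LSB section of the 12-bit input, unsigned
    a_hi = a >> 9      # 3-bit MSB section, signed (arithmetic shift)
    dsp_mult = a_lo * b   # 9 x 9 partial product (DSP block multiplier)
    le_mult = a_hi * b    # 9 x 3 partial product (LE-based multiplier)
    return dsp_mult + (le_mult << 9)  # weight by nine bits, then add


print(firm_multiply_12x9(1234, -200))
print(firm_multiply_12x9(-2048, 255))
```

Taking the 3-bit section from the MSB end keeps the sign of the 12-bit input inside the small LE multiplier, which matches the guidance above on choosing the MSB section.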
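[Figure: 12 × 9 firm multiplier circuit. The unsigned DSP block multiplier
takes Input A[8..0] and Input B[8..0] and produces an 18-bit result, split
into DSP Mult[17..9] and DSP Mult[8..0], that contributes to
Output[20..0]. (1)]
[Figure: decomposition of the 12 × 12 firm multiplier. Input A[11..0] is
multiplied by Input B[11..0]. The partial products shown include Partial
Product[17..0] and the sign-extended Partial Product[20..9] (shifted
9 bits). The results from each multiply are accumulated to form
Mult_Result[23..0].]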
The circuit for the firm multiplier can now be extracted from the
decomposition. The firm multiplier circuit consists of five main blocks:
■ DSP block multiplier
■ Two LE-based multipliers
■ Two adders
The DSP block multiplier multiplies the two 9-bit LSB sections of the
12-bit inputs. The first LE-based multiplier multiplies the 9-bit LSB
section of one 12-bit input with the 3-bit MSB section of the other 12-bit
input. The other LE-based multiplier multiplies the 3-bit MSB of one 12-
bit input with the entire 12-bits of the other input. The results of these
three multipliers are the three partial products of the decomposition. The
results of these partial products are summed in two stages (using two
adders) prior to producing the final output.
Figure 26 shows the two adder stages within the final circuit of the 12 × 12
firm multiplier.
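The three partial products and two adder stages of the 12 × 12 firm multiplier can be modeled in the same way. This is a sketch under the assumption that both inputs are signed two's complement values and that the 9-bit LSB sections are unsigned; the function name firm_multiply_12x12 and the variable names are hypothetical.

```python
def firm_multiply_12x12(a, b):
    """Multiply two signed 12-bit values using one DSP and two LE multipliers."""
    a_lo, a_hi = a & 0x1FF, a >> 9   # 9 unsigned LSBs, 3 signed MSBs of A
    b_lo, b_hi = b & 0x1FF, b >> 9   # 9 unsigned LSBs, 3 signed MSBs of B
    dsp_mult = a_lo * b_lo   # DSP block: two 9-bit LSB sections
    le_mult1 = a_hi * b_lo   # LE: 3-bit MSBs of A by 9-bit LSBs of B
    le_mult2 = a * b_hi      # LE: all 12 bits of A by 3-bit MSBs of B
    stage1 = dsp_mult + (le_mult1 << 9)   # first adder stage
    return stage1 + (le_mult2 << 9)       # second adder stage


print(firm_multiply_12x12(-1000, 1999))
```

Only one 9 × 9 DSP block multiplier is used; the two smaller products that would otherwise occupy additional DSP block multipliers are built from LEs.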
[Figure 26: 12 × 12 firm multiplier circuit. (1)
DSP block multiplier (unsigned): Input A[8..0] and Input B[8..0] produce
an 18-bit result, split into P0[17..9] and P0[8..0].
LE multiplier: Input A[11..9] and Input B[8..0] produce LE Mult1[11..0],
shifted << 9 to form P1[20..9].
LE multiplier: Input A[11..0] and Input B[11..9] produce LE Mult2[14..0],
shifted << 9 to form P3[23..9].
Intermediate sums are labeled P2[20..9] and P4[23..9]. The final result
is Output[23..0].]
Conclusion
Although Stratix and Stratix GX DSP blocks are useful for implementing
DSP applications, you can also use Stratix and Stratix GX TriMatrix blocks
(M512 or M4K RAM blocks) or Cyclone M4K RAM blocks for designs that
need more multipliers than are available using DSP blocks alone. For
example, using soft multipliers, you can increase the number of 16 × 16
multipliers in a Stratix EP1S80 device by a factor of more than 7 (see
Table 9). As another example, the fully variable soft multiplier is an
ideal implementation for applications requiring smaller multipliers with
frequently varying coefficients. Other soft multiplier modes are more
resource efficient and better suited for applications that do not require
frequent coefficient updates. The firm multiplier allows you to balance
the use of DSP block multipliers with LE-based multipliers, allowing
more efficient use of the Stratix and Stratix GX DSP blocks.
Copyright © 2003 Altera Corporation. All rights reserved. Altera, The Programmable Solutions Company,
the stylized Altera logo, specific device designations, and all other words and logos that are identified as
trademarks and/or service marks are, unless noted otherwise, the trademarks and service marks of Altera
Corporation in the U.S. and other countries. All other product or service names are the property of their
respective holders. Altera products are protected under numerous U.S. and foreign patents and pending
applications, maskwork rights, and copyrights. Altera warrants performance of its semiconductor products
to current specifications in accordance with Altera's standard warranty, but reserves the right to make
changes to any products and services at any time without notice. Altera assumes no responsibility or
liability arising out of the application or use of any information, product, or service described
herein except as expressly agreed to in writing by Altera Corporation. Altera customers
are advised to obtain the latest version of device specifications before relying on any
published information and before placing orders for products or services.
101 Innovation Drive
San Jose, CA 95134
(408) 544-7000
www.altera.com
Applications Hotline:
(800) 800-EPLD
Literature Services:
[email protected]