Simplex Algorithm


An FPGA Implementation of the Simplex Algorithm

Samuel Bayliss #1, Christos-S. Bouganis #2, George A. Constantinides #3, Wayne Luk ∗4

# Department of Electrical and Electronic Engineering
Imperial College London
University of London
South Kensington campus
London SW7 2AZ, UK
1 [email protected]
2 [email protected]
3 [email protected]

∗ Department of Computing
Imperial College London
University of London
South Kensington campus
London SW7 2AZ, UK
4 [email protected]

Abstract— Linear programming is applied to a large variety of scientific computing applications and industrial optimization problems. The Simplex algorithm is widely used for solving linear programs due to its robustness and scalability properties. However, application of the current software implementations of the Simplex algorithm to real-life optimization problems is time consuming when used as the bounding engine within an integer linear programming framework. This work aims to accelerate the Simplex algorithm by proposing a novel parameterizable hardware implementation of the algorithm on an FPGA. Evaluation of the proposed design using real problems demonstrates a speed-up of up to 20 times over a highly optimized commercial software implementation running on a 3.4GHz Pentium 4 processor, which is itself 100 times faster than one of the main public domain solvers.

0-7803-9729-0/06/$20.00 © 2006 IEEE 49 FPT 2006

Authorized licensed use limited to: IEEE Xplore. Downloaded on February 17, 2009 at 10:54 from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

Linear programming is a scientific computing application which provides a general framework for describing optimization problems as a linear objective function and a set of linear constraints. A formulation for a maximization problem is shown in (1), where x is a vector with the variables, A is a matrix, and c and b are vectors of coefficients. The vector inequalities are interpreted as satisfied if and only if they are satisfied component-wise. A minimization problem can be formed by negating the objective function coefficients of the maximization problem.

\[
\begin{aligned}
\max\quad & c^{T} x \\
\text{subject to}\quad & A x \le b \\
& x \ge 0
\end{aligned}
\tag{1}
\]

Linear programs are characterized by the number of variables used to define the objective function and constraints, n, and the number of constraint equations used to define feasible solutions to the problem, m.

Linear programming is a useful tool for solving all manner of problems in which the objective function to be minimized or maximized is linear and the constraints can be modelled as a system of linear equations. The application of linear programming techniques is commonly associated with operational research problems. Real world applications of linear programming can be found in fields as varied as Aircraft and Crew Scheduling [1], Portfolio Optimization [2] and Staff Rostering [3]. The use of linear programming in graph optimization and set partition problems [4] makes it an important tool for use in many scientific computing applications. In the hardware synthesis field, several applications of linear programming and integer linear programming have been reported. In [5], the authors give a scheduling formulation for high level synthesis, while in [6] the authors demonstrate the use of ILP in optimal wordlength allocation in digital hardware.

The Simplex algorithm [7], [4] provides a robust tool for solving problems modelled using a linear programming framework. Almost all the commercial and research tools available for linear programming use some variant of the Simplex algorithm [MINOS, CPLEX]. In 1972, Klee and Minty [8] demonstrated pathological examples where, in the worst-case, the number of iterations of the Simplex algorithm required to find an optimal solution is exponential in the number of constraints. However, Borgwardt [9] derives a probabilistic model which shows that under certain assumptions, the expected number of iterations required varies linearly with the number of constraints. This makes the Simplex algorithm a good candidate for fast practical solutions to linear programming problems.

A class of problems of particular interest are integer linear programming problems. These add the constraint that all the variables in any feasible solution take integer values. Real-world problems often require these constraints, since the entities modelled by variables can be indivisible. Moreover, the introduction of integer variables allows logical constraints, such as dichotomy, to be modelled within a linear programming
framework. The branch-and-bound methods used to solve integer linear programs and mixed integer linear programs, which contain both integer-constrained and unconstrained variables, typically proceed by a sequence of many non-integer relaxation problems which together are used to derive the final solution. An LP relaxation is the derivation of an LP problem from an ILP by using the same objective function and set of constraints, and replacing the integer variables by continuous ones. The considerable time needed to solve these relaxation problems often makes integer linear programming an unattractive tool for large problems, and sub-optimal heuristic methods are often used in place of ILP even when a suitable problem formulation is available. A faster implementation of the Simplex algorithm which took advantage of opportunities for parallel computation within each iteration, and pipelined relaxation problems to increase throughput, would allow us to derive optimal solutions to many ILP problems previously considered too large to be tractable.

This paper outlines research into a stream-based FPGA hardware implementation of the Simplex algorithm designed to perform much faster than traditional load-store processor-based implementations. By exploiting parallelism inherent in the algorithm and eliminating overheads associated with external memory latencies, the proposed hardware architecture significantly out-performs conventional software-based Simplex implementations even at modest clock-speeds. Key contributions include:
• a study of the opportunities for parallelism presented within the Simplex algorithm,
• to our knowledge, the first FPGA implementation of the Simplex algorithm,
• an implementation of the Simplex algorithm, using a 2D block partitioning which scales to useful problem sizes (up to 751 constraints in 751 variables for a Virtex XC4VFX140 device using 18 bits precision). Results demonstrate an up to 20 times speed up over commercial packages.

II. SIMPLEX ALGORITHM

A. Primal Simplex Algorithm

If the unconstrained solution space is defined in n dimensions (each dimension assumed to be infinite), each inequality constraint in the linear programming formulation divides the solution space into two halves. The convex shape defined in n-dimensional space after m bisections represents the feasible area for the problem, and all points which lie inside this space are feasible solutions to the problem. Figure 1 shows the feasible region for a problem defined in two variables, n = 2, and three constraints, m = 3. Note that in linear programming, there is an implicit non-negativity constraint for the variables.

[Figure 1: plot of the feasible region for the problem
Maximise Z = 2x1 + x2
Subject to x1 ≤ 4
           x2 ≤ 4
           x1 + x2 ≤ 5]
Fig. 1. Feasible Region for problem with n = 2 and m = 3

The linearity of the objective function implies that the optimal solution cannot lie within the interior of the feasible region and must lie at the intersection of at least n constraint boundaries. These intersections are known as corner-point-feasible (CPF) solutions. In any linear programming problem with n decision variables, two CPF solutions are said to be adjacent if they share n − 1 common constraint boundaries.

When interpreted geometrically, the Simplex algorithm moves from one corner-point-feasible solution to a better corner-point-feasible solution along one of the constraint boundaries. There are only a finite number of CPF solutions, although this number is potentially exponential in n; however, it is not necessary to visit all of them to determine the optimal solution to the problem. The convex nature of linear programming means that there are no local maxima present in the problem which are not also global maxima. Hence if, at some CPF solution, no improvement is made by a move to another adjacent CPF then the algorithm terminates and we can be confident that the optimal solution has been found.

This geometric basis for the Simplex algorithm is expressed algebraically as a system of equations. The inequality constraints are converted to equality constraints by the introduction of slack variables. The result is a set of m equations in n + m variables giving us n degrees of freedom in exploring possible solutions. At each Simplex iteration, variables are designated either as basic or non-basic, and the n non-basic variables are set equal to zero. The solution to the resulting system of equations defines a basic solution to the problem. Moving from one basic solution to another involves switching one variable from basic to non-basic (and another from non-basic to basic) and adjusting the values of the basic variables to continue satisfying the system of equations (a pivoting operation).

The Simplex problem is typically presented in the form of a Simplex tableau. The reduced costs, the coefficients derived from pivoting operations on the objective function, are stored in row zero of the (m + 1) × (n + m + 1) tableau, and the right hand side of the constraint equations b is stored in the final column. Figure 2 shows pseudo code for the Simplex

algorithm. Fundamentally the steps taken in a single iteration of the Simplex algorithm are:

Select Entering Variable: Choose a pivot column j such that the reduced cost (row 0) is negative. If no column exists such that the reduced cost element is negative then the optimal solution has been found and the algorithm terminates.
Ratio Test: For each positive element in the column indexed by j calculate the ratio δi = bi /aij. The index i which minimizes the ratio identifies the pivot row. If aij ≤ 0 for all i = 1 . . . m, so that no ratio can be formed, then the problem is unbounded, i.e. there is no finite optimal solution.
Pivot: Divide all the elements in the pivot row with index i by a scalar such that the coefficient which lies in the pivot row and pivot column becomes one. Subtract multiples of that row from all the other rows such that all the other elements in the pivot column become zero.

SIMPLEX ALGORITHM(A, b, c)
    (N, B, A, b, c, v) ← INITIALIZE(A, b, c)
    # Optimality Test #
1   if cj ≥ 0 for all index j ∈ N
      then
        # current solution is optimal #
2       return solution
      else
        # Pricing Test #
3       Select an index k ∈ N for which ck < 0
4       Find the index i ∈ B that has the minimum bi /aik and aik > 0
5       if such index exists
          then
            # Pivot Step #
6           (N, B, A, b, c, v) ← PIVOT(N, B, A, b, c, v, k, i)
          else
7           return "unbounded"
8   goto step 1
Fig. 2. Pseudo-Code for Simplex Algorithm

Several opportunities for parallelism can be exploited within each Simplex iteration.
• The pivoting operation used to transform the Simplex tableau on each iteration is performed by subtracting multiples of the pivot row from every other row in the tableau. This operation is typically expensive to perform sequentially on conventional computer hardware. However within a hardware implementation, each of these array operations can be performed in parallel.
• The selection of an entering variable can be performed in parallel using a tree of comparators.
• The ratio test used to select a pivot row can also be performed in parallel.

Alongside these opportunities for intra-iteration parallelism, our implementation allows the streaming of several problems in a pipelined fashion through the hardware architecture. This inter-iteration parallelism adds a further performance edge over conventional sequential implementations of the Simplex algorithm.

III. DESIGN ARCHITECTURE

While [7] shows the expected number of iterations of the Simplex algorithm for certain problems varies linearly with the number of problem constraints, the exponential worst-case iteration count means it is infeasible to consider unrolling the algorithm to solve complete problems in a linear pipeline. Thus a circular pipeline structure is adopted, with each problem iteration feeding back in a circular fashion until an optimal solution is found.

[Figure 3: block diagram. Labelled blocks: Serial Shift In, Serial Shift Out, Interface block, Pricing block, Ratio test, Pivot control, FIFO Buffers, Column latch, Pivot block, Dividers block, Circular Buffer]
Fig. 3. Block diagram of Simplex implementation

The Simplex algorithm is an iterative algorithm since iteration n + 1 is unable to begin execution before iteration n has completed. The inevitable latency of the design means hardware is left idle when working on a single problem. Pipelining several different problems through the hardware allows us to hide the design latency and achieve high throughput. In solving integer linear programming problems, we typically have to solve many different Simplex relaxation problems. The pipelined implementation presented here allows several different relaxations to be processed simultaneously at different stages within the hardware. Figure 3 shows a block diagram for the proposed pipelined architecture. The design uses a 2D block partitioning scheme, breaking the Simplex tableau into small regularly sized blocks, which enter the problem pipeline sequentially. Figure 4 shows the 2D block-partitioning of a Simplex tableau and indicates the position of the reduced costs row within the first p blocks to enter the system, and the right hand column of the linear programming formulation within the corresponding blocks.
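As a software analogy for the circular pipeline described above, the following Python sketch keeps several tableaux in a queue and gives each one a single Simplex iteration (entering variable, ratio test, pivot) per trip round the loop, retiring a problem once it is optimal or unbounded. It is only an illustration under stated assumptions: the names `make_tableau`, `simplex_step` and `solve_round_robin` are hypothetical, the most negative reduced cost is used as one valid choice of entering variable, b is assumed non-negative so that the slack variables give an initial basis, and no anti-cycling rule is included. It does not model the hardware timing or the authors' implementation.

```python
from collections import deque


def make_tableau(A, b, c):
    """Build the (m+1) x (n+m+1) tableau for: max c.x  s.t.  A x <= b, x >= 0.

    Row 0 holds the reduced costs (initially -c); the last column holds b.
    Assumption: b >= 0, so the slack variables form an initial basis.
    """
    m = len(A)
    top = [-float(cj) for cj in c] + [0.0] * (m + 1)
    rows = [[float(a) for a in A[i]]
            + [1.0 if j == i else 0.0 for j in range(m)]
            + [float(b[i])]
            for i in range(m)]
    return [top] + rows


def simplex_step(T, tol=1e-9):
    """Run one Simplex iteration in place; return 'optimal', 'unbounded' or 'pivoted'."""
    costs = T[0][:-1]
    # Select entering variable: most negative reduced cost in row 0.
    k = min(range(len(costs)), key=lambda j: costs[j])
    if costs[k] >= -tol:
        return "optimal"
    # Ratio test over the positive elements of the entering column.
    candidates = [i for i in range(1, len(T)) if T[i][k] > tol]
    if not candidates:
        return "unbounded"
    i = min(candidates, key=lambda r: T[r][-1] / T[r][k])
    # Pivot: scale the pivot row, then clear the pivot column elsewhere.
    pivot = T[i][k]
    T[i] = [v / pivot for v in T[i]]
    for r in range(len(T)):
        if r != i:
            f = T[r][k]
            T[r] = [x - f * y for x, y in zip(T[r], T[i])]
    return "pivoted"


def solve_round_robin(problems):
    """Give every queued tableau one iteration per trip round the loop."""
    loop = deque(problems.items())
    finished = {}
    while loop:
        name, T = loop.popleft()
        status = simplex_step(T)
        if status == "pivoted":
            loop.append((name, T))      # feed back for another iteration
        else:
            finished[name] = (status, T[0][-1])   # retire the problem
    return finished


if __name__ == "__main__":
    # The Fig. 1 problem, plus a hypothetical unbounded problem.
    fig1 = make_tableau([[1, 0], [0, 1], [1, 1]], [4, 4, 5], [2, 1])
    open_ended = make_tableau([[-1, 1]], [1], [1, 0])
    print(solve_round_robin({"fig1": fig1, "open_ended": open_ended}))
```

For the Fig. 1 problem the loop needs two pivots and then reports the optimum Z = 9 (at x1 = 4, x2 = 1), while the second problem leaves the queue on its first visit because its entering column has no positive element.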

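The 2D block partitioning can be pictured with a short Python sketch that cuts a tableau into fixed-size blocks and yields them in row-major order, starting from the top left-most block. The function name `partition_blocks` is hypothetical; `s` and `r` are taken here to be the block height and width, giving `p` blocks per tableau row and `q` blocks per tableau column. Zero-padding of ragged edge blocks is an assumption made for this sketch, since the text does not say how tableaux that are not a multiple of the block size are handled.

```python
def partition_blocks(tableau, s, r, pad=0.0):
    """Yield (block_row, block_col, block) in row-major order.

    Each block has s rows and r columns. Edge blocks are filled with `pad`
    (an assumption of this sketch, not something stated in the paper).
    """
    rows, cols = len(tableau), len(tableau[0])
    q = -(-rows // s)   # number of blocks in each tableau column (ceiling)
    p = -(-cols // r)   # number of blocks in each tableau row (ceiling)
    for bi in range(q):
        for bj in range(p):
            block = [[tableau[i][j] if i < rows and j < cols else pad
                      for j in range(bj * r, (bj + 1) * r)]
                     for i in range(bi * s, (bi + 1) * s)]
            yield bi, bj, block


if __name__ == "__main__":
    # Initial tableau of the Fig. 1 problem: row 0 holds the reduced costs,
    # the last column holds the right hand side.
    T = [[-2, -1, 0, 0, 0, 0],
         [1, 0, 1, 0, 0, 4],
         [0, 1, 0, 1, 0, 4],
         [1, 1, 0, 0, 1, 5]]
    for bi, bj, block in partition_blocks(T, s=2, r=3):
        print(bi, bj, block)
```

With this ordering the first p blocks produced contain row 0, so the reduced costs are seen before any other data, and the last block of each block row contains the right hand side column.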
Fig. 4. 2D block partitioning of Simplex tableau where s is the vertical dimension of each block, r is the horizontal dimension of each block, p is the number of blocks in each tableau row and q is the number of blocks in each tableau column.

The 2D block partitioning scheme allows a great deal of flexibility in trading off FPGA area and overall performance. Larger block sizes mean more computation can be performed in parallel and therefore reduce the time taken to perform each Simplex iteration. The aspect ratio of the blocks in the design is unconstrained and can therefore be parameterized to match the aspect ratio of a target problem. The 2D partitioning scheme also allows different problems with varying sizes, although sharing a fixed block size, to be streamed through the hardware simultaneously. The ability to interleave problems of different sizes allows many different problems to be solved simultaneously. That has a major impact in the solution of ILP problems where the different relaxations to the ILP have different problem dimensions. In addition, this is a desirable feature in embedded optimization algorithms where decisions required for adaptive behavior have to meet hard deadlines.

A. Interface block

Blocks enter the design in a row-major fashion beginning from the top left-most block of the Simplex tableau. As data leaves the pipeline, data is reintroduced to the pricing block through the interfacing block. This block allows the extraction of optimal problems from the pipeline and the introduction of new problems without disruption to other problems in the pipeline. When the pricing module finds no further negative reduced cost coefficients, the problem is flagged as optimal and passes through the hardware without pivoting. The bandwidth of blocks flowing around the iteration loop, even using a modest 3 × 3 block partitioning, far exceeds the bandwidth of regular off-chip interfaces. Therefore multiple iteration cycles through the hardware are necessary to read-out the optimal results and swap-in the next problem. The interfacing block contains a parallel-in serial-out shift register and logic to support the scheduling of new problems, stalling the pipeline if new problems must be entered.

B. Pricing block

A range of different pricing strategies is discussed in the literature for selecting an entering variable for the basis function. Our implementation uses the steepest edge criterion first suggested by Dantzig [7]. This selects the most negative coefficient in the reduced costs row to enter the basis. The first row of blocks to enter the system contains the reduced cost coefficients used to select the entering variable column. A binary tree structure of comparators is used to select the most negative coefficient from each block. This coefficient is compared to the most negative coefficient from the preceding blocks in the current problem and if found to be more negative, the value is latched for comparison with subsequent blocks. The fully pipelined pricing block generates horizontal and vertical sync control signals, selects the appropriate entering column, and latches the entering column and right hand side column from each row of blocks which flow through it. These column vectors are passed to the ratio test block.

C. Ratio test

A number of different strategies can be considered for finding the elements with the smallest ratio from two vectors of numbers. Clearly a division operation can be used to find the ratio of each pair of numbers, with the resulting scalar numbers compared using a tree of comparators. The results published in this paper refer to an implementation using cross-multiplication of the candidate vectors using a sequential multiplier implemented in LUTs within the FPGA.

D. FIFO Buffer Implementation

The full Simplex tableau must be read into the design before the algorithm selects the appropriate pivot row. Hence the pivot operation cannot begin until all the blocks in a given problem have been read into the system. The block elements are stored in FIFO buffers implemented using dual-port Block-RAM. These embedded memories are driven by a two times faster clock derived from an embedded delay-locked-loop. Alternate cycles are used for sequential access to the data (i.e. FIFO mode), using one memory port to read and another to write, and random access to elements held within the FIFO buffer using both memory ports as two independent random access channels. This allows pivot row elements and entering column elements to be pre-loaded from memory whilst data is streamed out through the pivoting block, ensuring maximum throughput in the design. The dual-port memory blocks in a modern FPGA claim operation at up to 550MHz and so few implementation problems were encountered in running the memories at double clock speed within our design.

The loading of data from the FIFOs into the circular buffer and the column-latch block used for pivoting is coordinated

by the pivot-control block. This uses an address offset from the beginning of the problem data to load the entering column data and an address offset from the end of the problem data to load pivot row data into the dividers used for pivot operation. The pivot control block also keeps track of parameters for problems within the pipeline such as problem ID, horizontal and vertical block counts, optimality, degeneracy and overflow flags.

E. Circular Buffer and Pivot Blocks

The pivoting operation generates a new basic feasible solution by elementary row operations. The pivot value refers to the number which lies in both the entering column and pivot row. The chosen pivot row is divided by a scalar value to force the pivot value to one. Multiples of this row are subtracted from each of the other rows in the matrix to force all the other elements in the pivot column to zero.

In the hardware implementation, elements from the pivot row are loaded into a fully pipelined divider from the FIFO buffer using one of the two random access channels. Results from the divider are stored in a circular buffer implemented in distributed RAM.

At the beginning of each row of blocks, the appropriate elements from the pivot column are loaded from the FIFO buffers using the second random access channel and latched. The hardware is designed such that as each block block_{i,j} is loaded sequentially from the FIFO, the appropriate portion of the divided row i is retrieved from the head of the circular buffer and multiplied by the value held in the latched column. The data are multiplied together and subtracted from all the elements in the tableau except the elements belonging to the pivot row.

A block of data is pivoted using a chain of single processing elements. Each processing element contains a fully pipelined multiplier (embedded multipliers are used), two shift registers to match the delay through the divider, and a subtractor, which is implemented using LUT fast-carry chains.

IV. SCALABILITY AND NUMERICAL STABILITY

The size of problems solvable using the demonstrated Simplex implementation is bounded by the FIFO buffer-size. The implementation uses a single Xilinx Block-RAM per pivoting-element. With an 18 bit wordlength, each FIFO is capable of buffering up to 1024 entries. For the 4 × 4 block size, this places an upper limit on the problem size, assuming problems have a 1:1 aspect ratio of variables to constraints, of 751×751. This limit constrains the size of the LP problems that can be addressed by the proposed system. However, this limit of the maximum problem size is large enough to fit real-life LP problems that are produced by relaxations of ILP problems. Using more than one Block-RAM per pivoting-element would allow scaling to larger problems.

A study of the numerical stability of the Simplex algorithm in its various forms can be found in [10]. Accumulation of rounding error, both in hardware and in software implementations, is a major cause of instability. This essentially results from the pivoting operation, a problem well understood in numerical analysis in the context of Gaussian Elimination [11]. The proposed implementation is fully parameterizable by wordlength, so that numerical accuracy can be guaranteed; results have been collected for both 12-bit and 18-bit datapaths, although the wordlength required for convergence to the optimal CPF solution will be problem dependent in general.

V. EVALUATION

A. Synthesis Results

After verification of the behavior of the design using synthetic problems, the design was synthesized using Xilinx XST 7.1 and implemented in a Virtex 4VFX140 device. Table I shows the area and clock-speed achieved varying the block size and word-length used in the design. A 4 × 4 block size refers to the block partitioning size, where larger blocks imply more parallelism and hence reduced problem latency and increased throughput. The proposed design is pipelined at the block level, able to process one r × r block per clock cycle. These designs consume between 11% and 84% of the 4VFX140.

TABLE I
SYNTHESIS RESULTS USING DIFFERENT DESIGN PARAMETERS

Block size   Word-length   Slices   RAMs   MULs   Freq.
4 × 4        18 bit        10,067   16     16     117MHz
4 × 4        12 bit        6,139    16     16     130MHz
8 × 8        18 bit        27,411   64     64     108MHz
12 × 12      18 bit        48,036   144    144    105MHz

Using the results in Table I we can interpolate/extrapolate the requirements in area given a block size r × r and the wordlength w of the system. The main area in a block is allocated to the dividers that are needed for pivoting, which is a quadratic function of the wordlength used. Thus, the area of the design in slices, A_slices, can be approximated using (2).

\[
A_{\text{slices}} = c_1 r w^{2} + c_2 r^{2} w
\tag{2}
\]

Using the data from Table I and linear regression, we obtain the values of the coefficients: c1 = 6.73 and c2 = 8.02. Figure 5 shows the predicted area using (2) for different values of the wordlength and the block size. The current synthesized designs are also plotted in the graph.

B. Benchmark Performance

Table II shows five benchmarks selected from the netlib library [12] to test the performance of the design. The selected benchmarks were timed running on a 3.4GHz Pentium 4 with 1GB of RAM, using a publicly available LP solver, Lp-Solve, and a commercial software package, CPLEX. The time taken to load the problems into memory and initialize the basic solution was stripped from the times that are reported. Table III shows the results obtained from the two software packages alongside the time taken using our smallest hardware implementation, which is a 4 × 4 block partitioning running at 117MHz. It should be noted that the Block-RAM is clocked at double the clock frequency. The results demonstrate that a considerable speed-up is achieved using the proposed

architecture even when its performance is compared against a commercial program like CPLEX. The median speedup over the CPLEX software package is 8.9 times.

TABLE II
A SELECTION OF BENCHMARK PROBLEMS SELECTED FROM THE NETLIB ONLINE REPOSITORY [12].

Benchmark   Constraints (m)   Variables (n)
Adlittle    57                97
Afiro       28                32
Blend       75                83
Recipe      92                180
Share2b     97                79

TABLE III
COMPARISON OF PER-ITERATION PERFORMANCE OF SOFTWARE AND HARDWARE IMPLEMENTATIONS OF SIMPLEX (HARDWARE IMPLEMENTATION IS 4 × 4 BLOCK PARTITIONED DESIGN AT 117MHZ)

            Iteration time                        Speed-up of hardware over
Benchmark   Lp-Solve   CPLEX       Hardware      Lp-Solve   (CPLEX)
Adlittle    3.43ms     0.06333ms   0.0030ms      1143       (21.1)
Afiro       6.35ms     0.07571ms   0.0070ms      907        (10.8)
Blend       3.33ms     0.03500ms   0.0035ms      951        (10)
Recipe      6.06ms     0.06333ms   0.0096ms      631        (6.6)
Share2b     3.72ms     0.03082ms   0.0044ms      845        (7)

[Figure 5: surface plot of the number of slices (×10^4) against block size and wordlength]
Fig. 5. Prediction of the required area for different values of the design’s wordlength and block size. The actual synthesized designs from Table I are also superimposed.

Although the software time shown in Table III can be measured in microseconds, it should be remembered that this is the time for a single Simplex iteration. The number of such iterations when solving an ILP is typically exponential in the problem size. For example, the online repository MIPLIB [[13], arki001] cites an example problem with 1048 constraints and 1388 variables taking one month of CPU time and involving 100 million complete relaxations. Problems that would fit within our existing FPGA design are also reported as taking 14 hours [[13], noswot]. Thus a 10x speedup in solution for the inner-loop of such procedures is a critically important factor in solving large scale integer linear programming problems.

It should be noted that the degree of speedup over software will be a function of the sparsity of the A matrix in (1), as CPLEX includes sophisticated procedures to take advantage of patterns of sparseness. However, despite this, the proposed architecture achieves up to 20x speedup on these real-world problems, which are not fully dense in nature; even greater speedups are likely on dense problems.

VI. CONCLUSIONS

This paper presents a novel, scalable, and parameterizable architecture for FPGA implementation of the Simplex algorithm. The scalability of the proposed architecture makes it applicable to real-life problems, especially when a pipeline can be filled with many relaxations of an initial large integer linear program. A Xilinx Virtex 4 implementation of the proposed architecture has been achieved, with a datapath clock-rate of 105 to 117MHz and a double-rate memory subsystem, running at 210 to 234MHz. The partitioning of the problem can be varied at design time to match the required device size, and an area model is presented allowing this to be done: the throughput of an r × r block partition increases quadratically with r, as does the area requirement. The Virtex 4 implementation demonstrates that speedups of up to 20 times (median 8.9 times) are achievable over a state-of-the-art commercial solver running on a 3.4GHz PC with 1GB of RAM, and 100 times more compared to a commonly-used public domain solver.

Future work is likely to involve the integration of this design into a larger framework for hardware-based branch-and-bound solution of integer linear programming problems. Moreover, specialisation of the design to sparsity patterns known at design-time appears to be a promising direction for future speed-up on particular classes of problem; robust software solvers often contain such a “toolbox” of special case approaches.

ACKNOWLEDGMENT

The authors wish to acknowledge the financial support of the EPSRC under the platform grant EP/C549481/1 and the UK Research Council (Basic Technology Research Programme “Reverse Engineering Human Visual Processes” GR/R87642/02).

REFERENCES

[1] C. Martin, D. Jones, and P. Keskinocak, “Optimizing on-demand aircraft schedules for fractional aircraft operators,” Interfaces, vol. 33, no. 5, 2003.
[2] E. I. Ronn, “A new linear programming approach to bond portfolio management,” Journal of Financial and Quantitative Analysis, vol. 22, no. 4, 1987.
[3] B. Gendron, “Scheduling employees in Quebec’s liquor stores with integer programming,” Interfaces, vol. 35, 2005.
[4] F. S. Hillier and G. J. Lieberman, Introduction to Mathematical Programming, 2nd ed. McGraw-Hill Inc, 1995.
[5] L.-Y. Wang and Y.-T. Lai, “Graph-theory-based simplex algorithm for VLSI layout spacing problems with multiple variable constraints,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 20, no. 8, pp. 967–979, 2001.
[6] G. A. Constantinides, P. Y. K. Cheung, and W. Luk, “Optimum and heuristic synthesis of multiple word-length architectures,” IEEE Trans. Very Large Scale Integr. Syst., vol. 13, no. 1, pp. 39–57, 2005.

[7] A. Schrijver, Theory of Linear and Integer Programming. Wiley and Sons, 1972.
[8] V. Klee and G. J. Minty, Inequalities, III. Academic Press, 1972, ch. How good is the simplex algorithm?, pp. 159–175.
[9] K. H. Borgwardt, The Simplex Method, A Probabilistic Analysis. Springer-Verlag, 1987.
[10] S. S. Morgan, “A comparison of simplex method algorithms,” Master’s thesis, University of Florida, 1997.
[11] N. J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd ed. SIAM, 2002.
[12] “http://www.netlib.org/.”
[13] T. Achterberg, T. Koch, and A. Martin, “The mixed integer programming library: Miplib 2003,” http://miplib.zib.de/, 2003.
