Wolfe's Algorithm
Philip WOLFE
IBM Thomas J. Watson Research Center, Yorktown Heights, N.Y., U.S.A.
An algorithm is developed for the problem of finding the point of smallest Euclidean
norm in the convex hull of a given finite point set in a Euclidean space, with particular
attention paid to the description of the procedure in geometric terms.
Dedication
1. Introduction
is the convex hull of Q. [e is the column vector (1, 1, ..., 1)^T. The number
of components in vectors like e and w is to be inferred from the context:
here, k.] The set Q is affinely independent if q ∈ A(Q \ {q}) is false for all
q ∈ Q. Finally, for any X ∈ E^n,

    H(X) = { y ∈ E^n : X^T y = X^T X }

will denote the (n - 1)-dimensional affine set passing through the point
X and normal to the line through X and the origin.
P. Wolfe, Algorithm for a least-distance programming problem 193
Algorithm
Step 0. Find a point of P of minimal norm. Let X be that point and
Q = {X}.
Step 1. If X = 0 or H(X) separates P from the origin, stop. Other-
wise, choose P_j ∈ P on the near side of H(X) and replace Q by Q ∪ {P_j}.
Step 2. Let Y be the point of smallest norm in A(Q). If Y is in the
relative interior of C(Q), replace X by Y and return to step 1. Otherwise
Step 3. Let Z be the nearest point to Y on the line segment C(Q) ∩ XY
(thus a boundary point of C(Q)). Delete from Q one of the points not
on the face of C(Q) in which Z lies, and replace X by Z. Go to step 2.
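The four steps above can be sketched directly in NumPy. The sketch below is ours, not the tableau implementation developed later in the paper: step 2 is carried out by solving the linear system for the point of smallest norm in an affine hull, and step 3 by the ratio test on the barycentric coordinates. All function names are our own.

```python
import numpy as np

def affine_minimizer(Q):
    """Point of smallest norm in the affine hull A(Q) of the columns of Q,
    found by solving  e^T u = 1,  e mu + Q^T Q u = 0  for (mu, u)."""
    k = Q.shape[1]
    A = np.zeros((k + 1, k + 1))
    A[0, 1:] = 1.0            # e^T u = 1
    A[1:, 0] = 1.0            # the e mu column
    A[1:, 1:] = Q.T @ Q
    b = np.zeros(k + 1)
    b[0] = 1.0
    u = np.linalg.solve(A, b)[1:]
    return u, Q @ u

def nearest_point(P, tol=1e-12):
    """Nearest point to the origin in C(P); the columns of P are the points."""
    # Step 0: the point of P of minimal norm.
    idx = [int(np.argmin((P * P).sum(axis=0)))]
    w = np.array([1.0])
    X = P[:, idx[0]].astype(float)
    while True:
        # Step 1: stop if X = 0 or if H(X) separates P from the origin.
        j = int(np.argmin(P.T @ X))
        if X @ X < tol or P[:, j] @ X > X @ X - tol:
            return X, idx, w
        idx.append(j)                      # P_j lies on the near side of H(X)
        w = np.append(w, 0.0)
        while True:
            # Step 2: Y = point of smallest norm in A(Q).
            u, Y = affine_minimizer(P[:, idx])
            if np.all(u > tol):            # Y in the relative interior of C(Q)
                w, X = u, Y
                break
            # Step 3: back up to Z on the boundary of C(Q), then drop a
            # point of Q not on the face of C(Q) containing Z.
            mask = w - u > tol
            theta = np.min(w[mask] / (w[mask] - u[mask]))
            w = w + theta * (u - w)
            drop = int(np.argmin(np.where(mask, w, np.inf)))
            idx.pop(drop)
            w = np.delete(w, drop)
```

Run on the example of Fig. 1 (with the third point as recovered in Section 6), this sketch reproduces the answer T of the example.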
by the replacement in step 2 (the major cycle has no minor cycle) the
value of |X| is reduced, since the segment XY intersects the interior of
C(Q ∪ {P_j}), and |X| strictly decreases along that segment. For the
same reason, the first minor cycle, if any, of a major cycle also reduces
|X|, and subsequent minor cycles cannot increase it. Thus |X| is
reduced in each major cycle. Since X is uniquely determined by the
corral on hand at step 1, no corral can enter the algorithm more than
once. Since there is but a finite number of corrals, the algorithm must
terminate, and it can only do so when the problem is solved.
The reader familiar with the Simplex Method for linear programming
will notice its close relationship to our algorithm, particularly in
Dantzig's description [7, Section 7-3] emphasizing the role of simplices
in visualizing his method.
Fig. 1. The example points: P1 = (0, 2), P2 = (3, 0).
Example (see Fig. 1). R = (0.923, 1.385) is the nearest point to O on P1P2.
S = (0.353, 0.529) is the intersection of OR and P2P3. T = (0.115, 0.577)
is the answer.
Here are the steps taken, and the results:
Step    X      Q              Y
0       P1     P1
1       P1     P1, P2
2       R      P1, P2         R
1       R      P1, P2, P3
2       R      P1, P2, P3     O
3       S      P2, P3
2       T      P2, P3         T
1       Stop
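The entry R of the trace can be spot-checked from the coordinates given in Fig. 1 (a check of ours, not part of the algorithm):

```python
import numpy as np

# Nearest point to the origin O on the segment P1P2 of Fig. 1.
P1 = np.array([0.0, 2.0])
P2 = np.array([3.0, 0.0])
d = P2 - P1
t = np.clip(-(P1 @ d) / (d @ d), 0.0, 1.0)   # minimize |P1 + t d|^2 over [0, 1]
R = P1 + t * d                               # (12/13, 18/13) ~ (0.923, 1.385)
```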
be nonsingular. (To prove the latter, suppose there were μ, u so that
e^T u = 0, e μ + Q^T Q u = 0. Then 0 = u^T(e μ + Q^T Q u) = |Qu|², so
Qu = 0, whence Q is affinely dependent.) The sets A(Q) and C(Q) then
have dimension k - 1, and C(Q) is a nondegenerate simplex whose
vertices are the points of Q, while all the faces of dimension p of that
simplex are the convex hulls of all the subsets of p + 1 points of Q. Also,
the smallest face of C(Q) containing a given point X in its relative
interior is uniquely defined: its vertices are those points for which the
barycentric coordinate w_j of X = Qw in Q is positive.
When Q is affinely independent, we can find the projection X of any
point Y on A(Q), that is, solve the problem

    minimize |Y - X|² = |Y - Qu|²,
    subject to e^T u = 1,

whose solution is given by the linear system

    e^T u = 1,
    e μ + Q^T Q u = Q^T Y.                                        (3.2)
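In code, (3.2) is a single dense linear solve. The sketch below is ours (NumPy; the matrix Q holds the points of Q as columns) and returns both the projection X = Qu and the multipliers u:

```python
import numpy as np

def project_onto_affine_hull(Q, Y):
    """Projection of Y onto A(Q) via the system (3.2):
       e^T u = 1,  e mu + Q^T Q u = Q^T Y.
    Q is n-by-k; the system is nonsingular when the columns of Q
    are affinely independent."""
    k = Q.shape[1]
    A = np.zeros((k + 1, k + 1))
    A[0, 1:] = 1.0                 # e^T u = 1
    A[1:, 0] = 1.0                 # the e mu column
    A[1:, 1:] = Q.T @ Q
    b = np.concatenate(([1.0], Q.T @ Y))
    u = np.linalg.solve(A, b)[1:]  # discard mu, keep u
    return Q @ u, u
```

With Y the origin and Q the pair P1, P2 of Fig. 1, this returns the point R = (0.923, 1.385) of the example.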
    minimize w^T P^T P w,
    subject to w ≥ 0,  e^T w = 1,                                 (4.1)

    v^T w = 0,
    P^T P w + e λ = v,                                            (4.2)
    e^T w = 1.
These conditions are just the necessary conditions of Kuhn and Tucker
[11] applied to (4.1), and are, of course, also sufficient for this problem.
They have a simple geometric interpretation: noting that

    λ = w^T e λ = w^T v - w^T P^T P w = -X^T X,
    v = P^T X - e X^T X = (P - X e^T)^T X,                        (4.3)
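The interpretation is easy to confirm numerically. The check below is ours; it uses the example of Fig. 1, with the third point and the optimal weights as they can be read off from the tableaux of Section 6, and verifies (4.2) and (4.3):

```python
import numpy as np

# Example data: columns of P are P1, P2, P3; w is the optimal weight
# vector (P3 and w as recovered from the tableaux of Section 6).
P = np.array([[0.0, 3.0, -2.0], [2.0, 0.0, 1.0]])
w = np.array([0.0, 11.0, 15.0]) / 26.0
e = np.ones(3)

X = P @ w                        # the nearest point, X = P w
lam = -X @ X                     # lambda = -X^T X, by (4.3)
v = P.T @ X - e * (X @ X)        # v = (P - X e^T)^T X, by (4.3)

assert np.allclose(P.T @ P @ w + e * lam, v)        # the equation of (4.2)
assert np.all(v >= -1e-12) and abs(v @ w) < 1e-12   # feasibility, v^T w = 0
```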
           λ        w1        w2
    |  0        e1^T      e2^T   |   = 1
    |  e1       Q^T Q     Q^T R  |   = v1                         (4.4)
    |  e2       R^T Q     R^T R  |   = v2
Here M denotes the leading block of (4.4),

    M = |  0     e1^T  |
        |  e1    Q^T Q |.
The system (4.4) has the simple complementary solution found by
setting w2 = 0, v1 = 0, and solving for w1 and v2; it is exhibited ex-
plicitly by performing a block pivot on M, yielding the tableau (4.5)
below, where S = R^T R - [e2, R^T Q] M^-1 [e2, R^T Q]^T.

              -1, -v1                   w2
    |  M^-1                   M^-1 [e2, R^T Q]^T  |
    | -[e2, R^T Q] M^-1       S                   |   v2          (4.5)
The system (4.5) is almost, but not quite, that of the "fundamental
problem" of complementary pivot theory [6, 12], but it is easy to see
how to bring it to that form by discarding the first row of (4.5), separating
out the first column, and changing signs appropriately to obtain a
system of the standard form in the variables w2 and v2.
It will be more convenient for us to stay with the equivalent form (4.5).
Let us number the rows and columns of the tableau (4.5) 0, 1, 2, ..., m,
and denote entry i, j of the tableau by T(i, j). The (variable) set of indices
I_Q = {i1, ..., ik} ⊆ {1, 2, ..., m} will designate an affinely independent
subset Q = {P_i1, ..., P_ik} of P. With I_Q is associated the tableau of the
form (4.5) (except for irrelevant simultaneous permutation of rows and
columns) obtained from the starting tableau (4.4) by pivoting on the
block whose row and column indices are {0, i1, ..., ik}. The first row and
column are special; once having been pivoted in, they are not used
again for pivot choices.
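The pivot operation itself can be stated compactly. The routine below is our reconstruction; its sign convention (pivot row divided by the pivot, pivot column divided by the pivot and negated, pivot entry replaced by its reciprocal) was chosen to reproduce the tableaux printed in Section 6, and with that convention a pivot is its own inverse:

```python
import numpy as np

def pivot(T, p):
    """One principal pivot of tableau T on entry (p, p); returns a new array.
    Pivoting twice on the same (p, p) restores the original tableau."""
    a = T[p, p]
    B = T - np.outer(T[:, p], T[p, :]) / a   # T(i,j) - T(i,p) T(p,j) / a
    B[p, :] = T[p, :] / a                    # pivot row
    B[:, p] = -T[:, p] / a                   # pivot column
    B[p, p] = 1.0 / a                        # pivot entry
    return B
```

For the data of Section 6, `pivot(pivot(T0, 1), 0)` is the tableau listed there at step 0c.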
Each entry of the tableau (4.5) has a geometric meaning for our
problem. We are particularly interested in the left-hand column, which
gives the values of the basic variables that determine the choice of
pivots, and in the diagonal on which we do all our pivoting. What we
need is covered in the five propositions below.
I_R will denote the complement of the index set I_Q in the set
{1, 2, ..., m}.
Proposition 4. For i ∈ I_R, T(i, i) is the square of the distance of P_i from A(Q).

Proof. Let r = P_i ∈ R. The column M^-1 [1, r^T Q]^T from the upper right-
hand block of (4.5) is the solution (μ, u^T) of the equations (3.2) for Y = r,
so that Qu is the nearest point to r on A(Q). Now (3.2) gives μ = u^T e μ =
u^T(Q^T r - Q^T Q u) = r^T Q u - |Qu|², so that

    T(i, i) = r^T r - μ - r^T Q u = r^T r - 2 r^T Q u + |Qu|² = |r - Qu|².
Proof. Consider the tableau T of the form (4.5) formed for the affinely
independent set Q \ {P_i}. By Proposition 4, T(i, i) is the square of the
distance of P_i from A(Q \ {P_i}), and that is not zero. The tableau T̄ is
obtained from T by pivoting on (i, i), so that T̄(i, i) = 1/T(i, i).
Algorithm
Step 0. (a) Form the (m + 1)-by-(m + 1) tableau

    T0 = |  0    e^T   |
         |  e    P^T P |

(b) Let i = i0 minimize T0(i, i) for i > 0. Set I_Q = {i0} and let w be
the vector of length m for which w_i = δ_{i,i0}.
(c) Pivot in T0 on (i0, i0) and then on (0, 0); call the resulting tableau T.
Step 1. (Begins with an index set I_Q, an m-vector w, and a tableau T;
I_R is the complement of I_Q in {1, ..., m}.)
(a) If T(i, 0) ≤ 0 for all i ∈ I_R, stop.
(b) Otherwise choose i ∈ I_R so that T(i, 0) is maximal.
Replace I_Q by I_Q ∪ {i}.
Step 2. (Requires the above data, and i from (1b) or (3a).)
(a) Replace T by the result of pivoting in T on (i, i).
(b) Let y be the m-vector: y_j = T(j, 0) for j ∈ I_Q, y_j = 0 for j ∈ I_R.
(c) If y_j > 0 for all j ∈ I_Q, set w = y and return to step 1. Otherwise,
do step 3.
Step 3. (a) Set

    θ = min{ w_j / (w_j - y_j) : w_j - y_j > 0 }.
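In words: θ is the largest fraction of the move from the old weights w toward the new weights y that keeps every weight nonnegative, the ratio test familiar from the simplex method. A small sketch (ours; the update of w is our reading of step 3):

```python
def ratio_test(w, y):
    """theta = min{ w_j / (w_j - y_j) : w_j - y_j > 0 }, together with the
    point w + theta (y - w), in which at least one weight is driven to 0."""
    theta = min(wj / (wj - yj) for wj, yj in zip(w, y) if wj - yj > 0)
    return theta, [wj + theta * (yj - wj) for wj, yj in zip(w, y)]
```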
6. Solution of the example
    P = |  0    3   -2 |
        |  2    0    1 |
Steps     Result

0a        T0 =   0    1    1    1
                 1    4    0    2
                 1    0    9   -6
                 1    2   -6    5

0b        I_Q = {1}, w = (1, 0, 0)

0c              -4    1   -4   -2
                 1    0    1    1
                 4   -1   13   -4
                 2   -1   -4    5

1a        T(I_R, 0) = {4, 2}; go on
1b        i = 2, I_Q = {1, 2}

2a        1/13 ×  -36    9    4  -42
                    9    1   -1   17
                    4   -1    1   -4
                   42  -17    4   49

2b        y = (9/13, 4/13, 0)
2c        w = (9/13, 4/13, 0)
1a        T(I_R, 0) = {42/13}; go on
1b        i = 3, I_Q = {1, 2, 3}

2a        1/49 ×    0  -21   28   42
                  -21   26   -9  -17
                   28   -9    5    4
                   42  -17    4   13

2b        y = (-21/49, 28/49, 42/49)
3         θ = 21/34, w = (0, 8/17, 9/17), I_Q = {2, 3}, i = 1

2a        1/26 ×   -9   21   11   15
                  -21   49   -9  -17
                   11    9    1   -1
                   15   17   -1    1

2b        y = (0, 11/26, 15/26)
2c        w = (0, 11/26, 15/26)
1a        T(I_R, 0) = {-21/26}; stop
7. Computational notes
Table 1. Average number of pivot steps, for problems of Types 1 and 2
with m = 20 and m = 40. (Table entries not recoverable.)
We must mention that the algorithm of this paper, with the calcula-
tions performed in a tableau, is probably efficient in arithmetic only for
problems in which the number m of points is not much greater than the
dimension n of the space in which they reside. Almost all of the cal-
culating is done in the pivot step, which requires O(m²) multiplications.
When m ≫ n, a different organization of the calculation [22], related
to the present method as the revised simplex method for linear program-
ming is to the standard form, is much better; it requires O(mn) multi-
plications for a step, can take advantage of sparsity of the matrix P,
and gives better control over roundoff error.
The preceding paragraph explains why we have not much inves-
tigated the use of the tableau algorithm in practice. Doing so would
involve at least comparing its behavior on a variety of problems with
that of other tableau procedures for quadratic programming, such as
those of Cottle and Dantzig [5] and Lemke [12], which we have not done.
References
[1] E.W. Barankin and R. Dorfman, "Toward quadratic programming", Rept. to the
Logistics Branch, Office of Naval Research (January, 1955), unpublished.
[2] E.W. Barankin and R. Dorfman, "On quadratic programming", University of
California Publications in Statistics 2 (1958) 285-318.
[3] E.M.L. Beale, "Numerical methods", Parts ii-iv, in: Nonlinear programming,
Ed. J. Abadie (North-Holland, Amsterdam, 1967) pp. 143-172.
[4] M.D. Canon and C.D. Cullum, "The determination of optimum separating hyper-
planes I. A finite step procedure", RC 2023, IBM Watson Research Center, Yorktown
Heights, N.Y. (February, 1968).
[5] R.W. Cottle, "The principal pivoting method of quadratic programming", in:
Mathematics of the decision sciences, Part I, Vol. 11 of Lectures in Applied Ma-
thematics, Eds. G.B. Dantzig and A.F. Veinott (A.M.S., Providence, R.I., 1968)
pp. 144-162.
[6] R.W. Cottle and G.B. Dantzig, "Complementary pivot theory", in: Mathematics
of the decision sciences, Part I, Vol. 11 of Lectures in Applied Mathematics, Eds.
G.B. Dantzig and A.F. Veinott (A.M.S., Providence, R.I., 1968) pp. 115-136.
[7] G.B. Dantzig, Linear programming and extensions (Princeton University Press,
Princeton, N.J., 1963).
[8] G.B. Dantzig and A.F. Veinott, Eds., Mathematics of the decision sciences, Part I,
Vol. 11 of Lectures in Applied Mathematics (A.M.S., Providence, R.I., 1968).
[9] R. Fletcher, "A general quadratic programming algorithm", T.P. 401, Theoretical
Physics Div., U.K.A.E.A. Research Group, A.E.R.E., Harwell (March, 1970).
[10] M. Frank and P. Wolfe, "An algorithm for quadratic programming", Naval Research
Logistics Quarterly 3 (1956) 95-110.
[11] H.W. Kuhn and A.W. Tucker, "Nonlinear programming", in: Proceedings of the
second Berkeley symposium on mathematical statistics and probability, Ed. J. Neyman
(University of California Press, Berkeley, Calif., 1951) pp. 481-492.
[12] C.E. Lemke, "On complementary pivot theory" in: Mathematics of the decision
sciences, Part I, Vol. 11 of Lectures in Applied Mathematics, Eds. G.B. Dantzig
and A.F. Veinott (A.M.S. Providence, R.I., 1968) pp. 95-114.
[13] T.D. Parsons and A.W. Tucker, "Hybrid programs: linear and least-distance",
Mathematical Programming 1 (1971) 153-167.
[14] A.W. Tucker, "Analogues of Kirchoff's laws", Preprint LOGS 65, Stanford Uni-
versity, Stanford, Calif. (July, 1950).
[15] A.W. Tucker, "On Kirchoff's laws, potential, Lagrange multipliers, etc.", NAML
Rept. 52-17, National Bureau of Standards, Institute for Numerical Analysis,
University of California, Los Angeles, Calif. (August, 1951).
[16] A.W. Tucker, "Combinatorial theory underlying linear programs", in: Recent
advances in mathematical programming, Eds. R.L. Graves and P. Wolfe (McGraw-
Hill, New York, 1963) pp. 1-16.
[17] A.W. Tucker, "Principal pivot transforms of square matrices", SIAM Review 5
(1963) 305.
[18] A.W. Tucker, "A least-distance approach to quadratic programming", in: Mathe-
matics of the decision sciences, Part I, Vol. 11 of Lectures in Applied Mathematics,
Eds. G.B. Dantzig and A.F. Veinott (A.M.S., Providence, R.I., 1968) pp. 163-176.
[19] A.W. Tucker, "Least-distance programming", in: Proceedings of the Princeton
symposium on mathematical programming, Ed. H.W. Kuhn (Princeton University
Press, Princeton, N. J., 1971) pp. 583-588.
[20] P. Wolfe, "A simplex method for quadratic programming", ONR Logistics Project
Report, Princeton University, Princeton, N.J. (February, 1957).
[21] P. Wolfe, "The simplex method for quadratic programming", Econometrica 27
(1959) 382-393.
[22] P. Wolfe, "Finding the nearest point in a polytope", RC 4887, IBM Research
Center, Yorktown Heights, N.Y. (June, 1974).