Optimization With Constraints: 2nd Edition, March 2004
1. INTRODUCTION

In this booklet we shall discuss numerical methods for constrained optimization problems. The description supplements the description of unconstrained optimization in Frandsen et al (1998). We consider a real function of a vector variable which is constrained to satisfy certain conditions, specifically a set of equality constraints and a set of inequality constraints.

Definition 1.1. Feasible region. A point x ∈ IR^n is feasible if it satisfies the equality constraints

    ci(x) = 0,   i = 1, ..., r,   r ≥ 0,

and the inequality constraints

    ci(x) ≥ 0,   i = r+1, ..., m,   m ≥ r,

where the ci : IR^n → IR are given. The set of feasible points is denoted by P and called the feasible region.

Notice that if r = 0, then we have no equality constraints, and if r = m we have no inequality constraints.

A constrained minimizer gives a minimal value of the function while satisfying all constraints, ie

Definition 1.2. Global constrained minimizer. Find

    x+ = argmin_{x∈P} f(x),

where f : IR^n → IR and P is given in Definition 1.1.

Example 1.1. In IR^1 consider the objective function f(x) = (x−1)^2. The unconstrained minimizer is xu+ = 1 with f(xu+) = 0. We shall look at the effect of some simple constraints.

1◦ With the constraint x ≥ 0 (r = 0, m = 1, c1(x) = x) we also find the constrained minimizer x+ = 1.

2◦ With the constraint x−2 ≥ 0 the feasible region is the interval P = [2, ∞[, and x+ = 2 with f(x+) = 1.

3◦ The inequality constraints given by c1(x) = x−2 and c2(x) = 3−x lead to P = [2, 3] and x+ = 2.

4◦ If we have the equality constraint 3−x = 0, then the feasible region consists of one point only, P = {3}, and this point will be the minimizer.

5◦ Finally, 3−x ≥ 0, x−4 ≥ 0 illustrates that P may be empty, in which case the constrained optimization problem has no solution.

In many constrained problems the solution is at the border of the feasible region (as in cases 2◦ – 4◦ in Example 1.1). Thus a very important special case is the set of points in P which satisfy some of the inequality constraints to the limit, ie with equality. At such a point z ∈ P the corresponding constraints are said to be active. For practical reasons a constraint which is not satisfied at z is also called active at z.

Definition 1.3. Active constraints. A constraint ck(x) ≥ 0 is said to be

    active at z ∈ IR^n if ck(z) ≤ 0,
    inactive at z ∈ IR^n if ck(z) > 0.

The active set at z, A(z), is the set of indices of equality constraints and active inequality constraints:

    A(z) = {1, ..., r} ∪ Ã(z),   where Ã(z) = {j ∈ {r+1, ..., m} | cj(z) ≤ 0}.

Thus, an inequality constraint which is inactive at z has no influence on the optimization problem in a neighbourhood of z.

Example 1.2. In case 3◦ of Example 1.1 the constraint c1 is active and c2 is inactive at the solution x+. Here the active set is A(x+) = Ã(x+) = {1}.
Assume that
    1) P is not empty,
    2) f is continuous for all x ∈ P,
    3) P is bounded (∃ C ∈ IR : ||x|| ≤ C for all x ∈ P).
Then there exists (at least) one global, constrained minimizer.

If both the cost function f and all constraint functions ci are linear in x, then we have a so-called linear optimization problem. The solution of such problems is treated in Nielsen (1999). In another important special case all constraints are linear, and f is a quadratic polynomial; this is called a quadratic optimization problem, see Chapter 3.

We conclude this introduction with two sections on important properties of the functions involved in our problems.

1.1. Smoothness and Descent Directions

In this booklet we assume that the cost function and the constraint functions satisfy the following Taylor smoothness conditions,

    f(x+h) = f(x) + h^T f'(x) + ½ h^T f''(x) h + O(||h||^3),   (1.6)
    ci(x+h) = ci(x) + h^T ai + ½ h^T Ai h + O(||h||^3)   (1.7a)

for i = 1, ..., m. Here ai and Ai represent the gradient and the Hessian matrix respectively,

    ai = ci'(x),   Ai = ci''(x).   (1.7b)

Notice that even when (1.7) is true, the boundary of P may contain points, curves, surfaces (and other subspaces), where the boundary is not smooth, eg points where more than one inequality constraint is active.

Example 1.3. We consider a two-dimensional problem with two inequality constraints, c1(x) ≥ 0 and c2(x) ≥ 0.
Figure 1.1: Two inequality constraints in IR^2. The infeasible side is hatched. In this and the following figures c1 means the set {x | c1(x) = 0}, etc.

In Figure 1.1 you see two curves with the points where c1 and c2, respectively, is active, see Definition 1.3. The infeasible side of each curve is indicated by hatching. The resulting boundary of P (shown with thick line) is not smooth at the point where both constraints are active. You can also see that at this point the tangents of the two "active curves" form an angle which is less than (or equal to) π, when measured inside the feasible region. This is a general property.

Next, we consider a three-dimensional problem with two inequality constraints. Below, you see the active surfaces c1(x) = 0 and c2(x) = 0. As in the 2-dimensional case we have marked the actual boundary of the feasible region by thick line and indicated the infeasible side of each constraint by hatching. It is seen that the intersection curve is a kink line in the boundary surface. It is also seen that the angle between the intersecting constraint surfaces is less than (or equal to) π, when measured inside P.

Figure 1.2: Two inequality constraints in IR^3.

The methods we present in this booklet are in essence descent methods, ie iterative methods where we move from the present position x in a direction h that provides a smaller value of the cost function. We must satisfy the descent condition

    f(x+h) < f(x).   (1.8)

In Frandsen et al (1999) we have shown that the direction giving the fastest local descent rate is the Steepest Descent direction

    hsd = −f'(x).   (1.9)

In the same reference we also showed that the hyperplane

    H(x) = {x+u | u^T f'(x) = 0}   (1.10)

divides the space IR^n into a "descent" (or "downhill") half space and an "uphill" half space.

A descent direction h is characterized by having a positive projection onto the steepest descent direction,

    h^T hsd > 0  ⟺  h^T f'(x) < 0.   (1.11)

Now consider the constraint functions. The equality constraints ci(x) = 0 (i = 1, ..., r) and the boundary curves corresponding to the active inequality constraints, ci(x) ≥ 0 satisfied with "=", can be considered as level curves or contours (n = 2), respectively level surfaces or contour surfaces (n > 2), for these functions. We truncate the Taylor series (1.7) to

    ci(x+h) = ci(x) + h^T ai + O(||h||^2)   (i = 1, ..., m).

From this it can be seen that the direction ai (= ci'(x)) is orthogonal to any tangent to the contour at position x, ie ai is a normal to the constraint curve (surface) at the position.
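The descent condition can be checked numerically. A minimal sketch (the quadratic f and the test point are our own illustration, not from the text; numpy assumed):

import numpy as np

# Sample cost function f(x) = (x1 - 1)^2 + 4*x2^2 and its gradient.
def f(x):
    return (x[0] - 1.0)**2 + 4.0*x[1]**2

def f_grad(x):
    return np.array([2.0*(x[0] - 1.0), 8.0*x[1]])

x = np.array([2.0, 1.0])
h_sd = -f_grad(x)                        # steepest descent direction, (1.9)

for h in [np.array([-1.0, 0.0]), np.array([1.0, 1.0])]:
    downhill = h @ f_grad(x) < 0         # right-hand side of (1.11)
    same_side = h @ h_sd > 0             # left-hand side of (1.11)
    assert downhill == same_side         # the two tests agree
    if downhill:                         # a small downhill step decreases f, cf (1.8)
        assert f(x + 1e-3*h) < f(x)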
The opposite is true at the position y, c2(y) = 0 and c1(y) > 0. At the two positions we have indicated the gradients of the active constraints. They point into the interior of P, the feasible region.

Definition 1.13. Convexity of a function. Assume that D ⊆ IR^n is convex. The function f is convex on D if the following holds for arbitrary x, y ∈ D, θ ∈ [0, 1] and xθ ≡ θx + (1−θ)y:

    f(xθ) ≤ θ f(x) + (1−θ) f(y).

f is strictly convex on D if

    f(xθ) < θ f(x) + (1−θ) f(y).
Theorem 1.16. If D ⊆ IR^n is convex and f is twice differentiable on D, then
    1◦ f is convex on D ⟺ f''(x) is positive semidefinite for all x ∈ D,
    2◦ f is strictly convex on D if f''(x) is positive definite for all x ∈ D.

We finish this section with two interesting observations about the feasible domain P.

1) Let ci be an equality constraint. Take two arbitrary feasible points x and y: ci(x) = ci(y) = 0. All points xθ on the line between x and y must also be feasible (cf Definition 1.12),

    ci(θx + (1−θ)y) = 0   for all θ ∈ [0, 1].

Thus ci must be linear, and we obtain the surprising result: If P is convex, then all equality constraints are linear. On the other hand the set of points satisfying a linear equality constraint must be convex. Therefore the feasible domain of an equality constrained problem (ie r = m) is convex if and only if all constraints are linear.

2) Let ci be an inequality constraint. Assume that ci is concave on IR^n. If ci(x) ≥ 0, ci(y) ≥ 0 and θ ∈ [0, 1], then Definition 1.14 implies that

    ci(θx + (1−θ)y) ≥ θ ci(x) + (1−θ) ci(y) ≥ 0.

Thus, the set of points where ci is satisfied is a convex set. This means that the feasible domain P is convex if all equality constraints are linear and all inequality constraints are concave.
2. LOCAL, CONSTRAINED MINIMIZERS

Figure 2.1: Steepest descent direction and "downhill" halfspace.

In order to get a lower cost value we should move in a direction h in the unhatched halfspace of descent directions. If the step is not too long then the constraints of the problem are of no consequence.

2) Two active inequality constraints (no equality constraints) in IR^2

Figure 2.5 illustrates this case. At position x both constraints are active. The pulling force hsd shown indicates that the entire feasible region is on the ascent side of the dividing plane H (defined in (1.10) and indicated in Figure 2.5 by a dashed line). In this case, x is a local, constrained minimizer.
Figure 2.5: Two inequality constraints in IR^2, c1(x) ≥ 0 and c2(x) ≥ 0. At the intersection point, hsd points out of the feasible region.

Imagine that you turn hsd around the point x (ie, you change the cost function f). As soon as the dividing plane intersects with the active part of one of the borders, a feasible descent direction appears. The limiting cases are, when hsd is opposite to either a1 or a2. The position xs is said to be a constrained stationary point if hsd^(s) is inside the angle formed by −a1 and −a2, or

    hsd^(s) = −λ1 a1 − λ2 a2   with λ1, λ2 ≥ 0.

This is equivalent to

    f'(xs) = λ1 c1'(xs) + λ2 c2'(xs)   with λ1, λ2 ≥ 0.   (2.2)

Figure 2.6: Contours of a quadratic function in IR^2; one constraint, c1(x) ≥ 0. (Panels: c1 is inactive; c1 is weakly active; c1 is strongly active.)

In other words: if a constraint is weakly active, we can discard it without changing the optimizer. This remark is valid both for inequality and equality constraints.

2.1. The Lagrangian Function

The introductory section of this chapter indicated that there is an important relationship between g∗, the gradient of the cost function, and ai∗ (i = 1, ..., m), the gradients of the constraint functions, all evaluated at a local minimizer. This has led to the introduction of Lagrange's function:
By comparison with the formulae in the introduction to this chapter we see that in all cases the necessary condition for a local, constrained minimizer could be expressed in the form L'x(xs, λ) = 0.

For an unconstrained optimization problem you may recall that the necessary conditions and the sufficient condition for a minimizer involve the gradient f'(x∗) and the Hessian matrix f''(x∗) of the cost function, see Theorems 1.1, 1.2 and 1.5 in Frandsen et al (1999). In the next sections you will see that the corresponding results for constrained optimization will involve the gradient and the Hessian matrix (with respect to x) of the Lagrangian function.

2.2. First Order Condition, Necessary Condition

First order conditions on local minimizers only consider first order partial derivatives of the cost function and the constraint functions. With this restriction we can only formulate the necessary conditions; the sufficient conditions also include second derivatives.

Our presentation follows Fletcher (1993), and we refer to this book for the formal proofs, which are not always straightforward. The strategy is as follows,

(1) Choose an arbitrary, feasible point.
(2) Determine a step which leads from this point to a neighbouring point, which is feasible and has a lower cost value.
(3) Detect circumstances which make the above impossible.
(4) Prove that only the above circumstances can lead to failure in step (2).

First, we formulate the so-called first order Karush–Kuhn–Tucker conditions (KKT conditions for short):

Theorem 2.5. First order necessary conditions (KKT conditions).
Assume that
    a) x∗ is a local constrained minimizer of f (see Definition 1.4).
    b) either b1) all active constraints ci are linear,
       or b2) the gradients ai∗ = ci'(x∗) for all active constraints are linearly independent.
Then there exist Lagrangian multipliers {λi∗}_{i=1}^m (see Definition 2.3) such that
    1◦ L'x(x∗, λ∗) = 0,
    2◦ λi∗ ≥ 0, i = r+1, ..., m,
    3◦ λi∗ ci(x∗) = 0, i = 1, ..., m.

The formulation is very compact, and we therefore give some clarifying remarks:

1◦ This was exemplified in connection with (2.4).

2◦ λi∗ ≥ 0 for all inequality constraints was exemplified in (2.2), and in Appendix A we give a formal proof.

3◦ For an equality constraint ci(x∗) = 0, and λi∗ can have any sign.
   For an active inequality constraint ci(x∗) = 0, and λi∗ ≥ 0.
   For an inactive inequality constraint ci(x∗) > 0, so we must have λi∗ = 0, confirming the observation in Example 1.4, that these constraints have no influence on the constrained minimizer.
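A small numerical illustration of Theorem 2.5 (our own example problem, not from the booklet): for minimize x1^2 + x2^2 subject to c1(x) = x1 + x2 − 2 ≥ 0, the candidate x∗ = [1, 1]^T with λ1∗ = 2 satisfies all three conditions:

import numpy as np

def f_grad(x):
    return 2.0*x                        # gradient of x1^2 + x2^2

def c1(x):
    return x[0] + x[1] - 2.0            # inequality constraint, c1(x) >= 0

def c1_grad(x):
    return np.array([1.0, 1.0])

x_star = np.array([1.0, 1.0])
lam = 2.0

# 1: the gradient of the Lagrangian is zero, L'_x = f'(x) - lam*c1'(x)
assert np.allclose(f_grad(x_star) - lam*c1_grad(x_star), 0.0)
# 2: the multiplier of the inequality constraint is nonnegative
assert lam >= 0.0
# 3: complementarity, lam_i * c_i(x*) = 0
assert abs(lam*c1(x_star)) < 1e-12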
In analogy with unconstrained optimization we can introduce the following
Corollary 2.6. Constrained stationary point.
    xs is feasible and (xs, λs) satisfy 1◦ – 3◦ in Theorem 2.5
    ⟺
    xs is a constrained stationary point.

2.3. Second Order Conditions

The following example demonstrates that not only the curvature of the cost function but also the curvatures of the constraint functions are involved in the conditions for constrained minimizers.

Example 2.1. This example in IR^2 with one equality constraint (r = m = 1) is due to Fiacco and McCormick (1968). The cost function and constraint are

    f(x) = ½((x1 − 1)^2 + x2^2),   c1(x) = −x1 + β x2^2.

We consider this problem for three different values of the parameter β, see Figure 2.7.

Figure 2.7: Contours of f and the constraint −x1 + β x2^2 = 0 for three values of β (β = 0, β = 1/4, β = 1).

In all the cases, xs = 0 is a constrained stationary point, see Definition 2.6:

    L(x, λ) = ½((x1 − 1)^2 + x2^2) − λ(−x1 + β x2^2),

    L'x(x, λ) = [x1 − 1; x2] − λ [−1; 2βx2];   L'x(0, λ) = [−1 + λ; 0].

Thus, (xs, λs) = (0, 1) satisfy 1◦ in Theorem 2.5, and 2◦ – 3◦ are automatically satisfied when the problem has equality constraints only. Notice, that f is strictly convex in IR^2.

For β = 0 the feasible region is the x2-axis. This together with the contours of f(x) near the origin tells us that we have a local, constrained minimizer, x∗ = 0.

With β = 1/4 the stationary point xs = 0 is also a local, constrained minimizer, x∗ = 0. This can be seen by correlating the feasible parabola with the contours of f around 0.

Finally, for β = 1 we get the rather surprising result that xs = 0 is a local, constrained maximizer. Inspecting the feasible parabola and the contours carefully, you will discover that two local constrained minimizers have appeared around x = [0.5, ±0.7]^T.

In Frandsen et al (2004) we derived the second order conditions for unconstrained minimizers. The derivation was based on the Taylor series (1.6) for f(x∗+h), and led to conditions on the definiteness of the Hessian matrix Hu = f''(xu), where xu is the unconstrained minimizer.

The above example indicates that we have to take into account also the curvature of the active constraints, Ai∗ = ci''(x∗) for i ∈ A(x∗).

The second order condition takes care of the situation where we move along the edge of P from a stationary point x. Such a direction is called a feasible active direction:

Definition 2.7. Feasible active direction. Let x ∈ P. The nonzero vector h is a feasible active direction if

    h^T ci'(x) = 0

for all active constraints.
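The behaviour in Example 2.1 can be verified numerically: on the feasible parabola x1 = βx2^2 the cost becomes a function of x2 alone. A sketch of this check (our own verification code, numpy assumed):

import numpy as np

beta = 1.0

def f_reduced(x2):
    # f restricted to the constraint x1 = beta*x2^2
    x1 = beta*x2**2
    return 0.5*((x1 - 1.0)**2 + x2**2)

t = np.linspace(-1.2, 1.2, 100001)
v = f_reduced(t)
# x2 = 0 is a local maximum along the feasible parabola ...
assert v[50000] > v[50001] and v[50000] > v[49999]
# ... and the minimizers lie near x2 = +/- 1/sqrt(2), ie x ~ [0.5, +/-0.707]
i = np.argmin(v)
assert abs(abs(t[i]) - 1.0/np.sqrt(2.0)) < 1e-3
print(t[i], beta*t[i]**2)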
Now we use the Taylor series to study the variation of the Lagrangian function. Suppose we are at a constrained stationary point xs, and in the variation we keep λ = λs from Definition 2.6. From xs we move in a feasible active direction h,

    L(xs+h, λs) = L(xs, λs) + h^T L'x(xs, λs) + ½ h^T L''xx(xs, λs) h + O(||h||^3)
                = L(xs, λs) + ½ h^T L''xx(xs, λs) h + O(||h||^3),   (2.8)

since (xs, λs) satisfies 1◦ in Theorem 2.5. The fact that xs is a constrained stationary point implies that also 3◦ of Theorem 2.5 is satisfied, so that L(xs, λs) = f(xs). Since h is a feasible active direction, we obtain (again using 3◦ of Theorem 2.5),

    L(xs+h, λs) = f(xs+h) − Σ_{i=1}^m λi^(s) ci(xs+h)
                ≈ f(xs+h) − Σ_{i=1}^m λi^(s) (ci(xs) + h^T ci'(xs))
                = f(xs+h),   (2.9)

and inserting this in (2.8) we get (for small values of ||h||)

    f(xs+h) ≈ f(xs) + ½ h^T Ws h,   (2.10a)

where the matrix Ws is given by

    Ws = L''xx(xs, λs) = f''(xs) − Σ_{i=1}^m λi^(s) ci''(xs).   (2.10b)

This leads to the sufficient condition that the stationary point xs is a local, constrained minimizer if h^T Ws h > 0 for any feasible active direction h. Since this condition is also necessary we can formulate the following two second order conditions:

Theorem 2.11. Second order necessary condition.
Assume that
    a) x∗ is a local constrained minimizer for f.
    b) As b) in Theorem 2.5.
    c) All the active constraints are strongly active.
Then there exist Lagrangian multipliers {λi∗}_{i=1}^m (see Definition 2.3) such that
    1◦ L'x(x∗, λ∗) = 0,
    2◦ λi∗ ≥ 0, i = r+1, ..., m,
    3◦ λi∗ > 0 if ci is active, i = r+1, ..., m,
    4◦ λi∗ ci(x∗) = 0, i = 1, ..., m,
    5◦ h^T W∗ h ≥ 0 for any feasible active direction h.
Here, W∗ = L''xx(x∗, λ∗).

Theorem 2.12. Second order sufficient condition.
Assume that
    a) xs is a local constrained stationary point (see Definition 2.6).
    b) As b) in Theorem 2.5.
    c) As c) in Theorem 2.11.
    d) h^T W∗ h > 0 for any feasible active direction h, where W∗ = L''xx(x∗, λ∗).
Then xs is a local constrained minimizer.

For the proofs we refer to Fletcher (1993). There, you may also find a treatment of the cases, where the gradients of the active constraints are not linearly independent, and where some constraints are weakly active.
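For Example 2.1 the matrix in Theorems 2.11 – 2.12 can be written down explicitly: Ws = f''(0) − λs c1''(0) = diag(1, 1−2β), and a feasible active direction must satisfy h^T c1'(0) = 0 with c1'(0) = [−1, 0]^T, ie h = [0, 1]^T up to scaling. Hence h^T Ws h = 1 − 2β: positive for β = 0 and β = 1/4 (a constrained minimizer, cf Theorem 2.12) and negative for β = 1, matching the constrained maximizer found in Example 2.1. In code (a sketch):

import numpy as np

for beta in [0.0, 0.25, 1.0]:
    lam = 1.0
    f_hess = np.eye(2)                           # f''(x) for f = 0.5*((x1-1)^2 + x2^2)
    c1_hess = np.array([[0.0, 0.0], [0.0, 2.0*beta]])
    W_s = f_hess - lam*c1_hess                   # (2.10b) at x_s = 0, lambda_s = 1
    h = np.array([0.0, 1.0])                     # feasible active direction: h^T c1'(0) = 0
    print(beta, h @ W_s @ h)                     # 1 - 2*beta: 1.0, 0.5, -1.0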
3. QUADRATIC OPTIMIZATION
The matrix H ∈ IR^{n×n} and the vectors g, a1, ..., am ∈ IR^n are given. The associated Lagrange function is

    L(x, λ) = ½ x^T H x + g^T x − Σ_{i=1}^m λi (ai^T x − bi),   (3.2a)

with the first and second order derivatives

    L'x(x, λ) = Hx + g − Σ_{i=1}^m λi ai,   L''xx(x, λ) = H.   (3.2b)

Assumption 3.3. H is symmetric and positive definite.

(See Fletcher (1993) for methods for the cases where these simplifying assumptions are not satisfied.) Under Assumption 3.3 the problem is strictly convex (Theorem 1.16). This ensures that q(x) → +∞ when ||x|| → ∞, irrespective of the direction. Thus we need not require the feasible region P to be bounded. All the constraint functions are linear and this makes P convex. Thus, in this case Theorem 1.17 leads to

Corollary 3.4. Under Assumption 3.3 the problem QO of Definition 3.1 has a unique solution.

As in Chapter 2 we shall progress gradually with the different complications of the methods, ending the chapter with a method for non-linear optimization using iterations where each step solves a quadratic optimization problem, gradually approaching the properties of the non-linear cost function and constraints.

Example 3.1. In Figure 3.1 you see the contours of a positive definite quadratic in IR^2. If there are no constraints on the minimizer, we get the unconstrained minimizer, indicated by xu in the figure.

Figure 3.1: Contours of a quadratic in IR^2 and its unconstrained minimizer xu.

The solution of the unconstrained quadratic optimization problem corresponding to Definition 3.1 is found from the necessary condition q'(xu) = 0, which is the following linear system of equations,

    H xu = −g.   (3.5)

The solution is unique according to our assumptions.

3.1. Basic Quadratic Optimization

The basic quadratic optimization problem is the special case of Problem QO (Definition 3.1) with only equality constraints, ie m = r. We state it in the form¹⁾

Definition 3.6. Basic quadratic optimization problem (BQO). Find

    x∗ = argmin_{x∈P} {q(x)},

where

    q(x) = ½ x^T H x + g^T x,   P = {x ∈ IR^n | A^T x = b}.

The matrix A ∈ IR^{n×m} has the columns A:,j = aj, and bj is the jth element in b ∈ IR^m.

The solution can be found directly, namely by solving the linear system of equations which express the necessary condition that the Lagrange function L is stationary at the solution with respect to both of its vector variables x and λ:

    L'x(x, λ) = 0 :   Hx + g − Aλ = 0,
    L'λ(x, λ) = 0 :   A^T x − b = 0.   (3.7)

The first equation is the KKT condition, and the second expresses that the constraints are satisfied at the solution. This linear system of equations has the dimension (n+r)×(n+r), with r = m. Thus the solution requires O((n+m)^3) operations. We return to the solution of (3.7) in Section 3.3.

1) In other presentations you may find the constraint equation formulated as Ãx = b with Ã = A^T. Hopefully this will not lead to confusion.
3.2. General Quadratic Optimization

In the general case we have both equality constraints and inequality constraints in Problem 3.1, and we must use an iterative method to solve the problem. If we knew which constraints are active at the solution x∗ we could set up a linear system like (3.7) and find the solution directly. Thus the problem can be formulated as that of finding the active set A(x∗).

We present a so-called active set method. Each iterate x is found via an active set A (corresponding to the constraints that should be satisfied with "=", cf Definition 1.3). Ignoring the inactive constraints we consider the basic quadratic optimization problem with the equality constraints given by A:

Definition 3.8. Current BQO problem (CBQO(A)). Find

    xeq = argmin_{x∈P} {q(x)},

where

    q(x) = ½ x^T H x + g^T x,   P = {x ∈ IR^n | A^T x = b}.

The matrix A ∈ IR^{n×p} has the columns A:,j = aj, j ∈ A, and b ∈ IR^p has the corresponding values of bj. p is the number of elements in A.

We shall refer to CBQO(A) as a function (subprogram) that returns (xeq, λeq), the minimizer and the corresponding set of Lagrange multipliers corresponding to the active set A. Similar to the BQO they are found as the solution to the following linear system of dimension (n+p)×(n+p):

    Hx + g − Aλ = 0,
    A^T x − b = 0.   (3.9)

If, on the other hand, xeq is feasible, then we are finished (ie x∗ = xeq) provided that all Lagrange multipliers corresponding to Ã are non-negative (Theorem 2.13). If there is one or more λj < 0 for j ∈ Ã (ie for active inequality constraints), then one of the corresponding indices is dropped from A before the next iteration.

Before formally defining the strategy in Algorithm 3.10 we illustrate it through a simple example.

Example 3.2. We take a geometric view of a problem in IR^2 with 3 inequality constraints. In Figure 3.2 we give the contours of the cost function and the border lines for the inequalities. The infeasible side is hatched.

Figure 3.2: Contours of the cost function and the three constraint border lines; the iterates x0, x1, x2, x3, the unconstrained minimizer xu and the steepest descent directions hsd are marked.
loosen the only remaining constraint (corresponding to λ2 < 0). Thus, the next CBQO step will lead to xu, the unconstrained minimizer, which is infeasible: It satisfies constraints 1 and 2, but not 3. The next iterate, x = x2, is found as the intersection between the line from x1 to xu and the bordering line for a3^T x ≥ b3. Finally, a CBQO step from x2 with A = {3} gives x = x3. This is feasible and by checking the contours of the cost function we see that we have come to the solution, x∗ = x3. Algebraically we see this from the fact that λ3 > 0.

The strategy from this example is generalized in Algorithm 3.10.

Algorithm 3.10. General quadratic optimization.

3◦ The vector xeq satisfies the current active constraints, but some of the inequality constraints that were ignored, j ∈ {r+1, ..., m} \ Ã, may be violated at xeq. Let V denote the set of indices of constraints violated at xeq,

    V = {j ∈ {r+1, ..., m} | aj^T xeq < bj}.

We shall choose the best feasible point on the line between x and xeq,

    x̃ = x + t(xeq − x),   0 < t < 1.   (3.11a)

The value of t which makes constraint no. j active is given by aj^T x̃ = bj, which is equivalent to

    tj = (bj − aj^T x) / (aj^T (xeq − x)).   (3.11b)
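A sketch of this interpolation step (the function name and interface are ours, not the booklet's; numpy assumed):

import numpy as np

def step_toward(x, x_eq, A_in, b_in):
    """Return the best feasible point on the segment from x to x_eq,
    cf (3.11a); A_in has the inequality-constraint gradients as columns."""
    t = 1.0
    j_hit = None                            # stays None if x_eq is feasible
    for j in range(A_in.shape[1]):
        aj, bj = A_in[:, j], b_in[j]
        if aj @ x_eq < bj:                  # constraint j is violated at x_eq
            tj = (bj - aj @ x) / (aj @ (x_eq - x))   # (3.11b)
            if tj < t:
                t, j_hit = tj, j
    return x + t*(x_eq - x), j_hit          # j_hit joins the active set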
q(xnew) ≤ q(x), the strict decrease can only take place a finite number of times because of the finite number of possible active sets A. Therefore we only drop a constraint a finite number of times, and thus cycling cannot take place: The algorithm must stop after a finite number of iterations.

3.3. Implementation Aspects

To start Algorithm 3.10 we need a feasible starting point x0. This is simple if m ≤ n (the number of constraints is at most equal to the number of unknowns): We just solve

    A^T x = b,   (3.12a)

with A ∈ IR^{n×m} having the columns ai, i = 1, ..., m. If m < n, then this system is underdetermined, and the solution has (at least) n−m free parameters. For any choice of these the vector x is feasible; all the constraints are active.

If m > n, we cannot expect to find an x with all inequality constraints active. Instead, we can use the formulation

    A^T x − s = b   with s ≥ 0,   (3.12b)

and si = 0 for the equality constraints. The problem of finding an x that satisfies (3.12b) is similar to getting a feasible starting point for the SIMPLEX method in Linear Optimization, see Section 4.4 in Nielsen (1999).

The most expensive part of the process is solution of the CBQO at each iteration. The simplest approach would be to start from scratch for each new A. Then the accumulated cost of the computations involved in the solutions of (3.9) would be O((n+m)^3) floating point operations per call of CBQO. If constraint gradients are linearly independent then the number of equality and active constraints cannot exceed n, and thus the work load is O(n^3) floating point operations per call of CBQO.

Considerable savings are possible when we note that each new A is obtained from the previous either by deleting a column or by adding one or more new columns. First, we note that the matrix H is used in all iteration steps. It should be factorized once and for all, eg by Cholesky's method, cf Appendix A in Frandsen et al (2004),

    H = C C^T,

where C is lower triangular. This requires O(n^3) operations, and after this, each "H⁻¹w" will then require O(n^2) operations.

The first equation in (3.9) can be reformulated to

    x = H⁻¹(Aλ − g),   (3.13a)

and when we insert this in the second equation in (3.9), we get

    (A^T H⁻¹ A) λ = b + A^T H⁻¹ g.   (3.13b)

Next, we can reformulate (3.13b) to

    Gλ = b + A^T d   with G = (C⁻¹A)^T (C⁻¹A),  d = H⁻¹ g.

This system is solved via the Cholesky factorization of the p×p matrix G (p being the current number of active constraints). When A changes by adding or deleting a column, it is possible to update this factorization in O(n·p) operations, and the cost of each iteration step reduces to O(n^2) operations. For more details see pp 18–19 in Madsen (1995).

There are alternative methods for solving the system (3.9). Gill and Murray (1974) suggest using the QR factorization²⁾ of the active constraint matrix,

    A = Q [R; 0] = [QR QN] [R; 0] = QR R,   (3.14)

where Q is orthogonal and R is upper triangular. As indicated, we can split Q into QR ∈ IR^{n×p} and QN ∈ IR^{n×(n−p)}.

2) See eg Chapter 2 in Madsen and Nielsen (2002) or Section 5.2 in Golub and Van Loan (1996).
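The reduction (3.13) in code form (a sketch; for brevity the triangular systems are solved with np.linalg.solve, where a dedicated triangular solver would normally be preferred):

import numpy as np

def cbqo_solve(H, g, A, b):
    """Solve (3.9) via (3.13): first the multipliers, then x."""
    C = np.linalg.cholesky(H)           # H = C C^T, done once per problem
    Y = np.linalg.solve(C, A)           # Y = C^{-1} A
    d = np.linalg.solve(C.T, np.linalg.solve(C, g))            # d = H^{-1} g
    G = Y.T @ Y                         # G = (C^{-1}A)^T (C^{-1}A) = A^T H^{-1} A
    lam = np.linalg.solve(G, b + A.T @ d)                      # (3.13b)
    x = np.linalg.solve(C.T, np.linalg.solve(C, A @ lam - g))  # (3.13a)
    return x, lam

H = np.diag([2.0, 2.0]); g = np.array([-2.0, -4.0])
A = np.array([[1.0], [1.0]]); b = np.array([1.0])
x, lam = cbqo_solve(H, g, A, b)
assert np.allclose(A.T @ x, b) and np.allclose(H @ x + g - A @ lam, 0.0)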
The orthogonality of Q implies that

    QR^T QR = I_{p×p},   QR^T QN = 0_{p×(n−p)},   (3.15)

where the indices on I and 0 are the dimensions of the matrix. The columns of Q form an orthonormal basis of IR^n, and we can express x in the form

    x = QR u + QN v,   u ∈ IR^p, v ∈ IR^{n−p}.   (3.16)
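The splitting (3.14) – (3.16) is available directly from a full QR factorization; a small numerical check (illustrative matrix, numpy assumed):

import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])   # n = 3, p = 2
n, p = A.shape
Q, R_full = np.linalg.qr(A, mode='complete')         # Q is n x n, cf (3.14)
Q_R, Q_N = Q[:, :p], Q[:, p:]                        # split as in the text
R = R_full[:p, :]
assert np.allclose(Q_R @ R, A)                       # A = Q_R R
assert np.allclose(Q_R.T @ Q_R, np.eye(p))           # (3.15)
assert np.allclose(Q_R.T @ Q_N, np.zeros((p, n - p)))
# any x splits as x = Q_R u + Q_N v, cf (3.16)
x = np.array([1.0, 2.0, 3.0])
u, v = Q_R.T @ x, Q_N.T @ x
assert np.allclose(Q_R @ u + Q_N @ v, x)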
Because of Assumption 3.3 the matrix is symmetric. It is not positive definite, however³⁾, but there are efficient methods for solving such systems, where the sparsity is preserved better, without spoiling numerical stability. It is also possible to handle the updating aspects efficiently; see eg Duff (1993).

What has this got to do with Quadratic Optimization? Quite a lot! Compare (3.22) with (3.19). Since (3.19) gives the solution (x, λ) to the CBQO (Definition 3.8), it follows that (3.22) gives the solution h and the corresponding Lagrange multiplier vector λ to the following problem,

    Find h = argmin_{h∈Plin} {q(h)}.   (3.23a)

If, more generally, the problem is

    x∗ = argmin_{x∈P} f(x),
    P = {x ∈ IR^n | cj(x) = 0, j = 1, ..., r;  cj(x) ≥ 0, j = r+1, ..., m},   (3.24)

then we can still use (3.23) except that the feasible region has to be changed accordingly. Thus the QO problem becomes
Example 3.3. Consider the problem

    f(x) = x1^2 + x2^2,   P = {x ∈ IR^2 | x1^2 − x2 − 1 = 0}.   (3.26)

The cost function is a quadratic in x, but the constraint c1(x) = x1^2 − x2 − 1 is not linear, so this is not a quadratic optimization problem. In Example 4.5 we solve this problem via a series of approximations of the form

    f(x+δ) ≈ q(δ) ≡ ½ δ^T W δ + f'(x)^T δ + f(x),
    c(x+δ) ≈ l(δ) ≡ c1'(x)^T δ + c1(x),

where q is the function of (3.23) with L''xx(x, λ) replaced by an approximation W. This leads to the following subproblem.

Let the first approximation for solving (3.26) correspond to x = [1, 1]^T and W = I. Then Figure 3.3 shows the contours of q (full line) and f (dashed line) through the points given by δ = 0 and δ = h. In the second case we see that in the region of interest q is a much better approximation to f than in the first case. Notice the difference in scaling and that each plot has the origin at the current x. At the point x := x + αh ≈ [0.702, −0.508]^T we get c1(x) ≈ 1.3·10⁻⁴. This value is too small to be seen in Figure 3.3b.

Figure 3.3: Approximating quadratics and the constraint (panels a and b). Dotted line: f and c. Full line: q and d.
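The first subproblem can be solved with the same KKT approach as in (3.7). The sketch below (numpy assumed) reproduces the minimizer h = [−0.8, −2.6]^T and the Lagrange multiplier λ = 0.6 quoted later in Example 4.5:

import numpy as np

W = np.eye(2)                      # first approximation of L''_xx
fg = np.array([2.0, 2.0])          # f'(x) at x = [1, 1]
a = np.array([2.0, -1.0])          # c1'(x) at x = [1, 1]
c1 = -1.0                          # c1(x) at x = [1, 1]

# KKT system for: minimize q(delta) subject to l(delta) = c1 + a^T delta = 0
K = np.block([[W, -a.reshape(2, 1)], [a.reshape(1, 2), np.zeros((1, 1))]])
rhs = np.concatenate([-fg, [-c1]])
sol = np.linalg.solve(K, rhs)
h, lam = sol[:2], sol[2]
print(h, lam)                      # [-0.8, -2.6], 0.6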
4. PENALTY AND SQO METHODS

There are several strategies on which to base methods for general constrained optimization. The first is called sequential linear optimization: in each iteration step we solve a linear optimization problem where both cost function and constraint functions are approximated linearly. This strategy may be useful eg in large scale problems.

The next strategy is sequential quadratic optimization (SQO). We introduced this in Section 3.4, and in Section 4.2 we shall complete the description, including features that make it practical and robust.

The third strategy could be called sequential unconstrained optimization (SUO). In each iteration step we solve an unconstrained optimization problem, with the cost function modified to induce or force the next iterate to be feasible. The modification consists in adding a penalty term to the cost function. The penalty term is zero, if we are in the feasible region, and positive if we are outside it. The following examples are due to Fletcher (1993).
When the components of x tend to infinity, f(x) tends to −∞. In Figure 4.1 we see the contours of ϕ(x, 1), ϕ(x, 10) and ϕ(x, 100).

Figure 4.1: Contours and minimizer of ϕ(x, σ) for σ = 1, σ = 10, σ = 100. x∗ and xσ are marked by * and o, respectively.

The figure indicates a very serious problem connected with SUO. As σ → ∞, the valley around xσ becomes longer and narrower, making trouble for the method used to find this unconstrained minimizer. Another way of expressing this is that the unconstrained problems become increasingly ill-conditioned.

Example 4.2. Consider the same problem as before, except that now c1 is an inequality constraint: Find

    argmin_{x∈P} f(x),   P = {x ∈ IR^2 | c1(x) ≥ 0},

where f and c1 are given in Example 4.1. The feasible region is the interior of the unit circle, and again the solution is x∗ = (1/√2)[1, 1]^T.

The penalty term should reflect that all x for which c1(x) ≥ 0 are permissible, and we can use

    ϕ(x, σ) = f(x) + ½ σ (min{c1(x), 0})^2,   σ ≥ 0.

In Figure 4.2 we see the contours of ϕ(x, σ) and their minimizers xσ for the same σ-values as in Example 4.1.

Figure 4.2: Contours and minimizer of ϕ(x, σ) for σ = 1, σ = 10, σ = 100. x∗ and xσ are marked by * and o, respectively.

All the xσ are infeasible and seem to converge to the solution. We still have the long narrow valleys and ill-conditioned problems when σ is large. With inequality constraints there is an extra difficulty with this penalty function: Inside the feasible region the functions f and ϕ have the same values and derivatives, while this is not the case in the infeasible region. On the border of P (where the solution is situated) there is a discontinuity in the second derivative of ϕ(x, σ), and this disturbs line searches and descent directions which are based on interpolation, thus adding to the problems caused by the narrow valley.

It is characteristic for penalty methods, as indicated in the examples, that (normally) all the iterates are infeasible with respect to (some of) the inequality constraints. Therefore they are also called exterior point methods.
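A sketch of the SUO loop for an inequality constrained problem. Since Example 4.1's cost function is not restated above, the code uses an assumed stand-in, f(x) = −x1 − x2 with c1(x) = 1 − x1^2 − x2^2, which has the same solution x∗ = (1/√2)[1, 1]^T as quoted in Example 4.2; scipy's BFGS minimizer handles the inner unconstrained problems:

import numpy as np
from scipy.optimize import minimize

def c1(x):
    return 1.0 - x[0]**2 - x[1]**2          # feasible region: the unit disk

def f(x):
    return -x[0] - x[1]                     # assumed stand-in for Example 4.1's f

def phi(x, sigma):
    # penalty function for an inequality constraint, cf Example 4.2
    return f(x) + 0.5*sigma*min(c1(x), 0.0)**2

x = np.array([2.0, 0.0])
for sigma in [1.0, 10.0, 100.0]:
    x = minimize(lambda z: phi(z, sigma), x).x   # warm start from previous x_sigma
    print(sigma, x, c1(x))

As σ grows, the printed xσ stay slightly infeasible (c1 < 0) while approaching x∗, illustrating the exterior point behaviour.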
In some cases the objective function is undefined in (part of) the infeasible region. Then the use of exterior point methods becomes impossible. This has led to the class of barrier methods that force all the iterates to be feasible. To contrast them with penalty function methods they are called interior point methods (IPM).

The most widely used IPMs are based on the logarithmic barrier function. We can illustrate it with a problem with one inequality constraint only,

    x+ = argmin_{x∈P} f(x),   P = {x ∈ IR^n | c1(x) ≥ 0}.

The corresponding barrier function is¹⁾

    ϕ(x, µ) = f(x) − µ log c1(x),

with the barrier parameter µ > 0. The logarithm is defined only for x strictly inside P (we confine ourselves to working with real numbers), and since log c1(x) → −∞ for c1(x) → 0, we see that ϕ(x, µ) → +∞ for x approaching the border of P. However, when µ → 0, the minimizer xµ of ϕ(x, µ) can approach a point at the border.

1) "log" is the natural (or Naperian) logarithm.

Methods based on barrier functions share some of the disadvantages of the penalty function methods: As we approach the solution the intermediate results xµ are minimizers situated at the bottom of valleys that are narrow, ie xµ is the solution of an ill-conditioned (unconstrained) problem.

As indicated, barrier methods are useful in problems where infeasible x vectors must not occur, but apart from this they may also be efficient in large scale problems. In linear optimization a number of very efficient versions have been developed during the 1990s, see eg Chapter 3 in Nielsen (1999).

We end this introduction by returning to the penalty functions used in Examples 4.1 and 4.2 and taking a look at the curvatures of the penalty function near the solution x∗ and near xσ, the unconstrained minimizer of ϕ(x, σ).

Consider one inequality constraint as in Example 4.2, and assume that the constraint is strongly active at the solution: f'(x∗) ≠ 0. This shows that

    ϕ'x(x∗, σ) ≠ 0,

independent of σ, while the unconstrained minimizer xσ satisfies

    ϕ'x(xσ, σ) = 0.

When σ → ∞, xσ → x∗, but the difference in the gradients of ϕ (at x∗ and xσ) remains constant, and thus the curvature of ϕ goes to infinity. This discrepancy is eliminated in the following method, which was first introduced by Powell (1969).

4.1. The Augmented Lagrangian Method

At first we consider the special case where only equality constraints are present:

    P = {x ∈ IR^n | c(x) = 0},

c being the vector function c : IR^n → IR^r, whose ith component is the ith constraint function ci. At the end of this section we generalize the formulation to include inequality constraints as well.

We have the following Lagrangian function (Definition 2.3),

    L(x, λ) = f(x) − λ^T c(x),

and introduce a penalty term as indicated at the beginning of this chapter. Thus consider the following augmented Lagrangian function²⁾

    ϕ(x, λ, σ) = f(x) − λ^T c(x) + ½ σ c(x)^T c(x).   (4.1)

2) Remember that λ^T c(x) = Σ_{i=1}^r λi ci(x) and c(x)^T c(x) = Σ_{i=1}^r (ci(x))^2.

Notice that the discrepancy mentioned above has been relaxed: If λ = λ∗, then the first order conditions in Corollary 2.6 and the fact that c(x∗) = 0 imply that x∗ is a stationary point of ϕ:

    ϕ'x(x∗, λ∗, σ) = 0.

Furthermore, Fletcher has shown the existence of a finite number σ̂ with the property that if σ > σ̂, then x∗ is an unconstrained local minimizer of ϕ(x, λ∗, σ), ie if

    xλ,σ = argmin_{x∈IR^n} ϕ(x, λ, σ),   (4.2)

then³⁾

    xλ∗,σ = x∗   for all σ > σ̂.   (4.3)

3) In case of several local minimizers "argmin_{x∈IR^n}" is interpreted as the local unconstrained minimizer in the valley around x∗.

This means that the penalty parameter σ does not have to go to infinity. If σ is sufficiently large and if we insert λ∗ (the vector of Lagrangian multipliers at the solution x∗), then the unconstrained minimizer of the augmented Lagrangian function solves the constrained problem. Thus the problem of finding x∗ has been reduced – or rather changed – to that of finding λ∗.

We shall describe a method that uses the augmented Lagrangian function to find the solution. The idea is to use the penalty term to get close to the
solution x∗, and then let the Lagrangian term provide the final convergence by letting λ approach λ∗. A rough sketch of the algorithm is

    Choose initial values for λ, σ
    repeat
        Compute xλ,σ
        Update λ and σ
    until stopping criteria satisfied   (4.4)

The computation of xλ,σ (for fixed λ and σ) is an unconstrained optimization problem, which we deal with later. First, we concentrate on ideas for updating (λ, σ) in such a way that σ stays limited and λ → λ∗.

In the first iteration steps we keep λ constant (eg λ = 0) and let σ increase. This should lead us close to x∗ as described for penalty methods at the start of this chapter.

Next, we would like to keep σ fixed, σ = σfix, and vary λ. Then

    xλ = argmin_{x∈IR^n} ϕ(x, λ, σfix)

and

    ψ(λ) = ϕ(xλ, λ, σfix) = min_{x∈IR^n} ϕ(x, λ, σfix)

are functions of λ alone. Assume σfix > σ̂. Since

1◦ ψ(λ) is the minimal value of ϕ,

2◦ the definition (4.1) combined with c(x∗) = 0 shows that ϕ(x∗, λ, σ) = f(x∗) for any (λ, σ),

3◦ (4.3) implies xλ∗ = x∗,

it follows that for any λ

    ψ(λ) ≤ ϕ(x∗, λ, σfix) = ϕ(x∗, λ∗, σfix) = ψ(λ∗).   (4.5)

Thus the vector of Lagrangian multipliers at the solution is a local maximizer for ψ,

    λ∗ = argmax_λ ψ(λ).   (4.6)

From the current λ we seek a step η such that λ + η ≈ λ∗. In order to get a guideline on how to choose η we look at the Taylor expansion for ψ,

    ψ(λ+η) = ψ(λ) + η^T ψ'(λ) + ½ η^T ψ''(λ) η + O(||η||^3)
            = ψ(λ) − η^T c − ½ η^T Jc (ϕ''xx)⁻¹ Jc^T η + O(||η||^3),   (4.7)

where c = c(xλ), Jc = Jc(xλ) is the Jacobian matrix defined in (3.21c), and ϕ''xx = ϕ''xx(xλ, λ, σfix). A proof of these expressions for the first and second derivatives of ψ can be found in Fletcher (1993). This expansion shows that

    η = −α c(xλ),   α > 0,

is a step in the steepest ascent direction. Another way to get this, and at the same time providing a value for α, goes as follows: The vector xλ is a minimizer for ϕ. Therefore ϕ'x(xλ, λ, σfix) = 0, implying that

    f'(xλ) − Jc(xλ)[λ − σfix c(xλ)] = 0.

Combining this with the KKT condition (Theorem 2.5),

    f'(x∗) − Jc(x∗) λ∗ = 0,

and the assumption that xλ ≈ x∗, we find

    λ∗ ≈ λ − σfix c(xλ).   (4.8)

The right-hand side can be used for updating λ. Fletcher (1993) shows that under certain regularity assumptions (4.8) provides linear convergence⁴⁾.

4) This means that in the limit we have ||λnew − λ∗|| ≤ κ ||λ − λ∗||, where λnew = λ − σfix c(xλ) and 0 < κ < 1.

Faster convergence is obtained by applying Newton's method to the nonlinear problem ψ'(λ) = 0,

    λ∗ ≈ λ + η,   where ψ''(λ) η = −ψ'(λ).   (4.9)

Notice, that this is equivalent to finding η as a stationary point for the quadratic model obtained by dropping the error term O(||η||^3) in (4.7). A formula for ψ''(λ) is also given in (4.7). Inserting this we obtain
Algorithm 4.10. Augmented Lagrangian method.

    begin
        k := 0;  x := x0;  λ := λ0;  σ := σ0           {1◦}
        Kprev := ||c(x)||∞                             {2◦}
        repeat
            k := k+1
            x := argmin_x ϕ(x, λ, σ);  K := ||c(x)||∞  {2◦}
            if K ≤ ¼ Kprev
                λ := Update(x, λ, σ)                   {3◦}
                Kprev := K
            else
                σ := 10·σ                              {4◦}
        until K < ε or k > kmax
    end

We have the following remarks:

1◦ As mentioned earlier it is natural to start with the pure penalty method, ie we let λ0 = 0. σ0 must be a positive number, one might eg start with σ0 = 1. x0 is an initial estimate of the solution provided by the user.

Example 4.3. We illustrate Algorithm 4.10 with the following simple problem, with n = 2 and r = m = 1:

    minimize f(x) = x1^2 + x2^2
    with the constraint c1(x) = 0,  where c1(x) = x1^2 − x2 − 1.

For hand calculation the following expressions are useful:

    f'(x) = [2x1, 2x2]^T,   Jc(x) = [2x1  −1],

    ϕ(x, λ, σ) = (x1^2 + x2^2) − λ·(x1^2 − x2 − 1) + ½σ·(x1^2 − x2 − 1)^2,

    ϕ'x(x, λ, σ) = [ 2x1 (1 − λ + σ(x1^2 − x2 − 1))
                     2x2 + λ − σ(x1^2 − x2 − 1) ],

    ϕ''xx(x, λ, σ) = [ 2(1 − λ − σ(x2 + 1 − 3x1^2))   −2σx1
                       −2σx1                          2 + σ ].

We shall follow the iterations from the starting point x0 = [1, 1]^T, λ0 = 0, σ0 = 2. We find Kprev = |c1(x0)| = 1.

First step: The augmented Lagrangian function is

    ϕ(x, 0, 2) = (x1^2 + x2^2) − 0·(x1^2 − x2 − 1) + 1·(x1^2 − x2 − 1)^2.
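A compact implementation of Algorithm 4.10 for this problem (a sketch: the inner minimization is delegated to scipy's BFGS with the gradient ϕ'x given above, and Update uses the steepest ascent formula (4.8)):

import numpy as np
from scipy.optimize import minimize

def c1(x):
    return x[0]**2 - x[1] - 1.0

def phi(x, lam, sigma):
    return x[0]**2 + x[1]**2 - lam*c1(x) + 0.5*sigma*c1(x)**2

def phi_grad(x, lam, sigma):
    t = 1.0 - lam + sigma*c1(x)
    return np.array([2.0*x[0]*t, 2.0*x[1] + lam - sigma*c1(x)])

x, lam, sigma = np.array([1.0, 1.0]), 0.0, 2.0
K_prev = abs(c1(x))
for k in range(20):
    x = minimize(phi, x, args=(lam, sigma), jac=phi_grad).x
    K = abs(c1(x))
    if K <= 0.25*K_prev:
        lam = lam - sigma*c1(x)     # steepest ascent update (4.8)
        K_prev = K
    else:
        sigma = 10.0*sigma
    if K < 1e-10:
        break
print(x, lam)   # x -> [sqrt(0.5), -0.5] (cf footnote 7 below), lam -> 1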
Here, we have defined the approximate active set Aδ(x) by

    Aδ(x) = {1, ..., r} ∪ {i | i > r and ci(x) ≤ δ},   (4.11)

where δ is a small positive number. Initially we could keep λ = 0 and increase σ until the approximate active set seems to have stabilized (eg by being constant for two consecutive iterations). As long as Aδ(x) remains constant we update λ using (4.8) or (4.9) (discarding inactive constraints and assuming that the active inequality constraints are numbered first). Otherwise λ is set to 0 and σ is increased. The algorithm might be outlined as follows:

Algorithm 4.12. Augmented Lagrangian method (General problem, easy solution).

    begin
        λ := 0;  σ := σ0
        repeat
            x := argmin_x ϕ̃(x, λ, σ)
            if (stable active set Aδ(x))
                λ := Update(x, λ, σ)
            else
                λ := 0;  σ := 10·σ
        until STOP
    end

Many alternatives for defining the active set could be considered. It might, eg, depend on the values of |ci(x)|, i = 1, ..., m. One disadvantage about this type of definition is that a threshold value, like δ, must be provided by the user. This might be avoided by a technique like the one in Algorithm 4.10 (and the following Algorithm 4.20).

4.1.2. A better solution. We change the inequality constraints (i = r+1, ..., m) into equality constraints by introducing so-called slack variables zi:

    cr+i(x) ≥ 0  ⟺  { cr+i(x) − zi = 0,  zi ≥ 0 },   i = 1, ..., m−r.   (4.13)

Notice, that we have extended the number of variables, and still have inequality constraints. These are simple, however, and – as we shall see – the slack variables can be eliminated.

Consider the augmented Lagrangian function corresponding to the m equality constraints,

    ϕ(x, z, λ, σ) = f(x) − Σ_{i=1}^r λi ci(x) + ½σ Σ_{i=1}^r ci(x)^2
                    − Σ_{i=r+1}^m λi (ci(x) − z_{i−r})
                    + ½σ Σ_{i=r+1}^m (ci(x) − z_{i−r})^2.   (4.14)

For fixed λ and σ we wish to find xλ,σ and zλ,σ that minimize ϕ under the constraint zλ,σ ≥ 0. xλ,σ minimizes the original problem provided that σ is sufficiently large and λ is the vector of Lagrange multipliers at the solution.

At the minimizer (xλ,σ, zλ,σ) either zi = 0 (the constraint zi ≥ 0 is active) or ∂ϕ/∂zi = 0. Now, from (4.14) we see that

    ∂ϕ/∂z_{i−r} = λi − σ(ci(x) − z_{i−r}),

and equating this with zero we get z_{i−r} = ci(x) − (1/σ)λi. Thus, the relevant values for the slack variables are

    z_{i−r} = max{0, ci(x) − (1/σ)λi},   i = r+1, ..., m.

Inserting this in (4.14) will make z disappear, and we obtain

    ϕ(x, λ, σ) = f(x) − λ^T d(x) + ½σ d(x)^T d(x),   (4.15a)

where d(x) holds the modified equality constraint functions given by

    di(x) = ci(x)     if i ≤ r or ci(x) ≤ (1/σ)λi,
          = (1/σ)λi   otherwise.   (4.15b)

Thus, the augmented Lagrangian function for the generally constrained problem is very similar to (4.1).
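The modified constraint vector (4.15b) is straightforward to implement; a sketch (the function name is ours; numpy assumed):

import numpy as np

def d_vector(c, lam, sigma, r):
    """Modified constraints (4.15b). c and lam are m-vectors;
    the first r components correspond to equality constraints."""
    d = c.copy()
    for i in range(r, len(c)):
        if c[i] > lam[i]/sigma:        # inequality constraint i is inactive
            d[i] = lam[i]/sigma
    return d

# usage: phi(x) = f(x) - lam @ d + 0.5*sigma*(d @ d), with d = d_vector(...)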
Letting

    ψ(λ) = min_{x∈IR^n} ϕ(x, λ, σfix) = ϕ(xλ, λ, σfix),   (4.16)

the inequality ψ(λ) ≤ ψ(λ∗), corresponding to (4.5), can easily be shown valid. Thus λ∗ maximizes ψ, so ψ'(λ∗) = 0.

The steepest ascent iteration, corresponding to (4.8), is

    λsa = λ − σfix d(xλ).   (4.17)

The Newton iteration for solving ψ'(λ∗) = 0, corresponding to (4.9), is

    λnew = λ + η,   where ψ''(λ) η = −ψ'(λ).   (4.18)

Here the first and second order derivatives of ψ are (see Fletcher (1993))

    ψ'(λ) = −d(xλ),
    ψ''(λ) = − [ G   0
                 0   (1/σ)I ]   with G = J̃c (ϕ''xx)⁻¹ J̃c^T.   (4.19)

In J̃c we only consider the active constraints (first line in (4.15b)), and we assume that these are numbered first. Thus G is an s by s matrix (where s is the number of active constraints), and I is the unit matrix of order m−s.

Notice that if constraint number i is inactive at (x, λ) (last line in (4.15b)), then the value of ηi in (4.18) is −λi. Thus the ith component of λnew will be 0, which is consistent with remark 3◦ on Theorem 2.5.

The algorithm is given below. Essentially, it is identical with Algorithm 4.10. We have the following remarks:

1◦ As remark 1◦ to Algorithm 4.10.

2◦ As remark 2◦ to Algorithm 4.10, except for K: For active constraints |di(x)| is the deviation from ci(x) to zero. For an inactive constraint, |di(x)| = |λi/σ|, which becomes 0 when λ is updated. If this constraint is also inactive at the solution, then λi∗ = 0, see remark 3◦ on Theorem 2.5; thus, also in this case the value |di| is relevant for the stopping criterion.

3◦ The updating of λ can be made by the steepest ascent formula (4.17), which is efficient initially, or by Newton's method (4.18), which provides quadratic final convergence (under the usual regularity conditions).

Algorithm 4.20. Augmented Lagrangian method (General case).

    begin
        k := 0;  x := x0;  λ := λ0;  σ := σ0           {1◦}
        Kprev := ||d(x)||∞                             {2◦}
        repeat
            k := k+1
            x := argmin_x ϕ(x, λ, σ);  K := ||d(x)||∞  {2◦}
            if K ≤ ¼ Kprev
                λ := Update(x, λ, σ)                   {3◦}
                Kprev := max(K, Kprev)
            else
                σ := 10·σ
        until K < ε or k > kmax
    end

If a Quasi-Newton method is used to find x at 3◦, then an approximate ϕ'' (or (ϕ'')⁻¹) is available and can be used in (4.19b). In this case we do not obtain quadratic but superlinear convergence, which is almost as good.

Algorithm 4.20 has proved to be robust and quite efficient in practice. Typically the solution is found after 3 – 10 runs through the repeat loop. In Example 4.6 we report results of some test runs with the algorithm.
the point (α, π(α)) is below the dashed line indicated in Figure 4.7. The slope of this line is 10% of the slope of the chord between (0, ψ(0)) and (1, ψ(1)), ie

    ∆ = ψ(1) − ψ(0)
      = h^T f'(x) − Σ_{i=1}^r µi |ci(x)| − Σ_{i=r+1}^m µi |min{0, ci(x)}|.   (4.25)

In this expression we have used the fact that ψ(1) = h^T f'(x) since the other terms are zero for h ∈ P̃. Note that h is downhill for f, and therefore ∆ is guaranteed to be negative.

In each step of the line search algorithm we use a second order polynomial P(t) to approximate π(t) on the interval [0, α]. The coefficients are determined so that P(0) = π(0), P'(0) = ∆, P(α) = π(α),

    P(t) = π(0) + ∆t + (π(α) − π(0) − ∆α) t^2/α^2.

If the coefficient to t^2 is positive, then this polynomial has a minimizer β, determined by P'(β) = 0, or

    β = −∆α^2 / (2(π(α) − π(0) − ∆α)).   (4.26)

Now we can formulate the line search algorithm:

Algorithm 4.27. Penalty Line Search.

    begin
        α := 1;  Compute ∆ by (4.25)
        while π(α) ≥ π(0) + 0.1∆α
            Compute β by (4.26)
            α := min{0.9α, max{β, 0.1α}}
    end

The expression for the new α ensures that the algorithm does not get stuck at the current value and, on the other hand, does not go to zero too fast. The algorithm has been validated by experience.

4.2.2. Choice of W in (4.22). By comparison with the Taylor expansion (1.6) an obvious choice is W(x) = f''(x). However, the goal is to find a minimizer for the Lagrangian function L(x, λ), and the description in Section 3.4 shows that a more appropriate choice is

    W(x) = L''xx(x, λ) = f''(x) − Σ λi ci''(x).

We know from Theorem 2.11 that at the solution (x∗, λ∗) the Hessian matrix satisfies h^T L''xx(x∗, λ∗) h ≥ 0 for all feasible directions. This does not imply that L''xx(x∗, λ∗) is positive definite, but contributes to the theoretical motivation for the following strategy that has proven successful: Start with a positive definite W(x0), eg W(x0) = I. In each iteration step update W so that it is positive definite, thus giving a well-defined descent direction. The use of an updating strategy has the further benefit that we do not have to supply second derivatives of the cost function f and the constraint functions {ci}.

A good updating strategy is the BFGS method discussed in Section 5.10 of Frandsen et al (2004). Given the current W = W(x) and the next iterate xnew = x + αh, the change in the gradient of Lagrange's function (with respect to x) is

    y = L'x(xnew, λ) − L'x(x, λ)
      = f'(xnew) − f'(x) − (Jc(xnew) − Jc(x))^T λ.   (4.28a)

We check the so-called curvature condition,

    y^T (xnew − x) > 0.   (4.28b)

If this is satisfied, then W is "positive definite with respect to the step direction h", and so is Wnew found by the BFGS formula,

    Wnew = W + (1/(α h^T y)) y y^T − (1/(h^T u)) u u^T,   where u = W h.   (4.28c)

If the curvature condition is not satisfied, then we let Wnew = W.
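The update (4.28) in code form (a sketch; W, h, y and α as defined above):

import numpy as np

def bfgs_update(W, h, y, alpha):
    """BFGS update (4.28c), guarded by the curvature condition (4.28b)."""
    s = alpha*h                         # the step x_new - x
    if y @ s <= 0.0:                    # curvature condition fails:
        return W                        # keep the old approximation
    u = W @ h
    return (W + np.outer(y, y)/(alpha*(h @ y))
              - np.outer(u, u)/(h @ u))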
4.2.3. Stopping criterion. We use the following measure for the goodness of the approximate solution obtained as x = xprev + αh,

    η(x, λ) = |q(αh) − f(x)| + Σ_{i∈B} λi |ci(x)| + Σ_{i∈J} |min{0, ci(x)}|.   (4.29)

As in Chapter 3, B is the set of equality and active inequality constraints, and J is the set of inactive inequality constraints. The first term measures the quality of the approximating quadratic (4.22a) and the other terms measure how well the constraints are satisfied.

4.2.4. Summary. The algorithm can be summarized as follows. The parameters ε and kmax must be set by the user.

Example 4.5. We shall use the algorithm on the same problem as in Example 4.3,

    minimize f(x) = x1^2 + x2^2.

We shall need the following expressions

    f'(x) = [2x1; 2x2],   Jc(x) = [2x1  −1],

    q(δ) = f(x) + 2(x1δ1 + x2δ2) + ½(w11 δ1^2 + 2w12 δ1δ2 + w22 δ2^2),

    d1(δ) = c1(x) + 2x1δ1 − δ2.

The first model problem is

    minimize q(δ) = 2 + 2δ1 + 2δ2 + 0.5δ1^2 + 0.5δ2^2
    subject to d1(δ) = −1 + 2δ1 − δ2 = 0.

This was discussed in Example 3.3, where we found the minimizer h = [−0.8, −2.6]^T. The corresponding Lagrange multiplier is λ = 0.6, and this is also used as the first value for the penalty parameter µ. Figure 4.8 shows

    π(α) = (1 − .8α)^2 + (1 − 2.6α)^2 + .6|(1 − .8α)^2 − (1 − 2.6α) − 1|.

We see that ∆ = −7.4 and

    π(1) = 2.984 > 2.6 + .1∆·1 = 1.86.
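Algorithm 4.27 applied to this π(α) (a sketch; ∆ = −7.4 and π(0) = 2.6 as computed above). One pass of the while loop already produces an acceptable step:

def pi(alpha):
    # penalty function along the step of Example 4.5
    return ((1 - .8*alpha)**2 + (1 - 2.6*alpha)**2
            + .6*abs((1 - .8*alpha)**2 - (1 - 2.6*alpha) - 1))

delta = -7.4                                     # from (4.25)
alpha = 1.0
while pi(alpha) >= pi(0) + 0.1*delta*alpha:      # Algorithm 4.27
    beta = -delta*alpha**2 / (2*(pi(alpha) - pi(0) - delta*alpha))   # (4.26)
    alpha = min(0.9*alpha, max(beta, 0.1*alpha))
print(alpha, pi(alpha))      # alpha ~ 0.475 is accepted; pi(alpha) < pi(0) = 2.6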
    xnew = [0.690283, −0.528487]^T,
    η(xnew, λ) ≈ 0.094,   ||xnew − x∗||∞ ≈ 0.028,
    Wnew ≈ [ 0.908  0.250
             0.250  2.060 ].

The results from the next iteration steps are

6) The computation in this example was performed with machine accuracy εM = 5·10⁻¹⁴, but results are shown with at most 6 digits.

7) According to Example 4.3, x∗ = [√0.5, −0.5]^T.

Each function call involves one evaluation of f(x) and f'(x). For these examples the Lagrange–Newton method is clearly superior when the number of function evaluations is used as a measure. However, the work load per function evaluation may be much higher for the Lagrange–Newton method since it involves many QP problems. This is especially important when the number of variables and/or constraints is high.

In conclusion, we recommend the Lagrange-Newton method (SQP) when function evaluations are expensive, and the Augmented Lagrangian method when function evaluations are cheap and we have many variables and constraints.
APPENDIX

A. Karush–Kuhn–Tucker Theorem

We shall prove property 2◦ in Theorem 2.5. Without loss of generality we assume that the active inequality constraints are numbered first:

    Equality constraints:           ci(x) = 0,  i = 1, ..., r
    Active inequality constraints:  ci(x) = 0,  i = r+1, ..., p
    Inactive constraints:           ci(x) > 0,  i = p+1, ..., m.

The comments on 3◦ in the theorem and the definition of the Lagrange function imply that 1◦ has the form

    f'(x) = Σ_{i=1}^p λi ai   with   ai = ci'(x).   (A.1)

We shall prove that if one (or more) of the {λi}_{i=r+1}^p is negative, then x is not a local, constrained minimizer:

For the sake of simplicity, assume that the gradients {ai}_{i=1}^p are linearly independent and that λp < 0. Then we can decompose ap into v, its orthogonal projection on the subspace spanned by {ai}_{i=1}^{p−1}, and h, which is orthogonal to this subspace,

    ap = v + h   with   ai^T h = 0 for i = 1, ..., p−1.

For small values of ||αh|| we use the Taylor series (1.7) for the constraint functions to see that

    ci(x+αh) ≈ ci(x) + αh^T ai = 0        for i = 1, ..., p−1,
                               = αh^T h   for i = p.

This shows, that for α > 0 and sufficiently small, x+αh is feasible. Further, from the Taylor series (1.6) for the cost function and (A.1) we get

    f(x+αh) ≈ f(x) + αh^T f'(x)
            = f(x) + αh^T (Σ_{i=1}^p λi ai)
            = f(x) + αλp h^T h,

showing that f(x+αh) < f(x) for α > 0, since λp < 0.

Thus, we have shown that at a local, constrained minimizer all the Lagrange multipliers for inequality constraints are nonnegative.