Karush-Kuhn-Tucker
Moritz Kuhn
CDSEM Uni Mannheim
November 2006
max_{x∈R^N} f(x)
s.t. g_j(x) ≥ 0   j = 1, . . . , m
     h_i(x) = 0   i = 1, . . . , n
(x̃, λ̃, µ̃) s.t. L(x̃, λ̃, µ̃) = min_{µ, λ≥0} max_x L(x, λ, µ)

We know that

L(x, λ̃, µ̃) ≤ L(x̃, λ̃, µ̃) ≤ L(x̃, λ, µ)   ∀x and ∀(λ, µ) with λ ≥ 0,

i.e. (x̃, λ̃, µ̃) is a critical point of L(x, λ, µ), but neither a minimum nor a
maximum.
Consider first the FOC with respect to x, which characterize the critical points
of L(x, λ, µ) and are necessary for a maximum:

∇_x f(x̃) + Σ_{j=1}^m λ_j ∇_x g_j(x̃) + Σ_{i=1}^n µ_i ∇_x h_i(x̃) = 0
Furthermore, consider the FOC of L(x̃, λ, µ) with respect to (λ, µ), which are
necessary for a minimum of L(x̃, λ, µ), where

L(x̃, λ, µ) = f(x̃) + Σ_{j=1}^m λ_j g_j(x̃) + Σ_{i=1}^n µ_i h_i(x̃)
Define

d(λ, µ) := L(x̃, λ, µ)

The function d(λ, µ) is also called the dual function of the problem. Notice
that d(λ, µ) is affine in (λ, µ), irrespective of the functional forms of
f(x), g_j(x), h_i(x). Minimizing it over µ and λ ≥ 0 is therefore a linear
programming problem, and the minimum is either f(x̃) or it does not exist:

min_{µ, λ≥0} d(λ, µ) = { f(x̃)   if g_j(x̃) ≥ 0 ∀j and h_i(x̃) = 0 ∀i
                       { −∞     else
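As a numerical illustration, consider the following sketch (the functions f, g, h
and the points below are made-up toy data, not taken from these notes): for a
fixed x̃ the dual function is affine in (λ, µ), its minimum equals f(x̃) at a
feasible point, and it is unbounded below at an infeasible one.

    import numpy as np

    # Toy problem data (hypothetical): one inequality, one equality constraint.
    f = lambda x: -(x[0] - 1.0) ** 2        # objective
    g = lambda x: np.array([x[0]])          # constraint g(x) >= 0
    h = lambda x: np.array([x[0] - 0.5])    # constraint h(x) = 0

    def d(lam, mu, x_tilde):
        # Dual function: for fixed x_tilde this is affine in (lam, mu).
        return f(x_tilde) + lam @ g(x_tilde) + mu @ h(x_tilde)

    x_feas = np.array([0.5])     # feasible: g = 0.5 >= 0, h = 0
    x_infeas = np.array([-1.0])  # infeasible: g = -1 < 0

    # At the feasible point the minimum over lam >= 0 and free mu is attained
    # at lam = 0 (the coefficient g(x~) is nonnegative) and equals f(x~):
    print(d(np.zeros(1), np.zeros(1), x_feas))            # -0.25 = f(x_feas)
    # At the infeasible point, increasing lam drives d below any bound:
    for lam in (1.0, 10.0, 100.0):
        print(d(np.array([lam]), np.zeros(1), x_infeas))  # -5, -14, -104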
From this result, we can conclude that every saddle point must be a solution
to the original maximization problem. To see why, consider two arguments:
1) A saddle point exists iff x̃ is feasible for the maximization problem, i.e.
g_j(x̃) ≥ 0 ∀j ∧ h_i(x̃) = 0 ∀i
2) The Lagrange function with (λ̃, µ̃) overestimates the objective function
on the feasible set. To see this, consider the following equivalent
problem

max_x f(x) + Σ_{j=1}^m I(g_j(x) ≥ 0) + Σ_{i=1}^n I(h_i(x) = 0)

with

I(g_j(x) ≥ 0) = { 0    if g_j(x) ≥ 0
                { −∞   else

I(h_i(x) = 0) = { 0    if h_i(x) = 0
                { −∞   else
This means we penalize the objective function for violations of the constraints.
This problem is equivalent to the first, if we assume that a solution exists.
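A minimal sketch of this indicator reformulation (the concrete f and g below are
hypothetical, chosen only for illustration):

    import numpy as np

    f = lambda x: -(x - 1.0) ** 2
    g = lambda x: x                      # single inequality constraint x >= 0

    def penalized(x):
        # Indicator penalty: 0 on the feasible set, -inf off it.
        I_g = 0.0 if g(x) >= 0 else -np.inf
        return f(x) + I_g

    print(penalized(0.5))    # = f(0.5) = -0.25: feasible, no penalty
    print(penalized(-0.5))   # = -inf: infeasible, maximally penalized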
If we now replace the indicator functions by the linear functions λ̃_j g_j(x) and
µ̃_i h_i(x) with

λ̃_j ≥ 0 ∀j, µ̃_i ∈ R ∀i,

i.e. if we pass to the Lagrangian L(x, λ̃, µ̃), then for feasible values, i.e.
g_j(x) ≥ 0 ∀j and h_i(x) = 0 ∀i, we clearly overestimate the true objective
function, since λ̃_j g_j(x) ≥ 0 = I(g_j(x) ≥ 0) and µ̃_i h_i(x) = 0 = I(h_i(x) = 0).
We, therefore, know that there are no other (feasible) choices x̂ with a higher
value of the objective function than the saddle point of the problem with value
f(x̃), i.e.
f(x̃) = L(x̃, λ̃, µ̃) ≥ L(x̂, λ̃, µ̃) = f(x̂) + Σ_{j=1}^m λ̃_j g_j(x̂) ≥ f(x̂),

where the µ̃_i h_i(x̂)-terms vanish and the λ̃_j g_j(x̂)-terms are nonnegative
because x̂ is feasible.
Hence, we have shown that every saddle point is a solution to the maximiza-
tion problem.
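The chain of inequalities is easy to verify numerically. The sketch below uses
the hypothetical one-dimensional problem max −x² s.t. x − 1 ≥ 0, whose saddle
point (x̃, λ̃) = (1, 2) follows from the stationarity condition −2x + λ = 0:

    import numpy as np

    f = lambda x: -x ** 2
    g = lambda x: x - 1.0          # constraint x >= 1
    x_tilde, lam = 1.0, 2.0        # saddle point of L(x, lam) = f(x) + lam*g(x)

    L = lambda x: f(x) + lam * g(x)

    for x_hat in np.linspace(1.0, 3.0, 5):         # feasible candidates
        chain = (f(x_tilde), L(x_hat), f(x_hat))
        assert chain[0] >= chain[1] >= chain[2]    # f(x~) >= L(x^) >= f(x^)
        print("f(x~)=%+.2f  L(x^)=%+.2f  f(x^)=%+.2f" % chain)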
Furthermore, we have already shown that

λ̃_j ≥ 0 if g_j(x̃) = 0
λ̃_j = 0 if g_j(x̃) > 0
µ̃_i ∈ R, h_i(x̃) = 0 ∀i

and

∇_x f(x̃) + Σ_{j=1}^m λ̃_j ∇_x g_j(x̃) + Σ_{i=1}^n µ̃_i ∇_x h_i(x̃) = 0
Hence we can conclude that every saddle point satisfies the Karush-Kuhn-Tucker
conditions

∇_x f(x̃) + Σ_{j=1}^m λ̃_j ∇_x g_j(x̃) + Σ_{i=1}^n µ̃_i ∇_x h_i(x̃) = 0

g_j(x̃) ≥ 0 ∀j   h_i(x̃) = 0 ∀i

λ̃_j ≥ 0, µ̃_i ∈ R and λ̃_j g_j(x̃) = 0 ∀i, j
Therefore, we get:

x̃ is a maximizer ⇐ (x̃, λ̃, µ̃) is a saddle point of L(x, λ, µ) ⇒ (x̃, λ̃, µ̃)
satisfies the Karush-Kuhn-Tucker conditions
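At a candidate point, all three blocks of conditions can be checked numerically.
The helper below is an illustrative sketch (its name, inputs and the toy data
are my own, not from the notes):

    import numpy as np

    def kkt_residuals(grad_f, Dg, Dh, g_vals, h_vals, lam, mu):
        # Stationarity, feasibility and complementary slackness residuals.
        stationarity = grad_f + Dg.T @ lam + Dh.T @ mu
        feas_g = np.minimum(g_vals, 0.0)    # nonzero entries violate g >= 0
        comp = lam * g_vals                 # should be exactly zero
        return stationarity, feas_g, h_vals, comp

    # Toy check: max -(x1^2 + x2^2) s.t. x1 + x2 - 1 >= 0, solved by (1/2, 1/2).
    x = np.array([0.5, 0.5])
    print(kkt_residuals(grad_f=-2 * x,
                        Dg=np.array([[1.0, 1.0]]),   # row j: gradient of g_j
                        Dh=np.zeros((0, 2)),         # no equality constraints
                        g_vals=np.array([0.0]), h_vals=np.zeros(0),
                        lam=np.array([1.0]), mu=np.zeros(0)))
    # All residual blocks are zero, so (x~, lam~) is a KKT point.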
The next question we want to examine is under which conditions a KKT
point, i.e. a point that satisfies the Karush-Kuhn-Tucker conditions, is also
a saddle point of the Lagrangian.
The first issue is under which conditions the FOC with respect to x are
sufficient for a maximum of the Lagrangian. We already know that the FOC are
sufficient if we consider a concave programming problem (concave objective
function and convex choice set).
The Lagrangian is concave in x if

f : R^N → R and g_j : R^N → R ∀j

are concave functions and the h_i : R^N → R are affine. Remember that the sum
of concave functions is concave, that λ ≥ 0, and that an affine term µ_i h_i(x)
is concave for either sign of µ_i.
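For twice differentiable functions, concavity can be checked through the
Hessian. A small sketch with a made-up quadratic:

    import numpy as np

    # f(x) = -x1^2 - 4*x2^2 has constant Hessian H; f is concave iff H is
    # negative semidefinite, i.e. all eigenvalues are <= 0.
    H = np.array([[-2.0, 0.0],
                  [0.0, -8.0]])
    print(np.all(np.linalg.eigvalsh(H) <= 0))   # True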
Now let us check that, with this concavity assumption, the choice set is
convex. Assume x′ and x′′ are feasible, i.e.

g_j(x′) ≥ 0, h_i(x′) = 0 and g_j(x′′) ≥ 0, h_i(x′′) = 0 ∀i, j

For α ∈ [0, 1], concavity implies

g_j(αx′ + (1 − α)x′′) ≥ α g_j(x′) + (1 − α) g_j(x′′) ≥ 0
h_i(αx′ + (1 − α)x′′) ≥ α h_i(x′) + (1 − α) h_i(x′′) = 0

For feasibility with respect to h_i(x), we see from the second inequality that
it must hold with equality, i.e. h_i(·) must be affine. We can therefore write
the equality constraints as Ax = b, where A is a matrix of dimension n × N and
b is a vector of dimension n.
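The affine structure is exactly what keeps convex combinations feasible. A
small sketch with made-up data A, b:

    import numpy as np

    A = np.array([[1.0, 1.0, 0.0]])      # n = 1 equality constraint, N = 3
    b = np.array([1.0])

    x1 = np.array([1.0, 0.0, 5.0])       # A @ x1 = b
    x2 = np.array([0.0, 1.0, -3.0])      # A @ x2 = b
    alpha = 0.3
    xc = alpha * x1 + (1.0 - alpha) * x2
    print(A @ xc - b)                    # [0.]: the whole segment stays feasible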
3 Necessity of the Karush-Kuhn-Tucker conditions and the existence of Lagrange multipliers
To get the necessity of the KKT conditions, we have to do a bit more. First,
we define a new object. A cone C in R^N is a set of points s.t.

x ∈ C ⇒ λx ∈ C ∀λ > 0

We furthermore define the tangent cone at x̄ ∈ M as

T(x̄, M) = {d ∈ R^N : ∃α_k > 0, x_k ∈ M : lim_{k→∞} x_k = x̄ ∧ lim_{k→∞} α_k(x̄ − x_k) = d}
Next, we want to establish the connection between the tangent cone and an
optimal solution. To do so we define the set of feasible choices as
F := {x : g(x) ≥ 0 ∧ h(x) = 0}
Lemma 1. Every solution x∗ to the maximization problem satisfies

g(x∗) ≥ 0, h(x∗) = 0 and ∇_x f(x∗)d ≥ 0 ∀d ∈ T(x∗, F)

Proof. Take d ∈ T(x∗, F) with α_k > 0 and feasible x_k → x∗ such that
α_k(x∗ − x_k) → d. Since x∗ is optimal, f(x∗) ≥ f(x_k) for k large enough and
hence

α_k (f(x∗) − f(x_k)) ≥ 0

A first-order expansion f(x_k) = f(x∗) + ∇_x f(x∗)(x_k − x∗) + o(‖x_k − x∗‖)
turns this into α_k ∇_x f(x∗)(x∗ − x_k) + α_k o(‖x∗ − x_k‖) ≥ 0, and letting
k → ∞ gives ∇_x f(x∗)d ≥ 0.
Lemma 2. For continuously differentiable functions

g : R^N → R^m and h : R^N → R^n

we have

T(x, {x : g(x) ≥ 0, h(x) = 0}) ⊂ TL(x, g, h)

for all feasible x, where

TL(x, g, h) := {d ∈ R^N : ∇_x g_j(x)d ≤ 0 ∀j ∈ A(x), ∇_x h_i(x)d = 0 ∀i}

denotes the linearized cone and A(x) := {j : g_j(x) = 0} the set of active
inequality constraints.

Proof. Take d ∈ T(x̄, F) with α_k > 0 and feasible x_k → x̄ such that
α_k(x̄ − x_k) → d. For j ∈ A(x̄) we have g_j(x̄) = 0 and g_j(x_k) ≥ 0, so

g_j(x_k) − g_j(x̄) ≥ 0
⇔ α_k ∇_x g_j(x̄)(x̄ − x_k) + α_k o(‖x̄ − x_k‖) ≤ 0

and for x_k → x̄

∇_x g_j(x̄)d ≤ 0 ∀j ∈ A(x̄)

The same argument applied to h_i(x_k) = h_i(x̄) = 0 in both directions gives
∇_x h_i(x̄)d = 0 ∀i. Hence, lemma 2 is proven.
In general, however, the reverse inclusion does not hold:

TL(x, g, h) ⊄ T(x, F)

The condition

TL(x, g, h) = T(x, F)

is also called the Abadie constraint qualification (ACQ) for feasible x.
Later on, we consider sufficient conditions for (ACQ). See section 4 about
constraint qualifications.
Now, we need an additional lemma that I will not prove here.

Farkas' Lemma 1. For A ∈ R^{N×m}, B ∈ R^{N×n} and c ∈ R^N:

(∀d ∈ R^N : A^T d ≤ 0 ∧ B^T d = 0 ⇒ c^T d ≤ 0)
⇔ (∃u ∈ R^m_+ and v ∈ R^n s.t. Au + Bv = c)
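The second alternative of the lemma can be checked numerically as a linear
feasibility problem: search for u ≥ 0 and free v with Au + Bv = c. A sketch
using scipy's linprog (the matrices below are made-up data):

    import numpy as np
    from scipy.optimize import linprog

    A = np.array([[1.0, 0.0],
                  [0.0, 1.0]])           # N x m
    B = np.array([[1.0],
                  [1.0]])                # N x n
    c = np.array([2.0, 3.0])

    m, n = A.shape[1], B.shape[1]
    res = linprog(c=np.zeros(m + n),     # pure feasibility problem
                  A_eq=np.hstack([A, B]), b_eq=c,
                  bounds=[(0, None)] * m + [(None, None)] * n)
    print(res.status == 0, res.x)        # status 0: some (u, v) exists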
Using this lemma, we can finally prove that the KKT conditions are necessary
for an optimum of the maximization problem.

Kuhn-Tucker Theorem 1. If x∗ is a (local) optimum of the problem and (ACQ)
holds at x∗, then there exist multipliers λ∗ ∈ R^m_+ and µ∗ ∈ R^n with
λ∗_j g_j(x∗) = 0 ∀j such that (x∗, λ∗, µ∗) satisfies the Karush-Kuhn-Tucker
conditions.

Proof. Recall from lemma 1 that

∇_x f(x∗)d ≥ 0 ∀d ∈ T(x∗, F)

and hence, by (ACQ),

∇_x f(x∗)d ≥ 0 ∀d ∈ TL(x∗, g, h)

In other words, every d with ∇_x g_j(x∗)d ≤ 0 ∀j ∈ A(x∗) and ∇_x h_i(x∗)d = 0 ∀i
satisfies (−∇_x f(x∗))^T d ≤ 0. Applying Farkas' lemma with the gradients of the
active inequality constraints as the columns of A, the gradients of the equality
constraints as the columns of B and c = −∇f(x∗) yields u ∈ R^{|A(x∗)|}_+ and
v ∈ R^n with Au + Bv = c, i.e.

∇g(x∗)λ∗ + ∇h(x∗)µ∗ = −∇f(x∗)

where λ∗_j = u_j ≥ 0 for j ∈ A(x∗), λ∗_j = 0 for the inactive constraints (so
that λ∗_j g_j(x∗) = 0 ∀j) and µ∗ = v.
Hence, we can conclude that the KKT conditions are necessary for an optimum.
From our previous discussion, we know furthermore that, if f, g are concave and
h is affine, then the KKT conditions are sufficient for a saddle point.
4 Constraint Qualifications
Next we discuss different versions of constraint qualifications. A set of
conditions is called a constraint qualification if the conditions imply that
(ACQ) holds. I will not prove that the constraint qualifications are indeed
sufficient to imply the (ACQ).
ACQ

We say that the Abadie constraint qualification (ACQ) holds at a feasible
point x∗ if

T(x∗, F) = TL(x∗, g, h)

where T(x∗, F) denotes the tangent cone of F at x∗ ∈ F, F is the feasible set,
and TL(x∗, g, h) denotes the linearized cone at x∗ ∈ F.
LICQ

An easy to check constraint qualification is the linear independence constraint
qualification (LICQ). The constraint functions satisfy the (LICQ) at x∗ iff the
matrix

[D_x g̃(x∗); D_x h(x∗)]^T

where g̃(x∗) = {g_j(x∗)}_{j∈A(x∗)} collects the active inequality constraints,
has full column rank, i.e. the rows of the Jacobian matrices for the active
inequality constraints and the equality constraints are linearly independent.
This condition furthermore implies the uniqueness of the Lagrange multipliers.
Corollary 1.
If (LICQ) is satisfied, then the Lagrange multipliers are determined uniquely
at a KKT point.
Proof. We know that (x∗, λ∗, µ∗) must satisfy

∇f(x∗) + Σ_{j=1}^m λ∗_j ∇_x g_j(x∗) + Σ_{i=1}^n µ∗_i ∇_x h_i(x∗) = 0

to be a KKT point. Since λ∗_j = 0 for the inactive constraints, this is
equivalent to

−∇f(x∗) = Σ_{j∈A(x∗)} λ∗_j ∇_x g_j(x∗) + Σ_{i=1}^n µ∗_i ∇_x h_i(x∗)
         = [λ̃∗, µ∗]^T [D_x g̃(x∗); D_x h(x∗)]

where λ̃∗ = {λ∗_j}_{j∈A(x∗)} and g̃(x∗) = {g_j(x∗)}_{j∈A(x∗)}. Since the rows of
[D_x g̃(x∗); D_x h(x∗)] are linearly independent by (LICQ), [λ̃∗, µ∗] is uniquely
determined.
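Numerically, the corollary says that the multipliers solve a full-rank linear
system. A sketch with made-up gradient data:

    import numpy as np

    grad_f = np.array([-1.0, -2.0])      # gradient of f at the KKT point
    G = np.array([[1.0, 0.0],
                  [0.0, 1.0]])           # rows: gradients of active g_j and h_i

    # Solve -grad_f = G.T @ [lam~; mu*]; under (LICQ) the rows of G are
    # linearly independent, so the solution is unique.
    mult, _, rank, _ = np.linalg.lstsq(G.T, -grad_f, rcond=None)
    print(mult, rank == G.shape[0])      # multipliers and uniqueness check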
MFCQ

We say that the Mangasarian-Fromovitz constraint qualification (MFCQ) holds at
a feasible point x∗ if the gradient vectors ∇h_i(x∗), i = 1, . . . , n, are
linearly independent and there exists a direction d ∈ R^N with

∇_x g_j(x∗)d < 0 ∀j ∈ A(x∗) and ∇_x h_i(x∗)d = 0 ∀i

It can be shown that (LICQ) ⇒ (MFCQ) and therefore we can conclude that
(LICQ) ⇒ (MFCQ) ⇒ (ACQ).
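Since the direction d can be rescaled, (MFCQ) can be checked via the linear
feasibility problem "∇_x g_j(x∗)d ≤ −1 for the active j, ∇_x h_i(x∗)d = 0". A
sketch with made-up gradients (the linear independence of the ∇h_i still has to
be checked separately, e.g. via the rank of D_x h(x∗)):

    import numpy as np
    from scipy.optimize import linprog

    Dg_active = np.array([[1.0, 0.0]])   # rows: gradients of the active g_j
    Dh = np.array([[0.0, 1.0]])          # rows: gradients of the h_i

    N = Dg_active.shape[1]
    res = linprog(c=np.zeros(N),                            # feasibility only
                  A_ub=Dg_active, b_ub=-np.ones(len(Dg_active)),  # Dg d <= -1
                  A_eq=Dh, b_eq=np.zeros(len(Dh)),                # Dh d = 0
                  bounds=[(None, None)] * N)
    print(res.status == 0, res.x)        # an MFCQ direction, e.g. d = (-1, 0)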
Remark 1.
Although this seems to imply a well-behaved concave program, you should be
aware that it actually does not, because the choice set is not convex for this
problem.
Example

max f(x, y) = x² + x + 4y²
s.t. 2x + 2y ≤ 1
     x ≥ 0
     y ≥ 0
Since all constraints are linear, we know that the constraint qualification and
hence (ACQ) is satisfied. The Hessian of the objective function is

H = [ 2  0
      0  8 ]

Since H is positive definite, f is convex rather than concave, so the KKT
conditions are not sufficient here and we have to compare all KKT points at
the end.
(1) 2x + 1 − 2λ + µ1 = 0
(2) 8y − 2λ + µ2 = 0
(3) 1 − 2x − 2y ≥ 0
(4) λ≥0
(5) µ1 ≥ 0
(6) µ2 ≥ 0
(7) λ(1 − 2x − 2y) = 0
(8) µ1 x = 0
(9) µ2 y = 0
(10) x≥0
(11) y≥0
from (1): 2x + 1 = 2λ − µ_1
  if x = 0 : 1 = 2λ − µ_1 ⇒ 1 + µ_1 = 2λ ⇒ λ > 0 ⇒ 2x + 2y = 1 by (7)
  if x > 0 : µ_1 = 0 by (8) ⇒ 2x + 1 = 2λ ⇒ λ > 0 ⇒ 2x + 2y = 1 by (7)
from (2): 8y = 2λ − µ_2
  if y = 0 : µ_2 = 2λ > 0 ⇒ 2x + 2y = 1
  if y > 0 : µ_2 = 0 by (9) ⇒ 8y = 2λ ⇒ λ > 0 ⇒ 2x + 2y = 1 by (7)
Therefore we know that in any case the first constraint must be binding
and we have

x + y = 1/2
CASE 1: y = 0, x = 1/2 ⇒ µ_1 = 0, µ_2 > 0

from (1): 1 + 1 − 2λ = 0 ⇒ λ = 1
from (2): −2 + µ_2 = 0 ⇒ µ_2 = 2
CASE 2:
y = 21 x = 0 ⇒ µ2 = 0 µ1 > 0
from (2)
4 − 2λ = 0
⇒ λ = 2
from (1)
1 − 4 + µ1 = 0
⇒ µ1 = 3
CASE 3: x + y = 1/2, x > 0, y > 0 ⇒ µ_1 = 0, µ_2 = 0

from (1): 2x + 1 = 2λ
from (2): 8y = 2λ

⇔ 2x + 1 = 8y

Substituting x = 1/2 − y:

2(1/2 − y) + 1 = 8y
2 = 10y
y = 1/5
⇒ x = 3/10
⇒ λ = 4/5
Now evaluate the objective function at the candidates:

x = 1/2, y = 0 :    1/4 + 1/2 = 3/4
x = 0, y = 1/2 :    4 · 1/4 = 1      ←− optimal solution
x = 3/10, y = 1/5 : 9/100 + 3/10 + 4 · 1/25 = 9/100 + 30/100 + 16/100 = 55/100 = 11/20
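As a cross-check (my own verification, not part of the original notes), the
three KKT candidates can be compared programmatically:

    f = lambda x, y: x ** 2 + x + 4 * y ** 2

    candidates = [(0.5, 0.0), (0.0, 0.5), (0.3, 0.2)]
    for x, y in candidates:
        # feasibility: 2x + 2y <= 1, x >= 0, y >= 0
        assert 2 * x + 2 * y <= 1.0 + 1e-12 and x >= 0.0 and y >= 0.0
        print("f(%.1f, %.1f) = %.2f" % (x, y, f(x, y)))
    # prints 0.75, 1.00 and 0.55; the maximizer is (x, y) = (0, 1/2), value 1.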