Karush-Kuhn-Tucker
Moritz Kuhn
CDSEM Uni Mannheim
November 2006
max_{x∈R^N} f(x)
s.t. g_j(x) ≥ 0   j = 1, . . . , m
     h_i(x) = 0   i = 1, . . . , n
(x̃, λ̃, µ̃) s.t. L(x̃, λ̃, µ̃) = min_{µ, λ≥0} max_x L(x, λ, µ)

We know that

L(x, λ̃, µ̃) ≤ L(x̃, λ̃, µ̃) ≤ L(x̃, λ, µ)   ∀x and ∀(λ, µ) with λ ≥ 0,

i.e. (x̃, λ̃, µ̃) is a critical point of L(x, λ, µ), but neither a minimum nor a
maximum.
Consider first the FOC with respect to x, which characterize the critical points
of L(x, λ, µ) and are necessary for a maximum:

∇_x f(x̃) + Σ_{j=1}^m λ_j ∇_x g_j(x̃) + Σ_{i=1}^n µ_i ∇_x h_i(x̃) = 0
Furthermore, consider the FOC of L(x̃, λ, µ) with respect to (λ, µ), which are
necessary for a minimum of L(x̃, λ, µ), where

L(x̃, λ, µ) = f(x̃) + Σ_{j=1}^m λ_j g_j(x̃) + Σ_{i=1}^n µ_i h_i(x̃)
Define

d(λ, µ) := L(x̃, λ, µ)

The function d(λ, µ) is also called the dual function of the problem. Notice
that d(λ, µ) is affine in (λ, µ), irrespective of the functional forms of
f(x), g_j(x), h_i(x). Minimizing it over µ and λ ≥ 0 is therefore a linear
programming problem, and the minimum is either f(x̃) or it does not exist:

min_{µ, λ≥0} d(λ, µ) = { f(x̃)   if g_j(x̃) ≥ 0 ∀j and h_i(x̃) = 0 ∀i
                       { −∞     else
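As a numerical illustration, consider the following sketch (the functions f, g, h
and the points below are made-up toy data, not taken from these notes): for a
fixed x̃ the dual function is affine in (λ, µ), its minimum equals f(x̃) at a
feasible point, and it is unbounded below at an infeasible one.

    import numpy as np

    # Toy problem data (hypothetical): one inequality, one equality constraint.
    f = lambda x: -(x[0] - 1.0) ** 2        # objective
    g = lambda x: np.array([x[0]])          # constraint g(x) >= 0
    h = lambda x: np.array([x[0] - 0.5])    # constraint h(x) = 0

    def d(lam, mu, x_tilde):
        # Dual function: for fixed x_tilde this is affine in (lam, mu).
        return f(x_tilde) + lam @ g(x_tilde) + mu @ h(x_tilde)

    x_feas = np.array([0.5])     # feasible: g = 0.5 >= 0, h = 0
    x_infeas = np.array([-1.0])  # infeasible: g = -1 < 0

    # At the feasible point the minimum over lam >= 0 and free mu is attained
    # at lam = 0 (the coefficient g(x~) is nonnegative) and equals f(x~):
    print(d(np.zeros(1), np.zeros(1), x_feas))            # -0.25 = f(x_feas)
    # At the infeasible point, increasing lam drives d below any bound:
    for lam in (1.0, 10.0, 100.0):
        print(d(np.array([lam]), np.zeros(1), x_infeas))  # -5, -14, -104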
From this result, we can conclude that every saddle point must be a solution
to the original maximization problem. To see why, consider two arguments:
1) A saddle point exists iff x̃ is feasible for the maximization problem, i.e.
g_j(x̃) ≥ 0 ∀j ∧ h_i(x̃) = 0 ∀i
2) The Lagrange function with (λ̃, µ̃) overestimates the objective function
on the feasible set. To see this, consider the following equivalent
problem

max_x f(x) + Σ_{j=1}^m I(g_j(x) ≥ 0) + Σ_{i=1}^n I(h_i(x) = 0)

with

I(g_j(x) ≥ 0) = { 0    if g_j(x) ≥ 0
                { −∞   else

I(h_i(x) = 0) = { 0    if h_i(x) = 0
                { −∞   else
This means we penalize the objective function for violations of the constraints.
This problem is equivalent to the first, if we assume that a solution exists.
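A minimal sketch of this indicator reformulation (the concrete f and g below are
hypothetical, chosen only for illustration):

    import numpy as np

    f = lambda x: -(x - 1.0) ** 2
    g = lambda x: x                      # single inequality constraint x >= 0

    def penalized(x):
        # Indicator penalty: 0 on the feasible set, -inf off it.
        I_g = 0.0 if g(x) >= 0 else -np.inf
        return f(x) + I_g

    print(penalized(0.5))    # = f(0.5) = -0.25: feasible, no penalty
    print(penalized(-0.5))   # = -inf: infeasible, maximally penalized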
If we now replace the indicator functions by the linear functions λ̃_j g_j(x) and
µ̃_i h_i(x) with

λ̃_j ≥ 0 ∀j, µ̃_i ∈ R ∀i,

i.e. if we pass to the Lagrangian L(x, λ̃, µ̃), then for feasible values, i.e.
g_j(x) ≥ 0 ∀j and h_i(x) = 0 ∀i, we clearly overestimate the true objective
function, since λ̃_j g_j(x) ≥ 0 = I(g_j(x) ≥ 0) and µ̃_i h_i(x) = 0 = I(h_i(x) = 0).
We, therefore, know that there are no other (feasible) choices x̂ with a higher
value of the objective function than the saddle point of the problem with value
f(x̃), i.e.
f(x̃) = L(x̃, λ̃, µ̃) ≥ L(x̂, λ̃, µ̃) = f(x̂) + Σ_{j=1}^m λ̃_j g_j(x̂) ≥ f(x̂),

where the µ̃_i h_i(x̂)-terms vanish and the λ̃_j g_j(x̂)-terms are nonnegative
because x̂ is feasible.
Hence, we have shown that every saddle point is a solution to the maximiza-
tion problem.
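The chain of inequalities is easy to verify numerically. The sketch below uses
the hypothetical one-dimensional problem max −x² s.t. x − 1 ≥ 0, whose saddle
point (x̃, λ̃) = (1, 2) follows from the stationarity condition −2x + λ = 0:

    import numpy as np

    f = lambda x: -x ** 2
    g = lambda x: x - 1.0          # constraint x >= 1
    x_tilde, lam = 1.0, 2.0        # saddle point of L(x, lam) = f(x) + lam*g(x)

    L = lambda x: f(x) + lam * g(x)

    for x_hat in np.linspace(1.0, 3.0, 5):         # feasible candidates
        chain = (f(x_tilde), L(x_hat), f(x_hat))
        assert chain[0] >= chain[1] >= chain[2]    # f(x~) >= L(x^) >= f(x^)
        print("f(x~)=%+.2f  L(x^)=%+.2f  f(x^)=%+.2f" % chain)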
Furthermore, we have already shown that

λ̃_j ≥ 0 if g_j(x̃) = 0
λ̃_j = 0 if g_j(x̃) > 0
µ̃_i ∈ R, h_i(x̃) = 0 ∀i

and

∇_x f(x̃) + Σ_{j=1}^m λ̃_j ∇_x g_j(x̃) + Σ_{i=1}^n µ̃_i ∇_x h_i(x̃) = 0
Hence we can conclude that every saddle point satisfies the Karush-Kuhn-Tucker
conditions

∇_x f(x̃) + Σ_{j=1}^m λ̃_j ∇_x g_j(x̃) + Σ_{i=1}^n µ̃_i ∇_x h_i(x̃) = 0

g_j(x̃) ≥ 0 ∀j   h_i(x̃) = 0 ∀i

λ̃_j ≥ 0, µ̃_i ∈ R and λ̃_j g_j(x̃) = 0 ∀i, j
Therefore, we get:

x̃ is a maximizer ⇐ (x̃, λ̃, µ̃) is a saddle point of L(x, λ, µ) ⇒ (x̃, λ̃, µ̃)
satisfies the Karush-Kuhn-Tucker conditions
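At a candidate point, all three blocks of conditions can be checked numerically.
The helper below is an illustrative sketch (its name, inputs and the toy data
are my own, not from the notes):

    import numpy as np

    def kkt_residuals(grad_f, Dg, Dh, g_vals, h_vals, lam, mu):
        # Stationarity, feasibility and complementary slackness residuals.
        stationarity = grad_f + Dg.T @ lam + Dh.T @ mu
        feas_g = np.minimum(g_vals, 0.0)    # nonzero entries violate g >= 0
        comp = lam * g_vals                 # should be exactly zero
        return stationarity, feas_g, h_vals, comp

    # Toy check: max -(x1^2 + x2^2) s.t. x1 + x2 - 1 >= 0, solved by (1/2, 1/2).
    x = np.array([0.5, 0.5])
    print(kkt_residuals(grad_f=-2 * x,
                        Dg=np.array([[1.0, 1.0]]),   # row j: gradient of g_j
                        Dh=np.zeros((0, 2)),         # no equality constraints
                        g_vals=np.array([0.0]), h_vals=np.zeros(0),
                        lam=np.array([1.0]), mu=np.zeros(0)))
    # All residual blocks are zero, so (x~, lam~) is a KKT point.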
The next question we want to examine is under which conditions a KKT
point, i.e. a point that satisfies the Karush-Kuhn-Tucker conditions, is also
a saddle point of the Lagrangian.
The first issue is under which conditions the FOC with respect to x are
sufficient for a maximum of the Lagrangian. We already know that the FOC are
sufficient if we consider a concave programming problem (concave objective
function and convex choice set).
The Lagrangian is concave in x if

f : R^N → R and g_j : R^N → R ∀j

are concave functions and the h_i : R^N → R are affine. Remember that the sum
of concave functions is concave, that λ ≥ 0, and that an affine term µ_i h_i(x)
is concave for either sign of µ_i.
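For twice differentiable functions, concavity can be checked through the
Hessian. A small sketch with a made-up quadratic:

    import numpy as np

    # f(x) = -x1^2 - 4*x2^2 has constant Hessian H; f is concave iff H is
    # negative semidefinite, i.e. all eigenvalues are <= 0.
    H = np.array([[-2.0, 0.0],
                  [0.0, -8.0]])
    print(np.all(np.linalg.eigvalsh(H) <= 0))   # True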
Now let us check that, with this concavity assumption, the choice set is
convex. Assume x′ and x′′ are feasible, i.e.

g_j(x′) ≥ 0, h_i(x′) = 0 and g_j(x′′) ≥ 0, h_i(x′′) = 0 ∀i, j

For α ∈ [0, 1], concavity implies

g_j(αx′ + (1 − α)x′′) ≥ α g_j(x′) + (1 − α) g_j(x′′) ≥ 0
h_i(αx′ + (1 − α)x′′) ≥ α h_i(x′) + (1 − α) h_i(x′′) = 0

For feasibility with respect to h_i(x), we see from the second inequality that
it must hold with equality, i.e. h_i(·) must be affine. We can therefore write
the equality constraints as Ax = b, where A is a matrix of dimension n × N and
b is a vector of dimension n.
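The affine structure is exactly what keeps convex combinations feasible. A
small sketch with made-up data A, b:

    import numpy as np

    A = np.array([[1.0, 1.0, 0.0]])      # n = 1 equality constraint, N = 3
    b = np.array([1.0])

    x1 = np.array([1.0, 0.0, 5.0])       # A @ x1 = b
    x2 = np.array([0.0, 1.0, -3.0])      # A @ x2 = b
    alpha = 0.3
    xc = alpha * x1 + (1.0 - alpha) * x2
    print(A @ xc - b)                    # [0.]: the whole segment stays feasible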
3 Necessity of the Karush-Kuhn-Tucker conditions and the existence of Lagrange multipliers
To get the necessity of the KKT conditions, we have to do a bit more. First,
we define a new object. A cone C in R^N is a set of points s.t.

x ∈ C ⇒ λx ∈ C ∀λ > 0

We furthermore define the tangent cone at x̄ ∈ M as

T(x̄, M) = {d ∈ R^N : ∃α_k > 0, x_k ∈ M : lim_{k→∞} x_k = x̄ ∧ lim_{k→∞} α_k(x̄ − x_k) = d}
Next, we want to establish the connection between the tangent cone and an
optimal solution. To do so we define the set of feasible choices as
F := {x : g(x) ≥ 0 ∧ h(x) = 0}
Lemma 1. Every solution x∗ to the maximization problem satisfies

g(x∗) ≥ 0, h(x∗) = 0 and ∇_x f(x∗)d ≥ 0 ∀d ∈ T(x∗, F)

Proof. Take d ∈ T(x∗, F) with α_k > 0 and feasible x_k → x∗ such that
α_k(x∗ − x_k) → d. Since x∗ is optimal, f(x∗) ≥ f(x_k) for k large enough and
hence

α_k (f(x∗) − f(x_k)) ≥ 0

A first-order expansion f(x_k) = f(x∗) + ∇_x f(x∗)(x_k − x∗) + o(‖x_k − x∗‖)
turns this into α_k ∇_x f(x∗)(x∗ − x_k) + α_k o(‖x∗ − x_k‖) ≥ 0, and letting
k → ∞ gives ∇_x f(x∗)d ≥ 0.
Lemma 2. For continuously differentiable functions

g : R^N → R^m and h : R^N → R^n

we have

T(x, {x : g(x) ≥ 0, h(x) = 0}) ⊂ TL(x, g, h)

for all feasible x, where

TL(x, g, h) := {d ∈ R^N : ∇_x g_j(x)d ≤ 0 ∀j ∈ A(x), ∇_x h_i(x)d = 0 ∀i}

denotes the linearized cone and A(x) := {j : g_j(x) = 0} the set of active
inequality constraints.

Proof. Take d ∈ T(x̄, F) with α_k > 0 and feasible x_k → x̄ such that
α_k(x̄ − x_k) → d. For j ∈ A(x̄) we have g_j(x̄) = 0 and g_j(x_k) ≥ 0, so

g_j(x_k) − g_j(x̄) ≥ 0
⇔ α_k ∇_x g_j(x̄)(x̄ − x_k) + α_k o(‖x̄ − x_k‖) ≤ 0

and for x_k → x̄

∇_x g_j(x̄)d ≤ 0 ∀j ∈ A(x̄)

The same argument applied to h_i(x_k) = h_i(x̄) = 0 in both directions gives
∇_x h_i(x̄)d = 0 ∀i. Hence, lemma 2 is proven.
In general, however, the reverse inclusion does not hold:

TL(x, g, h) ⊄ T(x, F)

The condition

TL(x, g, h) = T(x, F)

is also called the Abadie constraint qualification (ACQ) for feasible x.
Later on, we consider sufficient conditions for (ACQ). See section 4 about
constraint qualifications.
Now, we need an additional lemma that I will not prove here.

Farkas' Lemma 1. For A ∈ R^{N×m}, B ∈ R^{N×n} and c ∈ R^N:

(∀d ∈ R^N : A^T d ≤ 0 ∧ B^T d = 0 ⇒ c^T d ≤ 0)
⇔ (∃u ∈ R^m_+ and v ∈ R^n s.t. Au + Bv = c)
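The second alternative of the lemma can be checked numerically as a linear
feasibility problem: search for u ≥ 0 and free v with Au + Bv = c. A sketch
using scipy's linprog (the matrices below are made-up data):

    import numpy as np
    from scipy.optimize import linprog

    A = np.array([[1.0, 0.0],
                  [0.0, 1.0]])           # N x m
    B = np.array([[1.0],
                  [1.0]])                # N x n
    c = np.array([2.0, 3.0])

    m, n = A.shape[1], B.shape[1]
    res = linprog(c=np.zeros(m + n),     # pure feasibility problem
                  A_eq=np.hstack([A, B]), b_eq=c,
                  bounds=[(0, None)] * m + [(None, None)] * n)
    print(res.status == 0, res.x)        # status 0: some (u, v) exists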
Using this lemma, we can finally prove that the KKT conditions are necessary
for an optimum of the maximization problem.

Kuhn-Tucker Theorem 1. If x∗ is a (local) optimum of the problem and (ACQ)
holds at x∗, then there exist multipliers λ∗ ∈ R^m_+ and µ∗ ∈ R^n with
λ∗_j g_j(x∗) = 0 ∀j such that (x∗, λ∗, µ∗) satisfies the Karush-Kuhn-Tucker
conditions.

Proof. Recall from lemma 1 that

∇_x f(x∗)d ≥ 0 ∀d ∈ T(x∗, F)

and hence, by (ACQ),

∇_x f(x∗)d ≥ 0 ∀d ∈ TL(x∗, g, h)

In other words, every d with ∇_x g_j(x∗)d ≤ 0 ∀j ∈ A(x∗) and ∇_x h_i(x∗)d = 0 ∀i
satisfies (−∇_x f(x∗))^T d ≤ 0. Applying Farkas' lemma with the gradients of the
active inequality constraints as the columns of A, the gradients of the equality
constraints as the columns of B and c = −∇f(x∗) yields u ∈ R^{|A(x∗)|}_+ and
v ∈ R^n with Au + Bv = c, i.e.

∇g(x∗)λ∗ + ∇h(x∗)µ∗ = −∇f(x∗)

where λ∗_j = u_j ≥ 0 for j ∈ A(x∗), λ∗_j = 0 for the inactive constraints (so
that λ∗_j g_j(x∗) = 0 ∀j) and µ∗ = v.
Hence, we can conclude that the KKT conditions are necessary for an optimum.
From our previous discussion, we know furthermore that, if f, g are concave and
h is affine, then the KKT conditions are sufficient for a saddle point.
4 Constraint Qualifications
Next we discuss different versions of constraint qualifications. A set of
conditions is called a constraint qualification if the conditions imply that
(ACQ) holds. I will not prove that the constraint qualifications are indeed
sufficient to imply the (ACQ).
ACQ

We say that the Abadie constraint qualification (ACQ) holds at a feasible
point x∗ if

T(x∗, F) = TL(x∗, g, h)

where T(x∗, F) denotes the tangent cone of F at x∗ ∈ F, F is the feasible set,
and TL(x∗, g, h) denotes the linearized cone at x∗ ∈ F.
LICQ

An easy to check constraint qualification is the linear independence constraint
qualification (LICQ). The constraint functions satisfy the (LICQ) at x∗ iff the
matrix

[D_x g̃(x∗); D_x h(x∗)]^T

where g̃(x∗) = {g_j(x∗)}_{j∈A(x∗)} collects the active inequality constraints,
has full column rank, i.e. the rows of the Jacobian matrices for the active
inequality constraints and the equality constraints are linearly independent.
This condition furthermore implies the uniqueness of the Lagrange multipliers.
Corollary 1.
If (LICQ) is satisfied, then the Lagrange multipliers are determined uniquely
at a KKT point.
Proof. We know that (x∗, λ∗, µ∗) must satisfy

∇f(x∗) + Σ_{j=1}^m λ∗_j ∇_x g_j(x∗) + Σ_{i=1}^n µ∗_i ∇_x h_i(x∗) = 0

to be a KKT point. Since λ∗_j = 0 for the inactive constraints, this is
equivalent to

−∇f(x∗) = Σ_{j∈A(x∗)} λ∗_j ∇_x g_j(x∗) + Σ_{i=1}^n µ∗_i ∇_x h_i(x∗)
         = [λ̃∗, µ∗]^T [D_x g̃(x∗); D_x h(x∗)]

where λ̃∗ = {λ∗_j}_{j∈A(x∗)} and g̃(x∗) = {g_j(x∗)}_{j∈A(x∗)}. Since the rows of
[D_x g̃(x∗); D_x h(x∗)] are linearly independent by (LICQ), [λ̃∗, µ∗] is uniquely
determined.
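Numerically, the corollary says that the multipliers solve a full-rank linear
system. A sketch with made-up gradient data:

    import numpy as np

    grad_f = np.array([-1.0, -2.0])      # gradient of f at the KKT point
    G = np.array([[1.0, 0.0],
                  [0.0, 1.0]])           # rows: gradients of active g_j and h_i

    # Solve -grad_f = G.T @ [lam~; mu*]; under (LICQ) the rows of G are
    # linearly independent, so the solution is unique.
    mult, _, rank, _ = np.linalg.lstsq(G.T, -grad_f, rcond=None)
    print(mult, rank == G.shape[0])      # multipliers and uniqueness check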
MFCQ

We say that the Mangasarian-Fromovitz constraint qualification (MFCQ) holds at
a feasible point x∗ if the gradient vectors ∇h_i(x∗), i = 1, . . . , n, are
linearly independent and there exists a direction d ∈ R^N with

∇_x g_j(x∗)d < 0 ∀j ∈ A(x∗) and ∇_x h_i(x∗)d = 0 ∀i

It can be shown that (LICQ) ⇒ (MFCQ) and therefore we can conclude that
(LICQ) ⇒ (MFCQ) ⇒ (ACQ).
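Since the direction d can be rescaled, (MFCQ) can be checked via the linear
feasibility problem "∇_x g_j(x∗)d ≤ −1 for the active j, ∇_x h_i(x∗)d = 0". A
sketch with made-up gradients (the linear independence of the ∇h_i still has to
be checked separately, e.g. via the rank of D_x h(x∗)):

    import numpy as np
    from scipy.optimize import linprog

    Dg_active = np.array([[1.0, 0.0]])   # rows: gradients of the active g_j
    Dh = np.array([[0.0, 1.0]])          # rows: gradients of the h_i

    N = Dg_active.shape[1]
    res = linprog(c=np.zeros(N),                            # feasibility only
                  A_ub=Dg_active, b_ub=-np.ones(len(Dg_active)),  # Dg d <= -1
                  A_eq=Dh, b_eq=np.zeros(len(Dh)),                # Dh d = 0
                  bounds=[(None, None)] * N)
    print(res.status == 0, res.x)        # an MFCQ direction, e.g. d = (-1, 0)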
Remark 1.
Although this seems to imply a well-behaved concave program, you should be
aware that it actually does not, because the choice set is not convex for this
problem.
Example

max f(x, y) = x² + x + 4y²
s.t. 2x + 2y ≤ 1
     x ≥ 0
     y ≥ 0
Since all constraints are linear, we know that the constraint qualification and
hence (ACQ) is satisfied. The Hessian of the objective function is

H = [ 2  0
      0  8 ]

Since H is positive definite, f is convex rather than concave, so the KKT
conditions are not sufficient here and we have to compare all KKT points at
the end.
(1) 2x + 1 − 2λ + µ1 = 0
(2) 8y − 2λ + µ2 = 0
(3) 1 − 2x − 2y ≥ 0
(4) λ≥0
(5) µ1 ≥ 0
(6) µ2 ≥ 0
(7) λ(1 − 2x − 2y) = 0
(8) µ1 x = 0
(9) µ2 y = 0
(10) x≥0
(11) y≥0
from (1): 2x + 1 = 2λ − µ_1
  if x = 0 : 1 = 2λ − µ_1 ⇒ 1 + µ_1 = 2λ ⇒ λ > 0 ⇒ 2x + 2y = 1 by (7)
  if x > 0 : µ_1 = 0 by (8) ⇒ 2x + 1 = 2λ ⇒ λ > 0 ⇒ 2x + 2y = 1 by (7)
from (2): 8y = 2λ − µ_2
  if y = 0 : µ_2 = 2λ > 0 ⇒ 2x + 2y = 1
  if y > 0 : µ_2 = 0 by (9) ⇒ 8y = 2λ ⇒ λ > 0 ⇒ 2x + 2y = 1 by (7)
Therefore we know that in any case the first constraint must be binding
and we have

x + y = 1/2
CASE 1: y = 0, x = 1/2 ⇒ µ_1 = 0, µ_2 > 0

from (1): 1 + 1 − 2λ = 0 ⇒ λ = 1
from (2): −2 + µ_2 = 0 ⇒ µ_2 = 2
CASE 2:
y = 21 x = 0 ⇒ µ2 = 0 µ1 > 0
from (2)
4 − 2λ = 0
⇒ λ = 2
from (1)
1 − 4 + µ1 = 0
⇒ µ1 = 3
CASE 3: x + y = 1/2, x > 0, y > 0 ⇒ µ_1 = 0, µ_2 = 0

from (1): 2x + 1 = 2λ
from (2): 8y = 2λ

⇔ 2x + 1 = 8y

Substituting x = 1/2 − y:

2(1/2 − y) + 1 = 8y
2 = 10y
y = 1/5
⇒ x = 3/10
⇒ λ = 4/5
Now evaluate the objective function at the candidates:

x = 1/2, y = 0 :    1/4 + 1/2 = 3/4
x = 0, y = 1/2 :    4 · 1/4 = 1      ←− optimal solution
x = 3/10, y = 1/5 : 9/100 + 3/10 + 4 · 1/25 = 9/100 + 30/100 + 16/100 = 55/100 = 11/20
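As a cross-check (my own verification, not part of the original notes), the
three KKT candidates can be compared programmatically:

    f = lambda x, y: x ** 2 + x + 4 * y ** 2

    candidates = [(0.5, 0.0), (0.0, 0.5), (0.3, 0.2)]
    for x, y in candidates:
        # feasibility: 2x + 2y <= 1, x >= 0, y >= 0
        assert 2 * x + 2 * y <= 1.0 + 1e-12 and x >= 0.0 and y >= 0.0
        print("f(%.1f, %.1f) = %.2f" % (x, y, f(x, y)))
    # prints 0.75, 1.00 and 0.55; the maximizer is (x, y) = (0, 1/2), value 1.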