Unconstrained Minimization in R: Newton Methods

Download as pdf or txt
Download as pdf or txt
You are on page 1of 5

Chapter 4

Unconstrained Minimization in Rn :
Newton Methods

1. Motivation
Again the problem here is
min f (x) (1)
x∈Rn

The basic idea is to take a quadratic approximation around the current point xi and minimize
this approximation to get xi+1 . So we have:
1
minn f (xi ) + ∇f (xi )(x − xi ) + (x − xi )∇2 f (xi )(x − xi ).
x∈R 2
By setting the gradient of this approximation to zero, we get:

∇f (xi ) + ∇2 f (xi )(xi+1 − xi ) = 0.

If ∇2 f (xi )−1 exists then


xi+1 = xi − ∇2 f (xi )−1 ∇f (xi )
which is the classical Newton iteration. An equivalent interpretation is obtained by linearizing
∇f (x) around xi and solving the linearized equations for xi+1 .

2. Local Convergence of Newton’s Method and its Superlinear and Quadratic Rates
of Convergence

1 Theorem Let g: Rn → Rn be differentiable on an open neighborhood of a solution x̄ of


g(x) = 0 and let ∇g be continuous and nonsingular at x̄. Then:

(a) x̄ is a point of attraction of the Newton iteration

xk+1 = xk − ∇g(xk )−1 g(xk )

and
kxk+1 − x̄k
lim =0 (Superlinear quotient rate)
k→∞ kxk − x̄k

(b) Moreover, if
k∇g(x) − ∇g(x̄)k <
= αkx − x̄k

for all x in some neighborhood N (x̄) of x̄, that is g ∈ LCα1 (N (x̄)), then there is a positive
constant c s.t.:
2
kxk+1 − x̄k <
= ckxk − x̄k (Quadratic quotient rate)

for all k >


= k0 where k0 depends on x0 .

1
Proof.

(a) We first show that ∇f (x) is nonsingular for all x in a neighborhood of x̄. Set β =
k∇g(x̄)−1 k and let 0 < ε < (2β)−1 . By the continuity
of ∇g at x̄ there is a δ > 0
s.t. k∇g(x) − ∇g(x̄)k = ε whenever x ∈ S: = {x kx − x̄k <
<
= δ}. Hence by the Banach

Perturbation Lemma A.18:


 −1 β
k ∇g(x̄) + ∇g(x) − ∇g(x̄) k = k∇g(x)−1 k <
= < 2β
1 − βε

Define the Newton mapping:

h(x) = x − ∇g(x)−1 g(x).

By B.6.2:
∇h(x̄) = I − ∇g(x̄)−1 ∇g(x̄) = 0.
Hence by the Ostrowski Point of Attraction Theorem B.6.1, x̄ is a point of attraction.
Moreover since h is differentiable at x̄, that is:

kh(xk ) − h(x̄) − ∇h(x̄)(xk − x̄)k


lim = 0,
k→∞ kxk − x̄k

whenever lim xk = x̄. Since ∇h(x̄) = 0, this gives:


k→∞

kxk+1 − x̄k
lim = 0.
k→∞ kxk − x̄k

(b) By the Quadratic Bound Lemma B.1.3:


α
kg(x) − g(x̄) − ∇g(x̄)(x − x̄)k <
= kx − x̄k2 ,
2

and

kh(x) − h(x̄)k = kx − ∇g(x)−1 g(x) − x̄k


 
<
= k − ∇g(x)−1 g(x) − g(x̄) − ∇g(x̄)(x − x̄) k
 
+ k∇g(x)−1 ∇g(x) − ∇g(x̄) (x − x̄)k
<
= βαkx − x̄k2 + 2βαkx − x̄k2 ,

in a suitable neighborhood of x̄. Hence:


2
kxk+1 − x̄k <
= 3βαkxk − x̄k ,

provided that for a given x0 , k0 , is chosen so that xk lies in the neighborhood of all
k>= k0 , and by part (a) such a k0 is guaranteed to exist.

2
3. Local convergence of Newton’s method, global convergence of damped Newton’s
method and their quadratic rate of convergence

1 Theorem (Local Newton convergence & its quadratic rate)


Consider the Newton algorithm

xi+1 = xi − ∇2 f (xi )−1 ∇f (xi )

for solving ∇f (x) = 0 where f : Rn → R. Assume that:

(i) ∃x̄: ∇f (x̄) = 0

(ii) k∇2 f (y) − ∇2 f (x)k <


= Rky − xk, ∀x, y ∈ B(x̄) for some R > 0

1
(iii) k∇2 f (x)−1 k <
= , ∀x ∈ B(x̄) for some m > 0 and B 2m (x̄) ⊂ B(x̄)
m R

2m
(iv) kx0 − x̄k <
R

Then

R
(a): (i), (ii) and (iii) =⇒ kxi+1 − x̄k <
= kxi − x̄k2
2m
(b): (i), (ii), (iii) and (iv)=⇒ ∀i : xi ∈ B 2m (x̄) & xi → x̄
R

Note: B 2m (x̄) ⊂ B(x̄) can be induced by taking R larger, but then x0 must be closer to x̄.
R

Proof.

(a)
 
xi+1 − x̄ = xi − x̄ − ∇2 fi−1 ∇fi − ∇f (x̄) By (i)
h  it=1
= xi − x̄ − ∇2 fi−1 ∇f x̄ + t(xi − x̄)
t=0
Z 1   
= xi − x̄ − ∇2 fi−1 ∇2 f x̄ + t(xi − x̄) − ∇2 fi + ∇2 fi (xi − x̄)dt
0
(By continuity of ∇2 f from (ii))
Hence:
Z 1  
kxi+1 − x̄k <
= k∇2 fi−1 k k∇2 f x̄ + t(xi − x̄) − ∇2 fi k · kxi − x̄kdt
0
(By Lemma B.2.2)
1 1
Z
<
= R(1 − t)kxi − x̄kdtkxi − x̄k (By (ii), (iii))
m 0
R
= kxi − x̄k2
2m

3
(b) By induction:
! !2i
R < R
kxi − x̄k = kx0 − x̄k i = 1, 2, . . .
2m 2m

2 i
(Induction proof: ai <
= a0

2 i
= a0 . Then:
The inequality holds for i = 1. Suppose it holds for i, that is ai <

 i
2 i+1
2< 2
ai+1 <
= (ai ) = ao = a2i ,

where the first inequality follows from part (a). Thus the induction is extended to i + 1.)

Hence !2i
2m R 2m
kxi − x̄k <
= kx0 − x̄k < ,
R 2m R
and lim kxi − x̄k = 0.
i→∞

2 Theorem (Globally convergent damped Newton & its rate of convergence)


Consider the damped Newton algorithm:

Direction choice pi = −∇2 f (xi )−1 ∇f (xi )

Stepsize: xi+1 = xi + λi pi and λi chosen such that:

f (xi ) − f (xi+1 ) >


= σ2 (−∇f (xi )pi ),

for some forcing function σ2 . (Hence all stepsizes of SCAT1 can be used. 1 )

Assumption
(v) f ∈ C 2 and ∃M >
= m > 0 s.t. ∀x, y ∈ RN :

mkyk2 < 2 <


= y∇ f (x)y = M kyk
2

Then

(a): (v) =⇒ {xi } → x̄ where x̄ is the unique solution of minn f (x)


x∈R
(b): h (ii), (v) and “min. or first stationary point along pi ” stepsizei =⇒
1
M 2R
kxi+1 −x̄k <
= 3 kxi −x̄k2
2m 2
1
O. L. Mangasarian: “Parallel gradient distribution in unconstrained optimization”, SIAM Journal on Optimiza-
tion 33, 1995, 1916-1925, Theorem 2.

4
Proof.
1
(a) −∇fi pi = ∇fi ∇2 fi−1 ∇fi = k∇fi k2 =: σ(k∇fi k)
M 2
= f (xi ) the bounded set S = {x f (x) = f (x0 )} ⊂{x kx−x0 k =
Because f (xi+1 ) < < < k∇f(x0 )k}

m
contains {xi }. But:
1 1
kpi k2 = ∇fi ∇2 fi−2 ∇fi <
= 2
k∇fi k2 <
= max ||∇f (xi )k2 .
M m2 x∈S
So {xi , pi } is bounded and hence must have an accumulation point. But for each such
accumulation point (x̄, p̄), x̄ is the unique solution of minn f (x). So {xi } → x̄.
x∈R
(b) We need the following lemma:
3 Lemma Let f : Rn → R, let M >
= m>0 s.t. ∀x, y ∈ Rn

M kyk2 > 2 >


= y∇ f (x)y = mkyk
2

Then * +
M 1
2
f (y) <
= f (x), ∇f (x̄ = 0 =⇒ ky − x̄k <
= kx − x̄k
m
Proof of Lemma
1
f (y) = f (x̄) + ∇f (x̄)(y − x̄) + (y − x̄)∇2 f (s)(y − x̄)
2
1
f (x) = f (x̄) + ∇f (x̄)(x − x̄) + (x − x̄)∇2 f (t)(x − x̄),
2

where s ∈ (x̄, y) and t ∈ (x̄, x). Since, ∇f (x̄), subtraction gives:


1 1
0>
= f (y) − f (x) = (y − x̄)∇2 f (s)(y − x̄) − (x − x̄)∇2 f (t)(x − x̄)
2 2
1 1
>
= − M kx − x̄k2 + mky − x̄k2
2 2
Going back to the proof of part (b), let x̄i = xi − ∇2 fi−1 ∇fi , that is, x̄i is the “xi+1 ” of the
Newton algorithm. Because we are minimizing along pi , obviously f (xi+1 ) < = f (x̄i ) and hence

by the above lemma:


M 1
2
kxi+1 − x̄k <
= kx̄i − x̄k.
m
By part (a) of the previous theorem, Theorem 1,
R
kxi+1 − x̄k <
= kxi − x̄k2 .
2m
Hence 1
M 2R
kxi+1 − x̄k <
= 3 kxi − x̄k2
2m 2

You might also like