Unconstrained minimization
minimize f(x)

• f convex, twice continuously differentiable (hence dom f open)
• we assume the optimal value p★ = inf_x f(x) is attained (and finite)

unconstrained minimization methods produce a sequence of points x^(k) ∈ dom f, k = 0, 1, …, with

f(x^(k)) → p★

can be interpreted as iterative methods for solving the optimality condition

∇f(x★) = 0

Initial point and sublevel set: algorithms in this chapter require a starting point x^(0) such that
• x^(0) ∈ dom f
• sublevel set S = {x | f(x) ≤ f(x^(0))} is closed
2nd condition is hard to verify, except when all sublevel sets are closed:
examples of differentiable functions with closed sublevel sets:

f(x) = log(∑_{i=1}^m exp(a_iᵀx + b_i)),   f(x) = −∑_{i=1}^m log(b_i − a_iᵀx)
Strong convexity and implications

f is strongly convex on S if there exists an m > 0 such that ∇²f(x) ⪰ mI for all x ∈ S

Implications

• for x, y ∈ S,
f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) + (m/2)‖x − y‖₂²
• 𝑆 is bounded
• p★ > −∞ and, for x ∈ S,

f(x) − p★ ≤ (1/(2m)) ‖∇f(x)‖₂²
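The last bound is useful as a stopping criterion when m is known. A quick numerical check on an assumed strongly convex quadratic (the matrix Q, point x, and sizes below are illustrative, not from the text):

```python
import numpy as np

# Illustrative strongly convex quadratic: f(x) = 1/2 x^T Q x, Q ≻ 0,
# so p★ = 0 (minimum at x = 0) and ∇²f = Q ⪰ mI with m = λ_min(Q).
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
Q = M @ M.T + np.eye(5)              # positive definite
m = np.linalg.eigvalsh(Q).min()      # strong convexity constant
p_star = 0.0

x = rng.standard_normal(5)
f_x = 0.5 * x @ Q @ x
grad = Q @ x

# check the suboptimality bound f(x) − p★ ≤ (1/(2m)) ‖∇f(x)‖₂²
bound = np.linalg.norm(grad) ** 2 / (2 * m)
assert f_x - p_star <= bound + 1e-12
```

For a quadratic the bound is guaranteed by Q ⪰ mI, so the assertion always holds; for general strongly convex f the same inequality lets ‖∇f(x)‖₂ certify closeness to p★.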
Descent methods

x^(k+1) = x^(k) + t^(k) Δx^(k)   with   f(x^(k+1)) < f(x^(k))

• other notations: x⁺ = x + tΔx, x := x + tΔx
• Δx is the step, or search direction; t is the step size, or step length
• from convexity, f(x⁺) < f(x) implies

∇f(x)ᵀΔx < 0

(i.e., Δx is a descent direction)
Line search types
• exact line search: t = argmin_{t>0} f(x + tΔx)
• backtracking line search (with parameters α ∈ (0, 1/2), β ∈ (0, 1)): starting at t = 1, repeat t := βt until f(x + tΔx) < f(x) + αt∇f(x)ᵀΔx

[figure: f(x + tΔx) as a function of t, with the lines f(x) + t∇f(x)ᵀΔx and f(x) + αt∇f(x)ᵀΔx; backtracking accepts a step t ∈ (0, t₀]]
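The backtracking rule translates directly into a few lines of code; a minimal sketch (the log-sum-exp test function below is an assumed example, not from the text):

```python
import numpy as np

def backtracking(f, grad_f, x, dx, alpha=0.1, beta=0.8):
    """Backtracking line search: shrink t until
    f(x + t*dx) < f(x) + alpha * t * grad_f(x)^T dx."""
    t = 1.0
    fx, slope = f(x), grad_f(x) @ dx
    assert slope < 0, "dx must be a descent direction"
    while f(x + t * dx) >= fx + alpha * t * slope:
        t *= beta
    return t

# illustrative use on the smooth convex function f(x) = log sum exp(x_i)
f = lambda x: np.log(np.sum(np.exp(x)))
grad_f = lambda x: np.exp(x) / np.sum(np.exp(x))

x = np.array([1.0, -2.0, 0.5])
dx = -grad_f(x)                      # gradient descent direction
t = backtracking(f, grad_f, x, dx)
assert f(x + t * dx) < f(x)          # the accepted step decreases f
```

Termination is guaranteed whenever dx is a descent direction, since the sufficient-decrease condition holds for all small enough t.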
Gradient descent method
general descent method with Δx = −∇f(x)

quadratic problem in R²: f(x) = (1/2)(x₁² + γx₂²) (γ > 0), with exact line search starting at x^(0) = (γ, 1):

x₁^(k) = γ ((γ − 1)/(γ + 1))^k,   x₂^(k) = (−(γ − 1)/(γ + 1))^k

• very slow if γ ≫ 1 or γ ≪ 1
• example for 𝛾 = 10:
[figure: contour plot in the (x₁, x₂)-plane, x₁ ∈ [−10, 10], x₂ ∈ [−4, 4], showing the iterates x^(0), x^(1), … zig-zagging across the contour lines]
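The closed-form iterates above are easy to verify directly. A short sketch for γ = 10, using the fact that for a quadratic (1/2)xᵀQx, exact line search along d = −g has the closed form t = (gᵀg)/(gᵀQg):

```python
import numpy as np

gamma = 10.0
Q = np.diag([1.0, gamma])            # f(x) = 1/2 x^T Q x = 1/2 (x1² + γ x2²)

x = np.array([gamma, 1.0])           # x(0) = (γ, 1)
r = (gamma - 1) / (gamma + 1)
for k in range(20):
    # check the closed-form iterates from the text at step k
    expected = np.array([gamma * r**k, (-r) ** k])
    assert np.allclose(x, expected)
    g = Q @ x                        # ∇f(x)
    t = (g @ g) / (g @ Q @ g)        # exact line search for a quadratic
    x = x - t * g
```

The contraction factor (γ − 1)/(γ + 1) = 9/11 per step explains the slow zig-zag convergence for large γ.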
Nonquadratic example
f(x₁, x₂) = e^{x₁+3x₂−0.1} + e^{x₁−3x₂−0.1} + e^{−x₁−0.1}

[figures: iterates x^(0), x^(1), x^(2) with backtracking line search (left) and with exact line search (right)]
a problem in R¹⁰⁰:

f(x) = cᵀx − ∑_{i=1}^{500} log(b_i − a_iᵀx)
[figure: f(x^(k)) − p★ vs k (log scale from 10⁴ down to 10⁻⁴, k = 0 … 200) for exact and backtracking line search; both show "linear" convergence, i.e., a straight line on the semilog plot]
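A smaller version of this experiment can be sketched with random data; the sizes m = 50, n = 10 and all data below are illustrative assumptions, and the objective is handled as +∞ outside its domain so the line search preserves feasibility:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 10                         # illustrative sizes, not the instance above
A = rng.standard_normal((m, n))
b = rng.uniform(1.0, 2.0, m)          # x = 0 is strictly feasible since b > 0
c = rng.standard_normal(n)

def f(x):
    s = b - A @ x
    return np.inf if np.any(s <= 0) else c @ x - np.sum(np.log(s))

def grad(x):
    s = b - A @ x
    return c + A.T @ (1.0 / s)

x = np.zeros(n)
fx = f(x)
for k in range(200):
    g = grad(x)
    t = 1.0
    # backtracking line search (alpha = 0.1, beta = 0.8); f returns +inf
    # outside dom f, so every accepted step stays strictly feasible
    while f(x - t * g) >= fx - 0.1 * t * (g @ g):
        t *= 0.8
    x = x - t * g
    fx = f(x)

assert np.isfinite(fx) and fx < f(np.zeros(n))   # feasible, monotone decrease
```

Plotting f(x^(k)) against k on a semilog axis for such an instance reproduces the straight-line ("linear convergence") behavior described above.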
Steepest descent method

normalized steepest descent direction (at x, for norm ‖·‖):

Δx_nsd = argmin{∇f(x)ᵀv | ‖v‖ = 1}

interpretation: for small v, f(x + v) ≈ f(x) + ∇f(x)ᵀv; Δx_nsd is the unit-norm step with the most negative directional derivative

unit balls, steepest descent directions for a quadratic norm and ℓ₁-norm:

[figure: the two unit balls, each with −∇f(x) and the corresponding Δx_nsd]
[figure: contour plots of the two runs, with iterates x^(0), x^(1), x^(2), …]
• steepest descent with backtracking line search for two quadratic norms
• ellipses show {x | ‖x − x^(k)‖_P = 1}
• equivalent interpretation of steepest descent with quadratic norm ‖·‖_P : gradient descent after change of variables x̄ = P^{1/2}x
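The change-of-variables interpretation can be checked numerically: an (unnormalized) steepest descent step −tP⁻¹∇f(x) coincides with a gradient step taken in the variables x̄ = P^{1/2}x and mapped back. A sketch, where P, the quadratic f, and the step size are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
P = M @ M.T + np.eye(n)              # P ≻ 0 defines the norm ‖v‖_P

# an assumed smooth convex objective (quadratic plus linear term)
Q = 2.0 * np.eye(n)
q = rng.standard_normal(n)
grad = lambda x: Q @ x + q

# symmetric square root P^{1/2} and its inverse via eigendecomposition
w, V = np.linalg.eigh(P)
P_half = V @ np.diag(np.sqrt(w)) @ V.T
P_half_inv = V @ np.diag(1.0 / np.sqrt(w)) @ V.T

x = rng.standard_normal(n)
t = 0.1

# steepest descent step for the quadratic norm ‖·‖_P: dx = -P^{-1} ∇f(x)
x_sd = x - t * np.linalg.solve(P, grad(x))

# gradient descent in x̄ = P^{1/2} x, mapped back; chain rule gives
# ∇f̄(x̄) = P^{-1/2} ∇f(x)
xbar = P_half @ x
x_gd = P_half_inv @ (xbar - t * P_half_inv @ grad(x))

assert np.allclose(x_sd, x_gd)       # the two updates are identical
```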
Newton step

Δx_nt = −∇²f(x)⁻¹∇f(x)

Interpretations

• x + Δx_nt minimizes the second-order approximation

f̂(x + v) = f(x) + ∇f(x)ᵀv + (1/2)vᵀ∇²f(x)v

• x + Δx_nt solves the linearized optimality condition

∇f(x + v) ≈ ∇f̂(x + v) = ∇f(x) + ∇²f(x)v = 0
[figures: f and its second-order approximation f̂ at (x, f(x)), with minimizer at (x + Δx_nt, f(x + Δx_nt)); f′ and its linearization at (x, f′(x)), crossing zero at (x + Δx_nt, f′(x + Δx_nt))]

• Δx_nt is the steepest descent direction at x in the local Hessian norm ‖u‖_{∇²f(x)} = (uᵀ∇²f(x)u)^{1/2}

[figure: the ellipse {x + v | vᵀ∇²f(x)v = 1}, with x + Δx_nsd and x + Δx_nt]
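Both interpretations can be verified numerically; a sketch with an assumed test function f(x) = −∑ log(1 − x_i²), whose gradient and Hessian have simple closed forms:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4

# assumed smooth convex test function on (-1, 1)^n
f    = lambda x: -np.sum(np.log(1 - x**2))
grad = lambda x: 2 * x / (1 - x**2)
hess = lambda x: np.diag((2 + 2 * x**2) / (1 - x**2) ** 2)

x = rng.uniform(-0.5, 0.5, n)
g, H = grad(x), hess(x)
dx_nt = -np.linalg.solve(H, g)       # Newton step: -∇²f(x)^{-1} ∇f(x)

# second-order model fhat(x + v)
fhat = lambda v: f(x) + g @ v + 0.5 * v @ H @ v

# interpretation 1: dx_nt minimizes fhat (perturbations only increase it)
for _ in range(100):
    v = dx_nt + 0.01 * rng.standard_normal(n)
    assert fhat(dx_nt) <= fhat(v)

# interpretation 2: dx_nt solves ∇f(x) + ∇²f(x) v = 0
assert np.allclose(g + H @ dx_nt, 0)
```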
Newton decrement

λ(x) = (∇f(x)ᵀ∇²f(x)⁻¹∇f(x))^{1/2}

a measure of the proximity of x to x★

Properties

• gives an estimate of f(x) − p★, using the quadratic approximation f̂:

f(x) − inf_y f̂(y) = (1/2)λ(x)²

• equal to the norm of the Newton step in the quadratic Hessian norm:

λ(x) = (Δx_ntᵀ∇²f(x)Δx_nt)^{1/2}

• directional derivative in the Newton direction: ∇f(x)ᵀΔx_nt = −λ(x)²
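These identities are easy to sanity-check; a sketch in which H and g are illustrative stand-ins for a Hessian ∇²f(x) ≻ 0 and gradient ∇f(x):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
M = rng.standard_normal((n, n))
H = M @ M.T + np.eye(n)              # stand-in for ∇²f(x) ≻ 0
g = rng.standard_normal(n)           # stand-in for ∇f(x)

dx_nt = -np.linalg.solve(H, g)       # Newton step

# the two expressions for the Newton decrement agree
lam1 = np.sqrt(g @ np.linalg.solve(H, g))     # (g^T H^{-1} g)^{1/2}
lam2 = np.sqrt(dx_nt @ H @ dx_nt)             # Hessian norm of Newton step
assert np.isclose(lam1, lam2)

# gap between f(x) and the minimum of the quadratic model is λ(x)²/2:
# fhat(x+v) - f(x) = g^T v + 1/2 v^T H v is minimized at v = dx_nt
model_gap = -(g @ dx_nt + 0.5 * dx_nt @ H @ dx_nt)
assert np.isclose(model_gap, 0.5 * lam1**2)

# directional derivative in the Newton direction is -λ(x)²
assert np.isclose(g @ dx_nt, -lam1**2)
```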
Affine invariance

given nonsingular T ∈ R^{n×n}, define f̃(y) = f(Ty)

• Newton iterates for f̃ with starting point y^(0) = T⁻¹x^(0) are y^(k) = T⁻¹x^(k)
• Newton's method is thus independent of affine changes of coordinates
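A numerical check of this identity, using pure Newton steps (t = 1); the function f, the matrix T, and the starting point are illustrative assumptions:

```python
import numpy as np

# assumed smooth convex f(x) = -sum log(1 - x_i²) on (-1, 1)^n
f_grad = lambda x: 2 * x / (1 - x**2)
f_hess = lambda x: np.diag((2 + 2 * x**2) / (1 - x**2) ** 2)

def newton_step(grad, hess, z):
    return z - np.linalg.solve(hess(z), grad(z))

rng = np.random.default_rng(5)
n = 3
T = rng.standard_normal((n, n)) + 3 * np.eye(n)   # nonsingular (illustrative)

# f~(y) = f(Ty): gradient T^T ∇f(Ty), Hessian T^T ∇²f(Ty) T
g_tilde = lambda y: T.T @ f_grad(T @ y)
h_tilde = lambda y: T.T @ f_hess(T @ y) @ T

x = np.array([0.3, -0.2, 0.1])
y = np.linalg.solve(T, x)             # y(0) = T^{-1} x(0)
for k in range(3):
    x = newton_step(f_grad, f_hess, x)
    y = newton_step(g_tilde, h_tilde, y)
    assert np.allclose(y, np.linalg.solve(T, x))  # y(k) = T^{-1} x(k)
```

The algebra behind the assertion: the Newton step for f̃ is −(TᵀHT)⁻¹Tᵀg = T⁻¹Δx_nt, so the two trajectories track each other exactly.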
Classical convergence analysis

Assumptions
• f strongly convex on S with constant m
• ∇²f is Lipschitz continuous on S with constant L > 0: ‖∇²f(x) − ∇²f(y)‖₂ ≤ L‖x − y‖₂

Outline: there exist constants η ∈ (0, m²/L), γ > 0 such that
• (damped Newton phase) if ‖∇f(x)‖₂ ≥ η, then f(x^(k+1)) − f(x^(k)) ≤ −γ
• (quadratically convergent phase) if ‖∇f(x)‖₂ < η, then

(L/(2m²))‖∇f(x^(l))‖₂ ≤ ((L/(2m²))‖∇f(x^(k))‖₂)^{2^{l−k}} ≤ (1/2)^{2^{l−k}},   l ≥ k

conclusion: the number of iterations until f(x) − p★ ≤ ε is bounded above by

(f(x^(0)) − p★)/γ + log₂ log₂(ε₀/ε)

(γ, ε₀ are constants that depend on m, L, x^(0))
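The two phases are visible in practice. A sketch running damped Newton (with backtracking) on an assumed test problem f(x) = cᵀx − ∑ log(1 − x_i²); the data and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5
c = rng.standard_normal(n)

# assumed strongly convex test problem on the box (-1, 1)^n
f    = lambda x: (np.inf if np.any(np.abs(x) >= 1)
                  else c @ x - np.sum(np.log(1 - x**2)))
grad = lambda x: c + 2 * x / (1 - x**2)
hess = lambda x: np.diag((2 + 2 * x**2) / (1 - x**2) ** 2)

x = np.zeros(n)
gaps = []                              # λ(x)²/2, an estimate of f(x) − p★
for k in range(20):
    g, H = grad(x), hess(x)
    dx = -np.linalg.solve(H, g)        # Newton step
    lam2 = -g @ dx                     # λ(x)² = ∇f(x)ᵀ ∇²f(x)⁻¹ ∇f(x)
    gaps.append(lam2 / 2)
    if lam2 / 2 < 1e-10:
        break
    t = 1.0                            # backtracking with α = 0.1, β = 0.8
    while f(x + t * dx) >= f(x) - 0.1 * t * lam2:
        t *= 0.8
    x = x + t * dx

# high accuracy after a handful of iterations (quadratic phase)
assert gaps[-1] < 1e-10 and gaps[-1] < gaps[0]
```

Printing `gaps` shows the characteristic pattern: a few damped steps with modest decrease, then the estimate collapsing roughly as the square of the previous value.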
Examples

[figure: example in R² — iterates x^(0), x^(1); f(x^(k)) − p★ drops from about 10⁵ to 10⁻¹⁵ within k = 5 iterations]

[figures: example in R¹⁰⁰ — f(x^(k)) − p★ vs k for k = 0 … 10, and step size t^(k) vs k for k = 0 … 8]
example in R¹⁰⁰⁰⁰ (sparse a_i):

f(x) = −∑_{i=1}^{10000} log(1 − x_i²) − ∑_{i=1}^{100000} log(b_i − a_iᵀx)
[figure: f(x^(k)) − p★ vs k, decreasing from about 10⁵ to below 10⁻⁵ in roughly 20 iterations]
Self-concordance

Definition
• convex f : R → R is self-concordant if |f‴(x)| ≤ 2f″(x)^{3/2} for all x ∈ dom f
• f : Rⁿ → R is self-concordant if g(t) = f(x + tv) is self-concordant for all x ∈ dom f, v ∈ Rⁿ

Examples on R
• linear and quadratic functions
• negative logarithm f(x) = −log x
• negative entropy plus negative logarithm: f(x) = x log x − log x

Properties
• preserved under positive scaling α ≥ 1, and sum
• preserved under composition with affine function
• if g is convex with dom g = R₊₊ and |g‴(x)| ≤ 3g″(x)/x, then f(x) = log(−g(x)) − log x is self-concordant
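The scalar condition can be checked on a grid. A short sketch for f(x) = −log x, where the condition in fact holds with equality, together with e^x (used here purely as an illustrative counterexample of a convex function that fails the condition):

```python
import numpy as np

# self-concordance condition: |f'''(x)| <= 2 f''(x)^(3/2)

# f(x) = -log x: f''(x) = 1/x², f'''(x) = -2/x³; equality holds everywhere
x = np.linspace(0.1, 10.0, 1000)
assert np.allclose(np.abs(-2 / x**3), 2 * (1 / x**2) ** 1.5)

# f(x) = exp(x) is convex but not self-concordant: the condition
# e^x <= 2 (e^x)^(3/2) fails once x is sufficiently negative
x = np.linspace(-5.0, -2.0, 100)
assert np.any(np.exp(x) > 2 * np.exp(x) ** 1.5)
```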
Examples: properties can be used to show that the following are s.c.
• f(x) = −∑_{i=1}^m log(b_i − a_iᵀx) on {x | a_iᵀx < b_i, i = 1, …, m}
• f(X) = −log det X on Sⁿ₊₊
• f(x) = −log(y² − xᵀx) on {(x, y) | ‖x‖₂ < y}
Convergence analysis for self-concordant functions

there exist constants η ∈ (0, 1/4], γ > 0 such that
• if λ(x) > η, then f(x^(k+1)) − f(x^(k)) ≤ −γ
• if λ(x) ≤ η, then

2λ(x^(k+1)) ≤ (2λ(x^(k)))²

(η and γ only depend on the backtracking parameters α, β)

complexity bound: the number of Newton iterations until f(x) − p★ ≤ ε is bounded above by

(f(x^(0)) − p★)/γ + log₂ log₂(1/ε)

for α = 0.1, β = 0.8, ε = 10⁻¹⁰, the bound evaluates to 375(f(x^(0)) − p★) + 6
numerical example: 150 randomly generated instances of

minimize f(x) = −∑_{i=1}^m log(b_i − a_iᵀx)
[figure: number of iterations vs f(x^(0)) − p★ (range 0 … 35), roughly 10–25 iterations for all instances; ◦: m = 100, n = 50; □: m = 1000, n = 500; ♦: m = 1000, n = 50]
• number of iterations much smaller than 375(f(x^(0)) − p★) + 6
• bound of the form c(f(x^(0)) − p★) + 6 with smaller c (empirically) valid
Implementation
main effort in each iteration: evaluate derivatives and solve Newton system

HΔx = −g

where H = ∇²f(x), g = ∇f(x); via Cholesky factorization H = LLᵀ:

Δx_nt = −L⁻ᵀL⁻¹g,   λ(x) = ‖L⁻¹g‖₂

example of structure:

f(x) = ∑_{i=1}^n ψ_i(x_i) + ψ₀(Ax + b),   H = D + AᵀH₀A

where D = diag(ψ₁″(x₁), …, ψₙ″(xₙ)) is diagonal and H₀ = ∇²ψ₀(Ax + b)
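A sketch assembling and solving a Newton system with this structure; the particular choices ψ_i(x_i) = −log(1 − x_i²) and ψ₀(y) = −∑ log y_i, as well as all data, are illustrative assumptions, and the system is solved via a dense Cholesky factorization:

```python
import numpy as np

rng = np.random.default_rng(7)
p, n = 30, 8
A = rng.standard_normal((p, n))
b = rng.uniform(5.0, 6.0, p)          # keeps y = Ax + b > 0 near x = 0
x = 0.05 * rng.standard_normal(n)

y = A @ x + b
g = 2 * x / (1 - x**2) + A.T @ (-1 / y)          # ∇f(x)
D = np.diag((2 + 2 * x**2) / (1 - x**2) ** 2)    # diag(ψ_i''(x_i))
H0 = np.diag(1 / y**2)                           # ∇²ψ0(Ax + b)
H = D + A.T @ H0 @ A                             # Newton system matrix

# solve H Δx = -g via Cholesky factorization H = L L^T
L = np.linalg.cholesky(H)
w = np.linalg.solve(L, -g)
dx_nt = np.linalg.solve(L.T, w)                  # Δx_nt = -L^{-T} L^{-1} g
lam = np.linalg.norm(np.linalg.solve(L, g))      # Newton decrement λ(x)

assert np.allclose(H @ dx_nt, -g)
assert np.isclose(lam**2, g @ np.linalg.solve(H, g))
```

When D is cheap and H₀ is diagonal, as here, forming H costs O(pn²); structure like sparsity or p ≪ n can reduce the cost of the solve further.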