Numerical Optimization: Numerical Geometry of Non-Rigid Shapes
Numerical Optimization
© Alexander Bronstein, Michael Bronstein, 2008. All rights reserved. Web: tosca.cs.technion.ac.il
[Title-slide word cloud: fastest, slowest, longest, shortest, largest, smallest, maximal, minimal]
Optimization problems
Generic unconstrained minimization problem:
$$\min_{x \in X} f(x)$$
where the vector space $X \subseteq \mathbb{R}^n$ is the search space. A solution $x^* = \operatorname*{argmin}_{x \in X} f(x)$ is the minimizer; the value $f(x^*)$ is the minimum.
[Figure: a non-convex function with a local minimum and the global minimum]
Convex functions
A function $f : C \to \mathbb{R}$ defined on a convex set $C$ is called convex if
$$f\big(\alpha x + (1-\alpha) y\big) \le \alpha f(x) + (1-\alpha) f(y)$$
for any $x, y \in C$ and $\alpha \in [0, 1]$. For a convex function, every local minimum is also a global minimum.
[Figure: examples of a convex and a non-convex function]
Gradient
In the multidimensional case, linearization of the function according to Taylor:
$$f(x + dx) = f(x) + \langle g(x), dx \rangle + O(\|dx\|^2)$$
The function $g(x)$ appearing in the linear term is called the gradient of $f$ at $x$, denoted as $\nabla f(x)$.
Gradient
In Euclidean space ($X = \mathbb{R}^n$), the gradient can be represented in the standard basis $e_i = (0, \dots, 0, 1, 0, \dots, 0)$ (1 in the $i$-th place), which gives
$$\nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right)^{\mathrm{T}}$$
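A quick sanity check (added here, not part of the original slides): an analytic gradient can be compared against centered finite differences. The quadratic test function below is a made-up example.

```matlab
% Compare an analytic gradient with centered finite differences.
% f(x) = 0.5*x'*A*x - b'*x has gradient A*x - b for symmetric A (made-up data).
A = [3 1; 1 2];  b = [1; -1];
f     = @(x) 0.5*x'*A*x - b'*x;
gradf = @(x) A*x - b;

x = randn(2, 1);  h = 1e-6;
g_fd = zeros(2, 1);
for i = 1:2
    e = zeros(2, 1);  e(i) = 1;                    % i-th standard basis vector
    g_fd(i) = (f(x + h*e) - f(x - h*e)) / (2*h);   % centered difference
end
disp(norm(g_fd - gradf(x)))                        % should be close to zero
```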
Hessian
Linearization of the gradient:
$$\nabla f(x + dx) = \nabla f(x) + H(x)\, dx + O(\|dx\|^2)$$
The function $H(x)$ appearing in the linear term is called the Hessian of $f$ at $x$, denoted as $\nabla^2 f(x)$. In $\mathbb{R}^n$ it is the symmetric matrix of second derivatives, $\big(\nabla^2 f\big)_{ij} = \partial^2 f / \partial x_i \partial x_j$.
Optimization algorithms
General structure of a descent algorithm:
- Start with some initial point $x^{(0)}$; set $k = 0$
- Repeat:
  - Find a descent direction $d^{(k)}$ such that $\langle \nabla f(x^{(k)}), d^{(k)} \rangle < 0$
  - Choose a step size $\alpha^{(k)} > 0$
  - Update iterate: $x^{(k+1)} = x^{(k)} + \alpha^{(k)} d^{(k)}$
  - Increment iteration counter: $k \leftarrow k + 1$
- Until convergence, then output the solution $x^* \approx x^{(k)}$

The ingredients defining a particular algorithm: the descent direction, the step size, and the stopping criterion.
Stopping criteria
Near a local minimum the gradient vanishes, $\nabla f(x^*) = 0$. Common stopping criteria: the gradient norm is small, $\|\nabla f(x^{(k)})\| \le \epsilon$, or equivalently in practice, the iterates or function values nearly stop changing, $\|x^{(k+1)} - x^{(k)}\| \le \epsilon$ or $|f(x^{(k+1)}) - f(x^{(k)})| \le \epsilon$.
Line search
The optimal step size can be found by solving a one-dimensional optimization problem
$$\alpha^{(k)} = \operatorname*{argmin}_{\alpha \ge 0} f\big(x^{(k)} + \alpha d^{(k)}\big)$$
One-dimensional optimization algorithms for finding the optimal step size are generically called exact line search.
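In practice an inexact line search often suffices. Below is a minimal sketch of backtracking (Armijo) line search, a standard substitute for exact line search; it is not from the original slides, and the constants are illustrative.

```matlab
% Backtracking (Armijo) line search: shrink alpha until sufficient decrease.
% f: objective handle, x: current point, g: gradient at x, d: descent direction.
% Save as backtrack.m (or as a local function).
function alpha = backtrack(f, x, g, d)
    alpha = 1;  rho = 0.5;  c = 1e-4;               % illustrative constants
    while f(x + alpha*d) > f(x) + c*alpha*(g'*d)    % sufficient-decrease test
        alpha = rho*alpha;                          % halve the step
    end
end
```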
Descent direction
How to descend in the fastest way? Go in the direction in which the height lines (level curves) are the densest.
[Figure: Devils Tower and its topographic map]
Steepest descent
Directional derivative: how much $f$ changes in the direction $d$ (negative for a descent direction):
$$f'(x; d) = \langle \nabla f(x), d \rangle$$
The steepest descent direction minimizes the directional derivative over all unit-norm directions:
$$d = \operatorname*{argmin}_{\|d\| = 1} \langle \nabla f(x), d \rangle$$
Steepest descent
The steepest descent direction depends on the choice of the norm:
- L2 norm: $d = -\nabla f(x) / \|\nabla f(x)\|_2$ (gradient descent)
- L1 norm: $d = -\operatorname{sign}\big(\partial f / \partial x_i\big)\, e_i$ with $i = \operatorname*{argmax}_j |\partial f / \partial x_j|$ (coordinate descent)
Steepest descent algorithm:
- Start with some $x^{(0)}$
- Repeat:
  - Compute the direction $d^{(k)} = -\nabla f(x^{(k)})$
  - Line search for the step size $\alpha^{(k)}$
  - Update iterate: $x^{(k+1)} = x^{(k)} + \alpha^{(k)} d^{(k)}$
- Until convergence
MATLAB intermezzo: steepest descent
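The code of the original live demo is not preserved; the following is a minimal reconstruction of steepest descent with an inline backtracking line search, run on a made-up, mildly ill-conditioned quadratic.

```matlab
% Steepest descent with backtracking line search (minimal sketch).
A = diag([1, 20]);  b = [1; 1];        % made-up ill-conditioned quadratic
f     = @(x) 0.5*x'*A*x - b'*x;
gradf = @(x) A*x - b;

x = [10; 10];                          % initial point
for k = 1:1000
    g = gradf(x);
    if norm(g) < 1e-6, break; end      % stopping criterion: small gradient
    d = -g;                            % steepest descent (L2) direction
    alpha = 1;                         % backtracking (Armijo) line search
    while f(x + alpha*d) > f(x) + 1e-4*alpha*(g'*d)
        alpha = 0.5*alpha;
    end
    x = x + alpha*d;                   % update iterate
end
fprintf('minimizer (%g, %g) after %d iterations\n', x(1), x(2), k);
```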
Condition number
The condition number is the ratio of the maximal and minimal eigenvalues of the Hessian:
$$\kappa = \frac{\lambda_{\max}(\nabla^2 f)}{\lambda_{\min}(\nabla^2 f)}$$
[Figure: level sets of two quadratic functions, nearly circular for a small condition number and elongated for a large one]
A problem with a large condition number is called ill-conditioned. The convergence rate of steepest descent is slow for ill-conditioned problems.
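A quick illustration (added, not in the original slides): for a quadratic $f(x) = \tfrac{1}{2} x^{\mathrm{T}} A x$ the Hessian is $A$, so the condition number can be read off its eigenvalues. The matrices are arbitrary examples.

```matlab
% Condition number of the Hessian of a quadratic f(x) = 0.5*x'*A*x.
A_good = [2 0; 0 1];                   % kappa = 2: nearly circular level sets
A_bad  = [100 0; 0 1];                 % kappa = 100: elongated level sets
kappa  = @(A) max(eig(A)) / min(eig(A));
fprintf('kappa(good) = %g, kappa(bad) = %g\n', kappa(A_good), kappa(A_bad));
% cond(A) gives the same values here, since both matrices are SPD.
```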
Q-norm
For a symmetric positive-definite matrix $Q$, define the Q-norm
$$\|x\|_Q = \sqrt{x^{\mathrm{T}} Q x} = \|Q^{1/2} x\|_2$$
i.e., the L2 norm after the change of coordinates $\tilde{x} = Q^{1/2} x$. Steepest descent with respect to the Q-norm gives the direction $d = -Q^{-1} \nabla f(x)$.
Preconditioning
Using the Q-norm for steepest descent can be regarded as a change of coordinates, called preconditioning. The preconditioner $Q$ should be chosen to improve the condition number of the Hessian in the proximity of the solution. In the $\tilde{x} = Q^{1/2} x$ system of coordinates, the Hessian at the solution is
$$Q^{-1/2}\, \nabla^2 f(x^*)\, Q^{-1/2}$$
The ideal choice $Q = \nabla^2 f(x^*)$ would make it the identity matrix (a dream).
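A one-line sketch of the resulting direction computation (the preconditioner and gradient below are made-up values):

```matlab
% Preconditioned steepest descent direction: d solves Q*d = -g.
Q = [4 1; 1 3];  g = [1; 2];           % illustrative SPD preconditioner, gradient
d = -(Q \ g);                          % Q-norm steepest descent direction
% With Q = eye(2) this reduces to plain steepest descent, d = -g.
```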
The Hessian at the solution, $\nabla^2 f(x^*)$, is unknown in advance. Instead, use the Hessian at the current iterate, $Q = \nabla^2 f(x^{(k)})$, which gives the Newton direction
$$d^{(k)} = -\big(\nabla^2 f(x^{(k)})\big)^{-1} \nabla f(x^{(k)})$$
The Newton direction minimizes the second-order Taylor approximation
$$f(x + dx) \approx f(x) + \langle \nabla f(x), dx \rangle + \tfrac{1}{2}\, dx^{\mathrm{T}}\, \nabla^2 f(x)\, dx$$
(a quadratic function in $dx$). Close to the solution the function looks like a quadratic function; there the Newton method converges fast.
Newton method
- Start with some $x^{(0)}$
- Repeat:
  - Compute the Newton direction $d^{(k)} = -\big(\nabla^2 f(x^{(k)})\big)^{-1} \nabla f(x^{(k)})$
  - Line search for the step size $\alpha^{(k)}$
  - Update iterate: $x^{(k+1)} = x^{(k)} + \alpha^{(k)} d^{(k)}$
- Until convergence
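A minimal sketch of a damped Newton iteration on the Rosenbrock function (a standard test problem chosen here for illustration; it does not appear in the original slides):

```matlab
% Damped Newton method on the Rosenbrock function.
f = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;
g = @(x) [-400*x(1)*(x(2) - x(1)^2) - 2*(1 - x(1)); 200*(x(2) - x(1)^2)];
H = @(x) [1200*x(1)^2 - 400*x(2) + 2, -400*x(1); -400*x(1), 200];

x = [-1.2; 1];                         % classic starting point
for k = 1:100
    gk = g(x);
    if norm(gk) < 1e-8, break; end
    d = -(H(x) \ gk);                  % Newton direction (solve, don't invert)
    if gk'*d >= 0, d = -gk; end        % fall back to gradient if not descent
    alpha = 1;                         % damping via backtracking
    while f(x + alpha*d) > f(x) + 1e-4*alpha*(gk'*d)
        alpha = 0.5*alpha;
    end
    x = x + alpha*d;                   % update iterate
end
fprintf('x* = (%g, %g) after %d iterations\n', x(1), x(2), k);
```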
Frozen Hessian
Observation: close to the optimum, the Hessian does not change significantly. The number of Hessian inversions can therefore be reduced by keeping the Hessian from previous iterations and updating it once in a few iterations. Such a method is called Newton with frozen Hessian.
Cholesky factorization
Decompose the Hessian as
$$\nabla^2 f(x) = L L^{\mathrm{T}}$$
where $L$ is a lower triangular matrix. The Newton system $L L^{\mathrm{T}} d = -\nabla f(x)$ is then solved in two steps:
- Forward substitution: $L y = -\nabla f(x)$
- Backward substitution: $L^{\mathrm{T}} d = y$
Complexity: the factorization costs about $\tfrac{1}{3} n^3$ operations and each substitution $O(n^2)$, better than straightforward matrix inversion.
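A sketch of the Cholesky-based Newton solve in MATLAB (the Hessian and gradient values are illustrative; note that `chol` returns an upper triangular factor):

```matlab
% Solve the Newton system H*d = -g via Cholesky factorization.
H = [4 1; 1 3];  g = [1; 2];   % illustrative SPD Hessian and gradient
R = chol(H);                   % upper triangular R with H = R'*R
y = R' \ (-g);                 % forward substitution (R' is lower triangular)
d = R \ y;                     % backward substitution
% Equivalent to d = -(H\g), but the factorization can be reused across solves.
```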
Truncated Newton
Solve the Newton system approximately:
$$\nabla^2 f(x^{(k)})\, d \approx -\nabla f(x^{(k)})$$
A few iterations of conjugate gradients or another algorithm for the solution of linear systems can be used. Such a method is called truncated or inexact Newton.
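For illustration (reusing the made-up Hessian and gradient from above), MATLAB's `pcg` can solve the Newton system approximately with a loose tolerance and a small iteration cap:

```matlab
% Truncated Newton direction: only a few conjugate-gradient iterations.
H = [4 1; 1 3];  g = [1; 2];           % illustrative SPD Hessian and gradient
tol = 1e-1;  maxit = 5;                % loose tolerance, few iterations
d = pcg(H, -g, tol, maxit);            % approximate solution of H*d = -g
```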
Non-convex optimization
Using convex optimization methods with non-convex functions does not guarantee global convergence! There is no theoretically guaranteed global optimization, just heuristics:
- Good initialization
- Multiresolution
[Figure: a non-convex function in which a descent method may get trapped in a local minimum instead of reaching the global minimum]
Iterative majorization
Construct a majorizing function $g(x)$ of $f(x)$ at the current iterate $x^{(k)}$. Majorizing inequality: $g(x) \ge f(x)$ for all $x$, satisfying $g(x^{(k)}) = f(x^{(k)})$.
Iterative majorization
- Start with some $x^{(0)}$
- Repeat:
  - Construct a majorizing function $g$ of $f$ at $x^{(k)}$
  - Find $x^{(k+1)}$ such that $g(x^{(k+1)}) \le g(x^{(k)})$, e.g. by minimizing $g$; this guarantees $f(x^{(k+1)}) \le g(x^{(k+1)}) \le g(x^{(k)}) = f(x^{(k)})$
  - Update iterate
- Until convergence
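A small worked instance (added for concreteness, not from the slides): minimizing $f(x) = \sum_i |x - a_i|$, whose minimizer is the median, via the standard quadratic majorizer $|t| \le t^2 / (2|t_k|) + |t_k| / 2$; each iteration then reduces to a weighted mean.

```matlab
% Iterative majorization for f(x) = sum(|x - a_i|) (1D median problem).
a = [0.1, 0.4, 1.3, 2.0, 7.5];          % made-up data
x = mean(a);                             % initialization
for k = 1:100
    w = 1 ./ max(abs(x - a), 1e-12);     % majorizer weights (guard against /0)
    x_new = sum(w .* a) / sum(w);        % minimizer of the quadratic majorizer
    if abs(x_new - x) < 1e-9, break; end
    x = x_new;                           % update iterate
end
fprintf('MM estimate: %g (median: %g)\n', x, median(a));
```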
Constrained optimization
Generic constrained minimization problem:
$$\min_{x} f(x) \quad \text{s.t.} \quad c_i(x) \le 0, \; i = 1, \dots, m; \qquad h_j(x) = 0, \; j = 1, \dots, p$$
where the $c_i$ are inequality constraints and the $h_j$ are equality constraints.
An example
[Figure: a feasible set cut out by one equality constraint and several inequality constraints]
The feasible set is the set of points satisfying all the constraints. An inequality constraint $c_i(x) \le 0$ is active at a point $x$ if $c_i(x) = 0$, and inactive otherwise.
Lagrange multipliers
Main idea to solve constrained problems: arrange the objective and the constraints into a single function
$$L(x, \lambda, \mu) = f(x) + \sum_i \lambda_i c_i(x) + \sum_j \mu_j h_j(x)$$
and minimize it as an unconstrained problem. $L$ is called the Lagrangian, and $\lambda_i, \mu_j$ are called Lagrange multipliers.
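A short worked example (added for concreteness): minimize $f(x) = x_1 + x_2$ subject to $h(x) = x_1^2 + x_2^2 - 1 = 0$. Setting $\nabla_x L = 0$ gives
$$1 + 2\mu x_1 = 0, \quad 1 + 2\mu x_2 = 0 \;\Rightarrow\; x_1 = x_2 = -\tfrac{1}{2\mu}$$
and substituting into the constraint yields $\mu = \pm \tfrac{1}{\sqrt{2}}$; the minimizer is $x^* = \big(-\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}\big)$ with $f(x^*) = -\sqrt{2}$.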
KKT conditions
If $x^*$ is a regular point and a local minimum, there exist Lagrange multipliers $\lambda^*$ and $\mu^*$ such that
$$\nabla f(x^*) + \sum_i \lambda_i^* \nabla c_i(x^*) + \sum_j \mu_j^* \nabla h_j(x^*) = 0$$
with $\lambda_i^* \ge 0$ for all $i$ and $\lambda_i^* = 0$ for the inactive constraints. Known as the Karush-Kuhn-Tucker (KKT) conditions. Necessary but not sufficient!
KKT conditions
Sufficient conditions: if the objective $f$ and the inequality constraints $c_i$ are convex and the equality constraints $h_j$ are affine, then any point satisfying the KKT conditions (with $\lambda_i^* \ge 0$ for all $i$ and $\lambda_i^* = 0$ for the inactive constraints) is a global minimizer.
Geometric interpretation
Consider a simpler problem with a single equality constraint:
$$\min_x f(x) \quad \text{s.t.} \quad h(x) = 0$$
At a solution $x^*$, the gradient of the objective must be normal to the constraint surface, i.e., parallel to $\nabla h(x^*)$; this is exactly the condition $\nabla f(x^*) + \mu \nabla h(x^*) = 0$.
Penalty methods
Define a penalty aggregate
$$P(x; \rho) = f(x) + \rho \Big( \sum_i \varphi\big(c_i(x)\big) + \sum_j \psi\big(h_j(x)\big) \Big)$$
where $\varphi$ is an inequality penalty (zero for $t \le 0$, positive for $t > 0$, e.g. $\varphi(t) = \max\{t, 0\}^2$) and $\psi$ is an equality penalty (zero only at $t = 0$, e.g. $\psi(t) = t^2$).
Penalty methods
[Figure: typical inequality and equality penalty functions]
Penalty methods
- Start with some $x^{(0)}$ and an initial value of the penalty weight $\rho^{(0)}$
- Repeat:
  - Set $x^{(k+1)} = \operatorname*{argmin}_x P(x; \rho^{(k)})$, solved as an unconstrained problem initialized at $x^{(k)}$
  - Update (increase) the penalty weight: $\rho^{(k+1)} > \rho^{(k)}$
- Until convergence, then output the solution $x^* \approx x^{(k)}$
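A minimal quadratic-penalty sketch in MATLAB, assuming a single equality constraint and `fminsearch` as the inner unconstrained solver (the problem is the same illustrative one used in the Lagrange multiplier example above):

```matlab
% Quadratic penalty method for min f(x) s.t. h(x) = 0.
f = @(x) x(1) + x(2);                  % objective
h = @(x) x(1)^2 + x(2)^2 - 1;          % equality constraint (unit circle)
x = [1; 0];  rho = 1;                  % initial point and penalty weight
for k = 1:10
    P = @(x) f(x) + rho * h(x)^2;      % penalty aggregate
    x = fminsearch(P, x);              % inner unconstrained minimization
    rho = 10 * rho;                    % increase the penalty weight
end
% Converges toward (-1/sqrt(2), -1/sqrt(2)).
fprintf('x = (%g, %g), h(x) = %g\n', x(1), x(2), h(x));
```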