Nonlinear Programming Unconstrained


Nonlinear programming

Unconstrained optimization techniques
Introduction
This chapter deals with the various methods of solving the unconstrained minimization
problem:

$$\text{Find } X = \begin{Bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{Bmatrix} \text{ which minimizes } f(X)$$

A practical design problem is rarely unconstrained. Even so, this class of problems is worth studying for the following reasons:

The constraints do not have a significant influence in certain design problems.

Some of the powerful and robust methods of solving constrained minimization
problems require the use of unconstrained minimization techniques.

The unconstrained minimization methods can be used to solve certain complex
engineering analysis problems. For example, the displacement response (linear or
nonlinear) of any structure under any specified load condition can be found by
minimizing its potential energy. Similarly, the eigenvalues and eigenvectors of any
discrete system can be found by minimizing the Rayleigh quotient.

Classification of unconstrained minimization methods

Direct search methods
  Random search method
  Grid search method
  Univariate method
  Pattern search methods
    Powell's method
    Hooke-Jeeves method
  Rosenbrock's method
  Simplex method

Descent methods
  Steepest descent (Cauchy) method
  Fletcher-Reeves method
  Newton's method
  Marquardt method
  Quasi-Newton methods
    Davidon-Fletcher-Powell method
    Broyden-Fletcher-Goldfarb-Shanno method

Direct search methods
Direct search methods require only the objective function values, not the partial
derivatives of the function, to find the minimum. For this reason they are
often called nongradient methods.

The direct search methods are also known as zeroth-order methods
because they use only the zeroth-order derivatives of the function, that is, the function values themselves.

These methods are most suitable for simple problems involving a
relatively small number of variables.

These methods are, in general, less efficient than the descent
methods.
Descent methods
The descent techniques require, in addition to the function values,
the first and in some cases the second derivatives of the objective
function.

Since more information about the function being minimized is used
(through the use of derivatives), descent methods are generally
more efficient than direct search techniques.

The descent methods are also known as gradient methods.

Among the gradient methods, those requiring only first derivatives of
the function are called first-order methods; those requiring both first
and second derivatives of the function are termed second-order
methods.
General approach
All unconstrained minimization methods are iterative in nature. They
start from an initial trial solution and proceed toward the
minimum point in a sequential manner.

Different unconstrained minimization techniques differ from one
another only in the method of generating the new point $X_{i+1}$
from $X_i$ and in testing the point $X_{i+1}$ for optimality.

Convergence rates
In general, an optimization method is said to have convergence of order p if

$$\frac{\lVert X_{i+1} - X^* \rVert}{\lVert X_i - X^* \rVert^{p}} \le k, \qquad k \ge 0, \quad p \ge 1$$

where $X_i$ and $X_{i+1}$ denote the points obtained at the end of iterations i and
i+1, respectively, $X^*$ represents the optimum point, and $\lVert X \rVert$ denotes the
length or norm of the vector X:

$$\lVert X \rVert = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$$
Convergence rates
If p = 1 and 0 < k < 1, the method is said to be linearly convergent
(corresponds to slow convergence).

If p = 2, the method is said to be quadratically convergent
(corresponds to fast convergence).

An optimization method is said to have superlinear convergence
(corresponds to fast convergence) if:

$$\lim_{i \to \infty} \frac{\lVert X_{i+1} - X^* \rVert}{\lVert X_i - X^* \rVert} = 0$$

The above definitions of rates of convergence are applicable to
single-variable as well as multivariable optimization problems.
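The definition above suggests a simple numerical check. If the error norms $e_i = \lVert X_i - X^* \rVert$ satisfy $e_{i+1} \approx k\,e_i^p$ with a nearly constant k, then $p \approx \ln(e_{i+1}/e_i) / \ln(e_i/e_{i-1})$. The Python sketch below applies this estimate to two synthetic error sequences. The sequences and the function name `estimate_order` are hypothetical and do not come from any method in this chapter.

```python
import math


def estimate_order(errors):
    """Estimate the convergence order p from successive error norms e_i = ||X_i - X*||.

    Assumes e_{i+1} ~ k * e_i**p, so that
    p ~ log(e_{i+1} / e_i) / log(e_i / e_{i-1}).
    """
    return [
        math.log(errors[i + 1] / errors[i]) / math.log(errors[i] / errors[i - 1])
        for i in range(1, len(errors) - 1)
    ]


# Synthetic error sequences (illustrative only)
linear = [0.5 ** i for i in range(1, 8)]         # e_{i+1} = 0.5 * e_i  (p = 1, k = 0.5)
quadratic = [0.1 ** (2 ** i) for i in range(4)]  # e_{i+1} = e_i ** 2   (p = 2, k = 1)

print(estimate_order(linear))     # values close to 1
print(estimate_order(quadratic))  # values close to 2
```

In practice $X^*$ is unknown, so estimates like this are usually computed only after the fact, on test problems whose minimum is known.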
Condition number
The condition number of an n x n matrix [A] is defined as:

$$\operatorname{cond}([A]) = \lVert A \rVert \, \lVert A^{-1} \rVert \ge 1$$

where $\lVert A \rVert$ denotes a norm of the matrix [A]. For example, the infinite norm of [A] is defined as the maximum absolute row sum:

$$\lVert A \rVert_{\infty} = \max_{1 \le i \le n} \sum_{j=1}^{n} \lvert a_{ij} \rvert$$

If the condition number is close to 1, the round-off errors are expected to be small when dealing with the matrix [A]. Conversely, if cond[A] is large, the solution vector X of the system of equations [A]X = B is expected to be very sensitive to small variations in [A] and B. If cond[A] is close to 1, the matrix [A] is said to be well behaved or well conditioned. On the other hand, if cond[A] is significantly greater than 1, the matrix [A] is said to be not well behaved or ill conditioned.
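The definition can be checked numerically. The Python sketch below computes cond([A]) with the infinite norm. To keep it short it handles 2 x 2 matrices only, using the closed-form inverse. The helper names are illustrative. The test matrix [[12, -6], [-6, 4]] is the Hessian that appears in the scaling example later in this chapter.

```python
def inf_norm(M):
    """Infinite norm: the maximum over rows of the sum of |a_ij|."""
    return max(sum(abs(a) for a in row) for row in M)


def inverse_2x2(M):
    """Closed-form inverse of a 2 x 2 matrix given as a list of rows."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def cond_inf(M):
    """cond([A]) = ||A|| * ||A^-1|| using the infinite norm."""
    return inf_norm(M) * inf_norm(inverse_2x2(M))


A = [[12.0, -6.0], [-6.0, 4.0]]
I = [[1.0, 0.0], [0.0, 1.0]]
print(cond_inf(A))  # 27.0, since ||A|| = 18 and ||A^-1|| = 1.5
print(cond_inf(I))  # 1.0, the identity is perfectly conditioned
```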
Scaling of design variables
The rate of convergence of most unconstrained
minimization methods can be improved by scaling the
design variables.

For a quadratic objective function, the scaling of the
design variables changes the condition number of the
Hessian matrix.

When the condition number of the Hessian matrix is 1,
the steepest descent method, for example, finds the
minimum of a quadratic objective function in one
iteration.
Scaling of design variables
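If $f = \frac{1}{2} X^T [A] X$ denotes a quadratic term, a transformation of the form

$$X = [R]\,Y \quad \text{or} \quad \begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix} = \begin{bmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{bmatrix} \begin{Bmatrix} y_1 \\ y_2 \end{Bmatrix}$$

can be used to obtain a new quadratic term as:

$$\frac{1}{2} X^T [A] X = \frac{1}{2} Y^T [R]^T [A] [R]\, Y = \frac{1}{2} Y^T [\tilde{A}]\, Y$$

The matrix [R] can be selected to make

$$[\tilde{A}] = [R]^T [A] [R]$$

diagonal (i.e., to eliminate the mixed quadratic terms).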
Scaling of design variables
For this, the columns of the matrix [R] are to be chosen
as the eigenvectors of the matrix [A].

Next, the diagonal elements of the matrix $[\tilde{A}]$ can be
reduced to 1 (so that the condition number of the
resulting matrix will be 1) by using the transformation

$$Y = [S]\,Z \quad \text{or} \quad \begin{Bmatrix} y_1 \\ y_2 \end{Bmatrix} = \begin{bmatrix} s_{11} & 0 \\ 0 & s_{22} \end{bmatrix} \begin{Bmatrix} z_1 \\ z_2 \end{Bmatrix}$$
Scaling of design variables
where the matrix [S] is given by:

$$[S] = \begin{bmatrix} s_{11} = \dfrac{1}{\sqrt{\tilde{a}_{11}}} & 0 \\ 0 & s_{22} = \dfrac{1}{\sqrt{\tilde{a}_{22}}} \end{bmatrix}$$

Thus, the complete transformation that reduces the
Hessian matrix of f to an identity matrix is given by:

$$X = [R][S]\,Z \equiv [T]\,Z$$

so that the quadratic term $\frac{1}{2} X^T [A] X$ reduces to $\frac{1}{2} Z^T [I]\, Z$.
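The two-stage scaling can be sketched in Python for the 2 x 2 case. There, the eigenvalues of a symmetric [A] have the closed form $\lambda = (a_{11} + a_{22})/2 \pm \sqrt{((a_{11} - a_{22})/2)^2 + a_{12}^2}$. The sketch assumes [A] is symmetric positive definite; for n > 2 an eigenvalue solver would be needed. The helper names are illustrative. Each eigenvector is scaled so that its first component is 1, which is the convention used in the worked example of this chapter.

```python
import math


def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(P):
    return [list(row) for row in zip(*P)]


def scaling_transform(A):
    """Two-stage scaling X = [R][S]Z = [T]Z for a symmetric positive definite 2 x 2 [A]."""
    a, b, c = A[0][0], A[0][1], A[1][1]
    if b == 0.0:
        R = [[1.0, 0.0], [0.0, 1.0]]               # [A] is already diagonal
    else:
        mid, rad = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
        lam1, lam2 = mid + rad, mid - rad          # eigenvalues of [A]
        # Columns of [R] are eigenvectors with first component 1:
        # (a - lam) * 1 + b * u2 = 0  =>  u2 = (lam - a) / b
        R = [[1.0, 1.0], [(lam1 - a) / b, (lam2 - a) / b]]
    A_tilde = matmul(matmul(transpose(R), A), R)   # diagonal [A~] = [R]^T [A] [R]
    S = [[1.0 / math.sqrt(A_tilde[0][0]), 0.0],
         [0.0, 1.0 / math.sqrt(A_tilde[1][1])]]
    return R, S, matmul(R, S)                      # [T] = [R][S]


A = [[12.0, -6.0], [-6.0, 4.0]]
R, S, T = scaling_transform(A)
print(R)                                   # second row close to (-0.5352, 1.8685)
print(matmul(matmul(transpose(T), A), T))  # close to the identity matrix
```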
Scaling of design variables
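If the objective function is not a quadratic, the Hessian matrix, and hence
the transformations, vary with the design vector from iteration to iteration.
For example, the second-order Taylor's series approximation of a general
nonlinear function at the design vector $X_i$ can be expressed as:

$$f(X) = c + B^T X + \frac{1}{2} X^T [A] X$$

where

$$c = f(X_i), \qquad B = \begin{Bmatrix} \partial f / \partial x_1 \\ \vdots \\ \partial f / \partial x_n \end{Bmatrix}_{X_i}, \qquad [A] = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix}_{X_i}$$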
Scaling of design variables
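The transformations indicated by the equations:

$$X = [R]\,Y \quad \text{or} \quad \begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix} = \begin{bmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{bmatrix} \begin{Bmatrix} y_1 \\ y_2 \end{Bmatrix}$$

$$Y = [S]\,Z \quad \text{or} \quad \begin{Bmatrix} y_1 \\ y_2 \end{Bmatrix} = \begin{bmatrix} s_{11} & 0 \\ 0 & s_{22} \end{bmatrix} \begin{Bmatrix} z_1 \\ z_2 \end{Bmatrix}$$

can be applied to the matrix [A] given by

$$[A] = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix}_{X_i}$$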
Example
Find a suitable scaling (or transformation) of variables to reduce the
condition number of the Hessian matrix of the following function to 1:

$$f(x_1, x_2) = 6x_1^2 - 6x_1 x_2 + 2x_2^2 - x_1 - 2x_2 \qquad \text{(E1)}$$

Solution: The quadratic function can be expressed as:

$$f(X) = B^T X + \frac{1}{2} X^T [A] X \qquad \text{(E2)}$$

where

$$X = \begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix}, \qquad B = \begin{Bmatrix} -1 \\ -2 \end{Bmatrix}, \qquad [A] = \begin{bmatrix} 12 & -6 \\ -6 & 4 \end{bmatrix}$$

As indicated above, the desired scaling of variables can be accomplished
in two stages.
Example
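Stage 1: Reducing [A] to a Diagonal Form, $[\tilde{A}]$

The eigenvectors of the matrix [A] can be found by solving the
eigenvalue problem:

$$\left[\, [A] - \lambda_i [I] \,\right] u_i = 0 \qquad \text{(E3)}$$

where $\lambda_i$ is the ith eigenvalue and $u_i$ is the corresponding
eigenvector. In the present case, the eigenvalues $\lambda_i$ are given by:

$$\begin{vmatrix} 12 - \lambda_i & -6 \\ -6 & 4 - \lambda_i \end{vmatrix} = \lambda_i^2 - 16\lambda_i + 12 = 0 \qquad \text{(E4)}$$

which yield $\lambda_1 = 8 + \sqrt{52} = 15.2111$ and $\lambda_2 = 8 - \sqrt{52} = 0.7889$.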
Example
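The eigenvector $u_i$ corresponding to $\lambda_i$ can be found by solving

$$\left[\, [A] - \lambda_i [I] \,\right] u_i = 0$$

For $\lambda_1 = 15.2111$:

$$\begin{bmatrix} 12 - \lambda_1 & -6 \\ -6 & 4 - \lambda_1 \end{bmatrix} \begin{Bmatrix} u_{11} \\ u_{21} \end{Bmatrix} = \begin{Bmatrix} 0 \\ 0 \end{Bmatrix} \quad \text{or} \quad (12 - \lambda_1)\,u_{11} - 6\,u_{21} = 0 \quad \text{or} \quad u_{21} = -0.5352\,u_{11}$$

that is,

$$u_1 = \begin{Bmatrix} u_{11} \\ u_{21} \end{Bmatrix} = \begin{Bmatrix} 1.0 \\ -0.5352 \end{Bmatrix}$$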
Example
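and, for $\lambda_2 = 0.7889$:

$$\begin{bmatrix} 12 - \lambda_2 & -6 \\ -6 & 4 - \lambda_2 \end{bmatrix} \begin{Bmatrix} u_{12} \\ u_{22} \end{Bmatrix} = \begin{Bmatrix} 0 \\ 0 \end{Bmatrix} \quad \text{or} \quad (12 - \lambda_2)\,u_{12} - 6\,u_{22} = 0 \quad \text{or} \quad u_{22} = 1.8685\,u_{12}$$

that is,

$$u_2 = \begin{Bmatrix} u_{12} \\ u_{22} \end{Bmatrix} = \begin{Bmatrix} 1.0 \\ 1.8685 \end{Bmatrix}$$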

Example
Thus the transformation that reduces [A] to a diagonal form is given by:

$$X = [R]\,Y = [u_1 \;\; u_2]\,Y = \begin{bmatrix} 1 & 1 \\ -0.5352 & 1.8685 \end{bmatrix} \begin{Bmatrix} y_1 \\ y_2 \end{Bmatrix} \qquad \text{(E5)}$$

that is,

$$x_1 = y_1 + y_2, \qquad x_2 = -0.5352\,y_1 + 1.8685\,y_2$$

This yields the new quadratic term as $\frac{1}{2} Y^T [\tilde{A}]\, Y$, where

$$[\tilde{A}] = [R]^T [A] [R] = \begin{bmatrix} 19.5682 & 0 \\ 0 & 3.5432 \end{bmatrix}$$
Example
And hence the quadratic function becomes:

$$f(y_1, y_2) = B^T [R]\, Y + \frac{1}{2} Y^T [\tilde{A}]\, Y = 0.0704\,y_1 - 4.737\,y_2 + \frac{1}{2}(19.5682)\,y_1^2 + \frac{1}{2}(3.5432)\,y_2^2 \qquad \text{(E6)}$$

Stage 2: Reducing $[\tilde{A}]$ to a unit matrix

The transformation is given by $Y = [S]\,Z$, where

$$[S] = \begin{bmatrix} \dfrac{1}{\sqrt{19.5682}} & 0 \\ 0 & \dfrac{1}{\sqrt{3.5432}} \end{bmatrix} = \begin{bmatrix} 0.2262 & 0.0 \\ 0.0 & 0.5313 \end{bmatrix}$$
Example
Stage 3: Complete Transformation

The total transformation is given by

$$X = [R]\,Y = [R][S]\,Z = [T]\,Z \qquad \text{(E7)}$$

where

$$[T] = [R][S] = \begin{bmatrix} 1 & 1 \\ -0.5352 & 1.8685 \end{bmatrix} \begin{bmatrix} 0.2262 & 0 \\ 0 & 0.5313 \end{bmatrix} = \begin{bmatrix} 0.2262 & 0.5313 \\ -0.1211 & 0.9927 \end{bmatrix} \qquad \text{(E8)}$$

or

$$x_1 = 0.2262\,z_1 + 0.5313\,z_2, \qquad x_2 = -0.1211\,z_1 + 0.9927\,z_2$$
Example
With this transformation, the quadratic function of

$$f(x_1, x_2) = 6x_1^2 - 6x_1 x_2 + 2x_2^2 - x_1 - 2x_2$$

becomes

$$f(z_1, z_2) = B^T [T]\, Z + \frac{1}{2} Z^T [T]^T [A] [T]\, Z = 0.0161\,z_1 - 2.5167\,z_2 + \frac{1}{2} z_1^2 + \frac{1}{2} z_2^2 \qquad \text{(E9)}$$
Example
[Figure: contours of the original function f(x1, x2), Eq. (E1)]

[Figure: contours of the function f(y1, y2) after Stage 1, Eq. (E6)]

[Figure: contours of the function f(z1, z2) after the complete transformation, Eq. (E9)]
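Here is a quick numerical check of (E8) and (E9). The Python sketch below multiplies out $[T]^T [A] [T]$ and $B^T [T]$ using the rounded four-digit entries of [T] quoted in the example. Because of that rounding, the results match [I] and the coefficients 0.0161 and -2.5167 only to about three decimal places. The helper names are illustrative.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(P):
    return [list(row) for row in zip(*P)]


A = [[12.0, -6.0], [-6.0, 4.0]]            # Hessian from (E2)
B = [-1.0, -2.0]                           # linear coefficients from (E2)
T = [[0.2262, 0.5313], [-0.1211, 0.9927]]  # rounded [T] = [R][S] from (E8)

hessian_z = matmul(matmul(transpose(T), A), T)                         # [T]^T [A] [T], expected ~ [I]
linear_z = [sum(B[i] * T[i][j] for i in range(2)) for j in range(2)]   # B^T [T]
print(hessian_z)
print(linear_z)
```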
Direct search methods
Random Search Methods: Random search methods
are based on the use of random numbers in finding the
minimum point. Since most computer libraries
have random number generators, these methods can be
used quite conveniently. Some of the best-known
random search methods are:

Random jumping method

Random walk method
Random jumping method
Although the problem is an unconstrained one, we establish the bounds $l_i$
and $u_i$ for each design variable $x_i$, i = 1, 2, ..., n, for generating the random
values of $x_i$:

$$l_i \le x_i \le u_i, \qquad i = 1, 2, \ldots, n$$

In the random jumping method, we generate sets of n random numbers,
$(r_1, r_2, \ldots, r_n)$, that are uniformly distributed between 0 and 1. Each set of
these numbers is used to find a point, X, inside the hypercube defined by
the above equation as

$$X = \begin{Bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{Bmatrix} = \begin{Bmatrix} l_1 + r_1 (u_1 - l_1) \\ l_2 + r_2 (u_2 - l_2) \\ \vdots \\ l_n + r_n (u_n - l_n) \end{Bmatrix}$$

and the value of the function is evaluated at this point X.
Random jumping method
By generating a large number of random points X and
evaluating the value of the objective function at each of
these points, we can take the point with the smallest value of f(X) as
the desired minimum point.
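The random jumping method amounts to uniform sampling of the hypercube, keeping the best point found. The Python sketch below uses the objective $f = x_1 - x_2 + 2x_1^2 + 2x_1x_2 + x_2^2$ from the later examples of this chapter. The bounds $-2 \le x_i \le 2$, the 20,000 sample points, and the fixed seed are illustrative choices, not values from the text.

```python
import random


def f(x1, x2):
    return x1 - x2 + 2 * x1**2 + 2 * x1 * x2 + x2**2


def random_jumping(func, lower, upper, n_points, seed=0):
    """Sample n_points uniformly in the box l_i <= x_i <= u_i and keep the best one."""
    rng = random.Random(seed)
    x_best, f_best = None, float("inf")
    for _ in range(n_points):
        # x_i = l_i + r_i (u_i - l_i), with r_i uniform in [0, 1)
        x = [l + rng.random() * (u - l) for l, u in zip(lower, upper)]
        fx = func(*x)
        if fx < f_best:
            x_best, f_best = x, fx
    return x_best, f_best


x_best, f_best = random_jumping(f, [-2.0, -2.0], [2.0, 2.0], n_points=20000)
print(x_best, f_best)  # near (-1, 1.5), with f close to -1.25
```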
Random walk method
The random walk method is based on generating a sequence of
improved approximations to the minimum, each derived from the
preceding approximation.

Thus, if $X_i$ is the approximation to the minimum obtained in the (i-1)th stage (or step or iteration), the new or improved approximation
in the ith stage is found from the relation

$$X_{i+1} = X_i + \lambda\, u_i$$

where λ is a prescribed scalar step length and $u_i$ is a unit random
vector generated in the ith stage.
Random walk method
The detailed procedure of this method is given by the following steps:

1. Start with an initial point $X_1$, a sufficiently large initial step length λ, a minimum allowable step length ε, and a maximum permissible number of iterations N.
2. Find the function value $f_1 = f(X_1)$.
3. Set the iteration number as i = 1.
4. Generate a set of n random numbers $r_1, r_2, \ldots, r_n$, each lying in the interval [-1, 1], and formulate the unit vector u as:

$$u = \frac{1}{(r_1^2 + r_2^2 + \cdots + r_n^2)^{1/2}} \begin{Bmatrix} r_1 \\ r_2 \\ \vdots \\ r_n \end{Bmatrix}$$

The directions generated by this equation are expected to have a bias toward the diagonals of the unit hypercube. To avoid such a bias, the length of the vector, R, is computed as

$$R = (r_1^2 + r_2^2 + \cdots + r_n^2)^{1/2}$$

and the random numbers $(r_1, r_2, \ldots, r_n)$ are accepted only if R ≤ 1 and discarded if R > 1. If the random numbers are accepted, the unbiased random vector $u_i$ is given by the same formula for u.
5. Compute the new vector and the corresponding function value as $X = X_1 + \lambda u$ and $f = f(X)$.
6. Compare the values of f and $f_1$. If $f < f_1$, set the new values as $X_1 = X$ and $f_1 = f$, and go to step 3. If $f \ge f_1$, go to step 7.
7. If i ≤ N, set the new iteration number as i = i + 1 and go to step 4. On the other hand, if i > N, go to step 8.
8. Compute the new, reduced step length as λ = λ/2. If the new step length is smaller than or equal to ε, go to step 9. Otherwise (i.e., if the new step length is greater than ε), go to step 4.
9. Stop the procedure by taking $X_{opt} \approx X_1$ and $f_{opt} \approx f_1$.
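The nine steps can be condensed into the Python sketch below, which is applied to the function of the example that follows. This is a minimal sketch, not a reference implementation. The function names and the seed are illustrative. One assumption departs from the steps as written: after each step-length reduction the sketch restarts the failure counter at i = 1, whereas step 8 returns to step 4 without resetting i.

```python
import math
import random


def f(x):
    x1, x2 = x
    return x1 - x2 + 2 * x1**2 + 2 * x1 * x2 + x2**2


def unit_random_vector(n, rng):
    """Step 4: draw r_i in [-1, 1] and keep the set only if R <= 1 (avoids diagonal bias)."""
    while True:
        r = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        R = math.sqrt(sum(ri * ri for ri in r))
        if 0.0 < R <= 1.0:
            return [ri / R for ri in r]


def random_walk(func, x1, lam=1.0, eps=0.05, N=100, seed=0):
    rng = random.Random(seed)
    x_best, f_best = list(x1), func(x1)                   # steps 1-2
    while True:
        i = 1                                             # step 3
        while i <= N:
            u = unit_random_vector(len(x_best), rng)      # step 4
            x = [a + lam * b for a, b in zip(x_best, u)]  # step 5
            fx = func(x)
            if fx < f_best:                               # step 6: success, back to step 3
                x_best, f_best, i = x, fx, 1
            else:                                         # step 7: count the failure
                i += 1
        lam /= 2.0                                        # step 8: halve the step length
        if lam <= eps:
            return x_best, f_best                         # step 9


x_opt, f_opt = random_walk(f, [0.0, 0.0], lam=1.0, eps=0.05, N=100)
print(x_opt, f_opt)
```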
Example
Minimize

$$f(x_1, x_2) = x_1 - x_2 + 2x_1^2 + 2x_1 x_2 + x_2^2$$

using the random walk method from the point

$$X_1 = \begin{Bmatrix} 0.0 \\ 0.0 \end{Bmatrix}$$

with a starting step length of λ = 1.0. Take ε = 0.05 and N = 100.
Random walk method with direction exploitation
In the random walk method explained above, we proceed to
generate a new unit random vector $u_{i+1}$ as soon as we
find that $u_i$ is successful in reducing the function value for
a fixed step length λ.

However, we can expect to achieve a further decrease in
the function value by taking a longer step length along
the direction $u_i$.

Thus, the random walk method can be improved if the
maximum possible step is taken along each successful
direction. This can be achieved by using any of the one-dimensional
minimization methods discussed in the
previous chapter.
According to this procedure, the new vector $X_{i+1}$ is
found as:

$$X_{i+1} = X_i + \lambda_i^* u_i$$

where $\lambda_i^*$ is the optimal step length found along the
direction $u_i$ so that

$$f_{i+1} = f(X_i + \lambda_i^* u_i) = \min_{\lambda_i} f(X_i + \lambda_i u_i)$$

The search method incorporating this feature is called
the random walk method with direction exploitation.
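The following Python sketch adds direction exploitation to the random walk. When a trial step λ along $u_i$ succeeds, a golden-section line search over $0 \le \lambda \le 10$ approximates $\lambda_i^*$; the upper limit of 10 is an assumption. Otherwise the sketch uses the same assumed loop structure as a plain random walk, halving λ after N consecutive failures. The names and the test function are illustrative.

```python
import math
import random


def f(x):
    x1, x2 = x
    return x1 - x2 + 2 * x1**2 + 2 * x1 * x2 + x2**2


def golden_section(phi, a, b, tol=1e-10):
    """Minimize a unimodal phi on [a, b] by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc < fd:                  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = phi(c)
        else:                        # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = phi(d)
    return (a + b) / 2.0


def unit_random_vector(n, rng):
    """Unbiased unit vector: accept (r_1, ..., r_n) in [-1, 1]^n only if R <= 1."""
    while True:
        r = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        R = math.sqrt(sum(ri * ri for ri in r))
        if 0.0 < R <= 1.0:
            return [ri / R for ri in r]


def random_walk_exploit(func, x1, lam=1.0, eps=0.05, N=100, lam_max=10.0, seed=0):
    rng = random.Random(seed)
    x_best, f_best = list(x1), func(x1)
    while lam > eps:
        failures = 0
        while failures < N:
            u = unit_random_vector(len(x_best), rng)
            trial = [a + lam * b for a, b in zip(x_best, u)]
            f_trial = func(trial)
            if f_trial < f_best:
                # Successful direction: approximate lambda_i* = argmin f(X_i + lambda u_i)
                t = golden_section(
                    lambda s: func([a + s * b for a, b in zip(x_best, u)]), 0.0, lam_max)
                x_new = [a + t * b for a, b in zip(x_best, u)]
                if func(x_new) > f_trial:    # guard: never accept a worse point
                    x_new = trial
                x_best, f_best, failures = x_new, func(x_new), 0
            else:
                failures += 1
        lam /= 2.0
    return x_best, f_best


x_opt, f_opt = random_walk_exploit(f, [0.0, 0.0])
print(x_opt, f_opt)
```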
Advantages of random search methods
1. These methods can work even if the objective function is
discontinuous and nondifferentiable at some of the points.

2. The random methods can be used to find the global minimum
when the objective function possesses several relative minima.

3. These methods are applicable when other methods fail due to
local difficulties such as sharply varying functions and shallow
regions.

4. Although the random methods are not very efficient by
themselves, they can be used in the early stages of optimization
to detect the region where the global minimum is likely to be
found. Once this region is found, some of the more efficient
techniques can be used to find the precise location of the global
minimum point.
Grid-search method
This method involves setting up a suitable grid in the design space,
evaluating the objective function at all the grid points, and finding the
grid point corresponding to the lowest function value. For example,
if the lower and upper bounds on the ith design variable are known
to be $l_i$ and $u_i$, respectively, we can divide the range $(l_i, u_i)$ into $p_i - 1$
equal parts so that $x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(p_i)}$ denote the grid points along the
$x_i$ axis (i = 1, 2, ..., n).

It can be seen that the grid method requires a prohibitively large
number of function evaluations in most practical problems. For
example, for a problem with 10 design variables (n = 10), the number
of grid points will be $3^{10} = 59{,}049$ with $p_i = 3$ and $4^{10} = 1{,}048{,}576$ with
$p_i = 4$ (i = 1, 2, ..., 10).

Grid-search method
For problems with a small number of design variables, the grid method can
be used conveniently to find an approximate minimum.

Also, the grid method can be used to find a good starting point for one of the
more efficient methods.
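Here is a minimal Python sketch of the grid search. It uses the example objective $f = x_1 - x_2 + 2x_1^2 + 2x_1x_2 + x_2^2$ and an arbitrary box $-2 \le x_i \le 2$ with $p_i = 41$ points per axis, that is, a spacing of 0.1. The function names are illustrative. The evaluation count, $\prod_i p_i$, is what makes the method expensive as n grows.

```python
import itertools


def f(x1, x2):
    return x1 - x2 + 2 * x1**2 + 2 * x1 * x2 + x2**2


def grid_search(func, lower, upper, points):
    """Evaluate func at every grid point; p_i points split (l_i, u_i) into p_i - 1 equal parts."""
    axes = [
        [l + (u - l) * k / (p - 1) for k in range(p)]
        for l, u, p in zip(lower, upper, points)
    ]
    grid = list(itertools.product(*axes))
    best = min(grid, key=lambda x: func(*x))
    return best, func(*best), len(grid)


x_best, f_best, n_evals = grid_search(f, [-2.0, -2.0], [2.0, 2.0], [41, 41])
print(x_best, f_best, n_evals)  # (-1.0, 1.5) -1.25 1681
print(3**10, 4**10)             # 59049 1048576, the grid sizes quoted above for n = 10
```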



Univariate method
In this method, we change only one variable at a time and seek to
produce a sequence of improved approximations to the minimum
point.

By starting at a base point $X_i$ in the ith iteration, we fix the values of
n-1 variables and vary the remaining variable. Since only one
variable is changed, the problem becomes a one-dimensional
minimization problem, and any of the methods discussed in the
previous chapter on one-dimensional minimization methods can be
used to produce a new base point $X_{i+1}$.

The search is now continued in a new direction. This new direction
is obtained by changing any one of the n-1 variables that were fixed
in the previous iteration.
Univariate method
In fact, the search procedure is continued by taking each coordinate direction in
turn. After all the n directions are searched sequentially, the first cycle is
complete and hence we repeat the entire process of sequential minimization.

The procedure is continued until no further improvement is possible in the
objective function in any of the n directions of a cycle. The univariate method
can be summarized as follows:

1. Choose an arbitrary starting point $X_1$ and set i = 1.
2. Find the search direction $S_i$ as

$$S_i^T = \begin{cases} (1, 0, 0, \ldots, 0) & \text{for } i = 1, n+1, 2n+1, \ldots \\ (0, 1, 0, \ldots, 0) & \text{for } i = 2, n+2, 2n+2, \ldots \\ (0, 0, 1, \ldots, 0) & \text{for } i = 3, n+3, 2n+3, \ldots \\ \quad \vdots & \\ (0, 0, 0, \ldots, 1) & \text{for } i = n, 2n, 3n, \ldots \end{cases}$$
Univariate method
3. Determine whether $\lambda_i$ should be positive or negative.

For the current direction $S_i$, this means finding whether the function value
decreases in the positive or the negative direction.

For this, we take a small probe length (ε) and evaluate $f_i = f(X_i)$, $f^+ = f(X_i + \varepsilon S_i)$, and $f^- = f(X_i - \varepsilon S_i)$. If $f^+ < f_i$, $S_i$ will be the correct direction
for decreasing the value of f, and if $f^- < f_i$, $-S_i$ will be the correct one.

If both $f^+$ and $f^-$ are greater than $f_i$, we take $X_i$ as the minimum along the
direction $S_i$.
Univariate method
4. Find the optimal step length $\lambda_i^*$ such that

$$f(X_i \pm \lambda_i^* S_i) = \min_{\lambda_i} f(X_i \pm \lambda_i S_i)$$

where the + or - sign has to be used depending upon whether $S_i$ or $-S_i$ is the
direction for decreasing the function value.

5. Set $X_{i+1} = X_i \pm \lambda_i^* S_i$ depending on the direction for decreasing the
function value, and $f_{i+1} = f(X_{i+1})$.

6. Set the new value of i = i + 1, and go to step 2. Continue this procedure
until no significant change is achieved in the value of the objective
function.
Univariate method
The univariate method is very simple and can be implemented easily.

However, it will not converge rapidly to the optimum solution, as it has a
tendency to oscillate with steadily decreasing progress towards the
optimum.

Hence it will be better to stop the computations at some point near to the
optimum point rather than trying to find the precise optimum point.

In theory, the univariate method can be applied to find the minimum of
any function that possesses continuous derivatives.

However, if the function has a steep valley, the method may not even
converge.
Univariate method
For example, consider the
contours of a function of two
variables with a valley as shown in
the figure. If the univariate search
starts at point P, the function value
cannot be decreased either in the
direction $\pm S_1$ or in the direction
$\pm S_2$. Thus, the search comes to a
halt, and one may be misled into
taking the point P, which is certainly
not the optimum point, as the
optimum point. This situation
arises whenever the value of the
probe length ε needed for
detecting the proper direction
($\pm S_1$ or $\pm S_2$) happens to be less
than the number of significant
figures used in the computations.
Example
Minimize

$$f(x_1, x_2) = x_1 - x_2 + 2x_1^2 + 2x_1 x_2 + x_2^2$$

with the starting point (0, 0).

Solution: We will take the probe length ε as 0.01 to
find the correct direction for decreasing the function
value in step 3. Further, we will use the differential
calculus method to find the optimum step length $\lambda_i^*$
along the direction $\pm S_i$ in step 4.
Example
Iteration i = 1

Step 2: Choose the search direction $S_1$ as $S_1 = \begin{Bmatrix} 1 \\ 0 \end{Bmatrix}$.

Step 3: To find whether the value of f decreases along $S_1$ or $-S_1$, we use the
probe length ε. Since

$$f_1 = f(X_1) = f(0, 0) = 0$$

$$f^+ = f(X_1 + \varepsilon S_1) = f(\varepsilon, 0) = 0.01 - 0 + 2(0.0001) + 0 + 0 = 0.0102 > f_1$$

$$f^- = f(X_1 - \varepsilon S_1) = f(-\varepsilon, 0) = -0.01 - 0 + 2(0.0001) - 0 + 0 = -0.0098 < f_1$$

$-S_1$ is the correct direction for minimizing f from $X_1$.
Example
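Step 4: To find the optimum step length $\lambda_1^*$, we minimize

$$f(X_1 - \lambda_1 S_1) = f(-\lambda_1, 0) = (-\lambda_1) - 0 + 2(-\lambda_1)^2 + 0 + 0 = 2\lambda_1^2 - \lambda_1$$

As $\dfrac{df}{d\lambda_1} = 0$ at $\lambda_1 = \dfrac{1}{4}$, we have $\lambda_1^* = \dfrac{1}{4}$.

Step 5: Set

$$X_2 = X_1 - \lambda_1^* S_1 = \begin{Bmatrix} 0 \\ 0 \end{Bmatrix} - \frac{1}{4} \begin{Bmatrix} 1 \\ 0 \end{Bmatrix} = \begin{Bmatrix} -\frac{1}{4} \\ 0 \end{Bmatrix}, \qquad f_2 = f(X_2) = f\left(-\tfrac{1}{4}, 0\right) = -\frac{1}{8}$$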
Example
Iteration i = 2

Step 2: Choose the search direction $S_2$ as $S_2 = \begin{Bmatrix} 0 \\ 1 \end{Bmatrix}$.

Step 3: Since

$$f_2 = f(X_2) = -0.125$$

$$f^+ = f(X_2 + \varepsilon S_2) = f(-0.25, 0.01) = -0.1399 < f_2$$

$$f^- = f(X_2 - \varepsilon S_2) = f(-0.25, -0.01) = -0.1099 > f_2$$

$S_2$ is the correct direction for decreasing the value of f from $X_2$.
Example
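Step 4: We minimize $f(X_2 + \lambda_2 S_2)$ to find $\lambda_2^*$.
Here

$$f(X_2 + \lambda_2 S_2) = f(-0.25, \lambda_2) = -0.25 - \lambda_2 + 2(0.25)^2 - 2(0.25)\lambda_2 + \lambda_2^2 = \lambda_2^2 - 1.5\lambda_2 - 0.125$$

$$\frac{df}{d\lambda_2} = 2\lambda_2 - 1.5 = 0 \quad \text{at} \quad \lambda_2^* = 0.75$$

Step 5: Set

$$X_3 = X_2 + \lambda_2^* S_2 = \begin{Bmatrix} -0.25 \\ 0 \end{Bmatrix} + 0.75 \begin{Bmatrix} 0 \\ 1 \end{Bmatrix} = \begin{Bmatrix} -0.25 \\ 0.75 \end{Bmatrix}, \qquad f_3 = f(X_3) = -0.6875$$

Next, we set the iteration number as i = 3 and continue the procedure until the optimum solution

$$X^* = \begin{Bmatrix} -1.0 \\ 1.5 \end{Bmatrix} \quad \text{with} \quad f(X^*) = -1.25$$

is found.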
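The univariate procedure, including the probe test of step 3, can be sketched in Python as follows. This is a minimal sketch with two assumptions. First, a golden-section search on $0 \le \lambda \le 10$ replaces the differential calculus step of the example. Second, the loop stops once a full cycle of n directions produces no move, which is one possible reading of "no significant change". The names are illustrative. The two entries of `history` after the starting point should match $X_2$ and $X_3$ above.

```python
import math


def f(x):
    x1, x2 = x
    return x1 - x2 + 2 * x1**2 + 2 * x1 * x2 + x2**2


def golden_section(phi, a, b, tol=1e-10):
    """Minimize a unimodal phi on [a, b] by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc < fd:                  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = phi(c)
        else:                        # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = phi(d)
    return (a + b) / 2.0


def univariate(func, x1, probe=0.01, lam_max=10.0, max_iter=200):
    x, n = list(x1), len(x1)
    history = [tuple(x)]
    unchanged = 0                                         # consecutive directions with no move
    for i in range(max_iter):
        s = [1.0 if j == i % n else 0.0 for j in range(n)]   # step 2: coordinate direction
        fi = func(x)
        f_plus = func([a + probe * b for a, b in zip(x, s)])  # step 3: probe +S_i and -S_i
        f_minus = func([a - probe * b for a, b in zip(x, s)])
        sign = 1.0 if f_plus < fi else (-1.0 if f_minus < fi else 0.0)
        if sign != 0.0:
            d = [sign * b for b in s]
            lam = golden_section(                            # step 4: optimal step length
                lambda t: func([a + t * b for a, b in zip(x, d)]), 0.0, lam_max)
            x = [a + lam * b for a, b in zip(x, d)]          # step 5
            unchanged = 0
        else:
            unchanged += 1                                   # X_i is the minimum along S_i
        history.append(tuple(x))
        if unchanged >= n:
            break
    return x, func(x), history


x_opt, f_opt, history = univariate(f, [0.0, 0.0])
print(history[1], history[2])  # approximately (-0.25, 0.0) and (-0.25, 0.75)
print(x_opt, f_opt)            # close to (-1.0, 1.5) and -1.25
```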

Pattern Directions
In the univariate method, we search for the minimum
along the directions parallel to the coordinate axes. We
noticed that this method may not converge in some
cases, and that even if it converges, its convergence will
be very slow as we approach the optimum point.

These problems can be avoided by changing the
directions of search in a favorable manner instead of
retaining them always parallel to the coordinate axes.

Pattern Directions
Consider the contours of the function
shown in the figure. Let the points
1,2,3,... indicate the successive
points found by the univariate
method. It can be noticed that the
lines joining the alternate points of
the search (e.g.,1,3;2,4;3,5;4,6;...) lie
in the general direction of the
minimum and are known as pattern
directions. It can be proved that if the
objective function is a quadratic in
two variables, all such lines pass
through the minimum. Unfortunately,
this property will not be valid for
multivariable functions even when
they are quadratics. However, this
idea can still be used to achieve
rapid convergence while finding the
minimum of an n-variable function.
Pattern Directions
Methods that use pattern directions as search directions are known as pattern
search methods.
Two of the best known pattern search methods are:
Hooke-Jeeves method
Powell's method
In general, a pattern search method takes n univariate steps, where n denotes the
number of design variables, and then searches for the minimum along the pattern
direction $S_i$, defined by

$$S_i = X_i - X_{i-n}$$

where $X_i$ is the point obtained at the end of n univariate steps.
In general, the directions used prior to taking a move along a pattern direction
need not be univariate directions.
Hooke and Jeeves Method
The pattern search method of Hooke and Jeeves is a sequential technique, each
step of which consists of two kinds of moves: the exploratory move and the
pattern move.

The first kind of move is included to explore the local behavior of the
objective function, and the second kind of move is included to take advantage of
the pattern direction.

The general procedure can be described by the following steps:

1. Start with an arbitrarily chosen point

$$X_1 = \begin{Bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{Bmatrix}$$

called the starting base point, and prescribed step lengths $\Delta x_i$ in each of the
coordinate directions $u_i$, i = 1, 2, ..., n. Set k = 1.
Hooke and Jeeves method
2. Compute $f_k = f(X_k)$. Set i = 1 and $Y_{k,0} = X_k$, where the point $Y_{k,j}$ indicates the temporary
base point obtained from $X_k$ by perturbing its jth component. Then start the exploratory
move as stated in step 3.

3. The variable $x_i$ is perturbed about the current temporary base point $Y_{k,i-1}$ to obtain
the new temporary base point as

$$Y_{k,i} = \begin{cases} Y_{k,i-1} + \Delta x_i\, u_i & \text{if } f^+ = f(Y_{k,i-1} + \Delta x_i\, u_i) < f = f(Y_{k,i-1}) \\ Y_{k,i-1} - \Delta x_i\, u_i & \text{if } f^- = f(Y_{k,i-1} - \Delta x_i\, u_i) < f = f(Y_{k,i-1}) \le f^+ \\ Y_{k,i-1} & \text{if } f = f(Y_{k,i-1}) \le \min(f^+, f^-) \end{cases}$$

This process of finding the new temporary base point is continued for i = 1, 2, ...
until $x_n$ is perturbed to find $Y_{k,n}$.
Hooke and Jeeves Method
4. If the point $Y_{k,n}$ remains the same as $X_k$, reduce the step lengths $\Delta x_i$ (say, by a factor
of 2), set i = 1, and go to step 3. If $Y_{k,n}$ is different from $X_k$, obtain the new base point
as

$$X_{k+1} = Y_{k,n}$$

and go to step 5.
5. With the help of the base points $X_k$ and $X_{k+1}$, establish a pattern direction S as

$$S = X_{k+1} - X_k$$

and find a point $Y_{k+1,0}$ as

$$Y_{k+1,0} = X_{k+1} + \lambda S$$

where λ is the step length, which can be taken as 1 for simplicity. Alternatively, we
can solve a one-dimensional minimization problem in the direction S and use the
optimum step length λ* in place of λ in the equation

$$Y_{k+1,0} = X_{k+1} + \lambda^* S$$
Hooke and Jeeves Method
6. Set k = k + 1, $f_k = f(Y_{k,0})$, i = 1, and repeat step 3. If at the end of step 3
$f(Y_{k,n}) < f(X_k)$, we take the new base point $X_{k+1} = Y_{k,n}$ and go to step 5. On the
other hand, if $f(Y_{k,n}) \ge f(X_k)$, set $X_{k+1} \equiv X_k$, reduce the step lengths $\Delta x_i$, set
k = k + 1, and go to step 2.

7. The process is assumed to have converged whenever the step lengths fall
below a small quantity ε. Thus the process is terminated if

$$\max_i (\Delta x_i) < \varepsilon$$
Example
Minimize

$$f(x_1, x_2) = x_1 - x_2 + 2x_1^2 + 2x_1 x_2 + x_2^2$$

starting from the point

$$X_1 = \begin{Bmatrix} 0 \\ 0 \end{Bmatrix}$$

Take $\Delta x_1 = \Delta x_2 = 0.8$ and ε = 0.1.
Solution:
Step 1: We take the starting base point as

$$X_1 = \begin{Bmatrix} 0 \\ 0 \end{Bmatrix}$$

and step lengths as $\Delta x_1 = \Delta x_2 = 0.8$ along the coordinate directions $u_1$ and $u_2$,
respectively. Set k = 1.
Example
Step 2: $f_1 = f(X_1) = 0$, i = 1, and

$$Y_{10} = X_1 = \begin{Bmatrix} 0 \\ 0 \end{Bmatrix}$$

Step 3: To find the new temporary base point, we set i = 1 and evaluate $f = f(Y_{10}) = 0.0$,

$$f^+ = f(Y_{10} + \Delta x_1 u_1) = f(0.8, 0.0) = 2.08$$

$$f^- = f(Y_{10} - \Delta x_1 u_1) = f(-0.8, 0.0) = 0.48$$

Since $f < \min(f^+, f^-)$, we take $Y_{11} = X_1$. Next we set i = 2, and evaluate

$f = f(Y_{11}) = 0.0$ and

$$f^+ = f(Y_{11} + \Delta x_2 u_2) = f(0.0, 0.8) = -0.16$$

Since $f^+ < f$, we set

$$Y_{12} = \begin{Bmatrix} 0.0 \\ 0.8 \end{Bmatrix}$$

(Here $Y_{kj}$ indicates the temporary base point obtained from $X_k$ by perturbing the jth component of $X_k$.)
Example
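Step 4: As $Y_{12}$ is different from $X_1$, the new base point is taken as:

$$X_2 = \begin{Bmatrix} 0.0 \\ 0.8 \end{Bmatrix}$$

Step 5: A pattern direction is established as:

$$S = X_2 - X_1 = \begin{Bmatrix} 0.0 \\ 0.8 \end{Bmatrix} - \begin{Bmatrix} 0.0 \\ 0.0 \end{Bmatrix} = \begin{Bmatrix} 0.0 \\ 0.8 \end{Bmatrix}$$

The optimal step length λ* is found by minimizing

$$f(X_2 + \lambda S) = f(0.0, 0.8 + 0.8\lambda) = 0.64\lambda^2 + 0.48\lambda - 0.16$$

As $df/d\lambda = 1.28\lambda + 0.48 = 0$ at $\lambda^* = -0.375$, we obtain the point $Y_{20}$ as

$$Y_{20} = X_2 + \lambda^* S = \begin{Bmatrix} 0.0 \\ 0.8 \end{Bmatrix} - 0.375 \begin{Bmatrix} 0.0 \\ 0.8 \end{Bmatrix} = \begin{Bmatrix} 0.0 \\ 0.5 \end{Bmatrix}$$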
Example
Step 6: Set k = 2, $f = f_2 = f(Y_{20}) = -0.25$, and repeat step 3. Thus, with i = 1, we
evaluate

$$f^+ = f(Y_{20} + \Delta x_1 u_1) = f(0.8, 0.5) = 2.63$$

$$f^- = f(Y_{20} - \Delta x_1 u_1) = f(-0.8, 0.5) = -0.57$$

Since $f^- < f < f^+$, we take

$$Y_{21} = \begin{Bmatrix} -0.8 \\ 0.5 \end{Bmatrix}$$

Next, we set i = 2 and evaluate $f = f(Y_{21}) = -0.57$ and

$$f^+ = f(Y_{21} + \Delta x_2 u_2) = f(-0.8, 1.3) = -1.21$$

As $f^+ < f$, we take $Y_{22} = \begin{Bmatrix} -0.8 \\ 1.3 \end{Bmatrix}$. Since $f(Y_{22}) = -1.21 < f(X_2) = -0.16$, we take
the new base point as:

$$X_3 = Y_{22} = \begin{Bmatrix} -0.8 \\ 1.3 \end{Bmatrix}$$
Example
Step 6 continued: After selection of the new base point, we go to step 5.

This procedure has to be continued until the optimum point

$$X_{opt} = \begin{Bmatrix} -1.0 \\ 1.5 \end{Bmatrix}$$

is found.
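The steps and the worked example can be traced with the following minimal Python sketch of the Hooke and Jeeves method. It makes three assumptions. The optimal pattern step λ* is found by a golden-section search restricted to [-2, 2], which is wide enough for the λ* = -0.375 of the example. The step lengths are halved whenever a move fails. The function and helper names are illustrative. The list `history` collects the base points $X_1, X_2, \ldots$, so its first three entries can be compared with the example.

```python
import math


def f(x):
    x1, x2 = x
    return x1 - x2 + 2 * x1**2 + 2 * x1 * x2 + x2**2


def golden_section(phi, a, b, tol=1e-10):
    """Minimize a unimodal phi on [a, b] by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc < fd:                  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = phi(c)
        else:                        # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = phi(d)
    return (a + b) / 2.0


def explore(func, y, dx):
    """Exploratory move (step 3): try +dx_i, then -dx_i, for each variable in turn."""
    y = list(y)
    for i in range(len(y)):
        f0 = func(y)
        plus = y[:i] + [y[i] + dx[i]] + y[i + 1:]
        minus = y[:i] + [y[i] - dx[i]] + y[i + 1:]
        if func(plus) < f0:
            y = plus
        elif func(minus) < f0:
            y = minus
    return y


def hooke_jeeves(func, x1, dx, eps, t_range=(-2.0, 2.0), max_iter=1000):
    xk, dx = list(x1), list(dx)
    history = [tuple(xk)]                             # base points X_1, X_2, ...
    for _ in range(max_iter):
        if max(dx) < eps:                             # step 7: converged
            break
        y = explore(func, xk, dx)                     # steps 2-3
        if func(y) >= func(xk):                       # step 4: no move, shrink the steps
            dx = [d / 2.0 for d in dx]
            continue
        x_next = y                                    # step 4: new base point
        for _ in range(max_iter):                     # steps 5-6: pattern moves
            s = [a - b for a, b in zip(x_next, xk)]   # pattern direction S
            lam = golden_section(
                lambda t: func([a + t * b for a, b in zip(x_next, s)]), *t_range)
            y0 = [a + lam * b for a, b in zip(x_next, s)]
            xk = x_next
            history.append(tuple(xk))
            y = explore(func, y0, dx)
            if func(y) >= func(xk):
                break                                 # pattern move failed
            x_next = y                                # accept, go back to step 5
        dx = [d / 2.0 for d in dx]                    # step 6: reduce the step lengths
    return xk, func(xk), history


x_opt, f_opt, history = hooke_jeeves(f, [0.0, 0.0], [0.8, 0.8], eps=0.1)
print(history[:3])  # approximately (0, 0), (0, 0.8), (-0.8, 1.3)
print(x_opt, f_opt)
```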
Powell's method
Powell's method is an extension of the basic pattern search method.

It is the most widely used direct search method and can be proved
to be a method of conjugate directions.

A conjugate directions method will minimize a quadratic function in a
finite number of steps.

Since a general nonlinear function can be approximated reasonably
well by a quadratic function near its minimum, a conjugate directions
method is expected to speed up the convergence of even general
nonlinear objective functions.
Powell's method
Definition: Conjugate Directions

Let A = [A] be an n x n symmetric matrix. A set of n vectors (or directions) $\{S_i\}$ is
said to be conjugate (more accurately, A-conjugate) if

$$S_i^T [A]\, S_j = 0 \quad \text{for all } i \ne j, \quad i = 1, 2, \ldots, n, \quad j = 1, 2, \ldots, n$$

It can be seen that orthogonal directions are a special case of conjugate directions
(obtained with [A] = [I]).

Definition: Quadratically Convergent Method

If a minimization method, using exact arithmetic, can find the minimum point in n
steps while minimizing a quadratic function in n variables, the method is called a
quadratically convergent method.
Powell's method
Theorem 1: Consider a quadratic function of n variables and two parallel hyperplanes 1 and 2 of
dimension k < n. Let the constrained stationary points of the quadratic
function in the hyperplanes be $X_1$ and $X_2$, respectively. Then the line joining
$X_1$ and $X_2$ is conjugate to any line parallel to the hyperplanes. The
meaning of this theorem is illustrated in a two-dimensional space in the
figure. If $X_1$ and $X_2$ are the minima of Q obtained by searching along the
direction S from two different starting points $X_a$ and $X_b$, respectively, the
line $(X_1 - X_2)$ will be conjugate to the search direction S.
Powell's method
Theorem 2: If a quadratic function

$$Q(X) = \frac{1}{2} X^T [A] X + B^T X + C$$

is minimized sequentially, once along each direction of a set of n
mutually conjugate directions, the minimum of the function Q will be
found at or before the nth step irrespective of the starting point.
Example
Consider the minimization of the function

$$f(x_1, x_2) = 6x_1^2 + 2x_2^2 - 6x_1 x_2 - x_1 - 2x_2$$

If $S_1 = \begin{Bmatrix} 1 \\ 2 \end{Bmatrix}$ denotes a search direction, find a direction $S_2$ which is
conjugate to the direction $S_1$.

Solution: The objective function can be expressed in matrix form as:

$$f(X) = B^T X + \frac{1}{2} X^T [A] X = -\{1 \;\; 2\} \begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix} + \frac{1}{2} \{x_1 \;\; x_2\} \begin{bmatrix} 12 & -6 \\ -6 & 4 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix}$$
Example
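The Hessian matrix [A] can be identified as

$$[A] = \begin{bmatrix} 12 & -6 \\ -6 & 4 \end{bmatrix}$$

The direction

$$S_2 = \begin{Bmatrix} s_1 \\ s_2 \end{Bmatrix}$$

will be conjugate to

$$S_1 = \begin{Bmatrix} 1 \\ 2 \end{Bmatrix}$$

if

$$S_1^T [A]\, S_2 = \{1 \;\; 2\} \begin{bmatrix} 12 & -6 \\ -6 & 4 \end{bmatrix} \begin{Bmatrix} s_1 \\ s_2 \end{Bmatrix} = 0$$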
Example
which upon expansion gives $2s_2 = 0$, so $s_1$ is arbitrary and $s_2 = 0$. Since $s_1$ can have
any value, we select $s_1 = 1$, and the desired conjugate direction can be expressed
as

$$S_2 = \begin{Bmatrix} 1 \\ 0 \end{Bmatrix}$$
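In two variables, the conjugacy condition $S_1^T [A] S_2 = 0$ simply says that $S_2$ is orthogonal to the vector $[A] S_1$; this relies on [A] being symmetric. One conjugate direction is therefore obtained by rotating $[A] S_1$ through 90 degrees. The Python sketch below reproduces the example for the 2-variable case only, using illustrative helper names. It returns $S_2 = (-2, 0)$, which is a scalar multiple of the direction $\{1, 0\}$ found above.

```python
def matvec(A, v):
    """Matrix-vector product with [A] stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_direction_2d(A, s1):
    """Return S2 with S1^T [A] S2 = 0 for a symmetric 2 x 2 [A]."""
    w = matvec(A, s1)       # [A] S1
    return [-w[1], w[0]]    # rotate by 90 degrees: orthogonal to [A] S1


A = [[12.0, -6.0], [-6.0, 4.0]]
S1 = [1.0, 2.0]
S2 = conjugate_direction_2d(A, S1)
print(S2, dot(S1, matvec(A, S2)))  # [-2.0, 0.0] 0.0
```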
Powell's Method: The Algorithm
The basic idea of Powell's method
is illustrated graphically for a two-variable
function in the figure. In
this figure, the function is first
minimized once along each of the
coordinate directions, starting with
the second coordinate direction,
and then in the corresponding
pattern direction. This leads to
point 5. For the next cycle of
minimization, we discard one of
the coordinate directions (the $x_1$
direction in the present case) in
favor of the pattern direction.
Powell's Method: The Algorithm
Thus we minimize along $u_2$
and $S_1$ and obtain point 7. Then we
generate a new pattern
direction as shown in the
figure. For the next cycle of
minimization, we discard one
of the previously used
coordinate directions (the $x_2$
direction in this case) in favor
of the newly generated pattern
direction.
Powell's Method: The Algorithm
Then, starting from point 8,
we minimize along directions $S_1$
and $S_2$, thereby obtaining points
9 and 10, respectively. For the
next cycle of minimization, since
there is no coordinate direction
to discard, we restart the whole
procedure by minimizing along
the $x_2$ direction. This procedure
is continued until the desired
minimum point is found.
Powell's Method: The Algorithm
[Figure: Flowchart of Powell's method, with blocks A to E referred to below]
Note that the search will be made sequentially in the directions

$$S_n;\; S_1, S_2, S_3, \ldots, S_{n-1}, S_n;\; S_p^{(1)};\; S_2, S_3, \ldots, S_{n-1}, S_n, S_p^{(1)};\; S_p^{(2)};\; S_3, S_4, \ldots, S_{n-1}, S_n, S_p^{(1)}, S_p^{(2)};\; S_p^{(3)}, \ldots$$

until the minimum point is found. Here $S_i$ indicates the coordinate direction $u_i$
and $S_p^{(j)}$ the jth pattern direction.

In the flowchart, the previous base point is stored as the vector Z in
block A, and the pattern direction is constructed by subtracting the
previous base point from the current one in block B.

The pattern direction is then used as a minimization direction in
blocks C and D.
Powell's Method: The Algorithm
For the next cycle, the first direction used in the previous cycle is
discarded in favor of the current pattern direction. This is achieved
by updating the numbers of the search directions as shown in block
E.

Thus, both points Z and X used in block B for the construction of the
pattern directions are points that are minima along $S_n$ in the first
cycle, the first pattern direction $S_p^{(1)}$ in the second cycle, the second
pattern direction $S_p^{(2)}$ in the third cycle, and so on.

Quadratic convergence
It can be seen from the flowchart that the pattern directions $S_p^{(1)}, S_p^{(2)}, S_p^{(3)}, \ldots$ are
nothing but the lines joining the minima found along the directions $S_n, S_p^{(1)}, S_p^{(2)}, \ldots$,
respectively. Hence, by Theorem 1, the pairs of directions $(S_n, S_p^{(1)})$, $(S_p^{(1)}, S_p^{(2)})$,
and so on, are A-conjugate. Thus all the directions $S_n, S_p^{(1)}, S_p^{(2)}, \ldots$ are A-conjugate.
Since, by Theorem 2, any search method involving minimization along a set of conjugate
directions is quadratically convergent, Powell's method is quadratically convergent.

From the method used for constructing the conjugate directions $S_p^{(1)}, S_p^{(2)}, \ldots$, we
find that n minimization cycles are required to complete the construction of n
conjugate directions. In the ith cycle, the minimization is done along the already
constructed i conjugate directions and the n-i nonconjugate (coordinate) directions.
Thus, after n cycles, all the n search directions are mutually conjugate, and a
quadratic will theoretically be minimized in $n^2$ one-dimensional minimizations.
This proves the quadratic convergence of Powell's method.


Quadratic Convergence of Powell's Method
It is to be noted that, as with most numerical techniques, the
convergence in many practical problems may not be as good as the
theory seems to indicate. Powell's method may require many more
iterations to minimize a function than the theoretically estimated
number. There are several reasons for this:

1. Since the number of cycles n is valid only for quadratic functions, it will
generally take more than n cycles for nonquadratic functions.

2. The proof of quadratic convergence has been established with the
assumption that the exact minimum is found in each of the one-dimensional
minimizations. However, the actual minimizing step lengths $\lambda_i^*$ will be
only approximate, and hence the subsequent directions will not be conjugate.
Thus the method requires more iterations to achieve overall convergence.
Quadratic Convergence of Powell's Method
3. Powell's method described above can break down before the minimum point is
found. This is because the search directions $S_i$ might become dependent or
almost dependent during numerical computation.

Example: Minimize

$$f(x_1,x_2) = x_1 - x_2 + 2x_1^2 + 2x_1x_2 + x_2^2$$

from the starting point

$$X_1 = \begin{Bmatrix}0\\ 0\end{Bmatrix}$$

using Powell's method.
Example
Cycle 1: Univariate search
We minimize f along $S_2 = S_n = \begin{Bmatrix}0\\ 1\end{Bmatrix}$ from $X_1$. To find the correct direction ($+S_2$ or $-S_2$)
for decreasing the value of f, we take the probe length as $\varepsilon = 0.01$. As $f_1 = f(X_1) = 0.0$, and

$$f^+ = f(X_1 + \varepsilon S_2) = f(0.0, 0.01) = -0.0099 < f_1,$$

f decreases along the direction $+S_2$. To find the minimizing step length $\lambda^*$ along $S_2$,
we minimize

$$f(X_1 + \lambda S_2) = f(0.0, \lambda) = \lambda^2 - \lambda$$

As $df/d\lambda = 0$ at $\lambda^* = 1/2$, we have

$$X_2 = X_1 + \lambda^* S_2 = \begin{Bmatrix}0\\ 0.5\end{Bmatrix}$$
Example
Next, we minimize f along $S_1 = \begin{Bmatrix}1\\ 0\end{Bmatrix}$ from $X_2 = \begin{Bmatrix}0.0\\ 0.5\end{Bmatrix}$. Since

$$f_2 = f(X_2) = f(0.0, 0.5) = -0.25,$$
$$f^+ = f(X_2 + \varepsilon S_1) = f(0.01, 0.5) = -0.2298 > f_2,$$
$$f^- = f(X_2 - \varepsilon S_1) = f(-0.01, 0.5) = -0.2698,$$

f decreases along $-S_1$. As $f(X_2 - \lambda S_1) = f(-\lambda, 0.50) = 2\lambda^2 - 2\lambda - 0.25$, $df/d\lambda = 0$ at
$\lambda^* = 1/2$. Hence

$$X_3 = X_2 - \lambda^* S_1 = \begin{Bmatrix}-0.5\\ 0.5\end{Bmatrix}$$

Example
Now we minimize f along $S_2 = \begin{Bmatrix}0\\ 1\end{Bmatrix}$ from $X_3 = \begin{Bmatrix}-0.5\\ 0.5\end{Bmatrix}$. As

$$f_3 = f(X_3) = -0.75,$$
$$f^+ = f(X_3 + \varepsilon S_2) = f(-0.5, 0.51) = -0.7599 < f_3,$$

f decreases along the $+S_2$ direction. Since

$$f(X_3 + \lambda S_2) = f(-0.5, 0.5 + \lambda) = \lambda^2 - \lambda - 0.75,$$

$df/d\lambda = 0$ at $\lambda^* = 1/2$. This gives

$$X_4 = X_3 + \lambda^* S_2 = \begin{Bmatrix}-0.5\\ 1.0\end{Bmatrix}$$

Example
Cycle 2: Pattern Search
Now we generate the first pattern direction as:

$$S_p^{(1)} = X_4 - X_2 = \begin{Bmatrix}-0.5\\ 1.0\end{Bmatrix} - \begin{Bmatrix}0\\ 0.5\end{Bmatrix} = \begin{Bmatrix}-0.5\\ 0.5\end{Bmatrix}$$

and minimize f along $S_p^{(1)}$ from $X_4$. Since

$$f_4 = f(X_4) = -1.0,$$
$$f^+ = f(X_4 + \varepsilon S_p^{(1)}) = f(-0.5 - 0.005,\; 1 + 0.005) = f(-0.505, 1.005) = -1.004975 < f_4,$$

f decreases in the positive direction of $S_p^{(1)}$. As

$$f(X_4 + \lambda S_p^{(1)}) = f(-0.5 - 0.5\lambda,\; 1 + 0.5\lambda) = 0.25\lambda^2 - 0.50\lambda - 1.00,$$

$df/d\lambda = 0$ at $\lambda^* = 1.0$, and hence

$$X_5 = X_4 + \lambda^* S_p^{(1)} = \begin{Bmatrix}-1.0\\ 1.5\end{Bmatrix}$$
Example
The point $X_5$ can be identified as the optimum point.
If we do not recognize $X_5$ as the optimum point at this stage, we proceed
to minimize f along the direction $S_2 = \begin{Bmatrix}0\\ 1\end{Bmatrix}$ from $X_5$. Then we would obtain

$$f_5 = f(X_5) = -1.25, \qquad f^+ = f(X_5 + \varepsilon S_2) > f_5, \qquad f^- = f(X_5 - \varepsilon S_2) > f_5$$

This shows that f cannot be minimized along $S_2$, and hence $X_5$ will be
the optimum point.
In this example, the convergence has been achieved in the second cycle
itself. This is to be expected in this case, as f is a quadratic function and
the method is quadratically convergent.
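The cycle structure described above can be sketched in Python. The search runs first along $S_n$, then along $S_1, \ldots, S_n$, then along the pattern direction, after which the first direction is discarded. This is a minimal illustration, not a production implementation. It assumes a quadratic $f(X) = \frac{1}{2}X^T[A]X + B^TX$, so the exact step $\lambda^* = -\nabla f^T S / S^T[A]S$ can stand in for the probe-and-minimize line search used in the worked example. The function names are hypothetical. On the function of this example, it should reproduce $X_2, \ldots, X_5$ and stop at $\{-1.0, 1.5\}$.

```python
def grad(A, B, x):
    """Gradient [A]x + B of the quadratic f(x) = 1/2 x^T A x + B^T x."""
    return [sum(a * xi for a, xi in zip(row, x)) + b for row, b in zip(A, B)]


def line_min(A, B, x, s):
    """Exact minimizer of f(x + lam*s) for the quadratic: lam* = -s^T grad / s^T A s."""
    g = grad(A, B, x)
    As = [sum(a * si for a, si in zip(row, s)) for row in A]
    curvature = sum(si * asi for si, asi in zip(s, As))
    if curvature <= 0.0:        # zero direction (or no minimum along s): stay put
        return list(x)
    lam = -sum(gi * si for gi, si in zip(g, s)) / curvature
    return [xi + lam * si for xi, si in zip(x, s)]


def powell(A, B, x1, max_cycles=20, tol=1e-12):
    """Powell's conjugate-direction method following the search order in the text."""
    n = len(x1)
    dirs = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]  # S_i = u_i
    x = line_min(A, B, x1, dirs[-1])      # cycle 1 starts with a search along S_n
    for _ in range(max_cycles):
        z = list(x)                       # previous base point Z (block A)
        for s in dirs:                    # minimize once along each current direction
            x = line_min(A, B, x, s)
        sp = [xi - zi for xi, zi in zip(x, z)]   # pattern direction X - Z (block B)
        if max(abs(v) for v in sp) < tol:
            break                         # a full cycle made no progress: stop
        x = line_min(A, B, x, sp)         # minimize along the pattern direction
        dirs = dirs[1:] + [sp]            # discard the first direction (block E)
    return x


A = [[4.0, 2.0], [2.0, 2.0]]   # Hessian of x1 - x2 + 2*x1**2 + 2*x1*x2 + x2**2
B = [1.0, -1.0]
print(powell(A, B, [0.0, 0.0]))  # [-1.0, 1.5]
```

The constant term C is omitted because it does not change the location of the minimum.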
Indirect search (descent method)
Gradient of a function
The gradient of a function is an n-component vector given by:

$$\nabla f = \begin{Bmatrix}\partial f/\partial x_1\\ \partial f/\partial x_2\\ \vdots\\ \partial f/\partial x_n\end{Bmatrix}$$

The gradient has a very important property. If we move along the gradient
direction from any point in n-dimensional space, the function value increases at
the fastest rate. Hence the gradient direction is called the direction of
steepest ascent. Unfortunately, the direction of steepest ascent is a local
property, not a global one.
Indirect search (descent method)
The gradient vectors $\nabla f$ evaluated at
points 1, 2, 3, and 4 lie along the
directions 11', 22', 33', and 44',
respectively.
Thus the function value increases at
the fastest rate in the direction 11' at
point 1, but not at point 2. Similarly,
the function value increases at the
fastest rate in direction 22' at point 2,
but not at point 3.
In other words, the direction of
steepest ascent generally varies from
point to point, and if we make
infinitely small moves along the
direction of steepest ascent, the path
will be a curved line like the curve
1-2-3-4 in the figure.
Indirect search (descent method)
Since the gradient vector represents the direction of steepest ascent, the
negative of the gradient vector denotes the direction of the steepest descent.

Thus, any method that makes use of the gradient vector can be expected to
give the minimum point faster than one that does not make use of the
gradient vector.

All the descent methods make use of the gradient vector, either directly or
indirectly, in finding the search directions.

Theorem 1: The gradient vector represents the direction of the steepest ascent.
Theorem 2: The maximum rate of change of f at any point X is equal to the
magnitude of the gradient vector at the same point.
Indirect search (descent method)
In general, if $df/ds = \nabla f^T u > 0$ along a vector $dX$, it is called a direction
of ascent, and if $df/ds < 0$, it is called a direction of descent.

Evaluation of the gradient
The evaluation of the gradient requires the computation of the partial
derivatives $\partial f/\partial x_i$, $i = 1, 2, \ldots, n$. There are three situations where the
evaluation of the gradient poses certain problems:

1. The function is differentiable at all the points, but the calculation of the
components of the gradient, $\partial f/\partial x_i$, is either impractical or impossible.
2. The expressions for the partial derivatives $\partial f/\partial x_i$ can be derived, but they
require large computational time for evaluation.
3. The gradient $\nabla f$ is not defined at all points.

Indirect search (descent method)
The first case: The function is differentiable at all the points, but the calculation
of the components of the gradient, $\partial f/\partial x_i$, is either impractical or impossible.

In the first case, we can use the forward finite-difference formula

$$\left.\frac{\partial f}{\partial x_i}\right|_{X_m} \simeq \frac{f(X_m + \Delta x_i u_i) - f(X_m)}{\Delta x_i}, \qquad i = 1, 2, \ldots, n$$

to approximate the partial derivative $\partial f/\partial x_i$ at $X_m$. If the function value at the base
point $X_m$ is known, this formula requires one additional function evaluation to
find $(\partial f/\partial x_i)|_{X_m}$. Thus, it requires n additional function evaluations to evaluate the
approximate gradient $\nabla f|_{X_m}$. For better results, we can use the central finite-difference
formula to find the approximate partial derivative $\partial f/\partial x_i|_{X_m}$:

$$\left.\frac{\partial f}{\partial x_i}\right|_{X_m} \simeq \frac{f(X_m + \Delta x_i u_i) - f(X_m - \Delta x_i u_i)}{2\Delta x_i}, \qquad i = 1, 2, \ldots, n$$
Indirect search (descent method)
In these two equations, $\Delta x_i$ is a small scalar quantity and $u_i$ is a vector of order n
whose ith component has a value of 1, and all other components have a value of
zero.

In practical computations, the value of $\Delta x_i$ has to be chosen with some care. If $\Delta x_i$
is too small, the difference between the values of the function evaluated at $(X_m + \Delta x_i u_i)$
and $(X_m - \Delta x_i u_i)$ may be very small and numerical round-off errors may
dominate. On the other hand, if $\Delta x_i$ is too large, the truncation error may
predominate in the calculation of the gradient.

If the expressions for the partial derivatives can be derived but require large
computational time for evaluation (Case 2), the finite-difference formulas are to be
preferred whenever the exact gradient evaluation requires more computational time
than the evaluation of the forward- or central-difference formulas given above.
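The forward and central difference formulas translate directly into code. The Python sketch below applies both to the function used in the worked examples, $f = x_1 - x_2 + 2x_1^2 + 2x_1x_2 + x_2^2$, whose exact gradient at the origin is $\{1, -1\}$. The step sizes `dx` are illustrative assumptions, chosen with the round-off/truncation trade-off above in mind.

```python
def forward_diff_grad(f, x, dx=1e-6):
    """Approximate grad f at x with forward differences (n extra evaluations)."""
    fx = f(x)                        # value at the base point X_m, reused for every i
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += dx                  # X_m + dx * u_i
        grad.append((f(xp) - fx) / dx)
    return grad


def central_diff_grad(f, x, dx=1e-5):
    """Approximate grad f at x with central differences (2n evaluations)."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += dx                  # X_m + dx * u_i
        xm[i] -= dx                  # X_m - dx * u_i
        grad.append((f(xp) - f(xm)) / (2.0 * dx))
    return grad


def f(x):
    x1, x2 = x
    return x1 - x2 + 2 * x1**2 + 2 * x1 * x2 + x2**2


print(forward_diff_grad(f, [0.0, 0.0]))   # close to [1, -1], error of order dx
print(central_diff_grad(f, [0.0, 0.0]))   # close to [1, -1] up to round-off
```

For a quadratic function, the central formula's truncation error vanishes, so only round-off remains. The forward formula is cheaper when $f(X_m)$ is already available.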
Indirect search (descent method)
If the gradient is not defined at all
points (Case 3), we cannot use the
finite-difference formulas.

For example, consider the function
shown in the figure. If the central-difference formula

$$\left.\frac{df}{dx}\right|_{X_m} \simeq \frac{f(X_m + \Delta x) - f(X_m - \Delta x)}{2\Delta x}$$

is used to evaluate the derivative df/dx
at $X_m$, we obtain a value of $\alpha_1$ for a
step size $\Delta x_1$ and a value of $\alpha_2$ for a
step size $\Delta x_2$. Since, in reality, the
derivative does not exist at the point
$X_m$, the use of the finite-difference
formulas might lead to a complete
breakdown of the minimization
process. In such cases, the
minimization can be done only by one
of the direct search techniques
discussed earlier.
Rate of change of a function along a direction
In most optimization techniques, we are interested in finding the rate of change of
a function with respect to a parameter $\lambda$ along a specified direction $S_i$ away from a
point $X_i$. Any point in the specified direction away from the given point $X_i$ can be
expressed as $X = X_i + \lambda S_i$. Our interest is to find the rate of change of the function
along the direction $S_i$ (characterized by the parameter $\lambda$), that is,

$$\frac{df}{d\lambda} = \sum_{j=1}^{n} \frac{\partial f}{\partial x_j}\frac{\partial x_j}{\partial \lambda}$$

where $x_j$ is the jth component of X. But

$$\frac{\partial x_j}{\partial \lambda} = \frac{\partial}{\partial \lambda}(x_{ij} + \lambda s_{ij}) = s_{ij}$$

where $x_{ij}$ and $s_{ij}$ are the jth components of $X_i$ and $S_i$, respectively.
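Rate of change of a function along a direction
Hence

$$\frac{df}{d\lambda} = \sum_{j=1}^{n} \frac{\partial f}{\partial x_j} s_{ij} = \nabla f^T S_i$$

If $\lambda^*$ minimizes f in the direction $S_i$, we have

$$\left.\frac{df}{d\lambda}\right|_{\lambda = \lambda^*} = \left.\nabla f\right|_{\lambda^*}^T S_i = 0 \quad \text{at the point } X_i + \lambda^* S_i$$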
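The identity $df/d\lambda = \nabla f^T S_i$ is easy to check numerically. The sketch below uses hypothetical helper names. The function is the one used in the steepest descent example that follows. The sketch compares the analytic value with a central-difference estimate along $S_1 = \{-1, 1\}$ from $X_1 = \{0, 0\}$. At $\lambda = 1$ the derivative vanishes, which is the optimality condition just stated.

```python
def f(x):
    x1, x2 = x
    return x1 - x2 + 2 * x1**2 + 2 * x1 * x2 + x2**2


def grad_f(x):
    x1, x2 = x
    return [1 + 4 * x1 + 2 * x2, -1 + 2 * x1 + 2 * x2]


def point(x, s, lam):
    """The point X_i + lam * S_i."""
    return [xi + lam * si for xi, si in zip(x, s)]


def df_dlambda(x, s, lam):
    """Analytic rate of change: grad f(X_i + lam*S_i)^T S_i."""
    return sum(g * si for g, si in zip(grad_f(point(x, s, lam)), s))


def df_dlambda_numeric(x, s, lam, h=1e-6):
    """Central-difference estimate of df/dlambda, for comparison."""
    return (f(point(x, s, lam + h)) - f(point(x, s, lam - h))) / (2 * h)


X1, S1 = [0.0, 0.0], [-1.0, 1.0]
for lam in (0.0, 0.5, 1.0):
    print(lam, df_dlambda(X1, S1, lam), df_dlambda_numeric(X1, S1, lam))
```

Along this line $f(X_1 + \lambda S_1) = \lambda^2 - 2\lambda$, so the printed derivatives should follow $2\lambda - 2$.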
Steepest descent (Cauchy method)
The use of the negative of the gradient vector as a direction for minimization
was first made by Cauchy in 1847.

In this method, we start from an initial trial point $X_1$ and iteratively move along
the steepest descent directions until the optimum point is found.

The steepest descent method can be summarized by the following steps:

1. Start with an arbitrary initial point $X_1$. Set the iteration number as i = 1.
2. Find the search direction $S_i$ as

$$S_i = -\nabla f_i = -\nabla f(X_i)$$

3. Determine the optimal step length $\lambda_i^*$ in the direction $S_i$ and set

$$X_{i+1} = X_i + \lambda_i^* S_i = X_i - \lambda_i^* \nabla f_i$$
Steepest descent (Cauchy method)
4. Test the new point, $X_{i+1}$, for optimality. If $X_{i+1}$ is optimum, stop the
process. Otherwise go to step 5.

5. Set the new iteration number i = i + 1 and go to step 2.

The method of steepest descent may appear to be the best
unconstrained minimization technique since each one-dimensional
search starts in the best direction. However, owing to the fact that
the steepest descent direction is a local property, the method is not
really effective in most problems.

Example
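Minimize

$$f(x_1,x_2) = x_1 - x_2 + 2x_1^2 + 2x_1x_2 + x_2^2$$

starting from the point $X_1 = \begin{Bmatrix}0\\ 0\end{Bmatrix}$.
Solution:
Iteration 1: The gradient of f is given by:

$$\nabla f = \begin{Bmatrix}\partial f/\partial x_1\\ \partial f/\partial x_2\end{Bmatrix} = \begin{Bmatrix}1 + 4x_1 + 2x_2\\ -1 + 2x_1 + 2x_2\end{Bmatrix}, \qquad \nabla f_1 = \nabla f(X_1) = \begin{Bmatrix}1\\ -1\end{Bmatrix}$$

Therefore

$$S_1 = -\nabla f_1 = \begin{Bmatrix}-1\\ 1\end{Bmatrix}$$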
Example
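To find $X_2$, we need to find the optimal step length $\lambda_1^*$. For this, we
minimize

$$f(X_1 + \lambda_1 S_1) = f(-\lambda_1, \lambda_1) = \lambda_1^2 - 2\lambda_1$$

with respect to $\lambda_1$. Since $df/d\lambda_1 = 0$ at $\lambda_1^* = 1$, we obtain:

$$X_2 = X_1 + \lambda_1^* S_1 = \begin{Bmatrix}0\\ 0\end{Bmatrix} + 1\begin{Bmatrix}-1\\ 1\end{Bmatrix} = \begin{Bmatrix}-1\\ 1\end{Bmatrix}$$

As $\nabla f_2 = \nabla f(X_2) = \begin{Bmatrix}-1\\ -1\end{Bmatrix} \neq \begin{Bmatrix}0\\ 0\end{Bmatrix}$, $X_2$ is not optimum.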
Example
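Iteration 2:

$$S_2 = -\nabla f_2 = \begin{Bmatrix}1\\ 1\end{Bmatrix}$$

To minimize

$$f(X_2 + \lambda_2 S_2) = f(-1 + \lambda_2, 1 + \lambda_2) = 5\lambda_2^2 - 2\lambda_2 - 1,$$

we set $df/d\lambda_2 = 0$. This gives $\lambda_2^* = 1/5$, and hence

$$X_3 = X_2 + \lambda_2^* S_2 = \begin{Bmatrix}-1\\ 1\end{Bmatrix} + \frac{1}{5}\begin{Bmatrix}1\\ 1\end{Bmatrix} = \begin{Bmatrix}-0.8\\ 1.2\end{Bmatrix}, \qquad \nabla f_3 = \nabla f(X_3) = \begin{Bmatrix}0.2\\ -0.2\end{Bmatrix}$$

Since the components of the gradient at $X_3$ are not zero, we
proceed to the next iteration.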
Example
Iteration 3:

$$S_3 = -\nabla f_3 = \begin{Bmatrix}-0.2\\ 0.2\end{Bmatrix}$$

As

$$f(X_3 + \lambda_3 S_3) = f(-0.8 - 0.2\lambda_3, 1.2 + 0.2\lambda_3) = 0.04\lambda_3^2 - 0.08\lambda_3 - 1.20,$$

we set $df/d\lambda_3 = 0$. This gives $\lambda_3^* = 1.0$. Therefore,

$$X_4 = X_3 + \lambda_3^* S_3 = \begin{Bmatrix}-0.8\\ 1.2\end{Bmatrix} + 1.0\begin{Bmatrix}-0.2\\ 0.2\end{Bmatrix} = \begin{Bmatrix}-1.0\\ 1.4\end{Bmatrix}$$

The gradient at $X_4$ is given by:

$$\nabla f_4 = \begin{Bmatrix}-0.20\\ -0.20\end{Bmatrix}$$

Since the components of the gradient at $X_4$ are not equal to zero, $X_4$ is not optimum
and hence we have to proceed to the next iteration. This process has to be
continued until the optimum point, $X^* = \begin{Bmatrix}-1.0\\ 1.5\end{Bmatrix}$, is found.
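The steepest descent iterations of this example can be reproduced with the short Python sketch below. It assumes a quadratic $f(X) = \frac{1}{2}X^T[A]X + B^TX$, here with $[A] = [[4, 2], [2, 2]]$ and $B = \{1, -1\}$. Under that assumption the optimal step has the closed form $\lambda_i^* = \nabla f_i^T\nabla f_i / \nabla f_i^T[A]\nabla f_i$. For a general function, a one-dimensional search would replace that line. Names and tolerances are illustrative.

```python
def steepest_descent(A, B, x1, eps=1e-6, max_iter=1000):
    """Cauchy's method on f = 1/2 x^T A x + B^T x with exact line searches."""
    x = list(x1)
    history = [x]
    for _ in range(max_iter):
        g = [sum(a * xi for a, xi in zip(row, x)) + b for row, b in zip(A, B)]
        if max(abs(gi) for gi in g) <= eps:          # gradient test for optimality
            break
        s = [-gi for gi in g]                        # S_i = -grad f_i
        As = [sum(a * si for a, si in zip(row, s)) for row in A]
        lam = sum(gi * gi for gi in g) / sum(si * asi for si, asi in zip(s, As))
        x = [xi + lam * si for xi, si in zip(x, s)]  # X_{i+1} = X_i + lam* S_i
        history.append(x)
    return x, history


A = [[4.0, 2.0], [2.0, 2.0]]
B = [1.0, -1.0]
x_opt, hist = steepest_descent(A, B, [0.0, 0.0])
for k, pt in enumerate(hist[:4], start=1):
    print(k, [round(v, 6) for v in pt])   # X_1 ... X_4 of the worked example
print(len(hist) - 1, "iterations, X* ~", [round(v, 4) for v in x_opt])
```

The printed iteration count shows the zig-zagging behaviour noted above. Even on this two-variable quadratic, many iterations are needed to drive the gradient below the tolerance.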
Convergence Criteria
The following criteria can be used to terminate the iterative process:

1. When the change in function value in two consecutive iterations is small:

$$\left|\frac{f(X_{i+1}) - f(X_i)}{f(X_i)}\right| \le \varepsilon_1$$

2. When the partial derivatives (components of the gradient) of f are small:

$$\left|\frac{\partial f}{\partial x_i}\right| \le \varepsilon_2, \qquad i = 1, 2, \ldots, n$$

3. When the change in the design vector in two consecutive iterations is small:

$$|X_{i+1} - X_i| \le \varepsilon_3$$
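The three stopping tests can be coded as one small helper. In the sketch below, the naming and default tolerances are our own. The relative-change test divides by $|f(X_i)|$ exactly as in criterion 1. Because that breaks down when $f(X_i) = 0$, the code substitutes a tiny floor there. This floor is an added assumption, not part of the criteria above.

```python
import math


def stopping_tests(f_old, f_new, grad_new, x_old, x_new,
                   eps1=1e-6, eps2=1e-6, eps3=1e-6):
    """Return (test1, test2, test3) for the three convergence criteria."""
    # 1. relative change in f between consecutive iterations
    denom = max(abs(f_old), 1e-30)       # floor avoids division by zero (assumption)
    test1 = abs(f_new - f_old) / denom <= eps1
    # 2. every component of the gradient is small
    test2 = all(abs(g) <= eps2 for g in grad_new)
    # 3. Euclidean length of the change in the design vector is small
    step = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_new, x_old)))
    test3 = step <= eps3
    return test1, test2, test3


# Steepest descent example: from X_3 = {-0.8, 1.2} to X_4 = {-1.0, 1.4}
print(stopping_tests(-1.20, -1.24, [-0.2, -0.2], [-0.8, 1.2], [-1.0, 1.4]))
```

For the step from $X_3$ to $X_4$ of the steepest descent example, none of the tests is satisfied. This agrees with the conclusion that the iterations must continue.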
Conjugate Gradient (Fletcher-Reeves) Method
The convergence characteristics of the steepest descent method can be
improved greatly by modifying it into a conjugate gradient method which
can be considered as a conjugate directions method involving the use of the
gradient of the function.

We saw that any minimization method that makes use of the conjugate
directions is quadratically convergent. This property of quadratic
convergence is very useful because it ensures that the method will
minimize a quadratic function in n steps or less.

Since any general function can be approximated reasonably well by a
quadratic near the optimum point, any quadratically convergent method is
expected to find the optimum point in a finite number of iterations.
Conjugate Gradient (Fletcher-Reeves) Method
We have seen that Powell's conjugate direction method requires n single-variable
minimizations per iteration and sets up a new conjugate direction
at the end of each iteration.

Thus, it requires in general $n^2$ single-variable minimizations to find the
minimum of a quadratic function.

On the other hand, if we can evaluate the gradients of the objective
function, we can set up a new conjugate direction after every one-dimensional
minimization, and hence we can achieve faster convergence.
Development of the Fletcher-Reeves Method
Consider the development of an algorithm by modifying the steepest descent
method applied to a quadratic function $f(X) = \frac{1}{2}X^T[A]X + B^T X + C$ by imposing the
condition that the successive directions be mutually conjugate.

Let $X_1$ be the starting point for the minimization and let the first search direction be
the steepest descent direction:

$$S_1 = -\nabla f_1 = -[A]X_1 - B$$
$$X_2 = X_1 + \lambda_1^* S_1 \qquad\text{or}\qquad S_1 = \frac{X_2 - X_1}{\lambda_1^*}$$

where $\lambda_1^*$ is the minimizing step length in the direction $S_1$, so that

$$S_1^T \nabla f|_{X_2} = 0$$
Development of the Fletcher-Reeves Method
The equation $S_1^T \nabla f|_{X_2} = 0$ can be expanded as

$$S_1^T\left[[A](X_1 + \lambda_1^* S_1) + B\right] = 0$$

from which the value of $\lambda_1^*$ can be found as:

$$\lambda_1^* = -\frac{S_1^T([A]X_1 + B)}{S_1^T[A]S_1} = -\frac{S_1^T \nabla f_1}{S_1^T[A]S_1}$$

Now express the second search direction as a linear combination of $S_1$ and $-\nabla f_2$:

$$S_2 = -\nabla f_2 + \beta_2 S_1$$

where $\beta_2$ is to be chosen so as to make $S_1$ and $S_2$ conjugate. This requires that:

$$S_1^T[A]S_2 = 0$$

Substituting $S_2 = -\nabla f_2 + \beta_2 S_1$ into $S_1^T[A]S_2 = 0$ leads to:

$$S_1^T[A](-\nabla f_2 + \beta_2 S_1) = 0$$

The above equation and the equation $S_1 = (X_2 - X_1)/\lambda_1^*$ lead to

$$-\frac{(X_2 - X_1)^T}{\lambda_1^*}[A](\nabla f_2 - \beta_2 S_1) = 0$$
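Development of the Fletcher-Reeves Method
The difference of the gradients $(\nabla f_2 - \nabla f_1)$ can be expressed as:

$$(\nabla f_2 - \nabla f_1) = ([A]X_2 + B) - ([A]X_1 + B) = [A](X_2 - X_1)$$

With the help of the above equation, the equation $-(X_2 - X_1)^T[A](\nabla f_2 - \beta_2 S_1)/\lambda_1^* = 0$ can be
written as:

$$(\nabla f_2 - \nabla f_1)^T(\nabla f_2 - \beta_2 S_1) = 0$$

where the symmetry of the matrix [A] has been used. The above equation can be
expanded as:

$$\nabla f_2^T\nabla f_2 - \nabla f_1^T\nabla f_2 - \beta_2\nabla f_2^T S_1 + \beta_2\nabla f_1^T S_1 = 0$$

Since $\nabla f_1^T\nabla f_2 = -S_1^T\nabla f_2 = 0$ from $S_1^T\nabla f|_{X_2} = 0$, the above equation gives:

$$\beta_2 = -\frac{\nabla f_2^T\nabla f_2}{\nabla f_1^T S_1} = \frac{\nabla f_2^T\nabla f_2}{\nabla f_1^T\nabla f_1}$$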
Development of the Fletcher-Reeves Method
Next, we consider the third search direction as a linear combination of $S_1$, $S_2$, and
$-\nabla f_3$ as:

$$S_3 = -\nabla f_3 + \beta_3 S_2 + \delta_3 S_1$$

where the values of $\beta_3$ and $\delta_3$ can be found by making $S_3$ conjugate to $S_1$ and $S_2$. By
using the condition $S_1^T[A]S_3 = 0$, the value of $\delta_3$ can be found to be zero. When the
condition $S_2^T[A]S_3 = 0$ is used, the value of $\beta_3$ can be obtained as:

$$\beta_3 = \frac{\nabla f_3^T\nabla f_3}{\nabla f_2^T\nabla f_2}$$

so that the expression for $S_3$ becomes

$$S_3 = -\nabla f_3 + \beta_3 S_2$$

with $\beta_3$ as given above.
Development of the Fletcher-Reeves Method
In fact, the expression for $S_3$ can be generalized as:

$$S_i = -\nabla f_i + \beta_i S_{i-1}$$

where

$$\beta_i = \frac{\nabla f_i^T\nabla f_i}{\nabla f_{i-1}^T\nabla f_{i-1}}$$

The above equations define the search directions used in the Fletcher-Reeves
method.
Fletcher-Reeves Method
The iterative procedure of the Fletcher-Reeves method can be stated as follows:
1. Start with an arbitrary initial point $X_1$.
2. Set the first search direction $S_1 = -\nabla f(X_1) = -\nabla f_1$.
3. Find the point $X_2$ according to the relation

$$X_2 = X_1 + \lambda_1^* S_1$$

where $\lambda_1^*$ is the optimal step length in the direction $S_1$. Set i = 2 and go to the next
step.
4. Find $\nabla f_i = \nabla f(X_i)$, and set

$$S_i = -\nabla f_i + \frac{|\nabla f_i|^2}{|\nabla f_{i-1}|^2} S_{i-1}$$

5. Compute the optimum step length $\lambda_i^*$ in the direction $S_i$, and find the new point

$$X_{i+1} = X_i + \lambda_i^* S_i$$
Fletcher-Reeves Method
6. Test for the optimality of the point $X_{i+1}$. If $X_{i+1}$ is optimum, stop the
process. Otherwise set the value of i = i + 1 and go to step 4.

Remarks:

1. The Fletcher-Reeves method was originally proposed by Hestenes and
Stiefel as a method for solving systems of linear equations derived from
the stationary conditions of a quadratic. Since the directions $S_i$ used in
this method are A-conjugate, the process should converge in n cycles or
less for a quadratic function. However, for ill-conditioned quadratics
(whose contours are highly eccentric and distorted), the method may
require many more than n cycles for convergence. The reason for this
has been found to be the cumulative effect of rounding errors.
Fletcher-Reeves Method
Remarks:

Remark 1 continued: Since $S_i$ is given by

$$S_i = -\nabla f_i + \frac{|\nabla f_i|^2}{|\nabla f_{i-1}|^2} S_{i-1},$$

any error resulting from the inaccuracies involved in the determination of $\lambda_i^*$, and from the
round-off error involved in accumulating the successive

$$\frac{|\nabla f_i|^2\, S_{i-1}}{|\nabla f_{i-1}|^2}$$

terms, is carried forward through the vector $S_i$. Thus, the search directions $S_i$ will be
progressively contaminated by these errors. Hence it is necessary, in practice, to restart the
method periodically after every, say, m steps by taking the new search direction as the
steepest descent direction. That is, after every m steps, $S_{m+1}$ is set equal to $-\nabla f_{m+1}$ instead of
the usual form. Fletcher and Reeves have recommended a value of m = n + 1, where n is the
number of design variables.
Fletcher-Reeves Method
Remarks:

2. Despite the limitations indicated above, the Fletcher-Reeves method is
vastly superior to the steepest descent method and the pattern search
methods, but it turns out to be rather less efficient than the Newton and
the quasi-Newton (variable metric) methods discussed in the later
sections.
Example
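Minimize

$$f(x_1,x_2) = x_1 - x_2 + 2x_1^2 + 2x_1x_2 + x_2^2$$

starting from the point $X_1 = \begin{Bmatrix}0\\ 0\end{Bmatrix}$.

Solution:
Iteration 1

$$\nabla f = \begin{Bmatrix}\partial f/\partial x_1\\ \partial f/\partial x_2\end{Bmatrix} = \begin{Bmatrix}1 + 4x_1 + 2x_2\\ -1 + 2x_1 + 2x_2\end{Bmatrix}, \qquad \nabla f_1 = \nabla f(X_1) = \begin{Bmatrix}1\\ -1\end{Bmatrix}$$

The search direction is taken as:

$$S_1 = -\nabla f_1 = \begin{Bmatrix}-1\\ 1\end{Bmatrix}$$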
Example
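To find the optimal step length $\lambda_1^*$ along $S_1$, we minimize $f(X_1 + \lambda_1 S_1)$
with respect to $\lambda_1$. Here

$$f(X_1 + \lambda_1 S_1) = f(-\lambda_1, \lambda_1) = \lambda_1^2 - 2\lambda_1, \qquad \frac{df}{d\lambda_1} = 0 \text{ at } \lambda_1^* = 1$$

Therefore

$$X_2 = X_1 + \lambda_1^* S_1 = \begin{Bmatrix}0\\ 0\end{Bmatrix} + 1\begin{Bmatrix}-1\\ 1\end{Bmatrix} = \begin{Bmatrix}-1\\ 1\end{Bmatrix}$$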
Example
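Iteration 2: Since

$$\nabla f_2 = \nabla f(X_2) = \begin{Bmatrix}-1\\ -1\end{Bmatrix},$$

the equation

$$S_i = -\nabla f_i + \frac{|\nabla f_i|^2}{|\nabla f_{i-1}|^2} S_{i-1}$$

gives the next search direction as

$$S_2 = -\nabla f_2 + \frac{|\nabla f_2|^2}{|\nabla f_1|^2} S_1$$

where

$$|\nabla f_1|^2 = 2, \qquad |\nabla f_2|^2 = 2$$

Therefore

$$S_2 = -\begin{Bmatrix}-1\\ -1\end{Bmatrix} + \frac{2}{2}\begin{Bmatrix}-1\\ 1\end{Bmatrix} = \begin{Bmatrix}0\\ 2\end{Bmatrix}$$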
Example
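To find $\lambda_2^*$, we minimize

$$f(X_2 + \lambda_2 S_2) = f(-1, 1 + 2\lambda_2) = 4\lambda_2^2 - 2\lambda_2 - 1$$

with respect to $\lambda_2$. As $df/d\lambda_2 = 8\lambda_2 - 2 = 0$ at $\lambda_2^* = 1/4$, we obtain:

$$X_3 = X_2 + \lambda_2^* S_2 = \begin{Bmatrix}-1\\ 1\end{Bmatrix} + \frac{1}{4}\begin{Bmatrix}0\\ 2\end{Bmatrix} = \begin{Bmatrix}-1\\ 1.5\end{Bmatrix}$$

Thus the optimum point is reached in two iterations. Even if we do not
know this point to be optimum, we will not be able to move from this point
in the next iteration. This can be verified as follows: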
Example
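Iteration 3:
Now

$$\nabla f_3 = \nabla f(X_3) = \begin{Bmatrix}0\\ 0\end{Bmatrix}, \qquad |\nabla f_2|^2 = 2, \qquad |\nabla f_3|^2 = 0$$

Thus,

$$S_3 = -\nabla f_3 + \frac{|\nabla f_3|^2}{|\nabla f_2|^2} S_2 = -\begin{Bmatrix}0\\ 0\end{Bmatrix} + \frac{0}{2}\begin{Bmatrix}0\\ 2\end{Bmatrix} = \begin{Bmatrix}0\\ 0\end{Bmatrix}$$

This shows that there is no search direction to reduce f further, and hence
$X_3$ is optimum.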
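The Fletcher-Reeves steps are sketched below in Python, including the periodic restart with m = n + 1 recommended in Remark 1. As in the earlier sketches, this assumes a quadratic $f(X) = \frac{1}{2}X^T[A]X + B^TX$, so the exact step length has a closed form. The names are hypothetical. On the function of this example, the sketch should reproduce $X_2 = \{-1, 1\}$ and $X_3 = \{-1, 1.5\}$. It then stops because $\nabla f_3 = 0$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(A, v):
    return [dot(row, v) for row in A]


def fletcher_reeves(A, B, x1, eps=1e-10, max_iter=100):
    """Conjugate gradient (Fletcher-Reeves) on f = 1/2 x^T A x + B^T x."""
    n = len(x1)
    x = list(x1)
    g = [ax + b for ax, b in zip(mat_vec(A, x), B)]    # grad f_1
    s = [-gi for gi in g]                               # S_1 = -grad f_1
    history = [x]
    for k in range(1, max_iter + 1):
        if dot(g, g) <= eps ** 2:                       # optimality test
            break
        lam = -dot(g, s) / dot(s, mat_vec(A, s))        # exact step for a quadratic
        x = [xi + lam * si for xi, si in zip(x, s)]
        history.append(x)
        g_new = [ax + b for ax, b in zip(mat_vec(A, x), B)]
        if k % (n + 1) == 0:
            s = [-gi for gi in g_new]                   # restart: steepest descent
        else:
            beta = dot(g_new, g_new) / dot(g, g)        # |grad f_i|^2 / |grad f_{i-1}|^2
            s = [-gi + beta * si for gi, si in zip(g_new, s)]
        g = g_new
    return x, history


A = [[4.0, 2.0], [2.0, 2.0]]
B = [1.0, -1.0]
x_opt, hist = fletcher_reeves(A, B, [0.0, 0.0])
print(hist)  # [[0.0, 0.0], [-1.0, 1.0], [-1.0, 1.5]]
```

Compared with the steepest descent sketch on the same function, the only change is the $\beta_i S_{i-1}$ term. That term is what lets the method finish in n = 2 iterations.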
Newton's method
Newton's Method:

Newton's method presented in One Dimensional Minimisation Methods can
be extended for the minimization of multivariable functions. For this,
consider the quadratic approximation of the function f(X) at $X = X_i$ using
the Taylor's series expansion

$$f(X) = f(X_i) + \nabla f_i^T(X - X_i) + \frac{1}{2}(X - X_i)^T[J_i](X - X_i)$$

where $[J_i] = [J]|_{X_i}$ is the matrix of second partial derivatives (Hessian matrix)
of f evaluated at the point $X_i$. By setting the partial derivatives of the above
equation equal to zero for the minimum of f(X), we obtain

$$\frac{\partial f(X)}{\partial x_j} = 0, \qquad j = 1, 2, \ldots, n$$
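Newton's method
Newton's Method:

The equations $\partial f/\partial x_j = 0$ and the quadratic approximation of f(X) above give:

$$\nabla f = \nabla f_i + [J_i](X - X_i) = 0$$

If $[J_i]$ is nonsingular, the above equation can be solved to obtain an
improved approximation ($X = X_{i+1}$) as

$$X_{i+1} = X_i - [J_i]^{-1}\nabla f_i$$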
Newton's method
Newton's Method:

Since higher-order terms have been neglected in the quadratic approximation of f(X),
the equation $X_{i+1} = X_i - [J_i]^{-1}\nabla f_i$ is to be used iteratively to find the
optimum solution $X^*$.

The sequence of points $X_1, X_2, \ldots, X_{i+1}$ can be shown to converge to the
actual solution $X^*$ from any initial point $X_1$ sufficiently close to the
solution $X^*$, provided that $[J_i]$ is nonsingular. It can be seen that Newton's
method uses the second partial derivatives of the objective function (in the
form of the matrix $[J_i]$) and hence is a second-order method.
Example 1
Show that Newton's method finds the minimum of a quadratic function
in one iteration.

Solution: Let the quadratic function be given by

$$f(X) = \frac{1}{2}X^T[A]X + B^T X + C$$

The minimum of f(X) is given by:

$$\nabla f = [A]X + B = 0, \qquad\text{or}\qquad X^* = -[A]^{-1}B$$

Example 1
The iterative step of

$$X_{i+1} = X_i - [J_i]^{-1}\nabla f_i$$

gives

$$X_{i+1} = X_i - [A]^{-1}([A]X_i + B)$$

where $X_i$ is the starting point for the ith iteration. Thus the above equation gives
the exact solution

$$X_{i+1} = X^* = -[A]^{-1}B$$

[Figure: Minimization of a quadratic function in one step]
Example 2
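Minimize

$$f(x_1,x_2) = x_1 - x_2 + 2x_1^2 + 2x_1x_2 + x_2^2$$

by taking the starting point as $X_1 = \begin{Bmatrix}0\\ 0\end{Bmatrix}$.

Solution: To find $X_2$ according to

$$X_2 = X_1 - [J_1]^{-1}\nabla f_1$$

we require $[J_1]^{-1}$, where

$$[J_1] = \begin{bmatrix}\partial^2 f/\partial x_1^2 & \partial^2 f/\partial x_1\partial x_2\\ \partial^2 f/\partial x_2\partial x_1 & \partial^2 f/\partial x_2^2\end{bmatrix}_{X_1} = \begin{bmatrix}4 & 2\\ 2 & 2\end{bmatrix}$$

Example 2
Therefore,

$$[J_1]^{-1} = \frac{1}{4}\begin{bmatrix}2 & -2\\ -2 & 4\end{bmatrix} = \begin{bmatrix}1/2 & -1/2\\ -1/2 & 1\end{bmatrix}$$

Example 2
As

$$g_1 = \nabla f_1 = \begin{Bmatrix}1 + 4x_1 + 2x_2\\ -1 + 2x_1 + 2x_2\end{Bmatrix}_{X_1} = \begin{Bmatrix}1\\ -1\end{Bmatrix},$$

the equation $X_2 = X_1 - [J_1]^{-1}\nabla f_1$ gives

$$X_2 = \begin{Bmatrix}0\\ 0\end{Bmatrix} - \begin{bmatrix}1/2 & -1/2\\ -1/2 & 1\end{bmatrix}\begin{Bmatrix}1\\ -1\end{Bmatrix} = \begin{Bmatrix}-1\\ 3/2\end{Bmatrix}$$

To see whether or not $X_2$ is the optimum point, we evaluate

$$g_2 = \nabla f_2 = \begin{Bmatrix}1 + 4x_1 + 2x_2\\ -1 + 2x_1 + 2x_2\end{Bmatrix}_{X_2} = \begin{Bmatrix}0\\ 0\end{Bmatrix}$$

Newton's method
As $g_2 = 0$, $X_2$ is the optimum point. Thus the method has converged in one
iteration for this quadratic function.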
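The Newton iteration $X_{i+1} = X_i - [J_i]^{-1}\nabla f_i$ can be sketched in Python as below. Rather than forming $[J_i]^{-1}$ explicitly, the sketch solves the linear system $[J_i]\,\Delta = \nabla f_i$ by Gaussian elimination with partial pivoting. That is a common implementation choice which we assume here, and the helper names are ours. Applied to the quadratic of Example 2, it should take a single step to $\{-1, 1.5\}$.

```python
def solve(M, r):
    """Solve M d = r by Gaussian elimination with partial pivoting."""
    n = len(r)
    a = [list(row) + [ri] for row, ri in zip(M, r)]   # augmented copy of [M | r]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n + 1):
                a[i][j] -= factor * a[k][j]
    d = [0.0] * n
    for i in range(n - 1, -1, -1):
        d[i] = (a[i][n] - sum(a[i][j] * d[j] for j in range(i + 1, n))) / a[i][i]
    return d


def newton(grad, hess, x1, eps=1e-10, max_iter=50):
    """Newton's method: X_{i+1} = X_i - [J_i]^{-1} grad f_i."""
    x = list(x1)
    for it in range(max_iter):
        g = grad(x)
        if max(abs(gi) for gi in g) <= eps:
            return x, it            # number of Newton steps taken
        step = solve(hess(x), g)    # [J_i]^{-1} grad f_i without an explicit inverse
        x = [xi - di for xi, di in zip(x, step)]
    return x, max_iter


def grad(x):
    x1, x2 = x
    return [1 + 4 * x1 + 2 * x2, -1 + 2 * x1 + 2 * x2]


def hess(x):
    return [[4.0, 2.0], [2.0, 2.0]]   # constant Hessian of the quadratic


print(newton(grad, hess, [0.0, 0.0]))   # ([-1.0, 1.5], 1)
```

Solving the system instead of inverting the matrix addresses only item 3 of the drawbacks listed below. The Hessian must still be stored and evaluated at every step.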

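If f(X) is a nonquadratic function, Newton's method may sometimes
diverge, and it may converge to saddle points and relative maxima. This
problem can be avoided by modifying the equation

$$X_{i+1} = X_i - [J_i]^{-1}\nabla f_i$$

as:

$$X_{i+1} = X_i + \lambda_i^* S_i = X_i - \lambda_i^*[J_i]^{-1}\nabla f_i$$

where $\lambda_i^*$ is the minimizing step length in the direction $S_i = -[J_i]^{-1}\nabla f_i$.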
Newton's method
The modification indicated by $X_{i+1} = X_i - \lambda_i^*[J_i]^{-1}\nabla f_i$
has a number of advantages:

1. It will find the minimum in fewer steps than the
original method.
2. It finds the minimum point in all cases, whereas the original method may
not converge in some cases.
3. It usually avoids convergence to a saddle point or a maximum.

With all these advantages, this method appears to be the most powerful
minimization method.
Newton's method
Despite these advantages, the method is not very useful in practice, due
to the following features of the method:

1. It requires the storing of the n × n matrix $[J_i]$.
2. It becomes very difficult, and sometimes impossible, to compute the
elements of the matrix $[J_i]$.
3. It requires the inversion of the matrix $[J_i]$ at each step.
4. It requires the evaluation of the quantity $[J_i]^{-1}\nabla f_i$ at each step.

These features make the method impractical for problems involving a
complicated objective function with a large number of variables.
Marquardt Method
The steepest descent method reduces the function value when the design vector $X_i$
is away from the optimum point $X^*$. The Newton method, on the other hand,
converges fast when the design vector $X_i$ is close to the optimum point $X^*$. The
Marquardt method attempts to take advantage of both the steepest descent and
Newton methods.

This method modifies the diagonal elements of the Hessian matrix $[J_i]$ as

$$[\tilde{J}_i] = [J_i] + \alpha_i[I]$$

where [I] is the identity matrix and $\alpha_i$ is a positive constant that ensures the positive
definiteness of $[\tilde{J}_i]$ when $[J_i]$ is not positive definite. It can be noted that when $\alpha_i$
is sufficiently large (on the order of $10^4$), the term $\alpha_i[I]$ dominates $[J_i]$ and the
inverse of the matrix $[\tilde{J}_i]$ becomes:

$$[\tilde{J}_i]^{-1} = \left[[J_i] + \alpha_i[I]\right]^{-1} \approx \left[\alpha_i[I]\right]^{-1} = \frac{1}{\alpha_i}[I]$$

Marquardt Method
Thus if the search direction $S_i$ is computed as:

$$S_i = -[\tilde{J}_i]^{-1}\nabla f_i$$

$S_i$ becomes a steepest descent direction for large values of $\alpha_i$. In the Marquardt
method, the value of $\alpha_i$ is to be taken large at the beginning and then reduced to
zero gradually as the iterative process progresses. Thus, as the value of $\alpha_i$
decreases from a large value to zero, the characteristics of the search method
change from those of the steepest descent method to those of the Newton method.
Marquardt Method
The iterative process of a modified version of the Marquardt method can be
described as follows:
1. Start with an arbitrary initial point $X_1$ and constants $\alpha_1$ (on the order of
$10^4$), $c_1$ ($0 < c_1 < 1$), $c_2$ ($c_2 > 1$), and $\varepsilon$ (on the order of $10^{-2}$). Set the iteration
number as i = 1.
2. Compute the gradient of the function, $\nabla f_i = \nabla f(X_i)$.
3. Test for optimality of the point $X_i$. If $\|\nabla f_i\| = \|\nabla f(X_i)\| \le \varepsilon$,
$X_i$ is optimum and hence stop the process. Otherwise, go to step 4.
4. Find the new vector $X_{i+1}$ as

$$X_{i+1} = X_i + S_i = X_i - \left[[J_i] + \alpha_i[I]\right]^{-1}\nabla f_i$$

5. Compare the values of $f_{i+1}$ and $f_i$. If $f_{i+1} < f_i$, go to step 6. If $f_{i+1} \ge f_i$, go
to step 7.

Marquardt Method
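6. Set $\alpha_{i+1} = c_1\alpha_i$, i = i + 1, and go to step 2.

7. Set $\alpha_i = c_2\alpha_i$ and go to step 4.

An advantage of this method is the absence of the step size $\lambda_i$ along the
search direction $S_i$. In fact, the algorithm above can be modified by
introducing an optimal step length in the equation

$$X_{i+1} = X_i - \left[[J_i] + \alpha_i[I]\right]^{-1}\nabla f_i$$

as:

$$X_{i+1} = X_i - \lambda_i^*\left[[J_i] + \alpha_i[I]\right]^{-1}\nabla f_i$$

where $\lambda_i^*$ is found using any of the one-dimensional search methods
described before.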
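The seven steps above can be sketched in Python as follows. This is an illustrative implementation under several assumptions:
- the user supplies f, its gradient and its Hessian as functions;
- the linear system $([J_i] + \alpha_i[I])\,\Delta = \nabla f_i$ is solved instead of forming the inverse;
- the version without a line search (steps 1 to 7) is used;
- the cap on step 7 retries (`max_retries`) is our own safeguard.

The default constants are those of the example that follows.

```python
import math


def solve(M, r):
    """Solve M d = r by Gaussian elimination with partial pivoting."""
    n = len(r)
    a = [list(row) + [ri] for row, ri in zip(M, r)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n + 1):
                a[i][j] -= factor * a[k][j]
    d = [0.0] * n
    for i in range(n - 1, -1, -1):
        d[i] = (a[i][n] - sum(a[i][j] * d[j] for j in range(i + 1, n))) / a[i][i]
    return d


def marquardt(f, grad, hess, x1, alpha1=1e4, c1=0.25, c2=2.0, eps=1e-2,
              max_iter=200, max_retries=60):
    """Modified Marquardt method (steps 1-7, no line search)."""
    x, alpha = list(x1), alpha1
    fx = f(x)
    n = len(x)
    for _ in range(max_iter):
        g = grad(x)                                          # step 2
        if math.sqrt(sum(gi * gi for gi in g)) <= eps:       # step 3
            break
        for _ in range(max_retries):
            J = hess(x)
            Jt = [[J[r][c] + (alpha if r == c else 0.0) for c in range(n)]
                  for r in range(n)]                         # [J_i] + alpha_i [I]
            x_new = [xi - di for xi, di in zip(x, solve(Jt, g))]   # step 4
            f_new = f(x_new)
            if f_new < fx:                                   # step 5 -> step 6
                x, fx, alpha = x_new, f_new, c1 * alpha
                break
            alpha *= c2                                      # step 7, then redo step 4
        else:
            break            # no decrease after max_retries (safeguard, assumption)
    return x, alpha


def f(x):
    x1, x2 = x
    return x1 - x2 + 2 * x1**2 + 2 * x1 * x2 + x2**2


def grad(x):
    x1, x2 = x
    return [1 + 4 * x1 + 2 * x2, -1 + 2 * x1 + 2 * x2]


def hess(x):
    return [[4.0, 2.0], [2.0, 2.0]]


X2, alpha2 = marquardt(f, grad, hess, [0.0, 0.0], max_iter=1)
print(X2, alpha2)          # about [-0.9998e-4, 1.0000e-4] with alpha2 = 2500.0
X_star, _ = marquardt(f, grad, hess, [0.0, 0.0])
print(X_star)              # near [-1.0, 1.5]; stops once ||grad f|| <= 0.01
```

With $\alpha_1 = 10^4$, the first steps are tiny steepest-descent-like moves. As $\alpha_i$ shrinks by $c_1$ each accepted iteration, the steps approach Newton steps.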

Example
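Minimize

$$f(x_1,x_2) = x_1 - x_2 + 2x_1^2 + 2x_1x_2 + x_2^2$$

from the starting point $X_1 = \begin{Bmatrix}0\\ 0\end{Bmatrix}$
using the Marquardt method with $\alpha_1 = 10^4$, $c_1 = 1/4$, $c_2 = 2$, and $\varepsilon = 10^{-2}$.
Solution:
Iteration 1 (i = 1)
Here $f_1 = f(X_1) = 0.0$ and

$$\nabla f_1 = \begin{Bmatrix}\partial f/\partial x_1\\ \partial f/\partial x_2\end{Bmatrix}_{(0,0)} = \begin{Bmatrix}1 + 4x_1 + 2x_2\\ -1 + 2x_1 + 2x_2\end{Bmatrix}_{(0,0)} = \begin{Bmatrix}1\\ -1\end{Bmatrix}$$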
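Example
Since $\|\nabla f_1\| = 1.4142 > \varepsilon$, we compute

$$[J_1] = \begin{bmatrix}\partial^2 f/\partial x_1^2 & \partial^2 f/\partial x_1\partial x_2\\ \partial^2 f/\partial x_2\partial x_1 & \partial^2 f/\partial x_2^2\end{bmatrix}_{(0,0)} = \begin{bmatrix}4 & 2\\ 2 & 2\end{bmatrix}$$

$$X_2 = X_1 - \left[[J_1] + \alpha_1[I]\right]^{-1}\nabla f_1 = \begin{Bmatrix}0\\ 0\end{Bmatrix} - \begin{bmatrix}4 + 10^4 & 2\\ 2 & 2 + 10^4\end{bmatrix}^{-1}\begin{Bmatrix}1\\ -1\end{Bmatrix} \approx \begin{Bmatrix}-0.9998\times 10^{-4}\\ 1.0000\times 10^{-4}\end{Bmatrix}$$

As $f_2 = f(X_2) \approx -1.9997\times 10^{-4}$ is less than $f_1$, the step is accepted.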
Example
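We set $\alpha_2 = c_1\alpha_1 = 2500$, i = 2, and proceed to the next iteration.
Iteration 2: The gradient vector corresponding to $X_2$ is given by

$$\nabla f_2 = \begin{Bmatrix}1 + 4x_1 + 2x_2\\ -1 + 2x_1 + 2x_2\end{Bmatrix}_{X_2} \approx \begin{Bmatrix}0.9998\\ -1.0000\end{Bmatrix}, \qquad \|\nabla f_2\| \approx 1.4141 > \varepsilon$$

and hence we compute

$$X_3 = X_2 - \left[[J_2] + \alpha_2[I]\right]^{-1}\nabla f_2$$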
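Example
Since $f_3 = f(X_3) < f_2$, we set

$$\alpha_3 = c_1\alpha_2 = 625, \qquad i = 3$$

and proceed to the next iteration. The iterative process is to be continued
until the convergence criterion

$$\|\nabla f_i\| \le \varepsilon$$

is satisfied.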
Quasi-Newton methods
The basic equation used in the development of the Newton method

$$\nabla f(X) = \nabla f_i + [J_i](X - X_i) = 0$$

can be expressed as:

$$[J_i](X - X_i) = -\nabla f_i$$

or

$$X = X_i - [J_i]^{-1}\nabla f_i$$

which can be written in the form of an iterative formula, as

$$X_{i+1} = X_i - [J_i]^{-1}\nabla f_i$$

Note that the Hessian matrix $[J_i]$ is composed of the second partial derivatives of f
and varies with the design vector $X_i$ for a nonquadratic (general nonlinear)
objective function f.

Quasi-Newton methods
The basic idea behind the quasi-Newton or variable metric methods is to
approximate either $[J_i]$ by another matrix $[A_i]$ or $[J_i]^{-1}$ by another matrix
$[B_i]$, using only the first partial derivatives of f. If $[J_i]^{-1}$ is approximated by
$[B_i]$, the equation $X_{i+1} = X_i - [J_i]^{-1}\nabla f_i$ can be expressed as:

$$X_{i+1} = X_i - \lambda_i^*[B_i]\nabla f_i$$

where $\lambda_i^*$ can be considered as the optimal step length along the direction

$$S_i = -[B_i]\nabla f_i$$

It can be seen that the steepest descent direction method can be obtained as
a special case of the above equation by setting $[B_i] = [I]$.
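Computation of $[B_i]$
To implement $X_{i+1} = X_i - \lambda_i^*[B_i]\nabla f_i$, an approximate inverse of the Hessian matrix,
$[B_i] \simeq [A_i]^{-1}$, is to be computed. For this, we first expand the gradient of f about
an arbitrary reference point, $X_0$, using Taylor's series as:

$$\nabla f(X) \simeq \nabla f(X_0) + [J_0](X - X_0)$$

If we pick two points $X_i$ and $X_{i+1}$ and use $[A_i]$ to approximate $[J_0]$, the
above equation can be rewritten as:

$$\nabla f_{i+1} = \nabla f(X_0) + [A_i](X_{i+1} - X_0)$$
$$\nabla f_i = \nabla f(X_0) + [A_i](X_i - X_0)$$

Subtracting the second of the equations from the first yields:

$$[A_i]d_i = g_i$$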
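Computation of $[B_i]$

where

$$d_i = X_{i+1} - X_i, \qquad g_i = \nabla f_{i+1} - \nabla f_i$$

The solution of the equation $[A_i]d_i = g_i$ for $d_i$ can be written as:

$$d_i = [B_i]g_i$$

where $[B_i] = [A_i]^{-1}$ denotes an approximation to the inverse of the Hessian
matrix, $[J_0]^{-1}$.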
Computation of $[B_i]$
It can be seen that the equation $d_i = [B_i]g_i$ represents a system of n equations in $n^2$
unknown elements of the matrix $[B_i]$. Thus for n > 1, the choice of $[B_i]$ is not unique
and one would like to choose a $[B_i]$ that is closest to $[J_0]^{-1}$, in some sense.

Numerous techniques have been suggested in the literature for the
computation of $[B_i]$ as the iterative process progresses (i.e., for the
computation of $[B_{i+1}]$ once $[B_i]$ is known). A major concern is that, in
addition to satisfying the equation $d_i = [B_i]g_i$, the symmetry and positive
definiteness of the matrix $[B_i]$ are to be maintained; that is, if $[B_i]$ is symmetric
and positive definite, $[B_{i+1}]$ must remain symmetric and positive definite.
Quasi-Newton Methods
Rank 1 Updates

The general formula for updating the matrix $[B_i]$ can be written as:

$$[B_{i+1}] = [B_i] + [\Delta B_i]$$

where $[\Delta B_i]$ can be considered to be the update or correction matrix added
to $[B_i]$. Theoretically, the matrix $[\Delta B_i]$ can have its rank as high as n.
However, in practice, most updates $[\Delta B_i]$ are only of rank 1 or 2. To
derive a rank 1 update, we simply choose a scaled outer product of a vector
z for $[\Delta B_i]$ as:

$$[\Delta B_i] = c\, z z^T$$

where the constant c and the n-component vector z are to be determined.
The above two equations lead to:
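Quasi-Newton Methods
Rank 1 Updates

$$[B_{i+1}] = [B_i] + c\, z z^T$$

By forcing the above equation to satisfy the quasi-Newton condition

$$d_i = [B_{i+1}]g_i$$

we obtain

$$d_i = ([B_i] + c\, z z^T)g_i = [B_i]g_i + c\, z(z^T g_i)$$

from which we get

$$c\, z(z^T g_i) = d_i - [B_i]g_i$$

Since $(z^T g_i)$ in the above equation is a scalar, we can rewrite the above
equation as:

$$c\, z = \frac{d_i - [B_i]g_i}{z^T g_i}$$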
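Quasi-Newton methods
Rank 1 updates continued:
Thus a simple choice for z and c would be:

$$z = d_i - [B_i]g_i, \qquad c = \frac{1}{z^T g_i}$$

This leads to the unique rank 1 update formula for $[B_{i+1}]$:

$$[B_{i+1}] = [B_i] + [\Delta B_i] = [B_i] + \frac{(d_i - [B_i]g_i)(d_i - [B_i]g_i)^T}{(d_i - [B_i]g_i)^T g_i}$$

This formula has been attributed to Broyden. To implement the
above algorithm, an initial symmetric positive definite matrix is
selected for $[B_1]$ at the start of the algorithm, and the next point $X_2$ is
computed using:

$$X_2 = X_1 - \lambda_1^*[B_1]\nabla f_1$$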
Quasi-Newton Methods
Rank 1 updates:
Then the new matrix $[B_2]$ is computed using the rank 1 update formula

$$[B_2] = [B_1] + \frac{(d_1 - [B_1]g_1)(d_1 - [B_1]g_1)^T}{(d_1 - [B_1]g_1)^T g_1}$$

and the new point $X_3$ is determined from

$$X_3 = X_2 - \lambda_2^*[B_2]\nabla f_2$$

The iterative process is continued until convergence is achieved. If
$[B_i]$ is symmetric, the rank 1 update formula ensures that $[B_{i+1}]$ is also
symmetric. However, there is no guarantee that $[B_{i+1}]$ remains positive
definite even if $[B_i]$ is positive definite.
This might lead to a breakdown of the procedure, especially when used for the optimization of nonquadratic functions. It can be verified that the columns of the matrix
$$[\Delta B_i] = \frac{(d_i - [B_i]\, g_i)(d_i - [B_i]\, g_i)^T}{(d_i - [B_i]\, g_i)^T g_i}$$
are multiples of each other (each column is a multiple of the vector d_i - [B_i] g_i). Thus the updating matrix has only one independent column, and hence its rank is 1. This is why the Broyden formula is considered a rank 1 updating formula. Although the Broyden formula is not robust, it has the property of quadratic convergence. The rank 2 update formulas given next guarantee both the symmetry and the positive definiteness of [B_{i+1}] (provided the step lengths λ_i* are found accurately). They are also more robust in minimizing general nonlinear functions, and hence are preferred in practical applications.
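The Broyden rank 1 update can be sketched in a few lines of NumPy. The function name `broyden_rank1_update` is hypothetical. The guard that skips the update when the denominator is nearly zero is a common safeguard added here; it is not part of the text. The sample pair (d, g) has positive curvature (d^T g > 0). Even so, the updated matrix satisfies the quasi-Newton condition and stays symmetric but becomes indefinite, which illustrates the breakdown mentioned above.

```python
import numpy as np

def broyden_rank1_update(B, d, g):
    """Rank 1 (Broyden) update of the inverse-Hessian approximation B.

    d = X_{i+1} - X_i, g = grad f_{i+1} - grad f_i.
    """
    z = d - B @ g            # z = d_i - [B_i] g_i
    denom = z @ g            # z^T g_i
    if abs(denom) < 1e-12:   # safeguard (assumption): skip instead of dividing by ~0
        return B.copy()
    return B + np.outer(z, z) / denom

B1 = np.eye(2)
d = np.array([1.0, 1.0])
g = np.array([2.0, -0.5])    # d^T g = 1.5 > 0

B2 = broyden_rank1_update(B1, d, g)
print(B2 @ g)                    # equals d up to rounding (quasi-Newton condition)
print(np.allclose(B2, B2.T))     # symmetry is preserved
print(np.linalg.eigvalsh(B2))    # one eigenvalue is negative: B2 is indefinite
```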
Rank 2 Updates
In rank 2 updates, we choose the update matrix [ΔB_i] as the sum of two rank 1 updates:
$$[\Delta B_i] = c_1\, z_1 z_1^T + c_2\, z_2 z_2^T$$
where the constants c_1 and c_2 and the n-component vectors z_1 and z_2 are to be determined. The above equation and the equation [B_{i+1}] = [B_i] + [ΔB_i] lead to
$$[B_{i+1}] = [B_i] + c_1\, z_1 z_1^T + c_2\, z_2 z_2^T$$
By forcing the above equation to satisfy the quasi-Newton condition [B_{i+1}] g_i = d_i, we obtain:
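$$d_i = [B_i]\, g_i + c_1\, z_1 (z_1^T g_i) + c_2\, z_2 (z_2^T g_i)$$
where (z_1^T g_i) and (z_2^T g_i) can be identified as scalars. In order to satisfy the above equation, the following choices can be made:
$$z_1 = d_i, \qquad z_2 = [B_i]\, g_i, \qquad c_1 = \frac{1}{z_1^T g_i}, \qquad c_2 = -\frac{1}{z_2^T g_i}$$
Thus, the rank 2 update formula can be expressed as: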
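$$[B_{i+1}] = [B_i] + [\Delta B_i] = [B_i] + \frac{d_i d_i^T}{d_i^T g_i} - \frac{([B_i]\, g_i)([B_i]\, g_i)^T}{([B_i]\, g_i)^T g_i}$$
The above formula is known as the Davidon-Fletcher-Powell (DFP) formula. Since
$$X_{i+1} = X_i + \lambda_i^* S_i$$
where S_i is the search direction, d_i = X_{i+1} - X_i can be rewritten as:
$$d_i = \lambda_i^* S_i$$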
Thus the DFP formula can be expressed as:
$$[B_{i+1}] = [B_i] + \frac{\lambda_i^* S_i S_i^T}{S_i^T g_i} - \frac{[B_i]\, g_i\, g_i^T [B_i]}{g_i^T [B_i]\, g_i}$$
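A minimal NumPy sketch of the DFP update in both of its forms follows. The function names are hypothetical, and the code assumes [B_i] is symmetric. The first form uses (d_i, g_i) and the second uses d_i = λ_i* S_i. The sample pair (d, g) is the same kind of positive-curvature pair (d^T g > 0) for which a rank 1 update can lose positive definiteness. Here the result satisfies the quasi-Newton condition and stays positive definite.

```python
import numpy as np

def dfp_update(B, d, g):
    """DFP rank 2 update of the inverse-Hessian approximation [B_i]."""
    Bg = B @ g
    return B + np.outer(d, d) / (d @ g) - np.outer(Bg, Bg) / (g @ Bg)

def dfp_update_step_form(B, S, lam, g):
    """The same update written with d_i = lam * S_i."""
    Bg = B @ g
    return B + lam * np.outer(S, S) / (S @ g) - np.outer(Bg, Bg) / (g @ Bg)

B1 = np.eye(2)
d = np.array([1.0, 1.0])
g = np.array([2.0, -0.5])         # d^T g = 1.5 > 0

B2 = dfp_update(B1, d, g)
print(np.allclose(B2 @ g, d))      # quasi-Newton condition [B_2] g = d
print(np.linalg.eigvalsh(B2))      # both eigenvalues positive
print(np.allclose(B2, dfp_update_step_form(B1, d / 0.5, 0.5, g)))  # same matrix
```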



Remarks:
1. The Broyden rank 1 formula and the DFP formula derived above are known as inverse update formulas, since they approximate the inverse of the Hessian matrix of f.
2. It is possible to derive a family of direct update formulas in which approximations to the Hessian matrix itself are considered. For this, we express the quasi-Newton condition as:
$$[A_{i+1}]\, d_i = g_i$$
The procedure used in deriving the rank 1 and rank 2 inverse update formulas can be followed by using [A_i], d_i, and g_i in place of [B_i], g_i, and d_i, respectively. This leads to a rank 2 update formula similar to the DFP formula, known as the Broyden-Fletcher-Goldfarb-Shanno (BFGS) formula:
$$[A_{i+1}] = [A_i] + \frac{g_i g_i^T}{g_i^T d_i} - \frac{[A_i]\, d_i\, d_i^T [A_i]}{d_i^T [A_i]\, d_i}$$
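In practical computations, the BFGS formula for [A_{i+1}] is rewritten more conveniently in terms of [B_i] = [A_i]^{-1}, as:
$$[B_{i+1}] = [B_i] + \left(1 + \frac{g_i^T [B_i]\, g_i}{d_i^T g_i}\right)\frac{d_i d_i^T}{d_i^T g_i} - \frac{d_i\, g_i^T [B_i]}{d_i^T g_i} - \frac{[B_i]\, g_i\, d_i^T}{d_i^T g_i}$$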
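The inverse form of BFGS is the matrix inverse of the direct form. In other words, updating [B_i] = [A_i]^{-1} gives the inverse of the updated [A_i]. The sketch below checks this numerically under assumed data (an arbitrary symmetric positive definite [A_1] and a pair with d^T g > 0). It uses NumPy, and the function names are hypothetical.

```python
import numpy as np

def bfgs_inverse_update(B, d, g):
    """BFGS update of the inverse-Hessian approximation [B_i] (B symmetric)."""
    dg = d @ g
    Bg = B @ g
    return (B + (1.0 + g @ Bg / dg) * np.outer(d, d) / dg
            - (np.outer(d, Bg) + np.outer(Bg, d)) / dg)

def bfgs_direct_update(A, d, g):
    """BFGS update of the Hessian approximation [A_i] itself."""
    Ad = A @ d
    return A + np.outer(g, g) / (g @ d) - np.outer(Ad, Ad) / (d @ Ad)

A1 = np.array([[2.0, 0.5], [0.5, 1.0]])   # assumed SPD Hessian approximation
B1 = np.linalg.inv(A1)
d = np.array([1.0, 1.0])
g = np.array([2.0, -0.5])

B2 = bfgs_inverse_update(B1, d, g)
A2 = bfgs_direct_update(A1, d, g)
print(np.allclose(B2, np.linalg.inv(A2)))  # the two forms agree
print(np.allclose(B2 @ g, d))              # [B_{i+1}] g_i = d_i
print(np.allclose(A2 @ d, g))              # [A_{i+1}] d_i = g_i
```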


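3. The DFP and BFGS formulas belong to a family of rank 2 updates known as Huang's family of updates, which can be expressed for updating the inverse of the Hessian matrix as:
$$[B_{i+1}] = \rho_i \left([B_i] - \frac{[B_i]\, g_i\, g_i^T [B_i]}{g_i^T [B_i]\, g_i} + \theta_i\, y_i y_i^T\right) + \frac{d_i d_i^T}{d_i^T g_i}$$
where
$$y_i = \left(g_i^T [B_i]\, g_i\right)^{1/2} \left(\frac{d_i}{d_i^T g_i} - \frac{[B_i]\, g_i}{g_i^T [B_i]\, g_i}\right)$$
and ρ_i and θ_i are constant parameters.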
It has been shown that Huang's update formula maintains the symmetry and positive definiteness of [B_{i+1}] if [B_i] is symmetric and positive definite. Different choices of ρ_i and θ_i in this formula lead to different algorithms. For example, when ρ_i = 1 and θ_i = 0, Huang's formula gives the DFP formula:
$$[B_{i+1}] = [B_i] + \frac{d_i d_i^T}{d_i^T g_i} - \frac{[B_i]\, g_i\, g_i^T [B_i]}{g_i^T [B_i]\, g_i}$$
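When ρ_i = 1 and θ_i = 1, Huang's formula yields the BFGS formula:
$$[B_{i+1}] = [B_i] + \left(1 + \frac{g_i^T [B_i]\, g_i}{d_i^T g_i}\right)\frac{d_i d_i^T}{d_i^T g_i} - \frac{d_i\, g_i^T [B_i]}{d_i^T g_i} - \frac{[B_i]\, g_i\, d_i^T}{d_i^T g_i}$$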
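The two special cases of Huang's family can be checked numerically. The sketch below implements the family with parameters rho and theta and compares it with direct DFP and BFGS implementations on an assumed symmetric positive definite [B_i]. It is a NumPy sketch, and the function names are hypothetical.

```python
import numpy as np

def huang_update(B, d, g, rho, theta):
    """Huang-family update of [B_i] with constant parameters rho and theta."""
    dg = d @ g
    Bg = B @ g
    gBg = g @ Bg
    y = np.sqrt(gBg) * (d / dg - Bg / gBg)
    return rho * (B - np.outer(Bg, Bg) / gBg + theta * np.outer(y, y)) + np.outer(d, d) / dg

def dfp_update(B, d, g):
    Bg = B @ g
    return B + np.outer(d, d) / (d @ g) - np.outer(Bg, Bg) / (g @ Bg)

def bfgs_inverse_update(B, d, g):
    dg, Bg = d @ g, B @ g
    return (B + (1.0 + g @ Bg / dg) * np.outer(d, d) / dg
            - (np.outer(d, Bg) + np.outer(Bg, d)) / dg)

B = np.array([[1.0, 0.2], [0.2, 0.5]])  # assumed symmetric positive definite [B_i]
d = np.array([1.0, 1.0])
g = np.array([2.0, -0.5])

print(np.allclose(huang_update(B, d, g, 1.0, 0.0), dfp_update(B, d, g)))           # DFP
print(np.allclose(huang_update(B, d, g, 1.0, 1.0), bfgs_inverse_update(B, d, g)))  # BFGS
```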



4. It has been shown that the BFGS method exhibits superlinear
convergence near X*.
5. Numerical experience indicates that the BFGS method is the best unconstrained variable metric method and is less influenced by errors in finding λ_i* than the DFP method.
6. The methods discussed in this section are also known as secant methods, since the equations
$$[B_{i+1}]\, g_i = d_i \qquad \text{and} \qquad [A_{i+1}]\, d_i = g_i$$
can be considered as secant equations.
DAVIDON-FLETCHER-POWELL METHOD
The iterative procedure of the Davidon-Fletcher-Powell (DFP) method can be described as follows:
1. Start with an initial point X_1 and an n × n positive definite symmetric matrix [B_1] to approximate the inverse of the Hessian matrix of f. Usually, [B_1] is taken as the identity matrix [I]. Set the iteration number as i = 1.
2. Compute the gradient of the function, ∇f_i, at point X_i, and set
$$S_i = -[B_i]\, \nabla f_i$$
3. Find the optimal step length λ_i* in the direction S_i and set
$$X_{i+1} = X_i + \lambda_i^* S_i$$
4. Test the new point X_{i+1} for optimality. If X_{i+1} is optimal, terminate the iterative process. Otherwise, go to step 5.
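5. Update the matrix [B_i] using the DFP formula as
$$[B_{i+1}] = [B_i] + [M_i] + [N_i]$$
where
$$[M_i] = \lambda_i^* \frac{S_i S_i^T}{S_i^T g_i}$$
$$[N_i] = -\frac{([B_i]\, g_i)([B_i]\, g_i)^T}{g_i^T [B_i]\, g_i}$$
$$g_i = \nabla f_{i+1} - \nabla f_i$$
6. Set the new iteration number as i = i + 1, and go to step 2.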
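Steps 1 to 6 can be sketched for a quadratic f(X) = ½ X^T A X + b^T X. For a quadratic, the exact step length along S_i has the closed form λ_i* = -(∇f_i^T S_i)/(S_i^T A S_i). A general f would need a one-dimensional search instead. The function name, the tolerance, and the particular quadratic are assumptions of this NumPy sketch. With exact steps, it reaches the minimum of this two-variable quadratic in two iterations.

```python
import numpy as np

def dfp_minimize_quadratic(A, b, x1, eps=1e-8, max_iter=100):
    """DFP steps 1-6 for f(X) = 0.5 X^T A X + b^T X (A symmetric positive definite)."""
    x = np.asarray(x1, dtype=float)
    B = np.eye(x.size)                    # step 1: [B_1] = [I]
    grad = A @ x + b
    for i in range(max_iter):
        if np.linalg.norm(grad) <= eps:   # step 4: optimality test
            return x, i
        S = -B @ grad                     # step 2: S_i = -[B_i] grad f_i
        lam = -(grad @ S) / (S @ A @ S)   # step 3: exact step for a quadratic
        x_new = x + lam * S
        grad_new = A @ x_new + b
        d, g = x_new - x, grad_new - grad
        Bg = B @ g
        B = B + np.outer(d, d) / (d @ g) - np.outer(Bg, Bg) / (g @ Bg)  # step 5
        x, grad = x_new, grad_new         # step 6
    return x, max_iter

A = np.array([[4.0, 2.0], [2.0, 2.0]])   # assumed quadratic
b = np.array([1.0, -1.0])
x_opt, n_iter = dfp_minimize_quadratic(A, b, [0.0, 0.0])
print(x_opt, n_iter)
```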
Note: The matrix [B_{i+1}] given by the DFP update is guaranteed to remain positive definite only if λ_i* is found accurately. Thus, if λ_i* is not found accurately in any iteration, the matrix [B_i] should not be updated. There are several alternatives in such a case:
One possibility is to compute a better value of λ_i* by using more refits in the one-dimensional minimization procedure (until the product S_i^T ∇f_{i+1} becomes sufficiently small). However, this involves more computational effort.
Another possibility is to specify a maximum number of refits in the one-dimensional minimization method and to skip the updating of [B_i] if λ_i* could not be found accurately in the specified number of refits.
The last possibility is to continue updating the matrix [B_i] using the approximate values of λ_i* found, but restart the whole procedure after a certain number of iterations, that is, restart with i = 1 in step 2 of the method.
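The second alternative (skip the update when λ_i* looks inaccurate) might be sketched as follows. The helper name and the accuracy test are assumptions of this sketch. The test requires |S_i^T ∇f_{i+1}| to be small relative to |S_i^T ∇f_i|, so the slope along S_i must have nearly vanished. The quadratic used in the usage lines is also assumed.

```python
import numpy as np

def dfp_update_if_accurate(B, S, lam, grad_old, grad_new, rel_tol=0.1):
    """Hypothetical helper: DFP update only if lam looks like an accurate minimizing step.

    Returns (B_next, updated_flag).
    """
    if abs(S @ grad_new) > rel_tol * abs(S @ grad_old):
        return B, False                          # skip updating [B_i]
    d = lam * S                                  # d_i = lam_i* S_i
    g = grad_new - grad_old                      # g_i
    Bg = B @ g
    return B + np.outer(d, d) / (d @ g) - np.outer(Bg, Bg) / (g @ Bg), True

A = np.array([[4.0, 2.0], [2.0, 2.0]])           # assumed quadratic: grad = A x + b
b = np.array([1.0, -1.0])
grad = lambda x: A @ x + b

x1 = np.zeros(2)
B1 = np.eye(2)
S1 = -B1 @ grad(x1)

B_acc, ok_acc = dfp_update_if_accurate(B1, S1, 1.0, grad(x1), grad(x1 + 1.0 * S1))
B_rough, ok_rough = dfp_update_if_accurate(B1, S1, 0.7, grad(x1), grad(x1 + 0.7 * S1))
print(ok_acc, ok_rough)  # True False
```

Here λ = 1 is the exact minimizing step along S_1 for this quadratic, while λ = 0.7 leaves a slope too large to pass the test, so that update is skipped.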
Example 1
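Show that the DFP method is a conjugate gradient method.
Solution:
Consider the quadratic function
$$f(X) = \frac{1}{2} X^T [A] X + B^T X + C$$
for which the gradient is given by
$$\nabla f = [A] X + B$$
The above equation and the definition g_i = ∇f_{i+1} - ∇f_i give:
$$g_i = [A](X_{i+1} - X_i)$$
Since
$$X_{i+1} = X_i + \lambda_i^* S_i$$
the equation g_i = [A](X_{i+1} - X_i) becomes:
$$g_i = \lambda_i^* [A]\, S_i$$
or
$$[A]\, S_i = \frac{1}{\lambda_i^*}\, g_i$$
Premultiplication of the above equation by [B_{i+1}], written with the DFP update as [B_{i+1}] = [B_i] + [M_i] + [N_i], leads to
$$[B_{i+1}][A]\, S_i = \frac{1}{\lambda_i^*}\left([B_i] + [M_i] + [N_i]\right) g_i$$
where [M_i] = λ_i* S_i S_i^T/(S_i^T g_i) and [N_i] = -([B_i] g_i)([B_i] g_i)^T/(g_i^T [B_i] g_i) are the two correction terms of the DFP update. The expression for [M_i] yields:
$$[M_i]\, g_i = \lambda_i^* S_i \frac{S_i^T g_i}{S_i^T g_i} = \lambda_i^* S_i$$
The expression for [N_i] can be used to obtain
$$[N_i]\, g_i = -\frac{[B_i]\, g_i\, (g_i^T [B_i]^T g_i)}{g_i^T [B_i]\, g_i} = -[B_i]\, g_i$$
since [B_i] is symmetric.
By substituting these two results into the equation for [B_{i+1}][A] S_i, we obtain:
$$[B_{i+1}][A]\, S_i = \frac{1}{\lambda_i^*}\left([B_i]\, g_i + \lambda_i^* S_i - [B_i]\, g_i\right) = S_i$$
Since S_{i+1} = -[B_{i+1}] ∇f_{i+1} and [B_{i+1}] is symmetric, the quantity S_{i+1}^T [A] S_i can be written as:
$$S_{i+1}^T [A]\, S_i = -\left([B_{i+1}]\, \nabla f_{i+1}\right)^T [A]\, S_i = -\nabla f_{i+1}^T [B_{i+1}][A]\, S_i = -\nabla f_{i+1}^T S_i = 0$$
since λ_i* is the minimizing step in the direction S_i (so the slope S_i^T ∇f_{i+1} vanishes). The above equation proves that the successive directions generated in the DFP method are [A]-conjugate, and hence the method is a conjugate gradient method.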
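The result can also be checked numerically. The sketch below runs DFP with exact line searches on an assumed three-variable quadratic, starting from [B_1] = [I]. It then prints S_i^T [A] S_j for every pair of generated directions, and these values should vanish up to rounding. It uses NumPy, and the matrix A and vector b are arbitrary choices for this sketch.

```python
import numpy as np

# Assumed quadratic f(X) = 0.5 X^T A X + b^T X with a 3 x 3 SPD matrix A
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, -2.0, 0.5])

x = np.zeros(3)
B = np.eye(3)                            # [B_1] = [I]
grad = A @ x + b
directions = []
for _ in range(3):                       # at most n = 3 steps for n = 3 variables
    if np.linalg.norm(grad) < 1e-12:
        break
    S = -B @ grad                        # S_i = -[B_i] grad f_i
    lam = -(grad @ S) / (S @ A @ S)      # exact minimizing step along S_i
    x_new = x + lam * S
    grad_new = A @ x_new + b
    d, g = x_new - x, grad_new - grad
    Bg = B @ g
    B = B + np.outer(d, d) / (d @ g) - np.outer(Bg, Bg) / (g @ Bg)   # DFP update
    directions.append(S)
    x, grad = x_new, grad_new

# S_i^T [A] S_j for i != j should be zero up to rounding
for i in range(len(directions)):
    for j in range(i + 1, len(directions)):
        print(i, j, directions[i] @ A @ directions[j])
print(np.allclose(x, np.linalg.solve(A, -b)))   # minimum reached after n steps
```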



Example 2
Minimize

taking X_1 = as the starting point. Use the cubic interpolation method for one-dimensional minimization.
Solution: Since this method requires the gradient of f, we find that

Iteration 1
We take

At X_1:

Therefore,

To find λ_1*, we minimize

with respect to λ_1. The above equation gives:

Since the solution of the equation df/dλ_1 = 0 cannot be obtained in a simple manner, we use the cubic interpolation method for finding λ_1*.

Cubic Interpolation Method (First fitting)
Stage 1: As the search direction S_1 is normalized already, we go to stage 2.
Stage 2: To establish lower and upper bounds on the optimal step size λ_1*, we have to find two points A and B at which the slope df/dλ_1 has different signs. We take A = 0 and choose an initial step size of t_0 = 0.25 to find B. At λ_1 = A = 0:
At λ_1 = t_0 = 0.25:

As df/dλ_1 is negative, we accelerate the search by taking λ_1 = 4t_0 = 1.00. At λ_1 = 1:

Since df/dλ_1 is still negative, we take λ_1 = 2. At λ_1 = 2:

Although df/dλ_1 is still negative, it appears to have come close to zero, and hence we take the next value of λ_1 as 2.50. At λ_1 = 2.50:

Since df/dλ_1 is negative at λ_1 = 2.00 and positive at λ_1 = 2.50, we take A = 2.0 (instead of zero, for faster convergence) and B = 2.5. Therefore,
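A simple doubling version of the Stage 2 bracketing can be sketched as follows. It tries t_0, 2t_0, 4t_0, and so on, and each trial point with a negative slope becomes the new lower bound A. The worked example instead chose its trial points by judgment (0.25, 1.00, 2, 2.5). The helper name and the sample slope function (minimum at λ = 2.2) are assumptions of this sketch.

```python
def bracket_slope_sign_change(dfdl, t0, max_doublings=60):
    """Hypothetical helper: find A < B with df/dlambda(A) < 0 <= df/dlambda(B)."""
    A = 0.0
    if dfdl(A) >= 0.0:
        raise ValueError("S is not a descent direction at lambda = 0")
    t = t0
    for _ in range(max_doublings):
        if dfdl(t) >= 0.0:          # slope changed sign: the minimum lies in [A, t]
            return A, t
        A = t                       # still descending: move the lower bound
        t *= 2.0
    raise RuntimeError("no sign change found; f may be unbounded along S")

# Assumed slope function with its minimum at lambda = 2.2
A, B = bracket_slope_sign_change(lambda lam: 2.0 * (lam - 2.2), t0=0.25)
print(A, B)  # 2.0 4.0
```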


Stage 3: To find the optimal step length, we use the equations
$$\tilde{\lambda}^* = A + \frac{f'_A + Z + Q}{f'_A + f'_B + 2Z}\,(B - A)$$
$$Q = \left(Z^2 - f'_A f'_B\right)^{1/2}$$
$$Z = \frac{3(f_A - f_B)}{B - A} + f'_A + f'_B$$
where Q is taken as the positive square root. The fitted cubic then has second derivative 2Q/(B - A) > 0 at λ̃*, so λ̃* is a minimum of the cubic.
We compute

Therefore:
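The Stage 3 formula translates directly into code. The sketch below is written under the same assumption as the text, f'_A < 0 < f'_B, which makes Z² - f'_A f'_B positive. The function name is hypothetical. As a check, it uses a function that is itself a cubic, f(λ) = λ³ - 3λ with its local minimum at λ = 1, so a single fit from A = 0 and B = 2 recovers the minimum exactly.

```python
import math

def cubic_interpolation_step(A, fA, dA, B, fB, dB):
    """lambda~* from the cubic matching f and df/dlambda at lambda = A and lambda = B.

    Assumes dA < 0 < dB, so that Z**2 - dA*dB > 0.
    """
    Z = 3.0 * (fA - fB) / (B - A) + dA + dB
    Q = math.sqrt(Z * Z - dA * dB)     # positive root gives the minimum
    return A + (dA + Z + Q) / (dA + dB + 2.0 * Z) * (B - A)

f = lambda lam: lam**3 - 3.0 * lam
df = lambda lam: 3.0 * lam**2 - 3.0
lam_tilde = cubic_interpolation_step(0.0, f(0.0), df(0.0), 2.0, f(2.0), df(2.0))
print(lam_tilde)  # 1.0
```

One caution: when the four values happen to fit a quadratic exactly, the denominator f'_A + f'_B + 2Z is zero. A practical code therefore needs an algebraically equivalent form or a fallback for that case.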
To find whether λ̃* is close to λ_1*, we test the value of df/dλ_1:

Also,

Since df/dλ_1 is not close to zero at λ̃*, we use a refitting technique.
Second fitting: Now we take A = λ̃*, since df/dλ_1 is negative at λ̃*, and B = 2.5. Thus,

With these values, we find that
$$Q = \left[(80.238)^2 + (0.818)(174.68)\right]^{1/2} = 81.1$$

To test for convergence, we evaluate df/dλ_1 at the new λ̃*. Since this value is small, it can be assumed to be sufficiently close to zero, and hence we take λ_1* ≃ λ̃* = 2.201. This gives the new point X_2 = X_1 + λ_1* S_1.
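The fit, test, and refit cycle used above can be sketched as a loop. After each fit, if df/dλ at λ̃* is not close to zero, λ̃* replaces A (negative slope) or B (positive slope) and the cubic is fitted again. The code evaluates λ̃* through the algebraically equivalent form λ̃* = B - (B - A)(f'_B + Q - Z)/(f'_B - f'_A + 2Q). Its denominator stays positive whenever f'_A < 0 < f'_B, and it remains finite when the data fit a quadratic. The function names, tolerances, and test function f = e^λ - 3λ (minimum at λ = ln 3) are assumptions of this sketch.

```python
import math

def cubic_step(A, fA, dA, B, fB, dB):
    """Cubic-fit minimizer; same value as A + (dA+Z+Q)/(dA+dB+2Z)*(B-A)."""
    Z = 3.0 * (fA - fB) / (B - A) + dA + dB
    Q = math.sqrt(Z * Z - dA * dB)
    return B - (B - A) * (dB + Q - Z) / (dB - dA + 2.0 * Q)

def cubic_line_search(f, df, A, B, tol=1e-8, max_refits=50):
    """Fit a cubic, test the slope at its minimizer, and refit on the sub-bracket."""
    fA, dA, fB, dB = f(A), df(A), f(B), df(B)
    if not (dA < 0.0 < dB):
        raise ValueError("need df/dlambda(A) < 0 < df/dlambda(B)")
    lam = A
    for _ in range(max_refits):
        lam = cubic_step(A, fA, dA, B, fB, dB)
        slope = df(lam)
        if abs(slope) <= tol:          # close enough to zero: accept lam
            break
        if slope < 0.0:                # minimum lies to the right: move A
            A, fA, dA = lam, f(lam), slope
        else:                          # minimum lies to the left: move B
            B, fB, dB = lam, f(lam), slope
    return lam

lam_star = cubic_line_search(lambda t: math.exp(t) - 3.0 * t,
                             lambda t: math.exp(t) - 3.0, 0.0, 2.0)
print(lam_star, math.log(3.0))
```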
Testing X_2 for Convergence: To test whether the DFP method has converged, we compute the gradient of f at X_2:

As the components of this vector are not close to zero, X_2 is not optimum, and hence the procedure has to be continued until the optimum point is found.

Broyden-Fletcher-Goldfarb-Shanno
Method
As stated earlier, a major difference between the DFP and BFGS methods is in how the update is derived. The BFGS update is derived for an approximation of the Hessian matrix itself rather than its inverse. In practical computations, however, it is applied in the equivalent form that updates [B_i], the approximation of the inverse, as in the steps below. The BFGS method can be described by the following steps:
1. Start with an initial point X_1 and an n × n positive definite symmetric matrix [B_1] as an initial estimate of the inverse of the Hessian matrix of f. In the absence of additional information, [B_1] is taken as the identity matrix [I]. Compute the gradient vector
$$\nabla f_1 = \nabla f(X_1)$$
and set the iteration number as i = 1.
2. Compute the gradient of the function, ∇f_i, at point X_i, and set
$$S_i = -[B_i]\, \nabla f_i$$
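3. Find the optimal step length λ_i* in the direction S_i and set
$$X_{i+1} = X_i + \lambda_i^* S_i$$
4. Test the point X_{i+1} for optimality. If ||∇f_{i+1}|| ≤ ε, where ε is a small quantity, take X* ≈ X_{i+1} and stop the process. Otherwise, go to step 5.
5. Update the matrix [B_i] (the approximation to the inverse of the Hessian matrix) as:
$$[B_{i+1}] = [B_i] + \left(1 + \frac{g_i^T [B_i]\, g_i}{d_i^T g_i}\right)\frac{d_i d_i^T}{d_i^T g_i} - \frac{d_i\, g_i^T [B_i]}{d_i^T g_i} - \frac{[B_i]\, g_i\, d_i^T}{d_i^T g_i}$$
where
$$d_i = X_{i+1} - X_i = \lambda_i^* S_i, \qquad g_i = \nabla f_{i+1} - \nabla f_i$$
6. Set the new iteration number as i = i + 1 and go to step 2.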
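The six BFGS steps can be sketched as a generic loop that takes the gradient and a line-search routine as callables. The function name and signature are hypothetical. There is one addition not in the text: the update is skipped whenever d_i^T g_i ≤ 0, since such an update could destroy positive definiteness. The usage lines assume a quadratic, for which the exact step length has a closed form. A general f would need a one-dimensional search such as the cubic interpolation method.

```python
import numpy as np

def bfgs_minimize(grad, line_search, x1, eps=0.01, max_iter=200):
    """BFGS steps 1-6 using the inverse-Hessian update of step 5.

    grad(x) returns the gradient; line_search(x, S) returns the step length lambda*.
    """
    x = np.asarray(x1, dtype=float)
    B = np.eye(x.size)                       # step 1: [B_1] = [I]
    gx = grad(x)
    if np.linalg.norm(gx) <= eps:
        return x, 0
    for i in range(1, max_iter + 1):
        S = -B @ gx                          # step 2
        lam = line_search(x, S)              # step 3
        x_new = x + lam * S
        gx_new = grad(x_new)
        if np.linalg.norm(gx_new) <= eps:    # step 4: X* ~ X_{i+1}
            return x_new, i
        d, g = x_new - x, gx_new - gx
        dg = d @ g
        if dg > 0.0:                         # safeguard (assumption): keep [B] positive definite
            Bg = B @ g
            B = (B + (1.0 + g @ Bg / dg) * np.outer(d, d) / dg
                 - (np.outer(d, Bg) + np.outer(Bg, d)) / dg)   # step 5
        x, gx = x_new, gx_new                # step 6
    return x, max_iter

# Usage on an assumed quadratic 0.5 x^T A x + b^T x with exact line searches
A = np.array([[4.0, 2.0], [2.0, 2.0]])
b = np.array([1.0, -1.0])
grad = lambda x: A @ x + b
exact = lambda x, S: -(grad(x) @ S) / (S @ A @ S)
x_opt, iters = bfgs_minimize(grad, exact, [0.0, 0.0])
print(x_opt, iters)
```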
Remarks:

1. The BFGS method can be considered as a quasi-Newton, conjugate gradient, and variable metric method.

2. Since the inverse of the Hessian matrix is approximated, the BFGS method can be called an indirect update method.

3. If the step lengths λ_i* are found accurately, the matrix [B_i] retains its positive definiteness as the value of i increases. However, in practical applications, the matrix [B_i] might become indefinite or even singular if the λ_i* are not found accurately. As such, periodic resetting of the matrix [B_i] to the identity matrix [I] is desirable. However, numerical experience indicates that the BFGS method is less influenced by errors in λ_i* than is the DFP method.

4. It has been shown that the BFGS method exhibits superlinear convergence near
X*.
Example: Minimize
$$f(x_1, x_2) = x_1 - x_2 + 2x_1^2 + 2x_1 x_2 + x_2^2$$
from the starting point X_1 = {0, 0}^T using the BFGS method with
$$[B_1] = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
and ε = 0.01.
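Solution:
Iteration 1 (i = 1):
Here
$$\nabla f_1 = \begin{Bmatrix} 1 + 4x_1 + 2x_2 \\ -1 + 2x_1 + 2x_2 \end{Bmatrix}_{(0,0)} = \begin{Bmatrix} 1 \\ -1 \end{Bmatrix}$$
and hence
$$S_1 = -[B_1]\, \nabla f_1 = \begin{Bmatrix} -1 \\ 1 \end{Bmatrix}$$
To find the minimizing step length λ_1* along S_1, we minimize
$$f(X_1 + \lambda_1 S_1) = f(-\lambda_1, \lambda_1) = \lambda_1^2 - 2\lambda_1$$
with respect to λ_1. Since df/dλ_1 = 0 at λ_1* = 1, we obtain:
$$X_2 = X_1 + \lambda_1^* S_1 = \begin{Bmatrix} -1 \\ 1 \end{Bmatrix}$$
Since
$$\nabla f_2 = \nabla f(X_2) = \begin{Bmatrix} -1 \\ -1 \end{Bmatrix}, \qquad \|\nabla f_2\| = 1.4142 > \varepsilon$$
we proceed to update the matrix [B_i] by computing
$$g_1 = \nabla f_2 - \nabla f_1 = \begin{Bmatrix} -2 \\ 0 \end{Bmatrix}, \qquad d_1 = \lambda_1^* S_1 = \begin{Bmatrix} -1 \\ 1 \end{Bmatrix}$$
$$d_1^T g_1 = 2, \qquad g_1^T [B_1]\, g_1 = 4$$
$$d_1 d_1^T = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}, \qquad d_1 g_1^T [B_1] = \begin{bmatrix} 2 & 0 \\ -2 & 0 \end{bmatrix}, \qquad [B_1]\, g_1 d_1^T = \begin{bmatrix} 2 & -2 \\ 0 & 0 \end{bmatrix}$$
The BFGS update equation
$$[B_2] = [B_1] + \left(1 + \frac{g_1^T [B_1]\, g_1}{d_1^T g_1}\right)\frac{d_1 d_1^T}{d_1^T g_1} - \frac{d_1\, g_1^T [B_1]}{d_1^T g_1} - \frac{[B_1]\, g_1\, d_1^T}{d_1^T g_1}$$
gives
$$[B_2] = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \frac{3}{2}\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} - \frac{1}{2}\begin{bmatrix} 2 & 0 \\ -2 & 0 \end{bmatrix} - \frac{1}{2}\begin{bmatrix} 2 & -2 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 1/2 & -1/2 \\ -1/2 & 5/2 \end{bmatrix}$$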
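Iteration 2 (i = 2): The next search direction is determined as:
$$S_2 = -[B_2]\, \nabla f_2 = -\begin{bmatrix} 1/2 & -1/2 \\ -1/2 & 5/2 \end{bmatrix}\begin{Bmatrix} -1 \\ -1 \end{Bmatrix} = \begin{Bmatrix} 0 \\ 2 \end{Bmatrix}$$
To find the minimizing step length λ_2* along S_2, we minimize
$$f(X_2 + \lambda_2 S_2) = f(-1, 1 + 2\lambda_2) = 4\lambda_2^2 - 2\lambda_2 - 1$$
with respect to λ_2. Since df/dλ_2 = 0 at λ_2* = 1/4, we obtain:
$$X_3 = X_2 + \lambda_2^* S_2 = \begin{Bmatrix} -1 \\ 1.5 \end{Bmatrix}$$
This point can be identified to be optimum since
$$\nabla f_3 = \begin{Bmatrix} 0 \\ 0 \end{Bmatrix}, \qquad \|\nabla f_3\| = 0 < \varepsilon$$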
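The two iterations of this example can be reproduced numerically. The sketch assumes the objective f(x1, x2) = x1 - x2 + 2x1² + 2x1x2 + x2², written as ½ X^T A X + b^T X, and uses NumPy. Because f is quadratic, the minimizing step along S has the closed form used in `exact_step`. The helper names are hypothetical.

```python
import numpy as np

# f(x1, x2) = x1 - x2 + 2 x1^2 + 2 x1 x2 + x2^2 = 0.5 X^T A X + b^T X
A = np.array([[4.0, 2.0], [2.0, 2.0]])
b = np.array([1.0, -1.0])

def grad(x):
    return A @ x + b

def exact_step(x, S):
    """lambda* with df/dlambda = 0 along S (valid because f is quadratic)."""
    return -(grad(x) @ S) / (S @ A @ S)

def bfgs_inverse_update(B, d, g):
    dg, Bg = d @ g, B @ g
    return (B + (1.0 + g @ Bg / dg) * np.outer(d, d) / dg
            - (np.outer(d, Bg) + np.outer(Bg, d)) / dg)

x1 = np.zeros(2)
B1 = np.eye(2)

# Iteration 1
S1 = -B1 @ grad(x1)
lam1 = exact_step(x1, S1)
x2 = x1 + lam1 * S1
B2 = bfgs_inverse_update(B1, x2 - x1, grad(x2) - grad(x1))

# Iteration 2
S2 = -B2 @ grad(x2)
lam2 = exact_step(x2, S2)
x3 = x2 + lam2 * S2

print(S1, lam1, x2)
print(B2)
print(S2, lam2, x3, np.linalg.norm(grad(x3)))
```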
