Backpropagation
\mathbf{s}^m = \dot{\mathbf{F}}^m(\mathbf{n}^m)\,(\mathbf{W}^{m+1})^T \mathbf{s}^{m+1}, \quad \text{for } m = M-1, \dots, 2, 1. \qquad (11.45)
Finally, the weights and biases are updated using the approximate steepest descent rule:
\mathbf{W}^m(k+1) = \mathbf{W}^m(k) - \alpha\, \mathbf{s}^m (\mathbf{a}^{m-1})^T, \qquad (11.46)

\mathbf{b}^m(k+1) = \mathbf{b}^m(k) - \alpha\, \mathbf{s}^m. \qquad (11.47)
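As an illustrative sketch (not from the text), the recursion of Eq. (11.45) and the updates of Eqs. (11.46) and (11.47) can be coded for a general M-layer network. The function name and the dictionary-based data layout below are my own choices:

```python
import numpy as np

def backprop_update(a, n, s_M, W, b, fdot, alpha):
    """One approximate steepest-descent step. Layers are indexed 1..M as in
    the text: a[m] is the output of layer m (a[0] is the input p), n[m] is
    its net input, s_M is the sensitivity of the last layer, and fdot[m] is
    the derivative of layer m's transfer function (applied elementwise)."""
    M = len(W)
    s = {M: s_M}
    # Eq. (11.45): s^m = Fdot^m(n^m) (W^{m+1})^T s^{m+1}, for m = M-1, ..., 1
    for m in range(M - 1, 0, -1):
        s[m] = np.diagflat(fdot[m](n[m])) @ W[m + 1].T @ s[m + 1]
    # Eqs. (11.46)-(11.47): W^m <- W^m - alpha s^m (a^{m-1})^T, b^m <- b^m - alpha s^m
    for m in range(1, M + 1):
        W[m] = W[m] - alpha * s[m] @ a[m - 1].T
        b[m] = b[m] - alpha * s[m]
    return W, b
```

The diagonal matrix `np.diagflat(fdot[m](n[m]))` plays the role of the Jacobian matrix of the transfer function in Eq. (11.45).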
Example
To illustrate the backpropagation algorithm, let's choose a network and apply it to a particular problem. To begin, we will use the 1-2-1 network that
we discussed earlier in this chapter. For convenience we have reproduced
the network in Figure 11.8.
Next we want to define a problem for the network to solve. Suppose that we
want to use the network to approximate the function
g(p) = 1 + \sin\left(\frac{\pi}{4} p\right) \quad \text{for } -2 \le p \le 2. \qquad (11.48)
To obtain our training set we will evaluate this function at several values
of p .
[Figure 11.8: the 1-2-1 network, with a log-sigmoid first layer, a^1 = logsig(W^1 p + b^1), and a linear second layer]
\mathbf{W}^1(0) = \begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix}, \quad \mathbf{b}^1(0) = \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix}, \quad \mathbf{W}^2(0) = \begin{bmatrix} 0.09 & -0.17 \end{bmatrix}, \quad \mathbf{b}^2(0) = \begin{bmatrix} 0.48 \end{bmatrix}.
The response of the network for these initial values is illustrated in Figure
11.9, along with the sine function we wish to approximate.
[Figure 11.9: Initial Network Response]
Next, we need to select a training set \{p_1, t_1\}, \{p_2, t_2\}, \dots, \{p_Q, t_Q\}. In this
case, we will sample the function at 21 points in the range [-2,2] at equally
spaced intervals of 0.2. The training points are indicated by the circles in
Figure 11.9.
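As a small sketch (the variable names are my own), this training set can be generated as follows:

```python
import math

# 21 equally spaced samples of g(p) = 1 + sin(pi*p/4) on [-2, 2]
ps = [-2 + 0.2 * q for q in range(21)]
ts = [1 + math.sin(math.pi * p / 4) for p in ps]
```

Here p = 1 appears as the 16th point (index 15), with target t = 1 + sin(π/4) ≈ 1.707.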
Now we are ready to start the algorithm. The training points can be presented in any order, but they are often chosen randomly. For our initial input we will choose p = 1 , which is the 16th training point:
\mathbf{a}^0 = p = 1.
The output of the first layer is then
\mathbf{a}^1 = \mathbf{f}^1(\mathbf{W}^1 \mathbf{a}^0 + \mathbf{b}^1) = \mathrm{logsig}\left( \begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix} \begin{bmatrix} 1 \end{bmatrix} + \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix} \right) = \mathrm{logsig}\left( \begin{bmatrix} -0.75 \\ -0.54 \end{bmatrix} \right)

= \begin{bmatrix} \dfrac{1}{1 + e^{0.75}} \\[6pt] \dfrac{1}{1 + e^{0.54}} \end{bmatrix} = \begin{bmatrix} 0.321 \\ 0.368 \end{bmatrix}.
The second layer output is then

\mathbf{a}^2 = \mathbf{f}^2(\mathbf{W}^2 \mathbf{a}^1 + \mathbf{b}^2) = \mathrm{purelin}\left( \begin{bmatrix} 0.09 & -0.17 \end{bmatrix} \begin{bmatrix} 0.321 \\ 0.368 \end{bmatrix} + \begin{bmatrix} 0.48 \end{bmatrix} \right) = \begin{bmatrix} 0.446 \end{bmatrix}.
The error is then

e = t - a = \left\{ 1 + \sin\left(\frac{\pi}{4} p\right) \right\} - a^2 = \left\{ 1 + \sin\left(\frac{\pi}{4}\right) \right\} - 0.446 = 1.261.
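To double-check the forward-pass arithmetic above, here is a short Python sketch (the helper and variable names are mine):

```python
import math

def logsig(n):
    return 1.0 / (1.0 + math.exp(-n))

a0 = 1.0
# First layer: a^1 = logsig(W^1 a^0 + b^1)
n1 = [-0.27 * a0 - 0.48, -0.41 * a0 - 0.13]   # net inputs [-0.75, -0.54]
a1 = [logsig(n) for n in n1]
# Second layer: a^2 = purelin(W^2 a^1 + b^2)
a2 = 0.09 * a1[0] - 0.17 * a1[1] + 0.48
# Error for the target t = 1 + sin(pi/4) at p = 1
e = (1 + math.sin(math.pi / 4)) - a2
```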
The next stage of the algorithm is to backpropagate the sensitivities. Before we begin the backpropagation, recall that we will need the derivatives of the transfer functions, \dot{f}^1(n) and \dot{f}^2(n). For the first layer:
\dot{f}^1(n) = \frac{d}{dn}\left( \frac{1}{1 + e^{-n}} \right) = \frac{e^{-n}}{(1 + e^{-n})^2} = \left( 1 - \frac{1}{1 + e^{-n}} \right)\left( \frac{1}{1 + e^{-n}} \right) = (1 - a^1)(a^1).
For the second layer we have
\dot{f}^2(n) = \frac{d}{dn}(n) = 1.
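The identity \dot{f}^1(n) = (1 - a^1)(a^1) can be verified numerically; the sketch below (helper names mine) compares it against a centered finite difference:

```python
import math

def logsig(n):
    return 1.0 / (1.0 + math.exp(-n))

def dlogsig(n):
    a = logsig(n)
    return (1 - a) * a   # derivative written in terms of the output a

# Compare against a centered finite difference at a few points
h = 1e-6
for n in (-0.75, -0.54, 0.3):
    fd = (logsig(n + h) - logsig(n - h)) / (2 * h)
    assert abs(dlogsig(n) - fd) < 1e-8
```

Writing the derivative in terms of the layer output a^1 is convenient because a^1 is already available from the forward pass.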
The first layer sensitivity is then computed by backpropagating the sensitivity from the second layer, using Eq. (11.45):
\mathbf{s}^1 = \dot{\mathbf{F}}^1(\mathbf{n}^1)\,(\mathbf{W}^2)^T \mathbf{s}^2 = \begin{bmatrix} (1 - a^1_1)(a^1_1) & 0 \\ 0 & (1 - a^1_2)(a^1_2) \end{bmatrix} \begin{bmatrix} 0.09 \\ -0.17 \end{bmatrix} \begin{bmatrix} -2.522 \end{bmatrix}

= \begin{bmatrix} (1 - 0.321)(0.321) & 0 \\ 0 & (1 - 0.368)(0.368) \end{bmatrix} \begin{bmatrix} 0.09 \\ -0.17 \end{bmatrix} \begin{bmatrix} -2.522 \end{bmatrix}

= \begin{bmatrix} 0.218 & 0 \\ 0 & 0.233 \end{bmatrix} \begin{bmatrix} -0.227 \\ 0.429 \end{bmatrix} = \begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix}.
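Because the Jacobian matrix is diagonal, this matrix product reduces to two elementwise products, which is easy to check in Python (a sketch; names mine):

```python
a1 = [0.321, 0.368]   # first-layer outputs from the forward pass
w2 = [0.09, -0.17]    # entries of (W^2)^T
s2 = -2.522           # second-layer sensitivity

# Eq. (11.45) with diagonal Jacobian entries (1 - a^1_i) a^1_i
s1 = [(1 - a) * a * w * s2 for a, w in zip(a1, w2)]
```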
The final stage of the algorithm is to update the weights. For simplicity, we will use a learning rate \alpha = 0.1. (In Chapter 12 the choice of learning rate will be discussed in more detail.) From Eq. (11.46) and Eq. (11.47) we have
\mathbf{W}^2(1) = \mathbf{W}^2(0) - \alpha\, \mathbf{s}^2 (\mathbf{a}^1)^T = \begin{bmatrix} 0.09 & -0.17 \end{bmatrix} - 0.1 \begin{bmatrix} -2.522 \end{bmatrix} \begin{bmatrix} 0.321 & 0.368 \end{bmatrix} = \begin{bmatrix} 0.171 & -0.0772 \end{bmatrix},

\mathbf{b}^2(1) = \mathbf{b}^2(0) - \alpha\, \mathbf{s}^2 = \begin{bmatrix} 0.48 \end{bmatrix} - 0.1 \begin{bmatrix} -2.522 \end{bmatrix} = \begin{bmatrix} 0.732 \end{bmatrix},

\mathbf{W}^1(1) = \mathbf{W}^1(0) - \alpha\, \mathbf{s}^1 (\mathbf{a}^0)^T = \begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix} - 0.1 \begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix} \begin{bmatrix} 1 \end{bmatrix} = \begin{bmatrix} -0.265 \\ -0.420 \end{bmatrix},

\mathbf{b}^1(1) = \mathbf{b}^1(0) - \alpha\, \mathbf{s}^1 = \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix} - 0.1 \begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix} = \begin{bmatrix} -0.475 \\ -0.140 \end{bmatrix}.

This completes the first iteration of the backpropagation algorithm.
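The weight and bias updates of Eqs. (11.46) and (11.47), applied with the sensitivities just computed, can be checked in a few lines of Python (a sketch; variable names mine):

```python
alpha = 0.1
a0, a1 = 1.0, [0.321, 0.368]
s1, s2 = [-0.0495, 0.0997], -2.522

# Eqs. (11.46)-(11.47) applied to each layer of the 1-2-1 network
W2 = [0.09 - alpha * s2 * a1[0], -0.17 - alpha * s2 * a1[1]]
b2 = 0.48 - alpha * s2
W1 = [-0.27 - alpha * s1[0] * a0, -0.41 - alpha * s1[1] * a0]
b1 = [-0.48 - alpha * s1[0], -0.13 - alpha * s1[1]]
```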
The algorithm described above is the stochastic gradient descent algorithm, which involves on-line or incremental training, in which the network weights and biases are updated after each input is presented (as with
the LMS algorithm of Chapter 10). It is also possible to perform batch training, in which the complete gradient is computed (after all inputs are applied to the network) before the weights and biases are updated. For
example, if each input occurs with equal probability, the mean square error
performance index can be written
F(\mathbf{x}) = E[\mathbf{e}^T \mathbf{e}] = E[(\mathbf{t} - \mathbf{a})^T(\mathbf{t} - \mathbf{a})] = \frac{1}{Q} \sum_{q=1}^{Q} (\mathbf{t}_q - \mathbf{a}_q)^T (\mathbf{t}_q - \mathbf{a}_q). \qquad (11.49)
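As a sketch of how the batch performance index of Eq. (11.49) is evaluated, the code below computes the mean square error of the example's initial network over all 21 training pairs (the helper names are my own; batch training would average the gradient over these same pairs before each update):

```python
import math

def logsig(n):
    return 1.0 / (1.0 + math.exp(-n))

def net(p):
    # 1-2-1 network with the initial weights and biases from the example
    a1 = [logsig(-0.27 * p - 0.48), logsig(-0.41 * p - 0.13)]
    return 0.09 * a1[0] - 0.17 * a1[1] + 0.48

# Eq. (11.49): F = (1/Q) sum_q (t_q - a_q)^2 over the Q = 21 training pairs
ps = [-2 + 0.2 * q for q in range(21)]
F = sum(((1 + math.sin(math.pi * p / 4)) - net(p)) ** 2 for p in ps) / len(ps)
```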