Backpropagation
\mathbf{s}^m = \dot{\mathbf{F}}^m(\mathbf{n}^m)\,(\mathbf{W}^{m+1})^T \mathbf{s}^{m+1}, \quad \text{for } m = M-1, \dots, 2, 1. \qquad (11.45)
Finally, the weights and biases are updated using the approximate steepest descent rule:
\mathbf{W}^m(k+1) = \mathbf{W}^m(k) - \alpha\, \mathbf{s}^m (\mathbf{a}^{m-1})^T, \qquad (11.46)

\mathbf{b}^m(k+1) = \mathbf{b}^m(k) - \alpha\, \mathbf{s}^m. \qquad (11.47)
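As an illustrative sketch (not from the text), the recursion of Eq. (11.45) and the updates of Eqs. (11.46) and (11.47) can be coded for a general M-layer network. The function name and the dictionary-based data layout below are my own choices:

```python
import numpy as np

def backprop_update(a, n, s_M, W, b, fdot, alpha):
    """One approximate steepest-descent step. Layers are indexed 1..M as in
    the text: a[m] is the output of layer m (a[0] is the input p), n[m] is
    its net input, s_M is the sensitivity of the last layer, and fdot[m] is
    the derivative of layer m's transfer function (applied elementwise)."""
    M = len(W)
    s = {M: s_M}
    # Eq. (11.45): s^m = Fdot^m(n^m) (W^{m+1})^T s^{m+1}, for m = M-1, ..., 1
    for m in range(M - 1, 0, -1):
        s[m] = np.diagflat(fdot[m](n[m])) @ W[m + 1].T @ s[m + 1]
    # Eqs. (11.46)-(11.47): W^m <- W^m - alpha s^m (a^{m-1})^T, b^m <- b^m - alpha s^m
    for m in range(1, M + 1):
        W[m] = W[m] - alpha * s[m] @ a[m - 1].T
        b[m] = b[m] - alpha * s[m]
    return W, b
```

The diagonal matrix `np.diagflat(fdot[m](n[m]))` plays the role of the Jacobian matrix of the transfer function in Eq. (11.45).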
Example
To illustrate the backpropagation algorithm, let's choose a network and apply it to a particular problem. To begin, we will use the 1-2-1 network that
we discussed earlier in this chapter. For convenience we have reproduced
the network in Figure 11.8.
Next we want to define a problem for the network to solve. Suppose that we
want to use the network to approximate the function
g(p) = 1 + \sin\left(\frac{\pi}{4} p\right) \quad \text{for } -2 \le p \le 2. \qquad (11.48)
To obtain our training set we will evaluate this function at several values
of p .
[Figure 11.8: the 1-2-1 network, with a log-sigmoid first layer, a^1 = logsig(W^1 p + b^1), and a linear second layer]
\mathbf{W}^1(0) = \begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix}, \quad \mathbf{b}^1(0) = \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix}, \quad \mathbf{W}^2(0) = \begin{bmatrix} 0.09 & -0.17 \end{bmatrix}, \quad \mathbf{b}^2(0) = \begin{bmatrix} 0.48 \end{bmatrix}.
The response of the network for these initial values is illustrated in Figure
11.9, along with the sine function we wish to approximate.
[Figure 11.9: Initial Network Response]
Next, we need to select a training set \{p_1, t_1\}, \{p_2, t_2\}, \dots, \{p_Q, t_Q\}. In this
case, we will sample the function at 21 points in the range [-2,2] at equally
spaced intervals of 0.2. The training points are indicated by the circles in
Figure 11.9.
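As a small sketch (the variable names are my own), this training set can be generated as follows:

```python
import math

# 21 equally spaced samples of g(p) = 1 + sin(pi*p/4) on [-2, 2]
ps = [-2 + 0.2 * q for q in range(21)]
ts = [1 + math.sin(math.pi * p / 4) for p in ps]
```

Here p = 1 appears as the 16th point (index 15), with target t = 1 + sin(π/4) ≈ 1.707.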
Now we are ready to start the algorithm. The training points can be presented in any order, but they are often chosen randomly. For our initial input we will choose p = 1 , which is the 16th training point:
\mathbf{a}^0 = p = 1.
The output of the first layer is then
\mathbf{a}^1 = \mathbf{f}^1(\mathbf{W}^1 \mathbf{a}^0 + \mathbf{b}^1) = \mathrm{logsig}\left( \begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix} \begin{bmatrix} 1 \end{bmatrix} + \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix} \right) = \mathrm{logsig}\left( \begin{bmatrix} -0.75 \\ -0.54 \end{bmatrix} \right)

= \begin{bmatrix} \dfrac{1}{1 + e^{0.75}} \\[6pt] \dfrac{1}{1 + e^{0.54}} \end{bmatrix} = \begin{bmatrix} 0.321 \\ 0.368 \end{bmatrix}.
The second layer output is then

\mathbf{a}^2 = \mathbf{f}^2(\mathbf{W}^2 \mathbf{a}^1 + \mathbf{b}^2) = \mathrm{purelin}\left( \begin{bmatrix} 0.09 & -0.17 \end{bmatrix} \begin{bmatrix} 0.321 \\ 0.368 \end{bmatrix} + \begin{bmatrix} 0.48 \end{bmatrix} \right) = \begin{bmatrix} 0.446 \end{bmatrix}.
The error is then

e = t - a = \left\{ 1 + \sin\left(\frac{\pi}{4} p\right) \right\} - a^2 = \left\{ 1 + \sin\left(\frac{\pi}{4}\right) \right\} - 0.446 = 1.261.
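To double-check the forward-pass arithmetic above, here is a short Python sketch (the helper and variable names are mine):

```python
import math

def logsig(n):
    return 1.0 / (1.0 + math.exp(-n))

a0 = 1.0
# First layer: a^1 = logsig(W^1 a^0 + b^1)
n1 = [-0.27 * a0 - 0.48, -0.41 * a0 - 0.13]   # net inputs [-0.75, -0.54]
a1 = [logsig(n) for n in n1]
# Second layer: a^2 = purelin(W^2 a^1 + b^2)
a2 = 0.09 * a1[0] - 0.17 * a1[1] + 0.48
# Error for the target t = 1 + sin(pi/4) at p = 1
e = (1 + math.sin(math.pi / 4)) - a2
```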
The next stage of the algorithm is to backpropagate the sensitivities. Before we begin the backpropagation, recall that we will need the derivatives of the transfer functions, \dot{f}^1(n) and \dot{f}^2(n). For the first layer:
\dot{f}^1(n) = \frac{d}{dn}\left( \frac{1}{1 + e^{-n}} \right) = \frac{e^{-n}}{(1 + e^{-n})^2} = \left( 1 - \frac{1}{1 + e^{-n}} \right)\left( \frac{1}{1 + e^{-n}} \right) = (1 - a^1)(a^1).
For the second layer we have
\dot{f}^2(n) = \frac{d}{dn}(n) = 1.
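The identity \dot{f}^1(n) = (1 - a^1)(a^1) can be verified numerically; the sketch below (helper names mine) compares it against a centered finite difference:

```python
import math

def logsig(n):
    return 1.0 / (1.0 + math.exp(-n))

def dlogsig(n):
    a = logsig(n)
    return (1 - a) * a   # derivative written in terms of the output a

# Compare against a centered finite difference at a few points
h = 1e-6
for n in (-0.75, -0.54, 0.3):
    fd = (logsig(n + h) - logsig(n - h)) / (2 * h)
    assert abs(dlogsig(n) - fd) < 1e-8
```

Writing the derivative in terms of the layer output a^1 is convenient because a^1 is already available from the forward pass.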
The first layer sensitivity is then computed by backpropagating the sensitivity from the second layer, using Eq. (11.45):
\mathbf{s}^1 = \dot{\mathbf{F}}^1(\mathbf{n}^1)\,(\mathbf{W}^2)^T \mathbf{s}^2 = \begin{bmatrix} (1 - a^1_1)(a^1_1) & 0 \\ 0 & (1 - a^1_2)(a^1_2) \end{bmatrix} \begin{bmatrix} 0.09 \\ -0.17 \end{bmatrix} \begin{bmatrix} -2.522 \end{bmatrix}

= \begin{bmatrix} (1 - 0.321)(0.321) & 0 \\ 0 & (1 - 0.368)(0.368) \end{bmatrix} \begin{bmatrix} 0.09 \\ -0.17 \end{bmatrix} \begin{bmatrix} -2.522 \end{bmatrix}

= \begin{bmatrix} 0.218 & 0 \\ 0 & 0.233 \end{bmatrix} \begin{bmatrix} -0.227 \\ 0.429 \end{bmatrix} = \begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix}.
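Because the Jacobian matrix is diagonal, this matrix product reduces to two elementwise products, which is easy to check in Python (a sketch; names mine):

```python
a1 = [0.321, 0.368]   # first-layer outputs from the forward pass
w2 = [0.09, -0.17]    # entries of (W^2)^T
s2 = -2.522           # second-layer sensitivity

# Eq. (11.45) with diagonal Jacobian entries (1 - a^1_i) a^1_i
s1 = [(1 - a) * a * w * s2 for a, w in zip(a1, w2)]
```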
The final stage of the algorithm is to update the weights. For simplicity, we will use a learning rate \alpha = 0.1. (In Chapter 12 the choice of learning rate will be discussed in more detail.) From Eq. (11.46) and Eq. (11.47) we have
\mathbf{W}^2(1) = \mathbf{W}^2(0) - \alpha\, \mathbf{s}^2 (\mathbf{a}^1)^T = \begin{bmatrix} 0.09 & -0.17 \end{bmatrix} - 0.1 \begin{bmatrix} -2.522 \end{bmatrix} \begin{bmatrix} 0.321 & 0.368 \end{bmatrix} = \begin{bmatrix} 0.171 & -0.0772 \end{bmatrix},

\mathbf{b}^2(1) = \mathbf{b}^2(0) - \alpha\, \mathbf{s}^2 = \begin{bmatrix} 0.48 \end{bmatrix} - 0.1 \begin{bmatrix} -2.522 \end{bmatrix} = \begin{bmatrix} 0.732 \end{bmatrix},

\mathbf{W}^1(1) = \mathbf{W}^1(0) - \alpha\, \mathbf{s}^1 (\mathbf{a}^0)^T = \begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix} - 0.1 \begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix} \begin{bmatrix} 1 \end{bmatrix} = \begin{bmatrix} -0.265 \\ -0.420 \end{bmatrix},

\mathbf{b}^1(1) = \mathbf{b}^1(0) - \alpha\, \mathbf{s}^1 = \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix} - 0.1 \begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix} = \begin{bmatrix} -0.475 \\ -0.140 \end{bmatrix}.

This completes the first iteration of the backpropagation algorithm.
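The weight and bias updates of Eqs. (11.46) and (11.47), applied with the sensitivities just computed, can be checked in a few lines of Python (a sketch; variable names mine):

```python
alpha = 0.1
a0, a1 = 1.0, [0.321, 0.368]
s1, s2 = [-0.0495, 0.0997], -2.522

# Eqs. (11.46)-(11.47) applied to each layer of the 1-2-1 network
W2 = [0.09 - alpha * s2 * a1[0], -0.17 - alpha * s2 * a1[1]]
b2 = 0.48 - alpha * s2
W1 = [-0.27 - alpha * s1[0] * a0, -0.41 - alpha * s1[1] * a0]
b1 = [-0.48 - alpha * s1[0], -0.13 - alpha * s1[1]]
```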
The algorithm described above is the stochastic gradient descent algorithm, which involves on-line or incremental training, in which the network weights and biases are updated after each input is presented (as with
the LMS algorithm of Chapter 10). It is also possible to perform batch training, in which the complete gradient is computed (after all inputs are applied to the network) before the weights and biases are updated. For
example, if each input occurs with equal probability, the mean square error
performance index can be written
F(\mathbf{x}) = E[\mathbf{e}^T \mathbf{e}] = E[(\mathbf{t} - \mathbf{a})^T(\mathbf{t} - \mathbf{a})] = \frac{1}{Q} \sum_{q=1}^{Q} (\mathbf{t}_q - \mathbf{a}_q)^T (\mathbf{t}_q - \mathbf{a}_q). \qquad (11.49)
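As a sketch of how the batch performance index of Eq. (11.49) is evaluated, the code below computes the mean square error of the example's initial network over all 21 training pairs (the helper names are my own; batch training would average the gradient over these same pairs before each update):

```python
import math

def logsig(n):
    return 1.0 / (1.0 + math.exp(-n))

def net(p):
    # 1-2-1 network with the initial weights and biases from the example
    a1 = [logsig(-0.27 * p - 0.48), logsig(-0.41 * p - 0.13)]
    return 0.09 * a1[0] - 0.17 * a1[1] + 0.48

# Eq. (11.49): F = (1/Q) sum_q (t_q - a_q)^2 over the Q = 21 training pairs
ps = [-2 + 0.2 * q for q in range(21)]
F = sum(((1 + math.sin(math.pi * p / 4)) - net(p)) ** 2 for p in ps) / len(ps)
```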