ELEG 5040: Homework #1
Xingyu ZENG
March 7, 2015
1 Problem 1
Suppose the three layers are x, h, and y; see Figure 1.
\begin{align}
h_1 &= \sigma(x_1 + 1) \tag{1} \\
h_2 &= \sigma(x_2 + 1) \tag{2} \\
h_3 &= \sigma(1 - x_1 - 2x_2) \tag{3} \\
y_1 &= \sigma(2.5 - h_1 - h_2 - h_3) \tag{4} \\
y_2 &= \sigma(h_1 + h_2 + h_3 - 2.5) \tag{5}
\end{align}
Here $\sigma(x) = 1$ if $x \ge 0$, else $\sigma(x) = 0$.
Figure 1: Three-layer network with input layer $(x_1, x_2)$, hidden layer $(h_1, h_2, h_3)$, and output layer $(y_1, y_2)$.
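As a sanity check, equations (1)–(5) can be evaluated directly; the sketch below uses binary test inputs (the choice of inputs is my own assumption, not stated in the problem):

```python
# Threshold network from equations (1)-(5); sigma is the hard threshold.
def sigma(v):
    return 1 if v >= 0 else 0

def forward(x1, x2):
    h1 = sigma(x1 + 1)
    h2 = sigma(x2 + 1)
    h3 = sigma(1 - x1 - 2 * x2)
    y1 = sigma(2.5 - h1 - h2 - h3)
    y2 = sigma(h1 + h2 + h3 - 2.5)
    return y1, y2

for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), "->", forward(x1, x2))
```

Note that $y_2$ is always the complement of $y_1$, since the two output units threshold the same quantity with opposite signs.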
2 Problem 2
The network has 3 layers: the first layer has the same dimension as the input x, the second layer has n nodes, where n is the number of support vectors, and the last layer has a single node which outputs 1 if the input x satisfies the SVM decision condition.
Let the second layer be $h = [h_1, h_2, \ldots, h_n]$ and the last layer be $y$.
\begin{align}
h_i &= \sum_{j=1}^{d} |x_j - x_{i,j}|^2 \tag{6} \\
\bar{h}_i &= \exp\!\left(-\frac{h_i}{\sigma^2}\right) \tag{7} \\
y &= \sum_{i=1}^{n} \alpha_i \bar{h}_i \tag{8} \\
\bar{y} &= \sigma(y) \tag{9}
\end{align}
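A minimal numeric sketch of equations (6)–(9); the support vectors, coefficients $\alpha_i$, and kernel width $\sigma^2$ below are hypothetical values chosen only to show the data flow:

```python
import numpy as np

# Hypothetical support vectors (n=3, d=2), coefficients, and kernel width.
sv = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
alpha = np.array([0.5, -1.2, 0.8])
sigma2 = 1.0

def predict(x):
    h = np.sum((x - sv) ** 2, axis=1)   # eq. (6): squared distances to SVs
    h_bar = np.exp(-h / sigma2)         # eq. (7): RBF kernel values
    y = np.dot(alpha, h_bar)            # eq. (8): weighted sum
    return 1 if y >= 0 else 0           # eq. (9): hard threshold

print(predict(np.array([1.0, 1.0])))
print(predict(np.array([1.0, 0.0])))
```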
3 Problem 3
\begin{align}
f(x)[i,j] &= \sum_m \sum_n w[m,n]\, x[i-m,\, j-n] \tag{10} \\
g(x)[i,j] &= x[i-u,\, j-v] \tag{11} \\
\Rightarrow f(g(x))[i,j] &= \sum_m \sum_n w[m,n]\, g(x)[i-m,\, j-n] \tag{12} \\
&= \sum_m \sum_n w[m,n]\, x[i-m-u,\, j-n-v] \tag{13} \\
&= \sum_m \sum_n w[m,n]\, x[(i-u)-m,\, (j-v)-n] \tag{14} \\
&= f(x)[i-u,\, j-v] = g(f(x))[i,j] \tag{15}
\end{align}
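The identity (10)–(15) can also be checked numerically. The sketch below uses circular convolution and a circular shift (my choice, so that boundary effects do not obscure the identity that the infinite sums express):

```python
import numpy as np

def conv2_circ(x, w):
    # f(x)[i,j] = sum_{m,n} w[m,n] x[i-m, j-n], indices taken modulo the size
    N, M = x.shape
    out = np.zeros_like(x)
    for i in range(N):
        for j in range(M):
            for m in range(w.shape[0]):
                for n in range(w.shape[1]):
                    out[i, j] += w[m, n] * x[(i - m) % N, (j - n) % M]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6))
w = rng.standard_normal((3, 3))
u, v = 2, 1

# g(x)[i,j] = x[i-u, j-v] as a circular shift
shift = lambda a: np.roll(np.roll(a, u, axis=0), v, axis=1)
print(np.allclose(conv2_circ(shift(x), w), shift(conv2_circ(x, w))))  # True
```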
\begin{align}
f(x)[i,j] &= \sum_m \sum_n w[m,n]\, x[i-m,\, j-n] \tag{16} \\
s(x)[i,j] &= x[u \cdot i,\, v \cdot j] \tag{17} \\
\Rightarrow f(s(x))[i,j] &= \sum_m \sum_n w[m,n]\, s(x)[i-m,\, j-n] \tag{18} \\
&= \sum_m \sum_n w[m,n]\, x[u(i-m),\, v(j-n)] \tag{19} \\
&\ne \sum_m \sum_n w[m,n]\, x[u \cdot i - m,\, v \cdot j - n] \tag{20} \\
&= f(x)[u \cdot i,\, v \cdot j] = s(f(x))[i,j] \tag{21}
\end{align}
Since (19) and (20) differ, convolution does not commute with subsampling.
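A quick numeric counterexample for (16)–(21), again using circular convolution so that both orders produce arrays of the same shape (the helper is repeated so the snippet stands alone):

```python
import numpy as np

def conv2_circ(x, w):
    # circular 2D convolution: out[i,j] = sum w[m,n] x[(i-m)%N, (j-n)%M]
    N, M = x.shape
    out = np.zeros_like(x)
    for i in range(N):
        for j in range(M):
            for m in range(w.shape[0]):
                for n in range(w.shape[1]):
                    out[i, j] += w[m, n] * x[(i - m) % N, (j - n) % M]
    return out

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 8))
w = rng.standard_normal((3, 3))
u = v = 2

sub = lambda a: a[::u, ::v]              # s(x)[i,j] = x[u*i, v*j]
# f(s(x)) and s(f(x)) are both 4x4 here, but their values differ:
print(np.allclose(conv2_circ(sub(x), w), sub(conv2_circ(x, w))))  # False
```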
2
4 Problem 4
The output of a k-stride convolution can be regarded as a k-step subsampling applied to the output of a 1-stride convolution.
When the convolution stride is $k$,
\begin{align}
net_j &= (x \otimes w)[j k] = \sum_u w_u\, x_{jk-u} \tag{22} \\
\Rightarrow \frac{\partial L}{\partial w_m} &= \sum_j \frac{\partial L}{\partial net_j} \cdot \frac{\partial net_j}{\partial w_m} = -\sum_j \delta_j\, x_{jk-m} \tag{23}
\end{align}
Here $\delta_j = -\frac{\partial L}{\partial net_j}$ is the $j$-th element of the sensitivity map.
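Equation (23) can be verified against numerical differentiation. The sketch below assumes a simple quadratic loss $L = \frac{1}{2}\sum_j net_j^2$ (my choice) and restricts $j$ to indices where the sum is fully inside the signal:

```python
import numpy as np

rng = np.random.default_rng(0)
k, x, w = 2, rng.standard_normal(12), rng.standard_normal(3)

def net(w):
    # net_j = sum_u w_u x_{j*k - u}, over j where all indices are valid
    K = len(w)
    js = [j for j in range(len(x) // k + 1)
          if j * k - (K - 1) >= 0 and j * k < len(x)]
    return np.array([sum(w[u] * x[j * k - u] for u in range(K)) for j in js]), js

def loss(w):
    n, _ = net(w)
    return 0.5 * np.sum(n ** 2)

n, js = net(w)
delta = -n                                  # delta_j = -dL/dnet_j
grad = np.array([-sum(d * x[j * k - m] for d, j in zip(delta, js))
                 for m in range(len(w))])   # eq. (23)

eps = 1e-6
numeric = np.array([(loss(w + eps * e) - loss(w - eps * e)) / (2 * eps)
                    for e in np.eye(len(w))])
print(np.allclose(grad, numeric, atol=1e-5))  # True
```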
For the 2D case,
\begin{align}
net_{i,j} &= (x \otimes w)[ik,\, jk] = \sum_{u,v} w_{u,v}\, x_{ik-u,\, jk-v} \tag{25} \\
\Rightarrow \frac{\partial L}{\partial w_{m,n}} &= \sum_{i,j} \frac{\partial L}{\partial net_{i,j}} \cdot \frac{\partial net_{i,j}}{\partial w_{m,n}} = -\sum_{i,j} \delta_{i,j}\, x_{ik-m,\, jk-n} \tag{26}
\end{align}
Here $\delta_{i,j} = -\frac{\partial L}{\partial net_{i,j}}$ is the $(i,j)$-th element of the sensitivity map.
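The same finite-difference check applies to the 2D case of (26), again with an assumed quadratic loss:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 2
x = rng.standard_normal((9, 9))
w = rng.standard_normal((2, 2))

def net(w):
    # net_{i,j} = sum_{u,v} w_{u,v} x_{i*k-u, j*k-v}, valid indices only
    K = w.shape[0]
    idx = [i for i in range(x.shape[0] // k + 1)
           if i * k - (K - 1) >= 0 and i * k < x.shape[0]]
    return np.array([[sum(w[u, v] * x[i * k - u, j * k - v]
                          for u in range(K) for v in range(K))
                      for j in idx] for i in idx]), idx

def loss(w):
    n, _ = net(w)
    return 0.5 * np.sum(n ** 2)

n, idx = net(w)
delta = -n                                  # delta_{i,j} = -dL/dnet_{i,j}
grad = np.array([[-sum(delta[a, b] * x[i * k - m, j * k - n2]
                       for a, i in enumerate(idx) for b, j in enumerate(idx))
                  for n2 in range(w.shape[1])]
                 for m in range(w.shape[0])])   # eq. (26)

eps = 1e-6
numeric = np.zeros_like(w)
for m in range(w.shape[0]):
    for n2 in range(w.shape[1]):
        e = np.zeros_like(w); e[m, n2] = eps
        numeric[m, n2] = (loss(w + e) - loss(w - e)) / (2 * eps)
print(np.allclose(grad, numeric, atol=1e-5))  # True
```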
5 Problem 5
5.1 Squared Error Case
\begin{align}
\frac{\partial z_k}{\partial net_k} &= \frac{e^{-net_k}}{(1 + e^{-net_k})^2} = \frac{1}{1 + e^{-net_k}}\left(1 - \frac{1}{1 + e^{-net_k}}\right) = z_k (1 - z_k) \tag{27} \\
\Rightarrow \delta_k = -\frac{\partial J}{\partial net_k} &= -\frac{\partial J}{\partial z_k} \cdot \frac{\partial z_k}{\partial net_k} \tag{28} \\
&= (t_k - z_k) \cdot \frac{\partial z_k}{\partial net_k} = (t_k - z_k)\, z_k (1 - z_k) \tag{29}
\end{align}
When $z_k$ is near 0 or 1, $\delta_k$ will be close to 0 even if $|t_k - z_k|$ is large. In that situation the network barely updates its parameters, and the prediction remains far from the target.
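A tiny numeric illustration of this saturation effect (the specific values are arbitrary):

```python
def delta_sq(t, z):
    # squared-error sensitivity: magnitude (t - z) * z * (1 - z), as in eq. (29)
    return (t - z) * z * (1 - z)

# Confidently wrong prediction: z near 1 while the target is 0.
print(delta_sq(0.0, 0.9999))   # error is ~1, yet the gradient is ~1e-4
print(delta_sq(0.0, 0.5))      # moderate error gives a much larger gradient
```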
5.2 Cross Entropy Case
\begin{align}
J &= -\sum_j t_j \ln z_j, \qquad z_k = \frac{e^{net_k}}{\sum_j e^{net_j}} \tag{30} \\
\frac{\partial z_j}{\partial net_k} &= \begin{cases} z_k (1 - z_k) & j = k \\ -z_j z_k & j \ne k \end{cases} \tag{31} \\
\Rightarrow \delta_k = -\frac{\partial J}{\partial net_k} &= \sum_j \frac{t_j}{z_j} \cdot \frac{\partial z_j}{\partial net_k} \tag{32} \\
&= t_k (1 - z_k) - \sum_{j \ne k} t_j z_k \tag{33} \\
&= t_k - t_k z_k - z_k (1 - t_k) = t_k - z_k \tag{34}
\end{align}
using $\sum_j t_j = 1$ in the last step.
Because the nonlinear activation function is the softmax, the predictions sum to 1. If the prediction error is large, then for the element whose target value is 1 the predicted value will be far from 1, so $\delta_k = t_k - z_k$ will be large and the parameters will still be updated effectively.
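A finite-difference sketch confirming $\delta_k = t_k - z_k$ from (34); the test logits and target are arbitrary:

```python
import numpy as np

def softmax(net):
    e = np.exp(net - net.max())   # shift for numerical stability
    return e / e.sum()

def ce(net, t):
    # J = -sum_j t_j ln z_j
    return -np.sum(t * np.log(softmax(net)))

net = np.array([2.0, -1.0, 0.5])
t = np.array([0.0, 1.0, 0.0])
z = softmax(net)

eps = 1e-6
delta = np.array([-(ce(net + eps * e, t) - ce(net - eps * e, t)) / (2 * eps)
                  for e in np.eye(3)])    # delta_k = -dJ/dnet_k
print(np.allclose(delta, t - z, atol=1e-5))  # True
```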