

ELEG 5040: Homework #1

Xingyu ZENG

March 7, 2015

1 Problem 1
Let the three layers be x, h, and y; see Figure 1.

h1 = σ(x1 + 1)   (1)
h2 = σ(x2 + 1)   (2)
h3 = σ(1 − x1 − 2x2)   (3)
y1 = σ(2.5 − h1 − h2 − h3)   (4)
y2 = σ(h1 + h2 + h3 − 2.5)   (5)

Here σ(x) = 1 if x ≥ 0, else σ(x) = 0.

Figure 1: Three-layer network with input (x1, x2), hidden layer (h1, h2, h3), and output (y1, y2).
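For concreteness, a direct Python transcription of equations (1)–(5); the weights and thresholds are exactly those above, and only the function names are mine.

```python
def step(z):
    # Threshold activation: sigma(z) = 1 if z >= 0, else 0
    return 1.0 if z >= 0 else 0.0

def network(x1, x2):
    # Hidden layer, equations (1)-(3)
    h1 = step(x1 + 1)
    h2 = step(x2 + 1)
    h3 = step(1 - x1 - 2 * x2)
    # Output layer, equations (4)-(5)
    y1 = step(2.5 - h1 - h2 - h3)
    y2 = step(h1 + h2 + h3 - 2.5)
    return y1, y2

print(network(0.0, 0.0))   # (0.0, 1.0): all three hidden units fire
print(network(-2.0, 3.0))  # (1.0, 0.0): only h2 fires
```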

2 Problem 2
The network has three layers: the first layer has the same dimension as the input x, the second layer has n nodes (one for each support vector), and the last layer has a single node that outputs 1 if the input x satisfies the SVM decision condition.

Let the second layer be h = [h1, h2, ..., hn] and the last layer be y.
hi = ∑_{j=1}^{d} |xj − xi,j|²   (6)

h̄i = exp(−hi/σ²)   (7)

y = ∑_{i=1}^{n} αi h̄i   (8)

ȳ = σ(y)   (9)

where σ(y) = 1 if y ≥ 0, else σ(y) = 0.
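As a sanity check, a minimal NumPy sketch of equations (6)–(9); the support vectors, coefficients αi, and bandwidth σ² below are made-up placeholders, since in a real SVM they would come from training.

```python
import numpy as np

# Hypothetical trained SVM parameters (placeholders, not from the assignment)
support_vectors = np.array([[1.0, 2.0], [-1.0, 0.5], [0.0, -1.0]])  # n = 3, d = 2
alpha = np.array([0.7, -0.4, -0.3])  # dual coefficients; the sign encodes the label
sigma2 = 1.0                         # RBF bandwidth sigma^2

def rbf_svm(x):
    x = np.asarray(x, dtype=float)
    h = np.sum((x - support_vectors) ** 2, axis=1)  # eq. (6): squared distances
    h_bar = np.exp(-h / sigma2)                     # eq. (7): Gaussian activations
    y = alpha @ h_bar                               # eq. (8): weighted sum
    return 1.0 if y >= 0 else 0.0                   # eq. (9): threshold output

print(rbf_svm([1.0, 2.0]))   # 1.0: closest to the positive support vector
print(rbf_svm([0.0, -1.0]))  # 0.0: closest to a negative support vector
```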

3 Problem 3
f(x)[i, j] = ∑_m ∑_n w[m, n] x[i − m, j − n]   (10)

g(x)[i, j] = x[i − u, j − v]   (11)

⇒ f(g(x))[i, j] = ∑_m ∑_n w[m, n] g(x)[i − m, j − n]   (12)
                = ∑_m ∑_n w[m, n] x[i − m − u, j − n − v]   (13)
                = ∑_m ∑_n w[m, n] x[(i − u) − m, (j − v) − n]   (14)
                = f(x)[i − u, j − v] = g(f(x))[i, j]   (15)

Thus convolution is equivariant to translation.
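The identity in (15) can also be checked numerically. The sketch below uses circular ("wrap") boundaries so the translation is exact at the edges; the input x and kernel w are random placeholders.

```python
import numpy as np
from scipy.ndimage import convolve

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))   # input image (placeholder)
w = rng.standard_normal((3, 3))   # convolution kernel (placeholder)
u, v = 2, 3                       # translation offsets

# g(x)[i, j] = x[i - u, j - v], implemented as a circular shift
g = lambda a: np.roll(a, shift=(u, v), axis=(0, 1))
# f(x) = w convolved with x, with wrap-around boundaries so the check is exact
f = lambda a: convolve(a, w, mode='wrap')

print(np.allclose(f(g(x)), g(f(x))))  # True: shift-then-convolve == convolve-then-shift
```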

f(x)[i, j] = ∑_m ∑_n w[m, n] x[i − m, j − n]   (16)

s(x)[i, j] = x[u·i, v·j]   (17)

⇒ f(s(x))[i, j] = ∑_m ∑_n w[m, n] s(x)[i − m, j − n]   (18)
                = ∑_m ∑_n w[m, n] x[u·(i − m), v·(j − n)]   (19)
                ≠ ∑_m ∑_n w[m, n] x[u·i − m, v·j − n]   (20)
                = f(x)[u·i, v·j] = s(f(x))[i, j]   (21)

Thus convolution is not equivariant to downsampling.
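The inequality in (20) also shows up numerically: downsampling then convolving differs from convolving then downsampling. Same placeholder x and w setup as before.

```python
import numpy as np
from scipy.ndimage import convolve

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
w = rng.standard_normal((3, 3))
u = v = 2  # downsampling factors

f = lambda a: convolve(a, w, mode='wrap')
s = lambda a: a[::u, ::v]             # s(x)[i, j] = x[u*i, v*j]

print(np.allclose(f(s(x)), s(f(x))))  # False: the two orders disagree in general
```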

4 Problem 4
The output of a stride-k convolution can be regarded as the output of k-step pooling applied to the output of a stride-1 convolution.
When the convolution stride is k,
netj = (x ⊗ w)[j·k] = ∑_u w[u] x[j·k − u]   (22)

⇒ ∂L/∂w[m] = ∑_j (∂L/∂netj) · (∂netj/∂w[m]) = −∑_j δj x[j·k − m]   (23)

Here δj = −∂L/∂netj is the j-th element of the sensitivity map.
For the 2D case,

neti,j = (x ⊗ w)[i·k, j·k] = ∑_{u,v} w[u, v] x[i·k − u, j·k − v]   (25)

⇒ ∂L/∂w[m, n] = ∑_{i,j} (∂L/∂neti,j) · (∂neti,j/∂w[m, n]) = −∑_{i,j} δi,j x[i·k − m, j·k − n]   (26)

Here δi,j = −∂L/∂neti,j is the (i, j)-th element of the sensitivity map.
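Equation (23) can be verified against finite differences. The sketch below picks an arbitrary stride, kernel size, and squared-error loss (all assumptions, not given in the problem), restricting j to positions where every x index is valid.

```python
import numpy as np

rng = np.random.default_rng(0)
k, U, N = 2, 3, 12                 # stride, kernel size, input length (assumed)
x = rng.standard_normal(N)
w = rng.standard_normal(U)

def forward(w):
    # net_j = sum_u w[u] * x[j*k - u], keeping only output positions
    # where every index j*k - u falls inside x
    js = [j for j in range(N) if j * k - (U - 1) >= 0 and j * k < N]
    return js, np.array([sum(w[u] * x[j * k - u] for u in range(U)) for j in js])

js, net = forward(w)
t = rng.standard_normal(len(net))  # arbitrary targets
delta = t - net                    # delta_j = -dL/dnet_j for L = 0.5*sum (t_j - net_j)^2

# Analytic gradient from eq. (23): dL/dw[m] = -sum_j delta_j * x[j*k - m]
grad = np.array([-sum(d * x[j * k - m] for d, j in zip(delta, js)) for m in range(U)])

# Finite-difference check of each component
L = lambda ww: 0.5 * np.sum((forward(ww)[1] - t) ** 2)
eps = 1e-6
for m in range(U):
    wp, wm = w.copy(), w.copy()
    wp[m] += eps
    wm[m] -= eps
    print(m, np.isclose(grad[m], (L(wp) - L(wm)) / (2 * eps)))  # all True
```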

5 Problem 5
5.1 Squared Error Case

∂zk/∂netk = e^{−netk} / (1 + e^{−netk})² = 1/(1 + e^{−netk}) − 1/(1 + e^{−netk})² = zk (1 − zk)   (27)

⇒ δk = −∂J/∂netk = −(∂J/∂zk) · (∂zk/∂netk)   (28)
     = (tk − zk) · (∂zk/∂netk) = (tk − zk) zk (1 − zk)   (29)
When zk is near 0 or 1, δk will be close to 0 even if |tk − zk| is large. When this happens, the network barely updates its parameters and the prediction remains far from the target.

5.2 Cross Entropy Case

∂zk/∂netk = (e^{netk} ∑_{k'} e^{netk'} − e^{netk} e^{netk}) / (∑_{k'} e^{netk'})² = zk (1 − zk)   (30)

∂zk''/∂netk = −e^{netk} e^{netk''} / (∑_{k'} e^{netk'})² = −zk zk''   (31)

⇒ δk = −∂J/∂netk = −∑_{k'} (∂J/∂zk') · (∂zk'/∂netk)   (32)
     = ∑_{k'} (tk'/zk') · (∂zk'/∂netk) = tk (1 − zk) − ∑_{k''≠k} tk'' zk   (33)
     = tk − tk zk − zk (1 − tk) = tk − zk   (34)

Because the nonlinear activation function is softmax, the predictions sum to 1. If the prediction error is large, the prediction for the element whose target value is 1 will be far from 1, so δk = tk − zk remains large and learning does not stall.
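A quick numerical comparison of the two sensitivities at a saturated, badly wrong output (the pre-activation value is an arbitrary choice; for two classes the softmax in (34) reduces to the sigmoid, so the same z can be used in both formulas):

```python
import numpy as np

net = -8.0                       # strongly negative pre-activation (arbitrary)
z = 1.0 / (1.0 + np.exp(-net))   # sigmoid output, close to 0
t = 1.0                          # target is 1, so the prediction is badly wrong

delta_sq = (t - z) * z * (1 - z) # eq. (29): vanishes as z saturates
delta_ce = t - z                 # eq. (34): stays close to 1

print(f"squared error delta: {delta_sq:.1e}")  # ~3.4e-04, almost no gradient
print(f"cross entropy delta: {delta_ce:.1e}")  # ~1.0e+00, strong gradient
```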
