hw8

Mathematical Foundations of Deep Neural Networks, M1407.001200
E. Ryu
Spring 2024
Due 5pm, Monday, May 06, 2024

Problem 1: Transpose of downsampling. Consider the downsampling operator T : R^{m×n} → R^{(m/2)×(n/2)}, defined as average pooling with a 2 × 2 kernel and stride 2. For the sake of simplicity, assume m and n are even. Describe the action of T^⊤. More specifically, describe how to compute T^⊤(Y) for any Y ∈ R^{(m/2)×(n/2)}.

Clarification. The downsampling operator T is a linear operator (why?). Therefore, T has a matrix representation A ∈ R^{(mn/4)×(mn)} such that

T(X) = (A(X.reshape(mn))).reshape(m/2, n/2)

for all X ∈ R^{m×n}. The adjoint T^⊤ has two equivalent definitions. One definition is

T^⊤(Y) = (A^⊤(Y.reshape(mn/4))).reshape(m, n)

for all Y ∈ R^{(m/2)×(n/2)}. Another is

\[
\sum_{i=1}^{m/2} \sum_{j=1}^{n/2} Y_{ij}\,(T(X))_{ij} \;=\; \sum_{i=1}^{m} \sum_{j=1}^{n} (T^\top(Y))_{ij}\,(X)_{ij}
\]

for all X ∈ R^{m×n} and Y ∈ R^{(m/2)×(n/2)}.


Hint. To spoil the suspense, T^⊤ is a constant times nearest neighbor upsampling. Explain why in your answer.
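The following is a small numerical sanity check of the adjoint identity above (a sketch assuming PyTorch, not required for the submission); it obtains T^⊤(Y) as a vector-Jacobian product via autograd rather than from any explicit formula, so it does not give away the answer.

import torch
import torch.nn.functional as F

# Sanity check of <Y, T(X)> = <T^T(Y), X> for T = 2x2 average pooling, stride 2.
m, n = 6, 8
X = torch.randn(m, n, requires_grad=True)
Y = torch.randn(m // 2, n // 2)

TX = F.avg_pool2d(X[None, None], kernel_size=2, stride=2)[0, 0]   # T(X)

# Autograd returns d<Y, T(X)>/dX, which equals T^T(Y) since T is linear.
lhs = (Y * TX).sum()
lhs.backward()
T_adj_Y = X.grad                                                   # this is T^T(Y)
rhs = (T_adj_Y * X.detach()).sum()

print(torch.allclose(lhs, rhs))                                    # True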

Problem 2: Nearest neighbor upsampling. How is the nearest neighbor upsampling operator
an instance of transpose convolution? Specifically, describe how
layer = nn.Upsample(scale_factor=r, mode='nearest')

where r is a positive integer, can be equivalently represented by


layer = nn.ConvTranspose2d(...)
layer.weight.data = ...

with ... appropriately filled in.
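One way to check a candidate answer is sketched below, under the assumption of a single input channel, r = 2, and the natural choice kernel_size=r, stride=r; the weight assignment is deliberately left as ... since filling it in is the point of the problem.

import torch
import torch.nn as nn

r = 2                                       # assumed scale factor for this sketch
up = nn.Upsample(scale_factor=r, mode='nearest')
ct = nn.ConvTranspose2d(in_channels=1, out_channels=1, kernel_size=r,
                        stride=r, bias=False)
# ct.weight.data = ...                      # to be filled in (the answer to the problem)

x = torch.randn(1, 1, 5, 7)                 # a batch of one single-channel image
print(torch.allclose(up(x), ct(x)))         # should print True once the weight is set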

Problem 3: f-divergence. Let X and Y be two continuous random variables with densities pX
and pY . The f -divergence of X from Y is defined as
\[
D_f(X \,\|\, Y) = \int f\!\left(\frac{p_X(x)}{p_Y(x)}\right) p_Y(x)\, dx,
\]

where f is a convex function such that f (1) = 0.

(a) Show that Df (X∥Y ) ≥ 0.

(b) Show that f(t) = − log t and f(t) = t log t correspond to the KL divergence.
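As a concrete instance of the definition (not part of what is asked), taking f(t) = (t − 1)^2, which is convex with f(1) = 0, gives the chi-squared divergence:

\[
D_f(X \,\|\, Y) = \int \left( \frac{p_X(x)}{p_Y(x)} - 1 \right)^2 p_Y(x)\, dx
= \int \frac{\bigl(p_X(x) - p_Y(x)\bigr)^2}{p_Y(x)}\, dx .
\]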

Problem 4: Generalized inverse transform sampling. Let F : R → [0, 1] be the CDF of a random variable and let U ∼ Uniform([0, 1]). If F is continuous and strictly increasing, and therefore invertible, then F^{−1}(U) is a random variable with CDF F, because

P(F^{−1}(U) ≤ t) = P(U ≤ F(t)) = F(t).

When F is not necessarily invertible, the generalized inverse of F is G : (0, 1) → R with

G(u) = inf{x ∈ R | u ≤ F (x)}.

Show that G(U ) is a random variable with CDF F .

Hint. Use the fact that F is right-continuous, i.e., lim_{h→0+} F(x + h) = F(x) for all x ∈ R, and that lim_{x→−∞} F(x) = 0.
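As a numerical illustration of the generalized inverse (a sketch assuming NumPy, not a proof), the code below samples from a discrete distribution whose step CDF is not invertible and checks that the empirical frequencies come out right.

import numpy as np

rng = np.random.default_rng(0)
support = np.array([0, 1, 2])
probs = np.array([0.2, 0.5, 0.3])
F_vals = np.cumsum(probs)                      # F on the support: [0.2, 0.7, 1.0]

u = rng.uniform(size=100_000)
# searchsorted with side='left' returns the smallest index k with u <= F_vals[k],
# which is exactly the generalized inverse G(u) for this step CDF.
samples = support[np.searchsorted(F_vals, u, side='left')]

print(np.bincount(samples) / len(samples))     # approximately [0.2, 0.5, 0.3]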

Problem 5: Change of variables formula for Gaussians. If φ : R^n → R^n is a one-to-one differentiable function, Y = φ(X), and Y is a continuous random variable with density function p_Y, then X is a continuous random variable with density function

\[
p_X(x) = p_Y(\varphi(x)) \left| \det \frac{\partial \varphi}{\partial x}(x) \right|.
\]

Let Y ∈ R^n be a continuous random vector with density

\[
p_Y(y) = \frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2}\|y\|^2},
\]

i.e., Y ∼ N(0, I). Let X = AY + b with an invertible matrix A ∈ R^{n×n} and a vector b ∈ R^n. Define Σ = AA^⊤. Show that X is a continuous random vector with density

\[
p_X(x) = \frac{1}{\sqrt{(2\pi)^n \det \Sigma}}\, e^{-\frac{1}{2}(x-b)^\top \Sigma^{-1}(x-b)}.
\]
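A numerical spot check of the claimed formula (a sketch assuming NumPy, not a derivation): with φ(x) = A^{−1}(x − b), the change of variables formula gives p_X(x) = p_Y(φ(x)) |det A^{−1}|, which should agree with the stated Gaussian density at any test point.

import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.normal(size=(n, n))                    # generic, hence invertible
b = rng.normal(size=n)
Sigma = A @ A.T
x = rng.normal(size=n)                         # an arbitrary test point

def p_Y(y):
    # standard normal density on R^n
    return np.exp(-0.5 * y @ y) / (2 * np.pi) ** (n / 2)

# Change-of-variables side: p_Y(phi(x)) * |det A^{-1}| with phi(x) = A^{-1}(x - b).
lhs = p_Y(np.linalg.solve(A, x - b)) / abs(np.linalg.det(A))
# Claimed closed form with Sigma = A A^T.
rhs = np.exp(-0.5 * (x - b) @ np.linalg.solve(Sigma, x - b)) \
      / np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigma))

print(np.isclose(lhs, rhs))                    # True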

Problem 6: Inverse permutation. Let Sn denote the group of length-n permutations and let σ ∈ Sn. Note that the map i ↦ σ(i) is a bijection. Define σ^{−1} ∈ Sn as the permutation representing the inverse of this map, i.e., σ^{−1}(σ(i)) = i for i = 1, . . . , n. Describe an algorithm for computing σ^{−1} given σ.

Clarification. In this class, we defined σ as a list of length n containing the elements of {1, . . . , n} exactly once. The output of the algorithm, σ^{−1}, should also be provided as a list.
Clarification. For this problem, it is sufficient to describe the algorithm in equations or pseudocode. There is no need to submit a Python script for this problem.
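Although no script is required, the list convention itself can be illustrated as follows (an example of the representation with an arbitrary σ, not a solution to the problem).

# sigma is a length-n list containing each of 1, ..., n exactly once,
# and sigma[i - 1] is sigma(i) in the 1-indexed notation of the problem.
sigma = [3, 1, 4, 2]
n = len(sigma)
print(sorted(sigma) == list(range(1, n + 1)))   # True: sigma is a valid permutation
print(sigma[2 - 1])                             # sigma(2) = 1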

Problem 7: Permutation matrix. Given a permutation σ ∈ Sn, the permutation matrix of σ is defined as

\[
P_\sigma = \begin{bmatrix} e_{\sigma(1)}^\top \\ e_{\sigma(2)}^\top \\ \vdots \\ e_{\sigma(n)}^\top \end{bmatrix} \in \mathbb{R}^{n \times n},
\]
where e_1, . . . , e_n ∈ R^n are the standard unit vectors. Show

(a) (P_σ x)_i = x_{σ(i)} for all x ∈ R^n and i = 1, . . . , n,

(b) P_σ^⊤ = P_σ^{−1} = P_{σ^{−1}}, and

(c) |det P_σ| = 1.

Hint. If the rows of U ∈ R^{n×n} are orthonormal, we say U is an orthogonal matrix. Orthogonal matrices satisfy UU^⊤ = U^⊤U = I.
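A numerical illustration of the definition and of claims (a), (b), and (c) for one arbitrary permutation, under the 1-indexed list convention above (a sanity check, not a proof):

import numpy as np

sigma = [3, 1, 4, 2]                            # sigma(i), 1-indexed
n = len(sigma)
I = np.eye(n)
P = np.array([I[s - 1] for s in sigma])         # row i of P is e_{sigma(i)}^T

x = np.arange(10.0, 10.0 + n)
print(np.allclose(P @ x, x[np.array(sigma) - 1]))           # (a): (P x)_i = x_{sigma(i)}
print(np.allclose(P.T @ P, I) and np.allclose(P @ P.T, I))  # (b): P^T = P^{-1}
print(np.isclose(abs(np.linalg.det(P)), 1.0))               # (c): |det P| = 1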
