Midtermsols Sp2010
L. El Ghaoui
Midterm Solutions
1. Consider the set
P := {x ∈ R3 : x1 + 2x2 + 3x3 = 1}.
(a) Show that the set P is an affine subspace of dimension 2. To this end, express it
as x0 + span(x1 , x2 ), where x0 ∈ P, and x1 , x2 are independent vectors.
(b) Find the minimum Euclidean distance from 0 to the set P. Find a point that
achieves the minimum distance. (Hint: either apply a formula if you know it, or
prove that the minimum-distance point is proportional to the vector a := (1, 2, 3).)
Solutions:
(a) A point x belongs to P if and only if x1 = 1 − 2x2 − 3x3 . Hence every x ∈ P can be written as
x = x0 + λx1 + µx2 , with x0 := (1, 0, 0) ∈ P, x1 := (−2, 1, 0), x2 := (−3, 0, 1).
We check that the two vectors x1 , x2 are indeed independent: λx1 + µx2 = 0, read on the last two components, implies λ = µ = 0. Thus P = x0 + span(x1 , x2 ) is an affine subspace of dimension 2.
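As a quick numerical sanity check (a sketch, not part of the original solutions), one can verify that every point of the form x0 + λx1 + µx2 satisfies the defining equation of P, and that the two spanning vectors are independent:

```python
import numpy as np

# Particular point and spanning vectors found in part (a).
x0 = np.array([1.0, 0.0, 0.0])
x1 = np.array([-2.0, 1.0, 0.0])
x2 = np.array([-3.0, 0.0, 1.0])
a = np.array([1.0, 2.0, 3.0])

# Any affine combination x0 + lam*x1 + mu*x2 must satisfy a^T x = 1.
rng = np.random.default_rng(0)
ok = True
for _ in range(100):
    lam, mu = rng.standard_normal(2)
    x = x0 + lam * x1 + mu * x2
    ok = ok and abs(a @ x - 1.0) < 1e-9

# Independence: the 3x2 matrix [x1 x2] has rank 2.
rank = np.linalg.matrix_rank(np.column_stack([x1, x2]))
```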
(b) The minimum distance from 0 to the affine set {x : Ax = b}, with b ∈ Rm , and A ∈ Rm×n full row rank (that is, AAT is positive-definite), is achieved at the point
x∗ = AT (AAT )−1 b.
Applying this formula to A = aT , b = 1, yields
x∗ = a/(aT a) = (1/14) (1, 2, 3).
The minimum distance is ‖x∗‖₂ = 1/√(aT a) = 1/√14.
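This closed-form answer is easy to confirm numerically (a sketch, not part of the original solutions): the claimed minimizer is feasible, and moving within P along the spanning directions of part (a) can only increase the norm.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
x_star = a / (a @ a)            # claimed minimizer, (1/14)*(1, 2, 3)
dist = np.linalg.norm(x_star)   # claimed minimum distance, 1/sqrt(14)

# x_star lies on P ...
feasible = abs(a @ x_star - 1.0) < 1e-12

# ... and perturbing within P (directions orthogonal to a) only increases the norm.
worse = True
for z in [np.array([-2.0, 1.0, 0.0]), np.array([-3.0, 0.0, 1.0])]:
    for eps in (-0.1, 0.1):
        worse = worse and np.linalg.norm(x_star + eps * z) >= dist
```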
Alternatively, we notice that any vector x ∈ R3 can be decomposed as x = ta + z, with t ∈ R and z ∈ R3 such that z T a = 0. The condition x ∈ P then implies t = 1/aT a. Since ‖x‖₂² = t² aT a + z T z = (1/aT a) + z T z, the minimum Euclidean distance problem reads
min_{x∈P} ‖x‖₂ = min_z { √( (1/aT a) + z T z ) : aT z = 0 },
whose optimum is attained at z = 0, recovering x∗ = a/(aT a) and the minimum distance 1/√(aT a) = 1/√14.
2. (8 points) Consider the operation of finding the point symmetric to a given point about
a given line L in Rn .
[Figure 1: a point x and its symmetric about the line through x0 .]
We define the line as L := {x0 + tu, t ∈ R}, where x0 is a point on the line, and u its
direction, which we assume is normalized: ‖u‖₂ = 1. For a given point x ∈ Rn , we
denote by f (x) ∈ Rn the point that is symmetric to x about the line (see Fig. 1).
That is, f (x) = 2p(x) − x, where p(x) is the projection of x on the line.
(a) Show that the mapping f is affine. Describe it in terms of an n × n matrix A and
an n × 1 vector b, such that f (x) = Ax + b for every x. (It will be useful to use the
notation P := uuT .)
(b) What is the geometric interpretation of the vector b?
(c) Show that the mapping f is linear if and only if the line passes through 0.
(d) Show that f (f (x)) = x for every x. What is the geometric meaning of this
property?
(e) What are the range and nullspace of the matrix A? What is the rank of A? Is A
invertible?
(f) Show that A is symmetric, and find its eigenvalue decomposition (EVD). Hint: define
u2 , . . . , un to be an orthonormal basis for the subspace orthogonal to u, and show
that the orthogonal matrix U := [u, u2 , . . . , un ] contains eigenvectors of A.
(g) Find a singular value decomposition (SVD) of A. What is the relationship between
the EVD of A and its SVD?
(h) Assume that the input is bounded: ‖x‖₂ ≤ 1. Find a bound on the Euclidean
norm of the output f (x). Find an input x that achieves the bound.
Solutions:
(a) The projection of x on the line is p(x) = x0 + P (x − x0 ), with P := uuT . Hence
f (x) = 2p(x) − x = (2P − I)(x − x0 ) + x0 = Ax + b,
where A = 2P − I, b = 2(I − P )x0 .
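A small numerical sketch (with an arbitrarily chosen line; not part of the original solutions) confirms that A = 2P − I and b = 2(I − P )x0 reproduce the definition f (x) = 2p(x) − x:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
u = rng.standard_normal(n)
u /= np.linalg.norm(u)            # unit direction of the line
x0 = rng.standard_normal(n)       # a point on the line

P = np.outer(u, u)                # P := u u^T, projection on span(u)
A = 2 * P - np.eye(n)
b = 2 * (np.eye(n) - P) @ x0

x = rng.standard_normal(n)
p_x = x0 + P @ (x - x0)           # projection of x on the line
f_direct = 2 * p_x - x            # definition of the symmetric point
f_affine = A @ x + b              # affine form derived in (a)
err = np.linalg.norm(f_direct - f_affine)
```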
(b) Since f (0) = b, the latter is simply the point symmetric to the origin about the line.
(c) The mapping is linear if and only if b = 0, that is, when x0 satisfies P x0 = x0 . In
that case x0 = uuT x0 = (uT x0 )u is proportional to u, and the line goes through 0,
since 0 = x0 + tu with t = −(uT x0 ). Conversely, if the line passes through 0, then
x0 = tu for some t ∈ R, hence P x0 = x0 and b = 0.
(d) We have P u = u. Further, P 2 = (uuT )(uuT ) = (uT u)uuT = uuT = P . The latter
implies
A2 = (2P − I)(2P − I) = 4P 2 − 4P + I = I.
In addition, P b = 2P (I − P )x0 = 0, so that Ab = (2P − I)b = −b. It follows that
f (f (x)) = A(Ax + b) + b = A2 x + Ab + b = x − b + b = x.
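The three identities used above (A² = I, Ab = −b, and the involution f (f (x)) = x) can be checked numerically on a randomly chosen line (a sketch, not part of the original solutions):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
x0 = rng.standard_normal(n)

P = np.outer(u, u)
A = 2 * P - np.eye(n)
b = 2 * (np.eye(n) - P) @ x0
f = lambda x: A @ x + b

x = rng.standard_normal(n)
A2_err = np.linalg.norm(A @ A - np.eye(n))   # A^2 = I
Ab_err = np.linalg.norm(A @ b + b)           # A b = -b
inv_err = np.linalg.norm(f(f(x)) - x)        # f(f(x)) = x
```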
The geometric meaning is simply that the symmetric of the symmetric point is the point itself.
(e) The nullspace of A is the set of vectors x with Ax = 0, that is, 2P x = x. Any such
x = 2(uT x)u is proportional to u; but Au = u ≠ 0, so Ax = 0 forces uT x = 0, hence
x = 2(uT x)u = 0. We conclude that the nullspace is {0}, the range is Rn , and A is
full rank, hence invertible since it is also square. (This also follows from A2 = I,
which shows that A is its own inverse.)
(f) Since A = 2uuT − I, it is symmetric.
Let u2 , . . . , un be an orthonormal basis for the subspace orthogonal to u; we have
uTi u = 0, i = 2, . . . , n. We have Au = u and Aui = −ui , i = 2, . . . , n. Hence the
vector u is an eigenvector associated with the eigenvalue 1, and the ui , i = 2, . . . , n,
are eigenvectors all associated with the eigenvalue −1. Writing these conditions
compactly as AU = U Λ, with U = [u, u2 , . . . , un ] an orthogonal matrix and
Λ = diag(1, −1, . . . , −1), we obtain that A admits the symmetric eigenvalue de-
composition A = U ΛU T .
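The claimed spectrum, eigenvalue 1 with eigenvector u and eigenvalue −1 with multiplicity n − 1, can be verified numerically (a sketch with a random unit direction, not part of the original solutions):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
A = 2 * np.outer(u, u) - np.eye(n)

# Symmetric EVD; eigenvalues should be -1 (n-1 times) and +1 (once).
eigvals, eigvecs = np.linalg.eigh(A)
spectrum = [int(round(v)) for v in eigvals]   # ascending order

# The eigenvector for eigenvalue 1 is u, up to sign.
v = eigvecs[:, -1]
align = abs(v @ u)   # should equal 1
```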
(g) We have P ui = (uT ui )u = 0, i = 2, . . . , n. With U := [u, u2 , . . . , un ], we get
P U = [P u, P u2 , . . . , P un ] = [u, 0, . . . , 0], therefore
AU = [Au, Au2 , . . . , Aun ] = [u, −u2 , . . . , −un ] =: V.
Both U, V are orthogonal matrices. Post-multiplying the above relation by U T =
U −1 , we obtain A = V U T , which is the SVD of A, with V the left singular vectors,
and U the right singular vectors. Note that every singular value of A is one, which
is consistent with A2 = AAT = AT A = I.
The relationship with the SVD is simply that the eigenvectors u, u2 , . . . , un are
the right singular vectors as well. Flipping the signs on the last n − 1 eigenvectors
provides the left singular vectors.
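That all singular values equal one (equivalently, that A is orthogonal) is also quick to confirm numerically (a sketch, not part of the original solutions):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
A = 2 * np.outer(u, u) - np.eye(n)

s = np.linalg.svd(A, compute_uv=False)           # singular values of A
sv_err = np.max(np.abs(s - 1.0))                 # all equal to one
orth_err = np.linalg.norm(A.T @ A - np.eye(n))   # A^T A = I
```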
(h) We want to solve
max { ‖Ax + b‖₂ : ‖x‖₂ ≤ 1 }.
Since A is orthogonal, ‖Ax‖₂ = ‖x‖₂ ≤ 1, so the triangle inequality gives
‖Ax + b‖₂ ≤ ‖Ax‖₂ + ‖b‖₂ ≤ 1 + ‖b‖₂ . If b ≠ 0, the bound is attained at
x = −b/‖b‖₂ : since Ab = −b, we get Ax = b/‖b‖₂ , so that Ax + b = (1 + ‖b‖₂ )(b/‖b‖₂ )
has norm 1 + ‖b‖₂ . (If b = 0, any unit-norm x attains the bound, equal to 1.)
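A numerical sketch (with an arbitrarily chosen line; not part of the original solutions) confirms that the optimal value of this maximization is 1 + ‖b‖₂ , attained at x = −b/‖b‖₂ :

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
x0 = rng.standard_normal(n)

P = np.outer(u, u)
A = 2 * P - np.eye(n)
b = 2 * (np.eye(n) - P) @ x0       # nonzero for a generic x0

bound = 1 + np.linalg.norm(b)

# Random unit-norm inputs never exceed the bound ...
vals = []
for _ in range(1000):
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    vals.append(np.linalg.norm(A @ x + b))
max_sampled = max(vals)

# ... and x = -b/||b||_2 attains it (A b = -b, so A x = b/||b||_2).
x_opt = -b / np.linalg.norm(b)
attained = np.linalg.norm(A @ x_opt + b)
```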
3. Consider data points xi ∈ Rn , i = 1, . . . , m, and, for a unit-norm vector w ∈ Rn ,
the line through the origin L(w) := {tw : t ∈ R}. We then consider the projection of
the points xi , i = 1, . . . , m, on the line L(w), and look at the associated coordinates
of the points on the line. These projected values are given by
ti (w) := arg min_t ‖tw − xi ‖₂ , i = 1, . . . , m.
We assume that for any w, the empirical average t̂(w) of the projected values ti (w),
i = 1, . . . , m, and their empirical variance σ 2 (w), are both constant, independent of
the direction w (with ‖w‖₂ = 1). Denote by t̂ and σ 2 the (constant) empirical average
and variance. Justify your answers to the following as carefully as you can.
(a) Show that ti (w) = wT xi , i = 1, . . . , m.
(b) Show that the empirical average x̂ of the data points xi , i = 1, . . . , m, is zero.
(c) Show that the empirical covariance matrix of the data points,
Σ := (1/m) ∑_{i=1}^m (xi − x̂)(xi − x̂)T ,
is of the form σ 2 · I, where I is the identity matrix of order n. (Hint: the largest
eigenvalue λmax of the matrix Σ can be written as λmax = max_w {wT Σw : wT w =
1}, and a similar expression holds for the smallest eigenvalue.)
Solutions:
(a) Let us drop the index i for a moment, and solve the least-squares problem with
variable t ∈ R:
p∗ := min_t ‖tw − x‖₂² .
One can apply the closed-form solution of the least-squares problem, in which the ma-
trix involved is the full column-rank matrix w. This leads to t(w) = (wT w)−1 wT x =
wT x. (Recall that ‖w‖₂ = 1.)
Alternatively, we can solve the above problem directly, exploiting again ‖w‖₂ = 1:
p∗ = min_t { t² − 2(wT x)t + ‖x‖₂² } = min_t { (t − wT x)² + C }, with C := ‖x‖₂² − (wT x)².
The quantity C is constant (independent of the variable t). The first term in
the objective function above is non-negative, hence p∗ ≥ C. This lower bound is
attained with t = wT x.
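The closed form t(w) = wT x can be checked against a generic least-squares solver (a sketch, not part of the original solutions):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5
w = rng.standard_normal(n)
w /= np.linalg.norm(w)          # unit-norm direction
x = rng.standard_normal(n)

t_closed = w @ x                # claimed minimizer of ||t w - x||_2

# Compare with the generic least-squares solution for the n x 1 "matrix" w.
t_lstsq = np.linalg.lstsq(w.reshape(-1, 1), x, rcond=None)[0][0]
err = abs(t_closed - t_lstsq)
```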
(b) The empirical average of the numbers ti (w), i = 1, . . . , m, is
t̂(w) = (1/m) ∑_{i=1}^m ti (w) = (1/m) ∑_{i=1}^m wT xi = wT x̂,
where x̂ is the empirical average of the data points. We obtain that there is a
constant α ∈ R such that
∀ w, ‖w‖₂ = 1 : wT x̂ = α.
Expressing the condition above for both w and −w, we obtain that α = 0. This
means that x̂ is orthogonal to any (unit-norm) vector, hence it is zero.
(c) The empirical variance of the numbers ti (w), i = 1, . . . , m, is given by
σ²(w) = (1/m) ∑_{i=1}^m (ti (w) − t̂(w))² = (1/m) ∑_{i=1}^m (wT (xi − x̂))² = wT Σw,
where Σ is the empirical covariance matrix of the data points. The property of
constant variance is thus equivalent to the fact that the quadratic form w →
wT Σw is a constant function on the unit sphere {w : ‖w‖₂ = 1}. We have denoted
by σ² this constant.
The largest and smallest eigenvalues of Σ admit the variational representations
λmax (Σ) = max_w {wT Σw : ‖w‖₂ = 1}, λmin (Σ) = min_w {wT Σw : ‖w‖₂ = 1}.
Since the objective function of these problems is the same constant function, we
obtain λmax (Σ) = λmin (Σ) = σ². Hence all eigenvalues of Σ are equal to σ²;
that is, the diagonal matrix of eigenvalues is Λ = σ² I. The eigenvalue
decomposition of Σ is of the form Σ = U T ΛU , with U an orthogonal matrix of
eigenvectors. Since Λ = σ² I, we obtain Σ = σ² U T U = σ² I, as claimed.
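The key identity underlying part (c), σ²(w) = wT Σw for every unit-norm w, can be verified on random data (a sketch, not part of the original solutions):

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 200, 3
X = rng.standard_normal((m, n))            # rows are the data points x_i
x_hat = X.mean(axis=0)                     # empirical average
Sigma = (X - x_hat).T @ (X - x_hat) / m    # empirical covariance (1/m convention)

# For a unit-norm direction w, the variance of the projected
# coordinates t_i(w) = w^T x_i equals w^T Sigma w.
w = rng.standard_normal(n)
w /= np.linalg.norm(w)
t = X @ w
var_t = np.mean((t - t.mean()) ** 2)
err = abs(var_t - w @ Sigma @ w)
```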