
EE127A 3/19/10

L. El Ghaoui

Midterm Solutions

1. (4 points) Consider the set in R^3 defined by the equation

P := {x ∈ R^3 : x_1 + 2x_2 + 3x_3 = 1}.


(a) Show that the set P is an affine subspace of dimension 2. To this end, express it
as x0 + span(x1 , x2 ), where x0 ∈ P, and x1 , x2 are independent vectors.
(b) Find the minimum Euclidean distance from 0 to the set P. Find a point that
achieves the minimum distance. (Hint: either apply a formula if you know it, or
prove that the minimum-distance point is proportional to the vector a := (1, 2, 3).)

Solutions:

(a) The affine subspace P has dimension 2 in R^3; it is a (hyper)plane. To show
this, we solve the equation for one of the variables, say x_1:

x_1 = 1 − 2x_2 − 3x_3 .

This shows that any vector x ∈ P can be expressed as

x = (1 − 2x_2 − 3x_3, x_2, x_3)^T = (1, 0, 0)^T + x_2 (−2, 1, 0)^T + x_3 (−3, 0, 1)^T ,

with x_2, x_3 free parameters. Thus, P = x_0 + span(x_1, x_2), with

x_0 = (1, 0, 0)^T ,   x_1 = (−2, 1, 0)^T ,   x_2 = (−3, 0, 1)^T .

We check that the two vectors x1 , x2 are indeed independent, since λx1 + µx2 = 0
implies λ = µ = 0.
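
For readers who want a quick numerical confirmation, here is a minimal NumPy sketch (an addition, not part of the original solution; the variables v1 and v2 stand for the solution's x_1 and x_2). It samples the free parameters and checks that every resulting point satisfies the defining equation of P.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])              # normal vector of the plane
x0 = np.array([1.0, 0.0, 0.0])             # x_0 in the solution
v1 = np.array([-2.0, 1.0, 0.0])            # x_1 in the solution
v2 = np.array([-3.0, 0.0, 1.0])            # x_2 in the solution

rng = np.random.default_rng(0)
for _ in range(5):
    s, t = rng.standard_normal(2)          # free parameters x_2, x_3
    x = x0 + s * v1 + t * v2
    assert np.isclose(a @ x, 1.0)          # every such point lies in P
print("all sampled points satisfy x_1 + 2 x_2 + 3 x_3 = 1")
```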
(b) The point of the affine set {x : Ax = b}, with b ∈ R^m and A ∈ R^{m×n} full row
rank (that is, AA^T is positive-definite), that is closest to the origin is given by the
formula x^* = A^T (AA^T)^{-1} b. Applying this formula with A = a^T, b = 1 yields

x^* = a / (a^T a) = (1/14) (1, 2, 3)^T .

The minimum distance is ‖x^*‖_2 = 1/√(a^T a) = 1/√14.
Alternatively, we notice that any vector x ∈ R^3 can be decomposed as x = ta + z,
with t ∈ R and z ∈ R^3, z^T a = 0. The condition x ∈ P then implies
t = 1/(a^T a). Since ‖x‖_2^2 = t^2 a^T a + z^T z = 1/(a^T a) + z^T z, the objective function of
the minimum Euclidean distance problem
min_{x ∈ P} ‖x‖_2 = min_z { √( 1/(a^T a) + z^T z ) : a^T z = 0 }

is minimal when z = 0. This shows that x∗ = a/(aT a), as claimed.
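
As a sanity check of part (b), the following hedged NumPy sketch (an addition, not part of the original solution) evaluates the closed-form point A^T (AA^T)^{-1} b for A = a^T, b = 1 and compares it with a/(a^T a) and the distance 1/√14.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
A = a.reshape(1, 3)                              # A = a^T, a 1 x 3 matrix
b = np.array([1.0])

x_star = A.T @ np.linalg.solve(A @ A.T, b)       # A^T (A A^T)^{-1} b
assert np.allclose(x_star, a / (a @ a))          # matches a / (a^T a)
print(np.linalg.norm(x_star), 1 / np.sqrt(14))   # both approximately 0.2673
```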

2. (8 points) Consider the operation of finding the point symmetric to a given point about
a given line L in Rn .

Figure 1: A point and its symmetric about the line L.

We define the line as L := {x_0 + tu : t ∈ R}, where x_0 is a point on the line and u its
direction, which we assume is normalized: ‖u‖_2 = 1. For a given point x ∈ R^n, we
denote by f(x) ∈ R^n the point that is symmetric to x about the line (see Figure 1).
That is, f(x) = 2p(x) − x, where p(x) is the projection of x on the line:

p(x) = arg min_{p ∈ L} ‖p − x‖_2 .

(a) Show that the mapping f is affine. Describe it in terms of an n × n matrix A and an
n × 1 vector b such that f(x) = Ax + b for every x. (It will be useful to use the
notation P := uu^T.)
(b) What is the geometric interpretation of the vector b?
(c) Show that the mapping f is linear if and only if the line passes through 0.
(d) Show that f (f (x)) = x for every x. What is the geometric meaning of this
property?

(e) What is the range and nullspace of the matrix A? What is the rank of A? Is A
invertible?
(f) Show that A is symmetric and find its eigenvalue decomposition (EVD). Hint: define
u2 , . . . , un to be an orthonormal basis for the subspace orthogonal to u, and show
that the orthogonal matrix U := [u, u2 , . . . , un ] contains eigenvectors of A.
(g) Find an SVD decomposition of A. What is the relationship between the EVD of
A with its SVD?
(h) Assume that the input is bounded: ‖x‖_2 ≤ 1. Find a bound on the Euclidean
norm of the output f (x). Find an input x that achieves the bound.

Solutions:

(a) The minimum of the function with values

h(t) = ‖tu + x_0 − x‖_2^2 = t^2 − 2t u^T(x − x_0) + ‖x − x_0‖_2^2 = (t − u^T(x − x_0))^2 + constant

is obtained with t(x) := u^T(x − x_0). Thus, we have

p(x) = t(x) u + x_0 = (u^T(x − x_0)) u + x_0 = P(x − x_0) + x_0 ,   P := uu^T .

Hence
f (x) = 2p(x) − x = (2P − I)(x − x0 ) + x0 = Ax + b,
where A = 2P − I, b = 2(I − P )x0 .
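
A minimal NumPy sketch (an illustration added here, not part of the original solution) builds A = 2P − I and b = 2(I − P)x_0 for a randomly chosen unit direction u and point x_0, and checks that Ax + b agrees with 2p(x) − x.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
u = rng.standard_normal(n)
u /= np.linalg.norm(u)                 # unit-norm direction of the line
x0 = rng.standard_normal(n)            # a point on the line

P = np.outer(u, u)                     # P = u u^T
A = 2 * P - np.eye(n)
b = 2 * (np.eye(n) - P) @ x0

x = rng.standard_normal(n)             # an arbitrary test point
p_x = P @ (x - x0) + x0                # projection of x on the line
assert np.allclose(A @ x + b, 2 * p_x - x)
```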
(b) Since f (0) = b, the latter is simply the symmetric to the origin about the line.
(c) The mapping is linear if and only if b = 0, that is, when x_0 satisfies P x_0 = x_0. Hence
x_0 = uu^T x_0 = (u^T x_0) u is proportional to u. In that case, the line goes through 0,
since 0 = x_0 + tu with t = −(u^T x_0). Conversely, if the line passes through 0, then
x_0 = −tu for some t, hence (I − P)x_0 = 0 and b = 0, so f is linear.
(d) We have P u = u. Further, P 2 = (uuT )(uuT ) = (uT u)uuT = uuT = P . The latter
implies
A2 = (2P − I)(2P − I) = 4P 2 − 4P + I = I.
In addition, P b = 2P (I − P )x0 = 0. We obtain Ab = (2P − I)b = −b. We thus
obtain that

f (f (x)) = A(Ax + b) + b = A2 x + Ab + b = x − b + b = x.

Geometrically, this simply says that the symmetric of the symmetric point is the original point.
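
The identities used in parts (c)-(d) are easy to confirm numerically; the short sketch below (again an addition, with the same random setup as in the previous sketch) checks A^2 = I, Ab = −b, and f(f(x)) = x.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
x0 = rng.standard_normal(n)
P = np.outer(u, u)
A, b = 2 * P - np.eye(n), 2 * (np.eye(n) - P) @ x0

assert np.allclose(A @ A, np.eye(n))         # A^2 = I
assert np.allclose(A @ b, -b)                # Ab = -b
x = rng.standard_normal(n)
assert np.allclose(A @ (A @ x + b) + b, x)   # f(f(x)) = x
```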
(e) The nullspace of A is the set of vectors with Ax = 0, meaning 2Px = x. Thus,
x = 2(u^T x) u is proportional to u. Since Ax = 0, but Au = u ≠ 0, we must have
u^T x = 0, hence x = 2(u^T x) u = 0. We conclude that the nullspace is {0}, the
range is R^n, and A is full rank, hence invertible since it is also square.

(f) Since A = 2uuT − I, it is symmetric.
Let u_2, . . . , u_n be an orthonormal basis for the subspace orthogonal to u; we have
u_i^T u = 0, i = 2, . . . , n. We have Au = u and Au_i = −u_i, i = 2, . . . , n. Hence the
vector u is an eigenvector associated with the eigenvalue 1, and the u_i, i = 2, . . . , n,
are eigenvectors all associated with the eigenvalue −1. Writing the previous con-
ditions compactly as AU = U Λ, with U = [u, u2 , . . . , un ] an orthogonal matrix,
Λ = diag(1, −1, . . . , −1), we obtain that A admits the symmetric eigenvalue de-
composition A = U ΛU T .
(g) We have P u_i = (u^T u_i) u = 0, i = 2, . . . , n. With U := [u, u_2, . . . , u_n], we get
P U = [P u, P u_2, . . . , P u_n] = [u, 0, . . . , 0], therefore
AU = [Au, Au_2, . . . , Au_n] = [u, −u_2, . . . , −u_n] =: V.
Both U, V are orthogonal matrices. Post-multiplying the above relation by U T =
U −1 , we obtain A = V U T , which is the SVD of A, with V the left singular vectors,
and U the right singular vectors. Note that every singular value of A is one, which
is consistent with A2 = AAT = AT A = I.
The relationship with the SVD is simply that the eigenvectors u, u2 , . . . , un are
the right singular vectors as well. Flipping the signs on the last n − 1 eigenvectors
provides the left singular vectors.
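
Parts (f)-(g) can likewise be checked with NumPy's eigh and svd routines; the sketch below (an addition, not part of the original solution) verifies the eigenvalues {+1, −1, . . . , −1}, the unit singular values, and A^T A = I.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
A = 2 * np.outer(u, u) - np.eye(n)

eigvals, eigvecs = np.linalg.eigh(A)              # A is symmetric, so eigh applies
assert np.allclose(eigvals, [-1] * (n - 1) + [1]) # ascending order: -1,...,-1, 1
assert np.allclose(np.abs(eigvecs[:, -1] @ u), 1) # eigenvector for +1 is u, up to sign

sing_vals = np.linalg.svd(A, compute_uv=False)
assert np.allclose(sing_vals, np.ones(n))         # every singular value is 1
assert np.allclose(A.T @ A, np.eye(n))            # consistent with A^2 = A^T A = I
```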
(h) We want to solve
max_{x : ‖x‖_2 ≤ 1} ‖Ax + b‖_2 .

Using the SVD of A = U V^T (equivalent to the form A = V U^T found in part (g), since A is symmetric), we reduce the problem to

max_{x̃ : ‖x̃‖_2 ≤ 1} ‖x̃ + U^T b‖_2 ,

where b̃ := U^T b and x̃ := V^T x; note that ‖x̃‖_2 = ‖x‖_2, since V is orthogonal. The
solution is obvious: simply choose a unit-norm vector in the same direction as U^T b:
x̃ = U^T b / ‖U^T b‖_2 = U^T b / ‖b‖_2 .
We obtain
x = V x̃ = A^T b / ‖b‖_2 = Ab / ‖b‖_2 = −b / ‖b‖_2 ,
where b = f(0) = 2(I − P)x_0 is the point symmetric to 0 about the line L. At this x
the bound is attained: f(x) = −Ab/‖b‖_2 + b = b/‖b‖_2 + b, so ‖f(x)‖_2 = 1 + ‖b‖_2, and
hence ‖f(x)‖_2 ≤ 1 + ‖b‖_2 for every ‖x‖_2 ≤ 1.
In other words, the worst-case input in the ball {x : ‖x‖_2 ≤ 1} is simply the one
that extends away from the line, in the direction opposite to the projection of 0
on the line.
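
A short numerical check of part (h) (an addition, not part of the original solution): the candidate x = −b/‖b‖_2 attains the value 1 + ‖b‖_2, and no randomly sampled unit-norm input exceeds it.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
x0 = rng.standard_normal(n)
P = np.outer(u, u)
A, b = 2 * P - np.eye(n), 2 * (np.eye(n) - P) @ x0

x_worst = -b / np.linalg.norm(b)                      # the claimed worst-case input
val = np.linalg.norm(A @ x_worst + b)
assert np.isclose(val, 1 + np.linalg.norm(b))         # the bound 1 + ||b||_2 is attained

for _ in range(1000):                                 # no random unit-norm input does better
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    assert np.linalg.norm(A @ x + b) <= val + 1e-9
```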
3. (6 points) We are given m points x_1, . . . , x_m in R^n. To a given normalized direction
w ∈ R^n (‖w‖_2 = 1), we associate the line with direction w passing through the origin,
L(w) = {tw : t ∈ R}.

We then consider the projection of the points xi , i = 1, . . . , m, on the line L(w), and
look at the associated coordinates of the points on the line. These projected values are
given by t_i(w) := arg min_t ‖tw − x_i‖_2 , i = 1, . . . , m.

We assume that for any w, the empirical average t̂(w) of the projected values t_i(w),
i = 1, . . . , m, and their empirical variance σ^2(w) are both constant, independent of
the direction w (with ‖w‖_2 = 1). Denote by t̂ and σ^2 the (constant) empirical average
and variance. Justify your answer to the following as carefully as you can.

(a) Show that t_i(w) = x_i^T w, i = 1, . . . , m.


(b) Show that the empirical average of the data points,

x̂ := (1/m) ∑_{i=1}^m x_i ,

is zero.
(c) Show that the empirical covariance matrix of the data points,

Σ := (1/m) ∑_{i=1}^m (x_i − x̂)(x_i − x̂)^T ,

is of the form σ^2 · I, where I is the identity matrix of order n. (Hint: the largest
eigenvalue λ_max of the matrix Σ can be written as λ_max = max_w {w^T Σ w : w^T w = 1},
and a similar expression holds for the smallest eigenvalue.)

Solutions:

(a) For a given i = 1, . . . , m, we have

t_i(w) = arg min_t ‖tw − x_i‖_2 .

Let us drop i for a moment, and solve the least-squares problem with variable
t ∈ R:
p^* := min_t ‖tw − x‖_2^2 .

One can apply the closed-form solution of the least-squares problem, in which the
matrix involved is the full column-rank matrix w. This leads to t(w) = (w^T w)^{-1} w^T x =
w^T x. (Recall that ‖w‖_2 = 1.)
Alternatively, we can solve the above problem directly, exploiting again ‖w‖_2 = 1:

p^* = min_t ( t^2 − 2(w^T x) t + ‖x‖_2^2 ) = min_t (t − w^T x)^2 + C ,  with C := ‖x‖_2^2 − (w^T x)^2 .

The quantity C is constant (independent of the variable t). The first term in
the objective function above is non-negative, hence p^* ≥ C. This lower bound is
attained with t = x^T w, which restores the claim t_i(w) = x_i^T w.
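
A quick grid-search check of part (a) (an illustrative addition, not part of the original solution): for a unit-norm w, the minimizer of ‖tw − x‖_2 over t coincides with w^T x.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3
w = rng.standard_normal(n)
w /= np.linalg.norm(w)                      # unit-norm direction
x = rng.standard_normal(n)                  # one data point

ts = np.linspace(-10.0, 10.0, 200001)       # dense grid of candidate t values
dists = np.linalg.norm(np.outer(ts, w) - x, axis=1)
t_grid = ts[np.argmin(dists)]
assert np.isclose(t_grid, w @ x, atol=1e-3) # minimizer agrees with w^T x
```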

(b) The empirical average of the numbers t_i(w), i = 1, . . . , m, is

t̂(w) = (1/m) ∑_{i=1}^m t_i(w) = (1/m) ∑_{i=1}^m w^T x_i = w^T x̂,

where x̂ is the empirical average of the data points. We obtain that there is a
constant α ∈ R such that

∀ w, ‖w‖_2 = 1 : w^T x̂ = α.

Expressing the condition above for both w and −w, we obtain α = −α, hence α = 0. This
means that x̂ is orthogonal to every (unit-norm) vector, hence it is zero.
(c) The empirical variance of the numbers t_i(w), i = 1, . . . , m, is given by

σ^2(w) = (1/m) ∑_{i=1}^m (t_i(w) − t̂(w))^2 .

Exploiting t_i(w) = w^T x_i, i = 1, . . . , m, and t̂(w) = w^T x̂ = 0, we obtain

σ^2(w) = (1/m) ∑_{i=1}^m (w^T x_i)^2 = (1/m) ∑_{i=1}^m (w^T x_i)(x_i^T w) = w^T ( (1/m) ∑_{i=1}^m x_i x_i^T ) w = w^T Σ w,

where Σ is the empirical covariance matrix of the data points (here x̂ = 0). The property of
constant variance is thus equivalent to the fact that the quadratic form w →
w^T Σ w is a constant function on the unit sphere {w : ‖w‖_2 = 1}. We have denoted
this constant by σ^2.
The largest and smallest eigenvalues of Σ admit the variational representation

λ_max(Σ) = max_{‖w‖_2 = 1} w^T Σ w ,   λ_min(Σ) = min_{‖w‖_2 = 1} w^T Σ w .

Since the objective function of these problems is the same constant function, we
obtain that λ_max(Σ) = λ_min(Σ) = σ^2. Hence all eigenvalues of Σ are equal to σ^2.
That is, the diagonal matrix of eigenvalues is Λ = σ^2 I. The eigenvalue
decomposition of Σ is of the form Σ = U^T Λ U, with U an orthogonal matrix of
eigenvectors. Since Λ = σ^2 I, we obtain Σ = σ^2 U^T U = σ^2 I as well.
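
To illustrate the equivalence in part (c) in the other direction, the sketch below (an addition, not part of the original solution; the construction of the data set is arbitrary) builds centered points whose empirical covariance is exactly σ^2 · I and checks that the empirical variance of the projected coordinates w^T x_i is the same for every unit-norm direction w.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n, sigma2 = 10000, 3, 2.0

# build centered data whose empirical covariance is exactly sigma2 * I
X = rng.standard_normal((m, n))
X -= X.mean(axis=0)                          # empirical mean is now 0
C = (X.T @ X) / m                            # raw empirical covariance
L = np.linalg.cholesky(np.linalg.inv(C))     # L L^T = C^{-1}
X = np.sqrt(sigma2) * X @ L                  # now (1/m) X^T X = sigma2 * I

for _ in range(5):
    w = rng.standard_normal(n)
    w /= np.linalg.norm(w)                   # random unit-norm direction
    t = X @ w                                # projected coordinates t_i(w)
    print(round(t.var(), 6))                 # always ~2.0, independent of w
```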
