Lecture 4: Orthogonality


November 5, 2010

INNER PRODUCT SPACES


RODICA D. COSTIN
Contents
1. Inner product
1.1. Inner product
1.2. Inner product spaces
2. Orthogonal bases
2.1. Orthogonal projections
2.2. Existence of orthogonal basis in finite dimension
2.3. Orthogonal subspaces, orthogonal complements
2.4. Splitting of linear transformations
2.5. The adjoint and the four fundamental subspaces of a matrix
3. Least squares approximations
3.1. The Cauchy-Schwarz inequality
3.2. More about orthogonal projections
3.3. Overdetermined systems: best fit solution
3.4. Another formula for orthogonal projections
4. Orthogonal and unitary matrices, QR factorization
4.1. Unitary and orthogonal matrices
4.2. Rectangular matrices with orthonormal columns. A simple formula for orthogonal projections
4.3. QR factorization
1. Inner product
1.1. Inner product.
1.1.1. Inner product on real spaces. Vectors in $\mathbb{R}^n$ have more properties than the ones listed in the definition of vector spaces: we can define their length, and the angle between two vectors.
Recall that two vectors are orthogonal if and only if their dot product is zero, and, more generally, the cosine of the angle between two unit vectors in $\mathbb{R}^3$ is their dot product. The notion of inner product extracts the essential properties of the dot product, while allowing it to be defined on quite general vector spaces. We will first define it for real vector spaces, and then we will formulate it for complex ones.
Definition 1. An inner product on a vector space $V$ over $F = \mathbb{R}$ is an operation which associates to two vectors $x, y \in V$ a scalar $\langle x, y \rangle \in \mathbb{R}$ that satisfies the following properties:
(i) it is positive definite: $\langle x, x \rangle \geq 0$, and $\langle x, x \rangle = 0$ if and only if $x = 0$,
(ii) it is symmetric: $\langle x, y \rangle = \langle y, x \rangle$,
(iii) it is linear in the second argument: $\langle x, y + z \rangle = \langle x, y \rangle + \langle x, z \rangle$ and $\langle x, cy \rangle = c\langle x, y \rangle$.
Note that by symmetry it follows that an inner product is linear in the first argument as well: $\langle x + z, y \rangle = \langle x, y \rangle + \langle z, y \rangle$ and $\langle cx, y \rangle = c\langle x, y \rangle$.
Example 1. The dot product in $\mathbb{R}^2$ or $\mathbb{R}^3$ is clearly an inner product: if $x = (x_1, x_2, x_3)$ and $y = (y_1, y_2, y_3)$ then define
$$\langle x, y \rangle = x \cdot y = x_1 y_1 + x_2 y_2 + x_3 y_3$$
Example 2. More generally, an inner product on $\mathbb{R}^n$ is
(1) $\langle x, y \rangle = x \cdot y = x_1 y_1 + \ldots + x_n y_n$
Example 3. Here is another inner product on $\mathbb{R}^3$:
$$\langle x, y \rangle = 5 x_1 y_1 + 10 x_2 y_2 + 2 x_3 y_3$$
(some directions are weighted more than others).
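As a quick numerical aside (not part of the original notes), the weighted inner product of Example 3 can be written as $x^T W y$ with $W = \mathrm{diag}(5, 10, 2)$; a minimal NumPy sketch, with arbitrary sample vectors:

```python
import numpy as np

# Weighted inner product <x, y> = 5*x1*y1 + 10*x2*y2 + 2*x3*y3,
# written as x^T W y with a diagonal positive weight matrix W.
W = np.diag([5.0, 10.0, 2.0])

def inner(x, y, W=W):
    return x @ W @ y

x = np.array([1.0, 2.0, 3.0])
y = np.array([-1.0, 0.5, 2.0])

print(inner(x, y))                           # 5*(-1) + 10*1 + 2*6 = 17.0
print(inner(x, x) >= 0)                      # positive definiteness on this sample: True
print(np.isclose(inner(x, y), inner(y, x)))  # symmetry: True
```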
Example 4. On spaces of functions the most useful inner products use integration. For example, let $\mathcal{P}$ be the vector space of all polynomials with real coefficients. Then
$$\langle p, q \rangle = \int_0^1 p(t)\,q(t)\, dt$$
is an inner product on $\mathcal{P}$ (check!).
Example 5. Sometimes a weight is used: let $w(t)$ be a positive function. Then
$$\langle p, q \rangle = \int_0^1 w(t)\,p(t)\,q(t)\, dt$$
is also an inner product on $\mathcal{P}$ (check!).
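The weighted integral inner product of Example 5 can be evaluated exactly for polynomials by expanding and integrating the product. A small sketch (not in the original notes); the weight $w(t) = 1 + t$ and the polynomials below are arbitrary choices for illustration:

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Inner product <p, q> = integral_0^1 w(t) p(t) q(t) dt on polynomials,
# illustrated with the positive weight w(t) = 1 + t.
def poly_inner(p_coef, q_coef, w_coef=(1.0, 1.0)):
    # multiply the three polynomials, then integrate exactly on [0, 1]
    prod = P.polymul(P.polymul(p_coef, q_coef), w_coef)
    antideriv = P.polyint(prod)
    return P.polyval(1.0, antideriv) - P.polyval(0.0, antideriv)

p = (1.0, 2.0)       # p(t) = 1 + 2t
q = (0.0, 0.0, 3.0)  # q(t) = 3t^2

print(poly_inner(p, q))      # <p, q> with weight 1 + t
print(poly_inner(p, p) > 0)  # positive definiteness on this sample: True
```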
1.1.2. Inner product on complex spaces. For complex vector spaces extra care is needed. The blueprint of the construction can be seen in the simplest case, $\mathbb{C}$ as a one dimensional vector space over $\mathbb{C}$: the inner product of $z$ with itself needs to be a positive number! It makes sense to define $\langle z, z \rangle = \bar{z} z$.
Definition 2. An inner product on a vector space $V$ over $F = \mathbb{C}$ is an operation which associates to two vectors $x, y \in V$ a scalar $\langle x, y \rangle \in \mathbb{C}$ that satisfies the following properties:
(i) it is positive definite: $\langle x, x \rangle \geq 0$, and $\langle x, x \rangle = 0$ if and only if $x = 0$,
(ii) it is linear in the second argument: $\langle x, y + z \rangle = \langle x, y \rangle + \langle x, z \rangle$ and $\langle x, cy \rangle = c\langle x, y \rangle$,
(iii) it is conjugate symmetric: $\langle x, y \rangle = \overline{\langle y, x \rangle}$.
Note that conjugate symmetry combined with linearity implies that $\langle \cdot, \cdot \rangle$ is conjugate linear in the first variable: $\langle x + z, y \rangle = \langle x, y \rangle + \langle z, y \rangle$ and $\langle cx, y \rangle = \bar{c}\langle x, y \rangle$.
Please keep in mind that most mathematics books use an inner product which is linear in the first variable and conjugate linear in the second one. You should make sure you know the convention used by each author.
Example 1. On the one dimensional complex vector space $\mathbb{C}$ an inner product is $\langle z, w \rangle = \bar{z} w$.
Example 2. More generally, an inner product on $\mathbb{C}^n$ is
(2) $\langle x, y \rangle = \bar{x}_1 y_1 + \ldots + \bar{x}_n y_n$
Example 3. Here is another inner product on $\mathbb{C}^3$:
$$\langle x, y \rangle = 5\,\bar{x}_1 y_1 + 10\,\bar{x}_2 y_2 + 2\,\bar{x}_3 y_3$$
(some directions are weighted more than others).
Example 4. Let $\mathcal{P}_{\mathbb{C}}$ be the vector space of all polynomials with complex coefficients. Then
$$\langle p, q \rangle = \int_0^1 \overline{p(t)}\,q(t)\, dt$$
is an inner product on $\mathcal{P}_{\mathbb{C}}$ (check!).
Example 5. Weights need to be positive: let $w(t)$ be a given positive function. Then
$$\langle p, q \rangle = \int_0^1 w(t)\,\overline{p(t)}\,q(t)\, dt$$
is also an inner product on $\mathcal{P}_{\mathbb{C}}$ (check!).
1.2. Inner product spaces.
Definition 3. A vector space $V$ equipped with an inner product, $(V, \langle \cdot, \cdot \rangle)$, is called an inner product space.
Examples 1–5 of §1.1.1 are inner product spaces over $\mathbb{R}$, while Examples 1–5 of §1.1.2 are inner product spaces over $\mathbb{C}$.
Let $(V, \langle \cdot, \cdot \rangle)$ be an inner product space. The quantity
$$\|x\| = \sqrt{\langle x, x \rangle}$$
is called the norm of the vector $x$. For $V = \mathbb{R}^3$ with the usual inner product (which is the dot product) the norm of a vector is its length.
Vectors of norm one, i.e. $x$ with $\|x\| = 1$, are called unit vectors.
2. Orthogonal bases
Let $(V, \langle \cdot, \cdot \rangle)$ be an inner product space.
Definition 4. Two vectors $x, y \in V$ are called orthogonal (written $x \perp y$) if $\langle x, y \rangle = 0$.
Note that the zero vector is orthogonal to any vector:
$$\langle x, 0 \rangle = 0 \quad \text{for all } x \in V$$
since $\langle x, 0 \rangle = \langle x, 0y \rangle = 0\,\langle x, y \rangle = 0$.
Definition 5. A set $B = \{v_1, \ldots, v_n\} \subset V$ is called an orthogonal set if $\langle v_j, v_i \rangle = 0$ for all $i \neq j$.
Remark. An orthogonal set which does not contain the zero vector is a linearly independent set: if $v_1, \ldots, v_k$ are orthogonal and $c_1 v_1 + \ldots + c_k v_k = 0$, then for any $j = 1, \ldots, k$
$$0 = \langle v_j, c_1 v_1 + \ldots + c_k v_k \rangle = c_1 \langle v_j, v_1 \rangle + \ldots + c_j \langle v_j, v_j \rangle + \ldots + c_k \langle v_j, v_k \rangle = c_j \langle v_j, v_j \rangle$$
and since $v_j \neq 0$ we have $\langle v_j, v_j \rangle \neq 0$, which implies $c_j = 0$. □
Definition 6. An orthogonal set of vectors which forms a basis for $V$ is called an orthogonal basis.
An orthogonal basis made of unit vectors is called an orthonormal basis.
For example, the standard basis $e_1, \ldots, e_n$ is an orthonormal basis of $\mathbb{R}^n$ (when equipped with the inner product given by the dot product).
Of course, if $v_1, \ldots, v_n$ is an orthogonal basis, then $\frac{1}{\|v_1\|}v_1, \ldots, \frac{1}{\|v_n\|}v_n$ is an orthonormal basis.
In an inner product space, the coefficients of the expansion of a vector in an orthogonal basis are easily calculated:
Theorem 7. Let $B = \{v_1, \ldots, v_n\}$ be an orthogonal basis for $V$. Then the coefficients of the expansion of any $x \in V$ in the basis $B$ are found as
(3) $x = c_1 v_1 + \ldots + c_n v_n$ where $c_j = \dfrac{\langle v_j, x \rangle}{\|v_j\|^2}$
Proof.
Consider the expansion of $x$ in the basis $B$: $x = c_1 v_1 + \ldots + c_n v_n$. For each $j = 1, \ldots, n$
$$\langle v_j, x \rangle = \langle v_j, c_1 v_1 + \ldots + c_n v_n \rangle = c_1 \langle v_j, v_1 \rangle + \ldots + c_j \langle v_j, v_j \rangle + \ldots + c_n \langle v_j, v_n \rangle = c_j \langle v_j, v_j \rangle = c_j \|v_j\|^2$$
which gives formula (3) for $c_j$. □
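A short numerical check of formula (3), not part of the original notes; the orthogonal basis and the vector $x$ below are arbitrary choices:

```python
import numpy as np

# Formula (3): in an orthogonal basis, c_j = <v_j, x> / ||v_j||^2.
v1 = np.array([1.0, 1.0, 0.0])
v2 = np.array([1.0, -1.0, 0.0])
v3 = np.array([0.0, 0.0, 2.0])   # pairwise orthogonal, not normalized
basis = [v1, v2, v3]

x = np.array([3.0, -1.0, 5.0])
coeffs = [np.dot(v, x) / np.dot(v, v) for v in basis]

reconstruction = sum(c * v for c, v in zip(coeffs, basis))
print(coeffs)                          # [1.0, 2.0, 2.5]
print(np.allclose(reconstruction, x))  # True
```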
2.1. Orthogonal projections. Given two vectors $x, v \in V$ (with $v \neq 0$), the orthogonal projection of $x$ in the direction of $v$ is
(4) $P_v\, x = \dfrac{\langle v, x \rangle}{\|v\|^2}\, v$
For $V = \mathbb{R}^3$ with the inner product given by the dot product of vectors, this is the usual geometric projection.
Remarks:
1. $P_v\, x$ is a scalar multiple of $v$.
2. (4) can be rewritten as
$$P_v\, x = \left\langle \frac{v}{\|v\|},\, x \right\rangle \frac{v}{\|v\|}, \quad \text{where } \frac{v}{\|v\|} \text{ is a unit vector},$$
which shows that the projection in the direction of $v$ depends only on its direction, and not on its norm. Furthermore, note that $P_{cv} = P_v$ for any scalar $c \neq 0$, so a more appropriate name for $P_v$ is the orthogonal projection onto the subspace $Sp(v)$.
3. $P_v : V \to V$ is a linear transformation, whose range is $Sp(v)$, and which satisfies $P_v^2 = P_v$.
Note that $(x - P_v\, x) \perp v$ (check!).
More generally, we can define orthogonal projections onto higher dimensional subspaces. Consider for example a two dimensional subspace $U = Sp(v_1, v_2)$, where we assume $v_1 \perp v_2$ for simplicity (and $v_{1,2} \neq 0$). For any $x$ consider the vector
(5) $P_{Sp(v_1, v_2)}\, x \equiv Px = \dfrac{\langle v_1, x \rangle}{\|v_1\|^2}\, v_1 + \dfrac{\langle v_2, x \rangle}{\|v_2\|^2}\, v_2$
We have $Px \in U$, $P^2 = P$ and $(x - Px) \perp Sp(v_1, v_2)$ (check!), so $P$ is the orthogonal projection (as a linear transformation) onto $U$. We will give exact characterizations of all orthogonal projections below, in §3.2.
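A minimal numerical sketch of formulas (4) and (5), not part of the original notes; the vectors $v_1 \perp v_2$ and $x$ are arbitrary examples in $\mathbb{R}^3$ with the dot product:

```python
import numpy as np

# Orthogonal projection onto Sp(v) (formula (4)) and onto Sp(v1, v2)
# with v1 _|_ v2 (formula (5)), using the dot product on R^3.
def proj_onto(x, *vs):
    return sum((np.dot(v, x) / np.dot(v, v)) * v for v in vs)

v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([1.0, 0.0, -1.0])   # orthogonal to v1
x  = np.array([2.0, 3.0, 4.0])

Px = proj_onto(x, v1, v2)
print(Px)                                   # [2. 0. 4.]
# the residual is orthogonal to the subspace:
print(np.isclose(np.dot(x - Px, v1), 0.0))  # True
print(np.isclose(np.dot(x - Px, v2), 0.0))  # True
```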
2.2. Existence of orthogonal basis in finite dimension. We know that any vector space has a basis. Moreover, any finite dimensional inner product space has an orthogonal basis, and here is how to find it:
Theorem 8. (Gram-Schmidt orthogonalization) Let $u_1, \ldots, u_n$ be a basis for $V$. Then an orthogonal basis $v_1, \ldots, v_n$ can be found as follows:
(6) $v_1 = u_1 \in Sp(u_1)$

(7) $v_2 = u_2 - \dfrac{\langle v_1, u_2 \rangle}{\|v_1\|^2}\, v_1 \in Sp(u_1, u_2)$

(8) $v_3 = u_3 - \dfrac{\langle v_1, u_3 \rangle}{\|v_1\|^2}\, v_1 - \dfrac{\langle v_2, u_3 \rangle}{\|v_2\|^2}\, v_2 \in Sp(u_1, u_2, u_3)$

$\vdots$

(9) $v_k = u_k - \displaystyle\sum_{j=1}^{k-1} \dfrac{\langle v_j, u_k \rangle}{\|v_j\|^2}\, v_j \in Sp(u_1, \ldots, u_k)$

$\vdots$

(10) $v_n = u_n - \displaystyle\sum_{j=1}^{n-1} \dfrac{\langle v_j, u_n \rangle}{\|v_j\|^2}\, v_j \in Sp(u_1, \ldots, u_n)$
The Gram-Schmidt orthogonalization process is exactly as in $\mathbb{R}^n$. At the first step we keep $u_1$, (6). Next, we consider $Sp(u_1, u_2)$ and we replace $u_2$ by a vector which is orthogonal to $v_1 = u_1$, namely $u_2 - P_{v_1} u_2$, giving (7). We have produced the orthogonal basis $v_1, v_2$ of $Sp(u_1, u_2)$.
For the next step we add one more vector, and we consider $Sp(u_1, u_2, u_3) = Sp(v_1, v_2, u_3)$. We need to replace $u_3$ by a vector orthogonal to $Sp(v_1, v_2)$, and this is $u_3 - P_{Sp(v_1, v_2)}\, u_3$, which is (8).
The procedure is continued; at each step $k$ we subtract from $u_k$ the orthogonal projection of $u_k$ onto $Sp(v_1, \ldots, v_{k-1})$.
To check that $v_k \perp v_i$ for all $i = 1, \ldots, k-1$ we calculate, using the linearity in the second argument of the inner product,
$$\langle v_i, v_k \rangle = \langle v_i, u_k \rangle - \sum_{j=1}^{k-1} \frac{\langle v_j, u_k \rangle}{\|v_j\|^2}\, \langle v_i, v_j \rangle = \langle v_i, u_k \rangle - \frac{\langle v_i, u_k \rangle}{\|v_i\|^2}\, \langle v_i, v_i \rangle = 0$$
□
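A sketch of the classical Gram-Schmidt procedure of Theorem 8 in NumPy (not part of the original notes; the input basis is an arbitrary example). For serious numerical work one would use a stabilized variant or a library QR routine, as remarked in §4.3:

```python
import numpy as np

# Classical Gram-Schmidt as in Theorem 8: v_k = u_k minus its orthogonal
# projections onto the previously constructed v_1, ..., v_{k-1}.
def gram_schmidt(U):
    """U: list of linearly independent vectors; returns an orthogonal list."""
    V = []
    for u in U:
        v = u - sum((np.dot(w, u) / np.dot(w, w)) * w for w in V)
        V.append(v)
    return V

U = [np.array([1.0, 1.0, 0.0]),
     np.array([1.0, 0.0, 1.0]),
     np.array([0.0, 1.0, 1.0])]

V = gram_schmidt(U)
# all pairwise inner products vanish:
print(all(np.isclose(np.dot(V[i], V[j]), 0.0)
          for i in range(3) for j in range(i + 1, 3)))   # True
```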
Corollary 9. Let $U$ be a subspace of $V$. Any orthogonal basis of $U$ can be completed to an orthogonal basis of $V$.
Indeed, $U$ has an orthogonal basis $v_1, \ldots, v_k$ by Theorem 8, which can be completed to a basis of $V$: $v_1, \ldots, v_k, u_{k+1}, \ldots, u_n$. Applying the Gram-Schmidt orthogonalization procedure to this basis, we obtain an orthogonal basis of $V$. It can be easily seen that the procedure leaves $v_1, \ldots, v_k$ unchanged. □
2.3. Orthogonal subspaces, orthogonal complements. Let $(V, \langle \cdot, \cdot \rangle)$ be an inner product space.
Definition 10. Let $U$ be a subspace of $V$. We say that a vector $w \in V$ is orthogonal to $U$ if it is orthogonal to every vector in $U$:
$$w \perp U \quad \text{if and only if} \quad w \perp u \text{ for all } u \in U$$
Definition 11. Two subspaces $U, W \subset V$ are orthogonal ($U \perp W$) if every $u \in U$ is orthogonal to every $w \in W$.
Remark. If $U \perp W$ then $U \cap W = \{0\}$.
Indeed, suppose that $x \in U \cap W$. Then $x \in U$ and $x \in W$, and therefore we must have $\langle x, x \rangle = 0$, which implies $x = 0$.
Examples. Let $V = \mathbb{R}^3$.
(i) If the (subspace consisting of all vectors on the) $z$-axis is orthogonal to a one dimensional subspace (a line through the origin), then that line is included in the $xy$-plane.
(ii) In fact the $z$-axis is orthogonal to the $xy$-plane.
(iii) But the intuition coming from classical geometry stops here. As vector subspaces, the $yz$-plane is not orthogonal to the $xy$-plane (they have a subspace in common!).
Recall that any subspace $U \subset V$ has a complement in $V$ (in fact, it has infinitely many). But exactly one of them is orthogonal to $U$:
Theorem 12. Let $U$ be a subspace of $V$. Denote
$$U^\perp = \{\, w \in V \mid w \perp U \,\}$$
Then $U^\perp$ is a subspace and
$$U \oplus U^\perp = V$$
The space $U^\perp$ is called the orthogonal complement of $U$ in $V$.
Proof of Theorem 12.
To show that $U^\perp$ is a subspace, let $w, w' \in U^\perp$; then $\langle u, w \rangle = 0$ and $\langle u, w' \rangle = 0$ for any $u \in U$. Then for any scalars $c, c' \in F$ we have
$$\langle u, cw + c'w' \rangle = c\langle u, w \rangle + c'\langle u, w' \rangle = 0$$
which shows that $cw + c'w' \in U^\perp$.
The sum is direct since if $u \in U \cap U^\perp$ then $u \in U$ and $u \in U^\perp$, therefore $\langle u, u \rangle = 0$, therefore $u = 0$.
Finally, to show that $U + U^\perp = V$, let $u_1, \ldots, u_k$ be an orthogonal basis of $U$, which can be completed to an orthogonal basis of $V$ by Corollary 9: $u_1, \ldots, u_k, v_1, \ldots, v_{n-k}$.
Then $U^\perp = Sp(v_1, \ldots, v_{n-k})$. Indeed, clearly $Sp(v_1, \ldots, v_{n-k}) \subset U^\perp$, and by dimensionality, $U^\perp$ cannot contain any other linearly independent vectors. Therefore $U \oplus U^\perp = V$. □
Remark. For any subspace $U$ of a finite dimensional inner product space $V$, the orthogonal complement of the orthogonal complement of $U$ is $U$:
$$\left( U^\perp \right)^\perp = U$$
The proof is left as an exercise.
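A numerical sketch of Theorem 12 and the remark, not part of the original notes. It takes $U$ to be the column space of an arbitrary matrix $A$ in $\mathbb{R}^4$ and computes $U^\perp$ as the null space of $A^T$ (a fact made precise in Theorem 14 of §2.5 below), assuming SciPy is available:

```python
import numpy as np
from scipy.linalg import orth, null_space

# U = column space of A in R^4; its orthogonal complement is the
# null space of A^T (cf. Theorem 14 below).
A = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 1.0],
              [1.0, 1.0]])

Q = orth(A)               # orthonormal basis of U
W = null_space(A.T)       # orthonormal basis of U-perp
print(Q.shape[1] + W.shape[1] == 4)              # dim U + dim U-perp = dim V: True
print(np.allclose(Q.T @ W, 0.0))                 # U is orthogonal to U-perp: True
# (U-perp)-perp recovers U: compare the two orthogonal projections
print(np.allclose(Q @ Q.T, np.eye(4) - W @ W.T)) # True
```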
2.4. Splitting of linear transformations. Let $U, V$ be two inner product spaces over the same scalar field $F$, and consider any linear transformation $T : U \to V$. Throughout, $\mathcal{N}(T)$ denotes the null space of $T$ and $\mathcal{R}(T)$ its range.
If $T$ is not one to one, we can restrict the domain of $T$ and obtain a one to one linear transformation: split $U = \mathcal{N}(T) \oplus \mathcal{N}(T)^\perp$ and define $\tilde{T}$ to be the restriction of $T$ to the subspace $\mathcal{N}(T)^\perp$:
$$\tilde{T} : \mathcal{N}(T)^\perp \to V, \qquad \tilde{T}(x) = T(x)$$
Note that $\mathcal{R}(\tilde{T}) = \mathcal{R}(T)$.
If $T$ is not onto, then split $V = \mathcal{R}(T) \oplus \mathcal{R}(T)^\perp$. The linear transformation $\tilde{T} : U \to \mathcal{R}(T)$ defined by $\tilde{T}(x) = T(x)$ for all $x \in U$ is onto (and it is essentially $T$).
By splitting both the domain and the codomain we can write
(11) $T : \mathcal{N}(T)^\perp \oplus \mathcal{N}(T) \to \mathcal{R}(T) \oplus \mathcal{R}(T)^\perp$
where the restriction of $T$,
$$T_0 : \mathcal{N}(T)^\perp \to \mathcal{R}(T), \qquad T_0(x) = T(x),$$
is one to one and onto: it is invertible!
Let us look at the matrix associated to $T$ in suitable bases. Let $B_1 = \{u_1, \ldots, u_k\}$ be a basis for $\mathcal{N}(T)^\perp$ and $u_{k+1}, \ldots, u_n$ a basis for $\mathcal{N}(T)$. Then $B_U = \{u_1, \ldots, u_n\}$ is a basis for $U$. Similarly, let $B_2 = \{v_1, \ldots, v_k\}$ be a basis for $\mathcal{R}(T)$ and $v_{k+1}, \ldots, v_m$ a basis for $\mathcal{R}(T)^\perp$. Then $B_V = \{v_1, \ldots, v_m\}$ is a basis for $V$. The matrix associated to $T$ in the bases $B_U$, $B_V$ has the block form
$$\begin{bmatrix} M_0 & 0 \\ 0 & 0 \end{bmatrix}$$
where $M_0$ is the $k \times k$ invertible matrix associated to $T_0$ in the bases $B_1$, $B_2$.
2.5. The adjoint and the four fundamental subspaces of a matrix.
2.5.1. The adjoint of a linear transformation. Let $(V, \langle \cdot, \cdot \rangle_V)$, $(U, \langle \cdot, \cdot \rangle_U)$ be two inner product spaces over the same scalar field $F$. We restrict the discussion to finite dimensional spaces.
Definition 13. If $T : U \to V$ is a linear transformation, its adjoint $T^*$ is the linear transformation $T^* : V \to U$ which satisfies
$$\langle Tu, v \rangle_V = \langle u, T^*v \rangle_U \quad \text{for all } u \in U,\ v \in V$$
The existence of the adjoint, and its uniqueness, needs a proof, which is not difficult, but will be omitted here.
We focus instead on linear transformations given by matrix multiplication, in the vector spaces $\mathbb{R}^n$ or $\mathbb{C}^n$ with the usual inner product (1), respectively (2). In this case we will denote, for simplicity, the linear operator and its matrix by the same letter and say, by abuse of language and notation, that a matrix is a linear transformation. Definition 13 becomes: if $M$ is an $m \times n$ matrix with entries in $F$, its adjoint is the matrix $M^*$ for which
(12) $\langle Mu, v \rangle = \langle u, M^*v \rangle \quad \text{for all } u \in F^n,\ v \in F^m$
For $F = \mathbb{R}$ the usual linear algebra notation for the inner product (1) is
$$\langle x, y \rangle = x^T y$$
and in this notation, recalling that $(AB)^T = B^T A^T$, relation (12) is
(13) $u^T M^T v = u^T M^* v$
which shows that for real matrices the adjoint is the transpose: $M^* = M^T$.
For $F = \mathbb{C}$ another linear algebra notation for the inner product (2) is
(14) $\langle x, y \rangle = x^H y$, where $x^H$ is the Hermitian (conjugate transpose) of the vector $x$: $x^H = \bar{x}^T$,
and relation (12) is
$$u^H M^H v = u^H M^* v$$
which shows that for complex matrices the adjoint is the complex conjugate of the transpose: $M^* = M^H = \bar{M}^T$.
The notations (13), (14) will not be used here: they are very useful for finite dimensional vector spaces, but not for infinite dimensional ones.
We prefer to use the unified notation $M^*$ for the adjoint of a matrix, keeping in mind that $M^* = M^T$ in the real case and $M^* = M^H$ in the complex case.
Note that
(15) $(M^*)^* = M$
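A quick numerical check of the adjoint relation (12) for the standard inner product on $\mathbb{C}^n$, not part of the original notes; the random matrix and vectors are arbitrary:

```python
import numpy as np

# For the standard inner product <x, y> = sum(conj(x_i) * y_i) on C^n,
# the adjoint of a matrix M is its conjugate transpose M^H.
rng = np.random.default_rng(0)
m, n = 3, 4
M = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
v = rng.standard_normal(m) + 1j * rng.standard_normal(m)

def inner(x, y):
    return np.vdot(x, y)          # conjugates the first argument

M_star = M.conj().T               # M^* = M^H in the complex case
print(np.isclose(inner(M @ u, v), inner(u, M_star @ v)))   # True
```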
2.5.2. The four fundamental subspaces of a matrix. Let $M$ be an $m \times n$ real matrix.
a) Since the columns of $M^T$ are the rows of $M$,
$$\mathcal{R}(M^T) = \text{the column space of } M^T = \text{the row space of } M.$$
Theorem 14 (ii) shows that the row space of $M$ and the null space of $M$ are orthogonal complements of each other.
b) The left null space of $M$ is, by definition, the set of all $x \in \mathbb{R}^m$ such that $x^T M = 0$. Taking a transpose, we see that this is equivalent to $M^T x = 0$; therefore the left null space of $M$ equals $\mathcal{N}(M^T)$. Theorem 14 (i) shows that the left null space of $M$ and the column space of $M$ are orthogonal complements of each other.
Theorem 14. Let $M$ be an $m \times n$ matrix. The following hold:
(i) $\mathcal{N}(M^*) = \mathcal{R}(M)^\perp$
(ii) $\mathcal{R}(M^*)^\perp = \mathcal{N}(M)$
Proof.
We need the following very useful fact:
Remark 15. Let $(V, \langle \cdot, \cdot \rangle)$ be an inner product space. Then
$$\langle x, v \rangle = 0 \ \text{for all } v \in V \quad \text{if and only if} \quad x = 0.$$
This is easy to see, since if we take in particular $v = x$ then $\langle x, x \rangle = 0$, which implies $x = 0$.
To prove Theorem 14 (i): we have $u \in \mathcal{R}(M)^\perp$ if and only if $\langle u, Mv \rangle = 0$ for all $v \in \mathbb{R}^n$, if and only if $\langle M^*u, v \rangle = 0$ for all $v \in \mathbb{R}^n$, if and only if (by Remark 15) $M^*u = 0$, hence $u \in \mathcal{N}(M^*)$.
Statement (ii) of Theorem 14 follows from (i) after replacing $M$ by $M^*$ and using (15). □
We should now take another look at the splitting in §2.4. Formula (11):
$$T : \mathcal{N}(T)^\perp \oplus \mathcal{N}(T) \to \mathcal{R}(T) \oplus \mathcal{R}(T)^\perp$$
is a splitting using the four fundamental subspaces, and it can also be written as
$$T : \mathcal{R}(T^*) \oplus \mathcal{N}(T) \to \mathcal{R}(T) \oplus \mathcal{N}(T^*)$$
(in infinite dimensions extra care is needed here).
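A numerical illustration of Theorem 14 for a real matrix, not part of the original notes; the rank-2 matrix below is an arbitrary example and SciPy is assumed available:

```python
import numpy as np
from scipy.linalg import orth, null_space

# Theorem 14 for a real matrix: N(M^T) = R(M)-perp and R(M^T)-perp = N(M).
M = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])        # rank 2

col   = orth(M)            # basis of R(M), the column space
lnull = null_space(M.T)    # basis of N(M^T), the left null space
row   = orth(M.T)          # basis of R(M^T), the row space
nul   = null_space(M)      # basis of N(M)

print(np.allclose(col.T @ lnull, 0.0))   # R(M) is orthogonal to N(M^T): True
print(np.allclose(row.T @ nul, 0.0))     # R(M^T) is orthogonal to N(M): True
print(col.shape[1] + lnull.shape[1])     # 3 = dimension of the ambient space
print(row.shape[1] + nul.shape[1])       # 3
```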
3. Least squares approximations
3.1. The Cauchy-Schwarz inequality. One of the most useful, and deep, inequalities in mathematics, which holds in finite or infinite dimensional inner product spaces, is:
Theorem 16. (The Cauchy-Schwarz inequality) In an inner product space any two vectors $x, y$ satisfy
(16) $|\langle x, y \rangle| \leq \|x\| \, \|y\|$
with equality if and only if $x, y$ are linearly dependent.
Before noting its rigorous and very short proof, let us have an intuitive view based on the geometry of $\mathbb{R}^3$. Recall that the cosine of the angle between two vectors $x, y \in \mathbb{R}^3$ is $\cos\theta = x \cdot y / (\|x\| \, \|y\|)$. Since $|\cos\theta| \leq 1$, the Cauchy-Schwarz inequality follows. Equality holds only for $\theta = 0$ or $\theta = \pi$ (in which cases $x, y$ are scalar multiples of each other). We can also have equality in the Cauchy-Schwarz inequality if $x$ or $y$ is the zero vector (when we cannot define the angle between them), and in this case the two vectors are linearly dependent as well.
Proof.
If $y = 0$ the inequality is trivially true. Otherwise, let $c = \langle y, x \rangle / \langle y, y \rangle$; then
$$0 \leq \langle x - cy, x - cy \rangle = \langle x, x \rangle - \frac{|\langle x, y \rangle|^2}{\langle y, y \rangle}$$
which gives (16). □
3.2. More about orthogonal projections. We already defined the orthogonal projection (as a linear transformation/matrix) onto one dimensional and two dimensional subspaces in §2.1. The formulas are similar for higher dimensional subspaces specified by an orthogonal basis. Here are two more characterizations of orthogonal projections.
The first theorem gives us a very easy way to see whether a linear transformation is, or is not, an orthogonal projection:
Theorem 17. Let $V$ be a finite dimensional inner product space. A linear transformation $P : V \to V$ is an orthogonal projection if and only if $P^2 = P$ and $P = P^*$.
Proof.
Suppose first that $P$ satisfies $P^2 = P$ and $P = P^*$; we show that it is an orthogonal projection. We first need to identify the subspace onto which $P$ projects; obviously, this should be $\mathcal{R}(P)$. We need to check that $x - Px \perp \mathcal{R}(P)$, therefore that $\langle Py, x - Px \rangle = 0$ for all $y \in V$. Indeed,
$$\langle Py, x - Px \rangle = \langle Py, x \rangle - \langle Py, Px \rangle = \langle Py, x \rangle - \langle P^*Py, x \rangle = \langle Py, x \rangle - \langle P^2 y, x \rangle = \langle Py, x \rangle - \langle Py, x \rangle = 0$$
Conversely, let $P$ be the orthogonal projection onto a subspace $U$. If $v_1, \ldots, v_k$ is an orthonormal basis for $U$, then the formula for $P$ is the natural generalization of (5), which is (taking into account that we assumed the $v_j$ to be unit vectors)
$$Px = \langle v_1, x \rangle v_1 + \ldots + \langle v_k, x \rangle v_k$$
Note that $Pv_j = \langle v_1, v_j \rangle v_1 + \ldots + \langle v_k, v_j \rangle v_k = \langle v_j, v_j \rangle v_j = v_j$ for all $j$.
To check that $P^2 = P$ calculate
$$P^2 x = P\big( \langle v_1, x \rangle v_1 + \ldots + \langle v_k, x \rangle v_k \big) = \langle v_1, x \rangle P v_1 + \ldots + \langle v_k, x \rangle P v_k = \langle v_1, x \rangle v_1 + \ldots + \langle v_k, x \rangle v_k = Px$$
To check that $P^* = P$ calculate, for any $x, y \in V$,
$$\langle y, Px \rangle = \langle y, \langle v_1, x \rangle v_1 + \ldots + \langle v_k, x \rangle v_k \rangle = \langle v_1, x \rangle \langle y, v_1 \rangle + \ldots + \langle v_k, x \rangle \langle y, v_k \rangle$$
$$= \overline{\langle v_1, y \rangle}\, \langle v_1, x \rangle + \ldots + \overline{\langle v_k, y \rangle}\, \langle v_k, x \rangle = \langle\, \langle v_1, y \rangle v_1 + \ldots + \langle v_k, y \rangle v_k,\ x \,\rangle = \langle Py, x \rangle$$
which completes the proof. □
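Theorem 17 translates into a simple numerical test. A sketch (not part of the original notes; the matrices below are arbitrary examples, and the "good" projection is built as $QQ^T$ from a matrix with orthonormal columns, cf. Theorem 24 below):

```python
import numpy as np

# Theorem 17 as a numerical test: a real matrix P is an orthogonal
# projection exactly when P @ P = P and P = P^T.
def is_orthogonal_projection(P, tol=1e-10):
    return np.allclose(P @ P, P, atol=tol) and np.allclose(P, P.T, atol=tol)

# an orthogonal projection onto a plane in R^3
Q, _ = np.linalg.qr(np.array([[1.0, 0.0],
                              [1.0, 1.0],
                              [0.0, 1.0]]))
P_good = Q @ Q.T
# an oblique (non-orthogonal) projection: idempotent but not symmetric
P_bad = np.array([[1.0, 1.0],
                  [0.0, 0.0]])

print(is_orthogonal_projection(P_good))   # True
print(is_orthogonal_projection(P_bad))    # False (P^2 = P but P != P^T)
```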
In an inner product space we can define the distance between two vectors $x, y$ as $\|x - y\|$. Note that this is the usual distance between two points in $\mathbb{R}^3$.
The second characterization of orthogonal projections is very useful in approximations. In planar and spatial geometry, the shortest distance between a point and a line (or a plane) is the one measured along a perpendicular line; this is true in general:
Theorem 18. Let $U$ be a subspace of the inner product space $V$. For any $x \in V$, the point in $U$ which is at minimal distance from $x$ is the orthogonal projection $Px$ of $x$ onto $U$:
$$\|x - Px\| = \min\{\, \|x - u\| \mid u \in U \,\}$$
Proof.
This is an immediate consequence of the Pythagorean theorem:
(17) $\|x - u\|^2 = \|u - Px\|^2 + \|x - Px\|^2$, for any $u \in U$.
Exercise. Prove (17) using the fact that $P$ is the orthogonal projection onto $U$, therefore $(x - Px) \perp U$, and that $P^2 = P$, $P^* = P$ (see Theorem 17). □
3.3. Overdetermined systems: best fit solution. Let $M$ be an $m \times n$ matrix with entries in $\mathbb{R}$ (we could as well work with $F = \mathbb{C}$). By abuse of notation we will speak of the matrix $M$ both as a matrix and as the linear transformation $\mathbb{R}^n \to \mathbb{R}^m$ which takes $x$ to $Mx$, thus denoting by $\mathcal{R}(M)$ the range of the transformation (which is the same as the column space of $M$).
The linear system $Mx = b$ has solutions if and only if $b \in \mathcal{R}(M)$. If $b \notin \mathcal{R}(M)$ then the system has no solutions, and it is called overdetermined.
In practice overdetermined systems are not uncommon, the usual sources being that linear systems are only models for more complicated phenomena, and that the collected data is subject to errors and fluctuations. For practical problems it is important to produce a best fit solution: an $x$ for which the error $Mx - b$ is as small as possible.
There are many ways of measuring such an error; the most convenient is the least squares error: find $x$ which minimizes the square error
$$S = r_1^2 + \ldots + r_m^2, \quad \text{where } r_j = (Mx)_j - b_j$$
Of course, this is the same as minimizing $\|Mx - b\|$, where the inner product is the usual dot product on $\mathbb{R}^m$. By Theorem 18 it follows that $Mx$ must equal $p = Pb$, the orthogonal projection of $b$ onto the subspace $\mathcal{R}(M)$.
We only need to solve the system $Mx = Pb$, which is solvable since $Pb \in \mathcal{R}(M)$. If $M$ is one to one, then there is a unique solution $x$; otherwise any vector in the set $x + \mathcal{N}(M)$ is a solution as well.
To find a formula for $x$, recall that we must have $(b - Pb) \perp \mathcal{R}(M)$; therefore, by Theorem 14 (i), we must have $(b - Pb) \in \mathcal{N}(M^*)$, hence $M^*b = M^*Pb$, so
$$M^*b = M^*Mx$$
which is called the normal equation in statistics.
If $M$ is one to one, then so is $M^*M$ (indeed, if $M^*Mx = 0$ then $\|Mx\|^2 = \langle x, M^*Mx \rangle = 0$, so $Mx = 0$, hence $x = 0$), and since $M^*M$ is a square matrix, it is invertible; we can then solve $x = (M^*M)^{-1}M^*b$, and we also have a formula for the projection:
(18) $Pb = M(M^*M)^{-1}M^*b$
If $M$ is not one to one, then one particular least squares solution can be chosen among all those in $x + \mathcal{N}(M)$. Choosing the vector with the smallest norm gives $x = M^+ b$, where $M^+$ is called the pseudoinverse of $M$. The notion of pseudoinverse will be studied in more detail later on.
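A minimal least-squares sketch, not part of the original notes; the data matrix $M$ and vector $b$ are arbitrary examples. It solves the normal equation $M^T M x = M^T b$ and compares with NumPy's built-in least-squares and pseudoinverse routines:

```python
import numpy as np

# Least squares for an overdetermined system M x = b (M is m x n, m > n):
# solve the normal equation M^T M x = M^T b, cf. formula (18).
M = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])            # full column rank
b = np.array([1.1, 1.9, 3.2, 3.9])    # not in the column space of M

x_normal = np.linalg.solve(M.T @ M, M.T @ b)     # (M^T M)^{-1} M^T b
x_lstsq, *_ = np.linalg.lstsq(M, b, rcond=None)  # library least squares
x_pinv = np.linalg.pinv(M) @ b                   # pseudoinverse solution

print(np.allclose(x_normal, x_lstsq), np.allclose(x_normal, x_pinv))  # True True
# the residual is orthogonal to the column space of M:
print(np.allclose(M.T @ (b - M @ x_normal), 0.0))                     # True
```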
3.4. Another formula for orthogonal projections. Formula (18) is another useful way of writing projections. Suppose that $U$ is a subspace of $\mathbb{R}^m$ which is given as the range of a matrix $M$; finding an orthogonal basis for $U$ usually requires some work, but this turns out not to be necessary in order to write a nice formula for the orthogonal projection onto $U = \mathcal{R}(M)$. If the matrix is not one to one, then by restricting its domain to a smaller subspace we obtain a smaller matrix with the same range, see §2.4. With this extra care, formula (18) gives the orthogonal projection onto the subspace $\mathcal{R}(M)$.
4. Orthogonal and unitary matrices, QR factorization
4.1. Unitary and orthogonal matrices. Let $(V, \langle \cdot, \cdot \rangle)$ be an inner product space. The following type of linear transformations are what isomorphisms of inner product spaces should be: linear, invertible, and they preserve the inner product (therefore angles and lengths):
Definition 19. A linear transformation $U : V \to V$ is called a unitary transformation if $U^*U = UU^* = I$.
In particular, for $V = \mathbb{C}^n$ or $\mathbb{R}^n$, when the transformation is given by multiplication by a matrix $U$:
Definition 20. A unitary matrix is an $n \times n$ matrix with $U^*U = UU^* = I$, and therefore $U^{-1} = U^*$.
and
Definition 21. A unitary matrix with real entries is called an orthogonal matrix.
We should immediately state and prove the following properties of unitary matrices; each one can be used as a definition of a unitary matrix:
Theorem 22. The following statements are equivalent:
(i) The $n \times n$ matrix $U$ is unitary.
(ii) The columns of $U$ are an orthonormal set of vectors (therefore an orthonormal basis of $\mathbb{C}^n$).
(iii) $U$ preserves inner products (hence angles): $\langle Ux, Uy \rangle = \langle x, y \rangle$ for all $x, y \in \mathbb{C}^n$.
(iv) $U$ is an isometry: $\|Ux\| = \|x\|$ for all $x \in \mathbb{C}^n$.
Remark. It also follows that the rows of $U$ are orthonormal whenever the columns of $U$ are orthonormal.
Examples. Rotation matrices and reflection matrices in $\mathbb{R}^n$, and their products, are orthogonal (they preserve the length of vectors).
Remark. An isometry is necessarily one to one, and therefore it is also onto (in finite dimensional spaces).
Remark. The equivalence between (i) and (iv) is not true in infinite dimensions (unless the isometry is assumed to be onto).
We will need the following result, which shows that in an inner product space the inner product is completely recovered if we know the norm of every vector:
Theorem 23. (The polarization identity) For complex spaces,
$$\langle f, g \rangle = \frac{1}{4}\left( \|f + g\|^2 - \|f - g\|^2 - i\|f + ig\|^2 + i\|f - ig\|^2 \right)$$
and for real spaces,
$$\langle f, g \rangle = \frac{1}{4}\left( \|f + g\|^2 - \|f - g\|^2 \right)$$
The proof is by a straightforward calculation.
Proof of Theorem 22.
(i)⇔(ii) is clear from matrix multiplication: row $j$ of $U^*$ multiplying, place by place, column $i$ of $U$ is exactly the dot product of column $j$, complex conjugated, with column $i$.
(i)⇔(iii) holds because $U^*U = I$ is equivalent to $\langle U^*Ux, y \rangle = \langle x, y \rangle$ for all $x, y$, which is equivalent to (iii).
(iii)⇒(iv) follows by taking $y = x$.
(iv)⇒(iii) follows from the polarization identity. □
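A numerical illustration of Theorem 22 for a rotation matrix in $\mathbb{R}^2$, not part of the original notes; the angle and test vectors are arbitrary:

```python
import numpy as np

# Theorem 22 for a rotation matrix in R^2: it is orthogonal, its columns
# are orthonormal, and it preserves inner products and lengths.
theta = 0.7
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(U.T @ U, np.eye(2)))        # (i)  U^* U = I
print(np.allclose(U @ U.T, np.eye(2)))        #      U U^* = I
x = np.array([3.0, -1.0]); y = np.array([0.5, 2.0])
print(np.isclose((U @ x) @ (U @ y), x @ y))                  # (iii) inner products preserved
print(np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x)))  # (iv)  isometry
```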
4.2. Rectangular matrices with orthonormal columns. A simple formula for orthogonal projections.
Let $M = [u_1, \ldots, u_k]$ be an $n \times k$ matrix whose columns $u_1, \ldots, u_k$ are an orthonormal set of vectors. Then necessarily $n \geq k$. If $n = k$ then the matrix $M$ is unitary, but assume here that $n > k$.
Note that $\mathcal{N}(M) = \{0\}$, since the columns are independent.
Note also that
$$M^*M = I$$
(since row $j$ of $M^*$ multiplying, place by place, column $i$ of $M$ is exactly $\langle u_j, u_i \rangle$, which equals $1$ for $i = j$ and $0$ otherwise). Then the least squares minimization formula (18) takes the simple form $P = MM^*$:
Theorem 24. Let $u_1, \ldots, u_k$ be an orthonormal set, and $M = [u_1, \ldots, u_k]$. The orthogonal projection onto $U = Sp(u_1, \ldots, u_k) = \mathcal{R}(M)$ (cf. §3.4) is
$$P = MM^*$$
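A short numerical check of Theorem 24, not part of the original notes; the matrix $A$ is an arbitrary example whose columns span the subspace, and its orthonormal columns are produced by a QR routine:

```python
import numpy as np

# Theorem 24: if the columns of M are orthonormal, M^* M = I and the
# orthogonal projection onto R(M) is simply P = M M^*.
A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, 0.0],
              [2.0, 1.0]])
M, _ = np.linalg.qr(A)        # 4 x 2 matrix with orthonormal columns, same range as A

print(np.allclose(M.T @ M, np.eye(2)))        # M^* M = I
P_simple  = M @ M.T                           # Theorem 24
P_general = A @ np.linalg.inv(A.T @ A) @ A.T  # formula (18), same subspace
print(np.allclose(P_simple, P_general))       # True
```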
4.3. QR factorization. The following decomposition of matrices has countless applications, and extends to infinite dimensions.
If an $m \times k$ matrix $M = [u_1, \ldots, u_k]$ has linearly independent columns (hence $m \geq k$ and $\mathcal{N}(M) = \{0\}$), then applying the Gram-Schmidt process to the columns $u_1, \ldots, u_k$, where we also normalize so as to produce an orthonormal basis (not only an orthogonal one), amounts to factoring $M$ as described below.
Theorem 25. (QR factorization of matrices) Any $m \times k$ matrix $M = [u_1, \ldots, u_k]$ with linearly independent columns can be factored as $M = QR$, where $Q$ is an $m \times k$ matrix whose columns form an orthonormal basis for $\mathcal{R}(M)$ (hence $Q^*Q = I$) and $R$ is a $k \times k$ upper triangular matrix with positive entries on its diagonal (hence $R$ is invertible).
In the case of a square matrix $M$, $Q$ is also square, and it is a unitary matrix.
If the matrix $M$ has real entries, then $Q$ and $R$ have real entries, and if $k = m$ then $Q$ is an orthogonal matrix.
Remark 26. A similar factorization can be written $M = Q_1 R_1$ with $Q_1$ an $m \times m$ unitary matrix and $R_1$ an $m \times k$ rectangular matrix whose first $k$ rows form the upper triangular matrix $R$ and whose last $m - k$ rows are zero:
$$M = Q_1 R_1 = [\,Q \;\; Q_2\,]\begin{bmatrix} R \\ 0 \end{bmatrix} = QR$$
Proof of Theorem 25.
Every step of the Gram-Schmidt process (6), (7), ..., (9) is completed by a normalization: let
(19) $q_j = \dfrac{\epsilon_j}{\|v_j\|}\, v_j$ for all $j = 1, \ldots, k$, where $\epsilon_j \in \mathbb{C}$, $|\epsilon_j| = 1$,
to obtain an orthonormal set $q_1, \ldots, q_k$.
First, use (19) to replace the orthogonal set of vectors $v_1, \ldots, v_k$ in (6), (7), ..., (9) by the orthonormal set $q_1, \ldots, q_k$.
Then invert: write the $u_j$'s in terms of the $q_j$'s. Since $q_1 \in Sp(u_1)$, $q_2 \in Sp(u_1, u_2)$, ..., $q_j \in Sp(u_1, u_2, \ldots, u_j)$, ..., we also have $u_1 \in Sp(q_1)$, $u_2 \in Sp(q_1, q_2)$, ..., $u_j \in Sp(q_1, q_2, \ldots, q_j)$, ..., and therefore there are scalars $c_{ij}$ so that
(20) $u_j = c_{1j} q_1 + c_{2j} q_2 + \ldots + c_{jj} q_j$ for each $j = 1, \ldots, k$,
and since $q_1, \ldots, q_k$ are orthonormal,
(21) $c_{ij} = \langle q_i, u_j \rangle$
Relations (20), (21) can be written in matrix form as
$$M = QR$$
with
$$Q = [q_1, \ldots, q_k], \qquad R = \begin{bmatrix} \langle q_1, u_1 \rangle & \langle q_1, u_2 \rangle & \cdots & \langle q_1, u_k \rangle \\ 0 & \langle q_2, u_2 \rangle & \cdots & \langle q_2, u_k \rangle \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \langle q_k, u_k \rangle \end{bmatrix}$$
We have the freedom of choosing the constants $\epsilon_j$ of modulus $1$, and they can be chosen so that all diagonal elements of $R$ are positive (since $\langle q_j, u_j \rangle = \dfrac{\overline{\epsilon}_j}{\|v_j\|}\, \langle v_j, u_j \rangle$, choose $\epsilon_j = \langle v_j, u_j \rangle / |\langle v_j, u_j \rangle|$). □
For numerical calculations the Gram-Schmidt process described above accumulates round-off errors. For large $m$ and $k$ other, more efficient and numerically stable, algorithms exist, and should be used.
Applications of the QR factorization to solving linear systems.
1. Suppose $M$ is an invertible square matrix. To solve $Mx = b$, factor $M = QR$; the system becomes $QRx = b$, or $Rx = Q^*b$, which can be easily solved since $R$ is triangular.
2. Suppose $M$ is an $m \times k$ rectangular matrix of full rank $k$. Since $m > k$, the linear system $Mx = b$ may be overdetermined. Using the QR factorization in Remark 26, the system is $Q_1 R_1 x = b$, or $R_1 x = Q_1^* b$, and it is easy to see whether it has solutions: the last $m - k$ rows of $Q_1^* b$ must be zero. If this is the case, the system can be easily solved since $R$ is upper triangular.
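A sketch of Application 1, not part of the original notes; the system below is an arbitrary example, the QR factorization comes from NumPy, and SciPy's triangular solver is assumed available for the back substitution:

```python
import numpy as np
from scipy.linalg import solve_triangular

# Application 1: solve an invertible square system M x = b via M = QR,
# i.e. R x = Q^* b with R upper triangular.
M = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

Q, R = np.linalg.qr(M)            # Q has orthonormal columns, R is upper triangular
x = solve_triangular(R, Q.T @ b)  # back substitution
print(np.allclose(M @ x, b))      # True
```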