Vector Norms
Special cases:

    ‖v‖_1 = Σ_i |v_i|
    ‖v‖_2 = √(Σ_i |v_i|^2)   (Euclidean norm)
    ‖v‖_∞ = max_i |v_i|.
It is easy to verify that conditions (a), (b), (c) are satisfied for all p. The
triangle inequality is only satisfied for p ≥ 1. In fact, it goes the other way for
p < 1.
As I just said, for p < 1 these are not norms, because the triangle inequality fails. Strangely enough, ‖x‖_0 satisfies the triangle inequality again, but it is still not a norm because (c) fails.
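As a quick numerical check, here is a small sketch in Python/NumPy (the helper name p_quasi_norm is mine, not a standard function):

```python
import numpy as np

def p_quasi_norm(v, p):
    # the quantity (sum_i |v_i|^p)^(1/p); a norm only for p >= 1
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

lhs = p_quasi_norm(x + y, 0.5)                       # 4.0
rhs = p_quasi_norm(x, 0.5) + p_quasi_norm(y, 0.5)    # 2.0
print(lhs > rhs)   # True: the triangle inequality fails for p = 1/2
```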
Here the w_i > 0 are some fixed weights. This could be useful if some measurements in your data are more reliable than others, or some parts of a solution vector are more important than others.
For a positive definite matrix A, ‖x‖_A = √(x^* A x) defines a norm.
    ‖x − x_N‖_A ≤ some estimate,

but you can't get a direct estimate for the standard norm ‖x − x_N‖. Here A is actually a positive definite differential operator, not a matrix, but the idea is the same.
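In the finite-dimensional (matrix) case, the A-norm is easy to compute directly; a minimal NumPy sketch, assuming A is positive definite (the helper name a_norm is mine):

```python
import numpy as np

def a_norm(x, A):
    # energy norm ||x||_A = sqrt(x^* A x); assumes A positive definite
    return np.sqrt(np.real(np.conj(x) @ (A @ x)))

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])        # symmetric positive definite
x = np.array([1.0, -1.0])
print(a_norm(x, A))               # sqrt(x^T A x) = sqrt(2)
```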
R ⊂ S × S = {(a, b) : a, b ∈ S}.
A relation is
• transitive if (a, b) ∈ R and (b, c) ∈ R ⇒ (a, c) ∈ R
• symmetric if (a, b) ∈ R ⇔ (b, a) ∈ R
• antisymmetric if (a, b) ∈ R and (b, a) ∈ R ⇒ a = b
• reflexive if (a, a) ∈ R for all a.
Example: The ordering ≤ on a set of real numbers is transitive, antisymmetric and
reflexive. Likewise for the partial set ordering ⊂.
A relation is called an equivalence relation if it is transitive, symmetric and
reflexive.
Example: Think of the identity =.
Definition 5.5. Two norms ‖·‖ and |||·||| are equivalent if there are constants 0 < A ≤ B so that

    A‖v‖ ≤ |||v||| ≤ B‖v‖   for all v.
Theorem 5.6. (Main Theorem in this section) All vector norms in finite
dimensions are equivalent.
‖v‖ ≤ M · ‖v‖_1.

Proof. Let e_i be the ith basis vector. Then v = Σ_i v_i e_i, ‖v‖_1 = Σ_i |v_i|, and

    ‖v‖ = ‖ Σ_i v_i e_i ‖ ≤ Σ_i |v_i| ‖e_i‖ ≤ M ‖v‖_1,

where M = max_i ‖e_i‖. In particular, the map

    ‖·‖ : (C^n with norm ‖·‖_1) → R

is continuous.
In infinite dimensions, the unit sphere is not compact, and the p-norms are
not equivalent.
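For a concrete pair of norms, the equivalence constants can be checked numerically; a NumPy sketch using the standard bounds ‖v‖_2 ≤ ‖v‖_1 ≤ √n ‖v‖_2:

```python
import numpy as np

# In C^n (here R^n): ||v||_2 <= ||v||_1 <= sqrt(n) * ||v||_2
rng = np.random.default_rng(0)
n = 10
for _ in range(1000):
    v = rng.standard_normal(n)
    n1, n2 = np.linalg.norm(v, 1), np.linalg.norm(v, 2)
    assert n2 <= n1 + 1e-12 and n1 <= np.sqrt(n) * n2 + 1e-12
print("equivalence bounds hold for all samples")
```

The upper bound is attained by the all-ones vector, the lower bound by any standard basis vector.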
A = UΣV^*,   σ_1 ≥ σ_2 ≥ · · · ≥ σ_r > 0.
• The σ_i are also the square roots of the nonzero eigenvalues of AA^*. A^*A and AA^* are of different sizes in general, but they have the same nonzero eigenvalues.
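This relationship is easy to verify numerically; a NumPy sketch with a real 2×3 matrix (so conjugate transposes are plain transposes):

```python
import numpy as np

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0]])   # 2x3: A A^* is 2x2, A^* A is 3x3

sigma = np.linalg.svd(A, compute_uv=False)
eig_small = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]   # eigenvalues of A A^*
eig_big   = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]   # eigenvalues of A^* A

print(sigma**2)      # squares of the singular values (9 and 5)
print(eig_small)     # the same nonzero eigenvalues
print(eig_big)       # same nonzero eigenvalues; the larger matrix is padded with (numerical) zeros
```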
(a) ‖A‖ ≥ 0 for all A
(b) ‖A‖ = 0 ⇔ A = 0
(c) ‖αA‖ = |α| ‖A‖
(d) ‖A + B‖ ≤ ‖A‖ + ‖B‖
(e) ‖AB‖ ≤ ‖A‖ · ‖B‖
Note:
• All the matrix norms we consider are defined for matrices of all sizes.
Properties (d) and (e) only apply if the sizes are compatible.
• Some books only require (a)–(d). For me, it does not deserve to be called
a matrix norm if it does not satisfy (e) also.
• Notice that (e) implies kAn k ≤ kAkn . That will be useful later.
A matrix norm is compatible with a vector norm if ‖Av‖ ≤ ‖A‖ · ‖v‖ for all A and v. This is a desirable property. Note that this definition requires two norms to work together. Typically, a particular matrix norm is compatible with one or more vector norms, but not with all of them.
There are three main sources of matrix norms: (1) vector-based norms; (2)
induced matrix norms; (3) norms based on eigenvalues.
We will now look at all of those in turn.
Proof. (a) Let r_i^* be the ith row of A, c_j the jth column of B. Then

    ‖AB‖_sum = Σ_{ij} |(AB)_{ij}| = Σ_{ij} |⟨c_j, r_i⟩| ≤ Σ_{ij} ‖r_i‖_1 · ‖c_j‖_1 = ‖A‖_sum · ‖B‖_sum.
Proof. Basically the same proof as for the sum norm, except we use Cauchy-
Schwarz.
Proof. Write out what trace(A∗ A) is, and observe it is equal to kAk2F .
‖UAV‖_F = ‖A‖_F   for unitary U, V.
Proof. ‖UA‖_F^2 = trace((UA)^*(UA)) = trace(A^*U^*UA) = trace(A^*A) = ‖A‖_F^2. Similarly for V, using trace(XY) = trace(YX).
There will be more properties of the Frobenius norm in section 5.3.3.
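Both identities (the trace formula and unitary invariance) can be checked numerically; a NumPy sketch, using QR factorizations of random matrices to generate orthogonal U and V:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))

# ||A||_F^2 = trace(A^* A)
fro = np.linalg.norm(A, 'fro')
print(np.isclose(fro**2, np.trace(A.T @ A)))    # True

# Unitary invariance: ||U A V||_F = ||A||_F
U, _ = np.linalg.qr(rng.standard_normal((3, 3)))
V, _ = np.linalg.qr(rng.standard_normal((3, 3)))
print(np.isclose(np.linalg.norm(U @ A @ V, 'fro'), fro))   # True
```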
Fact: The max-norm does not satisfy (e).
Exercise: Find a counterexample.
Definition 5.16. Given any vector norm, the induced matrix norm is given by

    ‖A‖ = sup_{v ≠ 0} ‖Av‖/‖v‖ = sup_{‖v‖=1} ‖Av‖.
It is easy to check that (a)–(e) are satisfied, and that these norms are auto-
matically compatible with the vector norm that produced them.
Theorem 5.17.

    ‖A‖_1 = max_j Σ_i |a_{ij}|   (largest column sum)
    ‖A‖_∞ = max_i Σ_j |a_{ij}|   (largest row sum)
Proof.

    ‖Av‖_1 = Σ_i |(Av)_i| ≤ Σ_i Σ_j |a_{ij}| · |v_j|
           = Σ_j ( Σ_i |a_{ij}| ) |v_j| ≤ ( max_k Σ_i |a_{ik}| ) · Σ_j |v_j|
           = ( max_k Σ_i |a_{ik}| ) · ‖v‖_1.
To complete the proof, we need to find one particular v for which we get equality. If the largest column sum is in column j_0, then v = e_{j_0} (standard basis vector) will work.
The proof for p = ∞ is similar (exercise).
The proof for p = 2 will be done later, in corollary 5.21.
Example: Let

    A = [ 3  −1   4 ]
        [ 1   5  −9 ]
        [ 2   6  5i ].

The absolute row sums are 8, 15, 13, and the absolute column sums are 6, 12, 18, so ‖A‖_∞ = 15 and ‖A‖_1 = 18.
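A quick NumPy check of this example (numpy.linalg.norm implements exactly these column- and row-sum formulas for ord=1 and ord=inf):

```python
import numpy as np

A = np.array([[3, -1,  4],
              [1,  5, -9],
              [2,  6, 5j]])

print(np.linalg.norm(A, 1))       # 18.0: largest absolute column sum (4 + 9 + 5)
print(np.linalg.norm(A, np.inf))  # 15.0: largest absolute row sum (1 + 5 + 9)
```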
Proof.

    Dv = (d_1 v_1, …, d_n v_n)^T.

Then

    ‖Dv‖_p^p = Σ_i |d_i|^p |v_i|^p ≤ (max_i |d_i|^p) Σ_i |v_i|^p = (max_i |d_i|)^p ‖v‖_p^p,

so

    ‖Dv‖_p ≤ (max_i |d_i|) ‖v‖_p.
To show equality, you need to find one particular vector for which you have equality. For example, the standard basis vector e_{i_0} for the index i_0 which corresponds to the maximum |d_i|.
Proof. For unitary U, ‖UAv‖_2 = ‖Av‖_2 for every v, so ‖UA‖_2 = ‖A‖_2. Likewise for V. Combined with the SVD A = UΣV^*, this gives ‖A‖_2 = ‖Σ‖_2.
Instead, you need to base the norms on the singular values. We will get to that
in a bit. First, we prove a few theorems about eigenvalues and norms in general.
Proof. If ‖A‖ is compatible with some vector norm, this is easy: take v to be an eigenvector for the eigenvalue λ of largest modulus; then

    |λ| ‖v‖ = ‖Av‖ ≤ ‖A‖ ‖v‖,

so ρ(A) ≤ ‖A‖.
For a general matrix norm (which satisfies (e)) we use a little trick: let V be the matrix all of whose columns are equal to v. Then AV = λV, so

    |λ| ‖V‖ = ‖AV‖ ≤ ‖A‖ ‖V‖,

and the same conclusion follows.
Theorem 5.24. For any (square) matrix A and any ε > 0 there exists a matrix norm so that

    ρ(A) ≤ ‖A‖ ≤ ρ(A) + ε.

Note: This does not say that there is a single matrix norm that works for all matrices A. It says that for each fixed A and fixed ε, there is such a norm.
Proof. Let ρ(A) = 1 − ε, and find a matrix norm so that ‖A‖ < 1 − (ε/2) < 1. Then ‖A^n‖ ≤ ‖A‖^n → 0.

If instead we set

    A_ε = (1/(ρ(A) + ε)) A,

then

    ρ(A_ε) = ρ(A)/(ρ(A) + ε) < 1.

There is some matrix norm for which ‖A_ε‖ < 1, so ‖A_ε^n‖ → 0. Since all matrix norms are equivalent, this also applies to whatever matrix norm we are dealing with. For large enough n, ‖A_ε^n‖ < 1, which implies

    ‖A^n‖ = (ρ(A) + ε)^n ‖A_ε^n‖ < (ρ(A) + ε)^n.
Remark: If a matrix A has spectral radius ρ, can ‖Av‖ ever be larger than ρ · ‖v‖? The answer is yes; for example

    [ 1  1 ] [ 1 ]   [ 2 ]
    [ 0  1 ] [ 1 ] = [ 1 ],    ρ(A) = 1.
What the theorem says is that in the long run, if you keep applying A over and
over again, on average the vector cannot grow by any factor larger than ρ.
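A NumPy sketch of this long-run behavior for the example above:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])       # rho(A) = 1
v = np.array([1.0, 1.0])

print(np.linalg.norm(A @ v, 2))  # sqrt(5): one step can grow by more than rho

# but the average growth factor per step tends to rho(A) = 1:
for n in (10, 100, 1000):
    g = np.linalg.norm(np.linalg.matrix_power(A, n) @ v, 2) ** (1.0 / n)
    print(n, g)                  # decreases toward 1 as n grows
```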
This is a subset of the complex plane which includes the spectrum. The numerical radius is

    r(A) = max_{z ∈ W(A)} |z|.
For example (the standard example with 2×2 nilpotent matrices), with

    A = [ 0  1 ]     B = [ 0  0 ]
        [ 0  0 ],        [ 1  0 ],

we get r(A) = r(B) = 1/2, r(AB) = 1, so r is not submultiplicative.
σ = (σ_1, · · · , σ_r)^T.

    p = 1:   ‖A‖_* = Σ_i σ_i       (the nuclear norm)
    p = 2:   ‖A‖_F = √(Σ_i σ_i^2)
The nuclear norm is new. The Schatten 2-norm turns out to be the Frobenius
norm. The Schatten ∞-norm is the matrix 2-norm.
Proof. We have A = U ΣV ∗ (SVD), and for both Frobenius norm and 2-norm
we proved earlier that kAk = kΣk. The rest is then obvious.
Sideline: It is interesting to consider what the Schatten 0-norm (which is not really
a norm) would be. This is the number of nonzero singular values, which is the rank
of A.
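The Schatten norms are straightforward to compute from the singular values; a NumPy sketch (the variable names are mine):

```python
import numpy as np

A = np.diag([3.0, 2.0, 0.0])            # singular values 3, 2; rank 2
sigma = np.linalg.svd(A, compute_uv=False)

nuclear = sigma.sum()                   # Schatten 1-norm: 3 + 2 = 5
frob    = np.sqrt((sigma**2).sum())     # Schatten 2-norm: sqrt(13)
two     = sigma.max()                   # Schatten infinity-norm: 3
rank    = int((sigma > 1e-12).sum())    # "Schatten 0-norm": number of nonzero sigma_i

print(np.isclose(nuclear, np.linalg.norm(A, 'nuc')))   # True
print(np.isclose(frob, np.linalg.norm(A, 'fro')))      # True
print(np.isclose(two, np.linalg.norm(A, 2)))           # True
print(rank == np.linalg.matrix_rank(A))                # True
```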
Lemma 5.28. If ‖A‖ < R for some matrix norm, or if ρ(A) < R, then the power series converges.
Note: If ρ(A) > R, the series will diverge. If kAk > R for some norm, that is
inconclusive, since there may be some other norm which is less than R.
Example: The power series for 1/(1 − x) is

    1/(1 − x) = Σ_{k=0}^∞ x^k,

with radius of convergence R = 1. So if ‖A‖ < 1 for some matrix norm, the series Σ_k A^k converges to (I − A)^{-1}, and

    ‖(I − A)^{-1}‖ ≤ 1/(1 − ‖A‖).
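A NumPy sketch of the Neumann series Σ_k A^k and the bound above, using the ∞-norm (largest row sum), which is submultiplicative:

```python
import numpy as np

A = np.array([[0.1, 0.2],
              [0.3, 0.1]])
a = np.linalg.norm(A, np.inf)          # max row sum = 0.4 < 1

# partial sums of the Neumann series sum_k A^k
S, term = np.zeros((2, 2)), np.eye(2)
for _ in range(200):
    S += term
    term = term @ A

inv = np.linalg.inv(np.eye(2) - A)
print(np.allclose(S, inv))                              # True
print(np.linalg.norm(inv, np.inf) <= 1.0 / (1.0 - a))   # True: 1.6 <= 1/0.6
```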
p(t) = a_n t^n + · · · + a_1 t + a_0.

Here the input x is the coefficient vector (a_n, …, a_0)^T, and the output y is the vector of zeros (t_1, …, t_n).
If we change the input a little, from x to x + ∆x, we get a different output
y+∆y. ∆x could be measurement error, or roundoff error from putting numbers
on a computer. The question is: How sensitive is the output to small changes
in input?
Usually, the relative error is more meaningful. A relative error of 10^-6 means that we can trust 6 decimals in the number.
The condition number of the problem is

    κ = sup_{∆x} (‖∆y‖/‖y‖) / (‖∆x‖/‖x‖).
• The condition number depends on the norm used. Different norms give
different condition numbers, but usually of the same order of magnitude.
• The condition number is an interesting concept, but most of the time you
cannot actually calculate it. One exception is in linear algebra.
Example: This example explains how you would use the condition number.
Suppose you are calculating something on a standard computer. The com-
puter works with fixed accuracy, usually equivalent to about 15 decimals of
accuracy.
Suppose you know that the condition number of your problem is 10^10.
Whenever you put your numbers on the computer, you have to assume a relative error ‖∆x‖/‖x‖ of at least 10^-15, because your input gets rounded to 15 decimals. In the worst case, that error will be magnified by 10^10, so your final relative error could be 10^-5. That means you can only trust 5 decimals in your answer.
Caution: This is assuming the worst case, both in the original rounding
and in the error propagation. Most of the time, your answer will be correct to
more decimals. The point is that you cannot trust any more than 5 decimals.
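A concrete illustration with the notoriously ill-conditioned Hilbert matrix (a sketch; the exact error you observe depends on the LAPACK build, but the order of magnitude matches the condition-number estimate):

```python
import numpy as np

n = 10
H = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
kappa = np.linalg.cond(H)       # about 1.6e13 for the 10x10 Hilbert matrix

x_true = np.ones(n)
b = H @ x_true                  # b carries ~1e-16 relative rounding error
x = np.linalg.solve(H, b)

rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
print(f"kappa = {kappa:.1e}, relative error = {rel_err:.1e}")
# with ~16 input digits and kappa ~ 1e13, only a few digits survive
```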
    Ax = b,
    A(x + ∆x) = b + ∆b,

which implies

    A ∆x = ∆b  ⇒  ∆x = A^{-1} ∆b.

    κ = max_{∆b} (‖∆x‖/‖x‖) / (‖∆b‖/‖b‖).
Now

    ‖b‖ ≤ ‖A‖ · ‖x‖  ⇒  ‖x‖ ≥ ‖b‖/‖A‖,   1/‖x‖ ≤ ‖A‖/‖b‖,

and

    ‖∆x‖ ≤ ‖A^{-1}‖ ‖∆b‖.

Combining the two,

    (‖∆x‖/‖x‖) / (‖∆b‖/‖b‖) ≤ ‖A‖ · ‖A^{-1}‖,

so κ ≤ ‖A‖ · ‖A^{-1}‖.
• This is the condition number for changes in the right-hand side b. You can also consider the condition number for changes in A. That turns out to be the same number, but that is a coincidence. If you consider the least squares solution of an overdetermined Ax = b, the condition numbers for changes in b and changes in A are different.
• The condition number for the forward problem “compute y = Ax” also
happens to be the same.
• As I said above, for different norms you get different condition numbers. For the 2-norm, κ = σ_1/σ_n (ratio of largest and smallest singular values).
• In theoretical linear algebra, a matrix is either singular or it is not. In practical linear algebra, there is no such thing as an exactly singular matrix, unless you have a matrix of small integers. What happens is that ‖A^{-1}‖, and therefore κ, gets so large that you have no digits of accuracy left, so your computations are basically meaningless.
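As a closing sketch, the identity κ = σ_1/σ_n for the 2-norm can be checked directly in NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))

sigma = np.linalg.svd(A, compute_uv=False)
kappa_norms = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
kappa_svd   = sigma[0] / sigma[-1]           # sigma_1 / sigma_n

print(np.isclose(kappa_norms, kappa_svd))            # True
print(np.isclose(kappa_svd, np.linalg.cond(A, 2)))   # True
```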