Lecture 1

Modelling the physical world is the raison d'être of classical mathematical analysis. The study of continuous functions and their properties, of differentiation and integration, has its origins in the attempt to describe how things move and behave. With the rise of technology it became increasingly important to get actual numbers out of formulae and equations. This is where numerical analysis comes into the scene: to develop methods that make mathematical models based on continuous mathematics effective.
In practice, one cannot simply plug numbers into formulae and compute everything one would like to. Most problems require an infinite number of steps to solve, and one only has a finite amount of time available. Most numerical data would also require an infinite amount of storage (just try to store π on a computer!), but a piece of paper or a computer has only so much space. We therefore have to work with approximations.¹
In the early 19th century, C.F. Gauss, one of the most influential mathematicians of all time and a pioneer of numerical analysis, developed the method of least squares in order to predict the reappearance of the recently discovered asteroid Ceres. He was well aware of the limitations of numerical computing. In his own words:

    Since none of the numbers which we take out from logarithmic and trigonometric tables admit of absolute precision, but are all to a certain extent approximate only, the results of all calculations performed by the aid of these numbers can only be approximately true.²
Nowadays, the established way of representing real numbers on computers is using floating-point arithmetic. In the double precision version of the IEEE standard for floating-point arithmetic, a number is represented using 64 bits.³ A number is written
$$x = f \cdot 2^e,$$
where $f$ is a fraction in $[0,1]$, represented using 52 bits, and $e$ is the exponent, using 11 bits (what is the remaining 64th bit used for?). Two things are worth noticing about this representation: there is a largest representable number, and there are gaps between representable numbers. The largest and smallest numbers representable in this form are of the order of $10^{\pm 308}$, enough for most practical purposes. A bigger concern are the gaps, which mean that the results of many computations almost always have to be rounded to the closest floating-point number.
Throughout this course, when going through calculations without using a computer, we will usually use the terminology of significant figures (s.f.) and work with 4 significant figures in base 10. For example, in base 10, $\sqrt{3}$ equals 1.732 to 4 significant figures.
¹ In discrete mathematics and combinatorics, approximation also becomes a necessity, albeit for a different reason, namely computational complexity. Many combinatorial problems are classified as NP-hard, which makes them computationally intractable.
² From: C.F. Gauss, Theoria motus corporum coelestium in sectionibus conicis solem ambientium, 1809.
³ A bit is a binary digit, that is, either 0 or 1.
To count the number of significant figures in a given number, start with the first non-zero digit from the left and, moving to the right, count all the digits thereafter, counting final zeros if they are to the right of the decimal point. For example, 1.2048, 12.040, 0.012048, 0.0012040 and 12040.0 all have 5 significant figures (s.f.). In rounding or truncation of a number to $n$ s.f., the original is replaced by a number with $n$ s.f. Note that final zeros to the left of the decimal point may or may not be significant: the number 1204000 has at least 4 significant figures, but without any more information there is no way of knowing whether or not any more figures are significant. When 1203970 is rounded to 5 significant figures to give 1204000, an explanation that this has 5 significant figures is required. This could be made clear by writing it in scientific notation: $1.2040 \times 10^6$. To say that $a = 1.2048$ to 5 s.f. means that the exact value of $a$ becomes 1.2048 after rounding to 5 s.f.; that is to say, $1.20475 \le a < 1.20485$.
Example 1.1. Suppose we want to find the solutions of the quadratic equation
$$ax^2 + bx + c = 0.$$
The two solutions to this problem are given by
$$x_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a}, \qquad x_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}. \tag{1.0.1}$$
In principle, to find $x_1$ and $x_2$ one only needs to evaluate these expressions for given $a, b, c$. Assume, however, that we are only allowed to compute to four significant figures, and consider the particular equation
$$x^2 + 39.7x + 0.13 = 0.$$
Using the formula (1.0.1), we have, always rounding to four significant figures,
$$a = 1, \quad b = 39.7, \quad c = 0.13,$$
$$b^2 = 1576.09 = 1576 \ (\text{to 4 s.f.}), \qquad 4ac = 0.52 \ (\text{to 4 s.f.}),$$
$$b^2 - 4ac = 1575.48 = 1575 \ (\text{to 4 s.f.}), \qquad \sqrt{b^2 - 4ac} = 39.69.$$
Hence, the computed solutions (to 4 significant figures) are given by
$$\bar{x}_1 = -0.005, \qquad \bar{x}_2 = -39.69.$$
The exact solutions, however, are
$$x_1 = -0.0032748\ldots, \qquad x_2 = -39.6907\ldots$$
The computed solution $\bar{x}_1$ is completely wrong, at least if we look at the relative error:
$$\frac{|\bar{x}_1 - x_1|}{|x_1|} = 0.5268.$$
While the accuracy can be increased by increasing the number of significant figures during the calculation, such effects happen all the time in scientific computing, and the possibility of such effects has to be taken into account when designing numerical algorithms.
Note that it makes sense, as in the above example, to look at errors in a relative sense. An error of one mile is certainly negligible when dealing with astronomical distances, but not so when measuring the length of a race track.
By analysing what causes the error it is sometimes possible to modify the method of calculation in order to improve the result. In the present example, the problems are caused by the fact that $b \approx \sqrt{b^2 - 4ac}$, and therefore
$$\frac{-b + \sqrt{b^2 - 4ac}}{2a} = \frac{-39.7 + 39.69}{2}$$
causes what is called catastrophic cancellation. A way out is provided by the observation that the two solutions are related by
$$x_1 x_2 = \frac{c}{a}. \tag{1.0.2}$$
When $b > 0$, the calculation of $x_2$ according to (1.0.1) doesn't cause any problems; in our case we get $-39.69$ to four significant figures. We can then use (1.0.2) to derive
$$x_1 = \frac{c}{a x_2} = -0.003275.$$
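As an illustration (not part of the original notes), the following MATLAB sketch compares the naive formula with the stable rearrangement in ordinary double precision; the coefficient $b$ is deliberately chosen large so that the cancellation is visible even without rounding to four significant figures.

% Sketch: cancellation in the quadratic formula (coefficients chosen for illustration).
a = 1; b = 1e8; c = 0.13;
x1_naive  = (-b + sqrt(b^2 - 4*a*c)) / (2*a);   % suffers from cancellation
x2        = (-b - sqrt(b^2 - 4*a*c)) / (2*a);   % no cancellation when b > 0
x1_stable = c / (a*x2);                          % uses x1*x2 = c/a instead
rel_err = abs(x1_naive - x1_stable) / abs(x1_stable)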
As we have seen, one can get around numerical catastrophes by choosing a clever method for solving a problem, rather than by increasing precision. So far we have considered errors introduced by rounding operations. There are other sources of errors:
(1) Overflow
(2) Errors in the model
(3) Human or measurement errors
(4) Truncation or approximation errors
The first is rarely an issue, as we can represent numbers of order $10^{\pm 308}$ on a computer. The second and third are important factors, but fall outside the scope of this lecture. The fourth has to do with the fact that many computations are done approximately rather than exactly. For computing the exponential, for example, we might use a method that gives the approximation
$$e^x \approx 1 + x + \frac{x^2}{2}.$$
As it turns out, many practical methods give approximations to the true solution.
Lecture 2
Efficiency
An important consideration in the design of numerical algorithms is efficiency. We would like to perform computations as fast as possible. Considerable speed-ups are possible by clever algorithm design that aims to reduce the number of arithmetic operations needed to perform a task.
To conveniently study the performance of algorithms we use the big-O notation. Given two functions $f(n)$ and $g(n)$ taking integer arguments, we say that
$$f(n) \in O(g(n)) \quad \text{or} \quad f(n) = O(g(n)),$$
if there exist constants $C > 0$ and $n_0 > 0$ such that $f(n) < C\, g(n)$ for all $n > n_0$. For example, $n\log(n) = O(n^2)$ and $n^3 + 10^{308}\, n \in O(n^3)$.
Example 2.1. Take, for example, the problem of evaluating a polynomial
$$p_n(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$$
for some $x \in \mathbb{R}$ and given coefficients $a_0, \ldots, a_n$. A naive strategy would be as follows:
(1) Compute $x, x^2, \ldots, x^n$,
(2) Multiply $a_k \cdot x^k$ for $k = 1, \ldots, n$,
(3) Sum all the terms.
If each of the $x^k$ is computed individually from scratch, the overall number of multiplications is $n(n+1)/2$, or $O(n^2)$. This can be improved to $2n - 1$ multiplications by computing the powers $x^k$, $1 \le k \le n$, iteratively. A smarter way, which also uses less intermediate storage, can be derived by observing that the polynomial can be written in the following form:
$$p_n(x) = a_0 + x\,(a_1 + a_2 x + a_3 x^2 + \cdots + a_n x^{n-1}) = a_0 + x\, p_{n-1}(x).$$
The polynomial in brackets has degree $n-1$, and once we have evaluated it, we only need one additional multiplication to have the value of $p_n(x)$. In the same way, $p_{n-1}(x)$ can be written as $p_{n-1}(x) = a_1 + x\, p_{n-2}(x)$ for a polynomial $p_{n-2}(x)$ of degree $n-2$. This suggests the possibility of recursion, leading to Horner's Algorithm. The following MATLAB code encodes the coefficients $a_0, \ldots, a_n$ as a vector with entries $a(1), \ldots, a(n+1)$.
p = a(n+1);
for k = n:-1:1
    p = a(k) + x*p;
end
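As a quick usage sketch (the coefficients below are made up for illustration), evaluating $p(x) = 1 + 2x + 3x^2$ at $x = 2$ with this loop gives 17:

a = [1 2 3];              % coefficients a_0, a_1, a_2
x = 2;  n = numel(a) - 1;
p = a(n+1);
for k = n:-1:1
    p = a(k) + x*p;       % Horner's recursion
end
% p is now 1 + 2*2 + 3*4 = 17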
This algorithm only requires $n$ multiplications, as opposed to the $n(n+1)/2$ of the naive approach. Horner's Method is the standard way of evaluating polynomials on computers.
An interesting and challenging field is algebraic complexity theory, which deals with lower bounds on the number of arithmetic operations needed to perform certain computational tasks. It also asks questions such as whether Horner's method and other algorithms are optimal, that is, can't be improved upon.
Interpolation
How do we represent a continuous function on a computer? If $f$ is a polynomial,
$$f(x) = p(x) = a_0 + a_1 x + \cdots + a_n x^n,$$
then we only need to store the $n+1$ coefficients $a_0, \ldots, a_n$. Maybe surprisingly, polynomials can represent a large variety of curves, as the following figure illustrates.
In fact, one can approximate an arbitrary continuous function on a bounded interval by a polynomial.
Theorem 2.1 (Weierstrass). For any $f \in C([0,1], \mathbb{R})$ and any $\varepsilon > 0$ there exists a polynomial $p(x)$ such that
$$\max_{0 \le x \le 1} \bigl| f(x) - p(x) \bigr| \le \varepsilon.$$
In this lecture we begin by studying the problem of polynomial interpolation. Given pairs $(x_j, y_j) \in \mathbb{R}^2$, $0 \le j \le n$, with distinct $x_j$, we would like to find a polynomial $p$ of smallest possible degree such that
$$p(x_j) = y_j, \quad 0 \le j \le n.$$

FIGURE 1. The interpolation problem.
The next lemma shows that it is indeed possible to find such a polynomial of degree at most $n$. We denote by $P_n$ the set of all polynomials of degree at most $n$. Note that this also includes polynomials of degree smaller than $n$, and in particular constants, since we allow coefficients such as $a_n$ in the representation $a_0 + a_1 x + \cdots + a_n x^n$ to be zero.
Lemma 2.1. There exist polynomials $L_k \in P_n$ such that
$$L_k(x_j) = \begin{cases} 1 & j = k, \\ 0 & j \ne k. \end{cases}$$
Moreover, the polynomial
$$p_n(x) = \sum_{k=0}^{n} L_k(x)\, y_k$$
is in $P_n$ and satisfies $p_n(x_j) = y_j$ for $0 \le j \le n$.
PROOF. Clearly, if $L_k$ exists, then it is a polynomial of degree $n$ with roots at the $x_j$ for $j \ne k$. Hence,
$$L_k(x) = C_k \prod_{j \ne k} (x - x_j) = C_k\,(x - x_0) \cdots (x - x_{k-1})(x - x_{k+1}) \cdots (x - x_n)$$
for a constant $C_k$. To determine $C_k$, set $x = x_k$. Then $L_k(x_k) = 1 = C_k \prod_{j \ne k} (x_k - x_j)$, and hence
$$C_k = \frac{1}{\prod_{j \ne k} (x_k - x_j)}.$$
Note that we assumed the $x_j$ to be distinct; otherwise we would have to divide by zero and cause a disaster. We therefore get the representation
$$L_k(x) = \frac{\prod_{j \ne k} (x - x_j)}{\prod_{j \ne k} (x_k - x_j)}.$$
This proves the first claim. Now set
$$p(x) = \sum_{k=0}^{n} y_k L_k(x).$$
Then $p(x_j) = \sum_{k=0}^{n} y_k L_k(x_j) = y_j L_j(x_j) = y_j$. Since $p(x)$ is a linear combination of the $L_k$, it lies in $P_n$. This completes the proof.
Lecture 3
We have shown the existence of an interpolating polynomial. We next show that this polynomial is uniquely determined.
THEOREM 3.1 (Lagrange Interpolation Theorem). Let $n \ge 0$. Let $x_j$, $0 \le j \le n$, be distinct real numbers and let $y_j$, $0 \le j \le n$, be any real numbers. Then there exists a unique $p(x) \in P_n$ such that
$$p(x_j) = y_j, \quad 0 \le j \le n. \tag{3.0.1}$$
PROOF. The case $n = 0$ is clear, so let's assume $n \ge 1$. In Lecture 2 we constructed a polynomial $p(x)$ of degree at most $n$ satisfying the conditions (3.0.1), proving the existence part. For the uniqueness, assume that we have two such polynomials $p(x)$ and $q(x)$ of degree at most $n$ satisfying the interpolation property (3.0.1). The goal is to show that they are the same. By assumption, the difference $p(x) - q(x)$ is a polynomial of degree at most $n$ that takes the value $p(x_j) - q(x_j) = y_j - y_j = 0$ at the $n+1$ distinct points $x_j$, $0 \le j \le n$. By the Fundamental Theorem of Algebra, a non-zero polynomial of degree $n$ can have at most $n$ distinct real roots, from which it follows that $p(x) - q(x) \equiv 0$, or $p(x) = q(x)$. This concludes the proof.
DEFINITION 3.0.1. Given $n+1$ distinct real numbers $x_j$, $0 \le j \le n$, and $n+1$ real numbers $y_j$, $0 \le j \le n$, the polynomial
$$p(x) = \sum_{k=0}^{n} L_k(x)\, y_k$$
is called the Lagrange interpolation polynomial of degree $n$ corresponding to the data points $(x_j, y_j)$, $0 \le j \le n$. If the $y_k$ are the values of a function $f$, that is, if $f(x_k) = y_k$, $0 \le k \le n$, then $p(x)$ is called the Lagrange interpolation polynomial associated to $f$ and $x_0, \ldots, x_n$.
A different take on the uniqueness problem can be arrived at by translating the problem into one of linear algebra. For this, note that if $p(x) = a_0 + a_1 x + \cdots + a_n x^n$, then the polynomial evaluation problem at the $x_j$, $0 \le j \le n$, can be written as a matrix–vector product:
$$\begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_0 & \cdots & x_0^n \\ 1 & x_1 & \cdots & x_1^n \\ \vdots & \vdots & & \vdots \\ 1 & x_n & \cdots & x_n^n \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{pmatrix},$$
or $y = Xa$. If the matrix $X$ is invertible, then the interpolating polynomial is uniquely determined by the coefficient vector $a = X^{-1} y$. The matrix $X$ is invertible if and only if
$\det(X) \ne 0$. The determinant of $X$ is the well-known Vandermonde determinant:
$$\det(X) = \det \begin{pmatrix} 1 & x_0 & \cdots & x_0^n \\ 1 & x_1 & \cdots & x_1^n \\ \vdots & \vdots & & \vdots \\ 1 & x_n & \cdots & x_n^n \end{pmatrix} = \prod_{j > i} (x_j - x_i).$$
Clearly, this determinant is different from zero if and only if the $x_j$ are all distinct, which shows the importance of this assumption.
EXAMPLE 3.1. Consider the function $f(x) = e^x$ on the interval $[-1, 1]$, with interpolation points $x_0 = -1$, $x_1 = 0$, $x_2 = 1$. The Lagrange basis functions are given by
$$L_0(x) = \frac{(x - x_1)(x - x_2)}{(x_0 - x_1)(x_0 - x_2)} = \frac{1}{2} x(x - 1), \qquad L_1(x) = 1 - x^2, \qquad L_2(x) = \frac{1}{2} x(x + 1).$$
The Lagrange interpolation polynomial is therefore given by
$$p(x) = \frac{1}{2} x(x-1)\, e^{-1} + (1 - x^2)\, e^{0} + \frac{1}{2} x(x+1)\, e^{1} = 1 + x \sinh(1) + x^2 (\cosh(1) - 1).$$

FIGURE 1. Lagrange interpolation of $e^x$ at $-1, 0, 1$.
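The following MATLAB sketch (an illustration, not part of the notes) reproduces this interpolant and measures its maximum error on $[-1, 1]$:

% Quadratic Lagrange interpolant of exp(x) at -1, 0, 1 (sketch).
p = @(x) 1 + sinh(1)*x + (cosh(1) - 1)*x.^2;
xx = linspace(-1, 1, 201);
max_err = max(abs(exp(xx) - p(xx)))    % maximum interpolation error on [-1,1]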
If the data points $(x_j, y_j)$ come from a function $f(x)$, that is, if $f(x_j) = y_j$, then the Lagrange interpolating polynomial can look very different from the original function. It is therefore of interest to have some control over the interpolation error $f(x) - p_n(x)$. Clearly, without any further assumptions on $f$ the difference can be arbitrary. We will therefore restrict attention to functions $f$ that are sufficiently smooth. What this means is codified in the following definition.
DEFINITION 3.0.2. Define $C^k([a,b], \mathbb{R})$ to be the set of functions $f : [a,b] \to \mathbb{R}$ that have $k$ continuous derivatives on $[a,b]$, and such that
$$\max_{a \le x \le b} |f^{(j)}(x)| < \infty$$
for $0 \le j \le k$, where $f^{(j)}(x)$ denotes the $j$-th derivative.
EXAMPLE 3.2. All polynomials belong to $C^k([a,b], \mathbb{R})$ for all bounded intervals $[a,b]$ and any integer $k \ge 0$. However, $f(x) = 1/x \notin C([0,1], \mathbb{R})$, as $f(x) \to \infty$ for $x \to 0$.
THEOREM 3.2. Let $n \ge 0$ and assume $f \in C^{n+1}([a,b], \mathbb{R})$. Let $p(x) \in P_n$ be the Lagrange interpolation polynomial associated to $f$ and distinct points $x_j$, $0 \le j \le n$. Then for every $x \in [a,b]$ there exists $\xi = \xi(x) \in (a,b)$ such that
$$f(x) - p(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}\, \pi_{n+1}(x),$$
where
$$\pi_{n+1}(x) = (x - x_0) \cdots (x - x_n).$$
Note that $\xi$ is in general not known, but the magnitude of $f - p$ can often be bounded as in the following corollary.
COROLLARY 3.1. Under the conditions of Theorem 3.2,
$$|f(x) - p(x)| \le \frac{M_{n+1}}{(n+1)!}\, |\pi_{n+1}(x)|,$$
where
$$M_{n+1} = \max_{a \le x \le b} |f^{(n+1)}(x)|.$$
For the proof of Theorem 3.2 we need the following consequence of Rolle's Theorem.
LEMMA 3.1. Let $f \in C^n([a,b], \mathbb{R})$, and suppose $f$ vanishes at $n+1$ distinct points $x_0, \ldots, x_n$. Then there exists $\xi \in (a,b)$ such that the $n$-th derivative satisfies $f^{(n)}(\xi) = 0$.
PROOF. By Rolle's Theorem, for any two consecutive zeros $x_i, x_{i+1}$ there exists a point in between where $f'$ vanishes; therefore $f'$ vanishes at (at least) $n$ points. Repeating this argument, it follows that $f^{(n)}$ vanishes at some point $\xi \in (a,b)$.
PROOF. (Theorem 3.2) Assume $x \ne x_j$ for $0 \le j \le n$ (otherwise the theorem is clearly true). Define the function
$$\varphi(t) = f(t) - p(t) - \frac{f(x) - p(x)}{\pi_{n+1}(x)}\, \pi_{n+1}(t).$$
This function vanishes at $n+2$ distinct points, namely $t = x_j$, $0 \le j \le n$, and $t = x$. Assume $n > 0$ (the case $n = 0$ is left as an exercise). By Lemma 3.1, the function $\varphi^{(n+1)}$ has a zero $\xi \in (a,b)$, while the $(n+1)$-st derivative of $p$ vanishes (since $p$ is a polynomial of degree at most $n$) and the $(n+1)$-st derivative of $\pi_{n+1}(t)$ is $(n+1)!$. We therefore have
$$0 = \varphi^{(n+1)}(\xi) = f^{(n+1)}(\xi) - \frac{f(x) - p(x)}{\pi_{n+1}(x)}\, (n+1)!,$$
from which we get
$$f(x) - p(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}\, \pi_{n+1}(x).$$
This completes the proof.
Lecture 4
Now that we have established the existence and uniqueness of the interpolation polynomial, we would like to know how well it approximates the function.
THEOREM 4.1. Let $n \ge 0$ and assume $f \in C^{n+1}([a,b], \mathbb{R})$. Let $p(x) \in P_n$ be the Lagrange interpolation polynomial associated to $f$ and distinct points $x_j$, $0 \le j \le n$. Then for every $x \in [a,b]$ there exists $\xi = \xi(x) \in (a,b)$ such that
$$f(x) - p(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}\, \pi_{n+1}(x), \tag{4.0.1}$$
where
$$\pi_{n+1}(x) = (x - x_0) \cdots (x - x_n).$$
For the proof of Theorem 4.1 we need the following consequence of Rolle's Theorem.
LEMMA 4.1. Let $f \in C^n([a,b], \mathbb{R})$, and suppose $f$ vanishes at $n+1$ distinct points $x_0, \ldots, x_n$. Then there exists $\xi \in (a,b)$ such that the $n$-th derivative satisfies $f^{(n)}(\xi) = 0$.
PROOF. By Rolle's Theorem, for any two consecutive zeros $x_i, x_{i+1}$ there exists a point in between where $f'$ vanishes; therefore $f'$ vanishes at (at least) $n$ points. Repeating this argument, it follows that $f^{(n)}$ vanishes at some point $\xi \in (a,b)$.
PROOF. (Theorem 4.1) Assume $x \ne x_j$ for $0 \le j \le n$ (otherwise the theorem is clearly true). Define the function
$$\varphi(t) = f(t) - p(t) - \frac{f(x) - p(x)}{\pi_{n+1}(x)}\, \pi_{n+1}(t).$$
This function vanishes at $n+2$ distinct points, namely $t = x_j$, $0 \le j \le n$, and $t = x$. Assume $n > 0$ (the case $n = 0$ is left as an exercise). By Lemma 4.1, the function $\varphi^{(n+1)}$ has a zero $\xi \in (a,b)$, while the $(n+1)$-st derivative of $p$ vanishes (since $p$ is a polynomial of degree at most $n$) and the $(n+1)$-st derivative of $\pi_{n+1}(t)$ is $(n+1)!$. We therefore have
$$0 = \varphi^{(n+1)}(\xi) = f^{(n+1)}(\xi) - \frac{f(x) - p(x)}{\pi_{n+1}(x)}\, (n+1)!,$$
from which we get
$$f(x) - p(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}\, \pi_{n+1}(x).$$
This completes the proof.
Theorem 4.1 contains an unspecified number $\xi$. Even though we can't find this location in practice, the situation is not too bad, as we can sometimes upper-bound the $(n+1)$-st derivative of $f$ on the interval $[a,b]$.
COROLLARY 4.1. Under the conditions of Theorem 4.1,
$$|f(x) - p(x)| \le \frac{M_{n+1}}{(n+1)!}\, |\pi_{n+1}(x)|,$$
where
$$M_{n+1} = \max_{a \le x \le b} |f^{(n+1)}(x)|.$$
PROOF. While we don't know $\xi$ in the expression (4.0.1), by assumption we do have the bound
$$|f^{(n+1)}(\xi)| \le M_{n+1},$$
so that
$$|f(x) - p(x)| = \frac{\bigl| f^{(n+1)}(\xi)\, \pi_{n+1}(x) \bigr|}{(n+1)!} \le \frac{M_{n+1}\, |\pi_{n+1}(x)|}{(n+1)!}.$$
This completes the proof.
EXAMPLE 4.1. Suppose we would like to approximate $f(x) = e^x$ by an interpolating polynomial $p \in P_1$ at points $x_0, x_1 \in [0,1]$ that are separated by a distance $h = x_1 - x_0$. What $h$ should we choose to achieve
$$|p(x) - e^x| \le 10^{-5}, \quad x_0 \le x \le x_1?$$
From Corollary 4.1, we get
$$|p(x) - e^x| \le \frac{M_2\, |\pi_2(x)|}{2},$$
where $M_2 = \max_{x_0 \le x \le x_1} |f^{(2)}(x)| \le e$ (because $x_0, x_1 \in [0,1]$ and $f^{(2)}(x) = e^x$) and $\pi_2(x) = (x - x_0)(x - x_1)$. To find the maximum of $|\pi_2(x)|$, first write
$$x = x_0 + \tau h, \qquad x_1 = x_0 + h$$
for $\tau \in [0,1]$. Then
$$|\pi_2(x)| = \tau h\,(h - \tau h) = h^2\, \tau (1 - \tau).$$
By taking derivatives with respect to $\tau$ we find that the maximum is attained at $\tau = 1/2$. Hence,
$$|\pi_2(x)| \le h^2 \cdot \frac{1}{2}\Bigl(1 - \frac{1}{2}\Bigr) = \frac{h^2}{4}.$$
We conclude that
$$|p(x) - e^x| \le \frac{h^2 e}{8}.$$
In order for this to fall below $10^{-5}$ we require $h \le \sqrt{8 \cdot 10^{-5}/e} \approx 5.425 \times 10^{-3}$. This gives information on how small the spacing of the points needs to be for linear interpolation to achieve a certain accuracy.
Convergence
For a given set of points $x_0, \ldots, x_n$ and function $f$, we have a bound on the interpolation error. Is it possible to make the error smaller by adding more interpolation points, or by modifying the distribution of these points? To be specific, consider the interval $[a,b]$ and let
$$x_j = a + \frac{j}{n}(b - a), \quad 0 \le j \le n,$$
be $n+1$ uniformly spaced points on $[a,b]$. Let $p_n(x)$ denote the Lagrange interpolation polynomial of degree $n$ for $f$ at the points $x_0, \ldots, x_n$. The question we ask is whether
$$\lim_{n \to \infty} \max_{a \le x \le b} |p_n(x) - f(x)| = 0,$$
that is, whether the error gets smaller as the number of interpolation points increases. Perhaps surprisingly, the answer is negative, as the following famous example, known as the Runge Phenomenon, shows.
EXAMPLE 4.2. Consider the function
$$f(x) = \frac{1}{1 + x^2}$$
on the interval $[-5, 5]$. This certainly looks like a reasonably smooth function that should be well approximated by interpolation. However, for this function the sequence $M_{n+1}\, |\pi_{n+1}(x)|$ grows faster than $(n+1)!$, and the error bound is therefore useless. In fact, the interpolation error itself grows rapidly with $n$: the figure below shows the interpolant for $n = 11$, which oscillates wildly near the ends of the interval.

FIGURE 1. The Runge phenomenon: $1/(1 + x^2)$ and its interpolant for $n = 11$.

The reason for this behaviour is that the complex function $z \mapsto 1/(1 + z^2)$ has poles at the imaginary values $\pm i$.
Lecture 5
Convergence
For a given set of points $x_0, \ldots, x_n$ and function $f$, we have a bound on the interpolation error. Is it possible to make the error smaller by adding more interpolation points, or by modifying the distribution of these points? The answer to this question can depend on two things: the class of functions considered, and the spacing of the points. Let $p_n(x)$ denote the Lagrange interpolation polynomial of degree $n$ for $f$ at the points $x_0, \ldots, x_n$. The question we ask is whether
$$\lim_{n \to \infty} \max_{a \le x \le b} |p_n(x) - f(x)| = 0.$$
Perhaps surprisingly, the answer is negative, as the following famous example, known as the Runge Phenomenon, shows.
Example 5.1. Consider the interval $[a,b]$ and let
$$x_j = a + \frac{j}{n}(b - a), \quad 0 \le j \le n,$$
be $n+1$ uniformly spaced points on $[a,b]$. Consider the function
$$f(x) = \frac{1}{1 + 25x^2}$$
on the interval $[-1, 1]$.

[Figure: interpolants of $1/(1 + 25x^2)$ at equispaced points on $[-1,1]$ for several increasing values of $n$.]
This function is smooth and appears unlikely to cause any trouble. However, when interpolating at various equispaced points for increasing $n$, we see that the interpolation error increases. The reason for this phenomenon lies in the behaviour of the complex function $z \mapsto 1/(1 + 25z^2)$, which has poles at $\pm i/5$.
The problem is not one of the interpolation method, but has to do with the spacing of the points.
Example 5.2. Let us revisit the function $1/(1 + 25x^2)$ and try to interpolate it at the Chebyshev points
$$x_j = \cos(j\pi/n), \quad 0 \le j \le n.$$
Calculating the interpolation error for this example shows a completely different result from the previous example. In fact, plotting the error and comparing it with the case of equispaced points shows that choosing the interpolation points in a clever way can be of huge benefit.

FIGURE 1. Interpolation error for equispaced and Chebyshev points.
To summarise, we have the following two observations (a small numerical sketch follows the list):
(1) To estimate the difference $|f(x) - p(x)|$ we need assumptions on the function $f$, for example, that it is sufficiently smooth.
(2) The location of the interpolation points $x_j$, $0 \le j \le n$, is crucial. Equispaced points can lead to unpleasant results!
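The following MATLAB sketch (an illustration, not part of the notes) compares the maximum interpolation error for equispaced and Chebyshev points; it uses polyfit/polyval for simplicity, which is adequate for moderate degrees.

% Interpolation error for equispaced vs Chebyshev points (sketch).
f = @(x) 1./(1 + 25*x.^2);
xx = linspace(-1, 1, 1001);          % fine grid for measuring the error
for n = [4 8 12]
    xe = linspace(-1, 1, n+1);       % equispaced nodes
    xc = cos((0:n)*pi/n);            % Chebyshev points
    pe = polyfit(xe, f(xe), n);      % interpolating polynomial, equispaced
    pc = polyfit(xc, f(xc), n);      % interpolating polynomial, Chebyshev
    err_e = max(abs(f(xx) - polyval(pe, xx)));
    err_c = max(abs(f(xx) - polyval(pc, xx)));
    fprintf('n = %2d   equispaced %8.2e   Chebyshev %8.2e\n', n, err_e, err_c);
end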
An alternative form
The representation as a Lagrange interpolation polynomial,
$$p(x) = \sum_{k=0}^{n} L_k(x)\, f(x_k),$$
has some drawbacks. On the one hand, it requires $O(n^2)$ operations to evaluate. Besides this, adding new interpolation points requires the recalculation of the Lagrange basis polynomials $L_k(x)$. Both of these problems can be remedied by rewriting the Lagrange interpolation formula.
Provided $x \ne x_j$ for $0 \le j \le n$, the Lagrange interpolation polynomial can be written as
$$p(x) = \frac{\sum_{k=0}^{n} \frac{w_k}{x - x_k} f(x_k)}{\sum_{k=0}^{n} \frac{w_k}{x - x_k}}, \tag{5.0.1}$$
where $w_k = 1 / \prod_{j \ne k} (x_k - x_j)$ are called the barycentric weights. Once the weights have been computed, the evaluation of this form only takes $O(n)$ operations, and updating it with a new interpolation point also only takes $O(n)$ operations. To derive this formula, define $L(x) = \prod_{k=0}^{n} (x - x_k)$ and note that $p(x) = L(x) \sum_{k=0}^{n} \frac{w_k}{x - x_k} f(x_k)$. Noting also that $1 = \sum_{k=0}^{n} L_k(x) = L(x) \sum_{k=0}^{n} \frac{w_k}{x - x_k}$ (see Problem 2.6) and dividing by this cleverly written form of one, Equation (5.0.1) follows. Finally, it can be shown that computing the barycentric Lagrange interpolation is numerically stable at points such as the Chebyshev points.
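A minimal MATLAB sketch of formula (5.0.1); the function name baryinterp is illustrative, and no claim is made that this is the implementation used in the course.

% Barycentric Lagrange interpolation (sketch).
function p = baryinterp(x, y, t)
    x = x(:).';  y = y(:).';           % work with row vectors
    n = numel(x);
    w = zeros(1, n);
    for k = 1:n                        % barycentric weights w_k = 1/prod_{j~=k}(x_k - x_j)
        w(k) = 1 / prod(x(k) - x([1:k-1, k+1:n]));
    end
    p = zeros(size(t));
    for i = 1:numel(t)
        d = t(i) - x;
        j = find(d == 0, 1);           % evaluation point coincides with a node
        if ~isempty(j)
            p(i) = y(j);
        else
            p(i) = sum(w./d .* y) / sum(w./d);
        end
    end
end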
Newton's divided differences
While the interpolation polynomial of degree at most $n$ for a function $f$ and $n+1$ points $x_0, \ldots, x_n$ is unique, it can appear in different forms. The one we have seen so far is the Lagrange form, where the polynomial is given as a linear combination of the Lagrange basis functions. A different way of representing the interpolation polynomial is by writing it in the form
$$p(x) = a_0 + a_1 (x - x_0) + \cdots + a_n (x - x_0) \cdots (x - x_{n-1}).$$
Provided we have the coefficients $a_0, \ldots, a_n$, evaluating the polynomial only requires $n$ multiplications using Horner's Method. Moreover, it is easy to add new points: if $x_{n+1}$ is added, the coefficients $a_0, \ldots, a_n$ don't need to be changed.
Lecture 6
Newton's divided differences
While the interpolation polynomial of degree at most $n$ for a function $f$ and $n+1$ points $x_0, \ldots, x_n$ is unique, it can appear in different forms. The one we have seen so far is the Lagrange form, where the polynomial is given as a linear combination of the Lagrange basis functions:
$$p(x) = \sum_{k=0}^{n} L_k(x)\, y_k,$$
or some modification of this form, such as the barycentric form (see Lecture 5). A different way of representing the interpolation polynomial is by writing it in the form
$$p(x) = a_0 + a_1 (x - x_0) + \cdots + a_n (x - x_0) \cdots (x - x_{n-1}). \tag{6.0.1}$$
Provided we have the coefficients $a_0, \ldots, a_n$, evaluating the polynomial only requires $n$ multiplications using Horner's Method. Moreover, it is easy to add new points: if $x_{n+1}$ is added, the coefficients $a_0, \ldots, a_n$ don't need to be changed.
We'll describe a method for computing the coefficients $a_0, \ldots, a_n$ using divided differences. The divided differences associated to the function $f$ and distinct points $x_0, \ldots, x_n \in \mathbb{R}$ are defined recursively as
$$f[x_i] := f(x_i),$$
$$f[x_i, x_{i+1}] := \frac{f[x_{i+1}] - f[x_i]}{x_{i+1} - x_i},$$
$$f[x_i, x_{i+1}, \ldots, x_{i+k}] := \frac{f[x_{i+1}, x_{i+2}, \ldots, x_{i+k}] - f[x_i, x_{i+1}, \ldots, x_{i+k-1}]}{x_{i+k} - x_i}.$$
The divided differences can be computed from a divided difference table, where we move from one column to the next by applying the rules above:

    x_0   f_0
                  f[x_0, x_1]
    x_1   f_1                    f[x_0, x_1, x_2]
                  f[x_1, x_2]                        f[x_0, x_1, x_2, x_3]
    x_2   f_2                    f[x_1, x_2, x_3]
                  f[x_2, x_3]
    x_3   f_3

From this table we also see that adding a new pair $(x_{n+1}, f_{n+1})$ would require an update of the table that takes $O(n)$ operations.
Theorem 6.1. Let $x_0, \ldots, x_n$ be distinct points. Then the interpolation polynomial for $f$ at the points $x_i, \ldots, x_{i+k}$ is given by
$$p_{i,k}(x) = f[x_i] + f[x_i, x_{i+1}](x - x_i) + f[x_i, x_{i+1}, x_{i+2}](x - x_i)(x - x_{i+1}) + \cdots + f[x_i, \ldots, x_{i+k}](x - x_i) \cdots (x - x_{i+k-1}).$$
In particular, the coefficients in Equation (6.0.1) are given by the divided differences
$$a_k = f[x_0, \ldots, x_k].$$
PROOF. The proof is by induction on $k$. For the case $k = 0$ we have $p_{i,0}(x) = f[x_i] = f(x_i)$, so the claim is true in this case. Assume the statement holds for some $k \ge 0$. Choose $a_{k+1}$ such that
$$p_{i,k+1}(x) = p_{i,k}(x) + a_{k+1}(x - x_i) \cdots (x - x_{i+k}) \tag{6.0.2}$$
interpolates $f$ at $x_i, \ldots, x_{i+k+1}$. Note that $p_{i,k+1}(x_j) = f(x_j)$ for $i \le j \le i+k$, so we only require $a_{k+1}$ to be chosen so that $p_{i,k+1}(x_{i+k+1}) = f(x_{i+k+1})$. Define
$$q(x) = \frac{(x - x_i)\, p_{i+1,k}(x) - (x - x_{i+k+1})\, p_{i,k}(x)}{x_{i+k+1} - x_i}. \tag{6.0.3}$$
This polynomial has degree at most $k+1$, just like $p_{i,k+1}(x)$. Moreover:
$$q(x_i) = p_{i,k}(x_i) = f(x_i),$$
$$q(x_{i+k+1}) = p_{i+1,k}(x_{i+k+1}) = f(x_{i+k+1}),$$
$$q(x_j) = \frac{(x_j - x_i) f(x_j) - (x_j - x_{i+k+1}) f(x_j)}{x_{i+k+1} - x_i} = f(x_j), \quad i+1 \le j \le i+k.$$
This means that $q(x)$ also interpolates $f$ at $x_i, \ldots, x_{i+k+1}$, and by the uniqueness of the interpolation polynomial, it must equal $p_{i,k+1}(x)$. Let's now compare the coefficients of $x^{k+1}$ in both polynomials. The coefficient of $x^{k+1}$ in $p_{i,k+1}$ is $a_{k+1}$, as can be seen from (6.0.2). By the induction hypothesis, the polynomials $p_{i+1,k}(x)$ and $p_{i,k}(x)$ have the form
$$p_{i+1,k}(x) = f[x_{i+1}, \ldots, x_{i+k+1}]\, x^k + \text{lower order terms},$$
$$p_{i,k}(x) = f[x_i, \ldots, x_{i+k}]\, x^k + \text{lower order terms}.$$
By plugging these into (6.0.3), we see that the coefficient of $x^{k+1}$ in $q(x)$ is
$$\frac{f[x_{i+1}, \ldots, x_{i+k+1}] - f[x_i, \ldots, x_{i+k}]}{x_{i+k+1} - x_i} = f[x_i, \ldots, x_{i+k+1}].$$
This coefficient has to equal $a_{k+1}$, and the claim follows.
Example 6.1. Let's find the divided difference form of a cubic interpolation polynomial for the points
$$(-1, 1), \quad (0, 1), \quad (3, 181), \quad (-2, -39).$$
The divided difference table (6.0.4) looks like

    j   x_j   f_j
    0   -1     1
                    f[x_0,x_1] = 0
    1    0     1                        f[x_0,x_1,x_2] = (60-0)/(3-(-1)) = 15
                    f[x_1,x_2] = 60                                             f[x_0,...,x_3] = (8-15)/(-2-(-1)) = 7
    2    3   181                        f[x_1,x_2,x_3] = (44-60)/(-2-0) = 8
                    f[x_2,x_3] = 44
    3   -2   -39

The coefficients $a_j = f[x_0, \ldots, x_j]$ are given by the upper diagonal; the interpolation polynomial is thus
$$p_3(x) = a_0 + a_1(x - x_0) + a_2(x - x_0)(x - x_1) + a_3(x - x_0)(x - x_1)(x - x_2) = 1 + 15x(x+1) + 7x(x+1)(x-3).$$
Now suppose we add another data point $(4, 801)$. This amounts to adding only one new term to the polynomial. The new coefficient $a_4 = f[x_0, \ldots, x_4]$ is calculated by adding a new line at the bottom of Table (6.0.4) as follows:
$$f_4 = 801, \quad f[x_3, x_4] = 140, \quad f[x_2, x_3, x_4] = 96, \quad f[x_1, \ldots, x_4] = 22, \quad f[x_0, \ldots, x_4] = a_4 = 3.$$
The updated polynomial is therefore
$$p_4(x) = 1 + 15x(x+1) + 7x(x+1)(x-3) + 3x(x+1)(x-3)(x+2).$$
Evaluating this polynomial can be done conveniently using Horner's method,
$$p_4(x) = 1 + (x+1)\Bigl(0 + x\bigl(15 + (x-3)\bigl(7 + 3(x+2)\bigr)\bigr)\Bigr),$$
using only four multiplications.
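The computation of the divided difference table and the nested evaluation can be sketched in MATLAB as follows; the in-place update of the coefficient vector is one standard way to organise the table and is an illustration, not code from the notes.

% Newton divided differences and nested evaluation (sketch).
x = [-1 0 3 -2];  y = [1 1 181 -39];        % data from Example 6.1
n = numel(x);
c = y;                                      % c(k+1) will hold f[x_0,...,x_k]
for k = 2:n
    for i = n:-1:k                          % update column k of the table in place
        c(i) = (c(i) - c(i-1)) / (x(i) - x(i-k+1));
    end
end
% c = [1 0 15 7], the upper diagonal of the divided difference table
t = 1.5;                                    % evaluate p_3 at some point t
p = c(n);
for k = n-1:-1:1                            % Horner-type nested evaluation
    p = c(k) + (t - x(k))*p;
end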
Another thing to notice is that the order of the $x_i$ plays a role in assembling the Newton interpolation polynomial, while the order did not play a role in Lagrange interpolation. Recall the characterisation of the interpolation polynomial in terms of the Vandermonde matrix from Lecture 3. The coefficients $a_i$ of the Newton divided difference form can also be derived as the solution of a system of linear equations, this time in convenient lower triangular form:
$$\begin{pmatrix} f_0 \\ f_1 \\ f_2 \\ \vdots \\ f_n \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & x_1 - x_0 & 0 & \cdots & 0 \\ 1 & x_2 - x_0 & (x_2 - x_0)(x_2 - x_1) & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ 1 & x_n - x_0 & (x_n - x_0)(x_n - x_1) & \cdots & \prod_{j < n} (x_n - x_j) \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}.$$
Newton interpolation polynomial:
$$p(x) = f[x_0] + f[x_0, x_1](x - x_0) + \cdots + f[x_0, \ldots, x_n](x - x_0) \cdots (x - x_{n-1}).$$
Divided differences:
$$f[x_i] := f(x_i),$$
$$f[x_i, x_{i+1}] := \frac{f[x_{i+1}] - f[x_i]}{x_{i+1} - x_i},$$
$$f[x_i, x_{i+1}, \ldots, x_{i+k}] := \frac{f[x_{i+1}, x_{i+2}, \ldots, x_{i+k}] - f[x_i, x_{i+1}, \ldots, x_{i+k-1}]}{x_{i+k} - x_i}.$$
Computing the divided differences:

    x_0   f_0
                  f[x_0, x_1]
    x_1   f_1                    f[x_0, x_1, x_2]
                  f[x_1, x_2]                        f[x_0, x_1, x_2, x_3]
    x_2   f_2                    f[x_1, x_2, x_3]
                  f[x_2, x_3]
    x_3   f_3
Lecture 7
Integration and Quadrature
We are interested in the problem of computing an integral
$$\int_a^b f(x)\, dx.$$
If possible, one computes the antiderivative $F(x)$ (that is, the function such that $F'(x) = f(x)$) and then obtains the integral as $F(b) - F(a)$. However, it is not always possible to compute the antiderivative in closed form, as in the cases
$$\int_0^1 e^{-x^2}\, dx, \qquad \int_0^{\infty} \cos(x^2)\, dx.$$
The problem we are then facing is to approximate such integrals numerically as well as possible.
EXAMPLE 7.1. The Trapezium Rule seeks to approximate the integral, interpreted as the area under the curve, by the area of a trapezium defined by the graph of the function.

[Figure: one step of the Trapezium Rule.]

Assume we want to approximate the integral between $x_0$ and $x_1$, and that $x_1 - x_0 = h$. Then the trapezium approximation is given by
$$\int_{x_0}^{x_1} f(x)\, dx \approx I(f) = \frac{h}{2} \bigl( f(x_0) + f(x_1) \bigr),$$
as can be easily verified. The Trapezium Rule can be interpreted as integrating the linear interpolant of $f$ at the points $x_0$ and $x_1$. The linear interpolant is given by
$$p_1(x) = \frac{x - x_1}{x_0 - x_1} f(x_0) + \frac{x - x_0}{x_1 - x_0} f(x_1).$$
Integrating this function gives rise to the representation as the area of a trapezium:
$$\int_{x_0}^{x_1} p_1(x)\, dx = \frac{h}{2} \bigl( f(x_0) + f(x_1) \bigr).$$
Using the interpolation error, we can derive the integration error for the Trapezium Rule. We claim that
$$\int_{x_0}^{x_1} f(x)\, dx = \int_{x_0}^{x_1} p_1(x)\, dx - \frac{1}{12} h^3 f''(\xi)$$
for some $\xi \in (x_0, x_1)$. To derive this, recall that the interpolation error is given by
$$f(x) = p_1(x) + \frac{(x - x_0)(x - x_1)}{2!}\, f''(\xi(x))$$
for some $\xi(x) \in (x_0, x_1)$. We can therefore write the integral as
$$\int_{x_0}^{x_1} f(x)\, dx = \int_{x_0}^{x_1} p_1(x)\, dx + \frac{1}{2} \int_{x_0}^{x_1} (x - x_0)(x - x_1)\, f''(\xi(x))\, dx.$$
By the Integral Mean Value Theorem (note that $(x - x_0)(x - x_1)$ does not change sign on $(x_0, x_1)$), there exists a $\xi \in (x_0, x_1)$ such that
$$\int_{x_0}^{x_1} (x - x_0)(x - x_1)\, f''(\xi(x))\, dx = f''(\xi) \int_{x_0}^{x_1} (x - x_0)(x - x_1)\, dx.$$
Writing out the integral on the right-hand side, we get
$$\int_{x_0}^{x_1} (x - x_0)(x - x_1)\, dx = \int_{x_0}^{x_1} \bigl( x^2 - (x_0 + x_1) x + x_0 x_1 \bigr)\, dx = \left[ \frac{x^3}{3} - \frac{(x_0 + x_1) x^2}{2} + x_0 x_1 x \right]_{x_0}^{x_1} = \frac{(x_0 - x_1)^3}{6} = -\frac{h^3}{6}.$$
For the whole expression we therefore get
$$\int_{x_0}^{x_1} f(x)\, dx = \int_{x_0}^{x_1} p_1(x)\, dx - \frac{1}{12} h^3 f''(\xi),$$
as claimed.
EXAMPLE 7.2. Let's compute the integral
$$\int_1^2 \frac{1}{1 + x}\, dx.$$
The antiderivative is $\ln(1 + x)$, so we get the exact expression $\ln(3) - \ln(2) = \ln(1.5) \approx 0.4055$ for this integral. Using the Trapezium Rule we get
$$I = \frac{2 - 1}{2} \bigl( f(1) + f(2) \bigr) = \frac{1}{2} \left( \frac{1}{2} + \frac{1}{3} \right) = \frac{5}{12} = 0.4167$$
to four significant figures.
Lecture 8
The trapezium rule is an example of a quadrature rule. A quadrature rule seeks to approximate an integral by a weighted sum of function values,
$$\int_a^b f(x)\, dx \approx \sum_{k=0}^{n} w_k f(x_k),$$
where the $x_k$ are the quadrature nodes and the $w_k$ are called the quadrature weights.
A Newton–Cotes scheme of order $n$ uses the Lagrange basis functions to construct the quadrature weights. Given nodes $x_k = a + kh$, $0 \le k \le n$, where $h = (b - a)/n$, the integral is approximated by the integral of the Lagrange interpolant of degree $n$ at these points. If
$$p_n = \sum_{k=0}^{n} L_k f(x_k),$$
then
$$\int_a^b f(x)\, dx \approx I_n(f) := \int_a^b p_n(x)\, dx = \sum_{k=0}^{n} w_k f(x_k),$$
where $w_k = \int_a^b L_k(x)\, dx$.
EXAMPLE 8.1. The Newton–Cotes rule of degree $n = 2$ is called Simpson's rule. Let $x_0 = a$, $x_2 = b$ and $x_1 = (a + b)/2$. Define $h := x_1 - x_0 = (b - a)/2$. The quadratic interpolation polynomial is given by
$$p_2(x) = \frac{(x - x_1)(x - x_2)}{(x_0 - x_1)(x_0 - x_2)} f(x_0) + \frac{(x - x_0)(x - x_2)}{(x_1 - x_0)(x_1 - x_2)} f(x_1) + \frac{(x - x_1)(x - x_0)}{(x_2 - x_0)(x_2 - x_1)} f(x_2).$$
We claim that
$$I_2(f) = \int_a^b p_2(x)\, dx = \frac{h}{3} \bigl( f(x_0) + 4 f(x_1) + f(x_2) \bigr).$$
To show this, we make the substitution $x = a + \tau h$ for $\tau \in (0, 2)$. Then the integral becomes
$$\int_a^b p_2(x)\, dx = h \int_0^2 \left( \frac{(\tau - 1)(\tau - 2)}{(-h)(-2h)}\, h^2 f(x_0) + \frac{\tau(\tau - 2)}{h \cdot (-h)}\, h^2 f(x_1) + \frac{\tau(\tau - 1)}{(2h) \cdot h}\, h^2 f(x_2) \right) d\tau$$
$$= h \left( \frac{1}{3} f(x_0) + \frac{4}{3} f(x_1) + \frac{1}{3} f(x_2) \right) = \frac{h}{3} \bigl( f(x_0) + 4 f(x_1) + f(x_2) \bigr).$$
This shows the claim. As in the case of the trapezium rule, a detailed error analysis shows that
$$\int_a^b f(x)\, dx = I_2(f) - \frac{1}{2880} (b - a)^5 f^{(4)}(\xi)$$
for some $\xi \in (a, b)$. From this we derive the error bound
$$E_2(f) = \left| \int_a^b f(x)\, dx - I_2(f) \right| \le \frac{1}{2880} (b - a)^5 M_4,$$
where $M_4$ is an upper bound on the absolute value of the fourth derivative on the interval $[a, b]$.
[Figures: one step of the Trapezium Rule and one step of Simpson's Rule.]
EXAMPLE 8.2. For the function $f(x) = 1/(1 + x)$ on $[1, 2]$, Simpson's rule gives the approximation
$$I_2(f) = \frac{1}{6} \left( \frac{1}{2} + \frac{8}{5} + \frac{1}{3} \right) \approx 0.4056.$$
This is much closer to the true value $0.4055$ than what the trapezium rule provides.
Composite integration rules
The trapezium rule uses only two points to approximate an integral, certainly not enough for most applications. There are different ways to make use of more points and function values in order to increase precision. One way, as we have just seen with the Newton–Cotes scheme and Simpson's rule, is to use higher-order interpolants. A different direction is to subdivide the interval into smaller intervals and use lower-order schemes, like the trapezium rule, on these smaller intervals. For this, we subdivide the integral
$$\int_a^b f(x)\, dx = \sum_{j=0}^{n-1} \int_{x_j}^{x_{j+1}} f(x)\, dx,$$
where $x_0 = a$ and $x_j = a + jh$ for $0 \le j \le n$, with $h = (b - a)/n$. The composite trapezium rule then approximates each of these integrals using the trapezium rule:
$$\int_a^b f(x)\, dx \approx \frac{h}{2} \bigl( f(x_0) + f(x_1) \bigr) + \frac{h}{2} \bigl( f(x_1) + f(x_2) \bigr) + \cdots + \frac{h}{2} \bigl( f(x_{n-1}) + f(x_n) \bigr)$$
$$= h \left( \tfrac{1}{2} f(x_0) + f(x_1) + \cdots + f(x_{n-1}) + \tfrac{1}{2} f(x_n) \right).$$
EXAMPLE 8.3. Let's look again at the function $f(x) = 1/(1 + x)$ and apply the composite trapezium rule with $h = 0.1$ on the interval $[1, 2]$ (that is, with $n = 10$). Then
$$\int_1^2 \frac{1}{1 + x}\, dx \approx 0.1 \left( \frac{1}{4} + \frac{1}{2.1} + \cdots + \frac{1}{2.9} + \frac{1}{6} \right) = 0.4056.$$
Recall that the exact integral was $0.4055$.
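A minimal MATLAB sketch of the composite trapezium rule applied to this example (illustrative, not part of the notes):

% Composite trapezium rule (sketch).
f = @(x) 1./(1 + x);
a = 1; b = 2; n = 10;                       % h = 0.1, as in Example 8.3
x = linspace(a, b, n+1);
h = (b - a)/n;
T = h*(sum(f(x)) - 0.5*f(a) - 0.5*f(b));    % h*(f0/2 + f1 + ... + f_{n-1} + fn/2)
err = abs(T - log(1.5))                     % compare with the exact value ln(1.5)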
For the error analysis, we claim that
$$\int_a^b f(x)\, dx = h \left( \tfrac{1}{2} f(x_0) + f(x_1) + \cdots + f(x_{n-1}) + \tfrac{1}{2} f(x_n) \right) - \frac{1}{12} h^2 (b - a) f''(\xi)$$
for some $\xi \in (a, b)$. To see this, note that from the error analysis of the trapezium rule we get, for some $\xi_j \in (x_j, x_{j+1})$,
$$\int_a^b f(x)\, dx = \sum_{j=0}^{n-1} \int_{x_j}^{x_{j+1}} f(x)\, dx = \sum_{j=0}^{n-1} \left( \frac{h}{2} \bigl( f(x_j) + f(x_{j+1}) \bigr) - \frac{1}{12} h^3 f''(\xi_j) \right)$$
$$= h \left( \tfrac{1}{2} f(x_0) + f(x_1) + \cdots + f(x_{n-1}) + \tfrac{1}{2} f(x_n) \right) - \frac{1}{12} h^3 \sum_{j=0}^{n-1} f''(\xi_j).$$
Clearly the values $f''(\xi_j)$ lie between the minimum and maximum of $f''$ on the interval $(a, b)$, and so their average is also bounded by
$$\min_{x \in [a,b]} f''(x) \le \frac{1}{n} \sum_{j=0}^{n-1} f''(\xi_j) \le \max_{x \in [a,b]} f''(x).$$
Since $f''$ is continuous, by the Intermediate Value Theorem there exists a $\xi \in (a, b)$ at which this average is attained:
$$\frac{1}{n} \sum_{j=0}^{n-1} f''(\xi_j) = f''(\xi).$$
Therefore,
$$-\frac{1}{12} h^3 \sum_{j=0}^{n-1} f''(\xi_j) = -\frac{1}{12} h^2 (nh)\, f''(\xi) = -\frac{1}{12} h^2 (b - a) f''(\xi),$$
where we used that $h = (b - a)/n$. This is the claimed expression for the error.
EXAMPLE 8.4. Consider the function $f(x) = e^{-x}/x$ and the integral
$$\int_1^2 \frac{e^{-x}}{x}\, dx.$$
What choice of parameter $h$ will ensure that the approximation error of the composite trapezium rule is below $10^{-5}$? Let $M_2$ denote an upper bound on the second derivative of $f(x)$. The approximation error for the composite trapezium rule with step length $h$ is bounded by
$$E(f) \le \frac{1}{12} (b - a)\, h^2 M_2.$$
We can find $M_2$ by calculating the derivatives of $f$:
$$f'(x) = -e^{-x} \left( \frac{1}{x} + \frac{1}{x^2} \right), \qquad f''(x) = e^{-x} \left( \frac{1}{x} + \frac{2}{x^2} + \frac{2}{x^3} \right).$$
On $[1, 2]$ the second derivative $f''(x)$ attains its maximum at $x = 1$, and the value is $M_2 = 5/e \approx 1.84$. On the interval $[1, 2]$ we therefore have the bound
$$E(f) \le \frac{1}{12} \cdot 1.84 \cdot h^2 \cdot (2 - 1) = 0.1533\, h^2.$$
If we choose, for example, $h = 0.005$ (this corresponds to taking $n = 200$ steps), then the error is bounded by $3.83 \times 10^{-6}$.
Some Quadrature Rules for $\int_a^b f(x)\, dx$.

Trapezium rule:
$$T(f) = \frac{b - a}{2} \bigl( f(a) + f(b) \bigr). \qquad \text{Error bound: } \frac{1}{12} (b - a)^3 M_2.$$
Simpson's rule:
$$S(f) = \frac{b - a}{6} \bigl( f(a) + 4 f((a + b)/2) + f(b) \bigr). \qquad \text{Error bound: } \frac{1}{2880} (b - a)^5 M_4.$$
Composite trapezium ($h = (b - a)/n$, $x_j = a + jh$):
$$T_c(f) = h \left( \tfrac{1}{2} f(x_0) + f(x_1) + \cdots + f(x_{n-1}) + \tfrac{1}{2} f(x_n) \right). \qquad \text{Error bound: } \frac{1}{12} h^2 (b - a) M_2.$$
Composite Simpson ($h = (b - a)/2m$, $x_j = a + jh$):
$$S_c(f) = \frac{h}{3} \bigl( f(x_0) + 4 f(x_1) + 2 f(x_2) + \cdots + 2 f(x_{2m-2}) + 4 f(x_{2m-1}) + f(x_{2m}) \bigr). \qquad \text{Error bound: } \frac{1}{180} h^4 (b - a) M_4.$$
$M_k$ denotes an upper bound on $|f^{(k)}(x)|$, the $k$-th derivative of $f$ on the interval $[a, b]$. The error bounds are derived from the corresponding error bounds of the interpolation polynomials of $f$.
Error Analysis of Simpson's Rule
Simpson's rule approximates the integral $\int_a^b f(x)\, dx$ by integrating the quadratic interpolation polynomial of $f(x)$ at the points $a$, $(a + b)/2$, $b$.
THEOREM 1.1. Let $f \in C^4([a,b], \mathbb{R})$. Then there exists a $\xi \in (a, b)$ such that
$$E(f) := \int_a^b f(x)\, dx - \frac{b - a}{6} \bigl( f(a) + 4 f((a + b)/2) + f(b) \bigr) = -\frac{(b - a)^5}{2880} f^{(4)}(\xi).$$
PROOF. The proof is based on Chapter 7 of Süli and Mayers, An Introduction to Numerical Analysis. Consider the change of variable
$$x(t) = \frac{a + b}{2} + \frac{b - a}{2}\, t, \quad t \in [-1, 1].$$
Define $F(t) = f(x(t))$. In terms of this function, the integration error is written as
$$\int_a^b f(x)\, dx - \frac{b - a}{6} \bigl( f(a) + 4 f((a+b)/2) + f(b) \bigr) = \frac{b - a}{2} \left( \int_{-1}^{1} F(\tau)\, d\tau - \frac{1}{3} \bigl( F(-1) + 4 F(0) + F(1) \bigr) \right).$$
Define
$$G(t) = \int_{-t}^{t} F(\tau)\, d\tau - \frac{t}{3} \bigl( F(-t) + 4 F(0) + F(t) \bigr)$$
for $t \in [-1, 1]$. In particular, $(b - a)\, G(1)/2$ is the integration error we are trying to estimate. Consider the function
$$H(t) = G(t) - t^5 G(1).$$
Notice that $H(0) = H(1) = 0$ and that $H'(0) = H''(0) = 0$, so applying Rolle's Theorem repeatedly we find that there exists a $\mu \in (0, 1)$ such that
$$H^{(3)}(\mu) = 0.$$
Note that the third derivative of $G$ is given by $G^{(3)}(t) = -\frac{t}{3} \bigl( F^{(3)}(t) - F^{(3)}(-t) \bigr)$, from which it follows that
$$H^{(3)}(\mu) = -\frac{\mu}{3} \bigl( F^{(3)}(\mu) - F^{(3)}(-\mu) \bigr) - 60 \mu^2 G(1).$$
By the Mean Value Theorem, there exists a $\xi \in (-\mu, \mu)$ such that $F^{(3)}(\mu) - F^{(3)}(-\mu) = 2\mu\, F^{(4)}(\xi)$, and hence
$$H^{(3)}(\mu) = -\frac{2\mu^2}{3} \bigl( F^{(4)}(\xi) + 90\, G(1) \bigr).$$
Since $H^{(3)}(\mu) = 0$ and $\mu \ne 0$, it follows that
$$G(1) = -\frac{1}{90} F^{(4)}(\xi) = -\frac{(b - a)^4}{1440} f^{(4)}(\xi).$$
Multiplying by $(b - a)/2$ gives the claimed expression for the error. This finishes the proof.
Lecture 9
Composite integration rules subdivide the interval of integration and apply a quadrature rule to each subinterval:
$$\int_a^b f(x)\, dx = \sum_{j=0}^{n-1} \int_{x_j}^{x_{j+1}} f(x)\, dx,$$
where $x_0 = a$ and $x_j = a + jh$ for $0 \le j \le n$. The composite trapezium rule then approximates each of these integrals using the trapezium rule:
$$\int_a^b f(x)\, dx \approx h \left( \tfrac{1}{2} f(x_0) + f(x_1) + \cdots + f(x_{n-1}) + \tfrac{1}{2} f(x_n) \right).$$
To derive the composite version of Simpson's rule, we subdivide the interval $[a, b]$ into $2m$ subintervals and set $h = (b - a)/2m$, $x_j = a + jh$, $0 \le j \le 2m$. Then
$$\int_a^b f(x)\, dx = \sum_{j=1}^{m} \int_{x_{2j-2}}^{x_{2j}} f(x)\, dx.$$
Applying Simpson's rule to each of these integrals, we arrive at the expression
$$\frac{h}{3} \bigl( f(x_0) + 4 f(x_1) + 2 f(x_2) + \cdots + 2 f(x_{2m-2}) + 4 f(x_{2m-1}) + f(x_{2m}) \bigr).$$
Using an error analysis similar to that of the composite trapezium rule, one obtains the error bound
$$\frac{b - a}{180}\, h^4 M_4,$$
where $M_4$ is an upper bound on the absolute value of the fourth derivative of $f$ on $[a, b]$.
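A minimal MATLAB sketch of the composite Simpson rule; the integrand below is taken from Example 8.4 and is an assumption made for illustration only.

% Composite Simpson rule (sketch).
f = @(x) exp(-x)./x;                  % integrand from Example 8.4 (assumed)
a = 1; b = 2; m = 5;                  % 2m subintervals
h = (b - a)/(2*m);
x = a + (0:2*m)*h;
w = 2*ones(1, 2*m+1);                 % weights 1, 4, 2, 4, ..., 2, 4, 1
w(2:2:2*m) = 4;  w(1) = 1;  w(end) = 1;
S = h/3 * sum(w .* f(x))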
We restrict our error analysis to the composite trapezium rule; the analysis for Simpson's rule is similar but more involved. We claim that
$$\int_a^b f(x)\, dx = h \left( \tfrac{1}{2} f(x_0) + f(x_1) + \cdots + f(x_{n-1}) + \tfrac{1}{2} f(x_n) \right) - \frac{1}{12} h^2 (b - a) f''(\xi)$$
for some $\xi \in (a, b)$. To see this, note that from the error analysis of the trapezium rule we get, for some $\xi_j \in (x_j, x_{j+1})$,
$$\int_a^b f(x)\, dx = \sum_{j=0}^{n-1} \int_{x_j}^{x_{j+1}} f(x)\, dx = \sum_{j=0}^{n-1} \left( \frac{h}{2} \bigl( f(x_j) + f(x_{j+1}) \bigr) - \frac{1}{12} h^3 f''(\xi_j) \right)$$
$$= h \left( \tfrac{1}{2} f(x_0) + f(x_1) + \cdots + f(x_{n-1}) + \tfrac{1}{2} f(x_n) \right) - \frac{1}{12} h^3 \sum_{j=0}^{n-1} f''(\xi_j).$$
Clearly the values $f''(\xi_j)$ lie between the minimum and maximum of $f''$ on the interval $(a, b)$, and so their average is also bounded by
$$\min_{x \in [a,b]} f''(x) \le \frac{1}{n} \sum_{j=0}^{n-1} f''(\xi_j) \le \max_{x \in [a,b]} f''(x).$$
Since $f''$ is continuous, by the Intermediate Value Theorem there exists a $\xi \in (a, b)$ at which this average is attained:
$$\frac{1}{n} \sum_{j=0}^{n-1} f''(\xi_j) = f''(\xi).$$
Therefore,
$$-\frac{1}{12} h^3 \sum_{j=0}^{n-1} f''(\xi_j) = -\frac{1}{12} h^2 (nh)\, f''(\xi) = -\frac{1}{12} h^2 (b - a) f''(\xi),$$
where we used that $h = (b - a)/n$. This is the claimed expression for the error.
EXAMPLE 9.1. Having an error of order $h^2$ means that, every time we halve the step size (or, equivalently, double the number of points), the error decreases by a factor of 4. More precisely, for $n = 2^k$ we get an error $E(f) \propto 2^{-2k}$. Looking at the example function $f(x) = 1/(1 + x)$ and applying the composite trapezium rule, we get a linear relationship between the logarithm of the number of points, $\log n$, and the logarithm of the error. The fitted line has a slope of $-1.9935 \approx -2$, as expected from the theory.

[Figure: error of the composite trapezium rule against the number of steps, on a log-log scale.]
Summarising, we have seen the following integration schemes with their corresponding error bounds:
Trapezium: $\frac{1}{12} (b - a)^3 M_2$.
Composite trapezium: $\frac{1}{12} h^2 (b - a) M_2$.
Simpson: $\frac{1}{2880} (b - a)^5 M_4$.
Composite Simpson: $\frac{1}{180} h^4 (b - a) M_4$.
We conclude the section on quadrature with a definition of the degree of precision of a quadrature rule.
DEFINITION 9.0.1. A quadrature rule $I(f)$ has degree of precision $k$ if it evaluates polynomials of degree at most $k$ exactly. That is,
$$I(x^j) = \int_a^b x^j\, dx = \frac{1}{j + 1} \bigl( b^{j+1} - a^{j+1} \bigr), \quad 0 \le j \le k.$$
For example, it is easy to show that the trapezium rule has degree of precision 1 (it integrates 1 and $x$ exactly), while Simpson's rule has degree of precision 3 (rather than 2, as one might expect!). In general, Newton–Cotes quadrature of degree $n$ has degree of precision $n$ if $n$ is odd, and $n + 1$ if $n$ is even.
Numerical Linear Algebra
Problems in numerical analysis can often be formulated in terms of linear algebra. For example, the discretisation of partial differential equations leads to problems involving large systems of linear equations. The basic problem in linear algebra is to solve a system of linear equations
$$Ax = b, \tag{9.0.1}$$
where
$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}$$
is an $m \times n$ matrix with real entries, and
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}$$
are vectors. We will often deal with the case $m = n$.
There are two main classes of methods for solving such systems.
(1) Direct methods attempt to solve (9.0.1) using a finite number of operations. An example is the well-known Gaussian elimination algorithm.
(2) Iterative methods generate a sequence $x_0, x_1, \ldots$ of vectors in the hope that $x_k$ converges to a solution $x$ of (9.0.1) as $k \to \infty$.
Direct methods generally work well for dense matrices and moderately large $n$. Iterative methods work well with sparse matrices, that is, matrices with very few non-zero entries $a_{ij}$, and large $n$.
An $m \times n$ matrix $A$ is given by
$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix},$$
where $a_{ij}$ denotes the entry in the $i$-th row and $j$-th column. The transpose $A^{\top}$ is the matrix with entries $a^{\top}_{ij} := a_{ji}$. It is the matrix $A$ mirrored on the diagonal from top left to bottom right.
Vectors and their transposes are given by
$$b = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}, \qquad b^{\top} = \begin{pmatrix} b_1 & \cdots & b_n \end{pmatrix},$$
and will occasionally be interpreted as $n \times 1$ and $1 \times n$ matrices.
The product of an $m \times p$ matrix $A$ with a $p \times n$ matrix $B$,
$$C = AB,$$
is the $m \times n$ matrix $C$ whose $ij$-th entry is given by
$$c_{ij} = \sum_{k=1}^{p} a_{ik} b_{kj}.$$
The number of columns of $A$ has to equal the number of rows of $B$ for this definition to make sense. The sum and difference of matrices of the same size are defined componentwise.
The $n \times n$ identity matrix $\mathbf{1}$ is the matrix with 1 on the diagonal and 0 elsewhere, while $\mathbf{0}$ is the matrix consisting of only zeros. The matrix $\mathbf{1}$ satisfies $\mathbf{1}A = A$ and $A\mathbf{1} = A$, whenever the dimensions are such that this is defined.
A system of linear equations
$$a_{11} x_1 + \cdots + a_{1n} x_n = b_1, \quad \ldots, \quad a_{m1} x_1 + \cdots + a_{mn} x_n = b_m$$
is written as a matrix–vector product
$$Ax = b, \tag{0.0.1}$$
where the $m \times n$ matrix $A$ is defined as above, and $x \in \mathbb{R}^n$, $b \in \mathbb{R}^m$ are (column) vectors.
If $n = m$ (that is, the matrix $A$ is square), then the system (0.0.1) has a unique solution if and only if the matrix $A$ is invertible, that is, $\det(A) \ne 0$, where $\det$ is the determinant. If $A$ is not invertible, it is called singular.
If $A$ is invertible, there exists a matrix $A^{-1}$ (the inverse) such that
$$A A^{-1} = A^{-1} A = \mathbf{1}.$$
The solution of (0.0.1) is then given by $x = A^{-1} b$.
Lecture 10
There are different approaches towards solving a system of linear equations
$$Ax = b, \tag{10.0.1}$$
where $A$ is an $m \times n$ matrix.
(1) Direct methods attempt to solve (10.0.1) using a finite number of operations. An example is the well-known Gaussian elimination algorithm.
(2) Iterative methods generate a sequence $x_0, x_1, \ldots$ of vectors in the hope that $x_k$ converges to a solution $x$ of (10.0.1) as $k \to \infty$.
Direct methods generally work well for dense matrices and moderately large $n$. Iterative methods work well with sparse matrices, that is, matrices with very few non-zero entries $a_{ij}$, and large $n$. The following example illustrates how sparse matrices arise in practice.
EXAMPLE 10.1. Consider the differential equation
$$-u_{xx} = f(x)$$
with boundary conditions $u(0) = u(1) = 0$, where $u$ is a twice differentiable function on $[0, 1]$, and $u_{xx} = \partial^2 u / \partial x^2$ denotes the second derivative in $x$. We can discretise the interval $[0, 1]$ by setting $\Delta x = 1/(n + 1)$, $x_j = j \Delta x$, and
$$u_j := u(x_j), \qquad f_j := f(x_j).$$
The second derivative can be approximated by a finite difference,
$$u_{xx} \approx \frac{u_{j-1} - 2 u_j + u_{j+1}}{(\Delta x)^2}.$$
At any one point $x_j$, the differential equation thus translates to
$$-\frac{u_{j-1} - 2 u_j + u_{j+1}}{(\Delta x)^2} = f_j.$$
Making use of the boundary conditions $u(0) = u(1) = 0$, we get the system of equations
$$\frac{1}{(\Delta x)^2} \begin{pmatrix} 2 & -1 & 0 & \cdots & 0 & 0 \\ -1 & 2 & -1 & \cdots & 0 & 0 \\ 0 & -1 & 2 & \ddots & 0 & 0 \\ \vdots & & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -1 & 2 & -1 \\ 0 & 0 & \cdots & 0 & -1 & 2 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_{n-1} \\ u_n \end{pmatrix} = \begin{pmatrix} f_1 \\ f_2 \\ f_3 \\ \vdots \\ f_{n-1} \\ f_n \end{pmatrix}.$$
The matrix is very sparse: it has only $3n - 2$ non-zero entries out of $n^2$ possible! This form is typical for matrices arising from partial differential equations, and is well suited to iterative methods that exploit the specific structure of the matrix.
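Such a matrix can be assembled in sparse form in MATLAB, for example as follows (a sketch; the right-hand side $f \equiv 1$ is an arbitrary choice made for illustration):

% Finite difference matrix for -u'' = f on [0,1] (sketch).
n = 100;
dx = 1/(n+1);
e = ones(n,1);
A = spdiags([-e 2*e -e], -1:1, n, n) / dx^2;   % sparse tridiagonal matrix
f = ones(n,1);                                  % right-hand side f(x) = 1, say
u = A\f;                                        % direct sparse solve, for comparison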
In the following, we assume our matrices to be square ($m = n$). A template for iteratively solving a linear system can be derived as follows. Write the matrix $A$ as a difference $A = A_1 - A_2$, where $A_1$ and $A_2$ are somewhat simpler to handle than the original matrix. Then the system of equations (10.0.1) can be written as
$$A_1 x = A_2 x + b.$$
This motivates the following approach: start with a vector $x_0$ and successively compute $x_{k+1}$ from $x_k$ by solving
$$A_1 x_{k+1} = A_2 x_k + b. \tag{10.0.2}$$
EXAMPLE 10.2. (Jacobi's Method) Decompose the matrix $A$ as
$$A = L + D + U,$$
where
$$L = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ a_{21} & 0 & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ a_{n1} & \cdots & a_{n,n-1} & 0 \end{pmatrix}, \qquad U = \begin{pmatrix} 0 & a_{12} & \cdots & a_{1n} \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & a_{n-1,n} \\ 0 & \cdots & 0 & 0 \end{pmatrix}$$
are the strictly lower and strictly upper triangular parts, and
$$D = \mathrm{diag}(a_{11}, \ldots, a_{nn}) := \begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{pmatrix}$$
is the diagonal part. Jacobi's method chooses $A_1 = D$ and $A_2 = -(L + U)$. The corresponding iteration (10.0.2) then reads $D x_{k+1} = -(L + U) x_k + b$, or, solving for $x_{k+1}$,
$$x_{k+1} = D^{-1} \bigl( b - (L + U) x_k \bigr). \tag{10.0.3}$$
Note that since $D$ is diagonal, it is easy to invert: just invert the individual entries. For a concrete example, take the following matrix with its decomposition into diagonal and off-diagonal parts:
$$A = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} + \begin{pmatrix} 0 & -1 \\ -1 & 0 \end{pmatrix}.$$
Since
$$\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}^{-1} = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$
we get the iteration scheme
$$x_{k+1} = \begin{pmatrix} 0 & 1/2 \\ 1/2 & 0 \end{pmatrix} x_k + \frac{1}{2} b.$$
We can also write the iteration (10.0.3) in terms of individual entries. If we denote
$$x_k := \begin{pmatrix} x_1^{(k)} \\ \vdots \\ x_n^{(k)} \end{pmatrix},$$
i.e., we write $x_i^{(k)}$ for the $i$-th entry of the $k$-th iterate, then the iteration (10.0.3) becomes
$$x_i^{(k+1)} = \frac{1}{a_{ii}} \Bigl( b_i - \sum_{j \ne i} a_{ij} x_j^{(k)} \Bigr) \tag{10.0.4}$$
for $1 \le i \le n$.
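A minimal MATLAB sketch of the Jacobi iteration (10.0.3); the function name and the stopping rule (a fixed number of iterations) are illustrative choices, not part of the notes.

% Jacobi iteration (sketch).  x0 and b are column vectors.
function x = jacobi(A, b, x0, niter)
    d = diag(A);                    % diagonal entries a_ii
    R = A - diag(d);                % off-diagonal part L + U
    x = x0;
    for k = 1:niter
        x = (b - R*x) ./ d;         % x^{(k+1)} = D^{-1}(b - (L+U)x^{(k)})
    end
end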
EXAMPLE 10.3. (Gauss–Seidel) In the Gauss–Seidel method, we use a different decomposition, namely $A_1 = D + L$ and $A_2 = -U$. This leads to the following system:
$$(D + L)\, x_{k+1} = -U x_k + b. \tag{10.0.5}$$
Though the matrix on the left-hand side is not diagonal, as it was in the Jacobi method, the system is still easily solved for $x_{k+1}$ when $x_k$ is given. To derive the entry-wise formula for this method, we take a closer look at (10.0.5):
$$\begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1^{(k+1)} \\ x_2^{(k+1)} \\ \vdots \\ x_n^{(k+1)} \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} - \begin{pmatrix} 0 & a_{12} & \cdots & a_{1n} \\ 0 & 0 & \cdots & a_{2n} \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} x_1^{(k)} \\ x_2^{(k)} \\ \vdots \\ x_n^{(k)} \end{pmatrix}.$$
Writing out the equations, we get
$$a_{11} x_1^{(k+1)} = b_1 - \bigl( a_{12} x_2^{(k)} + \cdots + a_{1n} x_n^{(k)} \bigr),$$
$$a_{21} x_1^{(k+1)} + a_{22} x_2^{(k+1)} = b_2 - \bigl( a_{23} x_3^{(k)} + \cdots + a_{2n} x_n^{(k)} \bigr),$$
$$\vdots$$
$$a_{i1} x_1^{(k+1)} + \cdots + a_{ii} x_i^{(k+1)} = b_i - \bigl( a_{i,i+1} x_{i+1}^{(k)} + \cdots + a_{in} x_n^{(k)} \bigr).$$
Rearranging this, we get the formula
$$x_i^{(k+1)} = \frac{1}{a_{ii}} \Bigl( b_i - \sum_{j < i} a_{ij} x_j^{(k+1)} - \sum_{j > i} a_{ij} x_j^{(k)} \Bigr)$$
for the $(k+1)$-th iterate of $x_i$. Note that in order to compute the $(k+1)$-th iterate of $x_i$, we already use values of the $(k+1)$-th iterate of $x_j$ for $j < i$. Note how this differs from (10.0.4), where we only resort to the $k$-th iterate. Both methods have their advantages and disadvantages. While Gauss–Seidel may require less storage (we can overwrite each $x_i^{(k)}$ by $x_i^{(k+1)}$, as we don't need the old value subsequently), Jacobi's method is easier to use in parallel (all the $x_i^{(k+1)}$ can be computed by different processors, one for each $i$).
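For comparison, a minimal MATLAB sketch of the Gauss–Seidel sweep; again the function name and the fixed iteration count are illustrative choices.

% Gauss-Seidel iteration (sketch).  x0 and b are column vectors.
function x = gauss_seidel(A, b, x0, niter)
    n = numel(b);
    x = x0;
    for k = 1:niter
        for i = 1:n                           % sweep through the components in order
            s = A(i,1:i-1)*x(1:i-1) + A(i,i+1:n)*x(i+1:n);
            x(i) = (b(i) - s) / A(i,i);       % uses already-updated x(1:i-1)
        end
    end
end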
Lecture 11
Vector and Matrix Norms
In order to study the convergence of iterative methods, we need to be able to measure distances between vectors and matrices.
Definition 11.0.1. A vector norm on $\mathbb{R}^n$ is a real-valued function $\|\cdot\|$ that satisfies the following conditions:
(1) For all $x \in \mathbb{R}^n$: $\|x\| \ge 0$, and $\|x\| = 0$ if and only if $x = 0$.
(2) For all $\alpha \in \mathbb{R}$ and $x \in \mathbb{R}^n$: $\|\alpha x\| = |\alpha|\, \|x\|$.
(3) For all $x, y \in \mathbb{R}^n$: $\|x + y\| \le \|x\| + \|y\|$ (Triangle Inequality).
Example 11.1. The typical examples are the following:
(1) The 2-norm
$$\|x\|_2 = \Bigl( \sum_{i=1}^{n} x_i^2 \Bigr)^{1/2}.$$
This is just the usual notion of Euclidean length.
(2) The 1-norm
$$\|x\|_1 = \sum_{i=1}^{n} |x_i|.$$
(3) The $\infty$-norm
$$\|x\|_{\infty} = \max_{1 \le i \le n} |x_i|.$$
A convenient way to visualise these norms is via their unit circles. If we look at the sets
$$\{ x \in \mathbb{R}^2 : \|x\|_p = 1 \}$$
for $p = 2, 1, \infty$, we get a circle, a diamond and a square, respectively.
Now that we have defined a way of measuring distances between vectors, we can talk about convergence.
Definition 11.0.2. A sequence of vectors $x_k \in \mathbb{R}^n$, $k = 1, 2, \ldots$, converges to $x \in \mathbb{R}^n$ with respect to a norm $\|\cdot\|$ if for all $\varepsilon > 0$ there exists an $N > 0$ such that for all $k \ge N$:
$$\|x_k - x\| < \varepsilon.$$
The following lemma implies that for the purpose of convergence, it doesn't matter whether we take the $\infty$- or the 2-norm.
Lemma 11.1. For $x \in \mathbb{R}^n$,
$$\|x\|_{\infty} \le \|x\|_2 \le \sqrt{n}\, \|x\|_{\infty}.$$
PROOF. Let $M := \|x\|_{\infty} = \max_{1 \le i \le n} |x_i|$. Note that
$$\|x\|_2 = M \Bigl( \sum_{i=1}^{n} \frac{x_i^2}{M^2} \Bigr)^{1/2} \le M \sqrt{n},$$
because $x_i^2 / M^2 \le 1$. This shows the second inequality. For the first one, note that there is an index $i$ such that $M = |x_i|$. It follows that
$$\|x\|_2 = M \Bigl( \sum_{i=1}^{n} \frac{x_i^2}{M^2} \Bigr)^{1/2} \ge M = \|x\|_{\infty}.$$
This completes the proof.
We note that a similar relationship can be shown involving the 1-norm.
Corollary 11.1. Convergence in the 2-norm is equivalent to convergence in the $\infty$-norm. That is, if $x_k \to x$ with respect to the $\infty$-norm, then $x_k \to x$ with respect to the 2-norm, and vice versa.
Example 11.2. Let's look at the vector
$$x = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.$$
The different norms are $\|x\|_2 = \sqrt{3}$, $\|x\|_1 = 3$, $\|x\|_{\infty} = 1$.
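These values can be checked in MATLAB with the built-in norm function (a one-line illustration):

% Vector norms of x = (1,1,1)' (sketch).
x = [1; 1; 1];
[norm(x,2), norm(x,1), norm(x,Inf)]    % sqrt(3), 3, 1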
For the behaviour of algorithms that deal with matrices we not only need to quantify the size of vectors, but also the amount by which they are modified by the action of a matrix. This leads us to study matrix norms.
Definition 11.0.3. A matrix norm is a non-negative function $\|\cdot\|$ on the set of real $n \times n$ matrices such that, for every $n \times n$ matrix $A$,
(1) $\|A\| \ge 0$, and $\|A\| = 0$ if and only if $A = 0$.
(2) For all $\alpha \in \mathbb{R}$: $\|\alpha A\| = |\alpha|\, \|A\|$.
(3) For all $n \times n$ matrices $A, B$: $\|A + B\| \le \|A\| + \|B\|$.
(4) For all $n \times n$ matrices $A, B$: $\|AB\| \le \|A\|\, \|B\|$.
Note that parts 1-3 just state that a matrix norm is also a vector norm, if we think of the matrix as a vector. Part 4 of the definition has to do with the "matrix-ness" of a matrix.
Example 11.3. The Frobenius norm of a matrix $A$ is defined as
$$\|A\|_F = \Bigl( \sum_{1 \le i, j \le n} a_{ij}^2 \Bigr)^{1/2}.$$
It is just the 2-norm of $A$ considered as a vector; one can check that it also satisfies property (4), so it is indeed a matrix norm.
VECTOR AND MATRIX NORMS 3
The most useful class of matrix norms are the operator norms induced by a vector
norm.
Denition11.0.4. Givena vector norm, the corresponding operator normof annn
matrix A is dened as
A =max
x=0
Ax
x
= max
xR
n
x=1
Ax.
To see that the two formulations are equivalent, note that if we set =1/x, then
x =|| x =1, and
Ax
x
=A(x),
so taking the maximumover all vectors and dividing by their normis the same as taking
the maximum over all unit-norm vectors.
Lecture 12
Definition 12.0.1. A matrix norm $\|\cdot\|$ is a non-negative function on the set of real $n \times n$ matrices such that, for every $n \times n$ matrix $A$,
(1) $\|A\| \geq 0$, and $\|A\| = 0$ if and only if $A = 0$.
(2) For all $\alpha \in \mathbb{R}$: $\|\alpha A\| = |\alpha|\,\|A\|$.
(3) For all $n \times n$ matrices $A, B$: $\|A + B\| \leq \|A\| + \|B\|$.
(4) For all $n \times n$ matrices $A, B$: $\|AB\| \leq \|A\|\,\|B\|$.
Note that parts 1–3 just state that a matrix norm is also a vector norm, if we think of the matrix as a vector. Part 4 of the definition has to do with the "matrix-ness" of a matrix. The most useful class of matrix norms are the operator norms induced by a vector norm.
Definition 12.0.2. Given a vector norm $\|\cdot\|$, the corresponding operator norm of an $n \times n$ matrix $A$ is defined as
$$\|A\| = \max_{x \neq 0} \frac{\|Ax\|}{\|x\|} = \max_{x \in \mathbb{R}^n,\ \|x\| = 1} \|Ax\|.$$
First, we have to verify that this is indeed a matrix norm.
Theorem 12.1. Given a vector norm $\|\cdot\|$, the corresponding operator norm is a matrix norm.
PROOF. Properties 1–3 are easy to verify from the corresponding properties of the vector norm. For example, $\|A\| \geq 0$ because, by the definition, there is no way it could be negative. To show property 4, namely
$$\|AB\| \leq \|A\|\,\|B\|$$
for $n \times n$ matrices $A$ and $B$, we first note that for any $y \in \mathbb{R}^n$,
$$\frac{\|Ay\|}{\|y\|} \leq \max_{x\neq 0}\frac{\|Ax\|}{\|x\|} = \|A\|,$$
and therefore
$$\|Ay\| \leq \|A\|\,\|y\|.$$
Now let $y = Bx$ for some $x$ with $\|x\| = 1$. Then
$$\|ABx\| \leq \|A\|\,\|Bx\| \leq \|A\|\,\|B\|.$$
As this inequality holds for all unit-norm $x$, it also holds for the vector that maximises $\|ABx\|$, and therefore we get
$$\|AB\| = \max_{\|x\|=1}\|ABx\| \leq \|A\|\,\|B\|.$$
This completes the proof.
The interpretation is that the operator norm measures how much a vector $x$ is being stretched by a matrix $A$.
Even though the operator norms with respect to the various vector norms are of immense importance in the analysis of numerical methods, they are hard to compute or even estimate from their definition alone. It is therefore useful to have alternative characterisations of these norms. The first of these characterisations is concerned with the norms $\|\cdot\|_1$ and $\|\cdot\|_\infty$, and provides an easy criterion to compute these.
Lemma 12.1. For an $n \times n$ matrix $A$, the operator norms with respect to the 1-norm and the $\infty$-norm are given by
$$\|A\|_1 = \max_{1\leq j\leq n} \sum_{i=1}^n |a_{ij}| \quad\text{(maximum column sum)},$$
$$\|A\|_\infty = \max_{1\leq i\leq n} \sum_{j=1}^n |a_{ij}| \quad\text{(maximum row sum)}.$$
PROOF. We will prove this for the $\infty$-norm. We first show the inequality $\|A\|_\infty \leq \max_{1\leq i\leq n} \sum_{j=1}^n |a_{ij}|$. Let $x$ be a vector such that $\|x\|_\infty = 1$. That means that all the entries have absolute value $|x_i| \leq 1$. It follows that
$$\|Ax\|_\infty = \max_{1\leq i\leq n} \Big|\sum_{j=1}^n a_{ij} x_j\Big| \leq \max_{1\leq i\leq n} \sum_{j=1}^n |a_{ij} x_j| \leq \max_{1\leq i\leq n} \sum_{j=1}^n |a_{ij}|,$$
where the inequality follows from writing out the matrix-vector product, interpreting the $\infty$-norm, using the triangle inequality for the absolute value, and the fact that $|x_j| \leq 1$ for $1 \leq j \leq n$. Since this holds for arbitrary $x$ with $\|x\|_\infty = 1$, it also holds for the vector that maximises $\max_{\|x\|_\infty = 1}\|Ax\|_\infty = \|A\|_\infty$, which concludes the proof of one direction.
In order to show $\|A\|_\infty \geq \max_{1\leq i\leq n} \sum_{j=1}^n |a_{ij}|$, let $i$ be the index at which the maximum of the sum is attained:
$$\max_{1\leq i\leq n} \sum_{j=1}^n |a_{ij}| = \sum_{j=1}^n |a_{ij}|.$$
Choose $y$ to be the vector with entries $y_j = 1$ if $a_{ij} > 0$ and $y_j = -1$ if $a_{ij} < 0$. This vector satisfies $\|y\|_\infty = 1$ and, moreover,
$$\sum_{j=1}^n y_j a_{ij} = \sum_{j=1}^n |a_{ij}|,$$
by the choice of the $y_j$. We therefore have
$$\|A\|_\infty = \max_{\|x\|_\infty = 1}\|Ax\|_\infty \geq \|Ay\|_\infty = \sum_{j=1}^n |a_{ij}| = \max_{1\leq i\leq n} \sum_{j=1}^n |a_{ij}|.$$
This finishes the proof.
Example 12.1. Consider the matrix
$$A = \begin{pmatrix} 7 & 3 & 1 \\ 2 & 4 & 5 \\ 4 & 6 & 0 \end{pmatrix}.$$
The operator norms with respect to the 1- and $\infty$-norms are
$$\|A\|_1 = \max\{13, 13, 6\} = 13, \qquad \|A\|_\infty = \max\{11, 11, 10\} = 11.$$
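The column and row sums are quickly checked in MATLAB; the following sketch (an illustration only) compares them with the built-in norm function for the matrix of this example.

A = [7 3 1; 2 4 5; 4 6 0];
max(sum(abs(A), 1))   % maximum column sum: 13
max(sum(abs(A), 2))   % maximum row sum: 11
norm(A, 1)            % operator 1-norm: 13
norm(A, inf)          % operator infinity-norm: 11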
How do we characterise the matrix 2-norm $\|A\|_2$ of a matrix? The answer is in terms of the eigenvalues of $A$. Recall that a (possibly complex) number $\lambda$ is an eigenvalue of $A$, with associated eigenvector $u$, if
$$Au = \lambda u.$$
Definition 12.0.3. The spectral radius of $A$ is defined as
$$\rho(A) = \max\{|\lambda| : \lambda \text{ an eigenvalue of } A\}.$$
Theorem 12.2. For an $n \times n$ matrix $A$ we have
$$\|A\|_2 = \sqrt{\rho(A^{\top}A)}.$$
Example 12.2. Let
$$A = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & -1 \\ -1 & 1 & 1 \end{pmatrix}.$$
The eigenvalues are the roots of the characteristic polynomial
$$p(\lambda) = \det(A - \lambda I) = \det\begin{pmatrix} 1-\lambda & 0 & 2 \\ 0 & 1-\lambda & -1 \\ -1 & 1 & 1-\lambda \end{pmatrix}.$$
Evaluating this determinant, we get the equation
$$(1-\lambda)(\lambda^2 - 2\lambda + 4) = 0.$$
The solutions are given by $\lambda_1 = 1$ and $\lambda_{2,3} = 1 \pm \sqrt{3}\,i$. The spectral radius of $A$ is therefore
$$\rho(A) = \max\{1, 2, 2\} = 2.$$
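In MATLAB, the spectral radius and the 2-norm can be checked as follows (a small sketch for this example; note that for a non-symmetric matrix the 2-norm and the spectral radius generally differ).

A = [1 0 2; 0 1 -1; -1 1 1];
lambda = eig(A);              % eigenvalues 1 and 1 +/- sqrt(3)i
max(abs(lambda))              % spectral radius: 2
sqrt(max(eig(A'*A)))          % matrix 2-norm, equals norm(A, 2)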
Lecture 13
Recall that the spectral radius of a matrix $A$ is defined as
$$\rho(A) = \max\{|\lambda| : \lambda \text{ an eigenvalue of } A\}.$$
The following theorem characterises the matrix 2-norm.
Theorem 13.1. For an $n \times n$ matrix $A$ we have
$$\|A\|_2 = \sqrt{\rho(A^{\top}A)}.$$
For symmetric matrices, that is, matrices such that $A^{\top} = A$, the 2-norm is just the spectral radius.
Lemma 13.1. If $A$ is symmetric, then $\|A\|_2 = \rho(A)$.
PROOF. Let $\lambda$ be an eigenvalue of $A$ with corresponding eigenvector $u$, so that
$$Au = \lambda u.$$
Then
$$A^{\top}Au = A^{\top}\lambda u = \lambda A^{\top}u = \lambda Au = \lambda^2 u.$$
It follows that $\lambda^2$ is an eigenvalue of $A^{\top}A$ with corresponding eigenvector $u$. In particular,
$$\|A\|_2^2 = \rho(A^{\top}A) = \max\{\lambda^2 : \lambda \text{ an eigenvalue of } A\} = \rho(A)^2.$$
Taking square roots on both sides, the claim follows.
Example 13.1. We compute the eigenvalues, and thus the spectral radius and the 2-norm, of the finite difference matrix
$$A = \begin{pmatrix}
2 & -1 & 0 & \cdots & 0 & 0 \\
-1 & 2 & -1 & \cdots & 0 & 0 \\
0 & -1 & 2 & \ddots & 0 & 0 \\
\vdots & & \ddots & \ddots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 2 & -1 \\
0 & 0 & 0 & \cdots & -1 & 2
\end{pmatrix}.$$
Let $h = 1/(n+1)$. We first claim that the vectors $u_k$, $1 \leq k \leq n$, defined by
$$u_k = \begin{pmatrix} \sin(k\pi h) \\ \vdots \\ \sin(nk\pi h) \end{pmatrix}$$
are the eigenvectors of $A$, with corresponding eigenvalues
$$\lambda_k = 2(1 - \cos(k\pi h)).$$
This can be verified by checking that
$$Au_k = \lambda_k u_k.$$
In fact, for $2 \leq j \leq n-1$, the $j$-th entry of the left-hand side of the above product is given by
$$2\sin(jk\pi h) - \sin((j-1)k\pi h) - \sin((j+1)k\pi h).$$
Using the trigonometric identity $\sin(x + y) = \sin(x)\cos(y) + \cos(x)\sin(y)$, we can write this as
$$2\sin(jk\pi h) - \big(\cos(k\pi h)\sin(jk\pi h) - \cos(jk\pi h)\sin(k\pi h)\big) - \big(\cos(k\pi h)\sin(jk\pi h) + \cos(jk\pi h)\sin(k\pi h)\big) = 2(1 - \cos(k\pi h))\sin(jk\pi h).$$
Now $\sin(jk\pi h)$ is just the $j$-th entry of $u_k$ as defined above, so the coefficient in front must equal the corresponding eigenvalue. The argument for $k = 1$ and $k = n$ is similar.
The spectral radius is the maximum modulus of such an eigenvalue,
$$\rho(A) = \max_{1\leq k\leq n} |\lambda_k| = 2\Big(1 - \cos\Big(\frac{n\pi}{n+1}\Big)\Big).$$
As the matrix $A$ is symmetric, this is also equal to the matrix 2-norm of $A$:
$$\|A\|_2 = 2\Big(1 - \cos\Big(\frac{n\pi}{n+1}\Big)\Big).$$
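The eigenvalue formula can be verified numerically; the MATLAB sketch below (an illustration, with a small hypothetical choice $n = 10$) compares eig with the formula.

n = 10;  h = 1/(n+1);
A = 2*eye(n) - diag(ones(n-1,1), 1) - diag(ones(n-1,1), -1);
lambda_formula = 2*(1 - cos((1:n)*pi*h));    % eigenvalues from the formula
lambda_numeric = sort(eig(A))';              % numerically computed eigenvalues
max(abs(lambda_formula - lambda_numeric))    % of the order of machine precision
norm(A, 2)                                   % equals 2*(1 - cos(n*pi*h)), since A is symmetric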
Convergence of Iterative Algorithms
In this section we focus on algorithms that attempt to solve a system of equations
(13.0.1) $Ax = b$
by starting with some vector $x_0$ and then successively computing a sequence $x_k$, $k \geq 1$, by means of a rule
(13.0.2) $x_{k+1} = Tx_k + c$
for some matrix $T$ and vector $c$.
Example 13.2. The Jacobi and Gauss-Seidel methods fall into this framework. Recall the decomposition
$$A = L + D + U,$$
where $L$ is the lower triangular, $D$ the diagonal and $U$ the upper triangular part. Then the Jacobi method corresponds to the choice
$$T = T_J = -D^{-1}(L + U), \qquad c = D^{-1}b,$$
while the Gauss-Seidel method corresponds to
$$T = T_{GS} = -(L + D)^{-1}U, \qquad c = (L + D)^{-1}b.$$
Lemma 13.2. Let $T$ and $c$ be the matrix and vector in the iteration scheme (13.0.2) corresponding to either the Jacobi method or the Gauss-Seidel method, and assume that $D$ and $L + D$ are invertible. Then $x$ is a solution of the system of equations (13.0.1) if and only if $x$ is a fixed point of the iteration (13.0.2), that is,
$$x = Tx + c.$$
PROOF. We write down the proof for the case of Jacobi's method, the Gauss-Seidel case being similar. We have
$$Ax = b \iff (L + D + U)x = b \iff Dx = -(L + U)x + b \iff x = -D^{-1}(L + U)x + D^{-1}b \iff x = Tx + c.$$
This shows the claim.
The problem of solving $Ax = b$ is thus reduced to the problem of finding a fixed point of an iteration scheme. The following important result shows how to bound the distance of an iterate $x_k$ from the solution $x$ in terms of the operator norm of $T$ and the initial distance $\|x_0 - x\|$.
Theorem 13.2. Let $x$ be a solution of $Ax = b$, and $x_k$, $k \geq 0$, be a sequence of vectors such that
$$x_{k+1} = Tx_k + c$$
for an $n \times n$ matrix $T$ and a vector $c \in \mathbb{R}^n$. Then, for any vector norm $\|\cdot\|$ and associated matrix norm, we have
$$\|x_{k+1} - x\| \leq \|T\|^{k+1}\,\|x_0 - x\|$$
for all $k \geq 0$.
PROOF. We prove this by induction on $k$. Recall that for every vector $y$, we have $\|Ty\| \leq \|T\|\,\|y\|$. Subtracting the identity $x = Tx + c$ from $x_{k+1} = Tx_k + c$ and taking norms, we get
(13.0.3) $\|x_{k+1} - x\| = \|T(x_k - x)\| \leq \|T\|\,\|x_k - x\|$.
Setting $k = 0$ gives the claim of the theorem for this case. If we assume that the claim holds for $k-1$, $k \geq 1$, then
$$\|x_k - x\| \leq \|T\|^k\,\|x_0 - x\|$$
by this assumption, and plugging this into (13.0.3) finishes the proof.
Corollary 13.1. Assume that, in addition to the assumptions of Theorem 13.2, we have $\|T\| < 1$. Then the sequence $x_k$, $k \geq 0$, converges to the solution $x$ of $Ax = b$ with respect to the chosen norm.
PROOF. Assume $x_0 \neq x$ (otherwise there is nothing to prove) and let $\varepsilon > 0$. Since $\|T\| < 1$, $\|T\|^k \to 0$ as $k \to \infty$. In particular, there exists an integer $N > 1$ such that for all $k > N$,
$$\|T\|^k < \frac{\varepsilon}{\|x_0 - x\|}.$$
It follows that for $k > N$ we have $\|x_k - x\| < \varepsilon$, which completes the convergence proof.
Note that either one of $\|T\|_\infty < 1$ or $\|T\|_2 < 1$ will imply convergence with respect to both the 2-norm and the $\infty$-norm. The reason is the equivalence of norms
$$\|x\|_\infty \leq \|x\|_2 \leq \sqrt{n}\,\|x\|_\infty,$$
which implies that if the sequence $x_k$, $k \geq 0$, converges to $x$ with respect to one of these norms, it also converges with respect to the other one. Such an equivalence can also be shown between the 2- and the 1-norm.
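The whole framework is easy to experiment with. The MATLAB sketch below is an illustration only; the small diagonally dominant test system is a hypothetical example, not one from the notes. It builds the Jacobi and Gauss-Seidel iteration matrices, checks their norms, and runs the Jacobi iteration.

A = [4 1 1; 1 5 2; 1 2 6];  b = [6; 8; 9];    % hypothetical test system
D = diag(diag(A));  L = tril(A, -1);  U = triu(A, 1);
TJ  = -D \ (L + U);       cJ  = D \ b;        % Jacobi:       x_{k+1} = TJ*x_k + cJ
TGS = -(L + D) \ U;       cGS = (L + D) \ b;  % Gauss-Seidel: x_{k+1} = TGS*x_k + cGS
[norm(TJ, inf), norm(TGS, inf)]               % both less than 1 here, so both converge
x = zeros(3, 1);
for k = 1:50
    x = TJ*x + cJ;                            % Jacobi iteration
end
norm(A*x - b)                                 % small residual confirms convergence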
Frequently Asked Questions
Q: Which lectures are relevant for the test?
A: Lectures 1–13.
Q: Do we have to know proofs?
A: Yes, though there are exceptions. The following will not be asked:
- Proof of Newton's divided differences (Lecture 6).
- Derivation of the error bound for Simpson's rule (stated, but not proven in class).
- Derivation of the componentwise formula for Gauss-Seidel (Lecture 10).
- Derivation of the characterisation of the matrix 1- and $\infty$-norms as maximum column and row sum (Lecture 12).
- Derivation of the spectral radius of the finite difference matrix (Lecture 13).
Q: Which problems are relevant?
A: Problem sheet 1; Problem sheet 2, Sections 1–4; Problem sheet 3, Sections 1–2.
Overview of lectures
Preliminary material (Taylor's Theorem, Intermediate and Mean Value Theorems, Rolle's Theorem, Fundamental Theorem of Algebra).
(1) Effect of rounding to significant figures, quadratic equation. Sources of error.
(2) Efficiency, Horner's method. Problem of interpolation. Construction of the Lagrange interpolation polynomial.
(3) Existence and uniqueness of interpolation. Interpolation error.
(4) Interpolation error, application of Rolle's Theorem. Convergence problem, Runge's phenomenon.
(5) The role of interpolation node distribution. Barycentric form of Lagrange interpolation.
(6) Newton's divided differences. Example calculations, benefits.
(7) Quadrature. Trapezium rule with error bound.
(8) Newton-Cotes rules. Simpson's rule and error bound (without proof). Composite integration.
(9) Composite trapezium and Simpson's rules. Error for the composite trapezium rule. Order of magnitude of error. Notion of degree of precision of a quadrature rule.
(10) Direct methods vs iterative methods. Finite difference example. Jacobi's and Gauss-Seidel methods.
(11) Vector norms. Geometric interpretation. Proof of equivalence of the 2-norm and the $\infty$-norm. Matrix norms.
(12) Matrix norms. Characterisation of the matrix 1- and $\infty$-norms as maximum column and row sums. Spectral radius and matrix 2-norm.
(13) Matrix 2-norm for symmetric matrices. Characterisation of the solution of a linear system as a fixed point of an iteration. Convergence of iterative methods if $\|T\| < 1$.
Interpolation
Given distinct points $x_0, \ldots, x_n$ and values $y_0, \ldots, y_n$ (sometimes referred to as $f_0, \ldots, f_n$ when they come from a function $f(x)$), there is a unique interpolation polynomial $p_n(x)$ of degree $\leq n$ such that
$$p_n(x_j) = y_j.$$
Existence follows from the Lagrange interpolation formula
$$p_n(x) = \sum_{k=0}^n L_k(x)\,y_k.$$
Uniqueness follows from the Fundamental Theorem of Algebra.
Note that the degree may be strictly smaller than $n$. For example, if $y_j = c$ for $0 \leq j \leq n$, then
$$p_n(x) \equiv c$$
has degree 0.
Interpolation Error
When interpolating a function $f(x)$ at nodes $x_0, \ldots, x_n$, then
$$|f(x) - p_n(x)| \leq \frac{M_{n+1}}{(n+1)!}\,|\pi_{n+1}(x)|,$$
where
$$M_{n+1} = \max_{a\leq x\leq b} |f^{(n+1)}(x)|$$
and
$$\pi_{n+1}(x) = (x - x_0)\cdots(x - x_n).$$
Newton's Divided Differences
Newton interpolation polynomial:
$$p(x) = f[x_0] + f[x_0, x_1](x - x_0) + \cdots + f[x_0, \ldots, x_n](x - x_0)\cdots(x - x_{n-1}).$$
Divided differences:
$$f[x_i] := f(x_i),$$
$$f[x_i, x_{i+1}] := \frac{f[x_{i+1}] - f[x_i]}{x_{i+1} - x_i},$$
$$f[x_i, x_{i+1}, \ldots, x_{i+k}] := \frac{f[x_{i+1}, x_{i+2}, \ldots, x_{i+k}] - f[x_i, x_{i+1}, \ldots, x_{i+k-1}]}{x_{i+k} - x_i}.$$
Computing the divided differences (the table is built up column by column):

x_0   f_0
                 f[x_0, x_1]
x_1   f_1                        f[x_0, x_1, x_2]
                 f[x_1, x_2]                           f[x_0, x_1, x_2, x_3]
x_2   f_2                        f[x_1, x_2, x_3]
                 f[x_2, x_3]
x_3   f_3
Some Quadrature Rules for $\int_a^b f(x)\,dx$
Trapezium rule:
$$T(f) = \frac{b-a}{2}\big(f(a) + f(b)\big). \qquad \text{Error bound: } \frac{1}{12}(b-a)^3 M_2.$$
Simpson's rule:
$$S(f) = \frac{b-a}{6}\big(f(a) + 4f((a+b)/2) + f(b)\big). \qquad \text{Error bound: } \frac{1}{2880}(b-a)^5 M_4.$$
Composite trapezium ($h = (b-a)/n$, $x_j = a + jh$):
$$T_c(f) = h\Big(\tfrac{1}{2}f(x_0) + f(x_1) + \cdots + f(x_{n-1}) + \tfrac{1}{2}f(x_n)\Big). \qquad \text{Error bound: } \frac{1}{12}h^2(b-a)M_2.$$
Composite Simpson ($h = (b-a)/2m$, $x_j = a + jh$):
$$S_c(f) = \frac{h}{3}\big(f(x_0) + 4f(x_1) + 2f(x_2) + \cdots + 2f(x_{2m-2}) + 4f(x_{2m-1}) + f(x_{2m})\big). \qquad \text{Error bound: } \frac{1}{180}h^4(b-a)M_4.$$
$M_k$ denotes an upper bound on $|f^{(k)}(x)|$, the $k$-th derivative of $f$ on the interval $[a, b]$. The error bounds are derived from the corresponding error bounds of the interpolation polynomials of $f$.
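For revision, the composite rules are easy to try out. The MATLAB sketch below is an illustration only; the integrand $f(x) = e^x$ on $[0, 1]$ is a hypothetical test case, not an example from the notes.

f = @(x) exp(x);  a = 0;  b = 1;                   % exact integral is exp(1) - 1
n = 10;  h = (b - a)/n;  x = a + (0:n)*h;
Tc = h*(0.5*f(x(1)) + sum(f(x(2:n))) + 0.5*f(x(n+1)));           % composite trapezium
m = 5;  h2 = (b - a)/(2*m);  y = a + (0:2*m)*h2;
w = 2*ones(1, 2*m+1);  w(1) = 1;  w(end) = 1;  w(2:2:2*m) = 4;   % weights 1,4,2,...,2,4,1
Sc = (h2/3)*sum(w .* f(y));                                      % composite Simpson
[Tc - (exp(1)-1), Sc - (exp(1)-1)]                               % errors of order h^2 and h^4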
Newton-Cotes Rules
A quadrature rule seeks to approximate an integral as a weighted sum of function values
$$\int_a^b f(x)\,dx \approx \sum_{k=0}^n w_k f(x_k),$$
where the $x_k$ are the quadrature nodes and the $w_k$ are called the quadrature weights.
A Newton-Cotes scheme of order $n$ uses the Lagrange basis functions to construct the quadrature weights. Given nodes $x_k = a + kh$, $0 \leq k \leq n$, where $h = (b-a)/n$, then
$$\int_a^b f(x)\,dx \approx I_n(f) := \int_a^b p_n(x)\,dx = \sum_{k=0}^n w_k f(x_k),$$
where $w_k = \int_a^b L_k(x)\,dx$.
Error analysis follows from the error analysis of interpolation:
$$\Big|\int_a^b f(x)\,dx - I_n(f)\Big| \leq \int_a^b |f(x) - p_n(x)|\,dx.$$
Decompose the $n \times n$ matrix $A$ as
$$A = L + D + U,$$
where
- $L$: lower triangular part,
- $D$: diagonal part,
- $U$: upper triangular part.
Jacobi Method
$$x_{k+1} = D^{-1}\big(b - (L + U)x_k\big).$$
Componentwise, for $1 \leq i \leq n$:
$$x_i^{(k+1)} = \frac{1}{a_{ii}}\Big(b_i - \sum_{j\neq i} a_{ij} x_j^{(k)}\Big).$$
Gauss-Seidel Method
$$(D + L)x_{k+1} = -Ux_k + b.$$
Componentwise, for $1 \leq i \leq n$:
$$x_i^{(k+1)} = \frac{1}{a_{ii}}\Big(b_i - \sum_{j<i} a_{ij} x_j^{(k+1)} - \sum_{j>i} a_{ij} x_j^{(k)}\Big).$$
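A componentwise implementation of one sweep of each method might look as follows in MATLAB (a sketch only; the matrix A, right-hand side b and current iterate x are assumed to be given).

n = length(b);
% One Jacobi sweep: only the old iterate x is used, the result goes into xnew.
xnew = zeros(n, 1);
for i = 1:n
    s = A(i, [1:i-1, i+1:n]) * x([1:i-1, i+1:n]);
    xnew(i) = (b(i) - s) / A(i, i);
end
% One Gauss-Seidel sweep: x is overwritten entry by entry, so new values
% are used as soon as they are available.
for i = 1:n
    s = A(i, 1:i-1)*x(1:i-1) + A(i, i+1:n)*x(i+1:n);
    x(i) = (b(i) - s) / A(i, i);
end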
Convergence of Iterative Methods
A sequence of vectors $x_k \in \mathbb{R}^n$, $k = 1, 2, \ldots$, converges to $x \in \mathbb{R}^n$ with respect to a norm, if for all $\varepsilon > 0$ there exists an $N > 0$ such that for all $k \geq N$:
$$\|x_k - x\| < \varepsilon.$$
Very important relation:
$$\|x\|_\infty \leq \|x\|_2 \leq \sqrt{n}\,\|x\|_\infty.$$
A sequence $x_k$, $k \geq 0$, generated by the rule
$$x_{k+1} = Tx_k + c$$
for $k \geq 0$ has a fixed point $x$ if $x$ satisfies
$$x = Tx + c.$$
The sequence $x_k$ converges to the fixed point $x$ with respect to some norm if $\|T\| < 1$ with respect to the associated operator norm.
Example: Gauss-Seidel $T = -(L + D)^{-1}U$, Jacobi $T = -D^{-1}(L + U)$.
Matrix Norms
To any vector norm we associate the operator norm
$$\|A\| = \max_{x\neq 0}\frac{\|Ax\|}{\|x\|} = \max_{x\in\mathbb{R}^n,\ \|x\|=1}\|Ax\|.$$
Which of the two equivalent forms is used depends on the application.
Interpretation: the maximum stretch of a vector under the operation of $A$.
Eigenvalues, Inverses, etc.
Given a triangular matrix
$$\begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & & \ddots & 0 \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix},$$
the eigenvalues can be read off the diagonal.
The eigenvalues of the inverse $A^{-1}$ are the inverses of the eigenvalues of $A$.
The inverse of a $2 \times 2$ lower triangular matrix can be written as
$$\begin{pmatrix} a_{11} & 0 \\ a_{21} & a_{22} \end{pmatrix}^{-1} = \begin{pmatrix} \frac{1}{a_{11}} & 0 \\ -\frac{a_{21}}{a_{11}a_{22}} & \frac{1}{a_{22}} \end{pmatrix}.$$
If asked to bound the matrix norm of a product, we can use
$$\|AB\| \leq \|A\|\,\|B\|$$
and estimate the norms of $A$ and $B$.
Lecture 14
So far we have seen that the condition $\|T\| < 1$ ensures that an iterative scheme of the form
(14.0.1) $x_{k+1} = Tx_k + c$
converges to a vector $x$ such that $x = Tx + c$ as $k \to \infty$. The converse is not true: there are examples for which $\|T\| \geq 1$ but the iteration (14.0.1) converges nevertheless.
Example 14.1. Recall the finite difference matrix
$$A = \begin{pmatrix}
2 & -1 & 0 & \cdots & 0 \\
-1 & 2 & -1 & \cdots & 0 \\
0 & -1 & 2 & \ddots & \vdots \\
\vdots & & \ddots & \ddots & -1 \\
0 & 0 & \cdots & -1 & 2
\end{pmatrix}$$
and apply the Jacobi method to compute a solution of $Ax = b$. The Jacobi method computes the sequence $x_{k+1} = Tx_k + c$, where $c = \frac{1}{2}b$ and
$$T = T_J = -D^{-1}(L + U) = \frac{1}{2}\begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
1 & 0 & 1 & \cdots & 0 \\
0 & 1 & 0 & \ddots & \vdots \\
\vdots & & \ddots & \ddots & 1 \\
0 & 0 & \cdots & 1 & 0
\end{pmatrix}.$$
We have $\|T\|_\infty = 1$, so the convergence criterion doesn't apply for this norm. However, one can show that all the eigenvalues satisfy $|\lambda| < 1$. Since the matrix $T$ is symmetric, we have
$$\|T\|_2 = \rho(T) < 1,$$
where $\rho(T)$ denotes the spectral radius. It follows that the iteration (14.0.1) does converge with respect to the 2-norm, and therefore also with respect to the $\infty$-norm, despite having $\|T\|_\infty = 1$.
It turns out that the spectral radius gives rise to a necessary and sufficient condition for convergence.
Theorem 14.1. Let $T \in \mathbb{R}^{n\times n}$ be invertible. The iterates $x_k$ of (14.0.1) converge to a solution $x$ of $x = Tx + c$ for all starting points $x_0$ if and only if $\rho(T) < 1$.
PROOF. Let $x_0$ be any starting point, and define, for all $k \geq 0$,
$$z_k = x_k - x.$$
Then $z_{k+1} = Tz_k$, as is easily verified. The convergence of the sequence $x_k$ to $x$ is then equivalent to the convergence of $z_k$ to $0$.
Assume $T$ has $n$ eigenvalues $\lambda_k$, $1 \leq k \leq n$, with eigenvectors $u_k$. The $u_k$ are a basis of $\mathbb{R}^n$, and we can write
(14.0.2) $z_0 = \sum_{j=1}^n \alpha_j u_j$
for some coefficients $\alpha_j$. For the iterates we get
$$z_{k+1} = Tz_k = T^{k+1}z_0 = T^{k+1}\Big(\sum_{j=1}^n \alpha_j u_j\Big) = \sum_{j=1}^n \alpha_j T^{k+1}u_j = \sum_{j=1}^n \alpha_j \lambda_j^{k+1} u_j.$$
Now assume $\rho(T) < 1$. Then $|\lambda_j| < 1$ for all eigenvalues $\lambda_j$, and therefore $\lambda_j^{k+1} \to 0$ as $k \to \infty$. Therefore, $z_{k+1} \to 0$ as $k \to \infty$ and $x_{k+1} \to x$. If, on the other hand, $\rho(T) \geq 1$, then there exists an index $j$ such that $|\lambda_j| \geq 1$. If we choose a starting point $x_0$ such that the coefficient $\alpha_j$ in (14.0.2) is not zero, then
$$|\alpha_j \lambda_j^{k+1}| \geq |\alpha_j|$$
for all $k$, and we deduce that $z_{k+1}$ does not converge to zero.
The question that remains is whether there is a convenient way to estimate $\rho(T)$, given that the eigenvalues may be difficult to compute. One way to bound the size of the eigenvalues is by means of Gershgorin's Theorem.
Theorem 14.2. Every eigenvalue of an $n \times n$ matrix $A$ lies in one of the circles $C_1, \ldots, C_n$, where $C_i$ has centre at the diagonal entry $a_{ii}$ and radius
$$r_i = \sum_{j\neq i} |a_{ij}|.$$
Example 14.2. Consider the matrix
$$A = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 4 & 1 \\ 0 & 1 & 8 \end{pmatrix}.$$
The centres are given by 2, 4, 8, and the radii by $r_1 = 1$, $r_2 = 2$, $r_3 = 1$.
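The circles are easy to compute; the MATLAB sketch below (an illustration for this example) finds the centres and radii and confirms that every eigenvalue lies in at least one circle.

A = [2 1 0; 1 4 1; 0 1 8];
centres = diag(A);
radii = sum(abs(A), 2) - abs(diag(A));       % r_i = sum over j ~= i of |a_ij|
lambda = eig(A);
for k = 1:length(lambda)
    any(abs(lambda(k) - centres) <= radii)   % returns 1 (true) for each eigenvalue
end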
PROOF. Let $\lambda$ be an eigenvalue of $A$, with associated eigenvector $u$, so that
$$Au = \lambda u.$$
The $i$-th row of this equation looks like
$$\lambda u_i = \sum_{j=1}^n a_{ij} u_j.$$
Bringing $a_{ii}u_i$ to the left, this implies the inequality
$$|\lambda - a_{ii}| \leq \sum_{j\neq i} |a_{ij}|\,\frac{|u_j|}{|u_i|}.$$
If the index $i$ is such that $u_i$ is the component of $u$ with largest absolute value, then the right-hand side is bounded by $r_i$, and we get
$$|\lambda - a_{ii}| \leq r_i,$$
which implies that $\lambda$ lies in a circle of radius $r_i$ around $a_{ii}$.
FIGURE 1. Gershgorin's circles.
Gershgorin's Theorem has implications for the convergence of Jacobi's method. To state these implications, we need a definition.
Definition 14.0.1. A matrix $A$ is called diagonally dominant if for all indices $i$ we have
$$|a_{ii}| > r_i.$$
Corollary 14.1. Let $A$ be diagonally dominant. Then the Jacobi method converges to a solution of the system $Ax = b$ for any starting point $x_0$.
PROOF. We need to show that if $A$ is diagonally dominant, then $\rho(T_J) < 1$, where $T_J = -D^{-1}(L + U)$ is the iteration matrix of Jacobi's method. The $i$-th row of $T_J$ is given by
$$-\frac{1}{a_{ii}}\begin{pmatrix} a_{i1} & \cdots & a_{i,i-1} & 0 & a_{i,i+1} & \cdots & a_{in} \end{pmatrix}.$$
By Gershgorin's Theorem, all the eigenvalues of $T_J$ lie in a circle around $0$ of radius
$$r_i = \frac{1}{|a_{ii}|}\sum_{j\neq i} |a_{ij}|.$$
It follows that if $A$ is diagonally dominant, then $r_i < 1$, and therefore $|\lambda| < 1$ for all eigenvalues $\lambda$ of $T_J$. In particular, $\rho(T_J) < 1$ and Jacobi's method converges for any starting value $x_0$.
Lecture 15
The Condition Number
In this chapter we discuss the sensitivity of a system of equations $Ax = b$ to perturbations in the data. This sensitivity is quantified by the notion of condition number. We begin by illustrating the problem with a small example.
Example 15.1. Let's look at the system of equations with
$$A = \begin{pmatrix} \varepsilon & 1 \\ 0 & 1 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 + \delta \\ 1 \end{pmatrix},$$
where $0 < \varepsilon, \delta \ll 1$ (that is, much smaller than 1). The solution of $Ax = b$ is
$$x = \begin{pmatrix} \delta/\varepsilon \\ 1 \end{pmatrix}.$$
We can think of $\delta$ as caused by rounding error. Thus $\delta = 0$ would give us an exact solution, while if $\varepsilon$ is small and $\varepsilon \ll \delta$, then the change of $x$ due to $\delta \neq 0$ can be large!
The following definition is deliberately vague, and will be made more precise in light of the condition number.
Definition 15.0.1. A system of equations $Ax = b$ is called ill-conditioned, if small changes in the system cause big changes in the solution.
To measure the sensitivity of a solution with respect to perturbations in the system, we introduce the condition number of a matrix.
Definition 15.0.2. Let $\|\cdot\|$ be a matrix norm and $A$ an invertible matrix. The condition number of $A$ is defined as
$$\operatorname{cond}(A) = \|A\|\cdot\|A^{-1}\|.$$
We write $\operatorname{cond}_1(A)$, $\operatorname{cond}_2(A)$, $\operatorname{cond}_\infty(A)$ for the condition number with respect to the 1-, 2- and $\infty$-norms. We next show how the condition number is used to bound errors in a solution.
Let $x$ be the true solution of a system of equations $Ax = b$, and let $x_c = x + \Delta x$ be the solution of a perturbed system
(15.0.1) $A(x + \Delta x) = b + \Delta b$,
where $\Delta b$ is a perturbation of $b$. We are interested in bounding the relative error $\|\Delta x\|/\|x\|$ in terms of $\|\Delta b\|/\|b\|$. We have
$$\Delta b = A(x + \Delta x) - b = A\Delta x,$$
from which we get $\Delta x = A^{-1}\Delta b$ and $\|\Delta x\| = \|A^{-1}\Delta b\| \leq \|A^{-1}\|\,\|\Delta b\|$. On the other hand, $\|b\| = \|Ax\| \leq \|A\|\,\|x\|$, and combining these estimates, we get
(15.0.2) $\frac{\|\Delta x\|}{\|x\|} \leq \|A\|\,\|A^{-1}\|\,\frac{\|\Delta b\|}{\|b\|} = \operatorname{cond}(A)\,\frac{\|\Delta b\|}{\|b\|}.$
The condition number therefore bounds the relative error in the solution in terms of the relative error in $b$. We can also derive a similar bound for perturbations $\Delta A$ in the matrix $A$. Note that a small condition number is a good thing, as it implies a small error.
The above analysis can also be rephrased in terms of the residual of a computed solution. Suppose we have $A$ and $b$ exactly, but solving the system $Ax = b$ by a computational method gives a computed solution $x_c = x + \Delta x$ that has an error. We don't know the error, but we have access to the residual
$$r = Ax_c - b.$$
We can rewrite this equation as in (15.0.1), with $r$ instead of $\Delta b$, so that we can interpret the residual as a perturbation of $b$. The condition number bound (15.0.2) therefore implies
$$\frac{\|\Delta x\|}{\|x\|} \leq \operatorname{cond}(A)\,\frac{\|r\|}{\|b\|}.$$
We now turn to some examples of condition numbers.
Example 15.2. Let
$$A = \begin{pmatrix} \varepsilon & 1 \\ 0 & 1 \end{pmatrix}.$$
The inverse is given by
$$A^{-1} = \frac{1}{\varepsilon}\begin{pmatrix} 1 & -1 \\ 0 & \varepsilon \end{pmatrix}.$$
The condition numbers with respect to the 1-, 2- and $\infty$-norms are easily seen to be
$$\operatorname{cond}_1(A) = \frac{2(1+\varepsilon)}{\varepsilon}, \qquad \operatorname{cond}_2(A) \approx \frac{1}{\varepsilon}, \qquad \operatorname{cond}_\infty(A) = \frac{2(1+\varepsilon)}{\varepsilon}.$$
If $\varepsilon$ is small, the condition numbers are large and therefore can't guarantee small errors.
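The growth with $1/\varepsilon$ can be checked numerically; the MATLAB sketch below is an illustration only, with a hypothetical value of $\varepsilon$.

eps0 = 1e-6;                       % hypothetical small epsilon
A = [eps0 1; 0 1];
[cond(A, 1), cond(A, 2), cond(A, inf)]
% All three condition numbers are of the order 1/eps0 (about 10^6), so
% small relative perturbations of b may be amplified enormously.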
Example 15.3. A well-known example is the Hilbert matrix. Let $H_n$ be the $n \times n$ matrix with entries
$$h_{ij} = \frac{1}{i + j - 1}$$
for $1 \leq i, j \leq n$. This matrix is symmetric and positive definite (that is, $H_n^{\top} = H_n$ and $x^{\top}H_n x > 0$ for all $x \neq 0$). For example, for $n = 3$ the matrix looks as follows:
$$H_3 = \begin{pmatrix} 1 & \frac{1}{2} & \frac{1}{3} \\ \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\ \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \end{pmatrix}.$$
Examples such as the Hilbert matrix are not common in applications, but they serve as a reminder that one should keep an eye on the conditioning of a matrix.
$$\begin{array}{c|cccc}
n & 5 & 10 & 15 & 20 \\ \hline
\operatorname{cond}_2(H_n) & 4.8\cdot 10^{5} & 1.6\cdot 10^{13} & 6.1\cdot 10^{20} & 2.5\cdot 10^{28}
\end{array}$$
FIGURE 1. Condition number of the Hilbert matrix: $\log_{10}(\operatorname{cond}_2(H_n))$ plotted against $n$.
It can be shown that the condition number of the Hilbert matrix is asymptotically given by
$$\operatorname{cond}(H_n) \sim \frac{(\sqrt{2}+1)^{4n+4}}{2^{15/4}\sqrt{\pi n}}$$
for $n \to \infty$. To see the effect that this conditioning has on solving systems of equations, let's look at a system
$$H_n x = b,$$
with entries $b_i = \sum_{j=1}^n \frac{1}{i+j-1}$. The system is constructed such that the solution is $x = (1, \ldots, 1)^{\top}$. For $n = 20$ we get, solving the system using MATLAB, a solution $x + \Delta x$ which differs considerably from $x$. The relative error is
$$\frac{\|\Delta x\|_2}{\|x\|_2} \approx 44.9844.$$
What that means is that the computed solution is useless.
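This experiment is easy to reproduce with the built-in hilb function; the following MATLAB sketch is a minimal version of the computation described above (the right-hand side is formed numerically here, so the resulting error may differ somewhat from the value quoted in the notes).

n = 20;
H = hilb(n);                         % n x n Hilbert matrix
xtrue = ones(n, 1);
b = H * xtrue;                       % right-hand side with known solution (1,...,1)'
xc = H \ b;                          % computed solution
norm(xc - xtrue, 2)/norm(xtrue, 2)   % large relative error
cond(H, 2)                           % very large; in exact arithmetic about 2.5e28 (see table),
                                     % though floating point limits what can be computed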
Example 15.4. An important example is the condition number of the omnipresent finite difference matrix
$$A = \begin{pmatrix}
2 & -1 & 0 & \cdots & 0 & 0 \\
-1 & 2 & -1 & \cdots & 0 & 0 \\
0 & -1 & 2 & \ddots & 0 & 0 \\
\vdots & & \ddots & \ddots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 2 & -1 \\
0 & 0 & 0 & \cdots & -1 & 2
\end{pmatrix}.$$
It can be shown that the condition number of this matrix is given by
$$\operatorname{cond}_2(A) \approx \frac{4}{\pi^2 h^2},$$
where $h = 1/(n+1)$. It follows that the condition number increases with the number of discretisation steps $n$.
Example 15.5. What is the condition number of a random matrix? If we generate random $100 \times 100$ matrices with normally distributed entries and look at the frequency of the logarithm of the condition number, then we get the following:
FIGURE 2. Distribution of the logarithm of the condition number of random matrices (histogram of $\log(\operatorname{cond}_2(A))$).
It should be noted that a random matrix is not the same as "any old matrix", and equally not the same as a typical matrix arising in applications, so one should be careful in interpreting statements about random matrices!
Computing the condition number can be difficult, as it involves computing the inverse of a matrix. In many cases one can find good bounds on the condition number, which can, for example, be used to tell whether a problem is ill-conditioned.
Example 15.6. Consider the matrix
$$A = \begin{pmatrix} 1 & 1 \\ 1 & 1.0001 \end{pmatrix}, \qquad A^{-1} = 10^4\begin{pmatrix} 1.0001 & -1 \\ -1 & 1 \end{pmatrix}.$$
The condition number with respect to the $\infty$-norm is given by $\operatorname{cond}_\infty(A) = 4\cdot 10^4$. We would like to find an estimate for this condition number without having to invert the matrix $A$. To do this, note that for any $x$ and $b = Ax$ we have
$$Ax = b \iff x = A^{-1}b \implies \|x\| \leq \|A^{-1}\|\,\|b\|,$$
and we have the lower bound
$$\|A^{-1}\| \geq \frac{\|x\|}{\|b\|}.$$
Choosing $x = (1, -1)^{\top}$ in our case, we get $b = (0, -0.0001)^{\top}$ and the estimate
$$\operatorname{cond}_\infty(A) = \|A\|_\infty\,\|A^{-1}\|_\infty \geq 2\cdot 10^4.$$
This estimate is of the right order of magnitude (in particular, it shows that the condition number is large), and no inversion was necessary.
To summarise:
A small condition number is a good thing, as small changes in the data lead to
small changes in the solution.
Condition numbers may depend on the problem the matrix arises from and
can be very large.
A large condition number is a sign that the matrix at hand is close to being
non-invertible.
Condition numbers also play a role in the convergence analysis of iterative matrix
algorithms. We will not discuss this aspect here and refer to more advanced lectures on
numerical linear algebra and matrix analysis.
Computing determinants and eigenvalues
For a $2 \times 2$ matrix
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix},$$
the determinant is given by $\det(A) = a_{11}a_{22} - a_{12}a_{21}$.
For an $n \times n$ matrix $A$, the computation can be reduced to that of smaller determinants by means of row expansion
$$\det(A) = a_{11}\det(A_{11}) - a_{12}\det(A_{12}) + \cdots + (-1)^{1+n}a_{1n}\det(A_{1n}),$$
where $A_{ij}$ denotes the $(n-1)\times(n-1)$ matrix $A$ with the $i$-th row and $j$-th column removed. More generally, we have for any $1 \leq i \leq n$,
$$\det(A) = \sum_{j=1}^n (-1)^{i+j}a_{ij}\det(A_{ij}),$$
and the similar expansion along columns
$$\det(A) = \sum_{i=1}^n (-1)^{i+j}a_{ij}\det(A_{ij}).$$
An important property of the determinant is the multiplicative rule
(0.0.1) $\det(AB) = \det(A)\det(B)$.
The eigenvalues of an $n \times n$ matrix $A$ can be calculated as the roots of the characteristic polynomial:
$$\det(\lambda I - A) = 0.$$
For example,
$$A = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \implies \lambda I - A = \begin{pmatrix} \lambda & -1 & 0 \\ -1 & \lambda & -1 \\ 0 & -1 & \lambda \end{pmatrix}.$$
Expanding along the first row, we get
$$\det(\lambda I - A) = \lambda\det\begin{pmatrix} \lambda & -1 \\ -1 & \lambda \end{pmatrix} + \det\begin{pmatrix} -1 & -1 \\ 0 & \lambda \end{pmatrix} = \lambda(\lambda^2 - 1) - \lambda = \lambda^3 - 2\lambda.$$
This polynomial has the roots $\lambda_1 = \sqrt{2}$, $\lambda_2 = -\sqrt{2}$, $\lambda_3 = 0$, which are the eigenvalues of the matrix $A$.
A particular example are the eigenvalues of a Gauss-Seidel matrix. We have
$$T = -(L + D)^{-1}U,$$
where $A = L + D + U$ is the decomposition into lower triangular, diagonal, and upper triangular parts. To compute the eigenvalues of $T$, we use the multiplicative rule (0.0.1):
$$\det(\lambda I - T) = 0 \iff \det\big((L + D)(\lambda I - T)\big) = 0 \iff \det\big(\lambda(L + D) + U\big) = 0.$$
We can compute the eigenvalues by expanding the last expression, so there is no need to compute inverses!
Lecture 16
Solving non-linear equations
Given a function $f : \mathbb{R} \to \mathbb{R}$, we would like to find a solution to the equation
(16.0.1) $f(x) = 0.$
For example, if $f$ is a polynomial of degree 2, we can write down the solutions in closed form (though, as seen in Lecture 1, this by no means solves the problem from a numerical point of view!). In general, we will encounter functions for which a closed form does not exist, or is not convenient to write down or evaluate. The best way to deal with (16.0.1) is then to find an approximate solution using an iterative method. Here we will discuss two methods:
- The bisection method.
- Newton's method.
The bisection method only requires that $f$ be continuous, while Newton's method also requires differentiability but is faster.
The bisection method
Let $f : \mathbb{R} \to \mathbb{R}$ be a continuous function on an interval $[a, b]$, $a < b$. Assume that $f(a)f(b) < 0$, that is, the function values at the end points have different signs. By the intermediate value theorem (or common sense) there exists an $x$ with $a < x < b$ such that $f(x) = 0$.
The most direct method of finding such a zero $x$ is by divide and conquer: determine the half of the interval $[a, b]$ that contains $x$ and shrink the interval to that half, then repeat until the boundary points are reasonably close to $x$. This approach is called the bisection method.
FIGURE 1. The bisection method. Each panel shows one iteration, together with the current midpoint $x$ and the value $f(x)$.
To be more precise, starting with $[a, b]$ such that $f(a)f(b) < 0$, we construct a series of decreasing intervals $[a_n, b_n]$, $n \geq 1$, each containing $x$. At each step, we calculate the midpoint $p_n = (a_n + b_n)/2$ and evaluate $f(p_n)$. If $f(p_n)f(a_n) < 0$ we set $[a_{n+1}, b_{n+1}] = [a_n, p_n]$, else $[a_{n+1}, b_{n+1}] = [p_n, b_n]$. We stop whenever $b_n - a_n < \mathrm{TOL}$ for some predefined tolerance, for example $10^{-4}$, and return the value $p_n$.
In MATLAB this would look like:
% Assumes f is a function handle, e.g. f = @(x) x.^6 - x - 1, and that
% a, b and TOL are given with f(a)*f(b) < 0.
while (b - a >= TOL)
    p = (a + b)/2;        % Calculate midpoint
    if f(a)*f(p) < 0      % Change of sign detected in [a, p]
        b = p;            % Set right boundary to p
    else
        a = p;            % Set left boundary to p
    end
end
x = p;                    % Computed solution
Example 16.1. Let's look at the polynomial $x^6 - x - 1$ on the interval $[1, 2]$ with tolerance $\mathrm{TOL} = 0.2$ (that is, we stop when we have located an interval of length $0.2$ containing the root $x$). Note that no closed form solution exists for polynomials of degree $\geq 5$. The bisection method is best carried out in form of a table. At each step the midpoint $p_n$ is obtained, and serves as next left or right boundary, depending on whether $f(a_n)f(p_n) < 0$ or not.

n    a_n      f(a_n)     b_n      f(b_n)    p_n      f(p_n)
1    1        -1         2        61        1.5      8.89
2    1        -1         1.5      8.89      1.25     1.5647
3    1        -1         1.25     1.5647    1.125    -0.0977
4    1.125    -0.0977    1.25     1.5647    1.1875

We see that $|b_4 - a_4| = 0.125 < \mathrm{TOL}$, so we stop there and declare the solution $p_4 = 1.1875$.
The following result shows that the bisection method indeed approximates a zero of $f$ to arbitrary precision.
Lemma 16.1. Let $f : \mathbb{R} \to \mathbb{R}$ be a continuous function on an interval $[a, b]$ and let $p_n$, $n \geq 1$, be the sequence of midpoints generated by the bisection method on $f$. Let $x$ be such that $f(x) = 0$. Then
$$|p_n - x| \leq \frac{1}{2^n}|b - a|.$$
In particular, $p_n \to x$ as $n \to \infty$.
A convergence of this form is called linear.
PROOF. Let $x \in [a, b]$ be such that $f(x) = 0$. Since $p_n$ is the midpoint of $[a_n, b_n]$ and $x \in [a_n, b_n]$, we have
$$|p_n - x| \leq \frac{1}{2}|b_n - a_n|.$$
By bisection, each interval has half the length of the preceding one:
$$|b_n - a_n| = \frac{1}{2}|b_{n-1} - a_{n-1}|.$$
Therefore,
$$|p_n - x| \leq \frac{1}{2}|b_n - a_n| = \frac{1}{2^2}|b_{n-1} - a_{n-1}| = \cdots = \frac{1}{2^n}|b_1 - a_1| = \frac{1}{2^n}|b - a|.$$
This completes the proof.
Newton's method
If the function $f : \mathbb{R} \to \mathbb{R}$ is differentiable, and we know how to compute $f'(x)$, then we can (under certain conditions) find the root of $f$ much quicker by Newton's method. The idea behind Newton's method is to approximate $f$ at a point $x_n$ by its tangent line, and calculate the next iterate $x_{n+1}$ as the root of this tangent line.
Given a point $x_n$ with function value $f(x_n)$, we need to find the zero-crossing of the tangent line at $(x_n, f(x_n))$:
$$y = f'(x_n)(x - x_n) + f(x_n) = 0.$$
FIGURE 2. Newton's method. Each panel shows one iteration, together with the current iterate $x$ and the value $f(x)$.
Solving this for $x$, we get
$$x = x_n - \frac{f(x_n)}{f'(x_n)},$$
which is defined provided $f'(x_n) \neq 0$. Formally, Newton's method is as follows:
- Start with $x_1 \in [a, b]$ such that $f'(x_1) \neq 0$.
- At each step, compute a new iterate $x_{n+1}$ from $x_n$ as follows:
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.$$
- Stop if $|x_{n+1} - x_n| < \mathrm{TOL}$ for some predefined tolerance.
Example 16.2. Consider again the function $f(x) = x^6 - x - 1$. The derivative is $f'(x) = 6x^5 - 1$. We apply Newton's method using a tolerance $\mathrm{TOL} = 0.001$. We get the sequence
$$x_1 = 1,$$
$$x_2 = x_1 - \frac{f(x_1)}{f'(x_1)} = 1.2,$$
$$x_3 = x_2 - \frac{f(x_2)}{f'(x_2)} = 1.1436,$$
$$x_4 = 1.1349,$$
$$x_5 = 1.1347.$$
The difference $|x_5 - x_4|$ is below the given tolerance, so we stop and declare $x_5$ to be our solution. We can already see that in just four iterations we get a better approximation than using the bisection method.
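In MATLAB, Newton's method for this example can be written in a few lines, in the same spirit as the bisection loop above (a sketch only).

f  = @(x) x.^6 - x - 1;
df = @(x) 6*x.^5 - 1;
TOL = 0.001;  x = 1;  dx = inf;
while abs(dx) >= TOL
    dx = -f(x)/df(x);     % Newton correction
    x  = x + dx;          % x_{n+1} = x_n - f(x_n)/f'(x_n)
end
x                         % approximately 1.1347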
We will see that the error of Newton's method is bounded as
$$|x_{n+1} - x| \leq k|x_n - x|^2$$
for a constant $k$, provided we start sufficiently close to $x$. This will be shown using the theory of fixed-point iterations, discussed in the next lecture.
Newton's method is not without difficulties. One can easily come up with starting points where the method does not converge. One example is when $f'(x_1) \approx 0$, in which case the tangent line at $(x_1, f(x_1))$ is almost horizontal and takes us far away from the solution. Another one would be where the iteration oscillates between two values, as in the following example.
FIGURE 3. Newton's method fails: for $y = x^3 - 2x + 2$ the iteration can oscillate between two values.
In the following lectures we will derive precise conditions under which Newton's method converges to a solution.
Lecture 17
Fixed-point iterations
A root of a function $f : \mathbb{R} \to \mathbb{R}$ is a number $x \in \mathbb{R}$ such that $f(x) = 0$. A fixed-point is a root of a function of the form $f(x) = g(x) - x$.
Definition 17.0.1. A fixed-point of a function $g : \mathbb{R} \to \mathbb{R}$ is a number $x$ such that $g(x) = x$.
In Newton's method we have
$$g(x) = x - \frac{f(x)}{f'(x)},$$
where $x$ is a fixed-point of $g$ if and only if $x$ is a root of $f$. A feature of the fixed-point formulation is that we may generate a sequence $x_n$, $n \geq 1$, by means of
$$x_{n+1} = g(x_n)$$
and hope that it converges to a fixed-point of $g$. We will study conditions under which this happens.
Example 17.1. Let $f(x) = x^3 + 4x^2 - 10$. There are several ways to rephrase the problem $f(x) = 0$ as a fixed-point problem $g(x) = x$.
(1) Let $g_1(x) = x - x^3 - 4x^2 + 10$. Then $g_1(x) = x$ if and only if $f(x) = 0$, as is easily seen.
(2) Let $g_2(x) = \frac{1}{2}\big(10 - x^3\big)^{1/2}$. Then $g_2(x) = x \iff x^2 = \frac{1}{4}(10 - x^3) \iff f(x) = 0$.
(3) Let $g_3(x) = \Big(\frac{10}{4 + x}\Big)^{1/2}$. Then it is also not difficult to verify that $g_3(x) = x$ is equivalent to $f(x) = 0$.
Example 17.2. We briefly discuss a more intriguing example, the logistic map
$$g(x) = rx(1 - x)$$
with some $r \in [0, 4]$. Whether the iteration $x_{n+1} = g(x_n)$ converges to a fixed-point, and how it converges, depends on the value of $r$. Three examples are shown below.
FIGURE. Trajectories of the iteration $x_{n+1} = g(x_n)$ for $r = 2.8$, $r = 3.5$ and $r = 3.8$.
If we record the movement of $x_n$ for $r$ ranging between 0 and 4, the following picture emerges:
FIGURE. The logistic map $x = rx(1-x)$: values visited by $x_n$, plotted against $r$.
It turns out that for small values of $r$ we have convergence (which, incidentally, does not depend on the starting value), for values slightly above 3 oscillation between two, then four values, while for larger $r$ we have chaotic behaviour. In that region, the trajectory of $x_n$ is also highly sensitive to perturbations of the initial value $x_1$. The precise behaviour of such iterations is studied in dynamical systems.
Given a fixed-point problem, the all-important question is when the iteration $x_{n+1} = g(x_n)$ converges. The following theorem gives an answer to this question.
Theorem 17.1 (fixed-point theorem). Let $g$ be a smooth function on $[a, b]$. Assume
(1) $g(x) \in [a, b]$ for $x \in [a, b]$, and
(2) $|g'(x)| < 1$ for $x \in [a, b]$.
Then there exists a unique fixed-point $x = g(x)$ in $[a, b]$, and the sequence $\{x_n\}$ defined by $x_{n+1} = g(x_n)$ converges to $x$. Moreover,
$$|x_{n+1} - x| \leq \gamma^n |x_1 - x|$$
for some $\gamma < 1$.
PROOF. Let $f(x) = g(x) - x$. Then by (1), $f(a) = g(a) - a \geq 0$, and $f(b) = g(b) - b \leq 0$. By the intermediate value theorem, there exists an $x \in [a, b]$ such that $f(x) = 0$. Hence, there exists $x \in [a, b]$ such that $g(x) = x$, showing the existence of a fixed-point.
Next, consider $x_{n+1} = g(x_n)$ for $n \geq 1$, and let $x = g(x)$ be a fixed-point. Then
$$x_{n+1} - x = g(x_n) - g(x).$$
Assume without loss of generality $x_n > x$. By the mean value theorem there exists a $\xi \in (x, x_n)$ such that
$$g'(\xi) = \frac{g(x_n) - g(x)}{x_n - x},$$
and hence
$$x_{n+1} - x = g'(\xi)(x_n - x).$$
Since $\xi \in [a, b]$, assumption (2) gives $|g'(\xi)| \leq \gamma$ for some $\gamma < 1$. Hence,
$$|x_{n+1} - x| \leq \gamma|x_n - x| \leq \cdots \leq \gamma^n|x_1 - x|.$$
This proves the convergence. To show uniqueness, assume $x, y$ are two distinct fixed-points of $g$ with $x < y$. By the mean value theorem and assumption (2), there exists $\xi \in (x, y)$ such that
$$\frac{g(x) - g(y)}{x - y} = g'(\xi) < 1.$$
But since both $x$ and $y$ are fixed-points, we have
$$\frac{g(x) - g(y)}{x - y} = \frac{x - y}{x - y} = 1,$$
so we have a contradiction and $x = y$.
Example 17.3. Let's look at the functions from Example 17.1 to see for which one we have convergence.
(1) $g_1(x) = x - x^3 - 4x^2 + 10$ on $[a, b] = [1, 2]$. Note that $g_1(1) = 6 \notin [1, 2]$, therefore assumption (1) is violated.
(2) $g_2(x) = \frac{1}{2}(10 - x^3)^{1/2}$. The derivative is given by
$$g_2'(x) = -\frac{3x^2}{4(10 - x^3)^{1/2}},$$
and therefore $|g_2'(2)| \approx 2.12$. Condition (2) fails.
(3) The third formulation is
$$g_3(x) = \Big(\frac{10}{4 + x}\Big)^{1/2}.$$
The derivative is given by
$$g_3'(x) = -\frac{5}{(4 + x)^{3/2}\sqrt{10}},$$
and therefore the function is strictly decreasing on $[1, 2]$. Since $g_3(2) = \sqrt{5/3}$ and $g_3(1) = \sqrt{2}$ are both in $[1, 2]$, condition (1) is satisfied. Furthermore, $|g_3'(x)| \leq 1/\sqrt{10} < 1$ for $x \in [1, 2]$, so condition (2) is also satisfied. It follows that the iteration $x_{n+1} = g_3(x_n)$ converges to a fixed-point of $g_3$. We can try this out:
$$x_1 = 1.5, \quad x_2 = 1.3484, \quad x_3 = 1.3674, \quad x_4 = 1.3650, \ldots$$
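The iteration for $g_3$ is easy to run; the following MATLAB sketch (an illustration) reproduces the first few iterates.

g3 = @(x) sqrt(10./(4 + x));
x = 1.5;                    % starting value as in the example
for n = 1:10
    x = g3(x);              % x_{n+1} = g3(x_n)
end
x                           % close to the fixed point, about 1.3652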
We can apply the fixed-point theorem to Newton's method. Let
$$g(x) = x - \frac{f(x)}{f'(x)}.$$
Then
$$g'(x) = 1 - \frac{f'(x)}{f'(x)} + \frac{f(x)f''(x)}{f'(x)^2} = \frac{f(x)f''(x)}{f'(x)^2}.$$
Let $\alpha$ be a root of $f$. Then $f(\alpha) = 0$ and $f'(\alpha) \neq 0$, so that
$$g'(\alpha) = 0.$$
Hence, $|g'(\alpha)| < 1$ at the fixed-point. Now let $\varepsilon > 0$ be small and $a = \alpha - \varepsilon$, $b = \alpha + \varepsilon$. Then, by continuity,
$$|g'(x)| < \frac{1}{2}$$
for $x \in [a, b]$, and (2) holds. Furthermore,
$$|g(x) - \alpha| = |g'(\xi)||x - \alpha| \leq \frac{1}{2}|x - \alpha| < \varepsilon$$
for $x \in [a, b]$. Hence, $g(x) \in [a, b]$ and (1) holds. It follows that in a small enough neighbourhood of a root of $f(x)$, Newton's method converges to that root (provided $f'(x) \neq 0$ at that root).
Note that the argument with $\varepsilon$ illustrates a key aspect of Newton's method: it only converges when the initial guess $x_1$ is close enough to a root of $f$. What "close enough" means is often not so clear.
Lecture 18
Recall that the fixed-point theorem guarantees convergence of an iteration $x_{n+1} = g(x_n)$ to a unique fixed-point of $g$ in an interval $[a, b]$ if $g(x) \in [a, b]$ for $x \in [a, b]$, and $|g'(x)| < 1$ on $[a, b]$. We can apply this to Newton's method, when stated as a fixed-point iteration with
$$g(x) = x - \frac{f(x)}{f'(x)}.$$
The derivative is
$$g'(x) = 1 - \frac{f'(x)}{f'(x)} + \frac{f(x)f''(x)}{f'(x)^2} = \frac{f(x)f''(x)}{f'(x)^2}.$$
If $\alpha$ is a root of $f$, that is, $f(\alpha) = 0$, and if $f'(\alpha) \neq 0$, then
$$g'(\alpha) = 0.$$
Hence, by continuity, $|g'(x)| < 1/2$ in a small interval $[a, b]$ with $a = \alpha - \varepsilon$, $b = \alpha + \varepsilon$. Furthermore, by the mean value theorem there exists $\xi \in [a, b]$ such that
$$|g(x) - \alpha| = |g'(\xi)||x - \alpha| \leq \frac{1}{2}|x - \alpha| < \varepsilon$$
for $x \in [a, b]$. Hence, $g(x) \in [a, b]$ and both conditions of the fixed-point theorem are satisfied. Newton's method, interpreted as a fixed-point iteration, converges. In the next section we derive a stronger result for Newton's method, namely that it converges quadratically if the starting point is close enough.
Rates of convergence
The speed of iterative numerical methods is characterised by the rate of convergence.
DEFINITION 18.0.1. The sequence $x_n$, $n \geq 1$, converges to $\alpha$ with order 1, or linearly, if
$$|x_{n+1} - \alpha| \leq k|x_n - \alpha|$$
for some $0 < k < 1$. The sequence converges with order $r$, $r \geq 2$, if
$$|x_{n+1} - \alpha| \leq k|x_n - \alpha|^r$$
with $k > 0$. If the sequence converges with order $r = 2$, it is said to converge quadratically.
EXAMPLE 18.1. Consider the sequence $x_n = 1/2^{r^n}$ for $r > 1$. Then $x_n \to 0$ as $n \to \infty$. Note that
$$x_{n+1} = \frac{1}{2^{r^{n+1}}} = \frac{1}{2^{r^n \cdot r}} = \Big(\frac{1}{2^{r^n}}\Big)^r = x_n^r,$$
and therefore $|x_{n+1} - 0| \leq 1\cdot|x_n - 0|^r$. We have convergence of order $r$.
The benefit of having quadratic convergence is that at each step, the number of correct significant digits doubles. For example, if $|x_n - \alpha| \approx 0.1$, then in the next step we have $|x_{n+1} - \alpha| \leq k\cdot 0.01$. We would like to show that Newton's method converges quadratically to a root of a function $f$ if we start the iteration sufficiently close to that root.
THEOREM 18.1. A fixed-point iteration $x_{n+1} = g(x_n)$ for differentiable $g$ converges to a fixed-point $\alpha$ of $g$ quadratically, if $g'(\alpha) = 0$ and the starting point $x_1$ is sufficiently close to $\alpha$.
Here, "sufficiently close" means that there exists an interval $[a, b]$ for which this holds.
PROOF. Consider the Taylor expansion around $\alpha$:
$$g(x) = g(\alpha) + g'(\alpha)(x - \alpha) + \frac{1}{2}g''(\alpha)(x - \alpha)^2 + R,$$
where $R$ is a remainder term of order $O((x - \alpha)^3)$. Assume $g'(\alpha) = 0$. Then
$$g(x) - g(\alpha) = \frac{1}{2}g''(\alpha)(x - \alpha)^2 + R,$$
and therefore
$$|g(x) - g(\alpha)| \leq k|x - \alpha|^2.$$
Set $x = x_n$, $x_{n+1} = g(x_n)$, $\alpha = g(\alpha)$. Then $g(x) - g(\alpha) = x_{n+1} - \alpha$ and
$$|x_{n+1} - \alpha| \leq k|x_n - \alpha|^2.$$
This shows quadratic convergence.
In summary, we have the following points worth noting about the bisection method and Newton's method.
- The bisection method requires that $f$ is continuous on $[a, b]$, and that $f(a)f(b) < 0$.
- Newton's method requires that $f$ is continuous and differentiable, and moreover requires a good starting point $x_1$.
- The bisection method converges linearly, while Newton's method converges quadratically.
- There is no obvious generalisation of the bisection method to higher dimensions, while Newton's method generalises easily.
Newton's method in the complex plane
Before discussing Newton's method in higher dimensions, we present yet another example illustrating the intricate behaviour of fixed-point iterations, this time over the complex numbers.
EXAMPLE 18.2. Consider the function
$$f(z) = z^3 - 1.$$
This function has exactly three roots, the roots of unity
$$z_k = e^{\frac{2\pi ik}{3}}$$
for $k = 0, 1, 2$. As in the real case, Newton's method
$$z_{n+1} = z_n - \frac{f(z_n)}{f'(z_n)}$$
converges to one of these roots of unity if we start close enough. But what happens at the boundaries? The following picture illustrates the behaviour of Newton's method for this function in the complex plane, where each colour indicates to which root a starting value converges:
If we look at the speed of convergence, we get the following picture:
Newton's method in two dimensions
In general, we look at the problem of finding a vector $x \in \mathbb{R}^2$ such that
$$f(x) = \begin{pmatrix} f_1(x) \\ f_2(x) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
To derive Newton's method, we use the Taylor series in higher dimensions,
$$f(y) = f(x) + J(x)(y - x) + \text{higher order terms},$$
where
$$J(x) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} \end{pmatrix}$$
is the Jacobian matrix of partial derivatives. As in the one-dimensional case, we use the linear approximation at $x$:
$$f(y) \approx f(x) + J(x)(y - x).$$
If $y$ is a root of this linear approximation, then
$$f(x) + J(x)(y - x) = 0.$$
If the Jacobian matrix is invertible, then
$$y = x - J(x)^{-1}f(x).$$
Newton's method in two dimensions can then be stated as follows:
- Start with $x_1$.
- At each step $n$, solve $J(x_n)\delta = -f(x_n)$ for $\delta$ and set $x_{n+1} = x_n + \delta$.
- Repeat until a convergence criterion is satisfied (for example, $\|x_{n+1} - x_n\|_2 \leq \mathrm{TOL}$ for some tolerance).
Note that the generalisation of the condition $f'(x) \neq 0$ is that the Jacobian $J(x)$ is invertible. The update step could have been written in terms of the inverse of the Jacobian, but computing the inverse is usually avoided.
Lecture 19
Newton's method in two dimensions
We look at the problem of finding a vector $x \in \mathbb{R}^2$ such that
$$f(x) = \begin{pmatrix} f_1(x) \\ f_2(x) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
To derive Newton's method, we use the Taylor series in higher dimensions,
$$f(y) = f(x) + J(x)(y - x) + \text{higher order terms},$$
where
$$J(x) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} \end{pmatrix}$$
is the Jacobian matrix of partial derivatives. As in the one-dimensional case, we use the linear approximation at $x$,
$$f(x) + J(x)(y - x),$$
and look for the zero crossing of this expression. If the Jacobian matrix is invertible, then
$$y = x - J(x)^{-1}f(x).$$
Newton's method in two dimensions can then be stated as follows:
(1) Start with $x_1$.
(2) At each step $n$, solve $J(x_n)\delta = -f(x_n)$ for $\delta$ and set $x_{n+1} = x_n + \delta$.
(3) Repeat until a convergence criterion is satisfied (for example, $\|x_{n+1} - x_n\|_2 \leq \mathrm{TOL}$ for some tolerance).
Note that the generalisation of the condition $f'(x) \neq 0$ is that the Jacobian $J(x)$ is invertible. The update step (2) could have been written in terms of the inverse of the Jacobian, but computing the inverse is usually avoided.
EXAMPLE 19.1. Find $x = (x_1, x_2)^{\top}$ such that
$$f(x) = \begin{pmatrix} x_1^2 - x_2^2 - 1 \\ x_1^3 x_2^2 - 1 \end{pmatrix} = 0$$
using Newton's method. Then
$$\frac{\partial f_1}{\partial x_1} = 2x_1, \quad \frac{\partial f_1}{\partial x_2} = -2x_2, \quad \frac{\partial f_2}{\partial x_1} = 3x_1^2 x_2^2, \quad \frac{\partial f_2}{\partial x_2} = 2x_1^3 x_2.$$
The Jacobian is given by
$$J(x) = \begin{pmatrix} 2x_1 & -2x_2 \\ 3x_1^2 x_2^2 & 2x_1^3 x_2 \end{pmatrix}.$$
A step in Newton's method is then given by
$$x_{n+1} = x_n + \delta,$$
where $\delta$ is the solution of the system
$$J(x_n)\delta = -f(x_n).$$
The initial guess $x_1$ should obviously be chosen so that $J(x_1)$ is invertible. For example, if we take $x_1 = (0, 0)^{\top}$, then $J(x_1)$ is the zero matrix and we can't solve for $\delta$.
FIGURE. Newton iterates in the $(x_1, x_2)$-plane; after 5 iterations $x = (1.2365, 0.7274)$.
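A MATLAB sketch of the two-dimensional Newton iteration for this example could look as follows (an illustration only; the starting point $(1.1, 0.9)^{\top}$ is borrowed from Example 19.2 below).

f = @(x) [x(1)^2 - x(2)^2 - 1; x(1)^3*x(2)^2 - 1];
J = @(x) [2*x(1), -2*x(2); 3*x(1)^2*x(2)^2, 2*x(1)^3*x(2)];
x = [1.1; 0.9];  TOL = 1e-8;  delta = inf(2, 1);
while norm(delta, 2) >= TOL
    delta = -J(x) \ f(x);     % solve J(x_n) delta = -f(x_n)
    x = x + delta;            % x_{n+1} = x_n + delta
end
x                             % approximately (1.2365, 0.7274)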
Fixed-point iterations in two dimensions
Consider $g : \mathbb{R}^2 \to \mathbb{R}^2$ and iterate
$$x_{n+1} = g(x_n).$$
In analogy to the one-dimensional case, we seek conditions on $g$ that guarantee convergence to a fixed point
$$\xi = g(\xi).$$
For example, recall the Gauss-Seidel and Jacobi methods, which were of the form
$$x_{n+1} = Tx_n + c$$
and were shown to converge to a fixed point if the spectral radius $\rho(T)$ (that is, the largest modulus of an eigenvalue of $T$) satisfies $\rho(T) < 1$. These are fixed-point iterations with $g(x) = Tx + c$.
THEOREM 19.1. Let $g : \mathbb{R}^2 \to \mathbb{R}^2$ be differentiable with continuous derivative. The fixed-point iteration $x_{n+1} = g(x_n)$ converges to a fixed-point $\xi$ of $g$ if
(1) $x_1$ is sufficiently close to $\xi$ (local convergence), and
(2) the spectral radius of the Jacobian matrix satisfies $\rho(J(\xi)) < 1$.
Unless otherwise stated, we mean convergence with respect to the 2-norm. As we have seen in the discussion of convergence of iterative methods in linear algebra, convergence with respect to the 2-norm implies convergence with respect to the $\infty$-norm and vice versa.
EXAMPLE 19.2. Consider the problem of finding a root of
$$f(x) = \begin{pmatrix} x_1^2 - x_2^2 - 1 \\ x_1^3 x_2^2 - 1 \end{pmatrix}.$$
It is not difficult to see that this is equivalent to finding a fixed-point of
$$g(x) = \begin{pmatrix} x_2 + 1/(x_1 + x_2) \\ 1/x_1^{3/2} \end{pmatrix}.$$
To see this, note that
$$x_1^2 - x_2^2 - 1 = 0 \iff (x_1 - x_2)(x_1 + x_2) = 1 \iff x_1 - x_2 = 1/(x_1 + x_2) \iff x_1 = x_2 + 1/(x_1 + x_2),$$
$$x_1^3 x_2^2 - 1 = 0 \iff x_2^2 = 1/x_1^3 \iff x_2 = 1/x_1^{3/2}.$$
Choose an initial point $x_1 = (1.1, 0.9)^{\top}$ and observe the behaviour of the fixed-point iteration $x_{n+1} = g(x_n)$:
FIGURE. Fixed-point iterates in the $(x_1, x_2)$-plane; after 39 iterations $x = (1.2366, 0.7273)$.
Convergence to the fixed-point is slow, but it happens. The Jacobian matrix of the function $g$ is given by
$$J(x) = \begin{pmatrix} -1/(x_1 + x_2)^2 & 1 - 1/(x_1 + x_2)^2 \\ -\frac{3}{2}x_1^{-5/2} & 0 \end{pmatrix}.$$
At the fixed-point $\xi \approx (1.2366, 0.7273)^{\top}$ we have
$$J(\xi) \approx \begin{pmatrix} -0.2593 & 0.7407 \\ -0.8823 & 0 \end{pmatrix}.$$
The eigenvalues are $-0.1292 \pm 0.7979i$. One can check that the maximum modulus is
$$\rho(J(\xi)) \approx 0.8 < 1,$$
so that we have convergence.
We will not prove the fixed-point theorem in two dimensions. There is an analogous theorem about quadratic convergence, which applies to Newton's method.
THEOREM 19.2. Consider Newton's method for $f(x) = 0$ and assume the Jacobian $J(x)$ is invertible for all $x$ in a neighbourhood of the root $\xi$. Then for a close enough starting point $x_1$, the Newton iteration
$$x_{n+1} = x_n + \delta, \qquad J(x_n)\delta = -f(x_n)$$
converges quadratically to $\xi$ in the sense that
$$\|x_{n+1} - \xi\|_2 \leq k\,\|x_n - \xi\|_2^2$$
for some $k > 0$.