Math 103
January 2, 2010
Leah Edelstein-Keshet
Contents
Preface
2 Areas
2.1 Areas in the plane
2.2 Computing the area under a curve by rectangular strips
11 Appendix
11.1 How to prove the formulae for sums of squares and cubes
11.2 Riemann Sums: Extensions and other examples
11.2.1 A general interval: a ≤ x ≤ b
11.2.2 Using left (rather than right) endpoints
11.3 Physical interpretation of the center of mass
11.4 The shell method for computing volumes
11.4.1 Example: Volume of a cone using the shell method
11.5 More techniques of integration
11.5.1 Secants and other “hard integrals”
11.5.2 A special case of integration by partial fractions
11.6 Analysis of data: a student grade distribution
11.6.1 Defining an average grade
11.6.2 Fraction of students that scored a given grade
11.6.3 Frequency distribution
11.6.4 Average/mean of the distribution
11.6.5 Cumulative function
11.6.6 The median
11.7 Factorial notation
11.8 Appendix: Permutations and combinations
11.8.1 Permutations
11.9 Appendix: Tests for convergence of series
11.9.1 The ratio test
11.9.2 Series comparison tests
11.9.3 Alternating series
11.10 Adding and multiplying series
11.11 Using series to solve a differential equation
Index
List of Figures
3.1 Definite integrals for functions that take on negative values, and properties of the definite integral
3.2 How the area changes when the interval changes
3.3 The area of a symmetric region
3.4 The areas A1 and A2 in Example 3
3.5 The “area function” corresponding to a function f(x)
3.6 Sketching the antiderivative of f(x)
3.7 Sketches of a function and its antiderivative
3.8 Splitting up a region to compute an integral
3.9 Integrating in the y direction
4.8 The yearly day length cycle and average day length
8.1 Probability density and its cumulative function in Example 8.2.1
8.2 Median for Example 8.3.1
8.3 Mean versus median
8.4 Mean and median for a nonsymmetric probability density
8.5 Refining a histogram by increasing the number of bins leads (eventually) to the idea of a continuous probability density.
8.6 Raindrop radius and volume probability distributions
Preface
Integral calculus arose originally to solve very practical problems that merchants,
landowners, and ordinary people faced on a daily basis. Among such pressing problems
were the following: How much should one pay for a piece of land? If that land has an
irregular shape, i.e. is not a simple geometrical shape, how should its area (and therefore,
its cost) be calculated? How much olive oil or wine are you getting when you purchase
a barrel-full? Barrels come in a variety of shapes and sizes. If the barrel is not close
to cylindrical, what is its volume (and thus, a reasonable price to pay)? In most such
transactions, the need to accurately measure an area or a volume went well beyond the
available results of geometry. (It was known how to compute areas of rectangles, triangles,
and polygons. Volumes of cylinders and cubes were also known, but these were at best
crude approximations to actual shapes and objects encountered in commerce.) This need
motivated the development of the topic we now call integral calculus.
Essentially, the approach is based on the idea of “divide and conquer”: that is, cut up
the geometric shape into smaller pieces, and approximate those pieces by regular shapes
that can be quantified using simple geometry. In computing the area of an irregular shape,
add up the areas of the (approximately regular) little parts in your “dissection”, to arrive at
an approximation of the desired area of the shape. Depending on how fine the dissection
(i.e. how many little parts), this approximation could be quite crude, or fairly accurate.
The idea of applying a limit to obtain the true dimensions of the object was a flash of
inspiration that led to modern day calculus. Similar ideas apply to computing the volume
of a 3D object by successive subdivisions.
It is the aim of a calculus course to develop the language to deal with such concepts,
to make such concepts systematic, and to find convenient and relevant shortcuts that can
be used to solve a variety of problems that have common features. More than that, it is
the purpose of this course to show that ideas developed in the original context of geometry
(finding areas or volumes of 2D or 3D shapes) can be generalized and extended to a variety
of applications that have little to do with geometry.
One area of application is that of computing total change given some time-dependent
rate of change. We encounter many cases where a process changes at a rate that varies
over time: the rate of production of a hormone changes over a day, the rate of flow of water
in a river changes over the seasons, or the rate of motion of a vehicle (i.e. its velocity)
changes over its path. Computing the total change over some time span turns out to be
closely related to the same underlying concept of “divide and conquer”: namely, subdivide
(the time interval) and add up approximate changes over each of the smaller subintervals.
The same idea applies to quantities that are distributed not in time but rather over space.
We show the connection between material that is spatially distributed in a nonuniform way
(e.g. a density that varies from point to point) and total amount of material (obtained by
the same process of integration).
A theme that unites much of the approach is that integral calculus involves both analytic
(i.e. pencil and paper) calculations, which apply only to a limited set of cases, and analogous
numerical (i.e. computer-enabled) calculations. The two go hand-in-hand, with concepts
that are closely linked. A set of computer labs using a spreadsheet tool is an important
part of this course. The importance of seeing calculus from these two distinct but related
perspectives is stressed: on the one hand, analytic computations can be very powerful and
helpful, but at the same time, many interesting problems are too challenging to be handled
by integration techniques. Here is where the same ideas, used in the context of simple
computer algorithms, come in handy. For this reason, understanding the
concepts (not just the technical results, or the “formulae” for integrals) is vital: Ideas used to
develop the analytic techniques on which calculus is based can be adapted to develop good
working methods for harnessing computer power to solve problems. This is particularly
useful in cases where the analytic methods are not sufficient or are too technically challenging.
This set of lecture notes grew out of many years of teaching of Mathematics 103. The
material is organized as follows: In Chapter 1 we develop the basic formulae for areas and
volumes of elementary shapes, and show how to set up summations that describe compound
objects made up of many such shapes. An example to motivate these ideas is the volume
and surface area of a branching structure. In Chapter 2, we turn attention to the classic
problem of defining and computing the area of a two-dimensional region, leading to the
notion of the definite integral. In Chapter 3, we discuss the linchpin of Integral Calculus,
namely the Fundamental Theorem that connects derivatives and integrals. This allows us
to find a great shortcut to the analytic computations described in Chapter 2. Applications
of these ideas to calculating total change from rates of change, and to computing volumes
and masses are discussed in Chapters 4 and 5.
To expand our reach to other cases, we discuss the techniques of integration in Chapter 6.
Here, we find that the chain rule of calculus reappears (in the form of substitution
integrals), and a variety of miscellaneous tricks are devised to simplify integrals. Among
these, the most important is integration by parts, a technique that has independent applica-
tions in many areas of science.
We study the ideas of probability in Chapters 7 and 8. Here we rediscover the con-
nection between discrete sums and continuous integration, and apply the techniques to
computing expected values for random variables. The connection between the mean (in
probability) and the center of mass (of a density distributed in space) is illustrated.
Many scientific problems are phrased in terms of rules about rates of change. Quite
often such rules take the form of differential equations. In an earlier differential calculus
course, the student will have made acquaintance with the topic of such equations and qual-
itative techniques associated with interpreting their solutions. With the methods of integral
calculus in hand, we can solve some types of differential equations analytically. This is
discussed in Chapter 9.
The course concludes with the development of some notions of infinite sums and con-
vergence in Chapter 10. Of prime importance, the Taylor series is developed and discussed
in this concluding chapter.
Chapter 1
1.1 Introduction
This introductory chapter has several aims. First, we collect here a number of basic
formulae for areas and volumes that are used later in developing the notions of integral
calculus. Among these are areas of simple geometric shapes and formulae for sums of
certain common sequences. An important idea is introduced, namely that we can use the
sum of areas of elementary shapes to approximate the areas of more complicated objects,
and that the approximation can be made more accurate by a process of refinement.
We show using examples how such ideas can be used in calculating the volumes or
areas of more complex objects. In particular, we conclude with a detailed exploration of
the structure of branched airways in the lung as an application of ideas in this chapter.
Rectangular areas
Most integration techniques discussed in this course are based on the idea of carving up
irregular shapes into rectangular strips. Thus, areas of rectangles will play an important
part in those methods.
2 Chapter 1. Areas, volumes and simple sums
Figure 1.1. Planar regions whose areas are given by elementary formulae.
• In some cases, the height of a triangle is not given, but can be determined from other
information provided. For example, if the triangle has sides of length b and r with
enclosed angle θ, as shown in Figure 1.1(e), then its height is simply h = r sin(θ),
and its area is
A = (1/2)br sin(θ)
• If the triangle is isosceles, with two sides of equal length, r, and base of length b,
as in Figure 1.1(f), then its height can be obtained from Pythagoras’s theorem, i.e.
$h^2 = r^2 - (b/2)^2$, so that the area of the triangle is
$$A = \frac{1}{2} b \sqrt{r^2 - (b/2)^2}.$$
Figure 1.2. An equilateral n-sided polygon with sides of unit length can be dissected
into n triangles. One of these triangles is shown at right. Since it can be further
divided into two right triangles, trigonometric relations can be used to find the
height h in terms of the length of the base 1/2 and the angle θ/2.
Solution
The polygon has n sides, each of length b = 1. We dissect the polygon into n isosceles
triangles, as shown in Figure 1.2. We do not know the heights of these triangles, but the
angle θ can be found. It is
θ = 2π/n
since together, n of these identical angles make up a total of 360◦ or 2π radians.
1 This calculation will be used again to find the area of a circle in Section 1.2.2. However, note that in later
chapters, our dissections of planar areas will focus mainly on rectangular pieces.
Let h stand for the height of one of the triangles in the dissected polygon. Then
trigonometric relations relate the height to the base length as follows:
$$\frac{\text{opp}}{\text{adj}} = \frac{b/2}{h} = \tan(\theta/2).$$
Using the fact that θ = 2π/n, and rearranging the above expression, we get
$$h = \frac{b}{2\tan(\pi/n)}.$$
Thus, the area of each of the n triangles is
$$A = \frac{1}{2} b h = \frac{1}{2} b \left( \frac{b}{2\tan(\pi/n)} \right).$$
The statement of the problem specifies that b = 1, so
$$A = \frac{1}{2} \left( \frac{1}{2\tan(\pi/n)} \right).$$
The area of the entire polygon is then n times this, namely
$$A_{n\text{-gon}} = \frac{n}{4\tan(\pi/n)}.$$
For example, the area of a square (a polygon with 4 equal sides, n = 4) is
$$A_{\text{square}} = \frac{4}{4\tan(\pi/4)} = \frac{1}{\tan(\pi/4)} = 1,$$
where we have used the fact that tan(π/4) = 1.
As a second example, the area of a hexagon (6 sided polygon, i.e. n = 6) is
$$A_{\text{hexagon}} = \frac{6}{4\tan(\pi/6)} = \frac{3}{2(1/\sqrt{3})} = \frac{3\sqrt{3}}{2}.$$
Here we used the fact that $\tan(\pi/6) = 1/\sqrt{3}$.
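The polygon formula can be checked numerically. The following is a minimal Python sketch, not part of the original notes; the function name `regular_polygon_area` is chosen here purely for illustration, and the polygon is assumed to be regular with n ≥ 3 sides.

```python
import math

def regular_polygon_area(n, b=1.0):
    """Area of a regular n-sided polygon with side length b (n >= 3)."""
    # Height of each of the n isosceles triangles: h = b / (2 tan(pi/n)).
    h = b / (2.0 * math.tan(math.pi / n))
    # Each triangle has area (1/2) * b * h, and there are n of them.
    return n * 0.5 * b * h

print(regular_polygon_area(4))  # square with unit sides
print(regular_polygon_area(6))  # hexagon with unit sides
```

For unit sides, the two printed values should agree (up to rounding) with the hand calculations above, namely 1 and 3√3/2.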
areas of triangles, and then taking a limit as the number of triangles gets larger. Later on, we do much the same,
but using rectangles in the dissections.
1.2. Areas of simple shapes 5
Definition of π
In any circle, π is the ratio of the circumference to the diameter of the circle. (Comment:
expressed in terms of the radius, this assertion states the obvious fact that the ratio of 2πr
to 2r is π.)
Shown in Figure 1.3 is a sequence of regular polygons inscribed in the circle. As the
number of sides of the polygon increases, its area gradually becomes a better and better
approximation of the area inside the circle. Similar observations are central to integral
calculus, and we will encounter this idea often. We can compute the area of any one of
these polygons by dissecting into triangles. All triangles will be isosceles, since two sides
are radii of the circle, whose length we’ll call r.
Figure 1.3. Archimedes approximated the area of a circle by dissecting it into triangles.
Let r denote the radius of the circle. Suppose that at one stage we have an n sided
polygon. (If we knew the side length of that polygon, then we would already have a formula for
its area. However, this side length is not known to us. Rather, we know that the polygon
should fit exactly inside a circle of radius r.) This polygon is made up of n triangles, each
one an isosceles triangle with two equal sides of length r and base of undetermined length
that we will denote by b. (See Figure 1.3.) The area of this triangle is
$$A_{\text{triangle}} = \frac{1}{2} b h.$$
The area of the whole polygon, $A_n$, is then
$$A_n = n \cdot (\text{area of triangle}) = n \left( \frac{1}{2} b h \right) = \frac{1}{2} (nb) h.$$
We have grouped terms so that (nb) can be recognized as the perimeter of the polygon
(i.e. the sum of the n equal sides of length b each). Now consider what happens when we
increase the number of sides of the polygon, taking larger and larger n. Then the height
of each triangle will get closer to the radius of the circle, and the perimeter of the polygon
will get closer and closer to the perimeter of the circle, which is (by definition) 2πr, i.e. as
n → ∞,
$$h \to r, \qquad (nb) \to 2\pi r,$$
so
$$A_n = \frac{1}{2} (nb) h \to \frac{1}{2} (2\pi r) r = \pi r^2.$$
We have used the notation “→” to mean that in the limit, as n gets large, the quantity of
interest “approaches” the value shown. This argument proves that the area of a circle must
be
$$A = \pi r^2.$$
One of the most important ideas contained in this little argument is that by approximating a
shape by a larger and larger number of simple pieces (in this case, a large number of trian-
gles), we get a better and better approximation of its area. This idea will appear again soon,
but in most of our standard calculus computations, we will use a collection of rectangles,
rather than triangles, to approximate areas of interesting regions in the plane.
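The limiting argument can be illustrated numerically. The sketch below is an illustration only, not taken from the notes: it assumes the standard trigonometric expressions b = 2r sin(π/n) and h = r cos(π/n) for the base and height of each triangle in an inscribed regular n-gon (these are not stated in the text above), and the function name is hypothetical.

```python
import math

def inscribed_polygon_area(n, r=1.0):
    """Area (1/2)(nb)h of a regular n-gon inscribed in a circle of radius r."""
    # Assumed relations: each triangle has apex angle 2*pi/n at the center.
    b = 2.0 * r * math.sin(math.pi / n)  # base of one triangle
    h = r * math.cos(math.pi / n)        # height of one triangle
    return 0.5 * (n * b) * h             # (1/2) * perimeter * height

for n in (6, 24, 96, 1000):
    # The second column approaches the third (pi * r**2 with r = 1).
    print(n, inscribed_polygon_area(n), math.pi)
```

As n grows, the polygon area increases toward πr² from below, in line with the argument above.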
• The surface area of a right circular cylinder of height h and base radius r is
Scyl = 2πrh.
Units
The units of area can be meters² (m²), centimeters² (cm²), square inches, etc.
Figure 1.4. 3-dimensional shapes whose volumes are given by elementary formulae
4. In particular, the volume of a cylinder with a circular base of radius r, (e.g. a disk) is
$$V = h(\pi r^2).$$
5. The volume of a sphere of radius r is
$$V = \frac{4}{3} \pi r^3.$$
6. The volume of a spherical shell (hollow sphere with a shell of some small thickness,
τ) is approximately
$$V \approx \tau \cdot (4\pi r^2).$$
7. Similarly, a cylindrical shell of radius r, height h and small thickness, τ has volume
given approximately by
$$V \approx \tau \cdot (2\pi r h).$$
Units
The units of volume are meters³ (m³), centimeters³ (cm³), cubic inches, etc.
Figure 1.5. Computing the volume of a set of disks. (This structure is sometimes
called the tower of Hanoi after a mathematical puzzle by the same name.)
(a) Compute the volume of a tower made up of four disks stacked up one on top of
the other, as shown in Figure 1.5. Assume that the radii of the disks are 1, 2, 3, 4 units and
that each disk has height 1.
(b) Compute the volume of a tower made up of 100 such stacked disks, with radii
r = 1, 2, . . . , 99, 100.
Solution
(a) The volume of the four-disk tower is calculated as follows:
V = V1 + V2 + V3 + V4 ,
where Vi is the volume of the i’th disk whose radius is r = i, i = 1, 2 . . . 4. The height of
each disk is h = 1, so
$$V = (\pi 1^2) + (\pi 2^2) + (\pi 3^2) + (\pi 4^2) = \pi(1 + 4 + 9 + 16) = 30\pi.$$
(b) The idea will be the same, but we have to calculate
$$V = \pi(1^2 + 2^2 + 3^2 + \ldots + 99^2 + 100^2).$$
It would be tedious to do this by adding up individual terms, and it is also cumbersome
to write down the long list of terms that we will need to add up. This motivates inventing
some helpful notation, and finding some clever way of performing such calculations.
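Before developing that notation, it may help to see that a short loop can do the tedious addition directly. This is a minimal Python sketch added for illustration (the function name `tower_volume` is hypothetical); it assumes, as in the problem statement, that disk i has radius i and that every disk has height 1.

```python
import math

def tower_volume(num_disks, height=1.0):
    """Total volume of stacked disks with radii 1, 2, ..., num_disks."""
    # Disk i is a cylinder of radius i, so its volume is pi * i**2 * height.
    return sum(math.pi * i**2 * height for i in range(1, num_disks + 1))

print(tower_volume(4) / math.pi)    # coefficient of pi for the four-disk tower
print(tower_volume(100) / math.pi)  # coefficient of pi for the 100-disk tower
```

Dividing by π isolates the sum of squares, so the first printed value can be compared with the 30 found in part (a).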
3 Note that the idea of computing a volume of a radially symmetric 3D shape by dissection into disks will form
one of the main themes in Chapter 5. Here, the sum of the volumes of the disks is exactly the same as the volume of
the tower. Later on, the disks will only approximate the true 3D volume, and a limit will be needed to arrive at a
“true volume”.
1.4. Summations and the “Sigma” notation 9
$$S = a_1 + a_2 + a_3 + \ldots + a_N \equiv \sum_{k=1}^{N} a_k.$$
The Greek symbol Σ (“Sigma”) indicates summation. The symbol k used here is
called the “index of summation” and it keeps track of where we are in the list of summands.
The notation k = 1 that appears underneath Σ indicates where the sum begins (i.e. which
term starts off the series), and the superscript N tells us where it ends. We will be interested
in getting used to this notation, as well as in actually computing the value of the desired
sum using a variety of shortcuts.
The notation . . . signifies that we have left out some of the terms (out of laziness, or in cases
where there are too many to conveniently write down.) We could have just as well written
the sum with another symbol (e.g. n) as the index, i.e. the same operation is implied by
$$\sum_{n=1}^{10} 1.$$
To compute the value of the sum we use the elementary fact that the sum of ten ones is just
10, so
$$S = \sum_{k=1}^{10} 1 = 10.$$
Solution
$$S = \sum_{k=1}^{4} k^2 = 1 + 2^2 + 3^2 + 4^2 = 1 + 4 + 9 + 16 = 30.$$
(We have already seen this sum in part (a) of The Tower of Hanoi.)
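The “Sigma” notation corresponds closely to a loop over the index of summation. As a small illustration (a sketch in Python, not part of the notes; the variable names are arbitrary), the two sums just computed can be written as follows. Note that `range(1, 11)` runs over k = 1, ..., 10, since the upper end of a Python range is excluded.

```python
# Sum over k = 1..10 of the constant 1 (ten ones).
s_ones = sum(1 for k in range(1, 11))

# Sum over k = 1..4 of k**2.
s_squares = sum(k**2 for k in range(1, 5))

print(s_ones, s_squares)
```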
S = 3 + 3 + 3 + 3 + . . . + 3.
Solution
There are 100 terms, all equal, so we can take out a common factor
$$S = 3 + 3 + 3 + 3 + \ldots + 3 = \sum_{k=1}^{100} 3 = 3 \sum_{k=1}^{100} 1 = 3(100) = 300.$$
Solution
We recognize that there is a pattern in the sequence of terms, namely, each one is 1/3 raised
to an increasing integer power, i.e.
$$S = \frac{1}{3} + \left(\frac{1}{3}\right)^2 + \left(\frac{1}{3}\right)^3 + \left(\frac{1}{3}\right)^4.$$
We can represent this with the “Sigma” notation as follows:
$$S = \sum_{n=1}^{4} \left(\frac{1}{3}\right)^n.$$
The “index” n starts at 1, and counts up through 2, 3, and 4, while each term has the form of
$(1/3)^n$. This series is a geometric series, to be explored shortly. In most cases, a standard
geometric series starts off with the value 1. We can easily modify our notation to include
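additional terms, for example:
$$S = \sum_{n=0}^{5} \left(\frac{1}{3}\right)^n = 1 + \frac{1}{3} + \left(\frac{1}{3}\right)^2 + \left(\frac{1}{3}\right)^3 + \left(\frac{1}{3}\right)^4 + \left(\frac{1}{3}\right)^5.$$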
Learning how to compute the sum of such terms will be important to us, and will be de-
scribed later on in this chapter.
Solution
$$\sum_{k=1}^{10} 2^k - \sum_{k=3}^{10} 2^k = (2 + 2^2 + 2^3 + \cdots + 2^{10}) - (2^3 + \cdots + 2^{10}) = 2 + 2^2.$$
The idea is that all but the first two terms in the first sum will cancel. The only remaining
terms are those corresponding to k = 1 and k = 2.
Solution
$$\sum_{n=0}^{5} (1 + 3^n) = \sum_{n=0}^{5} 1 + \sum_{n=0}^{5} 3^n.$$
$$S = 1 + 2 + 3 + \ldots + N = \sum_{k=1}^{N} k = \frac{N(N+1)}{2}. \qquad (1.1)$$
The following trick is due to Gauss. By aligning two copies of the above sum, one
written backwards, we can easily add them up one by one vertically. We see that:
$$\begin{array}{rcccccccccc}
S &=& 1 &+& 2 &+& \ldots &+& (N-1) &+& N \\
S &=& N &+& (N-1) &+& \ldots &+& 2 &+& 1 \\ \hline
2S &=& (1+N) &+& (1+N) &+& \ldots &+& (1+N) &+& (1+N)
\end{array}$$
Thus, the value (N + 1) appears N times above, so that
$$2S = N(1+N), \quad \text{so} \quad S = \frac{N(1+N)}{2}.$$
Thus, Gauss’ formula is confirmed.
Two other useful formulae are those for the sums of consecutive squares and of
consecutive cubes:
$$S_2 = 1^2 + 2^2 + 3^2 + \ldots + N^2 = \sum_{k=1}^{N} k^2 = \frac{N(N+1)(2N+1)}{6}. \qquad (1.2)$$
$$S_3 = 1^3 + 2^3 + 3^3 + \ldots + N^3 = \sum_{k=1}^{N} k^3 = \left( \frac{N(N+1)}{2} \right)^2. \qquad (1.3)$$
In the Appendix, we show how the formula for the sum of square integers can be
proved by a technique called mathematical induction.
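A numerical spot check is not a proof, but it is a quick way to catch a misremembered formula. The following Python sketch (added for illustration; the function names are hypothetical) compares formulae (1.1), (1.2) and (1.3) with direct term-by-term summation for a few values of N.

```python
def sum_integers(N):
    """Gauss' formula (1.1): 1 + 2 + ... + N."""
    return N * (N + 1) // 2

def sum_squares(N):
    """Formula (1.2): 1^2 + 2^2 + ... + N^2."""
    return N * (N + 1) * (2 * N + 1) // 6

def sum_cubes(N):
    """Formula (1.3): 1^3 + 2^3 + ... + N^3."""
    return (N * (N + 1) // 2) ** 2

for N in (1, 2, 10, 100):
    ks = range(1, N + 1)
    # Each closed form must match the sum computed term by term.
    assert sum_integers(N) == sum(ks)
    assert sum_squares(N) == sum(k**2 for k in ks)
    assert sum_cubes(N) == sum(k**3 for k in ks)
print("formulae (1.1)-(1.3) agree with direct summation")
```

Integer division (`//`) is safe here because N(N+1) is always even and N(N+1)(2N+1) is always divisible by 6.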
Solutions
(a) We can separate this into three individual sums, each of which can be handled by alge-
braic simplification and/or use of the summation formulae developed so far.
$$S_a = \sum_{k=1}^{20} (2 - 3k + 2k^2) = 2 \sum_{k=1}^{20} 1 - 3 \sum_{k=1}^{20} k + 2 \sum_{k=1}^{20} k^2.$$
k=1 k=1 k=1 k=1
Thus, we get
$$S_a = 2(20) - 3 \left( \frac{20(21)}{2} \right) + 2 \left( \frac{(20)(21)(41)}{6} \right) = 5150.$$
(b) We can express the second sum as a difference of two sums:
$$S_b = \sum_{k=10}^{50} k = \left( \sum_{k=1}^{50} k \right) - \left( \sum_{k=1}^{9} k \right).$$
Thus
$$S_b = \left( \frac{50(51)}{2} - \frac{9(10)}{2} \right) = 1275 - 45 = 1230.$$
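Both results can be confirmed by brute-force summation. This short Python sketch is added for illustration only; the variable names `S_a` and `S_b` simply mirror the notation above.

```python
# (a) Sum of 2 - 3k + 2k^2 for k = 1, ..., 20.
S_a = sum(2 - 3 * k + 2 * k**2 for k in range(1, 21))

# (b) Sum of k for k = 10, ..., 50 (range excludes its upper end, hence 51).
S_b = sum(range(10, 51))

print(S_a, S_b)
```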
Subtracting leads to
$$S - rS = (1 + r + r^2 + \ldots + r^N) - (r + r^2 + \ldots + r^N + r^{N+1}),$$
$$S(1 - r) = 1 - r^{N+1}.$$
Dividing by (1 − r), which requires r ≠ 1, gives
$$S = \frac{1 - r^{N+1}}{1 - r},$$
which was the formula to be established.
Solution
This is a geometric series
$$S_c = \sum_{k=0}^{10} 2^k = \frac{1 - 2^{10+1}}{1 - 2} = \frac{1 - 2048}{-1} = 2047.$$
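The closed form for a finite geometric series can be compared against direct summation. The sketch below is an illustration in Python, not part of the notes; `geometric_sum` is a hypothetical name, and exact rational arithmetic (`fractions.Fraction`) is used for the ratio 1/3 so that the comparison is free of rounding error.

```python
from fractions import Fraction

def geometric_sum(r, N):
    """Closed form for 1 + r + r**2 + ... + r**N, valid only for r != 1."""
    return (1 - r**(N + 1)) / (1 - r)

# The example above: r = 2, N = 10.
print(sum(2**k for k in range(0, 11)), geometric_sum(2, 10))

# The earlier example with ratio 1/3 and terms n = 0, ..., 5.
third = Fraction(1, 3)
print(sum(third**n for n in range(0, 6)), geometric_sum(third, 5))
```

In each printed pair, the term-by-term sum and the closed form should agree.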
when the series becomes an infinite series. We will use the following definition:
4 Convergence and divergence of series are discussed in greater depth in Chapter 10 in the context of Taylor Series.
However, these concepts are so important that it was felt necessary to introduce some preliminary ideas early in
the term.
1.7. Prelude to infinite series 15
Definition
Suppose that S is an (infinite) series whose terms are ak . Then the partial sums, Sn , of this
series are
$$S_n = \sum_{k=0}^{n} a_k.$$
We say that the sum of the infinite series is S, and write
$$S = \sum_{k=0}^{\infty} a_k,$$
provided that
$$S = \lim_{n \to \infty} \sum_{k=0}^{n} a_k.$$
That is, we consider the infinite series as the limit of the partial sums as the number of
terms n is increased. In this case we also say that the infinite series converges to S.
We will see that only under certain circumstances will infinite series have a finite
sum, and we will be interested in exploring two questions:
1. Under what circumstances does an infinite series have a finite sum?
2. What value does the partial sum approach as more and more terms are included?
In the case of a geometric series, the sum of the series, (1.4), depends on the number
of terms in the series, n, via $r^{n+1}$. Whenever r > 1 or r < −1, this term will get bigger in
magnitude as n increases, whereas for −1 < r < 1, this term decreases in magnitude with
n. We can say that
$$\lim_{n \to \infty} r^{n+1} = 0 \quad \text{provided } |r| < 1.$$
These observations are illustrated by two specific examples below. This leads to the fol-
lowing conclusion:
Then
$$S_n = \frac{1 - (1/2)^{n+1}}{1 - (1/2)}.$$
We observe that as n increases, i.e. as we retain more and more terms, we obtain
$$\lim_{n \to \infty} S_n = \lim_{n \to \infty} \frac{1 - (1/2)^{n+1}}{1 - (1/2)} = \frac{1}{1 - (1/2)} = 2.$$
In this case, we write
$$\sum_{n=0}^{\infty} \left(\frac{1}{2}\right)^n = 1 + \frac{1}{2} + \left(\frac{1}{2}\right)^2 + \ldots = 2$$
and we say that “the (infinite) series converges to 2”.
We observe that as n grows larger, the sum continues to grow indefinitely. In this case, we
say that the sum does not converge, or, equivalently, that the sum diverges.
It is important to remember that an infinite series, i.e. a sum with infinitely many
terms added up, can exhibit either one of these two very different behaviours. It may
converge in some cases, as the first example shows, or diverge (fail to converge) in other
cases. We will see examples of each of these trends again. It is essential to be able to
distinguish the two. Divergent series (or series that diverge under certain conditions) must
be handled with particular care, for otherwise, we may find contradictions or seemingly
reasonable calculations that have meaningless results.
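The two behaviours can be seen by tabulating partial sums. The following Python sketch is an illustration only: the ratio r = 1/2 is the convergent example discussed above, while r = 2 is chosen here simply as one convenient ratio with |r| > 1 and is an assumption of this sketch. The function name `partial_sum` is hypothetical.

```python
def partial_sum(r, n):
    """S_n = r**0 + r**1 + ... + r**n, added term by term."""
    return sum(r**k for k in range(n + 1))

for n in (1, 5, 10, 20, 50):
    # r = 1/2: the partial sums settle near 2; r = 2: they keep growing.
    print(n, partial_sum(0.5, n), partial_sum(2, n))
```

The middle column stays bounded and approaches 2, while the last column grows without bound as n increases.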
series. It is further studied in the homework problems. A similar example is given as an exercise for the student
in Lab 1 of this calculus course.
1.8. Application of geometric series to the branching structure of the lungs 17
Our lungs pack an amazingly large surface area into a confined volume. Most of
the oxygen exchange takes place in tiny sacs called alveoli at the terminal branches of the
airway passages. The bronchial tubes conduct air, and distribute it to the many smaller
and smaller tubes that eventually lead to those alveoli. The principle of this efficient organ
for oxygen exchange is that these very many small structures present a very large surface
area. Oxygen from the air can diffuse across this area into the bloodstream very efficiently.
The lungs, and many other biological “distribution systems” are composed of a
branched structure. The initial segment is quite large. It bifurcates into smaller segments,
which then bifurcate further, and so on, resulting in a geometric expansion in the number of
branches, their collective volume, length, etc. In this section, we apply geometric series to
explore this branched structure of the lung. We will construct a simple mathematical model
and explore its consequences. The model will consist in some well-formulated assumptions
about the way that “daughter branches” are related to their “parent branch”. Based on these
assumptions, and on tools developed in this chapter, we will then predict properties of the
structure as a whole. We will be particularly interested in the volume V and the surface
area S of the airway passages in the lungs6 .
Figure 1.6. Air passages in the lungs consist of a branched structure. The index
n refers to the branch generation, starting from the initial segment, labeled 0. All segments
are assumed to be cylindrical, with radius $r_n$ and length $\ell_n$ in the n’th generation.
1.8.1 Assumptions
• The airway passages consist of many “generations” of branched segments. We label
the largest segment with index “0”, and its daughter segments with index “1”, their
successive daughters “2”, and so on down the structure from large to small branch
segments. We assume that there are M “generations”, i.e. the initial segment has un-
dergone M subdivisions. Figure 1.6 shows only generations 0, 1, and 2. (Typically,
for human lungs there can be up to 25-30 generations of branching.)
• At each generation, every segment is approximated as a cylinder of radius $r_n$ and
length $\ell_n$.
6 The surface area of the bronchial tubes does not actually absorb much oxygen, in humans. However, as an
example of summation, we will compute this area and compare how it grows to the growth of the volume from
one branching layer to the next.
• The number of branches grows along the “tree”. On average, each parent branch
produces b daughter branches. In Figure 1.6, we have illustrated this idea for b = 2.
A branched structure in which each branch produces two daughter branches is de-
scribed as a bifurcating tree structure (whereas trifurcating implies b = 3). In real
lungs, the branching is slightly irregular. Not every level of the structure bifurcates,
but in general, averaging over the many branches in the structure, b is smaller than 2.
In fact, the rule that links the number of branches in generation n, here denoted $x_n$,
with the number (of smaller branches) in the next generation, $x_{n+1}$, is
$$x_{n+1} = b x_n.$$
• The ratios of radii and lengths of daughters to parents are approximated by “pro-
portional scaling”. This means that the relationship of the radii and lengths satisfy
simple rules: The lengths are related by
Rules such as those given by equations (1.7) and (1.8) are often called self-similar growth
laws. Such concepts are closely linked to the idea of fractals, i.e. theoretical structures
produced by iterating such growth laws indefinitely. In a real biological structure, the
number of generations is finite. (However, in some cases, a finite geometric series is well-
approximated by an infinite sum.)
Actual lungs are not fully symmetric branching structures, but the above approxi-
mations are used here for simplicity. According to physiological measurements, the scale
factors for sizes of daughter to parent size are in the range 0.65 ≤ α, β ≤ 0.9. (K. G.
Horsfield, G. Dart, D. E. Olson, and G. Cumming, (1971) J. Appl. Phys. 31, 207-217.) For
the purposes of this example, we will use the values of constants given in Table 1.1.
initial value: $x_0$
first iteration: $x_1 = b x_0$
second iteration: $x_2 = b x_1 = b(b x_0) = b^2 x_0$
third iteration: $x_3 = b x_2 = b(b^2 x_0) = b^3 x_0$
...
By the same pattern, at the n’th generation, the number of segments will be
$$x_n = b^n x_0. \qquad (1.10)$$
This connection between the rule linking two generations and the resulting number of
members at each generation is useful in other circumstances. Equation (1.9) is sometimes
called a recursion relation, and its solution is given by equation (1.10). We will use the
same idea to find the connection between the volumes, and surface areas of successive
segments in the branching structure.
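The link between the recursion relation and its solution can be checked by iterating the rule directly. This is a minimal Python sketch for illustration (the function names are hypothetical); it assumes the rule "multiply by b at each generation" shown in the iterations above.

```python
def segments_by_recursion(b, x0, n):
    """Apply the rule x_{k+1} = b * x_k a total of n times, starting from x0."""
    x = x0
    for _ in range(n):
        x = b * x
    return x

def segments_closed_form(b, x0, n):
    """Solution of the recursion, equation (1.10): x_n = b**n * x0."""
    return b**n * x0

# A bifurcating tree (b = 2) starting from a single segment.
for n in range(5):
    print(n, segments_by_recursion(2, 1, n), segments_closed_form(2, 1, n))
```

The two columns should coincide at every generation, which is what it means for (1.10) to solve the recursion.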
$$x_n = x_0 b^n = 1 \cdot b^n = b^n.$$
For example, if b = 2, the number of segments grows by powers of 2, so that the tree
bifurcates with the pattern 1, 2, 4, 8, etc.
To determine how many branch segments there are in total, we add up over all gen-
erations, 0, 1, . . . M . This is a geometric series, whose sum we can compute. Using equa-
tion (1.4), we find
$$N = \sum_{n=0}^{M} b^n = \left( \frac{1 - b^{M+1}}{1 - b} \right).$$
Given b and M , we can then predict the exact number of segments in the structure. The
calculation is summarized further on for values of the branching parameter, b, and the
number of branch generations, M , given in Table 1.1.
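As a sketch of how this prediction might be computed, the Python snippet below evaluates the closed form and compares it with direct summation. It is an illustration only: the function name is hypothetical, and the values b = 1.7 and M = 30 are placeholders chosen for this sketch (Table 1.1 is not reproduced here), so the printed numbers should not be read as the notes' results.

```python
def total_segments(b, M):
    """Closed form for b**0 + b**1 + ... + b**M, valid only for b != 1."""
    return (1 - b**(M + 1)) / (1 - b)

# A small bifurcating tree: 1 + 2 + 4 + 8 segments.
print(total_segments(2, 3))

# Hypothetical (assumed) average branching ratio and number of generations.
b, M = 1.7, 30
direct = sum(b**n for n in range(M + 1))
print(total_segments(b, M), direct)
```

When b is a non-integer average, the result is best read as an estimate of the number of segments rather than an exact count.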
$$v_n = \pi r_n^2 \ell_n.$$
Here we mean just a single segment in the n’th generation of branches. (There are $b^n$ such
identical segments in the n’th generation, and we will refer to the volume of all of them
together as $V_n$ below.)
The length and radius of segments also follow a geometric progression. In fact, the
same idea developed above can be used to relate the length and radius of a segment in the
n’th generation to the length and radius of the original 0’th generation segment,
namely,
$$\ell_n = \alpha \ell_{n-1} \quad \Rightarrow \quad \ell_n = \alpha^n \ell_0,$$
and
$$r_n = \beta r_{n-1} \quad \Rightarrow \quad r_n = \beta^n r_0.$$
Thus the volume of one segment in generation n is
This is just a product of the initial segment volume v0 = πr02 $0 , with the n’th power of a
certain factor(α, β). (That factor takes into account that both the radius and the length are
being scaled down at every successive generation of branching.) Thus
vn = (αβ 2 )n v0 .
1.8. Application of geometric series to the branching structure of the lungs 21
\[ V_n = b^n v_n = b^n (\alpha\beta^2)^n v_0 = (\underbrace{b\alpha\beta^2}_{a})^n v_0. \]
Here we have grouped terms together to reveal the simple structure of the relationship:
one part of the expression is just the initial segment volume, while the other is now a
“scale factor” that includes not only changes in length and radius, but also in the number of
branches. Letting the constant a stand for that scale factor, $a = b\alpha\beta^2$, leads to the result
that the volume of all segments in the n’th layer is
\[ V_n = a^n v_0. \]
The total volume of the structure is obtained by summing the volumes obtained at
each layer. Since this is a geometric series, we can use the summation formula, i.e.,
Equation (1.4). Accordingly, the total airways volume is
\[ V = \sum_{n=0}^{30} V_n = v_0 \sum_{n=0}^{30} a^n = v_0 \left( \frac{1 - a^{M+1}}{1 - a} \right), \]
where M = 30 is the index of the last generation in the sum.
The similarity of treatment with the previous calculation of number of branches is appar-
ent. We compute the value of the constant a in Table 1.2, and find the total volume in
Section 1.8.6.
where s0 is the surface area of the initial segment. Since there are bn branches at generation
n, the total surface area of all the n’th generation branches is thus
\[ S_n = b^n (\alpha\beta)^n s_0 = (\underbrace{b\alpha\beta}_{c})^n s_0, \]
where we have let c stand for the scale factor $c = b\alpha\beta$. Thus,
\[ S_n = c^n s_0. \]
This reveals the similar nature of the problem. To find the total surface area of the airways,
we sum up,
\[ S = s_0 \sum_{n=0}^{M} c^n = s_0 \left( \frac{1 - c^{M+1}}{1 - c} \right). \]
We compute the values of s0 and c in Table 1.2, and summarize final calculations of the
total airways surface area in section 1.8.6.
Table 1.2. Volume, surface area, scale factors, and other derived quantities. Because
a and c are bases that will be raised to large powers, it is important that their
values are fairly accurate, so we keep more significant figures.
\[ N = \sum_{n=0}^{M} b^n = \left( \frac{1 - b^{M+1}}{1 - b} \right) = \left( \frac{1 - (1.7)^{31}}{1 - 1.7} \right) = 1.9898 \cdot 10^7 \approx 2 \cdot 10^7. \]
According to this calculation, there are a total of about 20 million branch segments overall
(including all layers, from top to bottom) in the entire structure!
Using the values for a and v0 computed in Table 1.2, we find that the total volume of all
segments, summed over all generations, is
\[ V = v_0 \sum_{n=0}^{30} a^n = v_0 \left( \frac{1 - a^{M+1}}{1 - a} \right) = 4.4 \, \frac{(1 - 1.131588^{31})}{(1 - 1.131588)} = 1510.3 \text{ cm}^3. \]
Recall that 1 litre = 1000 cm$^3$. Then we have found that the lung airways contain about 1.5
litres.
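The two geometric sums above are easy to reproduce with a few lines of Python. This is only a sketch: the helper name geometric_sum is our own, and the numbers b = 1.7, M = 30, a = 1.131588 and v0 = 4.4 are the ones that appear in the displayed calculations.

```python
def geometric_sum(ratio, M):
    """Return ratio^0 + ratio^1 + ... + ratio^M via the closed form (ratio != 1)."""
    return (1 - ratio ** (M + 1)) / (1 - ratio)


b, M = 1.7, 30            # branching parameter and last generation
a, v0 = 1.131588, 4.4     # volume scale factor and initial segment volume (cm^3)

N_total = geometric_sum(b, M)        # total number of segments
V_total = v0 * geometric_sum(a, M)   # total airways volume in cm^3

print(f"segments: {N_total:.4e}")
print(f"volume:   {V_total:.1f} cm^3 = {V_total / 1000:.2f} litres")
```

Because a is raised to the 31st power, changing it in the third or fourth decimal place noticeably changes the total, which is why the extra significant figures are kept.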
There are 100 cm per meter, and $(100)^2 = 10^4$ cm$^2$ per m$^2$. Thus, the area we have
computed is equivalent to about 28 square meters!
[Figure 1.7: four bar-chart panels. Panels (a) and (b): cumulative volume to layer n (vertical scale to 1500.0). Panels (c) and (d): cumulative surface area to n’th layer (vertical scale to 250000.0).]
Figure 1.7. (a) Vn , the volume of layer n (red bars), and the cumulative volume
down to layer n (yellow bars) are shown for parameters given in Table 1.1. (b) Same as (a)
but assuming that parent segments always produce two daughter branches (i.e. b = 2). The
graphs in (a) and (b) are shown on the same scale to accentuate the much more dramatic
growth in (b). (c) and (d): same idea showing the surface area of n’th layer (green) and
the cumulative surface area to layer n (blue) for original parameters (in c), as well as for
the value b = 2 (in d).
4. Suppose we want a set of tubes with a large surface area but small total volume.
Which single factor or parameter should we change (and how should we change it) to
correct this feature of the model, i.e. to predict that the total volume of the branching
tubes remains roughly constant while the surface area increases as branching layers
are added.
5. Determine how the branching properties of real human lungs differ from our assumed
model, and use similar ideas to refine and correct our estimates. You may
want to investigate what is known about the actual branching parameter b, the num-
ber of generations of branches, M , and the ratios of lengths and radii that we have
assumed. Alternately, you may wish to find parameters for other species and do a
1.9 Summary
In this chapter, we collected useful formulae for areas and volumes of simple 2D and 3D
shapes. A summary of the most important ones is given below. Table 1.3 lists the areas of
simple shapes, Table 1.4 the volumes and Table 1.5 the surface areas of 3D shapes.
We used areas of triangles to compute areas of more complicated shapes, including
regular polygons. We used a polygon with N sides to approximate the area of a circle, and
then, by letting N go to infinity, we were able to prove that the area of a circle of radius r
is $A = \pi r^2$. This idea, and others related to it, will form a deep underlying theme in the
next two chapters and later on in this course.
We introduced some notation for series and collected useful formulae for summation
of such series. These are summarized in Table 1.6. We will use these extensively in our
next chapter.
Finally, we investigated geometric series and studied a biological application, namely
the branching structure of lungs.
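triangle (base b, height h): $\frac{1}{2} b h$
sphere (radius r): $\frac{4}{3} \pi r^3$
Sum of cubes: $1^3 + 2^3 + 3^3 + \ldots + N^3 = \sum_{k=1}^{N} k^3 = \left( \frac{N(N+1)}{2} \right)^2$
Geometric sum: $1 + r + r^2 + r^3 + \ldots + r^N = \sum_{k=0}^{N} r^k = \frac{1 - r^{N+1}}{1 - r}$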
Areas
[Figure 2.1: the region under the graph of y = f(x) between x = a and x = b.]
We now consider the problem of determining the area of a region in the plane that
has the following special properties: The region is formed by straight lines on three sides,
and by a smooth curve on one of its edges, as shown in Figure 2.1. You might imagine
that the shaded portion of this figure is a plot of land bounded by fences on three sides, and
by a river on the fourth side. A farmer wishing to purchase this land would want to know
exactly how large an area is being acquired. Here we set up the calculation of that area.
28 Chapter 2. Areas
[Figure 2.2: four panels showing the region under y = f(x) = x^2 for 0 ≤ x ≤ 1, approximated by N = 10, N = 20, and N = 40 rectangles, and the limiting region as N → ∞.]
approximation is fairly coarse when the number of rectangles is small9 . However, if the
number of rectangles is increased, (as shown in subsequent panels of this same figure), we
8 Not all planar areas have this property. Later examples indicate how to deal with some that do not.
9 That is, the area of the rectangles is very different from the area of the region of interest.
2.2. Computing the area under a curve by rectangular strips 29
obtain a better and better approximation of the true area. In the limit as N , the number of
rectangles, approaches infinity, the area of the desired region is obtained. This idea will
form the core of this chapter. The reader will note a similarity with the idea we already en-
countered in obtaining the area of a circle, though in that context, we had used a dissection
of the circle into approximating triangles.
With this idea in mind, in Section 2.2, we compute the area of the region shown in
Figure 2.2 in two ways. First, we use a simple spreadsheet to do the computations for us.
This is meant to illustrate the “numerical approach”.
Then, as the alternate analytic approach, we set up the Riemann sum corresponding
to the function shown in Figure 2.2. We will find that carefully setting up the calculation
of areas of the approximating rectangles will be important. Making a cameo appearance
in this calculation will be the formula for the sums of square integers developed in the
previous chapter. A new feature will be the limit N → ∞ that introduces the final step of
arriving at the smooth region shown in the final panel of Figure 2.2.
10 Note that all these values are approximations, correct to 4 decimal places. Compare with the exact calcula-
\[ y = f(x) = x^2, \qquad 0 \le x \le 1. \]
By this we mean that we use “pen-and-paper” calculations, rather than computational aids
to determine that area.
We set up the rectangles (as shown in Figure 2.2, with detailed labeling in Figure 2.3),
determine the heights and areas of these rectangles, sum their total area, and
then determine how this value behaves as the rectangles get more numerous (and thinner).
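[Figure 2.3: two panels. Left: the graph of y = f(x) with one shaded rectangle of height f(x). Right: the graph of y = f(x) = x^2 with the rectangle heights f(x_k) and f(x_N) labeled.]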
Figure 2.3. The region under the graph of y = f(x) for 0 ≤ x ≤ 1 will be
approximated by a set of N rectangles. A rectangle (shaded) has base width ∆x and
height f(x). Since 0 ≤ x ≤ 1, and all the rectangles have the same base width, it follows
that ∆x = 1/N. In the panel on the right, the coordinates of base corners and two typical
heights of the rectangles have been labeled. Here $x_0 = 0$, $x_N = 1$ and $x_k = k\Delta x$.
Table 2.1. The label, position, height, and area $a_k$ of each rectangular strip are
shown above. Each rectangle has the same base width, ∆x = 1/N. We approximate the
area under the curve $y = f(x) = x^2$ by the sum of the values in the last column, i.e. the
total area of the rectangles.
rectangle on the curve. This implies that the height of the k-th rectangle is obtained from
substituting xk into the function, i.e. height = f (xk ). The base of every rectangle is the
same, i.e. base = ∆x = 1/N . This means that the area of the k-th rectangle, shown shaded,
is
\[ a_k = \text{height} \times \text{base} = f(x_k) \Delta x. \]
A list of rectangles and their properties is shown in Table 2.1. This may help the
reader to see the pattern that emerges in the summation. (In general this table is not needed
in our work, and it is presented for this example only, to help visualize how heights of
rectangles behave.) The total area of all rectangular strips (a sum of the values in the right
column of Table 2.1) is
\[ A_{N\,\text{strips}} = \sum_{k=1}^{N} a_k = \sum_{k=1}^{N} f(x_k) \Delta x = \sum_{k=1}^{N} \left( \frac{k}{N} \right)^2 \left( \frac{1}{N} \right). \qquad (2.1) \]
The expression shown in Eqn. (2.1) is a Riemann sum. A recurring theme underlying
integral calculus is the relationship between Riemann sums and definite integrals, a concept
introduced later on in this chapter.
We now rewrite this sum in a more convenient form so that summation formulae
developed in Chapter 1 can be used. In this sum, only the quantity k changes from term to
term. All other quantities are common factors, so that
\[ A_{N\,\text{strips}} = \left( \frac{1}{N^3} \right) \sum_{k=1}^{N} k^2. \]
The formula (1.2) for the sum of square integers can be applied to the summation, resulting
in
\[ A_{N\,\text{strips}} = \left( \frac{1}{N^3} \right) \frac{N(N+1)(2N+1)}{6} = \frac{(N+1)(2N+1)}{6N^2}. \qquad (2.2) \]
In the box below, we use Eqn. (2.2) to compute that approximate area for values of N
shown in the first three panels of Fig 2.2. Note that these are comparable to the values we
obtained “numerically” in Section 2.2.1. (We plug in the value of N into (2.2) and use a
calculator to obtain the results below.)
If N = 10 strips (Figure 2.2a), the width of each strip is 0.1 unit. According to equation (2.2),
the area of the 10 strips (shown in red) is
\[ A_{10\,\text{strips}} = \frac{(10 + 1)(2 \cdot 10 + 1)}{6 \cdot 10^2} = 0.385. \]
If N = 20 strips (Figure 2.2b), ∆x = 1/20 = 0.05, and
\[ A_{20\,\text{strips}} = \frac{(20 + 1)(2 \cdot 20 + 1)}{6 \cdot 20^2} = 0.35875. \]
If N = 40 strips (Figure 2.2c), ∆x = 1/40 = 0.025 and
\[ A_{40\,\text{strips}} = \frac{(40 + 1)(2 \cdot 40 + 1)}{6 \cdot 40^2} = 0.3459375. \]
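The values in the box can be checked by adding up the strips directly and comparing with the closed form (2.2). The Python sketch below does this with exact fractions so that no rounding enters; the function names are our own and are used only for illustration.

```python
from fractions import Fraction


def right_sum_x_squared(N):
    """Add up the N strips for f(x) = x^2 on 0 <= x <= 1 (right endpoints)."""
    dx = Fraction(1, N)
    return sum((k * dx) ** 2 * dx for k in range(1, N + 1))


def closed_form(N):
    """Equation (2.2): (N + 1)(2N + 1) / (6 N^2)."""
    return Fraction((N + 1) * (2 * N + 1), 6 * N ** 2)


for N in (10, 20, 40):
    # The direct sum and the formula agree exactly for each N.
    assert right_sum_x_squared(N) == closed_form(N)
    print(N, float(closed_form(N)))
```

The printed values decrease as N grows, consistent with the shrinking overshoot of the rectangles in the successive panels of Figure 2.2.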
We will define the true area under the graph of the function y = f(x) over the given
interval to be:
\[ A = \lim_{N \to \infty} A_{N\,\text{strips}}. \]
This means that the true area is obtained by letting the number of rectangular strips, N, get
very large (while the width of each one, ∆x = 1/N, gets very small).
In the example discussed in this section, the true area is found by taking the limit as
N gets large in equation (2.2), i.e.,
\[ A = \lim_{N \to \infty} \left( \frac{1}{6} \, \frac{(N+1)(2N+1)}{N^2} \right) = \frac{1}{6} \lim_{N \to \infty} \frac{(N+1)(2N+1)}{N^2}. \]
To evaluate this limit, note that when N gets very large, we can use the approximations
(N + 1) ≈ N and (2N + 1) ≈ 2N, so that (simplifying and cancelling common factors)
\[ \lim_{N \to \infty} \frac{(N+1)(2N+1)}{N^2} = \lim_{N \to \infty} \frac{(N)}{N} \, \frac{(2N)}{N} = 2. \]
The true area is therefore
\[ A = \frac{1}{6} \cdot 2 = \frac{1}{3}. \qquad (2.3) \]
2.3. The area of a leaf 33
2.2.3 Comments
Many students who have had calculus before in high school ask “why do we bother with
such tedious calculations, when we could just use integration?”. Indeed, our development
of Riemann sums foreshadows and anticipates the idea of a definite integral, and in short
order, some powerful techniques will help to shortcut such technical calculations. There
are two reasons why we linger on Riemann sums. First, in order to understand integration
adequately, we must understand the underlying “technology” and concepts; this proves
vital in understanding how to use the methods, and when things can go wrong. It also helps
to understand what integrals represent in applications that occur later on. Second, even
though we will shortly have better tools for analytical calculations, the idea of setting up
area approximations using rectangular strips is very similar to the way that the spreadsheet
computations are designed. (However, the summation is handled automatically using the
spreadsheet, and no “formulae” are needed.) In Section 2.2.1, we gave only a few details of
the steps involved. The student will find that understanding the ideas of Section 2.2.2 will
go hand-in-hand with understanding the numerical approach of Section 2.2.1.
The ideas outlined above can be applied to more complicated situations. In the next
section we consider a practical problem in which a similar calculation is carried out.
length of interval = 1 − 0 = 1
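[Figure 2.4: two panels. Top: the graph of y = f(x) = x(1−x) for 0 ≤ x ≤ 1. Bottom: the same region divided into strips at x_0 = 0, x_1, x_2, ..., x_n = 1, with the k’th rectangle (enlarged) of width Δx and height y_k = f(x_k).]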
Figure 2.4. In this figure we show how the area of a leaf can be approximated by
rectangular strips.
number of segments, N
width of rectangular strips, $\Delta x = \frac{1}{N}$
the k’th x value, $x_k = k \frac{1}{N} = \frac{k}{N}$
height of k’th rectangular strip, $f(x_k) = x_k (1 - x_k)$
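The representative k’th rectangle is shown shaded in Figure 2.4. Its area is
\[ a_k = \text{base} \times \text{height} = \Delta x \cdot f(x_k) = \underbrace{\left( \frac{1}{N} \right)}_{\Delta x} \cdot \underbrace{\left( \frac{k}{N} \right) \left( 1 - \frac{k}{N} \right)}_{f(x_k)}. \]
The total area of the N strips is then
\[ A_{N\,\text{strips}} = \sum_{k=1}^{N} a_k = \sum_{k=1}^{N} \Delta x \cdot f(x_k) = \sum_{k=1}^{N} \left( \frac{1}{N} \right) \cdot \left( \frac{k}{N} \right) \left( 1 - \frac{k}{N} \right). \]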
2.4. Area under an exponential curve 35
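Simplifying the result (so we can use summation formulae) leads to:
\[ A_{N\,\text{strips}} = \left( \frac{1}{N} \right) \sum_{k=1}^{N} \left( \frac{k}{N} \right) \left( 1 - \frac{k}{N} \right) = \left( \frac{1}{N^2} \right) \sum_{k=1}^{N} k - \left( \frac{1}{N^3} \right) \sum_{k=1}^{N} k^2. \]
Using the summation formulae (1.1) and (1.2) from Chapter 1 results in:
\[ A_{N\,\text{strips}} = \left( \frac{1}{N^2} \right) \left( \frac{N(N+1)}{2} \right) - \left( \frac{1}{N^3} \right) \left( \frac{(2N+1)N(N+1)}{6} \right). \]
This is the area for a finite number, N, of rectangular strips. As before, the true area is
obtained as the limit as N goes to infinity, i.e. $A = \lim_{N \to \infty} A_{N\,\text{strips}}$. We obtain:
\[ A = \lim_{N \to \infty} \left( \frac{1}{2} \, \frac{(N+1)}{N} \right) - \lim_{N \to \infty} \left( \frac{1}{6} \, \frac{(2N+1)(N+1)}{N^2} \right) = \frac{1}{2} - \frac{1}{6} \cdot 2 = \frac{1}{6}. \]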
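To see the approach to the limit 1/6 numerically, one can compare the strip-by-strip sum with the expression obtained from the summation formulae. The following Python sketch uses exact fractions; the names leaf_area_direct and leaf_area_formula are hypothetical helpers introduced here, not part of the text.

```python
from fractions import Fraction


def leaf_area_direct(N):
    """Sum of the N strip areas dx * x_k * (1 - x_k), with x_k = k/N."""
    dx = Fraction(1, N)
    return sum(dx * (k * dx) * (1 - k * dx) for k in range(1, N + 1))


def leaf_area_formula(N):
    """Result of applying summation formulae (1.1) and (1.2)."""
    return Fraction(N + 1, 2 * N) - Fraction((2 * N + 1) * (N + 1), 6 * N ** 2)


for N in (10, 100, 1000):
    assert leaf_area_direct(N) == leaf_area_formula(N)
    print(N, float(leaf_area_formula(N)))
```

For N = 10 the two expressions both give 0.165, and the values move toward 1/6 as N is increased.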
Remark:
The function in this example can be written as $y = x - x^2$. For part of this expression,
we have seen a similar calculation in Section 2.2. This example illustrates an important
property of sums, namely the fact that we can rearrange the terms into simpler expressions
that can be summed individually.
In the homework problems accompanying this chapter, we investigate how to de-
scribe leaves with arbitrary lengths and widths, as well as leaves with shapes that are ta-
pered, broad, or less symmetric than the current example.
\[ e^z \approx 1 + z \]
length of interval = 2 − 0 = 2
number of segments = N
width of rectangular strips, $\Delta x = \frac{2}{N}$
the k’th x value, $x_k = k \frac{2}{N} = \frac{2k}{N}$
height of k’th rectangular strip, $f(x_k) = e^{2 x_k} = e^{2(2k/N)} = e^{4k/N}$
We observe that the length of the interval (here 2) has affected the details of the calculation.
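As before, the area of the k’th rectangle is
\[ a_k = \text{base} \times \text{height} = \Delta x \times f(x_k) = \left( \frac{2}{N} \right) e^{4k/N}, \]
and the total area of all the rectangles is
\[ A_{N\,\text{strips}} = \left( \frac{2}{N} \right) \sum_{k=1}^{N} e^{4k/N} = \left( \frac{2}{N} \right) \sum_{k=1}^{N} r^k = \left( \frac{2}{N} \right) \left( \sum_{k=0}^{N} r^k - r^0 \right), \]
where $r = e^{4/N}$. This is a finite geometric series. Because the series starts with k = 1 and
not with k = 0, the sum is
\[ A_{N\,\text{strips}} = \left( \frac{2}{N} \right) \left[ \frac{(1 - r^{N+1})}{(1 - r)} - 1 \right]. \]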
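The geometric-series expression can be compared with a direct sum of the strip areas. The Python sketch below assumes, as the strip heights e^{4k/N} indicate, that the function is f(x) = e^{2x} on 0 ≤ x ≤ 2; the reference value (e^4 − 1)/2 is the standard value of that area and is quoted here only as a check, not derived.

```python
import math


def strips_direct(N):
    """Add up the N strip areas (2/N) * e^(4k/N) for k = 1..N."""
    dx = 2 / N
    return sum(math.exp(2 * (k * dx)) * dx for k in range(1, N + 1))


def strips_geometric(N):
    """Same total, using the finite geometric series with r = e^(4/N)."""
    r = math.exp(4 / N)
    return (2 / N) * ((1 - r ** (N + 1)) / (1 - r) - 1)


reference = (math.exp(4) - 1) / 2  # assumed known value of the area

for N in (10, 100, 1000):
    print(N, strips_direct(N), strips_geometric(N), reference)
```

The two columns of strip totals should agree up to rounding, and both overestimate the reference value because right endpoints are used on an increasing function.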
interval divided by N). Since the interval starts at $x_0 = 2$, and increments in units of (3/N),
the k’th coordinate is $x_k = 2 + k(3/N) = 2 + (3k/N)$. The area of the k’th rectangle is
then $a_k = f(x_k) \times \Delta x = [(2 + (3k/N))^2](3/N)$, and this is to be summed over k.
Similar algebraic simplification, summation formulae, and a limit are needed to calculate the
true area.
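The same bookkeeping (starting point, strip width, k’th coordinate) carries over to a short program. The sketch below is an illustration under the assumption, read off from the formulas above, that f(x) = x^2 and the interval is 2 ≤ x ≤ 5; the helper name right_riemann_sum is our own, and the value 39 for the true area is quoted as a check rather than derived here.

```python
def right_riemann_sum(f, a, b, N):
    """Right-endpoint Riemann sum of f on a <= x <= b using N strips."""
    dx = (b - a) / N                       # width of each strip
    return sum(f(a + k * dx) * dx          # x_k = a + k*dx, k = 1..N
               for k in range(1, N + 1))


for N in (10, 100, 1000):
    print(N, right_riemann_sum(lambda x: x ** 2, 2, 5, N))
```

With a = 2 and b = 5 the strip width is 3/N and the k’th coordinate is 2 + 3k/N, exactly as in the text.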
Other examples
In Appendix 11.2 we discuss a number of other examples with several modifications:
First, in Appendix 11.2.1, we show how to set up a Riemann sum for a more complicated
quadratic function on a general interval, a ≤ x ≤ b.
Second, we show how Riemann sums can be set up for left, rather than right endpoint
approximations. The results are entirely analogous.
Figure 2.5. The shaded area A corresponds to the definite integral I of the func-
tion f (x) over the interval a ≤ x ≤ b.
f (x) > 0 that is bounded and continuous11 on an interval [a, b] (also written a ≤ x ≤ b),
we define the definite integral,
\[ I = \int_a^b f(x) \, dx \qquad (2.4) \]
to be the area A of the region under the graph of the function between the endpoints a and
b. See Figure 2.5.
2.6.1 Remarks
1. The definite integral is a number.
11 A function is said to be bounded if its graph stays between some pair of horizontal lines. It is continuous if
2. The value of the definite integral depends on the function, and on the two end points
of the interval.
3. From previous remarks, we have a procedure to calculate the value of the definite
integral by dissecting the region into rectangular strips, summing up the total area of
the strips, and taking a limit as N, the number of strips, gets large. (The calculation
may be non-trivial, and might involve sums that we have not discussed in our simple
examples so far, but in principle the procedure is well-defined.)
[Figure 2.6: four panels (a)-(d). Panels (a) and (b) are drawn over 0 ≤ x ≤ 1, panel (c) over 2 ≤ x ≤ 4, and panel (d) over 0 ≤ x ≤ 2.]
Figure 2.6. Examples (1-4) relate areas shown above to definite integrals.
2.6.2 Examples
We have calculated the areas of regions bounded by particularly simple functions. To
practice notation, we write down the corresponding definite integral in each case. Note
that in many of the examples below, we need no elaborate calculations, but merely use
previously known or recently derived results, to familiarize the reader with the new notation
just defined.
Example (1)
The area under the function y = f(x) = x over the interval 0 ≤ x ≤ 1 is triangular,
with base and height 1. The area of this triangle is thus A = (1/2) base × height = 0.5
(Figure 2.6a). Hence,
\[ \int_0^1 x \, dx = 0.5. \]
2.7. The area as a function 39
Example (2)
In Section 2.2, we also computed the area under the function $y = f(x) = x^2$ on the interval
0 ≤ x ≤ 1 and found its area to be 1/3 (See Eqn. (2.3) and Fig. 2.6(b)). Thus
\[ \int_0^1 x^2 \, dx = 1/3 \approx 0.333. \]
Example (3)
A constant function of the form y = 1 over an interval 2 ≤ x ≤ 4 would produce a rectangular
region in the plane, with base (4 − 2) = 2 and height 1 (Figure 2.6(c)). Thus
\[ \int_2^4 1 \, dx = 2. \]
Example (4)
The function y = f(x) = 1 − x/2 (Figure 2.6(d)) forms a triangular region with base 2
and height 1, thus
\[ \int_0^2 (1 - x/2) \, dx = 1. \]
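Since each definite integral is, by definition, a limit of sums of strip areas, the four values above can be checked numerically. The Python sketch below is one way to do so; right_riemann_sum is a hypothetical helper, and the choice of 10000 strips is arbitrary.

```python
def right_riemann_sum(f, a, b, N=10000):
    """Approximate the definite integral of f on [a, b] with N strips."""
    dx = (b - a) / N
    return sum(f(a + k * dx) for k in range(1, N + 1)) * dx


# (label, function, left endpoint, right endpoint, value stated in the text)
examples = [
    ("Example (1)", lambda x: x, 0, 1, 0.5),
    ("Example (2)", lambda x: x ** 2, 0, 1, 1 / 3),
    ("Example (3)", lambda x: 1.0, 2, 4, 2.0),
    ("Example (4)", lambda x: 1 - x / 2, 0, 2, 1.0),
]

for label, f, a, b, stated in examples:
    print(label, round(right_riemann_sum(f, a, b), 4), round(stated, 4))
```

The numerical values agree with the geometric ones to within the small error expected from a finite number of strips.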
Figure 2.7. We define a new function A(x) to be the area associated with the
graph of some function y = f (x) from the fixed endpoint a up to the endpoint x, where
a ≤ x ≤ b.
We will investigate how the area under the graph of a function changes as one of the
endpoints of the interval moves. We can think of this as a function that gradually changes
(i.e. the area accumulates) as we sweep across the interval (a, b) from left to right in
Figure 2.1. The function A(x) represents the area of the region shown in Figure 2.7.
Extending our definition of the definite integral, we might be tempted to use the
notation
\[ A(x) = \int_a^x f(x) \, dx. \]
However, there is a slight problem with this notation: the symbol x is used in slightly
confusing ways, both as the argument of the function and as the variable endpoint of the
interval. To avoid possible confusion, we will prefer the notation
\[ A(x) = \int_a^x f(s) \, ds. \]
where N denotes the “end” of the sum, and k keeps track of where we are in the process
of summation. The symbol s, sometimes called a “dummy variable” is analogous to the
summation symbol k.
In the upcoming Chapter 3, we will investigate properties of this new “area function”
A(x) defined above. This will lead us to the Fundamental Theorem of Calculus, and will
provide new and powerful tools to replace the dreary summations that we had to perform
in much of Chapter 2. Indeed, we are about to discover the amazing connection between a
function, the area A(x) under its curve, and the derivative of A(x).
2.8 Summary
In this chapter, we showed how to calculate the area of a region in the plane that is bounded
by the x axis, two lines of the form x = a and x = b, and the graph of a positive function
y = f (x). We also introduced the terminology “definite integral” (Section 2.6) and the
notation (2.4) to represent that area.
One of our main efforts here focused on how to actually compute that area by the
following set of steps:
We showed both the analytic approach, using Riemann sums and summation formu-
lae to find areas, as well as numerical approximations using a spreadsheet tool to arrive at
similar results. We then used a variety of examples to illustrate the concepts and arrive at
computed areas.
As a final important point, we noted that the area “under the graph of a function” can
itself be considered a function. This idea will emerge as particularly important and will lead
us to the key concept linking the geometric concept of areas with the analytic properties
of antiderivatives. We shall see this link in the Fundamental Theorem of Calculus, in
Chapter 3.
Chapter 3
The Fundamental Theorem of Calculus
In this chapter we will formulate one of the most important results of calculus, the Funda-
mental Theorem. This result will link together the notions of an integral and a derivative.
Using this result will allow us to replace the technical calculations of Chapter 2 by much
simpler procedures involving antiderivatives of a function.
to represent that quantity. We also set up a technique for computing areas: the procedure
for calculating the value of I is to write down a sum of areas of rectangular strips and to
compute a limit as the number of strips increases:
\[ I = \int_a^b f(x) \, dx = \lim_{N \to \infty} \sum_{k=1}^{N} f(x_k) \Delta x, \qquad (3.1) \]
where N is the number of strips used to approximate the region, k is an index associated
with the k’th strip, and ∆x = xk+1 − xk is the width of the rectangle. As the number of
strips increases (N → ∞), and their width decreases (∆x → 0), the sum becomes a better
and better approximation of the true area, and hence, of the definite integral, I. Examples
of such calculations (tedious as they were) formed the main theme of Chapter 2.
We can generalize the definite integral to include functions that are not strictly pos-
itive, as shown in Figure 3.1. To do so, note what happens as we incorporate strips cor-
responding to regions of the graph below the x axis: These are associated with negative
values of the function, so that the quantity f (xk )∆x in the above sum would be negative
for each rectangle in the “negative” portions of the function. This means that regions of the
graph below the x axis will contribute negatively to the net value of I.
44 Chapter 3. The Fundamental Theorem of Calculus
If we refer to A1 as the area corresponding to regions of the graph of f (x) above the
x axis, and A2 as the total area of regions of the graph under the x axis, then we will find
that the value of the definite integral I shown above will be
\[ I = A_1 - A_2. \]
Thus the notion of “area under the graph of a function” must be interpreted a little carefully
when the function dips below the axis.
[Figure 3.1: four panels (a)-(d), each showing a graph of y = f(x) against x. Panel (c) marks the point a; panel (d) marks the points a, b, c.]
Figure 3.1. (a) If f(x) is negative in some regions, there are terms in the sum (3.1)
that carry negative signs: this happens for all rectangles in parts of the graph that dip
below the x axis. (b) This means that the definite integral $I = \int_a^b f(x) \, dx$ will correspond
to the difference of two areas, $A_1 - A_2$, where $A_1$ is the total area (dark) of positive regions
and $A_2$ is the total area (light) of negative portions of the graph. Properties of the definite
integral: (c) illustrates Property 1. (d) illustrates Property 2.
property means that an integral will satisfy the same property. We illustrate some
of these in Fig 3.1.
1. $\int_a^a f(x) \, dx = 0$,
2. $\int_a^c f(x) \, dx = \int_a^b f(x) \, dx + \int_b^c f(x) \, dx$,
3. $\int_a^b C f(x) \, dx = C \int_a^b f(x) \, dx$,
4. $\int_a^b (f(x) + g(x)) \, dx = \int_a^b f(x) \, dx + \int_a^b g(x) \, dx$,
5. $\int_a^b f(x) \, dx = - \int_b^a f(x) \, dx$.
Property 1 states that the “area” of a region with no width is zero. Property 2 shows
how a region can be broken up into two pieces whose total area is just the sum of the
individual areas. Properties 3 and 4 reflect the fact that the integral is actually just a sum,
and so satisfies properties of simple addition. Property 5 is obtained by noting that if
we perform the summation “in the opposite direction”, then we must replace the previous
“rectangle width” given by ∆x = xk+1 − xk by the new “width” which is of opposite sign:
xk − xk+1 . This accounts for the sign change shown in Property 5.
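These five properties can be observed directly in the sums that define the integral. The Python sketch below is an illustration only: the helper riemann, the test functions f and g, and the endpoints a, b, c are all our own choices. Note that when the endpoints are swapped the strip width dx becomes negative, which is the sign change behind Property 5.

```python
def riemann(f, a, b, N=20000):
    """Signed Riemann sum of f from a to b; dx is negative when b < a."""
    dx = (b - a) / N
    return sum(f(a + k * dx) for k in range(1, N + 1)) * dx


f = lambda x: x ** 2 - 1      # dips below the x axis for 0 <= x < 1
g = lambda x: x
a, b, c, C = 0.0, 1.0, 3.0, 5.0

print("Property 1:", riemann(f, a, a))
print("Property 2:", riemann(f, a, c), riemann(f, a, b) + riemann(f, b, c))
print("Property 3:", riemann(lambda x: C * f(x), a, b), C * riemann(f, a, b))
print("Property 4:", riemann(lambda x: f(x) + g(x), a, b),
      riemann(f, a, b) + riemann(g, a, b))
print("Property 5:", riemann(f, a, b), -riemann(f, b, a))
```

Each pair of printed numbers should agree closely; small discrepancies in Properties 2 and 5 come from using a finite number of strips.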
This endpoint is considered as a variable12, i.e. we will be interested in the way that this
area changes as the endpoint varies (Figure 3.2(a)). We will now investigate the interesting
connection between A(x) and the original function, f (x).
We would like to study how A(x) changes as x is increased ever so slightly. Let
∆x = h represent some (very small) increment in x. (Caution: do not confuse h with
height here. It is actually a step size along the x axis.) Then, according to our definition,
\[ A(x + h) = \int_a^{x+h} f(t) \, dt. \]
12 Recall that the “dummy variable” t inside the integral is just a “place holder”, and is used to avoid confusion
with the endpoint of the integral (x in this case). Also note that the value of A(x) does not depend in any way on
t, so any letter or symbol in its place would do just as well.
[Figure 3.2: four panels showing the graph of y = f(x). (a) The area A(x) from a to x. (b) The area A(x+h) from a to x+h. (c) The sliver A(x+h)−A(x) between x and x+h. (d) The approximating rectangle of height f(x) and width h.]
Figure 3.2. When the right endpoint of the interval moves by a distance h, the area
of the region increases from A(x) to A(x + h). This leads to the important Fundamental
Theorem of Calculus, given in Eqn. (3.2).
In Figure 3.2(a)(b), we illustrate the areas represented by A(x) and by A(x + h), respec-
tively. The difference between the two areas is a thin sliver (shown in Figure 3.2(c)) that
looks much like a rectangular strip (Figure 3.2(d)). (Indeed, if h is small, then the approx-
imation of this sliver by a rectangle will be good.) The height of this sliver is specified
by the function f evaluated at the point x, i.e. by f (x), so that the area of the sliver is
approximately f(x) · h. Thus,
\[ A(x + h) - A(x) \approx f(x) \cdot h \]
or
\[ \frac{A(x + h) - A(x)}{h} \approx f(x). \]
As h gets small, i.e. h → 0, we get a better and better approximation, so that, in the limit,
\[ \lim_{h \to 0} \frac{A(x + h) - A(x)}{h} = f(x). \]
The ratio above should be recognizable. It is simply the derivative of the area function, i.e.
\[ f(x) = \frac{dA}{dx} = \lim_{h \to 0} \frac{A(x + h) - A(x)}{h}. \qquad (3.2) \]
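The argument can be illustrated numerically. In the Python sketch below, the sliver A(x + h) − A(x) is computed as the area under f between x and x + h (this uses Property 2 of the definite integral), and is then divided by h. The test function f(t) = t^2, the point x = 1, and the helper names are assumptions made for the example.

```python
def area(f, left, right, N=1000):
    """Riemann-sum approximation of the area under f from left to right."""
    dx = (right - left) / N
    return sum(f(left + k * dx) for k in range(1, N + 1)) * dx


def difference_quotient(f, x, h):
    """(A(x + h) - A(x)) / h, where the numerator is the thin sliver."""
    sliver = area(f, x, x + h)
    return sliver / h


f = lambda t: t ** 2
x = 1.0
for h in (0.1, 0.01, 0.001):
    print(h, difference_quotient(f, x, h), f(x))
```

As h shrinks, the ratio settles toward f(x), in line with equation (3.2).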
3.4. The Fundamental Theorem of Calculus 47
We have just given a simple argument in support of an important result, called the
Fundamental Theorem of Calculus, which is restated below.
Proof
See the above argument and Figure 3.2.
We would say that $x^2/2$ is an “antiderivative” of x and that $(x^2/2) + 1$ is also an
“antiderivative” of x. In fact, any function of the form
\[ g(x) = \frac{x^2}{2} + C, \quad \text{where } C \text{ is any constant,} \]
is also an “antiderivative” of x.
This example illustrates that adding a constant to a given function will not affect
the value of its derivative, or, stated another way, antiderivatives of a given function are
defined only up to some constant. We will use this fact shortly: if A(x) and F (x) are both
antiderivatives of some function f (x), then A(x) = F (x) + C.
13 We often write “antiderivative”, with no hyphen.
Proof
From comments above, we know that a function f (x) could have many different antideriva-
tives that differ from one another by some additive constant. We are told that F (x) is an
antiderivative of f (x). But from Part I of the Fundamental Theorem, we know that A(x) is
also an antiderivative of f (x). It follows that
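\[ A(x) = \int_a^x f(t) \, dt = F(x) + C, \quad \text{where } C \text{ is some constant.} \qquad (3.3) \]
Setting x = a, and recalling that A(a) = 0 (the region then has no width), we obtain
0 = F(a) + C. Thus,
\[ C = -F(a). \]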
Remark 1: Implications
This theorem has tremendous implications, because it allows us to use a powerful new
tool in determining areas under curves. Instead of the drudgery of summations in order to
compute areas, we will be able to use a shortcut: find an antiderivative, evaluate it at the
two endpoints a, b of the interval of interest, and subtract the results to get the area. In the
case of elementary functions, this will be very easy and convenient.
Remark 2: Notation
We will often use the notation
\[ F(t) \Big|_a^x = F(x) - F(a) \]
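The shortcut described in Remark 1 can be tried out in a few lines of Python. This is a sketch under our own choices: f(x) = x^2 with antiderivative F(x) = x^3/3 on 0 ≤ x ≤ 1, and hypothetical helper names. It also shows that adding a constant to the antiderivative does not change the answer, since the constant cancels in F(b) − F(a).

```python
def evaluate_between(F, a, b):
    """Return F(b) - F(a), i.e. the antiderivative evaluated between a and b."""
    return F(b) - F(a)


def right_riemann_sum(f, a, b, N=10000):
    """Strip-by-strip approximation, for comparison with the shortcut."""
    dx = (b - a) / N
    return sum(f(a + k * dx) for k in range(1, N + 1)) * dx


f = lambda x: x ** 2
F = lambda x: x ** 3 / 3          # one antiderivative of f
G = lambda x: x ** 3 / 3 + 7      # another one, shifted by a constant

print(evaluate_between(F, 0, 1))   # shortcut
print(evaluate_between(G, 0, 1))   # same value: the constant cancels
print(right_riemann_sum(f, 0, 1))  # summation, close to the same value
```

The first two lines require two function evaluations each, whereas the Riemann sum needs ten thousand, which gives some feeling for the savings the theorem provides.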
function f(x)    derivative f'(x)             function f(x)              antiderivative F(x)
$Cx$             $C$                          $C$                        $Cx$
$x^n$            $n x^{n-1}$                  $x^m$                      $\frac{x^{m+1}}{m+1}$
$\ln(x)$         $\frac{1}{x}$                $\frac{1}{x}$              $\ln(x)$
$\arctan(x)$     $\frac{1}{1 + x^2}$          $\frac{1}{1 + x^2}$        $\arctan(x)$
$\arcsin(x)$     $\frac{1}{\sqrt{1 - x^2}}$   $\frac{1}{\sqrt{1 - x^2}}$ $\arcsin(x)$
Table 3.1. Common functions and their derivatives (on the left two columns) also
result in corresponding relationships between functions and their antiderivatives (right two
columns). In this table, we assume that m ≠ −1, b ≠ 0, k ≠ 0. Also, when using ln(x) as
antiderivative for 1/x, we assume that x > 0.
leads to
\[ I = \int_0^1 (1 + x + x^2 + x^3) \, dx = \left( x + \frac{1}{2} x^2 + \frac{1}{3} x^3 + \frac{1}{4} x^4 \right) \Big|_0^1 = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} \approx 2.083. \]
0 2 3 4 0 2 3 4
3. $I = \int_{-1}^{1} e^{-2x} \, dx$,
4. $I = \int_0^{\pi} \sin\left( \frac{x}{2} \right) dx$,
Solutions
1. An antiderivative of $f(x) = x^2$ is $F(x) = (x^3/3)$, thus
\[ I = \int_0^1 x^2 \, dx = F(x) \Big|_0^1 = (1/3)(x^3) \Big|_0^1 = \frac{1}{3}(1^3 - 0) = \frac{1}{3}. \]
14 In fact, it is very good practice to perform such checks.
3.6. Examples: Computing areas with the Fundamental Theorem of Calculus 51
Comment: The evaluation of Integral 2. in the examples above is tricky only in that signs
can easily get garbled when we plug in the endpoint at -1. However, we can simplify our
work by noting the symmetry of the function f (x) = 1 − x2 on the given interval. As
shown in Fig 3.3, the areas to the right and to the left of x = 0 are the same for the interval
−1 ≤ x ≤ 1. This stems directly from the fact that the function considered is even15 .
Thus, we can immediately write
\[ I = \int_{-1}^{1} (1 - x^2) \, dx = 2 \int_0^1 (1 - x^2) \, dx = 2 \left( x - (x^3/3) \right) \Big|_0^1 = 2 \left( 1 - (1^3/3) \right) = 4/3. \]
Note that this calculation is simpler since the endpoint at x = 0 is trivial to plug in.
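Both routes to the answer can be compared with a short Python check. The antiderivative F(x) = x − x^3/3 is the one used above; the variable names are our own.

```python
F = lambda x: x - x ** 3 / 3     # an antiderivative of 1 - x^2

full_interval = F(1) - F(-1)        # plug in both endpoints, signs and all
half_doubled = 2 * (F(1) - F(0))    # integrate over 0 <= x <= 1 and double

print(full_interval, half_doubled)
```

The two results coincide, and the second requires no sign bookkeeping at the endpoint x = −1.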
[Figure 3.3: graph of y = 1 − x^2 over −1 ≤ x ≤ 1.]
Figure 3.3. We can exploit the symmetry of the function $f(x) = 1 - x^2$ in the
second integral of Examples 3.6.2. We can integrate over 0 ≤ x ≤ 1 and double the result.
We state the general result we have obtained, which holds true for any function with
even symmetry integrated on a symmetric interval about x = 0:
\[ \int_{-a}^{a} f(x) \, dx = 2 \int_0^a f(x) \, dx. \]
15 Recall that a function f (x) is even if f (x) = f (−x) for all x. A function is odd if f (x) = −f (−x).
[Figure 3.4: graphs of y = x^3, y = x, and y = x^{1/3} for 0 ≤ x ≤ 2 and 0 ≤ y ≤ 2, with the areas A1 and A2 marked.]
Solution
(a) The two curves, $y = x^3$ and $y = x^{1/3}$, intersect at x = 0 and at x = 1 in the first
quadrant. Thus the interval that we will be concerned with is 0 < x < 1. On this
interval, $x^{1/3} > x^3$, so that the area we want to find can be expressed as:
\[ A_1 = \int_0^1 \left( x^{1/3} - x^3 \right) dx. \]
Thus,
\[ A_1 = \frac{x^{4/3}}{4/3} \Big|_0^1 - \frac{x^4}{4} \Big|_0^1 = \frac{3}{4} - \frac{1}{4} = \frac{1}{2}. \]
(b) The two curves $y = x^3$ and $y = x$ also intersect at x = 0 and at x = 1 in the
first quadrant, and on the interval 0 < x < 1 we have $x > x^3$. The area can be
represented as
\[ A_2 = \int_0^1 \left( x - x^3 \right) dx. \]
3.7. Qualitative ideas 53
21 21
x2 22 x4 22 1 1 1
A2 = − = − = .
2 20 4 20 2 4 4
(c) The area calculated in (a) is twice the area calculated in (b). The reason for this is that $x^{1/3}$ is the inverse of the function $x^3$, which means geometrically that the graph of $x^{1/3}$ is the mirror image of the graph of $x^3$ reflected about the line y = x. Therefore, the area $A_1$ between $y = x^{1/3}$ and $y = x^3$ is twice as large as the area $A_2$ between $y = x$ and $y = x^3$ calculated in part (b): $A_1 = 2A_2$ (see Figure 3.4).
Solution
The area is
\[ A = \int_0^{1000} \left(100 + \left(\frac{x}{100}\right)^2\right) dx = \int_0^{1000} \left(100 + \frac{1}{10000}\,x^2\right) dx. \]
Note that the multiplicative constant (1/10000) is not affected by integration. The result is
\[ A = 100x\Big|_0^{1000} + \frac{x^3}{3}\Big|_0^{1000} \cdot \frac{1}{10000} = \frac{4}{3} \cdot 10^5. \]
which corresponds to the area associated with the graph of the function f . As x moves
from left to right, we show how the “area” accumulated along the graph gradually changes.
(See A(x) in bottom panels of Figure 3.5): We start with no area, at the point x = a
(since, by definition A(a) = 0) and gradually build up to some net positive amount, but
then we encounter a portion of the graph of f below the x axis, and this subtracts from
the amount accrued. (Hence the graph of A(x) has a little peak that corresponds to the
point at which f = 0.) Every time the function f (x) crosses the x axis, we see that A(x)
has either a maximum or minimum value. This fits well with our idea of A(x) as the
antiderivative of f (x): Places where A(x) has a critical point coincide with places where
dA/dx = f (x) = 0.
Figure 3.5. Given a function f(x), we here show how to sketch the corresponding “area function” A(x). (The relationship is that f(x) is the derivative of A(x).) Each of the panels (a) to (d) shows f(x) above and the corresponding A(x) below.
Sketching the function A(x) is thus analogous to sketching a function g(x) when we are given a sketch of its derivative $g'(x)$. Recall that this was one of the skills we built up in learning the connection between functions and their derivatives in a first semester calculus course.
Remarks
The following remarks may be helpful in gaining confidence with sketching the “area” function $A(x) = \int_a^x f(t)\,dt$ from the original function f(x):
1. The endpoint of the interval, a, on the x axis indicates the place at which A(x) = 0. This follows from Property 1 of the definite integral, i.e. from the fact that $A(a) = \int_a^a f(t)\,dt = 0$.
2. Whenever f (x) is positive, A(x) is an increasing function - this follows from the fact
that the area continues to accumulate as we “sweep across” positive regions of f (x).
[Figure: sketch of f(x), with its positive (+) and negative (−) regions marked, above the corresponding g(x); both graphs start at x = a.]
3. Wherever f(x) changes sign, the function A(x) has a local minimum or maximum. This means that either the area stops increasing (if the transition is from positive to negative values of f), or else the area starts to increase (if f crosses from negative to positive values).
4. Since dA/dx = f(x) by the Fundamental Theorem of Calculus, it follows that (taking a derivative of both sides) $d^2A/dx^2 = f'(x)$. Thus, when f(x) has a local maximum or minimum (i.e. $f'(x) = 0$), it follows that $A''(x) = 0$. This means that at such points, the function A(x) would have an inflection point.
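The remarks above can be illustrated with a short numerical sketch. It is an illustration only: the choice f(x) = sin(x) on 0 ≤ x ≤ 2π and the helper name `area_function` are assumptions made for the example, not taken from the notes.

```python
import math


def area_function(f, a, b, n=2000):
    """Return grid points xs and the running area A(x) = integral of f from a to x.

    Each strip is approximated with the midpoint rule.
    """
    dx = (b - a) / n
    xs, areas, total = [a], [0.0], 0.0
    for k in range(n):
        total += f(a + (k + 0.5) * dx) * dx
        xs.append(a + (k + 1) * dx)
        areas.append(total)
    return xs, areas


# f(x) = sin(x) is positive on (0, pi) and negative on (pi, 2*pi)
xs, A = area_function(math.sin, 0.0, 2 * math.pi)
i_max = max(range(len(A)), key=A.__getitem__)
print("A(a) =", A[0])
print("A(x) peaks near x =", xs[i_max], "with value", A[i_max])
```

In this example A(a) = 0 at the left endpoint (Remark 1), A(x) increases while f is positive (Remark 2), and the peak of A(x) falls where f crosses from positive to negative values, near x = π (Remark 3).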
Given a function f(x), Figure 3.6 shows in detail how to sketch the corresponding function
\[ g(x) = \int_a^x f(t)\,dt. \]
Figure 3.7. The original function f(x) is shown above. The corresponding function g(x) is drawn below.
Solution
See Figure 3.7
We find that
\[ I = -\frac{x^2}{2}\Big|_{-1}^{0} + \frac{x^2}{2}\Big|_0^2 = -\left(0 - \frac{1}{2}\right) + \left(\frac{4}{2} - 0\right) = 2.5 \]
Figure 3.8. In this example (the graph of y = |x|), to compute the integral over the interval −1 ≤ x ≤ 2, we must split up the region into two distinct parts.
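Splitting the interval at x = 0 can be mirrored in a quick numerical check. This is a sketch under the assumption that the integrand is |x| on −1 ≤ x ≤ 2, as Figure 3.8 indicates; the helper `midpoint_integral` is an illustrative name.

```python
def midpoint_integral(f, a, b, n=10_000):
    """Approximate the definite integral of f over [a, b] with the midpoint rule."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx


# On -1 <= x <= 0 we have |x| = -x; on 0 <= x <= 2 we have |x| = x.
left = midpoint_integral(lambda x: -x, -1.0, 0.0)
right = midpoint_integral(lambda x: x, 0.0, 2.0)
split_total = left + right

# Integrating |x| directly over the whole interval should give the same area.
direct = midpoint_integral(abs, -1.0, 2.0)
print(split_total, direct)
```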
We see that there is a problem here. Recall that $x^{1/2} = \sqrt{x}$. Hence, the function is not defined for x < 0 and the interval of integration is inappropriate. Hence, this integral does not make sense.
Simple integration using the antiderivative in Table 3.1 (for k = −r) leads to the result
\[ I = \frac{e^{-rx}}{-r}\Big|_0^b = -\frac{1}{r}\left(e^{-rb} - e^{0}\right) = \frac{1}{r}\left(1 - e^{-rb}\right). \]
This is the area under the exponential curve between x = 0 and x = b. Now consider what happens when b, the upper endpoint of the integral, increases, so that b → ∞. Then the value of the integral becomes
\[ I = \lim_{b\to\infty} \int_0^b e^{-rx}\,dx = \lim_{b\to\infty} \frac{1}{r}\left(1 - e^{-rb}\right) = \frac{1}{r}(1 - 0) = \frac{1}{r}. \]
(We used the fact that, for r > 0, $e^{-rb} \to 0$ as b → ∞.) We have, in essence, found that
\[ I = \int_0^{\infty} e^{-rx}\,dx = \frac{1}{r}. \tag{3.5} \]
An integral of the form (3.5) is called an improper integral. Even though the domain of integration of this integral is infinite, (0, ∞), observe that the value we computed is finite, so long as r > 0. Not all such integrals have a bounded finite value. Learning to distinguish between those that do and those that do not will form an important theme in Chapter 10.
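To see the limit in (3.5) emerge, one can tabulate the finite integral for growing b. The sketch below uses the closed form derived above; the value r = 0.5 is an illustrative assumption, not a value from the text.

```python
import math


def truncated_integral(r, b):
    """Value of the integral of e^(-r x) from 0 to b, i.e. (1 - e^(-r b)) / r."""
    return (1.0 - math.exp(-r * b)) / r


r = 0.5  # illustrative positive decay rate (assumed for this example)
for b in (1, 10, 100):
    print(b, truncated_integral(r, b))
print("limit 1/r =", 1 / r)
```

The tabulated values increase with b and level off at 1/r, which is the behaviour the limit describes.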
Figure 3.9. The area in the region shown here is best computed by integrating in the y direction (using strips of width ∆y). If we do so, we can use the curved boundary as a single function that defines the region. (Note that the curve cannot be expressed in the form of a function in the usual sense, y = f(x), but it can be expressed in the form of a function x = f(y).)
asked to find the area between the curve y 2 − y + x = 0 and the y axis. However, one and
the same curve, y 2 − y + x = 0 forms the boundary from both the top and the bottom of
the region. We are unable to set up a series of rectangles with bases along the x axis whose
heights are described by this curve. This means that our definite integral (which is really
just a convenient way of carrying out the process of area computation) has to be handled
with care.
Let us consider this problem from a “new angle”: with rectangles based on the y axis, we can achieve the desired result. To do so, let us express our curve in the form
\[ x = g(y) = y - y^2. \]
Then, placing our rectangles along the interval 0 < y < 1 on the y axis (each having base of width ∆y) leads to the integral
\[ I = \int_0^1 g(y)\,dy = \int_0^1 (y - y^2)\,dy = \left(\frac{y^2}{2} - \frac{y^3}{3}\right)\Big|_0^1 = \frac{1}{2} - \frac{1}{3} = \frac{1}{6}. \]
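The same area can be approximated by actually stacking thin rectangles along the y axis. This is a minimal sketch; the function name `area_along_y` and the number of strips are assumptions made for illustration.

```python
def area_along_y(g, y_low, y_high, n=10_000):
    """Sum n horizontal strips of height dy and length g(y), sampled at strip midpoints."""
    dy = (y_high - y_low) / n
    return sum(g(y_low + (k + 0.5) * dy) for k in range(n)) * dy


def g(y):
    # The curve y^2 - y + x = 0 rewritten as x = g(y)
    return y - y**2


area = area_along_y(g, 0.0, 1.0)
print(area)  # close to 1/6
```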
3.9 Summary
In this chapter we first recapped the definition of the definite integral in Section 3.1, recalled
its connection to an area in the plane under the graph of some function f (x), and examined
its basic properties.
If one of the endpoints, x, of the integral is allowed to vary, the area it represents, A(x), becomes a function of x. Our construction in Figure 3.2 showed that there is a connection between the derivative $A'(x)$ of the area and the function f(x). Indeed, we showed that $A'(x) = f(x)$ and argued that this makes A(x) an antiderivative of the function f(x).
This important connection between integrals and antiderivatives is the crux of Integral Calculus, forming the Fundamental Theorem of Calculus. Its significance is that
finding areas need not be as tedious and labored as the calculation of Riemann sums that
formed the bulk of Chapter 2. Rather, we can take a shortcut using antidifferentiation.
Motivated by this very important result, we reviewed some common functions and
derivatives, and used this to relate functions and their antiderivatives in Table 3.1. We
used these antiderivatives to calculate areas in several examples. Finally, we extended the
treatment to include qualitative sketches of functions and their antiderivatives.
As we will see in upcoming chapters, the ideas presented here have a much wider
range of applicability than simple area calculations. Indeed, we will shortly show that the
same concepts can be used to calculate net changes in continually varying processes, to
compute volumes of various shapes, to determine displacement from velocity, mass from
densities, as well as a host of other quantities that involve a process of accumulation. These
ideas will be investigated in Chapters 4 and 5.
Chapter 4
Applications of the definite integral to velocities and rates
4.1 Introduction
In this chapter, we encounter a number of applications of the definite integral to practical
problems. We will discuss the connection between acceleration, velocity and displacement
of a moving object, a topic we visited in an earlier, Differential Calculus Course. Here
we will show that the notion of antiderivatives and integrals allows us to deduce details of
the motion of an object from underlying Laws of Motion. We will consider both uniform
and accelerated motion, and recall how air resistance can be described, and what effect it
induces.
An important connection is made in this chapter between a rate of change (e.g. rate
of growth) and the total change (i.e. the net change resulting from all the accumulation and
loss over a time span). We show that such examples also involve the concept of integration,
which, fundamentally, is a cumulative summation of infinitesimal changes. This allows us
to extend the utility of the mathematical tools to a variety of novel situations. We will see
examples of this type in Sections 4.3 and 4.4.
Several other important ideas are introduced in this chapter. We encounter for the
first time the idea of spatial density, and see that integration can also be used to “add up”
the total amount of material distributed over space. In Section 5.2.2, this idea is applied to
the density of cars along a highway. We also consider mass distributions and the notion of
a center of mass.
Finally, we also show that the definite integral is useful for determining the average
value of a function, as discussed in Section 4.6. In all these examples, the important step
is to properly set up the definite integral that corresponds to the desired net change. Com-
putations at this stage are relatively straightforward to emphasize the process of setting up
the appropriate integrals and understanding what they represent.
The quantity on the right hand side of Eqn. (4.1) is a displacement, i.e., the difference between the position at time $T_1$ and the position at time $T_2$. In the case that $T_1 = 0$, $T_2 = T$, we have
\[ \int_0^T v(t)\,dt = x(T) - x(0), \]
as the displacement over the time interval 0 ≤ t ≤ T.
Similarly, since velocity is an antiderivative of acceleration, the Fundamental Theorem of Calculus says that
\[ \int_{T_1}^{T_2} a(t)\,dt = v(t)\Big|_{T_1}^{T_2} = v(T_2) - v(T_1) \tag{4.2} \]
is the net change in velocity between time $T_1$ and time $T_2$ (though this quantity does not have a special name).
Figure 4.1. The total area under the velocity graph (left panel, v versus t) represents net displacement, and the total area under the graph of acceleration (right panel, a versus t) represents the net change in velocity over the interval $T_1 \le t \le T_2$.
However, applying result (4.1) over the time interval 0 ≤ t ≤ T also leads to
\[ \int_0^T v\,dt = x(T) - x(0). \]
Therefore, the two expressions obtained above must be equal, i.e.
\[ x(T) - x(0) = vT. \]
Thus, for uniform motion, the displacement is proportional to the velocity and to the time elapsed. The final position is
\[ x(T) = x(0) + vT. \]
This is true for all time T, so we can rewrite the results in terms of the more familiar (lower case) notation for time, t, i.e.
\[ x(t) = x(0) + vt. \tag{4.3} \]
Let us refer to the initial velocity v(0) as $v_0$. The above connection between velocity and acceleration holds for any final time T, i.e., it is true for all t that:
\[ v(t) = v_0 + at. \tag{4.4} \]
This just means that velocity at time t is the initial velocity incremented by an increase (over
the given time interval) due to the acceleration. From this we can find the displacement and
position of the particle as follows: Let us call the initial position $x(0) = x_0$. Then
\[ \int_0^T v(t)\,dt = x(T) - x_0. \tag{4.5} \]
But
\[ I = \int_0^T v(t)\,dt = \int_0^T (v_0 + at)\,dt = \left(v_0 t + a\frac{t^2}{2}\right)\Big|_0^T = \left(v_0 T + a\frac{T^2}{2}\right). \tag{4.6} \]
So, setting Equations (4.5) and (4.6) equal means that
\[ x(T) - x_0 = v_0 T + a\frac{T^2}{2}. \]
But this is true for all final times, T, i.e. this holds for any time t so that
\[ x(t) = x_0 + v_0 t + a\frac{t^2}{2}. \tag{4.7} \]
This expression represents the position of a particle at time t given that it experienced a
constant acceleration. The initial velocity v0 , initial position x0 and acceleration a allowed
us to predict the position of the object x(t) at any later time t. That is the meaning of
Eqn. (4.7)16.
\[ \frac{dy}{dt} = -ky, \qquad y(0) = y_0 \tag{4.9} \]
has a solution
\[ y(t) = y_0 e^{-kt}. \tag{4.10} \]
Initially, at time t = 0, the acceleration is a(0) = g (since a(t) = g − kv(t), and v(0) = 0). Therefore,
\[ a(t) = g\,e^{-kt}. \]
Since we now have an explicit formula for acceleration vs time, we can apply direct integration as we did in the examples in Sections 4.2.2 and 4.2.3. The result is:
\[ \int_0^T a(t)\,dt = \int_0^T g e^{-kt}\,dt = g\int_0^T e^{-kt}\,dt = g\left[\frac{e^{-kt}}{-k}\right]_0^T = g\,\frac{e^{-kT} - 1}{-k} = \frac{g}{k}\left(1 - e^{-kT}\right). \]
In the calculation, we have used the fact that the antiderivative of $e^{-kt}$ is $-e^{-kt}/k$. (This can be verified by simple differentiation.)
Figure 4.2. Terminal velocity (m/s) for acceleration due to gravity g = 9.8 m/s$^2$ and k = 0.2/s, shown as velocity v(t) for 0 ≤ t ≤ 30 s. The velocity reaches a near constant 49 m/s by about 20 s.
\[ v(t) \to \frac{g}{k}\left(1 - \text{very small quantity}\right) \approx \frac{g}{k}. \]
Thus, when drag forces are in effect, the falling object does not continue to accelerate
indefinitely: it eventually attains a terminal velocity. We have seen that this limiting
velocity is v = g/k. The object continues to fall at this (approximately constant) speed as
shown in Figure 4.2. The terminal velocity is also a steady state value of Eqn. (4.8), i.e. a
value of the velocity at which no further change occurs.
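The approach to terminal velocity can be tabulated directly. The sketch below assumes the object starts from rest, so that v(t) = (g/k)(1 − e^{−kt}) as obtained by integrating the acceleration above, and it uses the parameter values quoted in the caption of Figure 4.2; the function name `velocity` is illustrative.

```python
import math

G = 9.8   # acceleration due to gravity, m/s^2 (value from Figure 4.2)
K = 0.2   # drag coefficient, 1/s (value from Figure 4.2)


def velocity(t):
    """Velocity of an object falling from rest with drag: v(t) = (g/k)(1 - e^(-k t))."""
    return (G / K) * (1.0 - math.exp(-K * t))


terminal = G / K
for t in (0, 5, 10, 20, 30):
    print(f"t = {t:2d} s   v = {velocity(t):6.2f} m/s")
print("terminal velocity g/k =", terminal, "m/s")
```

With these values the velocity at t = 20 s is already within about 2% of g/k.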
17 We will use the terminology “total change” and “net change” interchangeably in this section.
Changing temperature
We must carefully distinguish information about the time dependence of some function from information about the rate of change of some function. Here is an example of these two different cases, and how we would handle them.
where t is time in minutes. Find the change in the temperature of the juice between
the times t = 1 and t = 5.
(b) The rate of change of temperature of a cup of coffee is observed to be
\[ f(t) = 8e^{-0.2t} \ {}^\circ\text{Celsius per minute} \]
where t is time in minutes. What is the total change in the temperature between t = 1 and t = 5 minutes?
Solutions
(a) In this case, we are given the temperature as a function of time. To determine what net change occurred between times t = 1 and t = 5, we find the temperatures at each time point and subtract. That is, the change in temperature between times t = 1 and t = 5 is simply
\[ T(5) - T(1) = 25(1 - e^{-0.5}) - 25(1 - e^{-0.1}) = 25(0.9048 - 0.6065) \approx 7.46^\circ \text{ Celsius}. \]
(b) Here, we do not know the temperature at any time, but we are given information
about the rate of change. (Carefully note the subtle difference in the wording.)
To get the total change, we would sum up all the small changes, f(t)∆t (over N subintervals of duration ∆t = (5 − 1)/N = 4/N) for t starting at 1 and ending at 5 min. We obtain a sum of the form $\sum f(t_k)\Delta t$ where $t_k$ is the k’th time point. Finally, we take a limit as the number of subintervals increases (N → ∞). By now, we recognize that this amounts to a process of integration. Based on this variation of the same concept we can take the usual shortcut of integrating the rate of change, f(t), from t = 1 to t = 5. To do so, we apply the Fundamental Theorem as before, reducing the amount of computation to finding antiderivatives. We compute:
\[ I = \int_1^5 f(t)\,dt = \int_1^5 8e^{-0.2t}\,dt = -40e^{-0.2t}\Big|_1^5 = -40e^{-1} + 40e^{-0.2}, \]
\[ I = 40\left(e^{-0.2} - e^{-1}\right) = 40(0.8187 - 0.3679) \approx 18. \]
Only in the second case did we need to use a definite integral to find a net change, since we
were given the way that the rate of change depended on time. Recognizing the subtleties
of the wording in such examples will be an important skill that the reader should gain.
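The two routes described in part (b), summing small changes and integrating the rate, can be compared numerically. This is a sketch for illustration: the function names and the choice of right endpoints for the time points are assumptions, not part of the original example.

```python
import math


def rate(t):
    """Rate of change of temperature from part (b), in degrees per minute."""
    return 8.0 * math.exp(-0.2 * t)


def riemann_total_change(f, t_start, t_end, n):
    """Sum f(t_k) * dt over n subintervals, using right endpoints t_k."""
    dt = (t_end - t_start) / n
    return sum(f(t_start + k * dt) for k in range(1, n + 1)) * dt


# Shortcut via the antiderivative -40 e^(-0.2 t)
exact = 40.0 * (math.exp(-0.2) - math.exp(-1.0))

for n in (4, 40, 10_000):
    print(n, riemann_total_change(rate, 1.0, 5.0, n))
print("antiderivative shortcut:", exact)
```

As N grows, the sums settle toward the value obtained from the antiderivative, which is the point of the shortcut.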
Figure 4.3. Growth rates of two trees over a four year period (growth rate plotted against year, from 0 to 4). Tree 1 initially has a higher growth rate, but tree 2 catches up and grows faster after year 3.
Solution
In this problem we are provided with a sketch, rather than a formula for the growth rate of
the trees. Our solution will thus be qualitative (i.e. descriptive), rather than quantitative.
(This means we do not have to calculate anything; rather, we have to make some important
observations about the behaviour shown in Fig 4.3.)
We recognize that the net change in height of each tree is of the form
\[ H_i(T) - H_i(0) = \int_0^T g_i(t)\,dt, \qquad i = 1, 2, \]
where i = 1 for tree 1, i = 2 for tree 2, $g_i(t)$ is the growth rate as a function of time (shown for each tree in Figure 4.3) and $H_i(t)$ is the height of tree “i” at time t. But, by the
Fundamental Theorem of Calculus, this definite integral corresponds to the area under the
curve gi (t) from t = 0 to t = T . Thus we must interpret the net change in height for each
tree as the area under its growth curve. We see from Figure 4.3 that at t = 1 year, the area
under the curve for tree 1 is greater, so it has grown more. At t = 4 years the area under
the second curve is greatest so tree 2 has grown the most by that time.
Figure 4.4. Rate of change of radius, f(t), for a growing tree over a period of 14 years (f(t) ranging between 0 and 3 for 0 ≤ t ≤ 14).
has been cyclic, so that, over a period of 14 years, the growth rate of the radius of the tree trunk (in cm/year) is given by the function
\[ f(t) = 1.5 + \sin(\pi t/5), \]
as shown in Figure 4.4. Let the height of the tree trunk be approximately constant over this ten year period, and assume that the density of the trunk is approximately 1 gm/cm$^3$.
(a) If the radius was initially r0 at time t = 0, what will the radius of the trunk be at
time t later?
(b) What is the ratio of the mass of the tree trunk at t = 10 years and t = 0 years?
(i.e. find the ratio mass(10)/mass(0).)
Solution
(a) Let R(t) denote the trunk’s radius at time t. The rate of change of the radius of the tree is given by the function f(t), and we are told that at t = 0, $R(0) = r_0$. A graph of this growth rate over the first 14 years is shown in Figure 4.4. The net change in the radius is
\[ R(t) - R(0) = \int_0^t f(s)\,ds = \int_0^t \left(1.5 + \sin(\pi s/5)\right) ds. \]
Here we have used the fact that the antiderivative of sin(ax) is −(cos(ax)/a). Thus, using the initial value, $R(0) = r_0$ (which is a constant), and evaluating the integral, leads to
\[ R(t) = r_0 + 1.5t - \frac{5\cos(\pi t/5)}{\pi} + \frac{5}{\pi}. \]
(The constant at the end of the expression stems from the fact that cos(0) = 1.) A graph
of the radius of the tree over time (using r0 = 1) is shown in Figure 4.5. This function
is equivalent to the area associated with the function shown in Figure 4.4. Notice that
Figure 4.5 confirms that the radius keeps growing over the entire period, but that its growth
rate (slope of the curve) alternates between higher and lower values.
Figure 4.5. The radius of the tree, R(t), as a function of time (R(t) between 0 and 25 for 0 ≤ t ≤ 14), obtained by integrating the rate of change of radius shown in Fig. 4.4.
\[ R(10) = r_0 + 15 - \frac{5}{\pi}\cos(2\pi) + \frac{5}{\pi}. \]
But cos(2π) = 1, so
\[ R(10) = r_0 + 15. \]
(b) The mass of the tree is density times volume, and since the density in this example is constant, 1 gm/cm$^3$, we need only obtain the volume at t = 10. Taking the trunk to be cylindrical means that the volume at any given time is
\[ V(t) = \pi [R(t)]^2 h. \]
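A short sketch can tie parts (a) and (b) together. It assumes, as stated in the problem, constant trunk height h and constant density, so that the mass ratio reduces to the ratio of squared radii; the function names are illustrative, and r0 = 1 is the value used for Figure 4.5.

```python
import math


def radius(t, r0):
    """Trunk radius R(t) = r0 + 1.5 t - (5/pi) cos(pi t / 5) + 5/pi, from part (a)."""
    return r0 + 1.5 * t - 5.0 * math.cos(math.pi * t / 5.0) / math.pi + 5.0 / math.pi


def mass_ratio(t, r0):
    """mass(t) / mass(0) for a cylinder of fixed height and density.

    The factors pi, h and the density cancel, leaving (R(t) / R(0))^2.
    """
    return (radius(t, r0) / radius(0.0, r0)) ** 2


r0 = 1.0  # initial radius used for Figure 4.5
print("R(10) =", radius(10.0, r0))
print("mass(10)/mass(0) =", mass_ratio(10.0, r0))
```

For r0 = 1 this gives R(10) = 16, consistent with R(10) = r0 + 15, and hence a mass ratio of 16 squared.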
In this problem we used simple anti-differentiation to compute the desired total change.
We also related the graph of the radial growth rate in Fig. 4.4 to that of the resulting radius
at time t, in Fig. 4.5.
(a) How many babies in total were born during this time period (i.e. in the first 10 years after the war)?
(b) Find the time T0 such that the total number of babies born from the end of the war
up to the time T0 was precisely 14 million.
Solution
(a) To find the number of births, we would integrate the birth rate, b(t), over the given time period. The net change in the population due to births (neglecting deaths) is
\[ P(10) - P(0) = \int_0^{10} b(t)\,dt = \int_0^{10} (5 + 2t)\,dt = (5t + t^2)\Big|_0^{10} = 50 + 100 = 150 \ \text{[million babies]}. \]
(b) Denote by T the time at which the total number of babies born was 14 million. Then, [in units of million]
\[ I = \int_0^T b(t)\,dt = 14 = \int_0^T (5 + 2t)\,dt = 5T + T^2. \]
While this problem involves simple integration, we had to solve for a quantity (T) based on information about behaviour of that integral. Many problems in real applications involve such slight twists on the ideas of integration.
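The condition in part (b) is the quadratic equation T² + 5T − 14 = 0. A minimal sketch of solving it with the quadratic formula is shown below; the function name is illustrative, and only the positive root is kept because T measures time elapsed since the end of the war.

```python
import math


def time_for_total_births(total_millions):
    """Solve 5T + T^2 = total for T >= 0 using the quadratic formula."""
    # T^2 + 5T - total = 0  has roots  T = (-5 +/- sqrt(25 + 4*total)) / 2
    discriminant = 25.0 + 4.0 * total_millions
    return (-5.0 + math.sqrt(discriminant)) / 2.0


T = time_for_total_births(14.0)
print("T =", T, "years")
print("check: 5T + T^2 =", 5 * T + T**2)
```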
Figure 4.6. The rate of hormone production p(t) and the rate of removal r(t) are shown here over 24 hours, starting at midnight (time axis in hours, with noon at the centre). We want to use these graphs to deduce when the level of hormone is highest and lowest.
A typical example of two such functions is shown in Figure 4.6. This figure shows the production and removal rates over a period of 24 hours, starting at midnight. Our first task will be to use properties of the graph in Figure 4.6 to answer the following questions:
5. Find the maximal level of hormone in the blood over the period shown, assuming
that its basal (lowest) level is H = 0.
Solutions
1. We see directly from Fig. 4.6 that production rate is maximal at about 9:00 am.
This means that the net change in hormone level over the given time interval (amount produced minus amount removed) is
\[ H(b) - H(a) = P_{total} - R_{total} = \int_a^b \left(p(t) - r(t)\right) dt. \]
We interpret this integral as the area between the curves p(t) and r(t). But we must use caution here: For any time interval over which p(t) > r(t), this integral will be positive, and the hormone level will have increased. Otherwise, if p(t) < r(t), the integral yields a negative result, so that the hormone level has decreased.
This makes simple intuitive sense: If production is greater than removal, the level
of the substance is accumulating, whereas in the opposite situation, the substance is
decreasing. With these remarks, we find that the hormone level in the blood will be
greatest at 3:00 pm, when the greatest (positive) area between the two curves has
been obtained.
4. Similarly, the least hormone level occurs after a period in which the removal rate has
been larger than production for the longest stretch. This occurs at 3:00 am, just as
the curves cross each other.
5. Here we will practice integration by actually fitting some cyclic functions to the
graphs shown in Figure 4.6. Our first observation is that if the length of the cycle
(also called the period) is 24 hours, then the frequency of the oscillation is ω =
(2π)/24 = π/12. We further observe that the functions shown in the Figure 4.7
have the form
Figure 4.7. The functions shown above are trigonometric approximations to the rates of hormone production and removal from Figure 4.6 (plotted for 0 ≤ t ≤ 24).
increasing. For simplicity, we will take the amplitude A = 1. (In general, this would
just be a multiplicative constant in whatever solution we compute.) Then the net
increase in hormone over this period is calculated from the integral
\[ H_{total} = \int_3^{15} [p(t) - r(t)]\,dt = \int_3^{15} \left[(1 + \sin(\omega t)) - (1 + \cos(\omega t))\right] dt. \]
In the problem set, the reader is asked to compute this integral and to show that the amount of hormone that accumulated over the time interval 3 ≤ t ≤ 15, i.e. between 3:00 am and 3:00 pm, is $24\sqrt{2}/\pi$.
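The stated value can be checked numerically before attempting the exercise by hand. The sketch below takes A = 1 and ω = π/12 as in the text; the midpoint-rule helper is an illustrative assumption.

```python
import math

OMEGA = math.pi / 12.0  # frequency for a 24 hour cycle


def net_rate(t):
    """Production minus removal, p(t) - r(t), with amplitude A = 1."""
    return (1.0 + math.sin(OMEGA * t)) - (1.0 + math.cos(OMEGA * t))


def midpoint_integral(f, a, b, n=10_000):
    """Approximate the definite integral of f over [a, b] with the midpoint rule."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx


h_total = midpoint_integral(net_rate, 3.0, 15.0)
claimed = 24.0 * math.sqrt(2.0) / math.pi
print(h_total, claimed)
```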
Solution
If we invest P dollars (the “principal”, i.e., the amount deposited) in the bank with interest r (compounded continually) then the amount A(t) in the account at time t (in years), will grow as follows:
\[ A(t) = P\left(1 + \frac{r}{n}\right)^{nt}, \]
where r is the yearly interest rate (e.g. 5%) and n is the number of times per year that interest is compounded (e.g. n = 2 means interest compounded twice per year, n = 12 means monthly compounded interest, etc.) Define
\[ h = \frac{r}{n}. \quad \text{Then } n = \frac{r}{h}. \]
Then at time t, we have that
\[ A(t) = P(1 + h)^{\frac{1}{h}rt} = P\left[(1 + h)^{\frac{1}{h}}\right]^{rt} \approx Pe^{rt} \quad \text{for large } n \text{ or small } h. \]
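How quickly the compounded amount approaches the exponential can be seen by increasing n. In this sketch the principal, rate and time span are illustrative assumptions chosen only to make the comparison concrete.

```python
import math


def compounded(principal, r, n, t):
    """Amount after t years with yearly rate r compounded n times per year."""
    return principal * (1.0 + r / n) ** (n * t)


def continuous(principal, r, t):
    """Limiting amount P e^(r t) as the compounding interval shrinks."""
    return principal * math.exp(r * t)


P, r, t = 1000.0, 0.05, 10.0  # illustrative values (assumed for this example)
for n in (1, 2, 12, 365, 10**6):
    print(f"n = {n:>7}  A = {compounded(P, r, n, t):.4f}")
print(f"P e^(rt)     = {continuous(P, r, t):.4f}")
```

The amounts increase with n and approach the exponential value from below.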
Here we have used the fact that when h is small (i.e. frequent intervals of compounding) the expression in square brackets above can be approximated by e, the base of the natural logarithms. Recall that
\[ e = \lim_{h\to 0} \left[(1 + h)^{\frac{1}{h}}\right]. \]
(This result was obtained in a first semester calculus course by selecting the base of exponentials such that the derivative of $e^x$ is just $e^x$ itself.) Thus, we have found that the amount in the bank at time t will grow as
\[ A(t) = Pe^{rt}. \]
where L is the expected number of years left in the lifespan of the individual to whom this
annuity will be paid, and where we have approximated a sum of payments by an integral
(of a continuous income stream). One problem is that we do not know in advance how long the lifespan L will be. As a crude approximation, we could assume that this income stream continues forever, i.e. that L ≈ ∞. In such an approximation, we have to compute the integral:
\[ P = \int_0^{\infty} f(t)e^{-rt}\,dt. \tag{4.13} \]
The integral in Eqn. (4.13) is an improper integral (i.e. integral over an unbounded domain), as we have already encountered in Section 3.8.5. We shall have more to say about the properties of such integrals, and about their technical definition and existence, in Chapter 10. We refer to the quantity
\[ P = \int_0^{\infty} f(t)e^{-rt}\,dt, \tag{4.14} \]
\[ P = 10000 \cdot \frac{1}{r}, \]
e.g. if the interest rate is 5% (and assumed constant over future years), then
\[ P = \frac{10000}{0.05} = \$200{,}000. \]
Therefore, we need to pay $200,000 today to get $10,000 annually for every future year.
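The effect of replacing a finite lifespan L by an infinite one can be gauged with a short calculation. This sketch assumes a constant income stream of 10,000 per year, so that the integral over 0 ≤ t ≤ L has the closed form (10000/r)(1 − e^{−rL}); the function name and the sample values of L are illustrative.

```python
import math


def present_value(payment_per_year, r, years):
    """Present value of a constant income stream paid over a finite number of years.

    This is the integral of payment * e^(-r t) from t = 0 to t = years.
    """
    return payment_per_year * (1.0 - math.exp(-r * years)) / r


r = 0.05
for years in (10, 30, 100, 1000):
    print(f"L = {years:4d} years  P = {present_value(10_000, r, years):,.2f}")
print(f"L -> infinity   P = {10_000 / r:,.2f}")
```

For lifespans of a few decades the finite value is noticeably below 200,000, which is why the text calls the infinite horizon a crude approximation.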
average horizontal position or center of mass. It is important to avoid confusing these distinct notions.
Definition
The average value of f(x) over the interval a ≤ x ≤ b is
\[ \bar{f} = \frac{1}{b - a}\int_a^b f(x)\,dx. \]
Example 1
Find the average value of the function $y = f(x) = x^2$ over the interval 2 < x < 4.
Solution
\[ \bar{f} = \frac{1}{4 - 2}\int_2^4 x^2\,dx = \frac{1}{2}\,\frac{x^3}{3}\Big|_2^4 = \frac{1}{6}\left(4^3 - 2^3\right) = \frac{28}{3}. \]
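\begin{align*}
\bar{f} &= \frac{1}{182}\int_0^{182} f(t)\,dt \\
&= \frac{1}{182}\int_0^{182} \left(12 + 4\sin\left(\frac{\pi t}{182}\right)\right) dt \\
&= \frac{1}{182}\left(12t - \frac{4 \cdot 182}{\pi}\cos\left(\frac{\pi t}{182}\right)\right)\Big|_0^{182} \\
&= \frac{1}{182}\left(12 \cdot 182 - \frac{4 \cdot 182}{\pi}\left[\cos(\pi) - \cos(0)\right]\right) \\
&= 12 + \frac{8}{\pi} \approx 14.546 \text{ hours} \tag{4.15}
\end{align*}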
Figure 4.8. We show the variations in day length (cyclic curve) as well as the average day length (height of rectangle) in this figure.
Thus, on average, the day is 14.546 hrs long during the spring and summer.
In Figure 4.8, we show the entire day length cycle over one year. It is left as an
exercise for the reader to show that the average value of f over the entire year is 12 hrs.
(This makes intuitive sense, since overall, the short days in winter will average out with the
longer days in summer.)
Figure 4.8 also shows geometrically what the average value of the function repre-
sents. Suppose we determine the area associated with the graph of f (x) over the interval of
interest. (This area is painted red (dark) in Figure 4.8, where the interval is 0 ≤ t ≤ 365,
i.e. the whole year.) Now let us draw in a rectangle over the same interval (0 ≤ t ≤ 365)
having the same total area. (See the big rectangle in Figure 4.8, and note that its area
matches with the darker, red region.) The height of the rectangle represents the average
value of f (t) over the interval of interest.
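The definition of the average value translates directly into a few lines of code. The sketch below is illustrative: it approximates the integral with the midpoint rule and reuses the two examples of this section; the function names are assumptions.

```python
import math


def average_value(f, a, b, n=10_000):
    """Average of f over [a, b]: (1 / (b - a)) times the integral of f from a to b."""
    dx = (b - a) / n
    integral = sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx
    return integral / (b - a)


def day_length(t):
    """Day length in hours on day t, as in the worked example."""
    return 12.0 + 4.0 * math.sin(math.pi * t / 182.0)


avg_square = average_value(lambda x: x**2, 2.0, 4.0)   # Example 1
avg_day = average_value(day_length, 0.0, 182.0)        # spring and summer
print(avg_square, 28 / 3)
print(avg_day, 12 + 8 / math.pi)
```

Multiplying either average by the length of its interval recovers the corresponding area, which is the rectangle interpretation described for Figure 4.8.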
4.7 Summary
In this chapter, we arrived at a number of practical applications of the definite integral.
1. In Section 4.2, we found that for motion at constant acceleration a, the displace-
ment of a moving object can be obtained by integrating twice: the definite integral
of acceleration is the velocity v(t), and the definite integral of the velocity is the
displacement.
\[ v(t) = v_0 + \int_0^t a\,ds, \qquad x(t) = x(0) + \int_0^t v(s)\,ds. \]
(Here we use the “dummy variable” s inside the integral, but the meaning is, of
course, the same as in the previous presentation of the formulae.) We showed that at
\[ x(t) = x_0 + v_0 t + a\frac{t^2}{2}. \]
3. We illustrated the connection between rates of change (over time) and total change (between one time point and another). In general, we saw that if r(t) represents a rate of change of some process, then
\[ \int_a^b r(s)\,ds = \text{Total change over the time interval } a \le t \le b. \]
Chapter 5
Applications of the definite integral to calculating volume, mass, and length
5.1 Introduction
In this chapter, we consider applications of the definite integral to calculating geometric
quantities such as volumes of geometric solids, masses, centers of mass, and lengths of
curves.
The idea behind the procedure described in this chapter is closely related to those we
have become familiar with in the context of computing areas. That is, we first imagine an
approximation using a finite number of pieces to represent a desired result. Then, a limiting
process of refinement leads to the desired result. The technology of the definite integral, developed in Chapters 2 and 3, applies directly. This means that we need not re-derive the link between Riemann Sums and the definite integral; we can use these as we did in Chapter 4.
In the first parts of this chapter, we will calculate the total mass of a continuous
density distribution. In this context, we will also define the concept of a center of mass.
We will first motivate this concept for a discrete distribution made up of a number of finite
masses. Then we show how the same concept can be applied in the continuous case. The
connection between the discrete and continuous representation will form an important link
with our study of analogous concepts in probability, in Chapters 7 and 8.
In the second part of this chapter, we will consider how to dissect certain three dimen-
sional solids into a set of simpler parts whose volumes are easy to compute. We will use
familiar formulae for the volumes of disks and cylindrical shells, and carefully construct
a summation to represent the desired volume. The volume of the entire object will then
be obtained by summing up volumes of a stack of disks or a set of embedded shells, and
considering the limit as the thickness of the dissection cuts gets thinner. There are some im-
portant differences between material in this chapter and in previous chapters. Calculating
volumes will stretch our imagination, requiring us to visualize 3-dimensional (3D) objects,
and how they can be subdivided into shells or slices. Most of our effort will be aimed at
understanding how to set up the needed integral. We provide a number of examples of this
procedure, but first we review the basics of elementary volumes that will play the dominant
role in our calculations.
[Figure 5.1: five beads with masses $m_1, \ldots, m_5$ at positions $x_1, \ldots, x_5$ along a thin wire.]
In Figure 5.1 we see a number of beads distributed along a thin wire. We will label
each bead with an index, i = 1 . . . n (there are five beads so that n = 5). Each bead has a
certain position (that we will think of as the value of xi ) and a mass that we will call mi .
We will think of this arrangement as a discrete mass distribution: both the masses of the
beads, and their positions are of interest to us. We would like to describe some properties
of this distribution.
The total mass of the beads, M, is just the sum of the individual masses, so that
\[ M = \sum_{i=1}^{n} m_i. \tag{5.1} \]
Figure 5.2. Top: A continuous mass distribution along a one dimensional bar, discussed in Example 5.3.3. The density of the bar (mass per unit length), ρ(x), is shown on the graph. Bottom: the discretized approximation of this same distribution. Here we have subdivided the bar into n smaller pieces, each of length ∆x. The mass of each piece is approximately $m_k = \rho(x_k)\Delta x$ where $x_k = k\Delta x$. The total mass of the bar (“sum of all the pieces”) will be represented by an integral (5.2) as we let the size, ∆x, of the pieces become infinitesimal.
x0 = 0, . . . , xk = k∆x, ..., xN = L
mk = ρ(xk )∆x.
(Observe that units are correct, that is massk =(mass/length)· length. Note that ∆x has units
of length.) The total mass is then a sum of masses of all the pieces, and, as we have seen in
an earlier chapter, this sum will approach the integral
\[ M = \int_0^L \rho(x)\,dx. \tag{5.2} \]
Figure 5.3. A cell (keratocyte) shown in (a) has a dense distribution of actin
in a band called the actin cortex. In (b) we show a schematic sketch of the actin cortex
(shaded). In (c) that band of actin is scaled and straightened out so that it occupies a
length corresponding to the interval −1 ≤ x ≤ 1. We are interested in the distribution
of actin filaments across that band. That distribution is shown in (d). Note that actin is
densest in the middle of the band. (a) Credit to Alex Mogilner.
cell) are found at the edge of the cell in a band called the actin cortex. It has been found
experimentally that the density of actin is greatest in the middle of the band, i.e. the position
corresponding to the midpoint of the edge of the cell shown in Fig. 5.3a. According to
Alex Mogilner [19], the density of actin across the cortex in filaments per edge µm [20] is well
approximated by a distribution of the form
\[ \rho(x) = \alpha(1 - x^2), \qquad -1 \le x \le 1, \]
where x is the fraction of distance from midpoint to the end of the band (Fig. 5.3c and d).
Here ρ(x) is an actin filament density in units of filaments per µm. That is, ρ is the number
[19] Alex Mogilner is a professor of mathematics who specializes in cell biology and the actin cytoskeleton.
[20] Note that 1 µm (read “1 micrometer” or “micron”) is $10^{-6}$ meters, and is appropriate for measuring lengths
of small objects such as cells.
5.3. Mass distribution and the center of mass 85
The integral above was already computed (Integral 2) in Examples 3.6.2 of Chapter 3
and was found to equal 4/3. Thus there are N = 4α/3 actin filaments in
the band.
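The result N = 4α/3 can be checked by summing the filaments in many thin pieces of the band, exactly as in the construction of the integral. The following is a minimal sketch, not part of the original notes: the value of `alpha`, the number of pieces, and the helper name `riemann_sum` are all illustrative assumptions.

```python
def actin_density(x, alpha):
    """Filament density rho(x) = alpha * (1 - x^2) on -1 <= x <= 1."""
    return alpha * (1.0 - x * x)


def riemann_sum(f, a, b, n):
    """Sum f(x_k) * dx over n pieces of width dx, sampling each piece at its midpoint."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx


alpha = 3.0  # hypothetical value, chosen only for illustration
total_filaments = riemann_sum(lambda x: actin_density(x, alpha), -1.0, 1.0, 1000)
exact = 4.0 * alpha / 3.0  # the value of the integral quoted in the text

print(total_filaments, exact)
```

With 1000 pieces the sum already agrees with 4α/3 to several decimal places.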
\[ \bar{x} = \frac{1}{M} \sum_{i=1}^{n} x_i m_i. \]
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i m_i}{\sum_{i=1}^{n} m_i}. \]
\[ \bar{x} = \frac{1}{M} \int_0^L x \rho(x)\,dx. \]
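For a discrete set of beads, the center of mass formula is a short computation. The sketch below uses made-up positions and masses (they are assumptions for illustration, not data from the text), and the function name `center_of_mass` is hypothetical.

```python
def center_of_mass(positions, masses):
    """Return x_bar = (sum of x_i * m_i) / (sum of m_i) for beads on a wire."""
    total_mass = sum(masses)
    weighted_sum = sum(x * m for x, m in zip(positions, masses))
    return weighted_sum / total_mass


# Hypothetical beads: five positions, with most of the mass on the last bead.
positions = [1.0, 2.0, 3.0, 4.0, 5.0]
masses = [1.0, 1.0, 1.0, 1.0, 6.0]

x_bar = center_of_mass(positions, masses)
x_bar_equal = center_of_mass(positions, [1.0] * 5)  # equal masses: the plain average
print(x_bar, x_bar_equal)
```

The heavy bead pulls the center of mass toward its own position, whereas equal masses give the ordinary average of the positions.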
Solution
(a) From our previous discussion, the total mass of the bar is
\[ M = \int_0^L ax\,dx = \left.\frac{ax^2}{2}\right|_0^L = \frac{aL^2}{2}. \]
(b) The average mass density along the bar is computed just as one would compute the
average value of a function: integrate the function over an interval and divide by the
length of the interval. An example of this type appeared in Section 4.6. Thus
\[ \bar{\rho} = \frac{1}{L}\int_0^L \rho(x)\,dx = \frac{1}{L}\left(\frac{aL^2}{2}\right) = \frac{aL}{2}. \]
A bar having a uniform density ρ̄ = aL/2 would have the same total mass as the bar
in this example. (This is the physical interpretation of average mass density.)
(c) The center of mass of the bar is
\[ \bar{x} = \frac{\int_0^L x\rho(x)\,dx}{M} = \frac{1}{M}\int_0^L ax^2\,dx = \frac{a}{M}\left.\frac{x^3}{3}\right|_0^L = \frac{2a}{aL^2}\,\frac{L^3}{3} = \frac{2}{3}L. \]
Observe that the center of mass is an “average x coordinate”, which is not the same
as the average mass density.
(d) We can use the cumulative function defined in Eqn. (5.3) to figure out where half of
the mass is concentrated. Suppose we cut the bar at some position x = s. Then the
mass of this part of the bar is
\[ M_1 = \int_0^s \rho(x)\,dx = \frac{as^2}{2}. \]
5.4. Miscellaneous examples and related problems 87
We ask: for what value of s is M1 exactly half the total mass? Using
the result of part (a), we find that for this to be true, we must have
\[ M_1 = \frac{M}{2} \quad\Rightarrow\quad \frac{as^2}{2} = \frac{1}{2}\,\frac{aL^2}{2}. \]
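Solving the last equation for s gives s² = L²/2, that is, s = L/√2 ≈ 0.707 L. The sketch below checks parts (a) through (d) numerically for the density ρ(x) = ax. The values of `a` and `L` and the helper `midpoint_integral` are assumptions made only for this illustration.

```python
import math


def midpoint_integral(f, lo, hi, n=10000):
    """Approximate the definite integral of f on [lo, hi] by a midpoint Riemann sum."""
    dx = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * dx) for k in range(n)) * dx


a, L = 2.0, 3.0  # hypothetical density coefficient and bar length


def rho(x):
    """Mass density rho(x) = a * x along the bar."""
    return a * x


M = midpoint_integral(rho, 0.0, L)                          # (a) total mass
rho_bar = M / L                                             # (b) average density
x_bar = midpoint_integral(lambda x: x * rho(x), 0.0, L) / M  # (c) center of mass
s_half = L / math.sqrt(2.0)                                 # (d) cut that halves the mass
M1 = midpoint_integral(rho, 0.0, s_half)

print(M, rho_bar, x_bar, s_half, M1)
```

Note that the half-mass position L/√2 and the center of mass 2L/3 are different points, as are both from the midpoint L/2 of the bar.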
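[Figure 5.4: A test-tube of radius r along the x axis, from x = 0 to x = h, dissected into disk-shaped slices of thickness Δx.]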
Solution
We assume a simple cylindrical tube and consider imaginary “slices” of this tube along
its vertical axis, here labeled as the “x” axis. Suppose that the thickness of a slice is ∆x.
Then the volume of each of these (disk shaped) slices is πr2 ∆x. The amount of glucose in
the slice is approximately equal to the concentration c(x) multiplied by the volume of the
slice, i.e. the small slice contains an amount πr2 ∆xc(x) of glucose. In order to sum up the
total amount over all slices, we use a definite integral. (As before, we imagine ∆x → dx
becoming “infinitesimal” as the number of slices increases.) The integral we want is
\[ G = \pi r^2 \int_0^h c(x)\,dx. \]
Even though the geometry of the test-tube, at first glance, seems more complicated than
the one-dimensional highway described in Chapter 4, we observe here that the integral
that computes the total amount is still a sum over a single spatial variable, x. (Note the
resemblance between the integrals
\[ I = \int_0^L C(x)\,dx \quad\text{and}\quad G = \pi r^2 \int_0^h c(x)\,dx, \]
here and in the previous example.) We have neglected the complication of the rounded bot-
tom portion of the test-tube, so that integration over its length (which is actually summation
of disks shown in Figure 5.4) is a one-dimensional problem.
In this case the total amount of glucose in the tube is
\[ G = \pi r^2 \int_0^h (0.1 + 0.5x)\,dx = \pi r^2 \left.\left(0.1x + \frac{0.5x^2}{2}\right)\right|_0^h = \pi r^2 \left(0.1h + \frac{0.5h^2}{2}\right). \]
Suppose that the height of the test-tube is h = 10 cm and its radius is r = 1 cm.
Then the total mass of glucose is
\[ G = \pi\left(0.1(10) + \frac{0.5(100)}{2}\right) = \pi(1 + 25) = 26\pi \text{ gm}. \]
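The slicing argument can be mimicked directly: add up concentration times slice volume over many thin disks. This is a minimal sketch under the same simplification as the text (a straight cylinder, no rounded bottom); the function name `glucose_total` and the number of slices are illustrative assumptions.

```python
import math


def glucose_total(r, h, n=1000):
    """Sum c(x) * (pi r^2 dx) over n disk-shaped slices, with c(x) = 0.1 + 0.5 x."""
    dx = h / n
    total = 0.0
    for k in range(n):
        x_mid = (k + 0.5) * dx               # position of the k-th slice
        concentration = 0.1 + 0.5 * x_mid    # c(x) sampled at that position
        total += concentration * math.pi * r ** 2 * dx
    return total


G = glucose_total(r=1.0, h=10.0)
print(G, 26 * math.pi)
```

Because c(x) is linear, sampling each slice at its midpoint reproduces 26π up to rounding error.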
In the next example, we consider a circular geometry, but the concept of dissecting
and summing is the same. Our task is to determine how to set up the problem in terms
of an integral, and, again, we must imagine which type of subdivision would lead to the
summation (integration) needed to compute total amount.
Figure 5.5. A colony of bacteria with circular symmetry. A ring of small thickness
∆r has roughly constant density. The superimposed curve on the left is the bacterial density
b(r) as a function of the radius r.
Solution
Figure 5.5 shows a rough sketch of a flat surface with a colony of bacteria growing on it.
We assume that this distribution is radially symmetric. The density as a function of distance
from the center is given by b(r), as shown in Figure 5.5. Note that the function describing
density, b(r) is smooth, but to accentuate the strategy of dissecting the region, we have
shown a top-down view of a ring of nearly constant density on the right in Figure 5.5. We
see that this ring occupies the region between two circles, e.g. between a circle of radius r
and a slightly bigger circle of radius r + ∆r. The area of that “ring” [21] would then be the
area of the larger circle minus that of the smaller circle, namely
\[ A_{\text{ring}} = \pi(r + \Delta r)^2 - \pi r^2 = \pi\left(2r\,\Delta r + (\Delta r)^2\right). \]
However, if we make the thickness of that ring really small (∆r → 0), then the quadratic
term $(\Delta r)^2$ is very small, so that
\[ A_{\text{ring}} \approx 2\pi r\,\Delta r. \]
[21] Students commonly make the error of writing $A_{\text{ring}} = \pi(r + \Delta r - r)^2 = \pi(\Delta r)^2$. This trap should be
avoided! It is clear that the correct expression has additional terms, since we really are computing a difference of
two circular areas.
Consider all the bacteria that are found inside a “ring” of radius r and thickness ∆r
(see Figure 5.5). The total number within such a ring is the product of the density, b(r), and
the area of the ring, i.e.
\[ b(r)\,(2\pi r\,\Delta r). \]
To get the total number in the colony we sum up over all the rings from r = 0 to r = 1
and let the thickness, ∆r → dr become very small. But, as with other examples, this is
equivalent to calculating a definite integral, namely:
\[ B_{\text{total}} = \int_0^1 (1 - r^2)(2\pi r)\,dr = 2\pi\int_0^1 (1 - r^2)\,r\,dr = 2\pi\int_0^1 (r - r^3)\,dr. \]
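Evaluating the last integral gives 2π(1/2 − 1/4) = π/2. The ring-by-ring sum can be sketched as follows, using the density b(r) = 1 − r² from Figure 5.5. The function name `bacteria_total` and the number of rings are assumptions made for illustration.

```python
import math


def bacteria_total(n=1000):
    """Sum density * ring area, b(r) * 2 pi r dr, over n thin rings from r = 0 to r = 1."""
    dr = 1.0 / n
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * dr                    # radius at the middle of the k-th ring
        density = 1.0 - r ** 2                # b(r) = 1 - r^2
        ring_area = 2.0 * math.pi * r * dr    # thin-ring approximation 2 pi r dr
        total += density * ring_area
    return total


B_total = bacteria_total()
print(B_total, math.pi / 2)
```

The sum approaches π/2 as the rings become thinner.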
\[ V_{\text{cylinder}} = \pi r^2 h. \]
2. The volume of a circular disk of thickness τ and radius r (shown on the left in
Figure 5.6) is a special case of the above,
\[ V_{\text{disk}} = \pi r^2 \tau. \]
3. The volume of a cylindrical shell of height h, with circular radius r and small
thickness τ (shown on the right in Figure 5.6), is
\[ V_{\text{shell}} = 2\pi r h \tau. \]
Figure 5.6. The volumes of these simple 3D shapes are given by simple formulae.
We use them as basic elements in computing more complicated volumes. Here we will
present examples based on disks. In Appendix 11.4 we give an example based on shells.
Summing up the volumes of all such disks in the stack leads to the total volume of disks
\[ V_{\text{disks}} = \sum_k \pi [f(x_k)]^2\,\Delta x. \]
When we increase the number of disks, making each one thinner so that ∆x → 0, we
Figure 5.8. Here the solid of revolution is formed by revolving the curve y = f(x)
about the x axis. A typical disk used to approximate the volume is shown. The radius of the
disk (parallel to the y axis) is r = y = f(x). The thickness of the disk (parallel to the x
axis) is ∆x. The volume of this disk is hence $v = \pi [f(x)]^2\,\Delta x$.
In most of the examples discussed in this chapter, the key step is to make careful observation
of the way that the radius of a given disk depends on the function that generates the surface.
5.5. Volumes of solids of revolution 93
(By this we mean the function that specifies the curve that forms the surface of revolution.)
We also pay attention to the dimension that forms the disk thickness, ∆x.
Some of our examples will involve surfaces revolved about the x axis, and others
will be revolved about the y axis. In setting up these examples, a diagram is usually quite
helpful.
Figure 5.9. When the semicircle (on the left) is rotated about the x axis, it gen-
erates a sphere. On the right, we show one disk generated by the revolution of the shaded
rectangle.
In Figure 5.9, we show the sphere dissected into a set of disks, each of width ∆x. The disks
are lined up along the x axis with coordinates xk , where −R ≤ xk ≤ R. These are just
integer multiples of the slice thickness ∆x, so for example,
The radius of the disk depends on its position [22]. Indeed, the radius of a disk through the x
axis at a point $x_k$ is specified by the function $r_k = f(x_k)$. The volume of the k’th disk is
\[ V_k = \pi r_k^2\,\Delta x. \]
By the above remarks, using the fact that the function f (x) determines the radius, we have
\[ V_k = \pi\left(\sqrt{R^2 - x_k^2}\right)^2 \Delta x = \pi(R^2 - x_k^2)\,\Delta x. \]
as ∆x → 0, this sum becomes a definite integral, and represents the true volume. We start
the summation at x = −R and end at xN = R since the semi-circle extends from x = −R
to x = R. Thus
\[ V_{\text{sphere}} = \int_{-R}^{R} \pi [f(x)]^2\,dx = \pi\int_{-R}^{R} (R^2 - x^2)\,dx. \]
We compute this integral using the Fundamental Theorem of Calculus, obtaining
\[ V_{\text{sphere}} = \pi \left.\left(R^2 x - \frac{x^3}{3}\right)\right|_{-R}^{R}. \]
Observe that this is twice the volume obtained for the interval 0 < x < R,
\[ V_{\text{sphere}} = 2\pi \left.\left(R^2 x - \frac{x^3}{3}\right)\right|_0^R = 2\pi\left(R^3 - \frac{R^3}{3}\right). \]
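The last expression simplifies to (4/3)πR³. The stack-of-disks construction can also be carried out numerically, as in this minimal sketch; the radius chosen and the function name `sphere_volume_by_disks` are assumptions for illustration only.

```python
import math


def sphere_volume_by_disks(R, n=2000):
    """Add up disk volumes pi * (R^2 - x_k^2) * dx for disks stacked along -R <= x <= R."""
    dx = 2.0 * R / n
    total = 0.0
    for k in range(n):
        x_k = -R + (k + 0.5) * dx            # center of the k-th disk
        radius_squared = R ** 2 - x_k ** 2   # r_k^2 = f(x_k)^2 for the semicircle
        total += math.pi * radius_squared * dx
    return total


R = 2.0  # hypothetical radius
V_disks = sphere_volume_by_disks(R)
V_exact = 4.0 / 3.0 * math.pi * R ** 3
print(V_disks, V_exact)
```

Increasing the number of disks `n` brings the sum closer to the exact volume, in line with the limit ∆x → 0.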
Solution
The object has the y axis as its axis of symmetry. Hence disks are stacked up along the
y axis to approximate this volume. This means that the width of each disk is ∆y. This
accounts for the dy in the integral below. The volume of each disk is
\[ V_{\text{disk}} = \pi r^2\,\Delta y, \]
where the radius, r, is now in the direction parallel to the x axis. Thus we must express
the radius as
\[ r = x = f^{-1}(y), \]
Figure 5.10. The curve $y = f(x) = 1 - x^2$ that generates the shape of a paraboloid (left) and the
shape of the paraboloid (right).
It is helpful to note that once we have identified the thickness of the disks (∆y), we are
guided to write an integral in terms of the variable y, i.e. to reformulate the equation
describing the curve. We compute
\[ V = \pi\int_0^1 (1 - y)\,dy = \pi\left.\left(y - \frac{y^2}{2}\right)\right|_0^1 = \pi\left(1 - \frac{1}{2}\right) = \frac{\pi}{2}. \]
The above example was set up using disks. However, there are other options. In
Appendix 11.4 we show yet another method, based on cylindrical shells, to compute
the volume of a cone. In some cases, one method is preferable to another, but here either
method works equally well.
Example 3
Find the volume of the surface formed by rotating the curve
\[ y = f(x) = \sqrt{x}, \qquad 0 \le x \le 1 \]
Solution
(a) If we rotate this curve about the x axis, we obtain a bowl shape; dissecting this
surface leads to disks stacked along the x axis, with thickness ∆x → dx, with radii
in the y direction, i.e. r = y = f (x), and with x in the range 0 ≤ x ≤ 1. The
(b) When the curve is rotated about the y axis, it forms a surface with a sharp point at the
origin. The disks are stacked along the y axis, with thickness ∆y → dy, and radii in
the x direction. We must rewrite the function in the form
\[ x = g(y) = y^2. \]
We now use the interval along the y axis, i.e. 0 < y < 1. The volume is then
\[ V = \pi\int_0^1 [g(y)]^2\,dy = \pi\int_0^1 [y^2]^2\,dy = \pi\int_0^1 y^4\,dy = \pi\left.\frac{y^5}{5}\right|_0^1 = \frac{\pi}{5}. \]
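The two rotations of y = √x give different solids, and the same disk-summing routine handles both once the radius is written in terms of the stacking variable. The sketch below is illustrative: the helper name `disk_volume` is an assumption, and the value π/2 for rotation about the x axis follows from π∫₀¹ x dx.

```python
import math


def disk_volume(radius, lo, hi, n=2000):
    """Sum pi * radius(t)^2 * dt over disks stacked along the axis variable t."""
    dt = (hi - lo) / n
    return sum(math.pi * radius(lo + (k + 0.5) * dt) ** 2 for k in range(n)) * dt


# (a) Rotation about the x axis: disks stacked along x, radius r = f(x) = sqrt(x).
V_about_x = disk_volume(math.sqrt, 0.0, 1.0)

# (b) Rotation about the y axis: disks stacked along y, radius r = g(y) = y^2.
V_about_y = disk_volume(lambda y: y ** 2, 0.0, 1.0)

print(V_about_x, math.pi / 2)
print(V_about_y, math.pi / 5)
```

The only change between the two cases is which function plays the role of the radius, which is the point stressed in the text.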
Things are more complicated for “curves” that are not straight lines, but in many cases, we
are interested in calculating the length of such curves. In this section we describe how this
can be done using the definite integral “technology”.
The idea of dissection also applies to the problem of determining the length of a
curve. In Figure 5.11, we see the general idea of subdividing a curve into many small
“arcs”. Before we look in detail at this construction, we consider a simple example, shown
in Figure 5.12. In the triangle shown, by the Pythagorean theorem we have the length of
the sloped side related as follows to the side lengths ∆x, ∆y:
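\[ \Delta\ell^2 = \Delta x^2 + \Delta y^2, \]
\[ \Delta\ell = \sqrt{\Delta x^2 + \Delta y^2} = \sqrt{1 + \frac{\Delta y^2}{\Delta x^2}}\;\Delta x = \sqrt{1 + \left(\frac{\Delta y}{\Delta x}\right)^2}\;\Delta x. \]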
We now consider a curve given by some function
as shown in Figure 5.11(a). We will approximate this curve by a set of line segments, as
shown in Figure 5.11(b). To obtain these, we have selected some step size ∆x along the
x axis, and placed points on the curve at each of these x values. We connect the points
with straight line segments, and determine the lengths of those segments. (The total length
23 The reader should recall that this formula is a simple application of Pythagorean theorem.
5.6. Length of a curve: Arc length 97
Figure 5.11. Top: Given the graph of a function, y = f (x) (at left), we draw
secant lines connecting points on its graph at values of x that are multiples of ∆x (right).
Bottom: a small part of this graph is shown, and then enlarged, to illustrate the relationship
between the arc length and the length of the secant line segment.
Figure 5.12. The basic idea of arclength is to add up lengths ∆l of small line
segments that approximate the curve.
of the segments is only an approximation of the length of the curve, but as the subdivision
gets finer and finer, we will arrive at the true total length of the curve.)
We show one such segment enlarged in the circular inset in Figure 5.11. Its slope,
shown at right is given by ∆y/∆x. According to our remarks, above, the length of this
We recognize the ratio inside the square root as the derivative, dy/dx. If our curve is
given by a function y = f(x) then we can rewrite this as
\[ d\ell = \sqrt{1 + (f'(x))^2}\;dx. \]
Thus, the length of the entire curve is obtained from summing (i.e. adding up) these small
pieces, i.e.
\[ L = \int_a^b \sqrt{1 + (f'(x))^2}\;dx. \tag{5.4} \]
Example 1
Find the length of a line whose slope is −2 given that the line extends from x = 1 to x = 5.
Solution
We could find the equation of the line and use the distance formula. But for the purpose of
this example, we apply the method of Equation (5.4): we are given that the slope f'(x) is
−2. The integral in question is
\[ L = \int_1^5 \sqrt{1 + (f'(x))^2}\;dx = \int_1^5 \sqrt{1 + (-2)^2}\;dx = \sqrt{5}\int_1^5 dx. \]
We get
\[ L = \sqrt{5}\int_1^5 dx = \sqrt{5}\,x\Big|_1^5 = \sqrt{5}\,[5 - 1] = 4\sqrt{5}. \]
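Both routes mentioned in this solution can be compared numerically. The sketch below approximates the integral in Equation (5.4) by a sum and also applies the distance formula to the endpoints of the line; the function name `arc_length` and the number of steps are assumptions for illustration.

```python
import math


def arc_length(fprime, a, b, n=1000):
    """Approximate the integral of sqrt(1 + f'(x)^2) over [a, b] by a midpoint sum."""
    dx = (b - a) / n
    return sum(math.sqrt(1.0 + fprime(a + (k + 0.5) * dx) ** 2) for k in range(n)) * dx


# Line of slope -2 from x = 1 to x = 5.
length_integral = arc_length(lambda x: -2.0, 1.0, 5.0)

# Distance formula: the run is 4 and the drop is 2 * 4 = 8.
length_distance = math.hypot(4.0, 8.0)

print(length_integral, length_distance, 4 * math.sqrt(5))
```

All three numbers agree, since the integrand is the constant √5 for a straight line.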
Example 2
Find an integral that represents the length of the curve that forms the graph of the function
Solution
We find that
\[ \frac{dy}{dx} = f'(x) = 3x^2. \]
At this point, we will not attempt to find the actual length, as we must first develop techniques
for finding the antiderivative of functions such as $\sqrt{1 + 9x^4}$.
\[ y = f(x) = 1 - x^2 \quad\text{for } 0 \le x \le 1 \]
and the accumulated length along the curve from left to right, L, which is just a sum of
such values. Table 5.6 shows steps in the calculation of the ratio ∆y/∆x, the value
of ∆ℓ, the cumulative sum, and, finally, the total length L. The final value of L = 1.4782
represents the total length of the curve over the entire interval 0 < x < 1.
In Figure 5.13(a) we show the actual curve $y = 1 - x^2$, with points placed on it
at each multiple of ∆x. In Figure 5.13(b), we show (in blue) how the lengths of the little
straight-line segments connecting these points change across the interval. (The segments
on the left along the original curve are nearly flat, so their length is very close to ∆x. The
segments on the right part of the curve are much more sloped, and their lengths are thus
bigger.) We also show (in red) how the total accumulated length L depends on the position
x across the interval. This function represents the total arc-length of the curve $y = 1 - x^2$,
from x = 0 up to a given x value. At x = 1 this function returns the value y = L, as it has
added up the full length of the curve for 0 ≤ x ≤ 1.
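The spreadsheet-style calculation described above can be sketched in a few lines. The step size is not stated in this excerpt, so the choice ∆x = 0.1 (ten segments) is an assumption; with that choice the cumulative sum comes out close to the value L = 1.4782 quoted in the text. The function name `arc_length_table` is hypothetical.

```python
import math


def f(x):
    """The curve y = 1 - x^2."""
    return 1.0 - x * x


def arc_length_table(f, a, b, n):
    """Return rows (x, f(x), dy/dx ratio, segment length, cumulative length)."""
    dx = (b - a) / n
    rows = []
    cumulative = 0.0
    for k in range(n):
        x0 = a + k * dx
        x1 = a + (k + 1) * dx
        slope = (f(x1) - f(x0)) / dx                  # the ratio (delta y)/(delta x)
        segment = math.sqrt(1.0 + slope ** 2) * dx    # length of the secant segment
        cumulative += segment
        rows.append((x1, f(x1), slope, segment, cumulative))
    return rows


rows = arc_length_table(f, 0.0, 1.0, 10)  # assumed step size: delta x = 0.1
for x, y, slope, segment, cumulative in rows:
    print(f"{x:4.1f} {y:7.4f} {slope:7.3f} {segment:7.4f} {cumulative:7.4f}")

L_total = rows[-1][4]
```

The segment lengths grow from left to right as the curve steepens, which matches the description of Figure 5.13(b).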
[Figure 5.13: (a) the curve $y = f(x) = 1 - x^2$ on 0 ≤ x ≤ 1; (b) arc length, showing the length increments and the cumulative length L together with y = f(x).]
Table columns: x, y = f(x), ∆y/∆x, ∆ℓ, L = Σ∆ℓ.
adult. As is the case in humans, the teeth on an alligator do not form or sprout simultane-
ously. In the development of the baby alligator, there is a sequence of initiation of teeth,
one after the other, at well-defined positions along the jaw.
Paul Kulesa, a former student of James D. Murray, set out to understand the pattern of
development of these teeth, based on data in the literature about what happens at distinct
stages of embryonic growth. Of interest in his research were several questions, including
what determines the positions and timing of initiation of individual teeth, and what mecha-
nisms lead to this pattern of initiation. One theory proposed by this group was that chemical
signals that diffuse along the jaw at an early stage of development give rise to instructions
that are interpreted by jaw cells: where the signal is at a high level, a tooth will start to
initiate.
While we will not address the details of the mechanism of development here, we
will find a simple application of the ideas of arclength in the developmental sequence of
teething. Shown in Figure 5.14 is a smiling baby alligator (no doubt thinking of some
future tasty meal). A close up of its smile (at an earlier stage of development) reveals the
shape of the jaw, together with the sites at which teeth are becoming evident. (One of these
sites, called primordia, is shown enlarged in an inset in this figure.)
Paul Kulesa found that the shape of the alligator’s jaw can be described remarkably
well by a parabola. A proper choice of coordinate system, and some experimentation leads
to the equation of the best fit parabola
\[ y = f(x) = -ax^2 + b \]
where a = 0.256, and b = 7.28 (in units not specified).
We show this curve in Figure 5.15(a). Also shown in this curve is a set of points at
which teeth are found, labelled by order of appearance. In Figure 5.15(b) we see the same
curve, but we have here superimposed the function L(x) given by the arc length along the
curve from the front of the jaw (i.e. the top of the parabola), that is,
\[ L(x) = \int_0^x \sqrt{1 + [f'(s)]^2}\;ds. \]
This curve measures distance along the jaw, from front to back. The distances of the teeth
from one another, or along the curve of the jaw can be determined using this curve if we
know the x coordinates of their positions.
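For the parabola y = −ax² + b the derivative is f'(s) = −2as, so L(x) is the integral of √(1 + 4a²s²). The sketch below evaluates it numerically with the fitted values a = 0.256 and b = 7.28 from the text. The function names and the sample x values are assumptions for illustration; they are not Dr. Kulesa's tooth coordinates.

```python
import math

A = 0.256  # fitted coefficient a from the text
B = 7.28   # fitted coefficient b from the text


def jaw(x):
    """Height of the jaw curve y = -a x^2 + b."""
    return -A * x * x + B


def jaw_arc_length(x, n=2000):
    """Distance along the jaw from the front (x = 0) to position x, by a midpoint sum."""
    x = abs(x)  # the parabola is symmetric about the front of the jaw
    ds = x / n
    return sum(math.sqrt(1.0 + (2.0 * A * (k + 0.5) * ds) ** 2) for k in range(n)) * ds


# Hypothetical x coordinates, used only to show how distances would be tabulated.
for x in (1.0, 2.0, 3.0, 4.0, 5.0):
    print(f"x = {x:3.1f}  y = {jaw(x):6.3f}  L(x) = {jaw_arc_length(x):6.3f}")
```

Given the x coordinates of two teeth, the distance between them along the jaw is the difference of their L values (for teeth on the same side of the jaw).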
The table below gives the original data, courtesy of Dr. Kulesa, showing the order
of the teeth, their (x, y) coordinates, and the value of L(x) obtained from the arclength
formula. We see from this table that the teeth do not appear randomly, nor do they fill in
the jaw in one sweep. Rather, they appear in several stages.
In Figure 5.15(c), we show the pattern of appearance: Plotting the distance along the
jaw of successive teeth reveals that the teeth appear in waves of nearly equally-spaced sites.
(By equally spaced, we refer to distance along the parabolic jaw.) The first wave (teeth 1, 2,
3) are followed by a second wave (4, 5, 6, 7), and so on. Each wave forms a linear pattern
of distance from the front, and each successive wave fills in the gaps in a similar, equally
spaced pattern.
The true situation is a bit more complicated: the jaw grows as the teeth appear, as
shown in Figure 5.15(d). This has not been taken into account in our simple treatment here, where
we illustrate only the essential idea of arc length application.
Table 5.2. Data for the appearance of teeth, in the order in which they appear
as the alligator develops. We can use arc-length computations to determine the distances
between successive teeth.
Figure 5.15. (a) The parabolic shape of the jaw, showing positions of teeth and
numerical order of emergence. (b) Arc length along the jaw from front to back. (c) Distance
of successive teeth along the jaw. (d) Growth of the jaw.
5.6.2 References
1. P.M. Kulesa and J.D. Murray (1995). Modelling the Wave-like Initiation of Teeth
Primordia in the Alligator. FORMA. Cover Article. Vol. 10, No. 3, 259-280.
2. J.D. Murray and P.M. Kulesa (1996). On A Dynamic Reaction-Diffusion Mechanism
for the Spatial Patterning of Teeth Primordia in the Alligator. Journal of Chemical
Physics. J. Chem. Soc., Faraday Trans., 92 (16), 2927-2932.
3. P.M. Kulesa, G.C. Cruywagen, S.R. Lubkin, M.W.J. Ferguson and J.D. Murray (1996).
Modelling the Spatial Patterning of Teeth Primordia in the Alligator. Acta Biotheo-
retica, 44, 153-164.
5.7 Summary
Here are the main points of the chapter:
1. We introduced the idea of a spatially distributed mass density ρ(x) in Section 5.2.2.
Here the definite integral represents
\[ \int_a^b \rho(x)\,dx = \text{Total mass in the interval } a \le x \le b. \]
4. The mean is an average x coordinate, whereas the median is the x coordinate that
splits the distribution into two equal masses (Geometrically, the median subdivides
the graph of the distribution into two regions of equal areas). The mean and median
are the same only in symmetric distributions. They differ for any distribution that is
asymmetric. The mean (but not the median) is influenced more strongly by distant
portions of the distribution.
5. In the later parts of this chapter, we showed how to compute volumes of various
objects that have radial symmetry (“solids of revolution”). We showed that if the
surface is generated by rotating the graph of a function y = f (x) about the x axis
(for a ≤ x ≤ b), then its volume can be described by an integral of the form
\[ V = \int_a^b \pi [f(x)]^2\,dx. \]
We used this idea to show that the volume of a sphere of radius R is $V_{\text{sphere}} = (4/3)\pi R^3$.
In Chapters 7 and 8, we find applications of the ideas of density and center of
mass to the context of a probability distribution and its mean.
Chapter 6
Techniques of Integration
In this chapter, we expand our repertoire for antiderivatives beyond the “elementary” func-
tions discussed so far. A review of the table of elementary antiderivatives (found in Chap-
ter 3) will be useful. Here we will discuss a number of methods for finding antiderivatives.
We refer to these collected “tricks” as methods of integration. As will be shown, in some
cases, these methods are systematic (i.e. with clear steps), whereas in other cases, guesswork
and trial and error are an important part of the process.
A primary method of integration to be described is substitution. A close relationship
exists between the chain rule of differential calculus and the substitution method. A second
very important method is integration by parts. Aside from its usefulness in integration
per se, this method has numerous applications in physics, mathematics, and other sciences.
Many other techniques of integration used to form a core of methods taught in courses
in integral calculus. Many of these are quite technical. Nowadays, with sophisticated
mathematical software packages (including Maple and Mathematica), integration can be
carried out automatically via computation called “symbolic manipulation”, reducing our
need to dwell on these technical methods.
\[ y = mx + b. \]
\[ m = \frac{\text{change in } y}{\text{change in } x} = \frac{\Delta y}{\Delta x}. \]
\[ \Delta y = m\,\Delta x. \]
Figure 6.1. The slope of the line shown here is m = ∆y/∆x. This means that the
small quantities ∆y and ∆x are related by ∆y = m∆x.
If we take a very small step along this line in the x direction, call it dx (to remind us of an
“infinitesimally small” quantity), then the resulting change in the y direction, (call it dy) is
related by
dy = m dx.
Now suppose that we have a curve defined by some arbitrary function, y = f (x)
which need not be a straight line. For a given point (x, y) on this curve, a step ∆x in the x
direction is associated with a step ∆y in the y direction. The relationship between the step
Figure 6.2. On this figure, the graph of some function is used to illustrate the
connection between differentials dy and dx. Note that these are related via the slope of a
tangent line, mt to the curve, in contrast with the relationship of ∆y and ∆x which stems
from the slope of the secant line ms on the same curve.
sizes is:
\[ \Delta y = m_s\,\Delta x, \]
where now $m_s$ is the slope of a secant line (shown connecting two points on the curve in
Figure 6.2). If the sizes of the steps are small (dx and dy), then this relationship is well
approximated by the slope of the tangent line, $m_t$, as shown in Figure 6.2, i.e.
\[ dy = m_t\,dx = f'(x)\,dx. \]
6.1. Differential notation 109
The quantities dx and dy are called differentials. In general, they link a small step on the
x axis with the resulting small change in height along the tangent line to the curve (shown
in Figure 6.2). We might observe that the ratio of the differentials, i.e.
\[ \frac{dy}{dx} = f'(x), \]
appears to link our result to the definition of the derivative. We remember, though, that the
derivative is actually defined as a limit:
\[ f'(x) = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}. \]
When the step size ∆x is quite small, it is approximately true that
\[ \frac{\Delta y}{\Delta x} \approx \frac{dy}{dx}. \]
This notation will be useful in substitution integrals.
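The difference between the step ∆y along the curve and the step dy along the tangent line can be seen numerically. In the sketch below the function y = x³ and the point x = 2 are chosen purely for illustration; they are assumptions, not part of the discussion above.

```python
def f(x):
    """Illustrative function y = x^3."""
    return x ** 3


def fprime(x):
    """Its derivative, f'(x) = 3 x^2."""
    return 3.0 * x ** 2


x = 2.0
rows = []
for dx in (0.1, 0.01, 0.001):
    delta_y = f(x + dx) - f(x)   # change in height along the curve (secant step)
    dy = fprime(x) * dx          # change in height along the tangent line
    rows.append((dx, delta_y, dy, delta_y - dy))

for dx, delta_y, dy, gap in rows:
    print(f"dx = {dx:<6} delta_y = {delta_y:.6f}  dy = {dy:.6f}  gap = {gap:.2e}")
```

Each time the step shrinks by a factor of 10, the gap between ∆y and dy shrinks by roughly a factor of 100, which is why dy = f'(x) dx is a good stand-in for ∆y when the step is small.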
Examples
We give some examples of functions, their derivatives, and the differential notation that
goes with them.
\[ dy = 3x^2\,dx. \]
2. The function y = f(x) = tan(x) has derivative $f'(x) = \sec^2(x)$. Therefore
\[ dy = \sec^2(x)\,dx. \]
Proof
Since F (x) and G(x) have the same derivative, we have
\[ \frac{d}{dx}F(x) = \frac{d}{dx}G(x), \]
\[ \frac{d}{dx}F(x) - \frac{d}{dx}G(x) = 0, \]
\[ \frac{d}{dx}\left(F(x) - G(x)\right) = 0. \]
This means that the function F (x) − G(x) should be a constant, since its derivative is zero.
Thus
F (x) − G(x) = C,
so
F (x) = G(x) + C,
as required. F (x) and G(x) are called antiderivatives of f (x), and this confirms, once
more, that any two antiderivatives differ at most by a constant.
In another terminology, which means the same thing, we also say that F (x) (or G(x))
is the integral of the function f (x), and we refer to f (x) as the integrand. We write this as
follows:
\[ F(x) = \int f(x)\,dx. \]
This notation is sometimes called “an indefinite integral” because it does not denote a
specific numerical value, nor is an interval specified for the integration range. An indefinite
6.3. Simple substitution 111
integral is a function with an arbitrary constant. (Contrast this with the definite integral
studied in our last chapters: in the case of the definite integral, we specified an interval, and
interpreted the result, a number, in terms of areas associated with curves.) We also write
\[ \int f(x)\,dx = F(x) + C, \]
if we want to indicate the form of all possible functions that are antiderivatives of f (x). C
is referred to as a constant of integration.
How do we handle this? We reason as follows. The quantity df/dx (which is, itself, a
function) is the derivative of the function f(x). That means that f(x) is an antiderivative
of df/dx. Then, according to the Fundamental Theorem of Calculus,
\[ \int \frac{df}{dx}\,dx = f(x) + C. \]
We can write this same result using the differential of f, as follows:
\[ \int df = f(x) + C. \]
The following examples illustrate the idea with several elementary functions.
Examples
1. $\int d(\cos x) = \cos x + C.$
2. $\int dv = v + C.$
3. $\int d(x^3) = x^3 + C.$
u = f (x),
appears in the integrand. In that case, the substitution will lead to eliminating x entirely in
favour of the new quantity u, and simplification may occur.
\[ f(x) = (x + 1)^{10}. \]
\[ u = (x + 1), \]
then
\[ du = \frac{d(x + 1)}{dx}\,dx = \left(\frac{dx}{dx} + \frac{d(1)}{dx}\right)dx = (1 + 0)\,dx = dx. \]
Now replacing (x + 1) by u and dx by the equivalent du we get:
\[ F(x) = \int u^{10}\,du. \]
\[ F(x) = \frac{u^{11}}{11} = \frac{(x + 1)^{11}}{11} + C. \]
In the last step, we converted the result back to the original variable, and included the
arbitrary integration constant. A very important point to remember is that we can always
check our results by differentiation:
Check
Differentiate F (x) to obtain
\[ \frac{dF}{dx} = \frac{1}{11}\left(11(x + 1)^{10}\right) = (x + 1)^{10}. \]
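The same check can be done numerically when differentiating by hand is inconvenient: estimate the slope of the proposed antiderivative and compare it with the integrand. This is a minimal sketch; the central-difference helper, its step size, and the sample points are assumptions chosen for illustration.

```python
def F(x):
    """Proposed antiderivative (taking C = 0): F(x) = (x + 1)^11 / 11."""
    return (x + 1.0) ** 11 / 11.0


def f(x):
    """Integrand f(x) = (x + 1)^10."""
    return (x + 1.0) ** 10


def central_difference(g, x, h=1e-5):
    """Estimate g'(x) from the slope of a short secant line centered at x."""
    return (g(x + h) - g(x - h)) / (2.0 * h)


checks = [(x, central_difference(F, x), f(x)) for x in (-0.5, 0.0, 0.5, 1.0)]
for x, slope, value in checks:
    print(f"x = {x:5.2f}  F'(x) ~ {slope:.6f}  f(x) = {value:.6f}")
```

Agreement at a handful of points does not prove the antiderivative is correct, but a mismatch is a quick way to catch an algebra slip.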
This integration can be done by making the substitution u = x + 1 for which du = dx. We
can handle the endpoints in one of two ways:
In the last steps we have plugged in the new endpoints (appropriate to u).
Here we plugged in the original endpoints (as appropriate to the variable x).
Solutions
1. $I = \int \frac{2}{x + 2}\,dx$. Let u = x + 2. Then du = dx and we get
\[ I = \int \frac{2}{u}\,du = 2\int \frac{1}{u}\,du = 2\ln|u| = 2\ln|x + 2| + C. \]
2. $I = \int_0^1 x^2 e^{x^3}\,dx$. Let $u = x^3$. Then $du = 3x^2\,dx$. Here we use method 2 for
handling endpoints.
\[ \int e^u\,\frac{du}{3} = \frac{1}{3}e^u = \frac{1}{3}e^{x^3} + C. \]
Then
\[ I = \int_0^1 x^2 e^{x^3}\,dx = \left.\frac{1}{3}e^{x^3}\right|_0^1 = \frac{1}{3}(e - 1). \]
(We converted the antiderivative to the original variable, x, before plugging in the
original endpoints.)
3. $I = \int \frac{1}{(x + 1)^2 + 1}\,dx$. Let u = x + 1, then du = dx so we have
\[ I = \int \frac{1}{u^2 + 1}\,du = \arctan(u) = \arctan(x + 1) + C. \]
4. $I = \int (x + 3)\sqrt{x^2 + 6x + 10}\,dx$. Let $u = x^2 + 6x + 10$. Then du = (2x + 6) dx =
2(x + 3) dx. With this substitution we get
\[ I = \int \sqrt{u}\,\frac{du}{2} = \frac{1}{2}\int u^{1/2}\,du = \frac{1}{2}\,\frac{u^{3/2}}{3/2} = \frac{1}{3}u^{3/2} = \frac{1}{3}(x^2 + 6x + 10)^{3/2} + C. \]
5. $I = \int_0^{\pi} \cos^3(x)\sin(x)\,dx$. Let u = cos(x). Then du = −sin(x) dx. Here we use
method 1 for handling endpoints. For x = 0, u = cos 0 = 1 and for x = π, u =
cos π = −1, so changing the integral and endpoints to u leads to
\[ I = \int_1^{-1} u^3\,(-du) = \left.-\frac{u^4}{4}\right|_1^{-1} = -\frac{1}{4}\left((-1)^4 - 1^4\right) = 0. \]
Here we plugged in the new endpoints that are relevant to the variable u.
6. $I = \int \frac{1}{ax + b}\,dx$. Let u = ax + b. Then du = a dx, so dx = du/a. Substitute the
above equations into the first equation and simplify to get
\[ I = \int \frac{1}{u}\,\frac{du}{a} = \frac{1}{a}\int \frac{1}{u}\,du = \frac{1}{a}\ln|u| + C. \]
Substitute u = ax + b back to arrive at the solution
\[ I = \int \frac{1}{ax + b}\,dx = \frac{1}{a}\ln|ax + b| + C. \tag{6.1} \]
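7. $I = \int \frac{1}{b + ax^2}\,dx = \frac{1}{b}\int \frac{1}{1 + (a/b)x^2}\,dx$. This can be brought to the form of
an arctan type integral as follows: Let $u^2 = (a/b)x^2$, so $u = \sqrt{a/b}\,x$ and $du = \sqrt{a/b}\,dx$.
Now substituting these, we get
\[ I = \frac{1}{b}\int \frac{1}{1 + u^2}\,\frac{du}{\sqrt{a/b}} = \frac{1}{b}\sqrt{b/a}\int \frac{1}{1 + u^2}\,du, \]
\[ I = \frac{1}{\sqrt{ba}}\arctan(u) = \frac{1}{\sqrt{ba}}\arctan\left(\sqrt{a/b}\,x\right) + C. \]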
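Results obtained by substitution can be sanity-checked against a direct numerical sum. The sketch below does this for the definite integrals in solutions 2 and 5, and for solution 7 evaluated between 0 and 1. The helper `midpoint_integral`, the limits used for solution 7, and the values a = 2, b = 3 are assumptions made for illustration.

```python
import math


def midpoint_integral(f, lo, hi, n=20000):
    """Approximate the definite integral of f on [lo, hi] by a midpoint Riemann sum."""
    dx = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * dx) for k in range(n)) * dx


# Solution 2: integral of x^2 e^(x^3) on [0, 1] should equal (e - 1)/3.
I2_numeric = midpoint_integral(lambda x: x ** 2 * math.exp(x ** 3), 0.0, 1.0)
I2_formula = (math.e - 1.0) / 3.0

# Solution 5: integral of cos^3(x) sin(x) on [0, pi] should equal 0.
I5_numeric = midpoint_integral(lambda x: math.cos(x) ** 3 * math.sin(x), 0.0, math.pi)

# Solution 7 with hypothetical a = 2, b = 3, evaluated between x = 0 and x = 1.
a, b = 2.0, 3.0
I7_numeric = midpoint_integral(lambda x: 1.0 / (b + a * x * x), 0.0, 1.0)
I7_formula = math.atan(math.sqrt(a / b) * 1.0) / math.sqrt(b * a)  # arctan(0) = 0 at the lower limit

print(I2_numeric, I2_formula)
print(I5_numeric)
print(I7_numeric, I7_formula)
```

The constant of integration drops out of every definite integral, so it plays no role in these comparisons.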
\[ u = (1 + x^2). \]
Then
\[ du = 2x\,dx, \]
and dx = du/(2x). Attempting to convert the integral to the form containing u would lead
to
\[ I = \int \sqrt{u}\,\frac{du}{2x}. \]
We have not succeeded in eliminating x entirely, so the expression obtained contains a
mixture of two variables. We can proceed no further. This substitution did not simplify the
integral and we must try some other technique.
Solution
We observe that the denominator of the integrand is a perfect square, i.e. that $x^2 - 6x + 9 = (x - 3)^2$.
Replacing this in the integral, we obtain
\[ I = \int \frac{1}{x^2 - 6x + 9}\,dx = \int \frac{1}{(x - 3)^2}\,dx. \]
Now making the substitution u = (x − 3), and du = dx leads to a power type integral
\[ I = \int \frac{1}{u^2}\,du = \int u^{-2}\,du = -u^{-1} = -\frac{1}{(x - 3)} + C. \]
Solution
Here we use “completing the square” to express the denominator in the form
$x^2 - 6x + 10 = (x - 3)^2 + 1$. Then the integral takes the form
$$I = \int \frac{1}{1 + (x - 3)^2}\,dx.$$
Remark: in cases where completing the square gives rise to a constant other than 1 in the
denominator, we use the technique illustrated in Example 6.3.3 Eqn. (6.1) to simplify the
problem.
We will show shortly that the integrand can be simplified to the sum of two fractions, i.e.
that
$$I = \int \frac{1}{(1 - x)(1 + x)}\,dx = \int \left(\frac{A}{(1 - x)} + \frac{B}{(1 + x)}\right) dx,$$
where A, B are constants. The algebraic technique for finding these constants, and hence of
forming the simpler expressions, called Partial fractions, will be discussed in an upcoming
section. Once these constants are found, each of the resulting integrals can be handled by
substitution.
In the special case that A = B = x, the last two identities above lead to:
From these, we can generate a variety of other identities as special cases. We list the most
useful below. The first two are obtained by combining the double-angle formula for cosines
with the identity sin2 (x) + cos2 (x) = 1.
Solution
This integral can be computed by a simple substitution, similar to Example 5 of Section 6.3.
We let $u = \cos(x)$ and $du = -\sin(x)\,dx$ to get the integral into the form
$$I = -\int u^2\,du = \frac{-u^3}{3} = \frac{-\cos^3(x)}{3} + C.$$
We need none of the trigonometric identities in this case. Simple substitution is always the
easiest method to use. It should be the first method attempted in each case.
Solution
This is an example in which the “Useful trigonometric identity” 1 leads to a simpler integral.
We write
$$I = \int \cos^2(x)\,dx = \int \frac{1 + \cos(2x)}{2}\,dx = \frac{1}{2}\int (1 + \cos(2x))\,dx.$$
Then clearly,
$$I = \frac{1}{2}\left(x + \frac{\sin(2x)}{2}\right) + C.$$
Solution
We can rewrite this integral in the form
$$I = \int \sin^2(x)\sin(x)\,dx.$$
Now using the trigonometric identity $\sin^2(x) + \cos^2(x) = 1$ leads to
$$I = \int (1 - \cos^2(x))\sin(x)\,dx.$$
The first part is elementary, and the second was shown in a previous example. Therefore
we end up with
$$I = -\cos(x) + \frac{\cos^3(x)}{3} + C.$$
Note that it is customary to combine all constants obtained in the calculation into a single
constant, C at the end.
Aside from integrals that, themselves, contain trigonometric functions, there are other
cases in which use of trigonometric identities, though at first seemingly unrelated, is crucial.
Many expressions involving the form $\sqrt{1 \pm x^2}$ or the related form $\sqrt{a \pm bx^2}$ will be
simplified eventually by conversion to trigonometric expressions!
Solution
The simple substitution $u = 1 - x^2$ will not work (as shown by a similar example in
Section 6.3). However, converting to trigonometric expressions will do the trick. Let
$x = \sin(u)$, so that $dx = \cos(u)\,du$.
(In Figure 6.3, we show this relationship on a triangle. This diagram is useful in reversing
the substitutions after the integration step.) Then $1 - x^2 = 1 - \sin^2(u) = \cos^2(u)$, so the
substitutions lead to
$$I = \int \sqrt{\cos^2(u)}\,\cos(u)\,du = \int \cos^2(u)\,du.$$
[Figure 6.3: right triangle with hypotenuse 1, side $x$ opposite the angle $u$, and adjacent side $\sqrt{1 - x^2}$.]
Figure 6.3. This triangle helps to convert the (trigonometric) functions of u to the
original variable x in Example 6.5.4.
From a previous example, we already know how to handle this integral. We find that
$$I = \frac{1}{2}\left(u + \frac{\sin(2u)}{2}\right) = \frac{1}{2}\left(u + \sin(u)\cos(u)\right) + C.$$
(In the last step, we have used the double angle trigonometric identity. We will shortly see
why this simplification is relevant.)
We now desire to convert the result back to a function of the original variable, $x$.
We note that $x = \sin(u)$ implies $u = \arcsin(x)$. To convert the term $\cos(u)$ back to an
expression depending on $x$ we can use the relationship $1 - \sin^2(u) = \cos^2(u)$, to deduce
that
$$\cos(u) = \sqrt{1 - \sin^2(u)} = \sqrt{1 - x^2}.$$
Remark: In computing a definite integral of the same type, we can circumvent the
need for the conversion back to an expression involving x by using the appropriate method
for handling endpoints. For example, the integral
$$I = \int_0^1 \sqrt{1 - x^2}\,dx$$
can be transformed to
$$I = \int_0^{\pi/2} \sqrt{\cos^2(u)}\,\cos(u)\,du,$$
[Figure: the curve $y = \sqrt{9 - x^2}$.]
Solution
The semicircle is one quarter of a circle of radius 3. Its edge is described by the equation
$$y = f(x) = \sqrt{9 - x^2}.$$
We will assume that the density per unit area is uniform. However, the mass per unit
length along the x axis is not uniform, due to the shape of the object. We apply the idea of
integration: If we cut the shape at increments of ∆x along the x axis, we get a collection
of pieces whose mass is each proportional to f (x)∆x. Summing up such contributions and
letting the widths ∆x → dx get small, we arrive at the integral for mass. The total mass of
the shape is thus
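$$M = \int_0^3 f(x)\,dx = \int_0^3 \sqrt{9 - x^2}\,dx.$$
This integral is the area of one quarter of a circle of radius 3, so
$$M = \frac{1}{4}\pi(3)^2 = \frac{9}{4}\pi.$$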
(We could also see this by performing a trigonometric substitution integral.) The second
integral can be done by simple substitution. Consider
$$I = \int_0^3 x f(x)\,dx = \int_0^3 x\sqrt{9 - x^2}\,dx.$$
Let $u = 9 - x^2$. Then $du = -2x\,dx$. The endpoints are converted as follows: $x = 0 \Rightarrow
u = 9 - 0^2 = 9$ and $x = 3 \Rightarrow u = 9 - 3^2 = 0$ so that we get the integral
$$I = \int_9^0 \sqrt{u}\,\frac{1}{-2}\,du.$$
We can reverse the endpoints if we switch the sign, and this leads to
$$I = \frac{1}{2}\int_0^9 u^{1/2}\,du = \frac{1}{2}\left.\left(\frac{u^{3/2}}{3/2}\right)\right|_0^9.$$
Since $9^{3/2} = (9^{1/2})^3 = 3^3$, we get $I = (3^3)/3 = 3^2 = 9$. Thus the $x$ coordinate of the
center of mass is
$$\bar{x} = \frac{I}{M} = \frac{9}{(9/4)\pi} = \frac{4}{\pi}.$$
We can similarly find the y coordinate of the center of mass: To do so, we would express
the boundary of the shape in the form x = f (y) and integrate to find
$$\bar{y} = \frac{1}{M}\int_0^3 y f(y)\,dy.$$
For the semicircle, $y^2 + x^2 = 9$, so $x = f(y) = \sqrt{9 - y^2}$. Thus
$$\bar{y} = \frac{1}{M}\int_0^3 y\sqrt{9 - y^2}\,dy.$$
This integral looks identical to the one we wrote down for $\bar{x}$. Thus, based on this similarity
(or based on the symmetry of the problem) we will find that
$$\bar{y} = \frac{4}{\pi}.$$
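The center of mass values above can be checked numerically with rectangular strips, in the spirit of the Riemann sums used earlier in the course. The following Python sketch is only an illustration: the helper `midpoint_rule` and the number of strips are assumptions made here, not part of the original example.

```python
import math

def midpoint_rule(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with n midpoint strips."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def f(x):
    """Edge of the quarter circle of radius 3."""
    return math.sqrt(9 - x**2)

M = midpoint_rule(f, 0, 3)                     # total mass (area), expected 9*pi/4
I = midpoint_rule(lambda x: x * f(x), 0, 3)    # first moment, expected 9
x_bar = I / M                                  # expected 4/pi

print(M, I, x_bar)
```

Midpoints are used so that the integrand is never evaluated exactly at $x = 3$, where the square root has a vertical tangent.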
Solution
We aim for simplification by the identity 1 + tan2 (u) = sec2 (u), so we set
This integral will require further work, and will be partly calculated by Integration by Parts
in Appendix 11.5. In this example, the triangle shown in Figure 6.5 shows the relationship
between x and u and will help to convert other trigonometric functions of u to functions of
x.
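[Figure 6.5: right triangle with hypotenuse $\sqrt{1 + x^2}$, side $x$ opposite the angle $u$, and adjacent side 1.]
Figure 6.5. As in Figure 6.3 but for Example 6.5.6.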
The idea is to break this up into simpler rational expressions by finding constants A, B
such that
$$\frac{1}{(ax + b)(cx + d)} = \frac{A}{(ax + b)} + \frac{B}{(cx + d)}.$$
Each part can then be handled by a simple substitution, as shown in Example 6.3.3, Eqn. (6.1).
We give several examples below.
$$\frac{1}{x^2 - 1} = \frac{A(x - 1) + B(x + 1)}{x^2 - 1}.$$
This means that
1 = A(x − 1) + B(x + 1)
must be true for all x values. We now ask what values of A and B make this equation hold
for any x. Choosing two “easy” values, namely x = 1 and x = −1 leads to isolating one
or the other unknown constants, A, B, with the results:
1 = −2A, 1 = 2B.
Thus $B = 1/2$, $A = -1/2$, so the integral can be written in the simpler form
$$I = \frac{1}{2}\left(\int \frac{-1}{(x + 1)}\,dx + \int \frac{1}{(x - 1)}\,dx\right).$$
(A common factor of (1/2) has been taken out.) Now a simple substitution will work for
each component. (Let $u = x + 1$ for the first, and $u = x - 1$ for the second integral.) The
result is
$$I = \int \frac{1}{x^2 - 1}\,dx = \frac{1}{2}\left(-\ln|x + 1| + \ln|x - 1|\right) + C.$$
$$\frac{1}{x(1 - x)} = \frac{A}{x} + \frac{B}{(1 - x)}.$$
Then
1 = A(1 − x) + Bx.
This must hold for all x values. In particular, convenient values of x for determining the
constants are x = 0, 1. We find that
A = 1, B = 1.
Thus
$$I = \int \frac{1}{x(1 - x)}\,dx = \int \frac{1}{x}\,dx + \int \frac{1}{1 - x}\,dx.$$
Simple substitution now gives
I = ln |x| − ln |1 − x| + C.
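The step of plugging in two “easy” values of $x$ amounts to solving two linear equations for $A$ and $B$. A minimal Python sketch of that idea follows; the function name `solve_two_constants` and its arguments are hypothetical, introduced only for illustration, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

def solve_two_constants(g_A, g_B, x1, x2):
    """Solve 1 = A*g_A(x) + B*g_B(x) at x = x1 and x = x2 by Cramer's rule."""
    a11, a12 = Fraction(g_A(x1)), Fraction(g_B(x1))
    a21, a22 = Fraction(g_A(x2)), Fraction(g_B(x2))
    det = a11 * a22 - a12 * a21
    A = (a22 - a12) / det
    B = (a11 - a21) / det
    return A, B

# 1 = A(x - 1) + B(x + 1), using x = 1 and x = -1
A1, B1 = solve_two_constants(lambda x: x - 1, lambda x: x + 1, 1, -1)

# 1 = A(1 - x) + B x, using x = 0 and x = 1
A2, B2 = solve_two_constants(lambda x: 1 - x, lambda x: x, 0, 1)

print(A1, B1, A2, B2)
```

Any two distinct values of $x$ would do in principle; the “easy” values are simply those that make one of the coefficients vanish.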
i.e.
$$uv = \int v\,du + \int u\,dv.$$
Solution
Let $u = \ln(x)$ and $dv = dx$. Then $du = (1/x)\,dx$ and $v = x$.
$$\int \ln(x)\,dx = x\ln(x) - \int x(1/x)\,dx = x\ln(x) - \int dx = x\ln(x) - x + C.$$
Solution
At first, it may be hard to decide how to assign roles for $u$ and $dv$. Suppose we try $u = e^x$
and $dv = x\,dx$. Then $du = e^x\,dx$ and $v = x^2/2$. This means that we would get the integral
in the form
$$I = \frac{x^2}{2}e^x - \int \frac{x^2}{2}e^x\,dx.$$
This is certainly not a simplification, because the integral we obtain has a higher power of
x, and is consequently harder, not easier to integrate. This suggests that our first attempt
was not a helpful one. (Note that integration often requires trial and error.)
Let $u = x$ and $dv = e^x\,dx$. This is a wiser choice because when we differentiate $u$,
we reduce the power of $x$ (from 1 to 0), and get a simpler expression. Indeed, $du = dx$,
$v = e^x$ so that
$$\int xe^x\,dx = xe^x - \int e^x\,dx = xe^x - e^x + C.$$
To find a definite integral of this kind on some interval (say $0 \le x \le 1$), we compute
$$I = \int_0^1 xe^x\,dx = \left.(xe^x - e^x)\right|_0^1 = (1e^1 - e^1) - (0e^0 - e^0) = 0 + e^0 = e^0 = 1.$$
Note that all parts of the expression are evaluated at the two endpoints.
Solution
We can calculate this integral by repeated application of the idea in the previous example.
Letting $u = x^n$ and $dv = e^x\,dx$ leads to $du = nx^{n-1}\,dx$ and $v = e^x$. Then
$$I_n = x^n e^x - \int nx^{n-1}e^x\,dx = x^n e^x - n\int x^{n-1}e^x\,dx.$$
Each application of integration by parts reduces the power of the term $x^n$ inside an integral
by one. The calculation is repeated until the very last integral has been simplified, with
no remaining powers of $x$. This illustrates that in some problems, integration by parts is
needed more than once.
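The repeated reduction can be sketched as a short recursion. As an illustration only, the Python code below applies the reduction to the definite integral on $0 \le x \le 1$ (a choice of interval assumed here, not taken from the example), where the boundary term $x^n e^x$ evaluates to $e$, so that $I_n = e - nI_{n-1}$ with $I_0 = e - 1$. The helper names `I` and `midpoint` are hypothetical.

```python
import math

def I(n):
    """Integral of x**n * exp(x) over [0, 1] by repeated integration by parts."""
    if n == 0:
        return math.e - 1.0           # integral of exp(x) on [0, 1]
    return math.e - n * I(n - 1)      # boundary term minus n times I_{n-1}

def midpoint(n, strips=20_000):
    """Independent check of the same integral with midpoint strips."""
    dx = 1.0 / strips
    return sum(((i + 0.5) * dx) ** n * math.exp((i + 0.5) * dx)
               for i in range(strips)) * dx

print(I(1), I(3), midpoint(3))
```

For $n = 1$ the recursion reproduces the value 1 found in the previous example.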
Solution
Let $u = \arctan(x)$ and $dv = dx$. Then $du = (1/(1 + x^2))\,dx$ and $v = x$ so that
$$I = x\arctan(x) - \int \frac{1}{1 + x^2}\,x\,dx.$$
The last integral can be done with the simple substitution $w = (1 + x^2)$ and $dw = 2x\,dx$,
giving
$$I = x\arctan(x) - (1/2)\int (1/w)\,dw.$$
We obtain, as a result
$$I = x\arctan(x) - \frac{1}{2}\ln(1 + x^2) + C.$$
Solution
We might try to fit this into a similar pattern, i.e. let $u = \tan(x)$ and $dv = dx$. Then
$du = \sec^2(x)\,dx$ and $v = x$, so we obtain
$$I = x\tan(x) - \int x\sec^2(x)\,dx.$$
This is not really a simplification, and we see that integration by parts will not necessarily
work, even on a seemingly related example. However, we might instead try to rewrite the
integral in the form
$$I = \int \tan(x)\,dx = \int \frac{\sin(x)}{\cos(x)}\,dx.$$
Now we find that a simple substitution will do the trick, i.e. that $w = \cos(x)$ and
$dw = -\sin(x)\,dx$ will convert the integral into the form
$$I = \int \frac{1}{w}\,(-dw) = -\ln|w| = -\ln|\cos(x)| + C.$$
This example illustrates that we should always try substitution, first, before attempting
other methods.
We refer to this integral as I1 because a related second integral, that we’ll call I2 will appear
in the calculation.
Solution
Let $u = e^x$ and $dv = \sin(x)\,dx$. Then $du = e^x\,dx$ and $v = -\cos(x)$. Therefore
$$I_1 = -e^x\cos(x) - \int (-\cos(x))e^x\,dx = -e^x\cos(x) + \int \cos(x)e^x\,dx.$$
We now have another integral of a similar form to tackle. This seems hopeless, as we
have not simplified the result, but let us not give up! In this case, another application of
integration by parts will do the trick. Call $I_2$ the integral
$$I_2 = \int \cos(x)e^x\,dx,$$
so that
$$I_1 = -e^x\cos(x) + I_2.$$
Repeat the same procedure for the new integral $I_2$, i.e. let $u = e^x$ and $dv = \cos(x)\,dx$.
Then $du = e^x\,dx$ and $v = \sin(x)$. Thus
$$I_2 = e^x\sin(x) - \int \sin(x)e^x\,dx = e^x\sin(x) - I_1.$$
This appears to be a circular argument, but in fact, it has a purpose. We have determined
that the following relationships are satisfied by the above two integrals:
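$$I_1 = -e^x\cos(x) + I_2,$$
$$I_2 = e^x\sin(x) - I_1.$$
We can eliminate $I_2$, obtaining
$$I_1 = -e^x\cos(x) + \left(e^x\sin(x) - I_1\right),$$
that is,
$$I_1 = -e^x\cos(x) + e^x\sin(x) - I_1.$$
Rearranging (taking $I_1$ to the left hand side) leads to
$$2I_1 = e^x\left(\sin(x) - \cos(x)\right), \quad\text{so that}\quad I_1 = \frac{1}{2}e^x\left(\sin(x) - \cos(x)\right) + C.$$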
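As a sanity check on this calculation, a short Python sketch can confirm numerically that $\frac{1}{2}e^x(\sin(x) - \cos(x))$, the expression obtained by solving the two relations for $I_1$, differentiates back to $e^x\sin(x)$. The function names and sample points are assumptions made for illustration.

```python
import math

def I1(x):
    """Candidate antiderivative of exp(x) * sin(x), taking C = 0."""
    return 0.5 * math.exp(x) * (math.sin(x) - math.cos(x))

def derivative(F, x, h=1e-6):
    """Central-difference estimate of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

# Mismatch between the derivative of I1 and the integrand at a few points
errors = [abs(derivative(I1, x) - math.exp(x) * math.sin(x))
          for x in (-1.0, 0.0, 0.5, 2.0)]
print(max(errors))
```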
6.8 Summary
In this chapter, we explored a number of techniques for computing antiderivatives. We here
summarize the most important results:
1. Substitution is the first method to consider. This method works provided the change
of variable results in elimination of the original variable and leads to a simpler, more
elementary integral.
2. When using substitution on a definite integral, endpoints can be converted to the
new variable (Method 1) or the resulting antiderivative can be converted back to its
original variable before plugging in the (original) endpoints (Method 2).
3. The integration by parts formula for functions $u(x)$, $v(x)$ is
$$\int u\,dv = uv - \int v\,du.$$
Integration by parts is useful when $u$ is easy to differentiate (but not easy to integrate).
It is also helpful when the integral contains a product of elementary functions such
as $x^n$ and a trigonometric or an exponential function. Sometimes more than one
application of this method is needed. Other times, this method is combined with
substitution or other simplifications.
4. Using integration by parts on a definite integral means that both parts of the formula
are to be evaluated at the endpoints.
5. Integrals involving $\sqrt{1 \pm x^2}$ can be simplified by making a trigonometric substitution.
6. Integrals with products or powers of trigonometric functions can sometimes be sim-
plified by application of trigonometric identities or simple substitution.
7. Algebraic tricks, and many associated manipulations are often applied to twist and
turn a complicated integral into a set of simpler expressions that can each be handled
more easily.
8. Even with all these techniques, the problem of finding an antiderivative can be very
complicated. In some cases, we resort to handbooks of integrals, use symbolic ma-
nipulation software packages, or, if none of these work, calculate a given definite
integral numerically using a spreadsheet.
7.1 Introduction
In this chapter we lay the groundwork for calculations and rules governing simple discrete
probabilities$^{24}$. Such skills are essential in understanding problems related to random
processes of all sorts. In biology, there are many examples of such processes, including the
inheritance of genes and genetic diseases, the random motion of cells, the fluctuations in
the number of RNA molecules in a cell, and a vast array of other phenomena.
To gain experience with probability, it is important to see simple examples. In this
chapter, we discuss experiments that can be easily reproduced and tested by the reader.
24 I am grateful to Robert Israel for comments regarding the organization of this chapter
7.3.2 Outcome
Whenever we perform the experiment, exactly one outcome happens. In this chapter we
will deal with discrete probability, where there is a finite list of possible outcomes.
Consider the following experiment: We toss a coin and see how it lands. Here there
are only two possible results: “heads” (H) or “tails” (T). A fair coin is one for which these
results are equally likely. This means that if we repeat this experiment many many times,
we expect that on average, we get H roughly 50% of the time and T roughly 50% of the
time. This will lead us to define a probability of 1/2 for each outcome.
Similarly, consider the experiment of rolling a die: A six-sided die can land on any
of its six faces, so that a “single experiment” has six possible outcomes. For a fair die, we
anticipate getting each of the results with an equal probability, i.e. if we were to repeat
the same experiment many many times, we would expect that, on average, the six possible
events would occur with similar frequencies, each 1/6 of the time. We say that the events
are random and unbiased for “fair” dice.
We will often be interested in more complex experiments. For example, if we toss a
coin five times, an outcome corresponds to a five-letter sequence of “Heads” (H) and “Tails”
(T), such as THTHH. We are interested in understanding how to quantify the probability
of each such outcome in fair (as well as unfair) coins. If we toss a coin ten times, how
probable is it that we get eight out of ten heads? For dice, we could ask how likely are
we to roll a 5 and a 6 in successive experiments? A 5 or a 6? For such experiments we
are interested in quantifying how likely it is that a certain event is obtained. Our goal in
this chapter is to make more precise our notion of probability, and to examine ways of
quantifying and computing probabilities. To motivate this investigation, we first look at
results of a real experiment performed in class by students.
$$p_i = x_i/N,$$
i.e. $p_i$ is the fraction of times that the result $i$ is obtained out of all the experiments. We
expect that if we repeated the experiment many more times, this empirical probability would
Rules of probability
About Rule 1: $p_i = 0$ implies that the given outcome never happens, whereas $p_i = 1$
implies that this outcome is the only possibility (and always happens). Any value inside
the range (0,1) means that the outcome occurs some of the time. Rule 2 makes intuitive
sense: it means that we have accounted for all possibilities, i.e. the fractions corresponding
to all of the outcomes add up to 100% of the results.
In a case where there are $M$ possible outcomes, all with equal probability, it follows
that $p_i = 1/M$ for every $i$.
$$F(x_i) = \text{Prob}(X \le x_i).$$
The function $F$ merely sums up all the probabilities of outcomes up to and including $x_i$,
hence is called “cumulative”. This implies that $F(x_n) = 1$ where $x_n$ is the largest value
attainable by the random variable. For example, in the rolling of a die, if we list the possible
outcomes in ascending order as $\{1, 2, \ldots, 6\}$, then $F(6)$ stands for the probability of rolling
a 6 or any lower value, which is clearly equal to 1 for a six-sided die.
Table 7.1. Results of a real coin-tossing experiment carried out by 121 students
in this mathematics course. Each student tossed a coin 10 times. We recorded the
“frequency”, i.e. the number of students $n_i$ who each got $x_i = 0, 1, 2, \ldots, 10$ heads. The
fraction of the class that got each outcome, $n_i/N$, is identified with the (empirical)
probability of that outcome, $p(x_i)$. We also compute the cumulative function $F(x_i)$ in the last
column. See Figure 7.1 for the same data presented graphically.
$$\bar{x} = \sum_{i=0}^{n} x_i\,p(x_i).$$
The expected value is a kind of “average value of x”, where values of x are weighted
by their frequency of occurrence. This idea is related to the concept of center of mass
defined in Section 5.3.1 (x positions weighted by masses associated with those positions).
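The classroom experiment can be imitated on a computer. The Python sketch below is a simulation under stated assumptions (a fair coin, 121 simulated students, 10 tosses each, and a fixed random seed chosen here for repeatability); it does not reproduce the actual class data, only the way the empirical probabilities, the cumulative function and the expected value are computed from the counts.

```python
import random

rng = random.Random(0)     # fixed seed so that the run is repeatable
N, n = 121, 10             # number of students and tosses per student

# counts[i] = number of simulated students who obtained i heads
counts = [0] * (n + 1)
for _ in range(N):
    heads = sum(rng.random() < 0.5 for _ in range(n))
    counts[heads] += 1

p = [c / N for c in counts]                     # empirical probabilities p(x_i)
F = [sum(p[: i + 1]) for i in range(n + 1)]     # cumulative function F(x_i)
mean = sum(i * p[i] for i in range(n + 1))      # expected value

print(counts, mean)
```

Changing the seed gives a different set of counts each time, much as a different class would; the mean should stay in the neighbourhood of 5 heads.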
[Figure 7.1: two panels over the number of heads (i), from 0.0 to 10.0. One panel shows the empirical probability of i heads in 10 tosses (vertical axis 0.0 to 0.4); the other shows the cumulative function (vertical axis 0.0 to 1.0).]
Figure 7.1. The data from Table 7.1 is shown plotted on this graph. A total of
N = 121 people were asked to toss a coin n = 10 times. In the bar graph (left), the
horizontal axis reflects $i$, the number of heads (H) that came up during those 10 coin
tosses. The vertical axis reflects the fraction $p(x_i)$ of the class that achieved that particular
number of heads. In the lower graph, the same data is shown by the discrete points. We
also show the cumulative function that sums up the values from left to right. Note that the
cumulative function is a “step function”.
The mean is a point on the x axis, representing the “average” outcome of an experiment.
(Recall that in the distributions we are describing, the possible outcomes of some observa-
tion or measurement process are depicted on the x axis of the graph.) The mean is not the
same as the average value of a function, discussed in Section 4.6. (In that case, the average
is an average y coordinate, or average height of the function.)26
We also define quantities that represent the width of the distribution. We define the
variance, $V$, and standard deviation, $\sigma$, as follows:
$$V = \sum_{i=0}^{n} (x_i - \bar{x})^2\,p(x_i), \qquad \sigma = \sqrt{V}.$$
The variance is related to the square of the quantity represented on the $x$ axis, and since
the standard deviation is its square root, $\sigma$ carries the same units as $x$. For this reason, it is
common to associate the value of $\sigma$ with a typical “width” of the distribution. Having a
low value of $\sigma$ means that most of the experimental results are close to the mean, whereas
a large $\sigma$ signifies that there is a large scatter of experimental values about the mean.
26 Note to the instructor: students often mix these two distinct meanings of the word average, and they should
In the problem sets, we show that the variance can also be expressed in the form
$$V = M_2 - \bar{x}^2,$$
where $M_2$ is the second moment of the distribution. Moments of a distribution are defined
as the values obtained by summing up products of the probability weighted by powers of
$x$.
Example 7.1 (Rolling a die) Suppose you toss a die, and let the random variable $X$ be
the number obtained on the die, i.e. (1 to 6). If this die is fair, then it is equally likely to get
any of the six possible outcomes, so each has probability 1/6. In this case
$$x_i = i, \quad i = 1, 2, \ldots, 6, \qquad p(x_i) = 1/6.$$
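For the fair die, the mean, the second moment, the variance and the cumulative function can all be computed exactly. The Python sketch below is one way to do so with exact fractions; the variable names are chosen here for illustration, and the variance is computed both from its definition and from the form $M_2 - \bar{x}^2$ so that the two can be compared.

```python
from fractions import Fraction

xs = range(1, 7)          # possible outcomes x_i = i
p = Fraction(1, 6)        # probability of each outcome for a fair die

mean = sum(x * p for x in xs)                      # expected value
M2 = sum(x**2 * p for x in xs)                     # second moment
V = sum((x - mean) ** 2 * p for x in xs)           # variance from the definition
F = [sum(p for y in xs if y <= x) for x in xs]     # cumulative function F(x_i)

print(mean, M2, V, F)
```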
Example 7.2 (Expected number of heads (empirical)) For the empirical probability
distribution shown in Figure 7.1, the mean (expected value) is calculated from results in
Table 7.1 as follows:
$$\bar{x} = \sum_{k=0}^{10} x_k\,p(x_k) = 0(0) + 1(0.0083) + 2(0.0165) + \ldots + 8(0.0579) + 9(0) + 10(0) = 5.2149$$
Thus, the mean number of heads in this set of experiments is about 5.2. This is close to
what we would expect intuitively in a fair coin, namely that, on average, 5 out of 10 tosses
(i.e. 50%) would result in heads. To compute the variance we form the sum
$$V = \sum_{k=0}^{10} (x_k - \bar{x})^2\,p(x_k) = \sum_{k=0}^{10} (k - 5.2149)^2\,p(k).$$
Here we have used the mean calculated above and the fact that $x_k = k$. We obtain
$V \approx 2.053$.
(Because there was no replicate of the experiment that led to 9 or 10 heads out of 10
tosses, these values do not contribute to the calculation.) The standard deviation is then
$\sigma = \sqrt{V} = 1.4328$.
Based on the results in Table 7.2 and on the two principles outlined above, we can compute
the probability of obtaining 0, 1, 2, or 3 successes out of 3 trials. The results are shown
in Table 7.3. In constructing Table 7.3, we have considered all the ways of obtaining 3
successes (there is only one such way, namely SSS, and its probability is $p^3$), all the ways
of obtaining only one success (here we must allow for SFF, FSF, FFS, each having the
same probability $pq^2$) etc. Since these results are mutually exclusive (only one such result
is possible for any given replicate of the 3-trial experiment), the addition principle is used
to compute the probability Prob(SFF or FSF or FFS).

Probability of X heads
Prob(X = 0) = $q^3$
Prob(X = 1) = $3pq^2$
Prob(X = 2) = $3p^2q$
Prob(X = 3) = $p^3$
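The counting argument can be reproduced by listing all eight sequences of three trials and adding the probabilities of those with the same number of successes. In the Python sketch below, the value $p = 1/4$ is an arbitrary illustrative choice and the dictionary name `prob` is hypothetical; the multiplication principle gives each sequence its probability and the addition principle accumulates them.

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 4)   # illustrative probability of success (S)
q = 1 - p            # probability of failure (F)

# prob[k] accumulates the probability of exactly k successes in 3 trials
prob = {k: Fraction(0) for k in range(4)}
for seq in product("SF", repeat=3):
    successes = seq.count("S")
    prob[successes] += p**successes * q**(3 - successes)

print(prob)
```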
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
Table 7.4. Pascal's triangle contains the binomial coefficients $C(n, k)$.
Each term in Pascal's triangle is obtained by adding the two diagonally above it. The top
of the triangle represents $C(0, 0)$. The next row represents $C(1, 0)$ and $C(1, 1)$. For row
number $n$, terms along the row are the binomial coefficients $C(n, k)$, starting with $k = 0$
at the beginning of the row and going to $k = n$ at the end of the row.
The binomial coefficients are symmetric, so that C(n, k) = C(n, n − k). They are entries
that occur in Pascal’s triangle, shown in Table 7.4.
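The rule for building the triangle (each term is the sum of the two diagonally above it) translates directly into a few lines of Python. This is a sketch for illustration; the function name `pascal_rows` is an assumption, and the standard library function `math.comb` is used only to compare the result with the binomial coefficients $C(n, k)$.

```python
from math import comb

def pascal_rows(n_max):
    """Return rows 0..n_max of Pascal's triangle, built by adding neighbours."""
    rows = [[1]]
    for _ in range(n_max):
        prev = rows[-1]
        inner = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        rows.append([1] + inner + [1])
    return rows

rows = pascal_rows(5)
for row in rows:
    print(row)
```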
[Figure 7.2: two panels, each titled “The binomial distribution”, with horizontal axis from -0.5 to 10.5 and vertical axis from 0.0 to 0.4.]
Figure 7.2. The binomial distribution is shown here for n = 10. We have plotted
Prob(X = k) versus k for k = 0, 1, . . . 10. This distribution is the same as the probability
of getting X heads out of 10 coin tosses for a fair coin. In the first panel, the probability
of success and failure are the same, i.e. p = q = 0.5. The distribution is then symmetric.
In the second panel, the probability of success is p = 1/4, so q = 3/4 and the resulting
distribution is skewed.
What does the binomial theorem say about the binomial distribution? First, since
there are only two possible outcomes in each Bernoulli trial, it follows that
p + q = 1, and hence (p + q)n = 1.
That is, the sum of these terms represents the sum of probabilities of obtaining k =
0, 1, . . . , n successes. (And since this accounts for all possibilities, it follows that the sum
adds up to 1.)
We can compute the mean and variance of the binomial distribution using the follow-
ing tricks. We will write out an expansion for a product of the form (px + q)n . Here x will
be an abstract quantity introduced for convenience (i.e., for making the trick work):
$$(px + q)^n = \sum_{k=0}^{n} C(n, k)(px)^k q^{n-k} = \sum_{k=0}^{n} C(n, k)\,p^k q^{n-k} x^k.$$
The mean of the binomial distribution is X̄ = np where n is the number of trials and p is
the probability of success in one trial.
We continue to compute other quantities of interest. Multiply both sides of Eqn. 7.1
by x to obtain
$$nx(px + q)^{n-1}p = \sum_{k=0}^{n} C(n, k)\,p^k q^{n-k}\,k\,x^k.$$
Take the derivative again. The result is
$$n(px + q)^{n-1}p + n(n - 1)x(px + q)^{n-2}p^2 = \sum_{k=0}^{n} C(n, k)\,p^k q^{n-k}\,k^2 x^{k-1}.$$
Plug in $x = 1$ to get
$$np + n(n - 1)p^2 = \sum_{k=0}^{n} k^2\,C(n, k)\,p^k q^{n-k} = M_2.$$
Thereby we have calculated the second moment of the distribution, the variance, and the
standard deviation. In summary, we found the following results:
The second moment $M_2$, the variance $V$ and the standard deviation $\sigma$ of a binomial
distribution are
$$M_2 = np + n^2p^2 - np^2,$$
$$V = M_2 - \bar{X}^2 = np - np^2 = np(1 - p) = npq,$$
$$\sigma = \sqrt{npq}.$$
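These formulas can be checked directly against the definition of the binomial distribution by summing over $k$. The Python sketch below uses exact fractions; the values $n = 10$ and $p = 1/4$ are illustrative choices (matching the skewed panel of Figure 7.2), not part of the derivation.

```python
from fractions import Fraction
from math import comb

n, p = 10, Fraction(1, 4)    # illustrative number of trials and success probability
q = 1 - p

# Prob(X = k) for k = 0..n
pmf = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))      # should equal n*p
M2 = sum(k**2 * pk for k, pk in enumerate(pmf))     # second moment
V = M2 - mean**2                                    # should equal n*p*q

print(mean, M2, V)
```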
[Figure 7.3: plot titled “The Normal distribution”, with horizontal axis from -4.0 to 4.0 and vertical axis from 0.0 to 0.4.]
Figure 7.3. The Normal (or Gaussian) distribution is given by equation (7.2) and
has the distribution shown in this figure.
As the number of Bernoulli trials grows, i.e. as we toss our imaginary coin in longer
and longer sets (n → ∞), a remarkable thing happens to the binomial distribution: it
becomes smoother and smoother, until it grows to resemble a continuous distribution that
looks like a “Bell curve”. That curve is known as the Gaussian or Normal distribution. If
we scale this curve vertically and horizontally (stretch vertically and compress horizontally
by the factor $\sqrt{N}/2$) and shift its peak to $x = 0$, then we find a distribution that describes
the deviation from the expected value of 50% heads. The resulting function is of the form
$$p(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2} \qquad (7.2)$$
We will study properties of this (and other) such continuous distributions in a later
section. We show a typical example of the Normal distribution in Figure 7.3. Its cumulative
distribution is then shown (without and with the original distribution superimposed) in
Figure 7.4.
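The resemblance between a rescaled binomial distribution and the Normal distribution (7.2) can be explored numerically. The Python sketch below is an illustration under assumptions made here: a fair coin, $n = 100$ tosses, and the scale factor $\sqrt{n}/2$ (the standard deviation $\sqrt{npq}$ for $p = q = 1/2$) used both to stretch vertically and to compress horizontally about the peak.

```python
import math

def normal_density(x):
    """The function p(x) of equation (7.2)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

n = 100                       # number of tosses of a fair coin (illustrative)
sigma = math.sqrt(n) / 2      # scale factor sqrt(n)/2

def scaled_binomial(k):
    """Prob(X = k) for a fair coin, stretched vertically by sigma."""
    return sigma * math.comb(n, k) / 2**n

# Largest gap between the rescaled binomial and the Normal density,
# comparing at the shifted and compressed positions (k - n/2) / sigma
max_gap = max(abs(scaled_binomial(k) - normal_density((k - n / 2) / sigma))
              for k in range(n + 1))
print(max_gap)
```

Repeating the comparison with larger $n$ is one way to see the gap shrink.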
Figure 7.4. The Normal probability density with its corresponding cumulative function.
Genotype:      aA       AA       aa       Aa
Probability:   $pq$     $p^2$    $q^2$    $pq$

Genotype:      aA or Aa   AA       aa
Probability:   $2pq$      $p^2$    $q^2$

Table 7.5. If the probability of finding allele A is $p$ and the probability of finding
allele a is $q$, then the eye color gene probabilities are as shown in the top table. However,
because genotype Aa is equivalent to genotype aA, we have combined these outcomes in
the revised second table.
In Table 7.6, we note, for example, that if the couple are both of type aA, each parent
can “donate” either a or A to the progeny, so we expect to see children of types aa, aA, AA
in the ratio 1:2:1 (regardless of the values of p and q).
We can now group together and summarize all the progeny of a given genotype, with
the probabilities that they are produced by one or another such random mating. Using this
table, we can then determine the probability of each of the three genotypes in the next
generation.
                  Mother:  AA                 aA                          aa
                           $p^2$              $2pq$                       $q^2$
Father:
AA   $p^2$                 AA                 1/2 aA, 1/2 AA              Aa
                           $p^4$              $2pq \cdot p^2$             $p^2q^2$
aA   $2pq$                 1/2 aA, 1/2 AA     1/4 aa, 1/2 aA, 1/4 AA      1/2 aa, 1/2 Aa
                           $2pq \cdot p^2$    $4p^2q^2$                   $2pq \cdot q^2$
aa   $q^2$                 Aa                 1/2 aA, 1/2 aa              aa
                           $p^2q^2$           $2pq \cdot q^2$             $q^4$
Example 7.3 (Probability of AA progeny) Find the probability that a random (Hardy Wein-
berg) mating will give rise to a progeny of type AA.
Solution 1
Using Table 7.6, we see that there are only four ways that a child of type AA can result
from a mating: either both parents are AA, or one or the other parent is Aa, or both parents
are Aa. Thus, for children of type AA the probability is
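$$\text{Prob(child of type AA)} = p^4 + \frac{1}{2}(2pq\,p^2) + \frac{1}{2}(2pq\,p^2) + \frac{1}{4}(4p^2q^2).$$
Simplifying leads to
$$\text{Prob(child of type AA)} = p^4 + 2p^3q + p^2q^2 = p^2(p + q)^2 = p^2.$$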
In the problem set, we also find that the probability of a child of type aA is 2qp, the
probability of the child being type aa is q 2 . We thus observe that the frequency of genotypes
of the progeny is exactly the same as that of the parents. This type of genetic makeup is
termed Hardy-Weinberg genetics.
Alternate solution
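[Figure 7.5: tree diagram. The child AA is at the top, with branches to the father and the mother. Each parent is Aa (probability $2pq$) or AA (probability $p^2$); an Aa parent passes on A with probability 1/2 and an AA parent with probability 1, so each parent contributes A with probability $pq + p^2$.]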
Figure 7.5. A tree diagram to aid the calculation of the probability that a child
with genotype AA results from random assortative (Hardy Weinberg) mating.
In Figure 7.5, we show an alternate solution to the same problem using a tree dia-
gram. Reading from the top down, we examine all the possibilities at each branch point.
A child AA cannot have any parent of genotype aa, so both father and mother’s genotype
could only have been one of AA or Aa. Each arrow indicating the given case is accom-
panied by the probability of that event. (For example, a random individual has probability
2pq of having genotype Aa, as shown on the arrows from the father and mother to these
genotypes.) Continuing down the branches, we ask with what probability the given parent
would have contributed an allele of type A to the child. For a parent of type AA, this is
certainly true, so the given branch carries probability 1. For a parent of type Aa, the proba-
bility that A is passed down to the child is only 1/2. The combined probability is computed
as follows: we determine the probability of getting an A from the father (of type AA OR Aa):
this is Prob(A from father) $= (1/2)(2pq) + 1 \cdot p^2 = pq + p^2$, and multiply it by a similar
probability of getting A from the mother (of type AA OR Aa). (We must multiply, since
we need A from the father AND A from the mother for the genotype AA.) Thus,
$$\text{Prob(child of type AA)} = (pq + p^2)(pq + p^2) = p^2(q + p)^2 = p^2 \cdot 1 = p^2.$$
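The tree-diagram argument extends to all three genotypes and can be checked with exact arithmetic. In the Python sketch below, the allele frequency $p = 3/10$ and the dictionary names are illustrative assumptions; each parent is drawn from the genotype frequencies $p^2$, $2pq$, $q^2$ and passes on allele A with probability 1, 1/2 or 0 respectively.

```python
from fractions import Fraction

p = Fraction(3, 10)    # illustrative frequency of allele A
q = 1 - p              # frequency of allele a

# Genotype frequencies of the parents (Aa and aA combined)
genotype_freq = {"AA": p**2, "Aa": 2 * p * q, "aa": q**2}

# Probability that a parent of each genotype passes allele A to the child
pass_A = {"AA": Fraction(1), "Aa": Fraction(1, 2), "aa": Fraction(0)}

# Probability that a randomly chosen parent contributes A (or a)
prob_A = sum(genotype_freq[g] * pass_A[g] for g in genotype_freq)
prob_a = 1 - prob_A

# Genotype frequencies of the progeny under random mating
child = {"AA": prob_A**2, "Aa": 2 * prob_A * prob_a, "aa": prob_a**2}

print(child)
```

The progeny frequencies come out identical to those of the parents, which is the observation summarized above as Hardy-Weinberg genetics.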
It is of interest to investigate what happens when one of the assumptions we made is
relaxed, for example, when the genotype of the individual has an impact on survival or on
the ability to reproduce. While this is beyond our scope here, it forms an important theme
in the area of genetics.
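[Figure 7.6: (a) the positions -1, 0, 1 marked on the x axis, with a step to the left labelled q and a step to the right labelled p; (b) the x axis.]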
Figure 7.6. A random walker in 1 dimension takes a step to the right with proba-
bility p and a step to the left with probability q.
The process described here is classic, and often attributed to a drunken wanderer. In
our case, we could consider this motion as a 1D simplification of the random tumbles and
swims of a bacterium in its turbulent environment. It is usually the case that a goal of this
swim is a search for some nutrient source, or possibly avoidance of poor environmental
conditions. We shall see that if the probabilities of left and right motion are unequal (i.e.
the motion is biased in one direction or another) this swimmer tends to drift along towards
a preferred direction.
In this problem, each step has only two outcomes (analogous to a trial in a Bernoulli
experiment). We could imagine the walker tossing a coin to determine whether to move
right or left. We wish to characterize the probability of the walker being at a certain posi-
tion at a given time, and to find her expected position after n steps. Our familiarity with
Bernoulli trials and the binomial distribution will prove useful in this context.
Example
(a) What is the probability of a run of steps as follows: RLRRRLRLLL
(b) Find the probability that the walker moves k steps to the right out of a total run of n
consecutive steps.
(c) Suppose that p = q = 1/2. What is the probability that a walker starting at the origin
returns to the origin on her 10’th step?
Solution
(a) The probability of the run RLRRRLRLLL is the product pqpppqpqqq = p5 q 5 . Note
the similarity to the question “What is the probability of tossing HTHHHTHTTT?”
(b) This problem is identical to the problem of k heads in n tosses of a coin. The proba-
bility of such an event is given by a term in the binomial distribution:
$$P(k \text{ out of } n \text{ moves to right}) = C(n, k)\, p^k q^{n-k}.$$
(c) The walker returns to the origin after 10 steps only if she has taken 5 steps to the left
(total) and 5 steps to the right (total). The order of the steps does not matter. Thus
this problem reduces to the problem (b) with 5 steps out of 10 taken to the right. The
probability is thus
$$P(\text{back at 0 after 10 steps}) = P(\text{5 out of 10 steps to right})$$
$$= C(10, 5)\, p^5 q^5 = C(10, 5) \left(\frac{1}{2}\right)^{10} = \frac{10!}{5!\,5!} \left(\frac{1}{1024}\right) = 0.24609$$
Mean position
We now ask how to determine the expected position of the walker after n steps, i.e. how
the mean value of x depends on the number of steps and the probabilities associated with
each step. After 1 step, with probability p the position is x = +1 and with probability q,
the position is x = −1. The expected (mean) position after 1 move is thus
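$$\bar{x}_1 = p(+1) + q(-1) = p - q.$$
But the number of steps to the right follows a binomial distribution with mean np, and
likewise the mean number of steps to the left is nq, so the mean position after n steps is
$$\bar{x}_n = np - nq = n(p - q).$$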
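The two results above can be checked by direct computation. The following is a minimal sketch, assuming Python 3.8 or later (for `math.comb`); the function names `prob_k_right` and `mean_position` are illustrative and not part of the text. It evaluates the binomial term for a return to the origin after 10 steps, and sums position times probability over all outcomes to compare with n(p − q).

```python
from math import comb

def prob_k_right(n, k, p):
    """Binomial probability of exactly k steps to the right out of n steps."""
    q = 1.0 - p
    return comb(n, k) * p**k * q**(n - k)

def mean_position(n, p):
    """Expected position after n steps, summed over all possible outcomes."""
    # k steps right and (n - k) steps left leave the walker at x = 2k - n.
    return sum((2 * k - n) * prob_k_right(n, k, p) for k in range(n + 1))

# Part (c): unbiased walker back at the origin after 10 steps.
p_return = prob_k_right(10, 5, 0.5)
print("P(back at 0 after 10 steps) =", p_return)

# Biased walker (p = 0.7 is an arbitrary illustrative value).
n, p = 10, 0.7
print("mean position:", mean_position(n, p), "vs n(p - q):", n * (p - (1 - p)))
```

Summing over outcomes is slower than using the formula n(p − q), but it makes the role of the binomial distribution explicit.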
7.9 Summary
In this chapter, we introduced the notion of discrete probability of elementary events. We
learned that a probability is always a number between 0 and 1, and that the sum of (dis-
crete) probabilities of all possible (discrete) outcomes is 1. We then described how to
combine probabilities of elementary events to calculate probabilities of compound inde-
pendent events in a variety of simple experiments. We defined the notion of a Bernoulli
trial, such as tossing of a coin, and studied this in detail.
We investigated a number of ways of describing results of experiments, whether in
tabular or graphical form, and we used the distribution of results to define simple numerical
descriptors. The mean is a number that, more or less, describes the location of the “center”
of the distribution (analogous to center of mass), defined as follows:
The standard deviation is, roughly speaking, the “width” of the distribution.
While the chapter was motivated by results of a real experiment, we then investigated
theoretical distributions, including the binomial. We found that the distribution of events in
a repetition of a Bernoulli trial (e.g. coin tossed n times) was a binomial distribution, and
we computed the mean of that distribution.
Suppose that the probability of one of the events, say event e1 in a Bernoulli trial is p (and
hence the probability of the other event e2 is q = 1 − p), then
$$P(k \text{ occurrences of given event out of } n \text{ trials}) = \frac{n!}{k!\,(n-k)!}\, p^k q^{n-k}.$$
This is called the binomial distribution. The mean of the binomial distribution, i.e. the
mean number of events e1 in n repeated Bernoulli trials is
x̄ = np.
Chapter 8
Continuous probability
distributions
8.1 Introduction
In Chapter 7, we explored the concepts of probability in a discrete setting, where outcomes
of an experiment can take on only one of a finite set of values. Here we extend these
ideas to continuous probability. In doing so, we will see that quantities such as mean and
variance that were previously defined by sums will now become definite integrals. Here
again, we will see the concepts of integral calculus in the context of practical examples and
applications.
We begin by extending the idea of a discrete random variable to the continuous case.
We call x a continuous random variable in a ≤ x ≤ b if x can take on any value in this
interval. An example of a random variable is the height of a person, say an adult male,
selected randomly from a population. (This height typically takes on values in the range
0.5 ≤ x ≤ 3 meters, say, so a = 0.5 and b = 3.)
If we select a male subject at random from a large population, and measure his height,
we might expect to get a result in the proximity of 1.7-1.8 meters most often - thus, such
heights will be associated with a larger value of probability than heights in some other
interval of equal length, e.g. heights in the range 2.7 < x < 2.8 meters, say. Unlike
the case of discrete probability, however, the measured height can take on any real number
within the interval of interest. This leads us to redefine our idea of a continuous probability,
using a continuous function in place of the discrete bar-graph seen in Chapter 7.
density is challenging for many students. Reinforcing the analogy with discrete masses versus distributed mass
density (discussed in Chapter 5) may be helpful.
gous to the connection between the mass of discrete beads and a continuous mass density,
encountered previously in Chapter 5.
Definition
A function p(x) is a probability density provided it satisfies the following properties:
1. p(x) ≥ 0 for all x.
2. $\int_a^b p(x)\,dx = 1$, where the possible range of values of x is a ≤ x ≤ b.
The probability that a random variable x takes on values in the interval $a_1 \le x \le a_2$
is defined as
$$\int_{a_1}^{a_2} p(x)\,dx.$$
The transition to probability density means that the quantity p(x) does not carry the same
meaning as our previous notation for probability of an outcome $x_i$, namely $p(x_i)$ in the
discrete case. In fact, p(x)dx, or its approximation p(x)∆x, is now associated with the
probability of an outcome whose value is “close to x”.
Unlike our previous discrete probability, we will not ask “what is the probability that
x takes on some exact value?” Rather, we ask for the probability that x is within some
range of values, and this is computed by performing an integral30 .
Having generalized the idea of probability, we will now find that many of the asso-
ciated concepts have a natural and straight-forward generalization as well. We first define
the cumulative function, and then show how the mean, median, and variance of a contin-
uous probability density can be computed. Here we will have the opportunity to practice
integration skills, as integrals replace the sums in such calculations.
Definition
For experiments whose outcome takes on values on some interval a ≤ x ≤ b, we define a
cumulative function, F (x), as follows:
$$F(x) = \int_a^x p(s)\,ds.$$
Then F (x) represents the probability that the random variable takes on a value in the range
(a, x) 31 . The cumulative function is simply the area under the probability density (between
the left endpoint of the interval, a, and the point x).
The above definition has several implications:
30 Remark: the probability that x is exactly equal to b is the integral $\int_b^b p(x)\,dx$. But this integral has the value
zero, by properties of the definite integral.
31 By now, the reader should be comfortable with the use of “s” as the “dummy variable” in this formula, where
Solution
The function is positive in the interval 0 ≤ x ≤ 6, so we can define the desired probability
density. Let
$$p(x) = C \sin\left(\frac{\pi}{6} x\right).$$
(a) We must find the normalization constant, C, such that Property 2 of continuous prob-
ability is satisfied, i.e. such that
$$1 = \int_0^6 p(x)\,dx.$$
(We have used the fact that cos(0) = 1 in a step here.) But by Property 2, for p(x)
to be a probability density, it must be true that C(12/π) = 1. Solving for C leads to
the desired normalization constant,
$$C = \frac{\pi}{12}.$$
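The value of the normalization constant can be confirmed numerically. The sketch below is an illustration only: it assumes a simple midpoint rule (the helper `midpoint_integral` is a hypothetical name, not something used in the text) to approximate the area under sin(πx/6) on 0 ≤ x ≤ 6, and then takes C as the reciprocal of that area.

```python
from math import sin, pi

def midpoint_integral(f, a, b, n=10000):
    """Approximate the integral of f over [a, b] using n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Area under the unnormalized function sin(pi x / 6) on [0, 6].
area = midpoint_integral(lambda x: sin(pi * x / 6), 0.0, 6.0)

# Property 2 requires C * area = 1.
C = 1.0 / area

print("area =", area, " expected 12/pi =", 12 / pi)
print("C    =", C, " expected pi/12 =", pi / 12)
```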
Note that this calculation is identical to finding the area
$$A = \int_0^6 \sin\left(\frac{\pi}{6} x\right) dx,$$
Figure 8.1. The probability density p(x) (black), and the cumulative function
F (x) (red) for Example 8.2.1. Note that the area under the black curve is 1 (by normalization),
and thus the value of F (x), which is the cumulative area function, is 1 at the right
endpoint of the interval.
constant. The only difference is the ultimate step of evaluating the integral at the variable endpoint x rather than
the fixed endpoint b = 6.
The mean of a probability density is defined similarly, but the definition simplifies by virtue
of the fact that $\int_a^b p(x)\,dx = 1$. Since probability distributions are normalized, the
denominator in Eqn. (8.1) is simply 1. Consequently, the mean of a probability density is given as
follows:
Definition
For a random variable in a ≤ x ≤ b and a probability density p(x) defined on this interval,
the mean or average value of x (also called the expected value), denoted x̄ is given by
$$\bar{x} = \int_a^b x\,p(x)\,dx.$$
To avoid confusion note the distinction between the mean as an average value of x versus
the average value of the function p over the given interval. Reviewing Example 5.3.3 may
help to dispel such confusion.
The idea of median encountered previously in grade distributions also has a parallel
here. Simply put, the median is the value of x that splits the probability distribution into
two portions whose areas are identical.
Definition
The median $x_{med}$ of a probability distribution is a value of x in the interval $a \le x_{med} \le b$
such that
$$\int_a^{x_{med}} p(x)\,dx = \int_{x_{med}}^b p(x)\,dx = \frac{1}{2}.$$
It follows from this definition that the median is the value of x for which the cumulative
function satisfies
$$F(x_{med}) = \frac{1}{2}.$$
Solution
To find the mean we compute
$$\bar{x} = \frac{\pi}{12} \int_0^6 x \sin\left(\frac{\pi}{6} x\right) dx.$$
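Integration by parts is required here34. Let $u = x$, $dv = \sin\left(\frac{\pi}{6} x\right) dx$.
Then $du = dx$, $v = -\frac{6}{\pi} \cos\left(\frac{\pi}{6} x\right)$. The calculation is then as follows:
$$\bar{x} = \frac{\pi}{12} \left( -x \frac{6}{\pi} \cos\left(\frac{\pi}{6} x\right) \Big|_0^6 + \frac{6}{\pi} \int_0^6 \cos\left(\frac{\pi}{6} x\right) dx \right)$$
$$= \frac{1}{2} \left( -x \cos\left(\frac{\pi}{6} x\right) \Big|_0^6 + \frac{6}{\pi} \sin\left(\frac{\pi}{6} x\right) \Big|_0^6 \right)$$
$$= \frac{1}{2} \left( -6 \cos(\pi) + \frac{6}{\pi} \sin(\pi) - \frac{6}{\pi} \sin(0) \right) = \frac{6}{2} = 3. \qquad (8.2)$$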
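As a cross-check on the integration by parts, the mean can also be approximated numerically. This is a minimal sketch assuming a midpoint rule; the helper name `midpoint_integral` is illustrative only.

```python
from math import sin, pi

def midpoint_integral(f, a, b, n=10000):
    """Approximate the integral of f over [a, b] using n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def p(x):
    """Normalized probability density from the example, on 0 <= x <= 6."""
    return (pi / 12) * sin(pi * x / 6)

# Mean = integral of x p(x) over the interval.
mean = midpoint_integral(lambda x: x * p(x), 0.0, 6.0)
print("numerical mean =", mean)
```

The numerical value should agree with the exact result, 3, up to the small error of the midpoint rule.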
$$F(x_{med}) = \frac{1}{2}.$$
Using the form of the cumulative function from Example 8.2.1, we find that
$$\frac{\pi}{12} \int_0^{x_{med}} \sin\left(\frac{\pi}{6} s\right) ds = \frac{1}{2} \quad\Rightarrow\quad \frac{1}{2}\left(1 - \cos\left(\frac{\pi}{6} x_{med}\right)\right) = \frac{1}{2}.$$
Figure 8.2. The cumulative function F (x) (red) for Example 8.2.1 in relation
to the median, as computed in Example 8.3.1. The median is the value of x at which
F (x) = 0.5, as shown in green.
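The median condition F(x_med) = 1/2 can also be solved numerically, which is useful when the equation cannot be inverted by hand. The sketch below assumes the cumulative function F(x) = (1/2)(1 − cos(πx/6)) that appears in the median equation, and uses bisection; the name `bisect_median` is hypothetical.

```python
from math import cos, pi

def F(x):
    """Cumulative function for p(x) = (pi/12) sin(pi x / 6) on 0 <= x <= 6."""
    return 0.5 * (1.0 - cos(pi * x / 6))

def bisect_median(F, a, b, tol=1e-10):
    """Find x in [a, b] with F(x) = 1/2, assuming F increases from 0 to 1."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) < 0.5:
            lo = mid   # median lies to the right of mid
        else:
            hi = mid   # median lies at or to the left of mid
    return 0.5 * (lo + hi)

x_med = bisect_median(F, 0.0, 6.0)
print("median =", x_med)
```

Bisection relies only on the fact that a cumulative function is non-decreasing, so the same routine applies to other densities.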
34 Recall from Chapter 6 that $\int u\,dv = vu - \int v\,du$. Calculations of the mean in continuous probability often
involve Integration by Parts (IBP), since the integrand consists of an expression xp(x)dx. The idea of IBP is
to reduce the integration to something involving only p(x)dx, which is done essentially by “differentiating” the
term u = x, as we show here.
Remark
A glance at the original probability distribution should convince us that it is symmetric
about the value x = 3. Thus we should have anticipated that the mean and median of this
distribution would both occur at the same place, i.e. at the midpoint of the interval. This
will be true in general for symmetric probability distributions, just as it was for symmetric
mass or grade distributions.
Figure 8.3. In a symmetric probability distribution (left) the mean and median are
the same. If the distribution is changed slightly so that it is no longer symmetric (as shown
on the right) then the median may still be the same, while the mean will have shifted to the
new “center of mass” of the probability density.
We have seen in Example 8.3.1 that for symmetric distributions, the mean and the
median are the same. Is this always the case? When are the two different, and how can we
understand the distinction?
Recall that the mean is closely associated with the idea of a center of mass, a concept
from physics that describes the location of a pivot point at which the entire “mass” would
8.4. Applications of continuous probability 161
Figure 8.4. As in Figures 8.1 and 8.2, but for the probability density $p(x) =
(\pi/36)\, x \sin(\pi x/6)$. This function is not symmetric, so the mean and median are not the
same. From this figure, we see that the median is approximately $x_{med} = 3.6$. We do not
show the mean (which is close but not identical). We can compute both the mean and the
median for this distribution using numerical integration with the spreadsheet. We find that
the mean is $\bar{x} = 3.5679$. Note that the “most probable value”, i.e. the point at which p(x)
is maximal, is at x = 3.9, which is again different from both the mean and the median.
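The numerical integration mentioned in the caption of Figure 8.4 can be reproduced outside a spreadsheet. The following sketch is one possible way to do it, assuming a midpoint rule on a fine grid; the helper names are illustrative. Small differences in the last digits relative to the values quoted in the caption are to be expected, since they depend on the numerical scheme and grid used.

```python
from math import sin, pi

def p(x):
    """Asymmetric density from Figure 8.4, on 0 <= x <= 6."""
    return (pi / 36) * x * sin(pi * x / 6)

def midpoint_integral(f, a, b, n=20000):
    """Approximate the integral of f over [a, b] using n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def median_by_accumulation(p, a, b, n=60000):
    """Accumulate area from the left until it first reaches 1/2."""
    h = (b - a) / n
    area = 0.0
    for i in range(n):
        area += p(a + (i + 0.5) * h) * h
        if area >= 0.5:
            return a + (i + 1) * h
    return b

total = midpoint_integral(p, 0.0, 6.0)                  # should be close to 1
mean = midpoint_integral(lambda x: x * p(x), 0.0, 6.0)  # first moment
median = median_by_accumulation(p, 0.0, 6.0)

# Most probable value: location of the maximum of p on a grid of spacing 0.001.
grid = [6.0 * i / 6000 for i in range(6001)]
mode = max(grid, key=p)

print("total area:", total)
print("mean:", mean, " median:", median, " mode:", mode)
```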
heights in a population, and explore how the distribution of radii is related to the distribution
of volumes in raindrop sizes. The interpretation of the probability density and the
cumulative function, as well as the means and medians in these cases will form the main
focus of our discussion.
Normalization
We first find the constant of normalization, i.e. find the constant C such that
$$\int_0^\infty p(t)\,dt = \int_0^\infty C e^{-kt}\,dt = 1.$$
Recall that an integral of this sort, in which one of the endpoints is at infinity is called an
improper integral36. Some care is needed in understanding how to handle such integrals,
and in particular when they “exist” (in the sense of producing a finite value, despite the
infinitely long domain of integration). We will delay full discussion to Chapter 10, and
state here the definition:
$$I = \int_0^\infty C e^{-kt}\,dt \equiv \lim_{T\to\infty} I_T \quad \text{where} \quad I_T = \int_0^T C e^{-kt}\,dt.$$
The idea is to compute an integral over a finite interval 0 ≤ t ≤ T and then take a limit as
the upper endpoint, T goes to infinity (T → ∞). We compute:
$$I_T = C \int_0^T e^{-kt}\,dt = C \left[ \frac{e^{-kt}}{-k} \right]_0^T = \frac{1}{k} C (1 - e^{-kT}).$$
Now we take the limit:
$$I = \lim_{T\to\infty} I_T = \lim_{T\to\infty} \frac{1}{k} C (1 - e^{-kT}) = \frac{1}{k} C \left(1 - \lim_{T\to\infty} e^{-kT}\right). \qquad (8.3)$$
To compute this limit, recall that for k > 0, T > 0, the exponential term in Eqn. 8.3 decays
to zero as T increases, so that
$$\lim_{T\to\infty} e^{-kT} = 0.$$
Thus, the second term in braces in the integral I in Eqn. 8.3 will vanish as T → ∞ so that
the value of the improper integral will be
$$I = \lim_{T\to\infty} I_T = \frac{1}{k} C.$$
To find the constant of normalization C we require that I = 1, i.e. $\frac{1}{k} C = 1$, which means
that
$$C = k.$$
Thus the (normalized) probability density for the decay is
$$p(t) = k e^{-kt}.$$
This means that the fraction of atoms that decay between time $t_1$ and $t_2$ is
$$k \int_{t_1}^{t_2} e^{-kt}\,dt.$$
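The limiting process behind the improper integral can be observed numerically. The sketch below is illustrative only: the decay rate k = 0.5 and the times t1 = 1, t2 = 3 are arbitrary assumed values, and `midpoint_integral` is a hypothetical helper. It shows the integral over 0 ≤ t ≤ T approaching 1 as T grows, and compares the fraction decaying between t1 and t2 with the value obtained by evaluating the integral by hand.

```python
from math import exp

def midpoint_integral(f, a, b, n=20000):
    """Approximate the integral of f over [a, b] using n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

k = 0.5  # assumed decay rate (per unit time), for illustration only

def p(t):
    """Normalized decay density p(t) = k e^{-kt}."""
    return k * exp(-k * t)

# The finite-interval integral I_T approaches 1 as T increases.
for T in (5.0, 20.0, 60.0):
    print("T =", T, " I_T =", midpoint_integral(p, 0.0, T))

# Fraction of atoms decaying between t1 and t2.
t1, t2 = 1.0, 3.0
frac = midpoint_integral(p, t1, t2)
exact = exp(-k * t1) - exp(-k * t2)   # antiderivative -e^{-kt} evaluated at the endpoints
print("fraction:", frac, " exact:", exact)
```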
36 We have already encountered such integrals in Sections 3.8.5 and 4.5. See also Chapter 10 for a more detailed
discussion of improper integrals.
Cumulative decays
The fraction of the atoms that decay between time 0 and time t (i.e. “any time up to time
t” or “by time t - note subtle wording”37) is
$$F(t) = \int_0^t p(s)\,ds = k \int_0^t e^{-ks}\,ds.$$
Thus, the probability of the atoms decaying by time t (which means anytime up to time t)
is
$$F(t) = 1 - e^{-kt}.$$
We note that F (0) = 0 and F (∞) = 1, as expected for the cumulative function.
We compute this integral again as an improper integral by taking a limit as the top endpoint
increases to infinity, i.e. we first find
$$I_T = \int_0^T t\,p(t)\,dt,$$
37 Note that the precise English wording is subtle, but very important here. “By time t” means that the event
$$\bar{t} = \lim_{T\to\infty} I_T = \frac{1}{k}.$$
$$\bar{t} = \frac{1}{k}.$$
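The step from $I_T$ to this limit can be filled in as follows. This is a sketch of one possible route, assuming integration by parts with $u = t$ and $dv = k e^{-kt}\,dt$ (so that $du = dt$ and $v = -e^{-kt}$):

```latex
\begin{aligned}
I_T &= \int_0^T t\,k e^{-kt}\,dt
     = \Big[ -t e^{-kt} \Big]_0^T + \int_0^T e^{-kt}\,dt \\
    &= -T e^{-kT} + \frac{1}{k}\left(1 - e^{-kT}\right), \\
\bar{t} &= \lim_{T\to\infty} I_T = 0 + \frac{1}{k}(1 - 0) = \frac{1}{k}.
\end{aligned}
```

Both $T e^{-kT}$ and $e^{-kT}$ tend to zero as $T \to \infty$ when $k > 0$, because the exponential decays faster than T grows.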
Figure 8.5. Refining a histogram by increasing the number of bins leads (eventually)
to the idea of a continuous probability density.
Suppose we want to record this distribution in more detail. We could divide the
population into smaller groups by shrinking the size of the interval or “bin” into which
height is subdivided. (An example is shown in Figure 8.5(b)). Here, by a “bin” we mean a
little interval of width ∆h where h is height, i.e. a height interval. For example, we could
keep track of the heights in increments of 50 cm. If we were to plot the number of students
in each height category, then as the size of the bins gets smaller, so would the height of the
bar: there would be fewer students in each category if we increase the number of categories.
To keep the bar height from shrinking, we might reorganize the data slightly. Instead
of plotting the number of students in each bin, we might plot
The important point to consider is that the height of each bar in the plot represents the
number of students per unit height.
38 I am grateful to David Austin for developing this example.
This type of plot is precisely what leads us to the idea of a density distribution. As
∆h shrinks, we get a continuous graph. If we “normalize”, i.e. divide by the total area
under the graph, we get a probability density, p(h) for the height of the population. As
noted, p(h) represents the fraction of students per unit height39 whose height is h. It is thus
a density, and has the appropriate units. In this case, p(h) ∆h represents the fraction of
individuals whose height is in the range h ≤ height ≤ h + ∆h.
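The rescaling described above, from counts per bin to a fraction per unit height, can be sketched in a few lines. The height data below are made up purely for illustration, and `density_histogram` is a hypothetical helper, not something defined in the text. The point of the sketch is that the total area of the bars stays equal to 1 no matter how narrow the bins are.

```python
# Hypothetical heights in cm (invented data, for illustration only).
heights = [152, 158, 160, 163, 165, 167, 168, 170, 171, 172,
           173, 175, 176, 178, 180, 181, 183, 185, 188, 193]

def density_histogram(data, lo, hi, dh):
    """Return (left_edges, densities), where each density is the
    fraction of the data per unit height in that bin."""
    nbins = int(round((hi - lo) / dh))
    counts = [0] * nbins
    for h in data:
        i = min(int((h - lo) // dh), nbins - 1)  # a value equal to hi goes in the last bin
        counts[i] += 1
    n = len(data)
    edges = [lo + i * dh for i in range(nbins)]
    densities = [c / (n * dh) for c in counts]   # divide by total count and bin width
    return edges, densities

for dh in (10, 5):
    edges, dens = density_histogram(heights, 150, 200, dh)
    area = sum(d * dh for d in dens)             # sum of (bar height) x (bar width)
    print("bin width", dh, "-> total area", area)
```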
(c) Suppose that we are told that F (75) = 0.8 and that F (80) differs from F (75) by
0.11. Interpret this information in plain English. What is the probability of surviving
to age 80? Which is larger, F (75) or F (80)?
(d) Use the information in part (c) to estimate the probability of dying between the ages
of 75 and 80 years old. Further, estimate p(80) from this information.
Solution
(a) The probability of dying by age a is the same as the probability of dying any time
up to age a. Restated, this is the probability that the age of death is in the interval
0 ≤ age of death ≤ a. The appropriate quantity is the cumulative function, for this
probability density
$$F(a) = \int_0^a p(x)\,dx.$$
(b) The probability of surviving to age a is the same as the probability of not dying
before age a. By the elementary properties of probability discussed in the previous
chapter, this is
1 − F (a).
(c) F (75) = 0.8 means that the probability of dying some time up to age 75 is 0.8.
(This also means that the probability of surviving past this age would be 1-0.8=0.2.)
From the properties of probability, we know that the cumulative distribution is an
increasing function, and thus it must be true that F (80) > F (75). Then F (80) =
F (75) + 0.11 = 0.8 + 0.11 = 0.91. Thus the probability of surviving to age 80
is 1-0.91=0.09. This means that 9% of the population will make it to their 80’th
birthday.
(d) The probability of dying between the ages of 75 and 80 years old is exactly
$$\int_{75}^{80} p(x)\,dx.$$
However, we can also state this in terms of the cumulative function, since
$$\int_{75}^{80} p(x)\,dx = \int_0^{80} p(x)\,dx - \int_0^{75} p(x)\,dx = F(80) - F(75) = 0.11.$$
Thus the probability of death between the ages of 75 and 80 is 0.11.
To estimate p(80), we use the connection between the probability density and the
cumulative distribution40:
$$p(x) = F'(x). \qquad (8.4)$$
Then it is approximately true that
$$p(x) \approx \frac{F(x + \Delta x) - F(x)}{\Delta x}. \qquad (8.5)$$
(Recall the definition of the derivative, and note that we are approximating the deriva-
tive by the slope of a secant line.) Here we have information at ages 75 and 80, so
∆x = 80 − 75 = 5, and the approximation is rather crude, leading to
$$p(80) \approx \frac{F(80) - F(75)}{5} = \frac{0.11}{5} = 0.022 \text{ per year}.$$
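The arithmetic in parts (c) and (d) amounts to a few differences of the cumulative function. The sketch below restates it in code using the values given in the problem; the function name `secant_density` is illustrative only.

```python
def secant_density(F_lo, F_hi, x_lo, x_hi):
    """Estimate the density by the slope of the secant line of F
    between x_lo and x_hi (a finite-difference approximation of F')."""
    return (F_hi - F_lo) / (x_hi - x_lo)

F75 = 0.80            # probability of dying by age 75 (given)
F80 = F75 + 0.11      # F is increasing, so F(80) exceeds F(75) by 0.11

survive_80 = 1.0 - F80                  # probability of surviving to age 80
prob_die_75_to_80 = F80 - F75           # probability of dying between 75 and 80
p80_estimate = secant_density(F75, F80, 75, 80)

print("P(survive to 80)    =", survive_80)
print("P(die between 75-80) =", prob_die_75_to_80)
print("p(80) estimate       =", p80_estimate, "per year")
```

With a smaller age interval the secant slope would give a better estimate of the density, but only the two values F(75) and F(80) are available here.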
Several important points merit attention in the above example. First, information contained
in the cumulative function is useful. Differences in values of F between x = a and x = b
are, after all, equivalent to an integral of the function, $\int_a^b p(x)\,dx$, and are the probability
of a result in the given interval, a ≤ x ≤ b. Second, p(x) is the derivative of F (x). In
the expression (8.5), we approximated that derivative by a small finite difference. Here we
see at play many of the themes that have appeared in studying calculus: the connection be-
tween derivatives and integrals, the Fundamental Theorem of Calculus, and the relationship
between tangent and secant lines.
40 In Eqn. (8.4) there is no longer confusion between a variable of integration and an endpoint, so we could
revert to the notation p(a) = F'(a), helping us to identify the independent variable as age. However, we have
avoided doing so simply so that the formula in Eqn. (8.5) would be very recognizable as an approximation for a
derivative.
(a) Determine the probability density for raindrop radii, p(r). Interpret the
meaning of that function.
(b) What is the associated cumulative function F (r) for this probability density? Inter-
pret the meaning of that function.
Solution
This problem is challenging because one may be tempted to think that the uniform distribu-
tion of drop radii should give a uniform distribution of drop volumes. This is not the case,
as the following argument shows! The sequence of steps is illustrated in Figure 8.6.
Figure 8.6. Probability densities for raindrop radius and raindrop volume (left
panels) and for the cumulative distributions (right) of each for Example 8.4.5.
(a) The probability density function is p(r) = 1/4 for 0 ≤ r ≤ 4. This means that
the probability per unit radius of finding a drop of size r is the same for all radii in
0 ≤ r ≤ 4, as shown in Fig. 8.6(a). Some of these drops will correspond to small
volumes, and others to very large volumes. We will see that the probability per unit
volume of finding a drop of given volume will be quite different.
(c) The cumulative function F (r) is proportional to the radius of the drop. We use the
connection between radii and volume of spheres to rewrite that function in terms of
the volume of the drop: Since
$$V = \frac{4}{3} \pi r^3 \qquad (8.7)$$
we have
$$r = \left(\frac{3}{4\pi}\right)^{1/3} V^{1/3}.$$
Substituting this expression into the formula (8.6), we get
$$F(V) = \frac{1}{4} \left(\frac{3}{4\pi}\right)^{1/3} V^{1/3}.$$
4 4π
(d) We now use the connection between the probability density and the cumulative distri-
bution, namely that p is the derivative of F . Now that the variable has been converted
to volume, that derivative is a little more “interesting”:
$$p(V) = F'(V).$$
Therefore,
$$p(V) = \frac{1}{4} \left(\frac{3}{4\pi}\right)^{1/3} \frac{1}{3} V^{-2/3}.$$
Thus the probability per unit volume of finding a drop of volume V in $0 \le V \le \frac{4}{3}\pi 4^3$
is not at all uniform. This probability density is shown in Fig. 8.6(c). This results
from the fact that the differential quantity dr behaves very differently from dV , and
reinforces the fact that we are dealing with density, not with a probability per se. We
note that this distribution has smaller values at larger values of V .
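The formulas for F(V) and p(V) can be sanity-checked numerically. The sketch below, with illustrative names `F_V`, `p_V` and `central_diff` that are not part of the text, verifies that F reaches 1 at the largest possible volume, that a finite-difference derivative of F matches p, and that the density is smaller at larger volumes.

```python
from math import pi

R_MAX = 4.0                                # largest drop radius in the example
V_MAX = (4.0 / 3.0) * pi * R_MAX**3        # corresponding largest volume

COEFF = 0.25 * (3.0 / (4.0 * pi)) ** (1.0 / 3.0)

def F_V(V):
    """Cumulative function in terms of volume."""
    return COEFF * V ** (1.0 / 3.0)

def p_V(V):
    """Probability density per unit volume, p(V) = F'(V)."""
    return COEFF * (1.0 / 3.0) * V ** (-2.0 / 3.0)

def central_diff(F, V, h=1e-4):
    """Central-difference approximation of F'(V)."""
    return (F(V + h) - F(V - h)) / (2.0 * h)

print("F(V_MAX) =", F_V(V_MAX))
print("F'(50) by differences =", central_diff(F_V, 50.0), " p(50) =", p_V(50.0))
print("p(10) =", p_V(10.0), " p(100) =", p_V(100.0))
```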
8.5. Moments of a probability density 171
zero’th moment $M_0 = \int_a^b f(x)\,dx$
first moment $M_1 = \int_a^b x\,f(x)\,dx$
second moment $M_2 = \int_a^b x^2 f(x)\,dx$
$\vdots$
n’th moment $M_n = \int_a^b x^n f(x)\,dx.$
Observe that moments of any order are defined by integrating the distribution f (x)
with a suitable power of x over the interval [a, b]. However, in practice we will see that
usually moments up to the second are usefully employed to describe common attributes of
a distribution.
(This follows from the basic property of a probability density.) Thus the zero’th moment
of any probability density is 1. Further,
$$M_1 = \int_a^b x\,p(x)\,dx = \bar{x} = \mu.$$
That is, the first moment of a probability density is the same as the mean (i.e. expected
value) of that probability density. So far, we have used the symbol x̄ to represent the mean
or average value of x but often the symbol µ is also used to denote the mean.
The second moment of a probability density also has a useful interpretation. From
the above definitions, the second moment of p(x) over the interval a ≤ x ≤ b is
$$M_2 = \int_a^b x^2 p(x)\,dx.$$
We will shortly see that the second moment helps describe the way that density is dis-
tributed about the mean. For this purpose, we must describe the notion of variance or
standard deviation.
Variance
The variance is defined as the average value of the quantity $(\text{distance from mean})^2$,
where the average is taken over the whole distribution. (The reason for the square is that
we would not like values to the left and right of the mean to cancel out.) For discrete
probability with mean, µ we define variance by
$$V = \sum (x_i - \mu)^2 p_i.$$
$$V = \int_a^b (x - \mu)^2 p(x)\,dx.$$
$$\sigma = \sqrt{V}.$$
Let us see what this implies about the connection between the variance and the moments
of the distribution.
$$V = \int_a^b x^2 p(x)\,dx - \int_a^b 2\mu x\,p(x)\,dx + \int_a^b \mu^2 p(x)\,dx$$
$$= \int_a^b x^2 p(x)\,dx - 2\mu \int_a^b x\,p(x)\,dx + \mu^2 \int_a^b p(x)\,dx.$$
We recognize the integrals in the above expression, since they are simply moments of the
probability distribution. Using the definitions, we arrive at
$$V = M_2 - 2\mu \cdot \mu + \mu^2.$$
Thus
$$V = M_2 - \mu^2.$$
Observe that the variance is related to the second moment, M2 and to the mean, µ of the
distribution.
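The relation between the variance and the moments can be tested on a concrete density. The sketch below is an illustration that reuses the density p(x) = (π/12) sin(πx/6) on 0 ≤ x ≤ 6 from the earlier example; the helper `moment` is a hypothetical name and the midpoint rule is an assumed numerical scheme. It computes the variance in two ways: directly from the definition, and from the second moment and the mean.

```python
from math import sin, pi, sqrt

def moment(p, a, b, n, steps=20000):
    """n-th moment of the density p on [a, b], by the midpoint rule."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += x**n * p(x)
    return total * h

def p(x):
    """Density from the earlier example, on 0 <= x <= 6."""
    return (pi / 12) * sin(pi * x / 6)

a, b = 0.0, 6.0
M0 = moment(p, a, b, 0)     # should be close to 1 (normalization)
M1 = moment(p, a, b, 1)     # the mean
M2 = moment(p, a, b, 2)
mu = M1

# Variance directly from the definition: zeroth moment of (x - mu)^2 p(x).
V_direct = moment(lambda x: (x - mu) ** 2 * p(x), a, b, 0)
# Variance from the moments.
V_moments = M2 - mu**2

print("M0 =", M0, " mean =", mu)
print("V (definition) =", V_direct, " V (M2 - mu^2) =", V_moments)
print("standard deviation =", sqrt(V_moments))
```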
$$\sigma = \sqrt{V} = \sqrt{M_2 - \mu^2}.$$
(This was already known, since we have determined that the zeroth moment of any proba-
bility density is 1.) We also find that
$$M_1 = \int_a^b x\,p(x)\,dx = \frac{1}{b-a} \int_a^b x\,dx = \frac{1}{b-a} \frac{x^2}{2} \Big|_a^b = \frac{b^2 - a^2}{2(b-a)}.$$
$$\mu = M_1 = \frac{(b-a)(b+a)}{2(b-a)} = \frac{b+a}{2}.$$
The value (b + a)/2 is the midpoint of the interval [a, b]. Thus we have found that the mean µ
is in the center of the interval, as expected for a symmetric distribution. The median would
be at the same place by a simple symmetry argument: half the area is to the left and half
the area is to the right of this point.
To find the variance we calculate the second moment,
$$M_2 = \int_a^b x^2 p(x)\,dx = \frac{1}{b-a} \int_a^b x^2\,dx = \left(\frac{1}{b-a}\right) \frac{x^3}{3} \Big|_a^b = \frac{b^3 - a^3}{3(b-a)}.$$
$$M_2 = \frac{(b-a)(b^2 + ab + a^2)}{3(b-a)} = \frac{b^2 + ab + a^2}{3}.$$
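Combining the two moments with the relation $V = M_2 - \mu^2$ gives the variance of the uniform distribution. The algebra below is a sketch of that remaining step, using only the expressions for $\mu$ and $M_2$ obtained for this example:

```latex
\begin{aligned}
V &= M_2 - \mu^2 = \frac{b^2 + ab + a^2}{3} - \frac{(b+a)^2}{4} \\
  &= \frac{4(b^2 + ab + a^2) - 3(b^2 + 2ab + a^2)}{12}
   = \frac{b^2 - 2ab + a^2}{12} = \frac{(b-a)^2}{12}, \\
\sigma &= \sqrt{V} = \frac{b-a}{2\sqrt{3}}.
\end{aligned}
```

As a quick check, for a = 0 and b = 6 this gives V = 3 and σ ≈ 1.73.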
41 As noted before, this is a uniform distribution. It has the shape of a rectangular band of height C and base
(b − a).
8.6 Summary
In this chapter, we extended the discrete probability encountered in Chapter 7 to the case of
continuous probability density. We learned that this function is a probability per unit value
(of the variable of interest), so that
$$\int_a^b p(x)\,dx = \text{probability that } x \text{ takes a value in the interval } (a, b).$$
and showed that the first few moments are related to mean and variance of the probability.
Most of these concepts are directly linked to the analogous ideas in discrete probability,
but in this chapter, we used integration in place of summation, to deal with the continuous,
rather than the discrete case.
Chapter 9
Differential Equations
9.1 Introduction
A differential equation is a relationship between some (unknown) function and one of its
derivatives. Examples of differential equations were encountered in an earlier calculus
course in the context of population growth, temperature of a cooling object, and speed of a
moving object subjected to friction. In Section 4.2.4, we reviewed an example of a differ-
ential equation for velocity, (4.8), and discussed its solution, but here, we present a more
systematic approach to solving such equations using a technique called separation of vari-
ables. In this chapter, we apply the tools of integration to finding solutions to differential
equations. The importance and wide applicability of this topic cannot be overstated.
In this course, since we are concerned only with functions that depend on a single
variable, we discuss ordinary differential equations (ODE’s), whereas later, after a
multivariate calculus course where partial derivatives are introduced, a wider class of partial
differential equations (PDE’s) can be studied. Such equations are encountered in many ar-
eas of science, and in any quantitative analysis of systems where rates of change are linked
to the state of the system. Most laws of physics are of this form; for example, applying
the familiar Newton’s law, F = ma, links the position of a pendulum’s mass to its accel-
eration (second derivative of position).42 Many biological processes are also described by
differential equations. The rate of growth of a population dN/dt depends on the size of
that population at the given time N (t).
Constructing the differential equation that adequately represents a system of interest
is an art that takes some thought and experience. In this process, which we call “modeling”,
many simplifications are made so that the essential properties of a given system are cap-
tured, leaving out many complicating details. For example, friction might be neglected in
“modeling” a perfect pendulum. The details of age distribution might be neglected in mod-
eling a growing population. Now that we have techniques for integration, we can devise a
new approach to computing solutions of differential equations.
Given a differential equation and a starting value, the goal is to make a prediction
42 Newton’s law states that force is proportional to acceleration. For a pendulum, the force is due to gravity, and
the acceleration is a second derivative of the x or y coordinate of the bob on the pendulum.
about the future behaviour of the system. This is equivalent to identifying the function that
satisfies the given differential equation and initial value(s). We refer to such a function as
the solution to the initial value problem (IVP). In differential calculus, our exploration
of differential equations was limited to those whose solution could be guessed, or whose
solution was supplied in advance. We also explored some of the fascinating geometric and
qualitative properties of such equations and their predictions.
Now that we have techniques of integration, we can find the analytic solution to a
variety of simple first-order differential equations (i.e. those involving the first derivative
of the unknown function). We will describe the technique of separation of variables. This
technique works for examples that are simple enough that we can isolate the dependent
variable (e.g. y) on one side of the equation, and the independent variable (e.g. time t) on
the other side.
where the rate of births is given by the product of the per capita average birth rate b and the
population size y. Similarly, the rate of mortality is given by my. Translating the rate of
43 Of course, we must keep in mind that such predictions are based on simplifying assumptions, and are to be
change over a corresponding interval y0 ≤ y ≤ y(T ). Here y0 is the given starting value
of y (prescribed by the initial condition in (9.1)). We do not yet know y(T ), but our goal
is to find that value, i.e. to predict the future behaviour of y. Integrating leads to
$$\int_{y_0}^{y(T)} \frac{1}{y}\,dy = \int_0^T k\,dt = k \int_0^T dt,$$
$$\ln|y| \Big|_{y_0}^{y(T)} = kt \Big|_0^T,$$
$$\frac{y(T)}{y_0} = e^{kT},$$
$$y(T) = y_0 e^{kT}.$$
But this result holds for any arbitrary final time, T . In other words, since this is true for any
time we choose, we can set T = t, arriving at the desired solution
$$y(t) = y_0 e^{kt}. \qquad (9.3)$$
The above formula relates the predicted value of y at any time t to its initial value, and to
all the parameters of the problem. Observe that plugging in t = 0, we get
$y(0) = y_0 e^{k \cdot 0} = y_0 e^0 = y_0$, so that the solution (9.3) satisfies the initial condition.
We leave as an exercise for the reader45 to validate that the function in (9.3) also satisfies
the differential equation in (9.1).
By solving the initial value problem (9.1), we have determined that, under ideal conditions,
when the net per capita growth rate k is constant, a population will grow exponentially
with time. Recall that this validates results that we had encountered in our first
calculus course.
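The exponential solution can be compared with a direct step-by-step simulation of the differential equation dy/dt = ky. The sketch below uses Euler's method, which is a swapped-in numerical technique and not the separation of variables used above; the values of y0, k and T are arbitrary illustrative choices, and `euler_growth` is a hypothetical name.

```python
from math import exp

def euler_growth(y0, k, T, steps):
    """Approximate y(T) for dy/dt = k*y, y(0) = y0, using Euler's method."""
    dt = T / steps
    y = y0
    for _ in range(steps):
        y += k * y * dt      # change in y over a short time dt is (rate) x dt
    return y

y0, k, T = 100.0, 0.3, 5.0   # assumed initial population, growth rate, final time
exact = y0 * exp(k * T)      # prediction of the analytic solution

for steps in (10, 100, 1000, 10000):
    approx = euler_growth(y0, k, T, steps)
    print(steps, "steps:", approx, " exact:", exact)
```

As the number of steps increases, the simulated value approaches the analytic prediction, which supports the result of the integration.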
result is the same as k times the original function, as required by the equation (9.1).
9.3. Terminal velocity and steady states 181
a(t) = g,
which is equivalent to the statement that the velocity increases at a constant rate,
$$\frac{dv}{dt} = g. \qquad (9.4)$$
Because g is constant, we do not need to use separation of variables, i.e. we can integrate
each side of this equation directly46 . Writing
$$\int \frac{dv}{dt}\,dt = \int g\,dt + C = g \int dt + C,$$
where C is an integration constant, we arrive at
v(t) = gt + C. (9.5)
Here we have used (on the LHS) that v is the antiderivative of dv/dt (equivalently, we can
simplify the integral $\int \frac{dv}{dt}\,dt = \int dv = v$). Plugging v(0) = 0 into Eqn. (9.5) leads to
0 = g · 0 + C = C, so the constant we need is C = 0 and the velocity satisfies
v(t) = gt.
We have just arrived at a result that parallels Eqn. (4.4) of Section 4.2.3 (in slightly different
notation).
ma(t) = mg − γv(t),
where γ is the frictional coefficient. For an object of constant mass, we can divide through
by m, so
$$a(t) = g - \frac{\gamma}{m} v(t).$$
46 It is important to note the distinction between this simple example and other cases where separation of vari-
ables is required. It would not be wrong to use separation of variables to find the solution for Eqn. (9.4), but it
would just be “overkill”, since simple integration of each side of the equation “as is” does the job.
182 Chapter 9. Differential Equations
Let k = γ/m. Then, the velocity at any time satisfies the differential equation and initial
condition
\[ \frac{dv}{dt} = g - kv, \qquad v(0) = 0. \qquad (9.6) \]
We can find the solution to this differential equation and predict the velocity at any time t
using separation of variables.
[Plot: velocity v (vertical axis, 0 to 20) versus time t (horizontal axis, 0 to 10); the curve levels off at the terminal velocity.]
Figure 9.1. The velocity v(t) as a function of time given by Eqn. (9.7) as found in
Section 9.3.2. Note that as time increases, the velocity approaches some constant terminal
velocity. The parameters used were g = 9.8 m/s^2 and k = 0.5.
Consider a time interval 0 ≤ t ≤ T , and suppose that, during this time interval, the
velocity changes from an initial value of v(0) = 0 to the final value, v(T ) at the final time,
T . Then using separation of variables and integration, we get
\[ \frac{dv}{dt} = g - kv, \]
\[ \frac{dv}{g - kv} = dt, \]
\[ \int_0^{v(T)} \frac{dv}{g - kv} = \int_0^T dt. \]
Substitute u = g − kv for the integral on the left hand side. Then du = −kdv, dv =
(−1/k)du, so we get an integral of the form
\[ -\frac{1}{k} \int \frac{1}{u}\, du = -\frac{1}{k} \ln|u|. \]
\[ -\frac{1}{k} \left( \ln|g - kv(T)| - \ln|g| \right) = T, \]
\[ -\frac{1}{k} \ln\left| \frac{g - kv(T)}{g} \right| = T, \]
\[ \ln\left| \frac{g - kv(T)}{g} \right| = -kT. \]
We are finished with the integration step, but the function we are trying to find, v(T )
is still tangled up inside an expression involving the natural logarithm. Extricating it will
involve some subtle reasoning about signs because there is an absolute value to contend
with. As a first step, we exponentiate both sides to remove the logarithm.
\[ \left| \frac{g - kv(T)}{g} \right| = e^{-kT} \quad \Rightarrow \quad |g - kv(T)| = g e^{-kT}. \]
Because the constant g is positive, we were able to remove the absolute value signs from it. To
simplify further, we have to consider the sign of the term inside the absolute value in the
numerator. In the case we are considering here, v(0) = 0. This will mean that the quantity
g − kv(T) is always non-negative (i.e. g − kv(T) ≥ 0). We will verify this fact shortly.
For the moment, supposing this is true, we can write
\[ g - kv(T) = g e^{-kT} \quad \Rightarrow \quad v(T) = \frac{g}{k} \left( 1 - e^{-kT} \right). \qquad (9.7) \]
This is the solution to the initial value problem (9.6). It predicts the velocity of the
falling object through time. Note that we have arrived once more at the result obtained
in Eqn. (4.11), but using the technique of separation of variables47 .
We graph the expression given in (9.7) in Figure 9.1. Note that as t increases, the
term $e^{-kt}$ decreases rapidly, so that the velocity approaches a constant whose value is
\[ v(t) \to \frac{g}{k}. \]
We call this the terminal velocity [48].
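As a numerical cross-check of this limiting behaviour, the following short Python sketch compares the closed form $v(t) = (g/k)(1 - e^{-kt})$, which is what the separation of variables steps lead to, with a simple Euler's method approximation of the initial value problem (9.6). The function names and the step count are illustrative choices made here, not part of the text; the parameter values g = 9.8 and k = 0.5 are those quoted in the caption of Figure 9.1.

```python
import math

def velocity_exact(t, g=9.8, k=0.5):
    """Closed-form solution of dv/dt = g - k*v with v(0) = 0."""
    return (g / k) * (1.0 - math.exp(-k * t))

def velocity_euler(t_end, steps, g=9.8, k=0.5):
    """Approximate v(t_end) by Euler's method, starting from v(0) = 0."""
    dt = t_end / steps
    v = 0.0
    for _ in range(steps):
        v += dt * (g - k * v)  # v_{n+1} = v_n + dt * (dv/dt at v_n)
    return v

if __name__ == "__main__":
    g, k = 9.8, 0.5
    print("terminal velocity g/k =", g / k)
    for t in (1.0, 5.0, 10.0, 20.0):
        # Both columns should approach g/k as t grows.
        print(t, velocity_exact(t, g, k), velocity_euler(t, 10000, g, k))
```

Both columns should settle near g/k as t grows, which is the terminal velocity discussed above.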
all evening at a uniform rate. Most people do not drink this way, instead quaffing a few large drinks over some
hour(s). It is possible to describe this, but we will not do so in this chapter.
50 This is also a simplifying assumption, as the rate of metabolism can depend on other factors, such as food
intake.
Figure 9.2. The level of alcohol in the blood is described by Eqn. (9.8) for the first
two hours of drinking. At t = 2h, the drinking stopped (so a = 0 from then on). The level
of alcohol in the blood then decays back to zero, following Eqn. (9.10).
The variable y(t) represents the concentration of chemical at time t, and the same
differential equation describes this chemical process. As above, given any initial level of
the substance, y(0) = y0, the level of y will eventually approach the steady state, y = a/b.
formulate a model, then it shows how the problem can be recast as a single ODE in one dependent variable.
Finally, it illustrates a slightly different integral.
52 As we have assumed that the hole is at h = 0, we henceforth consider the height of the fluid surface, h(t) to
Figure 9.3. We investigate the time it takes to empty a container full of fluid by
deriving a differential equation model and solving it using the methods developed in this
chapter. A is the cross-sectional area of the cylindrical tank, a is the cross-sectional area
of the hole through which fluid drains, v(t) is the velocity of the fluid, and h(t) is the time
dependent height of fluid remaining in the tank (indicated by the dashed line). The volume
of fluid leaking out in a time span ∆t is av∆t - see small cylindrical volume indicated on
the right.
We refer to V (t) as the volume of fluid in the container at time t. Note that for the
cylindrical container, V (t) = Ah(t) where A is the cross-sectional area and h(t) is the
height of the fluid at time t. The rate of change of V is
\[ \frac{dV}{dt} = -(\text{rate volume lost as fluid flows out}). \]
(The minus sign indicates that the volume is decreasing).
At every second, some amount of fluid leaves through the hole. Suppose we are
told that the velocity of the water molecules leaving the hole is precisely v(t) in units of
cm/sec. (We will find out how to determine this velocity shortly.) Then in one second,
those particles have moved a distance v cm/sec · 1 sec = v cm. In fact, all the particles in
a little cylinder of length v behind these molecules have also left the hole. Indeed, if we
know the area of the hole, we can determine precisely what volume of water exits through
the hole each second, namely
(The small inset in Fig. 9.3 shows a little “cylindrical unit” of fluid that flows out of the
hole per second. The area is a and the length of that little volume is v. Thus the volume
leaving per second is va.)
So far we have a relationship between the volume of fluid in the tank and the velocity
of the water exiting the hole:
\[ \frac{dV}{dt} = -av. \]
Now we need to determine the velocity v of the flow to complete the formulation of the
problem.
In fact, both the height of fluid and its exit velocity are constantly changing as the fluid
drains, so we might write $[v(t)]^2 = 2gh(t)$ or $v(t) = \sqrt{2gh(t)}$. We have arrived at this
result using an energy balance argument.
where k is a constant that depends on the size and shape of the cylinder and its hole:
\[ k = \frac{a}{A} \sqrt{2g}. \]
If the area of the hole is very small relative to the cross-sectional area of the tank, then
k will be very small, so that the tank will drain very slowly (i.e. the rate of change in h
per unit time will not be large). On a planet with a very high gravitational force, the same
tank will drain more quickly. A taller column of water drains faster. Once its height has
been reduced, its rate of draining also slows down. We comment that Equation (9.12) has
a minus sign, signifying that the height of the fluid decreases.
Using simple principles such as conservation of mass and conservation of energy,
we have shown that the height h(t) of water in the tank at time t satisfies the differential
equation (9.12). Putting this together with the initial condition (height of fluid $h_0$ at time
t = 0), we arrive at the initial value problem to solve:
\[ \frac{dh}{dt} = -k\sqrt{h}, \qquad h(0) = h_0. \qquad (9.13) \]
Clearly, this equation is valid only for h non-negative. We also remark that Eqn. (9.13) is
nonlinear [53], as it involves the variable h in a nonlinear term, $\sqrt{h}$. Next, we use separation
of variables to find the height as a function of time.
chosen in this chapter are simple enough that we will not experience the true challenges of such nonlinearities.
\[ \sqrt{h(T)} = -k\frac{T}{2} + \sqrt{h_0}, \]
\[ h(T) = \left( \sqrt{h_0} - k\frac{T}{2} \right)^2. \]
Since this is true for any time t, we can also write the form of the solution as
\[ h(t) = \left( \sqrt{h_0} - k\frac{t}{2} \right)^2. \qquad (9.14) \]
Eqn. (9.14) predicts fluid height remaining in the tank versus time t. In Fig. 9.4 we show
some of the “solution curves”54 , i.e. functions of the form Eqn. (9.14) for a variety of initial
fluid height values h0 . We can also use our results to predict the emptying time, as shown
in the next section.
[Plot: fluid height versus time t (horizontal axis, 0 to 20); a "V" on the time axis marks the emptying time.]
Figure 9.4. Solution curves obtained by plotting Eqn. (9.14) for three different
initial heights of fluid in the container, h0 = 2.5, 5, 10. The parameter k = 0.4 in
each case. The “V” points to the time it takes the tank to empty starting from a height of
h0 = 10.
54 As before, this figure was produced by plotting the analytic solution (9.14). A numerical method alternative
would use Euler’s Method and the spreadsheet to obtain the (approximate) solution directly from the initial value
problem (9.13).
9.6. Density dependent growth 191
\[ \left( \sqrt{h_0} - k\frac{t}{2} \right)^2 = 0. \]
The time it takes to empty the tank depends on the initial height of water in the tank. Three
examples are shown in Figure 9.4 for initial heights of h0 = 2.5, 5, 10. The emptying time
depends on the square root of the initial height. This means, for instance, that doubling the
height of fluid initially in the tank only increases the time it takes by a factor of $\sqrt{2} \approx 1.41$.
Making the hole smaller has a more direct “proportional” effect, since we have found that
$k = (a/A)\sqrt{2g}$.
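The square-root dependence can be checked with a few lines of Python. The sketch below evaluates the solution (9.14) and the emptying time obtained by setting $\sqrt{h_0} - kt/2 = 0$, that is $t = 2\sqrt{h_0}/k$. The function names are chosen here for illustration, and the clamp to zero after the tank has drained is an added assumption (the formula itself is only meaningful up to the emptying time). The values h0 = 2.5, 5, 10 and k = 0.4 are those of Figure 9.4.

```python
import math

def height(t, h0, k):
    """Fluid height from Eqn. (9.14); assumed to stay at 0 once the tank is empty."""
    root = math.sqrt(h0) - k * t / 2.0
    return root * root if root > 0.0 else 0.0

def emptying_time(h0, k):
    """Time at which sqrt(h0) - k*t/2 reaches zero."""
    return 2.0 * math.sqrt(h0) / k

if __name__ == "__main__":
    k = 0.4
    for h0 in (2.5, 5.0, 10.0):
        print("h0 =", h0, "emptying time =", emptying_time(h0, k))
    # Doubling the initial height multiplies the emptying time by sqrt(2).
    print(emptying_time(10.0, k) / emptying_time(5.0, k), math.sqrt(2.0))
```

Halving k (for example by halving the area of the hole) doubles every emptying time, which is the “proportional” effect mentioned above.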
Here r > 0 is called the intrinsic growth rate and K > 0 is called the carrying capacity.
K reflects the size of the population that can be sustained by the given environment. We
can understand this equation as a modified growth law in which the “density dependent”
term, r(K − N)/K, replaces the previous constant net growth rate k.
Replacing (N/K) by y in each case, we obtain the scaled equation and initial condition
given by
\[ \frac{dy}{dt} = ry(1 - y), \qquad y(0) = y_0. \qquad (9.16) \]
Now the variable y(t) measures population size in “units” of the carrying capacity, and
y0 = N0 /K is the scaled initial population level. Here again is an initial value problem,
like Eqn. (9.13), but unlike Eqn. (9.1), the logistic differential equation is nonlinear. That
is, the variable y appears in a nonlinear expression (in fact a quadratic) in the equation.
so that
A(1 − y) + By = 1.
This must be true for all y, and in particular, substituting in y = 0 and y = 1 leads to
A = 1, B = 1 so that
\[ I = \ln|y| - \ln|1 - y| = \ln\left| \frac{y}{1 - y} \right|. \]
That is, we want y as a function of t. After exponentiating both sides we need to remove
the absolute value. We will now assume that y is initially smaller than 1, and show that it
remains so. In that case, everything inside the absolute value is positive, and we can write
\[ \frac{y(t)}{1 - y(t)} = e^{rt + K} = e^K e^{rt} = Ce^{rt}. \]
In the above step, we have simply renamed the constant $e^K$ by the new name C for
simplicity. C > 0 is now also an arbitrary constant whose value will be determined from the
initial conditions. Indeed, if we substitute t = 0 into the most recent equation, we find that
\[ \frac{y(0)}{1 - y(0)} = Ce^0 = C, \]
so that
\[ C = \frac{y_0}{1 - y_0}. \]
We will use this fact shortly. What remains now is some algebra to isolate the desired
function y(t)
\[ y(t) = (1 - y(t))\, Ce^{rt}, \]
\[ y(t) \left( 1 + Ce^{rt} \right) = Ce^{rt}, \]
\[ y(t) = \frac{Ce^{rt}}{1 + Ce^{rt}} = \frac{1}{(1/C)e^{-rt} + 1}. \]
The desired function is now expressed in terms of the time t, and the constants r, C. We
can also express it in terms of the initial value of y, i.e. y0 , by using what we know to be
true about the constant C, i.e. C = y0 /(1 − y0 ). When we do so, we arrive at
\[ y(t) = \frac{1}{\frac{1 - y_0}{y_0} e^{-rt} + 1} = \frac{y_0}{y_0 + (1 - y_0)e^{-rt}}. \qquad (9.17) \]
Some typical solution curves of the logistic equation are shown in Fig. 9.5.
[Plot: y(t) (vertical axis, 0 to 1.0) versus time t (horizontal axis, 0 to 30).]
Figure 9.5. Solution curves for y(t) in the scaled form of the logistic equation
based on (9.18). We show the predicted behaviour of y(t) as given by Eqn. (9.17) for three
different initial conditions, y0 = 0.1, 0.25, 0.5. Note that all solutions approach the value
y = 1.
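Curves of this kind can be reproduced with the short Python sketch below, which evaluates $y(t) = y_0 / (y_0 + (1 - y_0)e^{-rt})$ for the three initial conditions in the caption and checks numerically that the formula satisfies dy/dt = ry(1 − y). The value r = 0.3 is a hypothetical choice (the growth rate used for the figure is not stated here), and the function names are illustrative.

```python
import math

def logistic(t, y0, r):
    """Scaled logistic solution y(t) with y(0) = y0 and growth rate r."""
    return y0 / (y0 + (1.0 - y0) * math.exp(-r * t))

def ode_residual(t, y0, r, h=1e-6):
    """Central-difference estimate of dy/dt minus r*y*(1 - y); should be near 0."""
    dydt = (logistic(t + h, y0, r) - logistic(t - h, y0, r)) / (2.0 * h)
    y = logistic(t, y0, r)
    return dydt - r * y * (1.0 - y)

if __name__ == "__main__":
    r = 0.3  # hypothetical growth rate, chosen only for illustration
    for y0 in (0.1, 0.25, 0.5):
        values = [round(logistic(t, y0, r), 4) for t in (0, 10, 20, 30)]
        print("y0 =", y0, values, "residual at t=5:", ode_residual(5.0, y0, r))
```

For each initial condition the printed values increase towards 1, in line with the behaviour described in the caption.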
We can convert this result to an equivalent expression for the unscaled total population N (t)
by recalling that y(t) = N (t)/K. Substituting this for y(t), and noting that y0 = N0 /K
leads to
\[ N(t) = \frac{K N_0}{N_0 + (K - N_0)e^{-rt}}. \qquad (9.19) \]
It is left as an exercise for the reader to check this claim.
Now recall that r > 0. This means that $e^{-rt}$ is a decreasing function of time.
Therefore, (9.18) implies that, after a long time, the term $e^{-rt}$ in the denominator will be
negligibly small, and so
\[ y(t) \to \frac{y_0}{y_0} = 1, \]
so that y will approach the value 1. This means that
The population will thus settle into a constant level, i.e., a steady state, at which no further
change will occur.
As an aside, we observe that this too, could have been predicted directly from the
differential equation. By setting dy/dt = 0, we find that
0 = ry(1 − y),
which suggests that y = 1 is a steady state. (This is also true for the less interesting case
of no population, i.e. y = 0 is also a steady state.) Similarly, this could have been found
by setting the derivative to zero in Eqn. (9.15), the original, unscaled logistic differential
equation. Doing so leads to
\[ \frac{dN}{dt} = 0 \quad \Rightarrow \quad rN \left( \frac{K - N}{K} \right) = 0. \]
If r > 0, the only values of N satisfying this steady state equation are N = 0 and N =
K. This implies that both N = 0 and N = K are steady states. The former is not too
interesting. It states the obvious fact that if there is no population, then there can be no
population growth. The latter reflects that N = K, the carrying capacity, is the population
size that will be sustained by the environment.
In summary, we have shown that the behaviour of the logistic equation for population
growth is more realistic than the simpler exponential growth we studied earlier. We saw
in Figure 9.5, that a small population will grow, but only up to some constant level (the
carrying capacity). Integration, and in particular the use of partial fractions allowed us to
make a full prediction of the behaviour of the population level as a function of time, given
by Eqn. (9.19).
a group of individuals born at the same time. Such a group is called a “cohort” [55]. In 1825,
Gompertz suggested that the rate of mortality, m, would depend on the age of the
individuals. Because we consider a group of people who were born at the same time, we can trade
“age” for “time”. Essentially, Gompertz assumed that mortality is not constant: it is low
at first, and increases as individuals age. Gompertz argued that mortality increases
exponentially. This turns out to be equivalent to the assumption that the logarithm of mortality
increases linearly with time [56]. It is easy to see that these two statements are equivalent:
Suppose we assume that for some constants A > 0, µ > 0,
[Plot: log mortality ln(m) versus age t, a straight line with slope µ.]
Figure 9.6. In the Gompertz Law of Mortality, it is assumed that the log of mor-
tality increases linearly with time, as depicted by Eqn. 9.20 and by the solid curve in this
diagram. Here the slope of ln(m) versus time (or age) is µ. For real populations, the
mortality looks more like the dashed curve.
2. There is “natural” mortality, but no other type of removal. This means we ignore the
mortality caused by epidemics, by violence and by wars.
3. We consider a single cohort, and assume that no new individuals are introduced (e.g.
by immigration)57.
We will now study the size of a “cohort”, i.e. a group of people who were born in the same
year. We will denote by N (t) the number of people in this group who are alive at time t,
where t is time since birth, i.e. age. Let N (0) = N0 be the initial number of individuals in
the cohort.
The rate of change of cohort size = −[number of deaths per unit time]
= −[mortality rate] · [cohort size]
\[ \frac{dN(t)}{dt} = -m(t)N(t), \]
and using information about the size of the cohort at birth leads to the initial condition,
N (0) = N0 . Together, this leads to the initial value problem
\[ \frac{dN(t)}{dt} = -m(t)N(t), \qquad N(0) = N_0. \]
Note the similarity to Eqn. (9.1), but now mortality is time-dependent.
In the Problem set, we apply separation of variables and integrate over the time
interval [0, T] to show that the remaining population at age t is
\[ N(t) = N_0\, e^{-\frac{m_0}{\mu} \left( e^{\mu t} - 1 \right)}. \]
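The survival formula can be checked numerically. The Python sketch below assumes the exponential mortality law $m(t) = m_0 e^{\mu t}$ (the form consistent with the solution above) and compares the closed-form cohort size with an Euler's method approximation of dN/dt = −m(t)N. The parameter values N0 = 1000, m0 = 0.0001 and µ = 0.085 are hypothetical, chosen only to produce readable numbers; they are not fitted to any data.

```python
import math

def mortality(t, m0, mu):
    """Assumed Gompertz mortality: m(t) = m0 * exp(mu * t)."""
    return m0 * math.exp(mu * t)

def cohort_exact(t, n0, m0, mu):
    """Closed-form cohort size N(t) = N0 * exp(-(m0/mu) * (exp(mu*t) - 1))."""
    return n0 * math.exp(-(m0 / mu) * (math.exp(mu * t) - 1.0))

def cohort_euler(t_end, steps, n0, m0, mu):
    """Euler's method for dN/dt = -m(t) * N with N(0) = n0."""
    dt = t_end / steps
    n = n0
    for i in range(steps):
        n -= dt * mortality(i * dt, m0, mu) * n
    return n

if __name__ == "__main__":
    n0, m0, mu = 1000.0, 1e-4, 0.085  # hypothetical illustrative values
    for age in (20.0, 40.0, 60.0, 80.0):
        print(age, cohort_exact(age, n0, m0, mu), cohort_euler(age, 100000, n0, m0, mu))
```

The two columns should agree closely, and the cohort shrinks slowly at first and then more and more rapidly as mortality grows with age.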
9.8 Summary
In this chapter, we used integration methods to find the analytical solutions to a variety of
differential equations where initial values were prescribed.
We investigated a number of population growth models:
1. Exponential growth, given by dy/dt = ky, with initial population level y(0) = y0
was investigated (Eqn. (9.1)). This model had an unrealistic feature that growth is
unlimited.
57 Note that new births would contribute to other cohorts.
2. The Logistic equation $\frac{dN}{dt} = rN\left(\frac{K - N}{K}\right)$ was analyzed (Eqn. (9.15)), showing that
density-dependent growth can correct for the above unrealistic feature.
3. The Gompertz equation, $\frac{dN(t)}{dt} = -m(t)N(t)$, was solved to understand how
age-dependent mortality affects a cohort of individuals.
10.1 Introduction
This chapter has several important and challenging goals. The first of these is to under-
stand how concepts that were discussed for finite series and integrals can be meaningfully
extended to infinite series and improper integrals - i.e. integrals of functions over an infi-
nite domain. In this part of the discussion, we will find that the notion of convergence and
divergence will be important.
A second theme will be that of approximation of functions in terms of power series,
also called Taylor series. Such series can be described informally as infinite polynomials
(i.e. polynomials containing infinitely many terms). Understanding when these objects are
meaningful is also related to the issue of convergence, so we use the background assembled
in the first part of the chapter to address such concepts arising in the second part.
[Plot: the curve y = f(x) with its linear approximation (LA) and a higher order approximation (HOA) near the point x0.]
Figure 10.1. The function y = f (x) (solid heavy curve) is shown together with its
linear approximation (LA, dashed line) at the point x0 , and a better “higher order” approx-
imation (HOA, thin solid curve). Notice that this better approximation stays closer to the
graph of the function near x0 . In this chapter, we discuss how such better approximations
can be obtained.
The theme of approximation has appeared often in our calculus course. In a previous
200 Chapter 10. Infinite series, improper integrals, and Taylor series
That is, we consider the infinite series as the limit of partial sums Sn as the number of
terms n is increased. If this limit exists, we say that the infinite series converges58 to S.
This leads to the following conclusion:
58 If the limit does not exist, we say that the series diverges.
10.2. Convergence and divergence of series 201
If this inequality is not satisfied, then we say that this sum does not exist (meaning that it is
not finite).
It is important to remember that an infinite series, i.e. a sum with infinitely many
terms added up, can exhibit either one of these two very different behaviours. It may
converge in some cases, as the first example shows, or diverge (fail to converge) in other
cases. We will see examples of each of these trends again. It is essential to be able to
distinguish the two. Divergent series (or series that diverge under certain conditions) must
be handled with particular care, for otherwise, we may find contradictions or “seemingly
reasonable” calculations that have meaningless results.
We can think of convergence or divergence of series using a geometric analogy. If we
start on the number line at the origin and take a sequence of steps $\{a_0, a_1, a_2, \ldots, a_k, \ldots\}$,
we can think of $S = \sum_{k=0}^{\infty} a_k$ as the total distance we have travelled. S converges if that
distance remains finite and if we approach some fixed number.
[Figure: steps along the number line, with panels labelled "convergence" and "divergence".]
In order for the sum of ‘infinitely many things’ to add up to a finite number, the
terms have to get smaller. But just getting smaller is not, in itself, enough to guarantee
convergence. (We will show this later on by considering the harmonic series.) There are
rigorous mathematical tests which help determine whether a series converges or not. We
discuss some of these tests in Appendix 11.9.
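The difference between the two behaviours can be seen by tabulating partial sums. The Python sketch below does this for a geometric series with ratio 1/2, whose terms shrink and whose sum converges, and for the harmonic series, whose terms also shrink but whose partial sums keep growing. The helper name `partial_sums` and the numbers of terms are illustrative choices.

```python
def partial_sums(term, n_terms):
    """Return [S_1, ..., S_n], where S_n is the sum of the first n terms."""
    sums, total = [], 0.0
    for k in range(n_terms):
        total += term(k)
        sums.append(total)
    return sums

# Geometric series 1 + 1/2 + 1/4 + ... (converges to 2).
geometric = partial_sums(lambda k: 0.5 ** k, 50)
# Harmonic series 1 + 1/2 + 1/3 + ... (terms shrink, yet the sum diverges).
harmonic = partial_sums(lambda k: 1.0 / (k + 1), 100000)

if __name__ == "__main__":
    for n in (10, 50):
        print("geometric S_%d = %.10f" % (n, geometric[n - 1]))
    for n in (10, 1000, 100000):
        print("harmonic  S_%d = %.4f" % (n, harmonic[n - 1]))
```

The geometric partial sums stop changing in the displayed digits after a few dozen terms, while the harmonic partial sums continue to creep upward, though very slowly.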
Definition
An improper integral is an integral performed over an infinite domain, e.g.
\[ \int_a^{\infty} f(x)\, dx. \]
The value of such an integral is understood to be a limit, as given in the following definition:
\[ \int_a^{\infty} f(x)\, dx = \lim_{b \to \infty} \int_a^b f(x)\, dx, \]
i.e. we evaluate an improper integral by first computing a definite integral over a finite
domain a ≤ x ≤ b, and then taking a limit as the endpoint b moves off to larger and larger
values. The definite integral can be interpreted as an area under the graph of the function.
The essential question being addressed here is whether that area remains bounded when we
include the “infinite tail” of the function (i.e. as the endpoint b moves to larger values.) For
some functions (whose values get small enough fast enough) the answer is “yes”.
Definition
When the limit shown above exists, we say that the improper integral converges.
Otherwise we say that the improper integral diverges.
With this in mind, we compute a number of classic integrals:
We have used the fact that $\lim_{b \to \infty} e^{-rb} = 0$ since (for r, b > 0) the exponential function
$e^{-rb}$ is decreasing with increasing b. Thus the limit exists (is finite) and the integral converges.
In fact it converges to the value I = 1/r.
[Plot: the curves y = 1/x and y = 1/x^2 for x ≥ 1, with a heavy arrow along the x axis.]
Figure 10.3. In Sections 10.3.2 and 10.3.3, we consider two functions whose
values decrease along the x axis, f(x) = 1/x and f(x) = 1/x^2. We show that one, but not
the other, encloses a finite (bounded) area over the interval (1, ∞). To do so, we compute
an improper integral for each one. The heavy arrow is meant to remind us that we are
considering areas over an unbounded domain.
\[ I = \lim_{b \to \infty} \ln(b) = \infty. \]
The fact that we get an infinite value for this integral follows from the observation that
ln(b) increases without bound as b increases, that is, the limit does not exist (is not finite).
Thus the area under the curve f(x) = 1/x over the interval 1 ≤ x < ∞ is infinite. We say
that the improper integral of 1/x diverges (or does not converge). We will use this result
again in Section 10.4.1.
59 We do not choose the interval (0, ∞) because this function is undefined at x = 0. We want here to emphasize
Thus, the limit exists, and, in fact, I = 1, so, in contrast to the Example 10.3.2, this integral
converges.
We observe that the behaviours of the improper integrals of the functions 1/x and
1/x^2 are very different. The former diverges, while the latter converges. The only
difference between these functions is the power of x. As shown in Figure 10.3, that power affects
how rapidly the graph “falls off” to zero as x increases. The function 1/x^2 decreases much
faster than 1/x. (Consequently 1/x^2 has a sufficiently “slim” infinite “tail” that the area
under its graph does not become infinite, which is not an easy concept to digest!) This observation
leads us to wonder what power p is needed to make the improper integral of a function
1/x^p converge. We answer this question below.
Thus this integral converges provided that the term $b^{1-p}$ does not “blow up” as b increases.
For this to be true, we require that the exponent (1 − p) should be negative, i.e. 1 − p < 0
or p > 1. In this case, we have
\[ I = \frac{1}{p - 1}. \]
To summarize our result,
\[ \int_1^{\infty} \frac{1}{x^p}\, dx \quad \text{converges if } p > 1, \text{ diverges if } p \le 1. \]
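To get a feeling for this result, the Python sketch below evaluates the definite integral of $x^{-p}$ from 1 to b exactly, using the antiderivative $(b^{1-p} - 1)/(1 - p)$ for p ≠ 1 and ln(b) for p = 1, and prints it for increasingly large b. The function name and the sample values of p and b are illustrative choices.

```python
import math

def integral_one_to_b(p, b):
    """Exact value of the integral of x**(-p) over 1 <= x <= b."""
    if p == 1.0:
        return math.log(b)
    return (b ** (1.0 - p) - 1.0) / (1.0 - p)

if __name__ == "__main__":
    for p in (0.5, 1.0, 1.01, 2.0):
        values = [integral_one_to_b(p, b) for b in (1e2, 1e4, 1e6)]
        limit = 1.0 / (p - 1.0) if p > 1.0 else float("inf")
        print("p =", p, values, "limit as b grows:", limit)
```

For p = 2 the values settle quickly near 1/(p − 1) = 1. For p = 1.01 they are still far below the limit 100 at b = 10^6, so convergence can be very slow when p is only slightly larger than 1. For p ≤ 1 the values keep growing.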
10.3. Improper integrals 205
Examples: $1/x^p$ that do or do not converge
1. The integral
\[ \int_1^{\infty} \frac{1}{\sqrt{x}}\, dx \quad \text{diverges.} \]
We see this from the following argument: $\sqrt{x} = x^{1/2}$, so $p = \frac{1}{2} < 1$. Thus, by the
general result, this integral diverges.
2. The integral
\[ \int_1^{\infty} x^{-1.01}\, dx \quad \text{converges.} \]
Suppose we are given two functions, f (x) and g(x), both continuous on some infinite
interval [a, ∞). Suppose, moreover, that, at all points on this interval, the first function is
smaller than the second, i.e.
0 ≤ f (x) ≤ g(x).
Then the following conclusions (see note a) can be made:
1. $\int_a^{\infty} f(x)\, dx \le \int_a^{\infty} g(x)\, dx$. (This means that the area under f(x) is smaller than
the area under g(x).)
2. If $\int_a^{\infty} g(x)\, dx$ converges, then $\int_a^{\infty} f(x)\, dx$ converges. (If the larger area is finite,
so is the smaller one.)
3. If $\int_a^{\infty} f(x)\, dx$ diverges, then $\int_a^{\infty} g(x)\, dx$ diverges. (If the smaller area is infinite,
so is the larger one.)
a These statements have to be carefully noted. What is assumed and what is concluded works “one way”. That
is the order “if..then” is important. Reversing that order leads to a common error.
60 The reader should notice the similarity of these ideas to the comparisons made for infinite series in the
Appendix 11.9.2. This similarity stems from the fact that there is a close connection between series and integrals,
a recurring theme in this course.
We establish that the harmonic series diverges by comparing it to the improper integral of
the related function [62]
\[ y = f(x) = \frac{1}{x}. \]
61 We have already noticed a similar surprise in connection with the improper integral of 1/x. These two
“surprises” are closely related, as we show here using a comparison of the series and the integral.
62 This function is “related” since for integer values of x, the function takes on values that are the same as
successive terms in the series, i.e. if x = k is an integer, then f (x) = f (k) = 1/k
10.4. Comparing integrals and series 207
Figure 10.4. The harmonic series is a sum that corresponds to the area under the
staircase shown above. Note that we have purposely shown the stairs arranged so that they
are higher than the function. This is essential in drawing the conclusion that the sum of the
series is infinite: It is larger than an area under 1/x that we already know to be infinite, by
Section 10.3.2.
In Figure 10.4 we show on one graph a comparison of the area under this curve, and a
staircase area representing the first few terms in the harmonic series. For the area of the
staircase, we note that the width of each step is 1, and the heights form the sequence
\[ \left\{ 1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \ldots \right\}. \]
Thus the area of (infinitely many of) these steps can be expressed as the (infinite) harmonic
series,
\[ A = 1 \cdot 1 + 1 \cdot \frac{1}{2} + 1 \cdot \frac{1}{3} + 1 \cdot \frac{1}{4} + \ldots = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \ldots = \sum_{k=1}^{\infty} \frac{1}{k}. \]
On the other hand, the area under the graph of the function y = f(x) = 1/x for 1 ≤ x < ∞
is given by the improper integral
\[ \int_1^{\infty} \frac{1}{x}\, dx. \]
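The staircase comparison can be illustrated numerically. Since the first n steps of the staircase sit above the curve 1/x for 1 ≤ x ≤ n + 1, the partial sum of the first n terms of the harmonic series should be at least the integral of 1/x from 1 to n + 1, which is ln(n + 1). The Python sketch below checks this for a few values of n; the function names are illustrative choices.

```python
import math

def harmonic_partial_sum(n):
    """Area of the first n steps of the staircase: 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

def area_under_curve(n):
    """Integral of 1/x from 1 to n + 1, the region lying under those n steps."""
    return math.log(n + 1)

if __name__ == "__main__":
    for n in (1, 10, 100, 10000):
        s, a = harmonic_partial_sum(n), area_under_curve(n)
        print(n, s, a, s >= a)
```

Because ln(n + 1) grows without bound, the partial sums, which are at least as large, must grow without bound as well.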
converges, since p = 2 > 1. Notice, however, that the comparison does not give us a value
to which the sum converges. It merely indicates that the series does converge.
\[ p(x) = x^5 + 2x^2 + 3x + 2. \]
Our introduction to differential calculus started with such functions for a reason: these
functions are convenient and simple to handle. We found long ago that it is easy to compute
derivatives of polynomials. The same can be said for integrals. One of our first examples,
in Section 3.6.1 was the integral of a polynomial. We needed only use a power rule to
integrate each term. An additional convenience of polynomials is that “evaluating” the
function, (i.e. plugging in an x value and determining the corresponding y value) can be
done by simple multiplications and additions, i.e. by basic operations easily handled by
an ordinary calculator. This is not the case for, say, trigonometric functions, exponential
10.5. From geometric series to Taylor polynomials 209
functions, or for that matter, most other functions we considered63. For this reason, being
able to approximate a function by a polynomial is an attractive proposition. This forms our
main concern in the sections that follow.
We can arrive at connections between several functions and their polynomial approx-
imations by exploiting our familiarity with the geometric series. We use both the results
for convergence of the geometric series (from Section 10.2) and the formula for the sum of
that series to derive a number of interesting, (somewhat haphazard) results64 .
Recall from Sections 1.7.1 and 10.2 that the sum of an infinite geometric series is
\[ S = 1 + r + r^2 + \ldots + r^k + \ldots = \sum_{k=0}^{\infty} r^k = \frac{1}{1 - r}, \quad \text{provided } |r| < 1. \qquad (10.3) \]
To connect this result to a statement about a function, we need a “variable”. Let us consider
the behaviour of this series when we vary the quantity r. To emphasize that now r is our
variable, it will be helpful to change notation by substituting r = x into the above equation,
while remembering that the formula in Eqn. (10.3) holds only provided |r| = |x| < 1.
Then for every x in −1 < x < 1, it is true that f (x) can be approximated by terms in the
polynomial
\[ p(x) = 1 + x + x^2 + \ldots \qquad (10.6) \]
In other words, by (10.3), for |x| < 1 the two expressions “are the same”, in the sense that
the polynomial converges to the value of the function. We refer to this p(x) as an (infinite)
Taylor polynomial65 or simply a Taylor series for the function f (x). The usefulness of this
kind of result can be illustrated by a simple example.
Example 10.1 (Using the Taylor Series (10.6) to approximate the function (10.5)) Compute
the value of the function f (x) given by Eqn. (10.5) for x = 0.1 without using a calculator.
63 For example, to find the decimal value of sin(2.5) we would need a scientific calculator. These days the
distinction is blurred, since powerful hand-held calculators are ubiquitous. Before such devices were available,
the ease of evaluating polynomials made them even more important.
64 We say “haphazard” here because we are not yet at the point of a systematic procedure for computing a
Taylor Series. That will be done in Section 10.6. Here we “take what we can get” using simple manipulations of
a geometric series.
65 A Taylor polynomial contains finitely many terms, n, whereas a Taylor series has n → ∞.
Solution: Plugging in the value x = 0.1 into the function directly leads to 1/(1 − 0.1) =
1/0.9, whose evaluation with no calculator requires long division66 . Using the polynomial
representation, we have a simpler method:
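As a rough sketch of this computation in Python, assuming the function is f(x) = 1/(1 − x), as the value 1/(1 − 0.1) above suggests, we can add up the first few terms of 1 + x + x^2 + ... at x = 0.1 and compare with the directly computed value. The function name and the numbers of terms are illustrative choices.

```python
def geometric_partial_sum(x, n_terms):
    """Sum of the first n_terms terms of 1 + x + x**2 + ..."""
    total, power = 0.0, 1.0
    for _ in range(n_terms):
        total += power
        power *= x  # next power of x, using one multiplication per term
    return total

if __name__ == "__main__":
    x = 0.1
    exact = 1.0 / (1.0 - x)
    for n in (2, 3, 4, 8):
        approx = geometric_partial_sum(x, n)
        print(n, approx, "error:", abs(exact - approx))
```

Each extra term contributes one more decimal digit here, because successive powers of 0.1 shrink by a factor of ten.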
We provide a few other examples based on substitutions of various sorts using the geomet-
ric series as a starting point.
\[ \frac{1}{1 - (-t)} = 1 + (-t) + (-t)^2 + (-t)^3 + \ldots \]
\[ \frac{1}{1 + t} = 1 - t + t^2 - t^3 + t^4 + \ldots \quad \text{provided } |t| < 1. \]
This means we have produced a series expansion for the function 1/(1 + t). We can go
farther with this example by a new manipulation, whereby we integrate both sides to arrive
at a new function and its expansion, shown next.
66 This example is slightly “trivial”, in the sense that evaluating the function itself is not very difficult. However,
in other cases, we will find that the polynomial expansion is the only way to find the desired value.
The formula appended on the right is just a compact notation that represents the pattern of
the terms. Recall that in Chapter 1, we have gotten thoroughly familiar with such summa-
tion notation67.
Example 10.2 (Evaluating the logarithm for x = 0.25) An expansion for the logarithm
is definitely useful, in the sense that (without a scientific calculator or log tables) it is not
possible to easily calculate the value of this function at a given point. For example, for x =
0.25, we cannot find ln(1 + 0.25) = ln(1.25) using simple operations, whereas the value
of the first few terms of the series is computable by simple multiplication, division, and
addition ($0.25 - \frac{0.25^2}{2} + \frac{0.25^3}{3} \approx 0.2239$). (A scientific calculator gives ln(1.25) ≈ 0.2231,
so the approximation produced by the series is relatively good.)
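The quality of the approximation as more terms are included can be explored with the Python sketch below. It adds up the first n terms of the series $x - x^2/2 + x^3/3 - \ldots$ used in the example and compares the result with the library value of ln(1 + x). The function name is an illustrative choice.

```python
import math

def log1p_series(x, n_terms):
    """Partial sum x - x**2/2 + x**3/3 - ... using n_terms terms."""
    total = 0.0
    for k in range(1, n_terms + 1):
        total += (-1) ** (k + 1) * x ** k / k
    return total

if __name__ == "__main__":
    x = 0.25
    exact = math.log(1.0 + x)
    for n in (1, 2, 3, 5, 10):
        approx = log1p_series(x, n)
        print(n, approx, "error:", abs(exact - approx))
```

With three terms the error is already below 0.001, and it keeps shrinking as terms are added, since x = 0.25 lies well inside the interval |x| < 1.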
When is the series for ln(1 + x) in (10.7) expected to converge? Retracing our steps
from the beginning of Example 10.5.2 we note that the value of t is not permitted to leave
the interval |t| < 1 so we need also |x| < 1 in the integration step68 . We certainly cannot
expect the series for ln(1 + x) to converge when |x| > 1. Indeed, for x = −1, we have
ln(1 + x) = ln(0) which is undefined. Also note that for x = −1 the right hand side of
(10.7) becomes
\[ -\left( 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \ldots \right). \]
This is the recognizable harmonic series (multiplied by -1). But we already know from
Section 10.4.1 that the harmonic series diverges. Thus, we must avoid x = −1, since
the expansion will not converge there, and neither is the function defined. This example
illustrates that outside the interval of convergence, the series and the function become
“meaningless”.
Example 10.3 (An expansion for ln(2)) Strictly speaking, our analysis does not predict
what happens if we substitute x = 1 into the expansion of the function found in Sec-
tion 10.5.3, because this value of x is outside of the permitted range −1 < x < 1 in which
the Taylor series can be guaranteed to converge. It takes some deeper mathematics (Abel’s
theorem) to prove that the result of this substitution actually makes sense, and converges,
i.e. that
\[ \ln(2) = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \ldots \]
We state without proof here that the alternating harmonic series converges to ln(2).
in the first few terms of such a series in any approximation of practical value.
68 Strictly speaking, we should have ensured that we are inside this interval of convergence before we computed
\[ \frac{1}{1+t^2} = 1 - t^2 + t^4 - t^6 + t^8 + \ldots = \sum_{n=0}^{\infty} (-1)^n t^{2n} \]
This series will converge provided |t| < 1. Now integrate both sides, and recall that the
antiderivative of the function 1/(1 + t2 ) is arctan(t). Then
\[ \int_0^x \frac{1}{1+t^2}\, dt = \int_0^x (1 - t^2 + t^4 - t^6 + t^8 + \ldots)\, dt \]
\[ \arctan(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \ldots = \sum_{k=1}^{\infty} (-1)^{k+1} \frac{x^{2k-1}}{2k-1}. \qquad (10.8) \]
Example 10.4 (An expansion for π) For a particular application of this expansion, con-
sider plugging in x = 1 into Equation (10.8). Then
\[ \arctan(1) = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \ldots \]
But arctan(1) = π/4. Thus we have found a way of computing the irrational number π,
namely
\[ \pi = 4 \left( 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \ldots \right) = 4 \sum_{k=1}^{\infty} (-1)^{k+1} \frac{1}{2k-1}. \]
While it is true that this series converges, the convergence is slow. (This can be seen by
adding up the first 100 or 1000 terms of this series with a spreadsheet.) This means that it
is not practical to use such a series as an approximation for π. (There are other series that
converge to π very rapidly that are used in any practical application.)
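The slow convergence can be seen without a spreadsheet. The sketch below sums the first n terms of the series for π; the name `leibniz_pi` is our own choice for illustration.

```python
import math


def leibniz_pi(n_terms):
    """Approximate pi by 4 * sum_{k=1}^{n_terms} (-1)^(k+1) / (2k - 1)."""
    return 4 * sum((-1) ** (k + 1) / (2 * k - 1) for k in range(1, n_terms + 1))


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        approx = leibniz_pi(n)
        # The error shrinks only roughly like 1/n.
        print(n, approx, abs(approx - math.pi))
```

Each tenfold increase in the number of terms gains only about one more correct decimal digit, which is why this series is not used in practice.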
Here we will use the function to directly determine the coefficients ak . To determine a0 ,
let x = 0 and note that
\[ f(0) = a_0 + a_1 \cdot 0 + a_2 \cdot 0^2 + a_3 \cdot 0^3 + \ldots = a_0. \]
We conclude that
a0 = f (0).
69 The development of this section was motivated by online notes by David Austin.
10.6. Taylor Series: a systematic approach 213
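Here we have used the notation $f^{(k)}(x)$ to denote the k'th derivative of the function. Now
evaluate each of the above derivatives at x = 0. Then
\[
\begin{aligned}
f'(0) &= a_1, &&\Rightarrow\quad a_1 = f'(0), \\
f''(0) &= 2a_2, &&\Rightarrow\quad a_2 = \frac{f''(0)}{2}, \\
f'''(0) &= 2 \cdot 3\, a_3, &&\Rightarrow\quad a_3 = \frac{f'''(0)}{2 \cdot 3}, \\
f^{(k)}(0) &= k!\, a_k, &&\Rightarrow\quad a_k = \frac{f^{(k)}(0)}{k!}.
\end{aligned}
\]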
This gives us a recipe for calculating all coefficients ak . This means that if we can compute
all the derivatives of the function f (x), then we know the coefficients of the Taylor series
as well. Because we have evaluated all the coefficients by the substitution x = 0, we say
that the resulting power series is the Taylor series of the function about x = 0.
This is a very interesting series. We state here without proof that this series converges for
all values of x. Further, the function defined by the series is in fact equal to $e^x$, that is,
\[ e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \ldots = \sum_{k=0}^{\infty} \frac{x^k}{k!} \]
214 Chapter 10. Infinite series, improper integrals, and Taylor series
The implication is that the function $e^x$ is completely determined (for all x values)
by its behaviour (i.e. derivatives of all orders) at x = 0. In other words, the value of the
function at x = 1,000,000 is determined by the behaviour of the function around x = 0.
This means that $e^x$ is a very special function with superior “predictable features”. If a
function f(x) agrees with its Taylor series on a region (−a, a), as was the case here,
we say that f is analytic on this region. It is known that $e^x$ is analytic for all x.
We can use the results of this example to establish the fact that the exponential function
grows “faster” than any power function $x^n$. That is the same as saying that the ratio of
$e^x$ to $x^n$ (for any power n) increases with x. We leave this as an exercise for the reader.
We can also easily obtain a Taylor series expansion for functions related to $e^x$, without
assembling the derivatives. We start with the result that
\[ e^u = 1 + u + \frac{u^2}{2} + \frac{u^3}{6} + \ldots = \sum_{k=0}^{\infty} \frac{u^k}{k!} \]
\[ f(x) = \sin x, \quad f'(x) = \cos x, \quad f''(x) = -\sin x, \quad f'''(x) = -\cos x, \quad f^{(4)}(x) = \sin x, \ldots \]
\[ \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \ldots = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n+1)!}. \]
We state here without proof that the function sin(x) is analytic, so that the expansion
converges to the function for all x.
It is instructive to demonstrate how successive terms in a Taylor series expansion
lead to approximations that improve. Doing this kind of thing will be the subject of the last
computer laboratory exercise in this course.
[Figure 10.5: graphs of sin(x) and the Taylor polynomials T1, T2, T3, T4, plotted for 0 ≤ x ≤ 7 and −2 ≤ y ≤ 2.]
Here we demonstrate this idea with the expansion for the function sin(x) that we just
obtained. To see this, consider the sequence of polynomials
\[
\begin{aligned}
T_1(x) &= x, \\
T_2(x) &= x - \frac{x^3}{3!}, \\
T_3(x) &= x - \frac{x^3}{3!} + \frac{x^5}{5!}, \\
T_4(x) &= x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!}.
\end{aligned}
\]
Then these polynomials provide a better and better approximation to the function sin(x)
close to x = 0. The first of these is just a linear (or tangent line) approximation that we
had studied long ago. The second improves this with a cubic approximation, etc. Figure 10.5
illustrates how the first few Taylor polynomials approximate the function sin(x)
near x = 0. Observe that as we keep more terms, n, in the polynomial $T_n(x)$, the approximating
curve “hugs” the graph of sin(x) over a longer and longer range. The student will
be asked to use the spreadsheet, together with some calculations as done in this section, to
produce a composite graph similar to Fig. 10.5 for some other function.
Example 10.5 (The error in successive approximations) For a given value of x close to
the base point (at x = 0), the error in the approximation between the polynomials and
the function is the vertical distance between the graphs of the polynomial and the function
sin(x) (shown in black). For example, at x = 2 radians, sin(2) ≈ 0.9093 (as found on
a scientific calculator). The approximations are: $T_1(2) = 2$, which is very inaccurate,
$T_2(2) = 2 - 2^3/3! \approx 0.667$, which is too small, $T_3(2) \approx 0.9333$, which is much closer, and
$T_4(2) \approx 0.9079$, which is closer still. In general, we can approximate the size of the error using
the next term that would occur in the polynomial if we kept a higher order expansion. The
details of estimating such errors are omitted from our discussion.
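The values quoted in this example can be reproduced numerically. This is a minimal sketch, assuming the Taylor polynomials of sin(x) about x = 0 as defined above; `sin_taylor` is a name introduced here for illustration only.

```python
import math


def sin_taylor(x, n):
    """T_n(x): the first n nonzero terms of the Taylor series of sin about 0."""
    return sum((-1) ** j * x ** (2 * j + 1) / math.factorial(2 * j + 1)
               for j in range(n))


if __name__ == "__main__":
    x = 2.0
    for n in range(1, 5):
        approx = sin_taylor(x, n)
        # Compare each polynomial with the calculator value of sin(2).
        print(n, approx, abs(approx - math.sin(x)))
```

For x = 2 the error of T4 is smaller than the next omitted term, 2⁹/9!, which is consistent with the rule of thumb stated in the example.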
We also note that all polynomials that approximate sin(x) contain only odd powers
of x. This stems from the fact that sin(x) is an odd function, i.e. its graph is symmetric to
rotation about the origin, a concept we discussed in an earlier term.
The Taylor series for cos(x) could be found by a similar sequence of steps. But in
this case, this is unnecessary: We already know the expansion for sin(x), so we can find the
Taylor series for cos(x) by simple differentiation term by term. (Since sin(x) is analytic,
this is permitted for all x.) We leave as an exercise for the reader to show that
\[ \cos(x) = 1 - \frac{x^2}{2} + \frac{x^4}{4!} - \frac{x^6}{6!} + \ldots = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(2n)!}. \]
Since cos(x) has symmetry properties of an even function, we find that its Taylor series is
composed of even powers of x only.
A simple substitution (e.g. $u = x^2$) will not work here, and we cannot find an antiderivative.
Here is how we might approach the problem using Taylor series: We know that the
10.7. Application of Taylor series 217
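\[ \sin t = t - \frac{t^3}{3!} + \frac{t^5}{5!} - \frac{t^7}{7!} + \ldots \]
Substituting $t = x^2$, we have
\[ \sin(x^2) = x^2 - \frac{x^6}{3!} + \frac{x^{10}}{5!} - \frac{x^{14}}{7!} + \ldots \]
In spite of the fact that we cannot antidifferentiate the function, we can antidifferentiate the
Taylor series, just as we would a polynomial:
\[
\begin{aligned}
\int_0^1 \sin(x^2)\, dx &= \int_0^1 \left( x^2 - \frac{x^6}{3!} + \frac{x^{10}}{5!} - \frac{x^{14}}{7!} + \ldots \right) dx \\
&= \left. \left( \frac{x^3}{3} - \frac{x^7}{7 \cdot 3!} + \frac{x^{11}}{11 \cdot 5!} - \frac{x^{15}}{15 \cdot 7!} + \ldots \right) \right|_0^1 \\
&= \frac{1}{3} - \frac{1}{7 \cdot 3!} + \frac{1}{11 \cdot 5!} - \frac{1}{15 \cdot 7!} + \ldots
\end{aligned}
\]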
This is an alternating series whose terms decrease in magnitude to zero, so we know that it converges. If we add up the first four terms,
the pattern becomes clear: the series converges to approximately 0.31026.
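The partial sums of this series are easy to tabulate. The sketch below assumes the general term (−1)^j / ((4j + 3)(2j + 1)!), which is our reading of the pattern 1/3 − 1/(7·3!) + 1/(11·5!) − 1/(15·7!) + ...; the function name is hypothetical.

```python
import math


def sin_x2_integral(n_terms):
    """Partial sum of sum_{j>=0} (-1)^j / ((4j + 3) * (2j + 1)!)."""
    return sum((-1) ** j / ((4 * j + 3) * math.factorial(2 * j + 1))
               for j in range(n_terms))


if __name__ == "__main__":
    for n in range(1, 7):
        # Successive partial sums settle down after only a few terms.
        print(n, sin_x2_integral(n))
```

Because the factorials in the denominators grow so fast, four terms already agree with later partial sums to about six decimal places.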
\[ \frac{dy}{dx} = y, \qquad y(0) = 1. \]
Indeed, from previous work, we know that the solution of this differential equation and initial
condition is $y(x) = e^x$, but we will pretend that we do not know this fact in illustrating
the usefulness of Taylor series. In some cases, where separation of variables does not work,
this option would have great practical value.
Let us express the “unknown” solution to the differential equation as
\[ y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + \ldots \]
\[ \frac{dy}{dx} = a_1 + 2a_2 x + 3a_3 x^2 + 4a_4 x^3 + \ldots \]
But according to the differential equation, $dy/dx = y$. Thus, it must be true that the two Taylor
series match, i.e.
\[ a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \ldots = a_1 + 2a_2 x + 3a_3 x^2 + 4a_4 x^3 + \ldots \]
This equality holds for all values of x. This can only happen if the coefficients of like terms
are the same, i.e. if the constant terms on either side of the equation are equal, if the terms of
the form Cx2 on either side are equal, and so on for all powers of x. Equating coefficients,
we obtain:
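\[
\begin{aligned}
a_0 &= a_1 = 1, &&\Rightarrow\quad a_1 = 1, \\
a_1 &= 2a_2, &&\Rightarrow\quad a_2 = \frac{a_1}{2} = \frac{1}{2}, \\
a_2 &= 3a_3, &&\Rightarrow\quad a_3 = \frac{a_2}{3} = \frac{1}{2 \cdot 3}, \\
a_3 &= 4a_4, &&\Rightarrow\quad a_4 = \frac{a_3}{4} = \frac{1}{2 \cdot 3 \cdot 4}, \\
a_{n-1} &= n a_n, &&\Rightarrow\quad a_n = \frac{a_{n-1}}{n} = \frac{1}{1 \cdot 2 \cdot 3 \ldots n} = \frac{1}{n!}.
\end{aligned}
\]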
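The recursion for the coefficients can be carried out mechanically. The following sketch assumes the starting value a_0 = 1 (from y(0) = 1) and the rule a_n = a_{n−1}/n; the helper names `series_coefficients` and `evaluate` are ours.

```python
import math
from fractions import Fraction


def series_coefficients(n_max):
    """Return a_0..a_n_max from a_0 = 1 and a_n = a_{n-1} / n."""
    coeffs = [Fraction(1)]
    for n in range(1, n_max + 1):
        coeffs.append(coeffs[-1] / n)
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the truncated series sum a_k x^k at the point x."""
    return sum(float(a) * x ** k for k, a in enumerate(coeffs))


if __name__ == "__main__":
    coeffs = series_coefficients(15)
    print(coeffs[:5])                     # 1, 1, 1/2, 1/6, 1/24
    print(evaluate(coeffs, 1.0), math.e)  # truncated series vs e
```

Exact fractions are used so that the comparison with 1/n! is not blurred by rounding.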
10.8 Summary
The main points of this chapter can be summarized as follows:
whereas
\[ I = \int_1^{\infty} \frac{1}{x}\, dx \quad \text{diverges.} \]
4. Using a comparison between integrals and series we showed that the harmonic series,
\[ \sum_{k=1}^{\infty} \frac{1}{k} = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \ldots + \frac{1}{k} + \ldots \quad \text{diverges.} \]
5. More generally, our results led to the conclusion that the “p” series,
\[ \sum_{k=1}^{\infty} \frac{1}{k^p} \quad \text{converges if } p > 1, \text{ diverges if } p \le 1. \]
6. We studied Taylor series and showed that some can be found using the formula for
convergent geometric series. Two examples of Taylor series that were obtained in
this way are
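\[ \ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \ldots \quad \text{for } |x| < 1 \]
and
\[ \arctan(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \ldots \quad \text{for } |x| < 1 \]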
7. In discussing Taylor series, we considered some of the following questions: (a) For
what range of values of x can we expect the series to converge? (b) Suppose we
approximate the function on the left by a finite number of terms on the right. How
good is that approximation? Another interesting question is: (c) If we include more
and more such terms, does that approximation get better and better? (i.e., does the
series converge to the function?) (d) Is the convergence rate rapid? Some of these
questions occupy the attention of career mathematicians, and are beyond the scope
of this introductory calculus course.
8. More generally, we showed that the Taylor series for a function about x = 0,
\[ f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \ldots = \sum_{k=0}^{\infty} a_k x^k, \]
has coefficients
\[ a_k = \frac{f^{(k)}(0)}{k!}. \]
9. We discussed some of the applications of Taylor series. We used Taylor series to
approximate a function, to find an approximation for a definite integral of a function,
and to solve a differential equation.
Chapter 11
Appendix
\[ \sum_{k=1}^{N} k^2 = \frac{N(N+1)(2N+1)}{6}, \]
using a technique called induction. The idea of the method is to check that the formula
works for one or two simple cases (e.g. the “sum” of just one or just two terms of the
series), and then show that whenever it works for one case (summing up to N ), it has to
also work for the next case (summing up to N + 1).
First, we verify that this formula works for a few test cases:
N = 1: If there is only one term, then clearly, by inspection,
\[ \sum_{k=1}^{1} k^2 = 1^2 = 1. \]
222 Chapter 11. Appendix
Now the sum of the first N + 1 squares will be just a bit bigger: it will have one more term
added to it:
\[ S_{N+1} = \sum_{k=1}^{N+1} k^2 = \sum_{k=1}^{N} k^2 + (N+1)^2. \]
Thus
\[ S_{N+1} = \frac{N(N+1)(2N+1)}{6} + (N+1)^2. \]
Combining terms, we get
\[ S_{N+1} = (N+1) \left[ \frac{N(2N+1)}{6} + (N+1) \right], \]
\[ S_{N+1} = (N+1)\, \frac{2N^2 + N + 6N + 6}{6} = (N+1)\, \frac{2N^2 + 7N + 6}{6}. \]
Simplifying and factoring the last term leads to
\[ S_{N+1} = (N+1)\, \frac{(2N+3)(N+2)}{6}. \]
We want to check that this still agrees with what the formula predicts. To make the notation
simpler, we will let M stand for N + 1. Then, expressing the result in terms of the quantity
M = N + 1 we get
\[ S_M = \sum_{k=1}^{M} k^2 = (N+1)\, \frac{[2(N+1)+1][(N+1)+1]}{6} = M\, \frac{[2M+1][M+1]}{6}. \]
This is the same formula as we started with, only written in terms of M instead of N . Thus
we have verified that the formula works. By Mathematical Induction we find that the result
has been proved.
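A numerical check is no substitute for the induction proof, but it is a useful sanity test. The sketch below compares the direct sum with the formula and also checks the induction step S_{N+1} = S_N + (N + 1)² for a range of N; the function names are ours.

```python
def sum_of_squares_direct(n):
    """Add up 1^2 + 2^2 + ... + n^2 term by term."""
    return sum(k * k for k in range(1, n + 1))


def sum_of_squares_formula(n):
    """Closed form N(N+1)(2N+1)/6; the product is always divisible by 6."""
    return n * (n + 1) * (2 * n + 1) // 6


if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        print(n, sum_of_squares_direct(n), sum_of_squares_formula(n))
```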
11.2. Riemann Sums: Extensions and other examples 223
\[ (k+1)^3 - (k-1)^3 = 6k^2 + 2, \]
so
\[ \sum_{k=1}^{n} \left[ (k+1)^3 - (k-1)^3 \right] = \sum_{k=1}^{n} (6k^2 + 2). \]
But looking more carefully at the left hand side (LHS), we see that
\[ \sum_{k=1}^{n} \left( (k+1)^3 - (k-1)^3 \right) = 2^3 - 0^3 + 3^3 - 1^3 + 4^3 - 2^3 + 5^3 - 3^3 + \ldots + (n+1)^3 - (n-1)^3. \]
Most of the terms cancel, leaving only $-1 + n^3 + (n+1)^3$, so this means that
\[ -1 + n^3 + (n+1)^3 = 6 \sum_{k=1}^{n} k^2 + \sum_{k=1}^{n} 2, \]
so
\[ \sum_{k=1}^{n} k^2 = (-1 + n^3 + (n+1)^3 - 2n)/6 = (2n^3 + 3n^2 + n)/6. \]
Similarly, the formula for $\sum_{k=1}^{n} k^3$ can be obtained by starting with
\[ y = f(x) = x^2 + 2x + 1, \qquad a \le x \le b. \]
70 I want to thank Robert Israel for contributing this material
Here the interval is a ≤ x ≤ b. Let us leave the values of a, b general for a moment, and
consider how the calculation is set up in this case. Then we have
length of interval = $b - a$
number of segments = $N$
width of rectangular strips = $\Delta x = \frac{b-a}{N}$
the k'th x value = $x_k = a + k\,\frac{(b-a)}{N}$
height of k'th rectangular strip = $f(x_k) = x_k^2 + 2x_k + 1$
as the area under the function $f(x) = x^2 + 2x + 1$, over the interval a ≤ x ≤ b. Observe
that the solution depends on a and b. (The endpoints of the interval influence the total
area under the curve.) For example, if the given interval happens to be 2 ≤ x ≤ 4, then,
substituting a = 2, b = 4 into the above result for A leads to
\[ A = (2+1)^2 (4-2) + (2+1)(4-2)^2 + \frac{(4-2)^3}{3} = 18 + 12 + \frac{8}{3} = \frac{98}{3}. \]
In the next chapter, we will show that the tools of integration will lead to the same conclu-
sion.
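The setup above (N strips of width Δx = (b − a)/N, heights taken at x_k = a + kΔx) translates directly into a short computation. This is a sketch only; `right_riemann_sum` and `area_formula` are names we introduce, and the closed form in `area_formula` is the expression (a+1)²(b−a) + (a+1)(b−a)² + (b−a)³/3 that the worked substitution a = 2, b = 4 is consistent with.

```python
def f(x):
    """The function from this example."""
    return x ** 2 + 2 * x + 1


def right_riemann_sum(func, a, b, n):
    """Sum of n rectangular strips with heights at the right endpoints."""
    dx = (b - a) / n
    return sum(func(a + k * dx) for k in range(1, n + 1)) * dx


def area_formula(a, b):
    """Assumed limiting value of the strip sum for f(x) = (x + 1)^2."""
    return (a + 1) ** 2 * (b - a) + (a + 1) * (b - a) ** 2 + (b - a) ** 3 / 3


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        # The strip sums approach 98/3 as the number of strips grows.
        print(n, right_riemann_sum(f, 2, 4, n), area_formula(2, 4))
```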
Figure 11.1. The area under the curve y = f (x) over an interval a ≤ x ≤ b could
be computed by using either a left or right endpoint approximation. That is, the heights of
the rectangles are adjusted to match the function of interest either on their right or on their
left corner. Here we compare the two approaches. Usually both lead to the same result
once a limit is computed to arrive at the “true” area.
\[ \text{Right endpoints:} \quad A_{N\,\text{strips}} = \sum_{k=1}^{N} f(x_k)\, \Delta x. \]
\[ \text{Left endpoints:} \quad A_{N\,\text{strips}} = \sum_{k=0}^{N-1} f(x_k)\, \Delta x. \]
\[ f(x) = x^2, \qquad 0 \le x \le 1, \]
We now compare the right and left endpoint approximation. These are shown in panels of
Figure 11.2. Note that
\[ \Delta x = \frac{1}{N}, \qquad x_k = \frac{k}{N}, \]
The area of the k'th rectangle is
\[ a_k = f(x_k) \times \Delta x = (k/N)^2\, (1/N), \]
The first rectangle corresponds to k = 0 in the left endpoint approximation (rather than
k = 1 in the right endpoint approximation). But the k = 0 rectangle makes no contribution
(as its area is zero in this example) and we have one less rectangle at the right endpoint of
the interval, since the N’th rectangle is k = N − 1. Then the sum is
\[ A_{N\,\text{strips}} = \frac{1}{N^3} \left( \frac{(2(N-1)+1)(N-1)(N)}{6} \right) = \frac{(2N-1)(N-1)}{6N^2}. \]
\[ A = \lim_{N \to \infty} A_{N\,\text{strips}} = \lim_{N \to \infty} \frac{(2N-1)(N-1)}{6N^2} = \frac{2}{6} = \frac{1}{3}. \]
We see that, after computing the limit, the result for the “true area” under the curve is
exactly the same as we found earlier in this chapter using the right endpoint approximation.
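The comparison between left and right endpoints can also be made numerically. In this sketch (function names are ours), the left sum is checked against the closed form (2N − 1)(N − 1)/(6N²) derived above, and the gap between the two sums is the area of the one "missing" rectangle, 1/N.

```python
def left_sum(n):
    """Left endpoint sum for f(x) = x^2 on [0, 1] with n strips."""
    return sum((k / n) ** 2 for k in range(0, n)) / n


def right_sum(n):
    """Right endpoint sum for f(x) = x^2 on [0, 1] with n strips."""
    return sum((k / n) ** 2 for k in range(1, n + 1)) / n


def left_closed_form(n):
    """The expression (2N - 1)(N - 1) / (6 N^2) from the text."""
    return (2 * n - 1) * (n - 1) / (6 * n ** 2)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # Both sums squeeze toward 1/3 from opposite sides.
        print(n, left_sum(n), right_sum(n))
```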
Figure 11.2. Rectangles with left or right corners on the graph of y = x2 are
compared in this picture. The approximation shown in pink is “missing” the largest rect-
angle shown in green. However, in the limit as the number of rectangles, N → ∞, the true
area obtained is the same.
subdivides the distribution into two equal masses (or, more generally, produces equal sized
areas under the graph of the density function.) The center of mass assigns a greater weight
to parts of the distribution that are “far away” in the same sense. (However, for symmetric
distributions, the median and the mean are the same.)
In physics, we speak of the “moment of mass” of a distribution about a point. This
quantity is related to the tendency of the mass to contribute a torque, i.e. to make the
object rotate. Suppose we are interested in a particular point of reference x. In a discrete
mass distribution, for example, the moment of mass of each of the beads relative to point
x is given by the product of the mass and its distance away from the point - as with the
teeter totter, beads farther away will contribute more torque than beads closer to point x,
and heavier beads (i.e. greater mass) will contribute more torque than lighter beads. For
example, mass 1 contributes an amount m1 (x − x1 ) to the total moment of mass of the
distribution about the point x. Altogether the moment of mass of the distribution about the
point x is defined as
\[ M_1(x) = \sum_{i=1}^{n} m_i (x - x_i). \]
The center of mass is a special point x̄ such that the moment of mass about that point is
zero. (Loosely speaking, the tendencies to rotate to the left and to the right are the same; thus the
distribution would be balanced if it “rested on that point”.)
Figure 11.3. A discrete set of masses m1 , m2 , m3 is distributed at positions
x1 , x2 , x3 . The center of mass of the distribution is the position at which the given mass
distribution would balance, here represented by the white triangle.
\[ \bar{x} \left( \sum_{i=1}^{n} m_i \right) - \left( \sum_{i=1}^{n} m_i x_i \right) = 0. \]
But we already know that the first summation above is just the total mass, so that
\[ \bar{x} M - \left( \sum_{i=1}^{n} m_i x_i \right) = 0, \]
so, taking the second term to the other side and dividing by M leads to
\[ \bar{x} = \frac{1}{M} \sum_{i=1}^{n} m_i x_i. \]
We have recovered precisely the definition of the center of mass or “average x coordinate”.
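The balance condition can be illustrated with a few beads. The masses and positions below are made-up values chosen only for illustration, and the function names are ours; the sketch computes x̄ and then confirms that the moment of mass about x̄ vanishes.

```python
def center_of_mass(masses, positions):
    """x_bar = (1/M) * sum of m_i x_i, with M the total mass."""
    total_mass = sum(masses)
    return sum(m * x for m, x in zip(masses, positions)) / total_mass


def moment_of_mass(masses, positions, x):
    """M_1(x) = sum of m_i (x - x_i)."""
    return sum(m * (x - xi) for m, xi in zip(masses, positions))


if __name__ == "__main__":
    masses = [1.0, 2.0, 3.0]      # hypothetical bead masses
    positions = [0.0, 1.0, 4.0]   # hypothetical bead positions
    x_bar = center_of_mass(masses, positions)
    print(x_bar, moment_of_mass(masses, positions, x_bar))
```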
11.4. The shell method for computing volumes 229
Figure 11.4. Top: The curve that generates the cone (left) and the shape of the
cone (right). Bottom: the cone showing one of the series of shells that are used in this
example to calculate its volume.
We use the shell method71 to find the volume of the cone formed by rotating the curve
y =1−x
Solution
We show the cone and its generating curve in Figure 11.4, together with a representative
shell used in the calculation of total volume. The volume of a cylindrical shell of radius r,
height h and thickness τ is
\[ V_{\text{shell}} = 2\pi r h \tau. \]
We will place these shells one inside the other so that their radii are parallel to the x axis
(so r = x). The heights of the shells are determined by their y value (i.e. h = y = 1 − x =
71 Note to the instructor: This material may be skipped in the interest of time. It presents an alternative to the
disk method, but there may not be enough time to cover this in detail.
1 − r). For the tallest shell r = 0, and for the flattest shell r = 1. The thickness of the shell
is ∆r. Therefore, the volume of one shell is
\[ V_{\text{shell}} = 2\pi r (1 - r)\, \Delta r. \]
The volume of the object is obtained by summing up these shell volumes. In the limit,
as ∆r → dr gets infinitesimally small, we recognize this as a process of integration. We
integrate over 0 ≤ r ≤ 1, to obtain:
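\[ V = 2\pi \int_0^1 r(1 - r)\, dr = 2\pi \int_0^1 (r - r^2)\, dr. \]
We find that
\[ V = 2\pi \left. \left( \frac{r^2}{2} - \frac{r^3}{3} \right) \right|_0^1 = 2\pi \left( \frac{1}{2} - \frac{1}{3} \right) = \frac{\pi}{3}. \]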
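The summing of shell volumes can be imitated numerically before passing to the limit. This is a rough sketch, assuming shells of thickness Δr = 1/n with the radius sampled at the middle of each shell (a choice we make for accuracy; the text does not specify where r is sampled). The function name is ours.

```python
import math


def cone_volume_by_shells(n):
    """Add up n shell volumes 2*pi*r*(1 - r)*dr for 0 <= r <= 1."""
    dr = 1.0 / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr  # radius at the middle of the i-th shell
        total += 2 * math.pi * r * (1 - r) * dr
    return total


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # The shell sums approach pi/3 as the shells get thinner.
        print(n, cone_volume_by_shells(n), math.pi / 3)
```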
Simplify further:
\[ I = \frac{1}{2} (\ln 5 - \ln 3) + \frac{2}{3} (\ln 8 - \ln 5) = -\frac{1}{6} \ln 5 - \frac{1}{2} \ln 3 + \frac{2}{3} \ln 8. \]
This method can be used to solve any integral that contains a fraction with a degree 1
polynomial in the numerator and a degree 2 polynomial (that has two roots) in the denominator.
This is just saying that the sum of the number of students in every one of the categories has
to add up to the total class size. The fraction of the class that scored grade xi is
\[ \frac{p_i}{N}. \]
(Dividing by N has normalized the distribution. The value $p_i/N$ is the empirical probability
of getting grade $x_i$.) The mean or average grade is:
\[ \bar{X} = \frac{1}{N} \sum_{i=0}^{50} x_i p_i. \]
11.6. Analysis of data: a student grade distribution 233
Figure 11.5. Distributions of grades on a test with 50 point maximum. There were
a total of 76 students writing the test. The mean grade 31.9 is shown.
Table 11.1. Distribution of grades (out of 50) for a class of 76 students. The mean
grade for this class is 31.9474.
gories according to the proportion of the class that was in that category. (The terminology
weighted average is sometimes used.)
We define the mean or average grade in the distribution by
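\[ \bar{x} = \sum_{i=1}^{M} \tilde{x}_i\, \frac{p_i}{N}, \qquad (11.1) \]
where M is the number of bins. An equivalent way of expressing the mean (average) is:
\[ \bar{x} = \frac{1}{N} \sum_{i=1}^{M} \tilde{x}_i p_i = \frac{\sum_{i=1}^{M} \tilde{x}_i p_i}{\sum_{i=1}^{M} p_i}. \qquad (11.2) \]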
The sum in the denominator of this last fraction is simply the total class size.
In Table 11.1, we show steps in the calculation of the mean grade for the class. This
calculation is easily handled on the same spreadsheet that recorded the frequency of grades
and that was used to plot the bar graph of that distribution. Equations 11.1 and 11.2 are
saying the same thing. We will see the second of these again in the context of a more
general probability distribution in Chapter 8.
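The equivalence of Equations 11.1 and 11.2 can be checked on a small example. The bin values and counts below are invented for illustration (they are not the data of Table 11.1), and the function names are ours.

```python
def mean_from_fractions(bin_values, counts):
    """Equation 11.1: weight each bin value by the fraction p_i / N."""
    n = sum(counts)
    return sum(x * (p / n) for x, p in zip(bin_values, counts))


def mean_from_counts(bin_values, counts):
    """Equation 11.2: divide the weighted total by the total count."""
    return sum(x * p for x, p in zip(bin_values, counts)) / sum(counts)


if __name__ == "__main__":
    bin_values = [5, 15, 25, 35, 45]  # hypothetical representative grades
    counts = [2, 5, 10, 6, 2]         # hypothetical number of students per bin
    print(mean_from_fractions(bin_values, counts))
    print(mean_from_counts(bin_values, counts))
```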
\[ F_i = \sum_{k=1}^{i} p_k. \]
Then $F_i$ is the number of students whose grade $x_k$ was between $x_1$ and $x_i$ ($x_1 \le x_k \le x_i$).
Of course, when we add up all the way to the last category, we arrive at the total
number of students in the class (assuming each student wrote the test and received a grade).
Thus
\[ F_M = \sum_{k=1}^{M} p_k = N, \]
where, as before, M stands for the number of “bins” used to represent the grade distribution.
(Note that each student has been counted in one of the categories corresponding to the grade
he or she achieved.) Another way of saying the same thing is that
\[ \sum_{k=1}^{M} \frac{p_k}{N} = 1. \]
In Figure 11.6 we show the cumulative function, i.e. we plot $\tilde{x}_i$ vs $F_i$. Note that this graph
is a step function. That is, the function takes on a set of discrete values with jumps at every
5th integer73.
Figure 11.6. Top: The same grade distribution as in Figure 11.5, but showing
the cumulative function. The grid has been removed for easier visualization of that step
function. Bottom: The cumulative function is used to determine an approximate median
grade.
73 Note: ideally, this graph should be discontinuous, with horizontal segments only. The vertical “jumps” cannot
correspond to values of a function. However the spreadsheet tool used to plot this function does not currently
allow this graphing option.
Example
1! = 1
2! = 2 · 1 = 2
3! = 3 · 2 · 1 = 6
4! = 4 · 3 · 2 · 1 = 24
5! = 5 · 4 · 3 · 2 · 1 = 120
We also define
0! = 1
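[Figure 11.7 panels: (a) n distinct objects placed into n slots, with n, n − 1, n − 2, ..., 2, 1 choices for successive slots, giving n! arrangements; (b) P(n,k) = n!/(n − k)!; (c) C(n,k) followed by k! orderings.]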
Figure 11.7. This diagram illustrates the meanings of permutations and combi-
nations. (a) The number of permutations (ways of arranging) n objects into n slots. There
are n choices for the first slot, and for each of these, there are n − 1 choices for the second
slot, etc. In total there are n! ways of arranging these objects. (Note that the order of
the objects is here important.) (b) The number of permutations of n objects into k slots,
P (n, k), is the product n · (n − 1) · (n − 2) . . . (n − k + 1) which can also be written as
a ratio of factorials. (c) The number of combinations of n objects in groups of k is called
C(n, k) (shown as the first arrow in part c). Here order is not important. The step shown
in (b) is equivalent to the two steps shown in (c). This means that there is a relationship
between P (n, k) and C(n, k), namely, P (n, k) = k!C(n, k).
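The relationship P(n, k) = k! C(n, k) can be confirmed by brute-force counting for small n. In this sketch, `P` and `C` are our own helper functions built from factorials, and the standard library `itertools` is used to list the arrangements explicitly.

```python
import math
from itertools import combinations, permutations


def P(n, k):
    """Number of ordered arrangements of k objects chosen from n."""
    return math.factorial(n) // math.factorial(n - k)


def C(n, k):
    """Number of unordered selections of k objects from n."""
    return P(n, k) // math.factorial(k)


if __name__ == "__main__":
    n, k = 5, 2
    print(P(n, k), len(list(permutations(range(n), k))))  # ordered counts
    print(C(n, k), len(list(combinations(range(n), k))))  # unordered counts
```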
74 Recall that ⇒ means “implies that”. This is a one-way implication: A ⇒ B says that “A implies B”
and cannot be used to conclude that B implies A. ⇔ means that each statement implies the other, a two-way
implication. Just as it is important to “obey traffic signs” and avoid “driving the wrong way” on a one-way street,
it is also important to be careful about use of these mathematical statements.
If $\sum_{k=0}^{\infty} a_k$ is a series with $a_k > 0$ and $\lim_{k \to \infty} \frac{a_{k+1}}{a_k} = L$, then
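\[ k! = k \cdot (k-1) \cdot (k-2) \ldots 3 \cdot 2 \cdot 1. \]
then
\[ a_{k+1} = \frac{1}{(k+1)!}, \qquad a_k = \frac{1}{k!}, \]
\[ \lim_{k \to \infty} \frac{a_{k+1}}{a_k} = \lim_{k \to \infty} \frac{\frac{1}{(k+1)!}}{\frac{1}{k!}} = \lim_{k \to \infty} \frac{k!}{(k+1)!} = \lim_{k \to \infty} \frac{1}{k+1} = 0. \]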
Thus L = 0, L < 1 so this series converges by the ratio test. Later, we will see a second
method (comparison) to arrive at the same conclusion.
This series is the Harmonic Series. To apply the ratio test, we note that
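\[ a_{k+1} = \frac{1}{k+1}, \qquad a_k = \frac{1}{k}, \]
\[ L = \lim_{k \to \infty} \frac{a_{k+1}}{a_k} = \lim_{k \to \infty} \frac{\frac{1}{k+1}}{\frac{1}{k}} = \lim_{k \to \infty} \frac{k}{k+1} = 1. \]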
Since L = 1, in this case, the test is inconclusive. In fact, we show in Section 10.4 that the
harmonic series diverges.
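The two ratios computed above can be tabulated to see how they behave as k grows. This is an illustrative sketch only (a table of ratios suggests the limit L but does not prove it); the function names are ours, and exact fractions are used to avoid rounding.

```python
import math
from fractions import Fraction


def inverse_factorial(k):
    """Term a_k = 1/k! of the first series."""
    return Fraction(1, math.factorial(k))


def harmonic_term(k):
    """Term a_k = 1/k of the harmonic series."""
    return Fraction(1, k)


def ratio(term, k):
    """The ratio a_{k+1} / a_k for a given term function."""
    return term(k + 1) / term(k)


if __name__ == "__main__":
    for k in (1, 10, 100, 1000):
        # First ratio tends to 0, second ratio tends to 1.
        print(k, float(ratio(inverse_factorial, k)), float(ratio(harmonic_term, k)))
```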
11.9. Appendix: Tests for convergence of series 239
Here
\[ a_{k+1} = r^{k+1}, \qquad a_k = r^k, \qquad \frac{a_{k+1}}{a_k} = r, \]
\[ L = \lim_{k \to \infty} \frac{a_{k+1}}{a_k} = r. \]
So, by the ratio test, if L = r < 1 then the geometric series converges (confirming a fact
we have already established).
such that terms of one series are always smaller than terms of the other, i.e. satisfy
Then
\[ \sum b_k \text{ converges} \;\Rightarrow\; \sum a_k \text{ converges}, \]
\[ \sum a_k \text{ diverges} \;\Rightarrow\; \sum b_k \text{ diverges}. \]
The idea behind the first of these statements is that the “smaller” series $\sum a_k$ is
“squeezed in” between 0 (the lower bound) and the sum of the larger series (which we
know must exist, since $\sum b_k$ converges). This means that the smaller series cannot become
unbounded. For the second statement, we have that the smaller of the two series is known
to diverge, forcing the larger also to be unbounded. One must carefully observe that “⇒”
applies only in one direction. (For example, if the smaller series converges, we cannot
conclude anything about the larger series.)
Solution: We compare terms in this series to terms in a geometric series with $r = \frac{1}{2}$, i.e.
consider
\[ a_k = \frac{1}{2^k + 1}, \qquad b_k = \frac{1}{2^k}. \]
Then clearly
\[ 0 < a_k < b_k \quad \text{for every } k \]
(since the denominator in $a_k$ is larger). But we know that $\sum \frac{1}{2^k}$ converges. Therefore, so
does $\sum \frac{1}{2^k + 1}$.
\[ 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \ldots = \sum_{n=1}^{\infty} (-1)^{n+1} \frac{1}{n} \]
We will show that this series converges (essentially because terms nearly cancel out), and
in fact, we show in Section 10.5.3 that it converges to the number ln(2) ≈ 0.693. More
generally, we have the following result.
If S is an alternating series,
\[ S = \sum_{k=1}^{\infty} (-1)^{k+1} a_k = a_1 - a_2 + a_3 - a_4 + \ldots \]
with $a_k > 0$ and such that (1) $|a_1| \ge |a_2| \ge |a_3| \ge \ldots$ etc. and (2) $\lim_{k \to \infty} a_k = 0$, then
the series converges. (This was established by Leibniz in 1705.)
If $\sum a_k$ and $\sum b_k$ both converge and $\sum a_k = S$, $\sum b_k = T$, then
(a) $\sum (a_k + b_k)$ converges and $\sum (a_k + b_k) = \sum a_k + \sum b_k = S + T$.
(b) $\sum c\, a_k = c \sum a_k = cS$, where c is any constant.
(c) The product $\left( \sum a_k \right) \cdot \left( \sum b_k \right) = \sum_{k=0}^{\infty} \sum_{i=0}^{k} a_i b_{k-i} = S \cdot T$.
11.11. Using series to solve a differential equation 241
Example:
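\[ y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5 + \ldots \]
Using the information from the initial conditions, we get $y(0) = a_0 = 1$ and $y'(0) = a_1 = 0$.
Now we can write down the derivatives:
\[ y' = a_1 + 2a_2 x + 3a_3 x^2 + 4a_4 x^3 + 5a_5 x^4 + \ldots \]
\[ y'' = 2a_2 + 2 \cdot 3 a_3 x + 3 \cdot 4 a_4 x^2 + 4 \cdot 5 a_5 x^3 + \ldots \]
\[ y'' = xy \]
\[ 2a_2 + 2 \cdot 3 a_3 x + 3 \cdot 4 a_4 x^2 + 4 \cdot 5 a_5 x^3 + \ldots = x(a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \ldots) \]
\[ 2a_2 + 2 \cdot 3 a_3 x + 3 \cdot 4 a_4 x^2 + 4 \cdot 5 a_5 x^3 + \ldots = a_0 x + a_1 x^2 + a_2 x^3 + a_3 x^4 + \ldots \]
\[
\begin{aligned}
2a_2 &= 0 &&\Rightarrow\quad a_2 = 0, \\
2 \cdot 3 a_3 &= a_0 &&\Rightarrow\quad a_3 = \frac{1}{2 \cdot 3}, \\
3 \cdot 4 a_4 &= a_1 &&\Rightarrow\quad a_4 = 0, \\
4 \cdot 5 a_5 &= a_2 &&\Rightarrow\quad a_5 = 0, \\
5 \cdot 6 a_6 &= a_3 &&\Rightarrow\quad a_6 = \frac{1}{2 \cdot 3 \cdot 5 \cdot 6}.
\end{aligned}
\]
\[ y = 1 + \frac{x^3}{2 \cdot 3} + \frac{x^6}{2 \cdot 3 \cdot 5 \cdot 6} + \ldots \]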
If we continue in this way, we can write down many terms of the series.
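Continuing "in this way" amounts to applying one rule repeatedly. Matching the coefficient of each power of x in y'' = xy gives n(n − 1) a_n = a_{n−3} for n ≥ 3, which is our generalization of the pattern shown above. The sketch below applies it with the initial values a_0 = 1, a_1 = 0; the function name is ours.

```python
from fractions import Fraction


def airy_series_coefficients(n_max, a0=1, a1=0):
    """Coefficients a_0..a_n_max of a series solution of y'' = x y.

    Uses a_2 = 0 and n(n - 1) a_n = a_{n-3} for n >= 3 (requires n_max >= 2).
    """
    a = [Fraction(a0), Fraction(a1), Fraction(0)]
    for n in range(3, n_max + 1):
        a.append(a[n - 3] / (n * (n - 1)))
    return a[:n_max + 1]


if __name__ == "__main__":
    coeffs = airy_series_coefficients(9)
    for n, c in enumerate(coeffs):
        if c != 0:
            print(n, c)  # only powers x^0, x^3, x^6, x^9 survive
```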