Aisling McCluskey, Brian McMaster - Undergraduate Analysis - A Working Textbook-Oxford University Press (2018)

Undergraduate Analysis
A Working Textbook
Aisling McCluskey
Senior Lecturer in Mathematics
National University of Ireland, Galway
Brian McMaster
Honorary Senior Lecturer
Queen’s University Belfast
Great Clarendon Street, Oxford, OX2 6DP,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries
© Aisling McCluskey and Brian McMaster 2018
The moral rights of the authors have been asserted
First Edition published in 2018
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above
You must not circulate this work in any other form
and you must impose this same condition on any acquirer
Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
Library of Congress Control Number: 2017963197
ISBN 978–0–19–881756–7 (hbk.)
ISBN 978–0–19–881757–4 (pbk.)
Printed and bound by
CPI Group (UK) Ltd, Croydon, CR0 4YY
We dedicate this book to all those practitioners of the craft of analysis whose
apprentices we have been in times long past, and to the colleagues who in more
recent years have shared with us their insights and their enthusiasm.
In particular, we salute with gratitude and affection:
Samuel Verblunsky
Derek Burgess
Ralph Cooper
James McGrotty
David Armitage
Tony Wickstead
Ariel Blanco
Ray Ryan
John McDermott
AMcC, BMcM, October 2017
Preface
Mathematical analysis underpins calculus: it is the reason why calculus works, and
it provides a toolkit for handling situations in which algorithmic calculus doesn’t
work. Since calculus in its turn underpins virtually the whole of the mathematical
sciences, analytic ideas lie right at the heart of scientific endeavour, so that a
confident understanding of the results and techniques that they inform is valuable
for a wide range of disciplines, both within mathematics itself and beyond its
traditional boundaries.
This has a challenging consequence for those who participate in third-level
mathematics education: large numbers of students, many of whom do not regard
themselves primarily as mathematicians, need to study analysis to some extent; and
in many cases their programmes do not allow them enough time and exposure to
grow confident in its ideas and techniques. This programme-time poverty is one
of the circumstances that have given analysis the unfortunate reputation of being
strikingly more difficult than other cognate disciplines.
Aspects of this perception of difficulty include the lack of introductory gradualness
generally observed in the literature, and the ‘without loss of generality’ factor:
experienced analysts are continually simplifying their arguments by summoning
up a battery of shortcuts, estimations and reductions-to-special-cases that are
part of the discipline’s folklore, but which there is seldom class time to teach in
any formal sense: instead, students are expected to pick up these ideas through
experience of working on examples. Yet the study time allocated to analysis in
early undergraduate programmes is often insufficient for this kind of learning
by osmosis. The ironic consequence is that basic analytic exercises are not only
substantially harder for the beginner than for the professional, but substantially
harder than they need to be.
This text, through its careful design, emphasis and pacing, sets out to develop
understanding and confidence in analysis for first-year and second-year
undergraduates embarked upon mathematics and mathematically related programmes.
Keenly aware of contemporary students’ diversity of motivation, background
knowledge and time pressures, it consistently strives to blend beneficial aspects
of the workbook, the formal teaching text and the informal and intuitive tutorial
discussion. In particular:
1. It devotes ample space and time for development of insight and confidence in
handling the fundamental ideas that – if imperfectly grasped – can make
analysis seem more difficult than it actually is.
2. It focuses on learning through doing, presenting a comprehensive integrated
range of examples and exercises, some worked through in full detail, some
supported by sketch solutions and hints, some left open to the reader’s
initiative (and some with online solutions accessible through the publishers).
3. Without undervaluing the absolute necessity of secure logical argument, it
legitimises the use of informal, heuristic, even imprecise initial explorations of
problems aimed at deciding how to tackle them. In this respect it creates an
atmosphere like that of an apprenticeship, in which the trainee analyst can
look over the shoulder of the experienced practitioner, look under the bonnet
of the problem and watch the roughwork develop, noting the occasional
failures of opening gambits and the tricks of the trade that can be mobilised in
order to circumvent them.
The price that has to be paid for such an approach is that the book is more
verbose, sometimes positively long-winded, and certainly longer than one that
would concentrate solely on finalised versions of standard proofs and slick model
answers. Yet it appears to us that such a price is well worth paying: for one thing,
it is our experience that a text principally consisting of streamlined, finalised
demonstrations and solutions creates in the mind of many beginners a misleading
and demoralising impression that this is how they are expected to create solutions
at the first attempt; for another, the extra material – far from being just digressional
– summarises what we find it necessary to say, time and time again, to students who
ask us eminently reasonable questions such as: ‘How do I start this?’ ‘How can we
be expected to think of that?’ ‘Why is that step true, and why did you think of taking
it?’ An additional benefit is that the text will be easier and quicker to read, since
the thoughtful reader will often find answers promptly supplied to the questions
that would otherwise have impeded progress to the next step.
Especially because less-specialised learners will often need to deal with only
some of the material covered here, we have streamed the presentation into basic
and more advanced chapters and, within these, we have flagged up relatively
specialised topics and sophisticated arguments that can reasonably be omitted
without compromising overall comprehension. Analysis is more welcoming to the
learner who has thoroughly grasped a modest amount of material than to one who
has an imprecise understanding of a larger body of knowledge.
It is central to our teaching philosophy and to our classroom experience that
students learn at a deeper level through doing than they ever could through
reading alone: despite our intention to present here as full an account of basic
analytic concepts, results and techniques as is reasonable to set before learners
who have many other competing demands on their time and energy, it is only by
active study, engaging in a broad range of exercises, that they will gain confidence
and empowerment in acquiring useable, performable knowledge and the insight
that directs it. Our account is therefore intended as a working textbook: each
idea encountered is embedded in worked examples and in exercises – some
with solutions, some with helpful hints encouraging the reader to explore and to
internalise that idea.
Contents
A Note to the Instructor
A Note to the Student Reader
1 Preliminaries
1.1 Real numbers
1.2 The basic rules of inequalities — a checklist of things you probably know already
1.3 Modulus
1.4 Floor
2 Limit of a sequence — an idea, a definition, a tool
2.1 Introduction
2.2 Sequences, and how to write them
2.3 Approximation
2.4 Infinite decimals
2.5 Approximating an area
2.6 A small slice of π
2.7 Testing limits by the definition
2.8 Combining sequences; the algebra of limits
2.9 POSTSCRIPT: to infinity
2.10 Important note on ‘elementary functions’
3 Interlude: different kinds of numbers
3.1 Sets
3.2 Intervals, max and min, sup and inf
3.3 Denseness
4 Up and down — increasing and decreasing sequences
4.1 Monotonic bounded sequences must converge
4.2 Induction: infinite returns for finite effort
4.3 Recursively defined sequences
4.4 POSTSCRIPT: The epsilontics game — the ‘fifth factor of difficulty’
5 Sampling a sequence — subsequences
5.1 Introduction
5.2 Subsequences
5.3 Bolzano-Weierstrass: the overcrowded interval
6 Special (or specially awkward) examples
6.1 Introduction
6.2 Important examples of convergence
7 Endless sums — a first look at series
7.1 Introduction
7.2 Definition and easy results
7.3 Big series, small series: comparison tests
7.4 The root test and the ratio test
8 Continuous functions — the domain thinks that the graph is unbroken
8.2 An informal view of continuity
8.3 Continuity at a point
8.4 Continuity on a set
8.5 Key theorems on continuity
8.6 Continuity of the inverse
9 Limit of a function
9.2 Limit of a function at a point
10 Epsilontics and functions
10.1 The epsilontic view of function limits
10.2 The epsilontic view of continuity
10.3 One-sided limits
11 Infinity and function limits
11.1 Limit of a function as x tends to infinity or minus infinity
11.2 Functions tending to infinity or minus infinity
12 Differentiation — the slope of the graph
12.2 The derivative
12.3 Up and down, maximum and minimum: for differentiable functions
12.4 Higher derivatives
12.5 Alternative proof of the chain rule
13 The Cauchy condition — sequences whose terms pack tightly together
13.1 Cauchy equals convergent
14 More about series
14.1 Absolute convergence
14.2 The ‘robustness’ of absolutely convergent series
14.3 Power series
15 Uniform continuity — continuity’s global cousin
15.2 Uniformly continuous functions
15.3 The bounded derivative test
16 Differentiation — mean value theorems, power series
16.2 Cauchy and l’Hôpital
16.3 Taylor series
16.4 Differentiating a power series
17 Riemann integration — area under a graph
17.2 Riemann integrability — how closely can rectangles approximate areas under graphs?
17.3 The integral theorems we ought to expect
17.4 The fundamental theorem of calculus
18 The elementary functions revisited
18.2 Logarithms and exponentials
18.3 Trigonometric functions
19 Exercises: for additional practice
Suggestions for further reading
Index
A Note to the Instructor
The first twelve chapters present the ideas of analysis to which virtually everyone
enrolled upon a degree pathway within mathematical sciences will require exposure.
Those whose degree is explicitly in mathematics are likely to need most of the
rest. Of course, how this material is divided across the years or across the semesters
will vary from one institution to another.
Most of the exercises set out within the text are provided with specimen
solutions either complete, outlined or hinted at, but in the final chapter we have
also included a suite of over two hundred problems which are intended to assist you
in creating assessments for your student groups. Specimen solutions to these are
available to you, but not directly to your students, by application to the publishers:
please see the webpage www.oup.co.uk/companion/McCluskey&McMaster for
how to access them.
Prior knowledge that the reader should have before undertaking study of this
material includes a familiarity with elementary calculus and basic manipulative
algebra including the binomial theorem, a good intuitive understanding of the
real number system including rational and irrational numbers, basic proof techniques
including proof by contradiction and by contraposition, very basic set
(and function) theory, and the use of simple inequalities including modulus.
Substantial revision notes on several of these topics are provided within the text
where appropriate.
A Note to the Student Reader
If, as a student of the material that this book sets forth, you are enrolled on a
course of study at a third-level institution, your instructors will guide and pace you
through it. Careful consideration of the feedback they give you on the work you
submit will be very profitable to you as you develop competence and confidence.
If you are an independent reader, not engaged with such an institution’s programmes,
we intend that you also will find that the text supports your endeavours
through its design: in particular, through the expansive (almost leisurely)
treatment of the initial ideas that really need to be thoroughly grasped before you
proceed, through the informal and intuitive background discussions that seek to
develop a feel for concepts that will work in parallel with their precise mathematical
formulations, and through the explicit inclusion of roughwork paragraphs that
allow you to look over the shoulder of the more experienced practitioner of the
craft and under the bonnet of the problem being tackled.
In both cases, our strongest advice to you is to work through every exercise
as you encounter it, and either check your answer against a specimen answer
where available, see if it convinces a colleague or fellow student, or submit it for
assessment or feedback as appropriate. Nobody learns analysis merely by reading
it, any more than you can learn swimming or cycling just by reading a how-to book,
however well-intentioned or knowledgably written it may be. No one can teach you
analysis without your commitment; but you can choose to learn it and, if you do,
this working textbook is designed to help you towards success.
.........................................................................
1 Preliminaries
.........................................................................
1.1 Real numbers

You can choose to think of the real numbers as being all the possible decimals –
finite and infinite, recurring and non-recurring, positive and negative and zero,
whole numbers and fractions and surds¹ and non-surds such as π and e, and every
possible combination of such objects. Equally well, you can choose to think of
them as being (or being represented by) all the points that lie on a continuous
unbroken straight line (the real line, the real axis) that stretches away endlessly in
both directions. Somewhere on that line is a point marked 0 (zero) which separates
the positives (on its right) from the negatives (on its left), and pacing out from
zero at regular intervals in both directions lie the whole numbers (the integers) like
distance markers along that endless road.
[Figure: the real line, with the integers −3 to 3 marked at regular intervals and the points −√8, −1.7, 1/2 and π located on it]
A naïve picture of the real line
This is not, of course, a proper definition of what real numbers are. We are taking
what is sometimes called a naïve view of the system of real numbers: not having
sufficient time to construct it – to dig deeply enough into the logical foundations of
mathematics to come up with a guarantee of its existence – we are instead seeking
to highlight the common consensus on how real numbers behave, combine and
compare. This consensus will already be enough to let us start explaining some
basic ideas in analysis (and we shall say more about the finer structure of the real
numbers in Chapter 3).
Nothing in Section 1.2 is likely to strike the student reader as being much more
than common sense, and nor should it at this stage of study. Nevertheless, it is all
too easy to make mistakes in comparisons between numbers – inequalities – and it
is consequently important to keep these apparently obvious rules in mind and to
build up a good measure of confidence in their use, especially because so many
arguments in analysis depend upon using inequalities. Sections 1.3 and 1.4 present
a couple of useful operations on real numbers that are strongly connected with
inequalities.
¹ that is, non-rational numbers involving roots, such as √2, ∛5/(1 + √2), 10 − 3√2.
Undergraduate Analysis: A Working Textbook, Aisling McCluskey and Brian McMaster 2018.
© Aisling McCluskey and Brian McMaster 2018. Published 2018 by Oxford University Press
1.2 The basic rules of inequalities — a checklist of things you probably know already
• Each real number is either positive or zero or negative. ‘Non-negative’ means
positive or zero.
• x > y and y < x both mean x − y is positive².
• x ≥ y and y ≤ x both mean x − y is non-negative³.
• x < y < z means both x < y and y < z. Likewise for >, ≤, ≥.
• If x < y and y < z, then x < z. Likewise for >, ≤, ≥.
• If x ≤ y and y ≤ x, then x = y.
• If x and y are different real numbers, then one of them is greater than the other,
and is usually denoted⁴ by max{x, y}.
• You can add a number to an inequality without damaging it:
x < y ⇒ x + a < y + a.
• You can add two inequalities:
(x < y and a < b) ⇒ x + a < y + b.
• Notice how to use the symbol ‘ ⇒ ’ (pronounced implies): the last line is
shorthand for ‘if x < y and a < b, then x + a < y + b’.
• You can multiply an inequality by a positive number:
provided that a > 0, we have x < y ⇒ ax < ay.
• If you multiply an inequality by a negative number, the inequality becomes
reversed:
provided that a < 0, we have x < y ⇒ ax > ay.
• You can multiply two inequalities provided that all the numbers involved are
positive:
(0 < a < b and 0 < x < y) ⇒ ax < by ;
(0 < a ≤ b and 0 < x ≤ y) ⇒ ax ≤ by.
• Provided that the numbers involved are positive, you can take reciprocals
across an inequality, and the inequality becomes reversed:
x < y ⇒ 1/x > 1/y provided that x, y are positive.
• Provided that the numbers involved are positive, you can take square roots⁵
across an inequality, and the inequality is preserved:
x < y ⇒ √x < √y provided that x, y are positive. Likewise for cube roots,
fourth roots and so on.
² – and are pronounced as x is greater/larger/bigger than y, y is less/smaller than x.
³ – and are pronounced as x is greater than or equal to y, y is less than or equal to x.
⁴ If x = y then max{x, y} means x (or y, which is the same thing).
⁵ Recall that the symbol √x always means the non-negative square root of x.
• ‘There are large integers:’ that is, for any given real number x we can find an
integer n so that n > x.
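As an informal sanity check on the checklist above, the following Python sketch tests several of the rules on a small grid of exact fractions. The names `SAMPLES`, `check_rules` and `large_integer` are our own illustrative choices, and a finite check of this kind is of course no substitute for a proof.

```python
from fractions import Fraction
from itertools import product

# A small grid of exact rationals (negative, zero and positive) to test on.
SAMPLES = [Fraction(k, 4) for k in range(-12, 13)]


def check_rules(samples=SAMPLES):
    """Return True if every sampled case agrees with the checklist rules."""
    for x, y, a in product(samples, repeat=3):
        if x < y:
            # Adding a number to both sides preserves the inequality.
            if not x + a < y + a:
                return False
            # Multiplying by a positive number preserves it ...
            if a > 0 and not a * x < a * y:
                return False
            # ... and multiplying by a negative number reverses it.
            if a < 0 and not a * x > a * y:
                return False
            # Reciprocals of positive numbers reverse the inequality.
            if x > 0 and not 1 / x > 1 / y:
                return False
    return True


def large_integer(x):
    """Return some integer n with n > x ('there are large integers')."""
    # int() truncates towards zero, so for x >= 0 it rounds down.
    return int(x) + 1 if x >= 0 else 0


print(check_rules())
print(large_integer(7.3))
```

Exact fractions are used instead of floating-point numbers so that rounding error cannot blur a strict inequality.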
1.3 Modulus
1.3.1 Definition If x is a real number, we define⁶ its modulus (also called its
absolute value) as |x| = the greater of x and −x. That is:
• If x ≥ 0 then |x| = x;
• If x < 0 then |x| = −x.
Since the effect of modulus is to ‘throw away the minus from negative numbers’,
the following should be obvious:
1.3.2 Proposition For any real numbers x, y:
• x ≤ |x|, −x ≤ |x|,
• |−x| = |x|,
• |xy| = |x||y|,
• |x/y| = |x|/|y| provided that y ≠ 0,
• √(x^2) = |x|.
1.3.3 The triangle inequality For any real numbers x and y, we have
|x + y| ≤ |x| + |y|.
Proof
Since x ≤ |x| and y ≤ |y|, adding gives us x + y ≤ |x| + |y|.
Exactly the same reasoning gives us −x + (−y) = −(x + y) ≤ |x| + |y|.
Now |x + y| is either x + y or −(x + y). So whichever one it is, it is ≤ |x| + |y|.
Note
It is easy to extend this by induction⁷ to deal with any finite list of numbers, thus:
|x_1 + x_2 + x_3 + … + x_n| ≤ |x_1| + |x_2| + |x_3| + … + |x_n|.
1.3.4 The reverse triangle inequality For any real numbers x and y, we have
| |x| − |y| | ≤ |x − y|.
⁶ More briefly: |x| = max{x, −x}.
⁷ We discuss this type of argument in detail later in the text.
Proof
Use the triangle inequality on x = (x − y) + y and we get |x| ≤ |x − y| + |y|, from
which |x| − |y| ≤ |x − y|.
Interchange x and y, and we also get |y| − |x| ≤ |y − x| = |x − y|.
Now | |x| − |y| | is either |x| − |y| or |y| − |x|. So whichever one it is, it is ≤ |x − y|.
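The modulus results of this section can be spot-checked numerically. The sketch below is an illustration only, with function names of our own choosing: it implements the definition of modulus as the greater of x and −x, then tests the triangle inequality, its extension to a finite list, and the reverse triangle inequality on a grid of exact fractions.

```python
from fractions import Fraction
from itertools import product

SAMPLES = [Fraction(k, 3) for k in range(-9, 10)]


def modulus(x):
    """Definition 1.3.1: the greater of x and -x."""
    return max(x, -x)


def triangle_holds(x, y):
    """Check |x + y| <= |x| + |y| for one pair."""
    return modulus(x + y) <= modulus(x) + modulus(y)


def reverse_triangle_holds(x, y):
    """Check | |x| - |y| | <= |x - y| for one pair."""
    return modulus(modulus(x) - modulus(y)) <= modulus(x - y)


def triangle_holds_for_list(xs):
    """Check the finite-list version |x_1 + ... + x_n| <= |x_1| + ... + |x_n|."""
    return modulus(sum(xs)) <= sum(modulus(x) for x in xs)


all_ok = all(
    triangle_holds(x, y)
    and reverse_triangle_holds(x, y)
    and modulus(x) == abs(x)  # agrees with Python's built-in abs
    for x, y in product(SAMPLES, repeat=2)
)
print(all_ok, triangle_holds_for_list(SAMPLES))
```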
1.4 Floor
1.4.1 Definition When x is a real number, we define the floor of x (also called
the integer part of x or, informally, x rounded down to the nearest integer) to be the
largest integer that is ≤ x. The usual notation for the floor of x is ⌊x⌋, although some
books write it as [x]. For instance, ⌊5.6⌋ = 5, ⌊π⌋ = 3, ⌊7⌋ = 7, ⌊−8½⌋ = −9.
If you choose to imagine the real numbers as being set out along the real line,
with the integers – marked here by heavier dots – embedded into it at regular
intervals, then the following diagram should help you to picture the relationship
between x and ⌊x⌋.
[Figure: two number-line diagrams. Case 1: when x is not an integer, x lies strictly between the marked integers ⌊x⌋ and ⌊x⌋ + 1. Case 2: when x itself is an integer, x coincides with ⌊x⌋, and ⌊x⌋ + 1 is the next marked integer.]
In both cases, the essential inequality connecting x and ⌊x⌋ is
⌊x⌋ ≤ x < ⌊x⌋ + 1
or, equivalently
x − 1 < ⌊x⌋ ≤ x.
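Python's standard `math.floor` computes the floor as defined here, so the four instances above and the essential inequality can be tried out directly. The helper names `floor_examples` and `essential_inequality` are ours, introduced only for this sketch.

```python
import math
from fractions import Fraction


def floor_examples():
    """The four instances quoted in Definition 1.4.1."""
    return [
        math.floor(5.6),
        math.floor(math.pi),
        math.floor(7),
        math.floor(Fraction(-17, 2)),  # minus eight and a half
    ]


def essential_inequality(x):
    """Check floor(x) <= x < floor(x) + 1 and x - 1 < floor(x) <= x."""
    n = math.floor(x)
    return (n <= x < n + 1) and (x - 1 < n <= x)


print(floor_examples())
print(all(essential_inequality(Fraction(k, 4)) for k in range(-20, 21)))
```

Note that `int(x)` is not a substitute for the floor when x is negative: it truncates towards zero, so `int(-8.5)` gives −8 whereas the floor is −9.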
.........................................................................
2 Limit of a sequence — an idea, a definition, a tool
.........................................................................
2.1 Introduction
Mathematical analysis has acquired a reputation – not entirely justified – for
seeming more difficult than other first-year undergraduate study areas. We shall
begin our exploration of it by seeking to identify the factors that have contributed
to this image, and what we can do to explain or address them.
Firstly, the study of mathematics is cumulative to a greater degree than that
of most disciplines. Each new block of mathematics that a student encounters is
built directly on other, underpinning, blocks, and it is practically impossible to
achieve confidence in the new without having previously identified and grasped the
older supporting material. No matter how well you can implement differentiation
algorithms, your chance of successfully finding the second derivative of x^4 is very
limited until you’ve learned your three-times table.
Secondly, mathematics is hard. By that we do not mean that it is intrinsically
difficult: in this sense, ‘hard’ is the opposite of ‘soft’, not the opposite of ‘easy’.
Learning a piece of mathematics requires a precise understanding of the terms
that it involves, of the arguments that it employs and of the questions that it seeks
to answer. A broad appreciation, a solid general overview of the topic, will on its
own be utterly insufficient for actual application. Precision of concept and of logical
discourse, as well as the previously mentioned cumulativeness, are the hallmarks
of a discipline that is ‘hard’ in this sense.
Yet these two factors are common to the whole of mathematics. Why does
analysis in particular have such a daunting public image?
It seems to us that, thirdly, a lack of introductory gradualness comes into play
here. Most topics, in mathematics and elsewhere, can be adequately explained to
the beginner by working initially on simple special cases. So the usual arena for
first steps in linear algebra is something like the coordinate plane, rather than
an infinite-dimensional Banach space; French language lessons do not kick off by
handing out a table of the complete tenses of common irregular verbs. In analysis,
however, the very first concept that a beginner has to make sense of is one of the
most demanding: until you have a crisp understanding of the notion of the limit
of a sequence (or, a matter of similar difficulty, of the supremum of a set of real
numbers) you can neither read nor carry out any significant analytic activity. On
the credit side, this means that we can honestly promise the beginner that the
material gets easier once we are through most of Chapter 2 – an interesting contrast
with many topics, both mathematical and otherwise – provided always that this
first concept is fully and thoroughly understood before we go any further.
Fourthly – and this is another point that applies to the whole of the discipline,
but is particularly relevant just here – mathematics as a subject and mathematicians
as a breed are inclined to prefer conciseness over verbosity when they present final
versions of their work, and to feel more at home with terse, lean, point-by-point
arguments rather than expansive, wordy, descriptive accounts. There are, however,
some key moments in analysis where expansive rather than compressed accounts
actually help in delivering understanding, and the definition of sequence limits,
right at the start of our study, is one of them. It is perfectly possible to write down
that definition in one line: but if we do, most readers will not see the point of it,
will not grasp the kind of problem that it is set up to address and will not be able
to make effective use of it even in quite simple examples. So – with apologies to all
those who don’t like reading essays – we see no alternative to spending a fair bit of
time and several hundred words filling in the background and ‘thinking out loud’
about how to use this idea in applications. We reiterate that the concept itself
is not intrinsically difficult; it is merely different from mathematical notions that
you have already mastered, and needs a particular form of argument presentation
in order to get the best out of it. We also commit to getting back to concise, un-
wordy arguments as soon as and wherever possible.
With all this in mind, we shall devote most of Chapter 2 to a thorough and
leisurely exploration of this one single idea that opens the path to analytic arguments
in mathematics: limits of sequences – its intuitive meaning, some of the
contexts in which it arises, how to define it in terms sufficiently precise to do
serious mathematics with it, and how to handle that rigorous definition in a range
of illustrative examples. Please keep in mind that, once the opening chapter is safely
assimilated, most of the rest of the first-year analysis syllabus is easier. (By the way,
there is a fifth factor contributing to the widespread perception of the difficulty of
introductory analysis, but it concerns its logical structure rather than its narrowly
mathematical content, so we shall set it aside until some familiarity with the basic
idea has been gained – see Section 4.4.)
2.2 Sequences, and how to write them

A real sequence in mathematics, sometimes more properly called an infinite real
sequence, is an unending list of real numbers in a particular order: a first one, a
second, a third, and so on without end. In other topics within mathematics, it pays
to look at unending lists of objects of other kinds – complex numbers, functions,
sets – but for the present we shall restrict our focus to real numbers, and use the
single word ‘sequence’ always to mean ‘real sequence’ (since no other varieties
are under our attention). The sorts of symbols that we write down to identify a
particular sequence that we want to work with look like one of the following:
(a_1, a_2, a_3, a_4, …, a_n, …)
(a_1, a_2, a_3, a_4, …)
(a_n)_{n∈ℕ}
(a_n)_{n≥1}
(a_n)
– and in many cases we complete the description by setting down a formula for
how to calculate each individual number a_n in the list (the so-called nth term). For
instance, if we wish to talk about the sequence of all perfect squares, that is, all
the squares of positive integers in their natural order, then all of the following are
acceptable symbols:
(1, 4, 9, 16, …, n^2, …)
(1, 4, 9, 16, …)
(n^2)_{n∈ℕ}
(n^2)_{n≥1}
(n^2)
(a_1, a_2, a_3, a_4, …, a_n, …) where a_n = n^2 for each positive integer n
(a_1, a_2, a_3, a_4, …) where a_n = n^2, each positive integer n
(a_n)_{n∈ℕ} in which a_n = n^2 for each n
(a_n)_{n≥1} with a_n = n^2 for each n
(a_n), a_n = n^2 for each positive integer n
It may seem a little irritating that so many different styles of symbol are allowed,
but this is mostly to enable us to tailor the notation we use to the particular problem
that we are working on without writing more than is necessary. For instance, if the
formula for a_n is as simple as a_n = n^2, then we really have no need for a separate
symbol for the nth term, and we might just as well write it as n^2 all the time; on the
other hand, if the nth term is something as complicated as
n! (n + 1)! (2n + 3)! t^n / [((n + 2)!)^3 (4n − 1)!]
then we shall certainly not want to write that out more often than is needful, and in
such cases, having a brief symbol such as a_n to stand in for it will be a considerable
benefit and relief.
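One way to make the idea of 'a formula for the nth term' concrete is to write that formula as a function of n. The Python sketch below does this for the perfect squares and for the complicated expression above. The function names are ours, and since the text does not say what t stands for, the default value t = 1/2 is purely an assumption made so that the example runs.

```python
from fractions import Fraction
from math import factorial


def square_term(n):
    """nth term of the sequence of perfect squares: a_n = n^2."""
    return n ** 2


def complicated_term(n, t=Fraction(1, 2)):
    """nth term n!(n+1)!(2n+3)! t^n / (((n+2)!)^3 (4n-1)!).

    The default value of t is an illustrative assumption only.
    """
    numerator = factorial(n) * factorial(n + 1) * factorial(2 * n + 3) * t ** n
    denominator = factorial(n + 2) ** 3 * factorial(4 * n - 1)
    return Fraction(numerator) / denominator


def first_terms(term, count):
    """List the terms a_1, a_2, ..., a_count of the sequence (a_n)."""
    return [term(n) for n in range(1, count + 1)]


print(first_terms(square_term, 4))
print(first_terms(complicated_term, 3))
```

Giving the nth term a short name such as `complicated_term` plays the same role in code that the symbol a_n plays on paper.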
Although the idea of denoting a sequence by a list of its first few terms or a
formula for its general term, wrapped up in brackets, is little more than common
sense, it will be important to use this notation consistently and correctly. So we
now flag up a few dos and don’ts concerning how best to employ it:
• Whenever you use a notation like (a_1, a_2, a_3, a_4, …, a_n, …) or
(a_1, a_2, a_3, a_4, …), be careful not to leave out the final row of dots: because a
symbol such as (a_1, a_2, a_3, a_4, …, a_n) or (a_1, a_2, a_3, a_4) is a standard way to
write a finite list of numbers consisting of only n or, indeed, only four items,
and you will confuse the person reading your work if you use it when you
actually intend an infinite sequence.
• Also be cautious about using such a symbol as (1, 4, 9, 16, …): however
obvious it may be to you that this intends the sequence of perfect squares, there
are other perfectly good sequences whose first four terms are 1, 4, 9 and 16.
Therefore, only use this style of notation if it is genuinely clear what the
‘pattern’ of the terms is. Note that the symbol (1^2, 2^2, 3^2, 4^2, …) makes this
pattern quite unambiguous.
• Always take care not to leave off the enclosing brackets when writing down a
sequence: if you write just n^2, your reader will think that you mean only the
single number n^2 (for some particular n that you have in mind) rather than the
whole endless list of the squares.
• There are some occasions when n = 1 is not the best starting point for a
sequence. If, for instance, we need to discuss the sequence
( n / ((n − 1)(n − 3)) ),
then we dare not use n = 1 or n = 3 because it would lead to division by zero
(which is, of course, meaningless). The notation can be tweaked slightly to
avoid this, for example, by writing
( n / ((n − 1)(n − 3)) )_{n≥4}
which starts the list off safely at n = 4.
• Again, if we want to work with the endless list of factorials, it may be useful to
recall that zero-factorial is a perfectly good and useful number, and explicitly to
include it in our discussion by using a notation such as
(n!)_{n≥0}.
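Two of the cautions above lend themselves to a quick experiment. The following sketch, in which every name is our own illustrative choice, builds unending lists lazily from a chosen starting index, and also exhibits one 'impostor' sequence that begins 1, 4, 9, 16 without being the sequence of perfect squares.

```python
from fractions import Fraction
from itertools import count, islice
from math import factorial


def terms_from(term, start):
    """Lazily yield term(start), term(start + 1), ... without end."""
    return (term(n) for n in count(start))


def awkward(n):
    """n / ((n - 1)(n - 3)); raises ZeroDivisionError at n = 1 and n = 3."""
    return Fraction(n, (n - 1) * (n - 3))


def impostor(n):
    """Agrees with n^2 for n = 1, 2, 3, 4 because the extra product vanishes there."""
    return n ** 2 + (n - 1) * (n - 2) * (n - 3) * (n - 4)


# islice takes a finite sample from the front of an unending list.
first_awkward = list(islice(terms_from(awkward, 4), 3))        # starts at n = 4
first_factorials = list(islice(terms_from(factorial, 0), 5))   # starts at n = 0
first_impostor = list(islice(terms_from(impostor, 1), 5))

print(first_awkward)
print(first_factorials)
print(first_impostor)
```

The fifth term of the impostor is 49, not 25, which is exactly why a symbol such as (1, 4, 9, 16, …) needs to be used with care.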
Here are a few illustrative examples of sequences, some presented in more than
one style of symbol. You may find it useful to ‘translate them into English’ in your
head; for instance, the first is ‘the sequence of odd positive integers’, the fourth is
‘the sequence of primes’, the sixth is ‘the sequence of reciprocals of the positive
integers but with the sign alternating’ and so on.
2.2.1 Example
1. (1, 3, 5, 7, 9, · · · ) = (2n − 1)n≥1

2.
3 3 9 15 33 63 1 1 1 1
, , , , , ,··· = 1 + ,1 − 2,1 + 3,1 − 4,···
2 4 8 16 32 64 2 2 2 2
n
2 + (−1)n−1
=
2n n∈N
2.2 SEQUENCES, AND HOW TO WRITE THEM 9
3.
1 1 1 1 1
5, , 5, , 5, , 5, , 5, , · · ·
2 4 8 16 32
= (xn ) where xn = 5 if n is odd but xn = 2−n/2 if n is even.
4. (2, 3, 5, 7, 11, · · · ) = (yn )n≥1 where yn is the nth prime number. Notice how
potentially misleading the first symbol was here: it could have meant several
different sequences including, for example, ‘two, and then the odd integers
excluding the perfect squares’. The second symbol was free from any such
ambiguity.
5. \[ (1, -8, 27, -64, 125, -216, \cdots) = (1, -8, 27, -64, \cdots, (-1)^{n-1} n^3, \cdots) = ((-1)^{n-1} n^3)_{n \in \mathbb{N}} \]
Once again the first symbol might have been misunderstood, but the second
and third left no room for confusion.
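6. \[ \left( \frac{1}{1}, -\frac{1}{2}, \frac{1}{3}, -\frac{1}{4}, \frac{1}{5}, -\frac{1}{6}, \frac{1}{7}, \cdots \right) = \left( \frac{(-1)^{n-1}}{n} \right)_{n \ge 1} \]
7. \[ \left( 1, \sqrt{2}, 3^{1/3}, 4^{1/4}, 5^{1/5}, \cdots \right) \]
8. \[ \left( (1 + 1)^1, \left(1 + \frac{1}{2}\right)^2, \left(1 + \frac{1}{3}\right)^3, \left(1 + \frac{1}{4}\right)^4, \cdots \right) = \left( \left(1 + \frac{1}{n}\right)^n \right)_{n \in \mathbb{N}} \]
9. \[ \left( 1,\; 1 + \frac{1}{2},\; 1 + \frac{1}{2} + \frac{1}{3},\; 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4}, \cdots \right) \]
10. \[ \left( 1,\; 1 + \frac{1}{2},\; 1 + \frac{1}{2} + \frac{1}{4},\; 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8}, \cdots \right) \]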
You should notice that some sequences, but by no means all of them, seem to
be settling towards an ‘equilibrium value’, a ‘steady state’ as we scan further and
further along the list. For instance, (2) above appears to be settling towards 1, and
(6) towards 0; in contrast, (1) and (4) are so far showing no sign of settling, but are
‘exploding towards infinity’ (and of course we shall need to make that phrase a lot
more precise before we do anything serious with it) while (5) is doing some kind
of cosmic splits by exploding towards infinity and minus infinity at the same time
(same comment). In the case of the last four sequences (7) to (10), it is much less
clear – to unaided common sense – what is going to happen in the long run.
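For readers who like to experiment, here is a minimal Python sketch that lists the opening terms of some of the sequences in Example 2.2.1, so that the 'settling' (or lack of it) can be scanned by eye. The function names (`first_terms`, `seq2` and so on) are our own illustrative choices, not notation from the text, and a finite list of terms can only suggest long-run behaviour; it proves nothing.

```python
from fractions import Fraction

def first_terms(term, count=8):
    """Return [term(1), ..., term(count)] for a sequence given by its nth-term rule."""
    return [term(n) for n in range(1, count + 1)]

def seq2(n):
    # Example (2): (2^n + (-1)^(n-1)) / 2^n
    return Fraction(2**n + (-1) ** (n - 1), 2**n)

def seq6(n):
    # Example (6): (-1)^(n-1) / n
    return Fraction((-1) ** (n - 1), n)

def seq8(n):
    # Example (8): (1 + 1/n)^n, kept as an exact fraction
    return (1 + Fraction(1, n)) ** n

def seq9(n):
    # Example (9): 1 + 1/2 + 1/3 + ... + 1/n
    return sum(Fraction(1, k) for k in range(1, n + 1))

def seq10(n):
    # Example (10): 1 + 1/2 + 1/4 + ... + 1/2^(n-1)
    return sum(Fraction(1, 2 ** (k - 1)) for k in range(1, n + 1))

if __name__ == "__main__":
    for label, rule in [("(2)", seq2), ("(6)", seq6), ("(8)", seq8),
                        ("(9)", seq9), ("(10)", seq10)]:
        # Convert to floats only for display; the terms themselves are exact.
        print(label, [round(float(x), 5) for x in first_terms(rule)])
```

Exact fractions are used so that the printed decimals are the only place where rounding enters.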
This feature of settling towards some limiting steady state is the most impor-
tant property that a sequence can possess. Our major upcoming task is to seek
ways of deciding whether a given sequence ultimately settles or not, and if so, to
what steady state it does ‘gravitate’. As a first step in tackling that task, we need
to find a way to describe such a settling process that is crisp and precise enough
that we can do proper mathematics with it. In this description, we shall need to
avoid all vague and undefined phrases like ‘gravitate’ and ‘gets extremely close
to’ and ‘is as nearly equal to as makes no difference’ without, of course, throwing
away the valuable intuition that these phrases try to capture.
2.3 Approximation
Across the full expanse of science, engineering and mathematics, we find instances
where some interesting constant is known not precisely but ‘only’ by estimation, by
approximation. In most such scenarios, we expect to see not just one approxima-
tion, but several obtained at various times and by different procedures (hopefully
with increasing accuracy over time) and, using if necessary a little imagination,
we can conceive of an endless process of refinement (new experiments, wider data
collection, more powerful computation, more sophisticated digital image enhance-
ment …) capable of generating better and better estimates for ever. Of course, in
the best of all possible worlds it would be ideal if, at some point in the process,
we should meet and recognise the exact value of the elusive constant…but this is
unrealistic for several reasons (including the fact that no measuring device ever
invented can operate to infinite precision) and, even within mathematics itself¹,
one must normally be content with an endlessly refining approximation procedure.
‘Your estimate is only as good as your assessment of its error’ as the maxim
puts it, so each approximation process has to focus on how bad the error term
is…or, more precisely, on how bad the error term could be: because we are actually
never going to know the exact size of the error since that presupposes knowing
the exact value of the constant that we are struggling to estimate. The final piece
in this jigsaw is: how good do we need the approximation to be? – for most
estimations are carried out with a view to application, and different applications
depend for their success or validity on different levels of accuracy. (If you are in
the business of manufacturing ball bearings for use in cheap, disposable water
pumps, then a radius accuracy of 0.1 mm may well be good enough since this
also helps to hold the price down; but if your next customer is installing similar
devices in a submarine lab environment where failure means transporting the
device up through a thousand fathoms for replacement, you had better increase
that accuracy by an order of magnitude or two; and if you want to seek a contract
with a commercial aircraft manufacturer for whom pump failure places lives in
jeopardy, another order of magnitude again…)
1 The best of all possible worlds!

Proper understanding of an approximation procedure therefore entails awareness
of six separate quantities: the ideal value; the stage we have reached in the
approximation process (let’s call that stage $n$); the current, $n$th approximation; the
(actual) error² at this stage; a calculable ‘worst case scenario’ (WCS) estimate for
that error; and the tolerance (the amount of error that would be acceptable for
the application that we presently have in mind). For a given tolerance, a good
approximation process is one for which we can find a value of n that forces the worst
case scenario error estimate at stage n to be smaller than the tolerance, because
then the (necessarily smaller) actual error will also be less than the tolerance – and
therefore satisfy the customer or the demands of the application. Preferably, we
should also wish all the later-stage errors to be as small or smaller than that: so
that if, in caution or for some other reason, we take a larger value of n than the one
first found, we shall still be safely within the tolerance of error.
A perfect approximation process would be one that was capable of this for each
and every value of the tolerance (except for the physically absurd suggestion of
tolerance = zero).
2.4 Infinite decimals

One of the most basic situations in which endless lists of approximations turn up
is that in which we write down – or rather, imagine writing down – an infinite
decimal. For instance, let us consider the meaning of the statement that the fraction
$\frac{13}{27}$ can be written in this way:
\[ \frac{13}{27} = 0.48148148148\ldots. \]
For each positive integer $n$, let $p_n$ stand for the decimal expansion, up to the $n$th
decimal digit, of this number. So…
\begin{align*}
p_1 &= 0.4 \\
p_2 &= 0.48 \\
p_3 &= 0.481 \\
p_4 &= 0.4814 \\
p_5 &= 0.48148 \\
&\;\;\vdots
\end{align*}
…and so on. None of these numbers equals $\frac{13}{27}$ exactly: if you take any one
of them and multiply it by 27, you don’t get 13; indeed, just from the way in
which multiplication is carried out, you don’t get a whole number. They are,
however, approximations to $\frac{13}{27}$ and – broadly speaking – they provide better and
better approximations as you work along the list: indeed, you could get ‘as close to
$\frac{13}{27}$ as you needed to be’ just by going far enough.
2 that is, the difference between the nth approximation and the ideal value
It is that last, slightly vague, comment that we need to make precise and, in order
to pin down its exact meaning, we shall look at the errors in the approximations,
the differences $\frac{13}{27} - p_n$:
\begin{align*}
\tfrac{13}{27} - p_1 &= 0.08148148\ldots < 0.1 = 10^{-1} \\
\tfrac{13}{27} - p_2 &= 0.00148148\ldots < 0.01 = 10^{-2} \\
\tfrac{13}{27} - p_3 &= 0.00048148\ldots < 0.001 = 10^{-3} \\
&\;\;\vdots \\
\tfrac{13}{27} - p_n &< 0.00\ldots01 = 10^{-n} \\
&\;\;\vdots
\end{align*}
Notice that this display doesn’t tell us explicitly the exact value of these errors³,
but that this is not going to matter: because, instead, we have an overestimate of
the size of the typical $n$th-stage error that is simple enough to work with. Look:
• If we are allowed a certain ‘tolerance of error’, that is, we’ve been asked to get
some approximations whose actual errors are less than that tolerance, we can
now easily see how to do it. Just find some positive integer $N$ such that $\left(\frac{1}{10}\right)^N$
is smaller than the permitted tolerance, and then $p_N$ will be a good enough
approximation because $0 < \frac{13}{27} - p_N < \left(\frac{1}{10}\right)^N$, so $\frac{13}{27} - p_N$ is also smaller
than the tolerance.
• Continuing…not only is that $p_N$ good enough, but all the later ones $p_{N+1}$,
$p_{N+2}$, $p_{N+3}$, $p_{N+4}$ …will also be good enough in the same sense of ‘good’: they
all have actual errors that are less than our allowed tolerance.
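The two bullet points above can be tried out numerically. The following is a small sketch, assuming we model $p_n$ by cutting off the decimal expansion of 13/27 after $n$ digits; the names `truncation` and `stage_for` are hypothetical helpers introduced only for this illustration. Exact fractions are used so that no floating-point rounding clouds the comparison with $10^{-n}$.

```python
from fractions import Fraction

TARGET = Fraction(13, 27)

def truncation(n):
    """p_n: the decimal expansion of 13/27 cut off after n decimal digits."""
    scale = 10**n
    # Integer division discards every digit beyond the nth decimal place.
    return Fraction((TARGET.numerator * scale) // TARGET.denominator, scale)

def stage_for(tolerance):
    """Smallest positive integer N for which (1/10)^N is smaller than the tolerance."""
    if tolerance <= 0:
        raise ValueError("the tolerance must be positive")
    N = 1
    while Fraction(1, 10**N) >= tolerance:
        N += 1
    return N

if __name__ == "__main__":
    tolerance = Fraction(1, 1000)
    N = stage_for(tolerance)
    for n in range(N, N + 5):
        error = TARGET - truncation(n)
        # Every stage from N onwards stays within the tolerance.
        print(n, float(truncation(n)), error < tolerance)
```

The $N$ returned here follows the recipe in the first bullet, so it is safe but not necessarily the smallest stage that works: the worst-case bound $10^{-N}$ overestimates the actual error.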
Incidentally, if we had approximated from above instead of from below, by opting
for the list of numbers 0.5, 0.49, 0.482, 0.4815, 0.48149…, then the differences
$\frac{13}{27} - p_n$ would have been negative. That would not have bothered us too much
because the size of the error is usually more important than whether it is positive
or negative. So we would have taken $p_n - \frac{13}{27}$ as the error measurement in this case
instead of $\frac{13}{27} - p_n$, and the rest of the calculations would have worked out almost
exactly the same.
The way to avoid worrying about whether our approximations are overestimates
or underestimates is simply to define the error to mean $\left| \frac{13}{27} - p_n \right|$, so that error
measurements are always counted as positive. We shall do this in future.
3 for one thing, we are only working to so-many decimal places at this point
The last small step we take in order to compress our account of this string of
improving approximations into a compact phrase is to agree on a standard symbol
for what we called the tolerance of error. For historical reasons, the Greek letter ε
(pronounced ‘EP-silon’) is used. Thus, our precise and concise reason for declaring
the list of numbers $(p_n)$ to be a ‘perfect’ approximation process for the fraction
$\frac{13}{27}$ is:
for each $\varepsilon > 0$, we can find a positive integer $N$ such that
for every $n \ge N$, we get $\left| p_n - \frac{13}{27} \right| < \varepsilon$.
You should probably read that last couple of lines several times in order to feel
how it captures all the aspects of our lengthy discussion. Notice particularly that
the $N$ that we find depends on the particular ε that we are challenged with: if they
change ε, we are free to change⁴ the replying $N$ that we find (and we shall usually
need to do so). Sometimes it is denoted by $N(\varepsilon)$ or $N_\varepsilon$ instead of plain $N$ in order
to make exactly this point, and other commonly employed symbols for it are $n_\varepsilon$,
$n_0$, $n_1$ and $n_2$.
2.5 Approximating an area

For the moment, let’s forget everything we ever knew about differential calculus
and integration (we shall study these in detail later) and think how we might try to
find the area A of the region in the coordinate plane that lies above the horizontal
axis, below the curve $y = x^2$ and between the vertical lines $x = 0$ and $x = 2$.
[Figure: graph of $y = x^2$, with the $x$-axis marked at 0, 1 and 2]
As a first attempt, we could divide this area into vertical strips by adding in extra
vertical lines at $x$ = 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6 and 1.8. Since (for positive
numbers) $p < q$ implies $p^2 < q^2$, within each of these strips, the lowest point of
the curve is at its left-hand edge and the highest point is at its right-hand edge. If we
therefore imagine, inside each of the strips, the tallest rectangle that fits underneath
the curve, it is easy to write down the area (length times breadth) of each of these
rectangles. By adding these together, we get an estimate of the area under the curve.
[Figure: graph of $y = x^2$, with the $x$-axis marked at 0, 0.2, 0.4, …, 1.8, 2]
4 For instance, in our ‘13/27’ example, when ε is set at 0.001, $N = 3$ is a good enough choice;
if the application requires ε to be reset to 0.000001, then $N$ will have to alter to 6 at least. The
relationship between ε and $N$ is not always as simple as this, however.
If we denote this estimate by $U_{10}$ – since it is visibly an Underestimate of the
intended area and we used ten strips in calculating it – we get
\[ U_{10} = 0.2\{0.2^2 + 0.4^2 + 0.6^2 + 0.8^2 + 1^2 + 1.2^2 + 1.4^2 + 1.6^2 + 1.8^2\} \]
which calculates out as 2.28. In just the same way, we can find an overestimate
(let us denote it by $O_{10}$) of the desired curved area by considering the shortest
rectangle within each vertical strip that fits above the curve, as indicated in the
next diagram:
[Figure: graph of $y = x^2$, with the $x$-axis marked at 0, 0.2, 0.4, …, 1.8, 2]
This time, the calculation is
\[ O_{10} = 0.2\{0.2^2 + 0.4^2 + 0.6^2 + 0.8^2 + 1^2 + 1.2^2 + 1.4^2 + 1.6^2 + 1.8^2 + 2^2\} = 3.08. \]
It would, of course, be wrong to claim that $U_{10}$ and $O_{10}$ are accurate estimates of
the area $A$ that we set out to find: for one thing, our diagrams suggest that they are
not; for another, the relatively large difference (0.8) between them makes it clear
that they certainly cannot both have a high degree of accuracy. However, there are
two very encouraging aspects of the discussion:
1. If we re-run the argument with more and narrower vertical strips, there are
good prospects that the accuracy will improve.
2. We have control of the error: since the desired area $A$ lies between $U_{10}$ and
$O_{10}$, the error we make in proposing either of these as an approximation to $A$
cannot be more than the difference $O_{10} - U_{10}$. Therefore if, as we hope, the
difference between the overestimate and the underestimate becomes smaller as
we increase the number of strips, we have an improving sequence of
approximations to $A$, just as in the previous illustration the decimals of
increasing length provided an improving sequence of approximations to
13/27.
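Therefore, instead of using just ten vertical strips to slice up the area $A$, imagine
that we choose a positive integer $n$ and divide $A$ into $n$ strips (meeting at $\frac{2}{n}$, $\frac{4}{n}$, $\frac{6}{n}$
and so on up to $\frac{2n-2}{n}$). There are only very minor changes in the argument: we get
the underestimate
\begin{align*}
U_n &= \frac{2}{n}\left\{ \left(\frac{2}{n}\right)^2 + \left(\frac{4}{n}\right)^2 + \left(\frac{6}{n}\right)^2 + \cdots + \left(\frac{2n-2}{n}\right)^2 \right\} \\
    &= \frac{8}{n^3}\left(1^2 + 2^2 + 3^2 + \cdots + (n-1)^2\right)
\end{align*}
and the overestimate
\begin{align*}
O_n &= \frac{2}{n}\left\{ \left(\frac{2}{n}\right)^2 + \left(\frac{4}{n}\right)^2 + \left(\frac{6}{n}\right)^2 + \cdots + \left(\frac{2n}{n}\right)^2 \right\} \\
    &= \frac{8}{n^3}\left(1^2 + 2^2 + 3^2 + \cdots + n^2\right).
\end{align*}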
Now we can call in an algebraic identity for the sum of consecutive squares that
you may have come across before:
\[ 1^2 + 2^2 + 3^2 + 4^2 + \cdots + k^2 = \frac{k(k+1)(2k+1)}{6} \]
(if this is not familiar to you, you will find a proof of it in the next chapter but one,
as paragraph 4.2.2). Using this, the underestimate formula simplifies to
\[ U_n = \frac{8}{n^3} \cdot \frac{(n-1)(n)(2n-1)}{6} = \frac{4(n-1)(2n-1)}{3n^2}, \]
the overestimate formula to
\[ O_n = \frac{8}{n^3} \cdot \frac{(n)(n+1)(2n+1)}{6} = \frac{4(n+1)(2n+1)}{3n^2} \]
and the difference between them to
\[ O_n - U_n = \frac{4(n+1)(2n+1)}{3n^2} - \frac{4(n-1)(2n-1)}{3n^2} = \frac{4(6n)}{3n^2} = \frac{8}{n}. \]
At this point we are ready to obtain estimates for $A$ that are as accurate as we
choose to make them. For instance, if we need an approximation whose error is
0.01 or smaller, choosing $n = 800$ will be good enough since, at that point, $\frac{8}{n}$ is
0.01 and the error we make in claiming
\[ \text{‘}A = O_{800} = \frac{4(801)(1601)}{3(800)^2} = 2.67167 \text{ approximately’} \]
is less than that. If, instead, we need the error to be smaller than 0.0001, then
choosing $n = 80{,}000$ (or, indeed, anything larger than 80,000 – for instance, it
might make the arithmetic simpler if we opted for $n = 100{,}000$ instead⁵) will
achieve it. Indeed, no matter how small the error is required to be, we now have
a simple rule of thumb for choosing a positive integer $N$ so that any value of $n$
that exceeds $N$ will give us an approximation $O_n$ (or $U_n$ or, indeed, anything in
between) whose actual error is smaller: we have, in the language of Section 2.4, set
up a ‘perfect’ approximation procedure for $A$.
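The strip calculations above are easy to reproduce. Here is a short sketch in exact arithmetic that rebuilds $U_n$ and $O_n$ directly from the rectangles and compares them with the closed formulas; the function names `under`, `over` and `strips_needed` are illustrative assumptions of ours, not notation from the text.

```python
from fractions import Fraction
import math

def under(n):
    """U_n: total area of the tallest rectangles lying below y = x^2 on [0, 2]."""
    width = Fraction(2, n)
    # Heights are taken at the left-hand edges x = 0, 2/n, ..., (2n-2)/n.
    return width * sum((k * width) ** 2 for k in range(0, n))

def over(n):
    """O_n: total area of the shortest rectangles lying above y = x^2 on [0, 2]."""
    width = Fraction(2, n)
    # Heights are taken at the right-hand edges x = 2/n, 4/n, ..., 2.
    return width * sum((k * width) ** 2 for k in range(1, n + 1))

def strips_needed(tolerance):
    """Smallest n for which the gap O_n - U_n = 8/n is at most the tolerance."""
    return math.ceil(Fraction(8) / Fraction(tolerance))

if __name__ == "__main__":
    print(float(under(10)), float(over(10)))
    n = strips_needed(Fraction(1, 100))
    print(n, float(under(n)), float(over(n)))
```

Passing the tolerance as a `Fraction` (rather than a float such as `0.01`) keeps the strip count exact.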
2.6 A small slice of π

We conclude this group of illustrations with one that has more underlying theory
than the previous two, so we shall consider it only briefly. If you add together an
initial block of terms in the list
\[ 1,\; -\frac{1}{3},\; +\frac{1}{5},\; -\frac{1}{7},\; +\frac{1}{9},\; -\frac{1}{11},\; +\frac{1}{13},\; -\frac{1}{15},\; \cdots \]
then (firstly) the total is an estimate for π/4 and (secondly) the error in that
estimate is smaller than the modulus of the next number in the list – the first one
that you decided not to take. Exactly why this is true is not at all obvious, but we
shall investigate it later in the text (see 18.3.17).
If, for instance, we add the first five numbers, then the running total so far is
263/315 which is really not all that good as an estimate, but does at least come
with an assessed error: the error must be smaller than 1/11, the modulus of the
sixth term. Likewise, the total of the first twenty fractions in the list (which would
be very tedious but routine to calculate by hand) will provide an approximation
that differs from π/4 by less than 1/41, the (modulus of the) twenty-first term.
Continuing, if we needed an estimate whose error was less than 0.01, notice that
term number $n$ here is $\pm\frac{1}{2n-1}$, so by choosing $n = 51$ or more we would get the
modulus of the $n$th term to be at most 1/101: in other words, the sum of the first
50 terms will differ from π/4 by less than 0.01.
5 Working to five decimal places, $O_{100,000}$ calculates out at 2.66671 and $U_{100,000}$ at 2.66663
It should be becoming clear that we can force the error term here to be smaller
than any given tolerance, merely by taking an initial block with enough terms in it.
So, although the laborious arithmetic that it entails severely limits the usefulness
of this approximation process, it is nevertheless ‘perfect’ in the technical sense that
we are using here.
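The 'tedious but routine' arithmetic is a natural job for a few lines of code. The sketch below adds up initial blocks of the list and compares each total with π/4; `term` and `partial_sum` are names chosen here for illustration, and the comparison with `math.pi` is only as good as floating-point precision allows. It illustrates the error claim without explaining why it holds.

```python
from fractions import Fraction
import math

def term(n):
    """Term number n of the list: +1/(2n-1) or -1/(2n-1), with alternating sign."""
    return Fraction((-1) ** (n - 1), 2 * n - 1)

def partial_sum(count):
    """Exact total of the first `count` terms of the list."""
    return sum(term(n) for n in range(1, count + 1))

if __name__ == "__main__":
    for count in (5, 20, 50):
        total = partial_sum(count)
        actual_error = abs(float(total) - math.pi / 4)
        next_term = abs(term(count + 1))  # the first term we decided not to take
        print(count, float(total), actual_error, float(next_term))
```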
2.7 Testing limits by the definition

The case studies on which we have spent the last few pages should, at least, have
removed every element of surprise from the following definition.
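2.7.1 Definition 1. A sequence $(x_n)_{n \in \mathbb{N}}$ is said to converge to a limit $\ell$ (or to
tend to $\ell$), where $\ell$ is a real number, if:
for each $\varepsilon > 0$ there is some positive integer $n_\varepsilon$ such that
$|x_n - \ell| < \varepsilon$ for every $n \ge n_\varepsilon$.
2. When this is so, we write $x_n \to \ell$ (as $n \to \infty$) or, equivalently,
$\lim_{n \to \infty} x_n = \ell$ or
\[ \lim_{n \to \infty} x_n = \ell. \]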
(The phrases ‘(as) n → ∞’ are often omitted, especially in situations where we
might otherwise be writing them many times over. When not omitted, these
phrases are commonly spoken as ‘as n tends to infinity’.)
3. Not all sequences have limits. Those that do are called convergent, whereas
those that do not are called divergent.
It will probably help you to keep on thinking of $x_n$ as the $n$th item in a succession
of approximations to $\ell$, and of ε as the tolerance for some intended application.
In that sense the open interval $(\ell - \varepsilon, \ell + \varepsilon)$, which consists of exactly the
numbers whose distances from $\ell$ are smaller than ε, is where to find the ‘good’
approximations – those whose errors are smaller than the current tolerance. Keep
in mind that the physical distance between numbers $x$ and $y$ on the real line is
$|x - y|$, so that the phrase $|x_n - \ell| < \varepsilon$ simply says ‘the distance between $x_n$ and
$\ell$ is smaller than ε’.
[Figure: the real line with $\ell - \varepsilon$, $\ell$ and $\ell + \varepsilon$ marked. The ‘good’ approximations to $\ell$ lie between $\ell$ − tolerance and $\ell$ + tolerance]

2.7.2 Example To show that the harmonic sequence $\left(\frac{1}{n}\right)_{n \ge 1}$ converges⁶ to 0.
Draft solution
Our task is – given any positive tolerance ε – to find a value of $n$ beyond which
all terms of the sequence lie within that tolerance of 0. Such a task usually needs a
piece of roughwork first. We want $\left|\frac{1}{n} - 0\right| < \varepsilon$, that is, $\frac{1}{n} < \varepsilon$, that is, $n > \frac{1}{\varepsilon}$. This
shows how big $n$ needs to be. However, $\frac{1}{\varepsilon}$ is probably not an integer so, in order to
line up with the definition, we had better round it up to the next whole number (or
one greater still, if you prefer) and call that $n_\varepsilon$. Now we are ready:
Solution
Given $\varepsilon > 0$, let $n_\varepsilon$ be any integer larger than $\frac{1}{\varepsilon}$. Then for every integer $n \ge n_\varepsilon$ we
have $n > \frac{1}{\varepsilon}$, therefore $\frac{1}{n} < \varepsilon$, that is, $\left|\frac{1}{n} - 0\right| < \varepsilon$. By the definition, $\lim_{n \to \infty} \frac{1}{n} = 0$.
2.7.3 Example Put $x_n = 4 - 3n^{-2}$ for each positive integer $n$. We show that $(x_n)$
is a convergent sequence, and that its limit is 4.
Draft solution
We need to arrange $|x_n - 4| < \varepsilon$ for each given positive tolerance ε – or, more
precisely, to decide how big $n$ needs to be in order to force this to happen. Now
$|x_n - 4|$ simplifies to $\frac{3}{n^2}$ and this will be less than ε just when $\frac{n^2}{3} > \frac{1}{\varepsilon}$, that is, when
$n^2 > \frac{3}{\varepsilon}$, that is, when $n > \sqrt{\frac{3}{\varepsilon}}$. We can locate a suitable $n_\varepsilon$ for the definition by
rounding up that last expression to a whole number.
Solution
Given $\varepsilon > 0$, let $n_\varepsilon$ be any integer larger than $\sqrt{\frac{3}{\varepsilon}}$. Then for every integer $n \ge n_\varepsilon$
we have $n > \sqrt{\frac{3}{\varepsilon}}$ and therefore $n^2 > \frac{3}{\varepsilon}$ and therefore $\frac{1}{n^2} < \frac{\varepsilon}{3}$ and therefore $\frac{3}{n^2} < \varepsilon$,
that is,
\[ |x_n - 4| = |4 - 3n^{-2} - 4| = \frac{3}{n^2} < \varepsilon. \]
By definition, the sequence $(x_n)$ converges to 4.
2.7.4 Example To show that
\[ \lim_{n \to \infty} \frac{n(3n-1)}{n^2+1} = 3. \]
6 While we begin to build up experience and confidence in using this definition, we shall often
practise on sequences (such as this one) for which it is possible to guess fairly easily the exact
numerical value of the limit.
Draft solution
Let $x_n$ stand for the typical term in this sequence and let ε denote a given positive
tolerance. We want to arrange that $|x_n - 3| < \varepsilon$ for sufficiently big values of $n$,
that is,
\[ \left| \frac{n(3n-1)}{n^2+1} - 3 \right| = \left| \frac{3n^2 - n - 3n^2 - 3}{n^2+1} \right| = \left| \frac{-n-3}{n^2+1} \right| = \frac{n+3}{n^2+1} < \varepsilon. \]
This time it is not straightforward to determine exactly how big $n$ must be to force
this, but we do not need to do so exactly: we can look for a WCS⁷ overestimate that
is easier to work with, and use that instead to decide where $n_\varepsilon$ can be safely placed.
Look carefully at the following overestimation:⁸
\[ \frac{n+3}{n^2+1} < \frac{n+3}{n^2} \le \frac{n+3n}{n^2} = \frac{4n}{n^2} = \frac{4}{n}. \]
Now it is easy to make $\frac{4}{n}$ less than ε: just ensure that $n$ exceeds $\frac{4}{\varepsilon}$ or, rather, an
integer larger than that.
Solution
Given $\varepsilon > 0$, let $n_\varepsilon$ be any integer larger than $\frac{4}{\varepsilon}$. Then for every integer $n \ge n_\varepsilon$ we
have $n > \frac{4}{\varepsilon}$ and therefore $\frac{n}{4} > \frac{1}{\varepsilon}$ and therefore $\frac{4}{n} < \varepsilon$. But
\[ |x_n - 3| = \left| \frac{n(3n-1)}{n^2+1} - 3 \right| = \frac{n+3}{n^2+1} < \frac{n+3}{n^2} \le \frac{n+3n}{n^2} = \frac{4n}{n^2} = \frac{4}{n} \]
so also $|x_n - 3| < \varepsilon$. By definition, $x_n \to 3$.
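It can be reassuring to spot-check the choices of $n_\varepsilon$ made in Examples 2.7.2 to 2.7.4. The sketch below does so for a few sample tolerances; the helper `holds_from` and the three `n_eps_...` functions are hypothetical names of ours that simply encode 'any integer larger than' the relevant expression. A finite check of this kind is not a proof (the solutions above are), but it can catch an algebra slip quickly.

```python
import math
from fractions import Fraction

def holds_from(term, limit, eps, n_eps, checks=2000):
    """Spot-check |term(n) - limit| < eps for n = n_eps, ..., n_eps + checks - 1."""
    return all(abs(term(n) - limit) < eps for n in range(n_eps, n_eps + checks))

def n_eps_harmonic(eps):
    # 2.7.2: an integer larger than 1/eps
    return math.floor(1 / eps) + 1

def n_eps_squares(eps):
    # 2.7.3: the smallest integer n with n^2 > 3/eps, i.e. n > sqrt(3/eps)
    n = 1
    while n * n <= 3 / eps:
        n += 1
    return n

def n_eps_rational(eps):
    # 2.7.4: an integer larger than 4/eps, taken from the WCS overestimate 4/n
    return math.floor(4 / eps) + 1

if __name__ == "__main__":
    for eps in (Fraction(1, 10), Fraction(1, 100), Fraction(3, 1000)):
        print(eps,
              holds_from(lambda n: Fraction(1, n), 0, eps, n_eps_harmonic(eps)),
              holds_from(lambda n: 4 - Fraction(3, n * n), 4, eps, n_eps_squares(eps)),
              holds_from(lambda n: Fraction(n * (3 * n - 1), n * n + 1), 3, eps,
                         n_eps_rational(eps)))
```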

2.7.5 EXERCISE Show that the sequence $\left( \frac{n+5}{2n+13} \right)_{n \ge 1}$ converges to $\frac{1}{2}$.
Partial draft solution
Given $\varepsilon > 0$, we need to arrange that $\left| \frac{n+5}{2n+13} - \frac{1}{2} \right| < \varepsilon$. That simplifies
to $\frac{3}{2(2n+13)} < \varepsilon$. Either calculate⁹ how big $n$ needs to be in order to ensure
that this happens, or else (preferably) use some (WCS) overestimation to make
your task easier; for instance:
\[ \frac{3}{2(2n+13)} < \frac{4}{2(2n+13)} = \frac{2}{2n+13} < \frac{2}{2n} = \frac{1}{n}. \]
7 ‘worst case scenario’
8 and keep in mind that, to increase a fraction of positive numbers, you can increase the
numerator, or decrease the denominator, or both.
9 The inequality we desire is equivalent to $\frac{2(2n+13)}{3} > \frac{1}{\varepsilon}$, and to $2n + 13 > \frac{3}{2\varepsilon}$, and to
$2n > \frac{3 - 26\varepsilon}{2\varepsilon}$, and to $n > \frac{3 - 26\varepsilon}{4\varepsilon}$. So if we choose $n_\varepsilon$ to be a positive integer that is bigger than
$\frac{3 - 26\varepsilon}{4\varepsilon}$, then $n \ge n_\varepsilon$ will guarantee that we get it.
2.7.6 EXERCISE Show that the sequence $(17n^{-3} - 2)_{n \ge 1}$ converges.
They haven’t told us what the limit is, but it will not be difficult to make an informed
guess. Roughly what is the value of the $n$th term if $n$ is very big? What if $n$ = 1,000?
What if $n$ = 1,000,000? What if $n$ = 1,000,000,000? Once you have correctly
guessed what the limit is going to turn out to be, it should be simple enough to
calculate how big $n$ must be taken in order to make the $n$th stage error less than
any given $\varepsilon > 0$.
2.7.7 EXERCISE Prove the convergence (and evaluate the limit) of the sequence
$(a_n)$ described by
\[ a_n = \frac{15n^2 + n + 1}{5n^2 - n - 2}. \]
You should again be able to guess the limit pretty certainly just by trying huge
values of $n$ (but better methods are coming). The tricky point this time comes in the
(WCS) overestimating of the error term. You ought to find that the error simplifies
to $\frac{4n+7}{5n^2 - n - 2}$, and it is then tempting to argue as follows:
\[ \frac{4n+7}{5n^2 - n - 2} \le \frac{4n+7n}{5n^2 - n - 2} = \frac{11n}{5n^2 - n - 2} < \frac{11n}{5n^2} \ldots \]
but the last step is wrong: by changing $5n^2 - n - 2$ into $5n^2$ we have
actually increased the denominator and therefore decreased the fraction,
which is the exact opposite of what we intended and needed. Instead, try this:
$5n^2 - n - 2 \ge 5n^2 - n^2 - 2n^2$ so
\[ \frac{11n}{5n^2 - n - 2} \le \frac{11n}{5n^2 - n^2 - 2n^2} = \frac{11n}{2n^2} < \frac{12n}{2n^2} = \frac{6}{n} \]
and so on.
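The warning about overestimating in the wrong direction is easy to test numerically. This sketch, which assumes the limit guessed for the exercise is 3, checks that the error really is $\frac{4n+7}{5n^2-n-2}$, that the tempting replacement $\frac{11n}{5n^2}$ is smaller (not larger) than the fraction it was meant to overestimate, and that $\frac{6}{n}$ is a genuine overestimate. The function names are our own.

```python
from fractions import Fraction

def a(n):
    """The nth term (15n^2 + n + 1) / (5n^2 - n - 2)."""
    return Fraction(15 * n * n + n + 1, 5 * n * n - n - 2)

def error(n):
    """The nth stage error |a_n - 3|, assuming the guessed limit 3."""
    return abs(a(n) - 3)

def tempting_bound(n):
    """11n / (5n^2): obtained by (wrongly) enlarging the denominator."""
    return Fraction(11 * n, 5 * n * n)

def safe_bound(n):
    """6/n: the WCS overestimate reached by shrinking the denominator instead."""
    return Fraction(6, n)

if __name__ == "__main__":
    for n in (1, 2, 5, 10, 100):
        middle = Fraction(11 * n, 5 * n * n - n - 2)
        print(n, float(error(n)), float(middle),
              float(tempting_bound(n)), float(safe_bound(n)))
```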
2.7.8 Remark How a sequence converges (or not) is not influenced in any way by
the first few terms – nor, indeed, by the first trillion terms. For imagine that we take
a convergent sequence $(a_n)$ with limit $\ell$, and alter the first trillion ($= 10^{12}$) terms
in some fashion. Given positive ε, we can find $n_\varepsilon$ so that $n \ge n_\varepsilon$ makes the error
terms $|a_n - \ell|$ in the original sequence less than ε. If it happens that $n_\varepsilon$ is more than
a trillion, this remains true for the modified sequence (since the modifications only
affected the early terms). Yet if $n_\varepsilon$ is a trillion or less, we see that $n > 10^{12}$ forces the
$n$th stage errors in both the modified and the unmodified sequences to be smaller
than ε once again. In both cases, the limit has not been affected.
This allows us, when exploring the limit of a sequence, to ignore the first few (or
the first many – but never infinitely many) terms if it simplifies our argument. Here
is an illustration:
2.7.9 Example To show that $\frac{3n}{7n^2 - 6n - 12} \to 0$.
Roughwork and partial draft solution
Given $\varepsilon > 0$, we want to get $\left| \frac{3n}{7n^2 - 6n - 12} \right| < \varepsilon$. When $n = 1$, $7n^2 - 6n - 12$
is actually negative so the modulus signs are important; but their importance
disappears once $n \ge 2$ since then $7n^2 - 6n - 12$ is positive. Thus, provided
we deal only with $n \ge 2$, our problem becomes the slightly simpler condition
\[ \frac{3n}{7n^2 - 6n - 12} < \varepsilon. \]
As before, we would like to replace that fractional expression with a larger but
simpler overestimate, so that (i) it will become easy to see how big $n$ needs to be
to make the overestimate less than ε, and (ii) then the original (smaller) fraction
will automatically be less than ε also. We dare not simplify by throwing away the
$-6n - 12$ from the denominator because that would increase the denominator and
thus decrease the fraction – the opposite of what we need. Nor can we easily replace
the $-6n - 12$ by $-6n^2 - 12n^2$ this time because it would make the denominator go
negative, and oblige us to use modulus signs again. Instead, and keeping in mind
that the $7n^2$ in that denominator is (for big values of $n$) more important than the
$6n$ or the 12, consider replacing the $6n$ by a ‘slice’ of $7n^2$ that is small compared
with $7n^2$ but big compared with $6n$. (The intention here is to simplify the algebra
in the denominator, while ensuring that it remains positive but becomes smaller
than it was.)
Let us be a bit more specific: provided that $n$ is 7 or more, $6n$ will be smaller
than $n^2$, so $7n^2 - 6n - 12$ will be larger than $7n^2 - n^2 - 12 = 6n^2 - 12$. Now
do something similar to get rid of the $-12$: provided that $n$ is 4 or more, 12 will
be smaller than $n^2$, so $6n^2 - 12$ will be larger than $6n^2 - n^2 = 5n^2$. Therefore
(gathering up all the restrictions on the value of $n$ that we found useful), if $n$ is at
least 7, we shall get $7n^2 - 6n - 12 > 5n^2$, and $\frac{3n}{7n^2 - 6n - 12} < \frac{3n}{5n^2} = \frac{0.6}{n}$. Now
the rest of the proof will run like that of earlier examples because we have a simple
(WCS) overestimate of the $n$th stage error and because our decision to ignore the
first 6 terms will not alter our conclusion.
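The roughwork above leaves the final choice of $n_\varepsilon$ to the reader; one way it could be finished is sketched here, under the assumption that we take $n_\varepsilon$ to be at least 7 and also larger than $0.6/\varepsilon$. The names `term`, `wcs_bound` and `n_eps` are illustrative only, and the finite loop is a sanity check rather than a proof.

```python
import math
from fractions import Fraction

def term(n):
    """The nth term 3n / (7n^2 - 6n - 12)."""
    return Fraction(3 * n, 7 * n * n - 6 * n - 12)

def wcs_bound(n):
    """The overestimate 0.6/n, valid once n is at least 7."""
    return Fraction(3, 5) / n

def n_eps(eps):
    """An integer that is at least 7 and also larger than 0.6/eps."""
    return max(7, math.floor(Fraction(3, 5) / eps) + 1)

if __name__ == "__main__":
    print(float(term(1)))  # negative: the denominator is -11 when n = 1
    eps = Fraction(1, 100)
    start = n_eps(eps)
    print(start, all(abs(term(n)) < eps for n in range(start, start + 1000)))
```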
Before we become too complacent about using the phrase ‘the limit of a
sequence’, we ought to take the trouble to check that no sequence can ever possess
two or more limits.
2.7.10 Theorem: uniqueness of limit of a convergent sequence If a sequence
$(a_n)$ converges to a limit $\ell_1$, and also converges to a limit $\ell_2$, then $\ell_1 = \ell_2$.
Proof
If not, then one of the two is larger. Without loss of generality we’ll assume $\ell_1 < \ell_2$
(otherwise, just change the labels of the two alleged limits). Put $\varepsilon = \frac{1}{2}(\ell_2 - \ell_1) > 0$.
[Figure: the real line with $\ell_1 - \varepsilon$, $\ell_1$, $\ell_1 + \varepsilon = \ell_2 - \varepsilon$, $\ell_2$ and $\ell_2 + \varepsilon$ marked, $\ell_1$ and $\ell_2$ being $2\varepsilon$ apart. ε is half the difference between $\ell_1$ and $\ell_2$]
From the definition of limit there must be a positive integer $n_1$ such that
\[ \ell_1 - \varepsilon < a_n < \ell_1 + \varepsilon \]
for every $n \ge n_1$. Then again, there must be another positive integer $n_2$ such that
\[ \ell_2 - \varepsilon < a_n < \ell_2 + \varepsilon \]
for every $n \ge n_2$. Choose any integer $n$ that is bigger than both $n_1$ and $n_2$, and
we have all of these inequalities working for us simultaneously. In particular, and
holding in mind that the way we chose ε made $\ell_1 + \varepsilon$ and $\ell_2 - \varepsilon$ be the same
number:
\[ a_n < \ell_1 + \varepsilon = \ell_2 - \varepsilon < a_n \]
which produces the contradiction $a_n < a_n$.
2.7.11 Example To show that each constant sequence converges (and that its limit
is that constant).
Solution
Consider a sequence $(x_n)$ in which every $x_n$ is the same number: that is, a sequence
of the form $(c, c, c, c, c, \cdots)$. We show that the limit of $(x_n)$ is also $c$. In other
notation, $\lim_{n \to \infty} c = c$.
Given $\varepsilon > 0$, let us choose $n_\varepsilon = 1$…yes, with a constant sequence, we can get
away with a constant choice of $n_\varepsilon$ also. Then for every $n \ge n_\varepsilon$, we get $|x_n - c| =
|c - c| = 0 < \varepsilon$ and the demonstration is complete.
2.7.12 EXERCISE
• Show by example that the following statement is not true: if the sequence $(a_n^2)$
converges to $\ell^2$, then the sequence $(a_n)$ must converge to $\ell$ or to $-\ell$.
• Prove that if the sequence $(a_n^2)$ converges to 0, then $(a_n)$ converges to 0.
Roughwork
• For questions like the first part, the thing to keep in mind is that if you only
know about the value of $x^2$ ($x$ being a real number) then, generally, you don’t
know whether $x$ itself is positive or negative.
• The second part of this exercise highlights a small trick that frequently turns
out to be extremely useful. Suppose we know that a particular sequence $(x_n)$
converges to a limit $\ell$, and we wish to use this information to show that a
different but related sequence $(y_n)$ converges to a limit $m$. Our task is to
demonstrate that, for any given $\varepsilon > 0$, we can force $|y_n - m| < \varepsilon$ (for
sufficiently large values of $n$). The available information is that $|x_n - \ell|$ actually
can be made less than any given positive number, such as ε…but not only ε:
absolutely any positive quantity can be used instead if it helps us solve the
problem – for the basic given information is that $|x_n - \ell|$ can be forced to be
less than any positive tolerance whatever.
Here, we need to show that (given $\varepsilon > 0$) we can make $|a_n - 0| < \varepsilon$, and the
given information is that $|a_n^2 - 0|$ can be made less than any tolerance. We ask
ourselves: what should that tolerance be, in order to be able to show that
$|a_n - 0| < \varepsilon$? Another reading of the sentence (if needed) should show you that
we can choose the ‘missing’ tolerance as $\varepsilon^2$: because $|a_n^2 - 0| < \varepsilon^2$ certainly
implies that $|a_n - 0| < \varepsilon$. So a formal solution to the second part can begin
as follows:
Partial solution
Given that $(a_n^2)$ converges to 0, and given $\varepsilon > 0$, notice that $\varepsilon^2$ is also greater than
0, so there is a positive integer $n_0$ such that $|a_n^2 - 0| < \varepsilon^2$ for all $n \ge n_0$ …
2.7.13 EXERCISE Let $(x_n)_{n \ge 1}$ be any sequence. Verify that $x_n \to 0$ if and only if
$|x_n| \to 0$.
Roughwork
The cautious approach to an ‘if and only if’ claim is to break the argument into two
parts: the ‘if’ and the ‘only if’. That is, we set out to show the following:
1. if $x_n \to 0$ then $|x_n| \to 0$, and
2. if $|x_n| \to 0$ then $x_n \to 0$.
To set up part (1), assume that $x_n \to 0$. Then (for a given value of $\varepsilon > 0$) what
we know is that $|x_n - 0| < \varepsilon$ for all $n \ge$ some $n_0$. What we need to know is that
$||x_n| - 0| < \varepsilon$ for all sufficiently large values of $n$. Now compare what we know
with what we want to know.
Part (2) should work in a very similar way.
2.7.14 EXERCISE Let $(a_n)_{n \ge 1}$ be any sequence. Show (directly from the definition
of the limit) that $a_n \to \ell$ if and only if $a_n - \ell \to 0$ if and only if $|a_n - \ell| \to 0$.
Roughwork
The second if-and-only-if is something that we know already: just put $x_n = a_n - \ell$ in
the previous exercise (2.7.13). So we need only focus on the first one: that $a_n \to \ell$
if and only if $a_n - \ell \to 0$. For any given $\varepsilon > 0$, write down the definitions of what
$a_n \to \ell$ and $a_n - \ell \to 0$ require, and compare them.
2.8 Combining sequences; the algebra of limits

This section offers us the first of a number of quicker ways to establish limits for
relatively complicated sequences. The basic idea of these results is that whenever
you can see how to build up a sequence from simpler ones whose limits you
already know, it is possible to write down the limit of the complicated one just
by combining the limits of its simpler ‘components’. Furthermore, the proof via the
definition that this is legitimate is done once and for all in the verification of the
results, and there is no need to repeat that proof for each suitable example that you
meet from then on.
2.8.1 Theorem Suppose that (an ) and (bn ) are convergent sequences, with limits
and m respectively. Then:
1. an + bn → + m,
2. an − bn → − m,
3. an bn → m,
4. For each constant k, kan → k,
5. |an | → ||,
an
6. Provided that m = 0 and that no bn is zero, also → .
bn m
REMARK: There are several other ways of expressing this collection of results,
including the following version:
1. lim(an + bn ) = lim an + lim bn ,
2. lim(an − bn ) = lim an − lim bn ,
3. lim(an bn ) = lim an lim bn ,
4. For each constant k, lim(kan ) = k lim an ,
5. lim |an | = | lim an |,
6. Provided that no division by zero occurs, $\lim \dfrac{a_n}{b_n} = \dfrac{\lim a_n}{\lim b_n}$.
(The entire result can even be turned into English, thus: taking limits of conver-
gent sequences is compatible with addition, with subtraction, with multiplication,
with ‘scaling’ (that is, multiplying by constants), with taking modulus, and with
division provided always that no illegal division by zero is attempted.)
We shall eventually provide proofs of all six parts of this highly useful result, but
not all at once since we are keener just now to show how to use it. Here is a start
on that project:
Proof of part (1)

Let $\varepsilon > 0$ be given. Then (of course) $\varepsilon/2$ is also a positive number.10 Since we
know that $(a_n)$ converges to $\ell$, there is some positive integer (let us call it $n_1$)
such that
\[ n \ge n_1 \text{ forces } |a_n - \ell| < \frac{\varepsilon}{2}. \]
For similar reasons, there is another positive integer (call this one $n_2$) such that
\[ n \ge n_2 \text{ forces } |b_n - m| < \frac{\varepsilon}{2}. \]
We cannot know which of $n_1$, $n_2$ is the greater, but one of them is.11 Write
$n_0 = \max\{n_1, n_2\}$ and notice that whenever $n \ge n_0$, we get both the displayed lines
working for us at the same time. Therefore, for every $n \ge n_0$:
\[ |a_n + b_n - (\ell + m)| = |(a_n - \ell) + (b_n - m)| \le |a_n - \ell| + |b_n - m| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
Thus the proof of (1) is complete. (Notice the use of the triangle inequality 1.3.3 in
the last line here.)
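The choice $n_0 = \max\{n_1, n_2\}$ can be watched at work numerically. The following is only a finite spot-check under our own assumptions, not a proof: the sample sequences $a_n = 1 + 1/n$ and $b_n = 2 - 1/n^2$, the value of $\varepsilon$ and the helper name `threshold` are all illustrative choices that do not come from the text.

```python
from fractions import Fraction

def a(n):
    # Illustrative sequence (our assumption) with limit 1.
    return 1 + Fraction(1, n)

def b(n):
    # Illustrative sequence (our assumption) with limit 2.
    return 2 - Fraction(1, n * n)

def threshold(seq, limit, tol):
    """First n with |seq(n) - limit| < tol.

    This is a valid threshold for the two sample sequences only because
    their error terms decrease steadily as n grows.
    """
    n = 1
    while abs(seq(n) - limit) >= tol:
        n += 1
    return n

eps = Fraction(1, 100)
n1 = threshold(a, 1, eps / 2)   # from here on |a_n - 1| < eps/2
n2 = threshold(b, 2, eps / 2)   # from here on |b_n - 2| < eps/2
n0 = max(n1, n2)                # both estimates hold from n0 onwards

# Spot-check the conclusion |a_n + b_n - 3| < eps on a finite range.
checked = all(abs(a(n) + b(n) - 3) < eps for n in range(n0, n0 + 500))
print(n1, n2, n0, checked)
```

Exact rational arithmetic is used so that rounding cannot blur the strict inequalities.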
Proof of part (4)

In the special case $k = 0$, we know this already: because then $(k a_n)$ is a constant
sequence of zeroes, $k\ell$ is exactly zero and an earlier example told us that $\lim 0 = 0$.
So for the rest of the proof we can assume that $k \neq 0$.
Given any $\varepsilon > 0$, notice that $\dfrac{\varepsilon}{|k|}$ is also positive (and that we needed the modulus
signs on $k$ to make that true). Since $a_n \to \ell$, there must exist a positive integer (call
it $n_0$, for instance) such that
\[ n \ge n_0 \text{ forces } |a_n - \ell| < \frac{\varepsilon}{|k|}. \]
Then also
\[ n \ge n_0 \text{ forces } |k a_n - k\ell| = |k(a_n - \ell)| = |k||a_n - \ell| < |k| \frac{\varepsilon}{|k|} = \varepsilon \]
which is exactly what the definition asked in order to show that $k a_n \to k\ell$.
10 Why did we make that choice? Well, to show that $a_n + b_n$ converges to $\ell + m$, we need to
arrange that $|a_n + b_n - (\ell + m)|$ shall be less than $\varepsilon$. Rearrange that desired punch-line into
$|(a_n - \ell) + (b_n - m)| < \varepsilon$. Each of $|a_n - \ell|$ and $|b_n - m|$ can be made as small as we please … and if
we make each of them less than one half of $\varepsilon$, then their combined total will be smaller than two
halves of $\varepsilon$, which is exactly what we needed. Now, rejoin the main text.
11 Unless, of course, they happen to be equal, in which case it really doesn’t matter which one
you choose.
2.8.2 EXERCISE Construct a proof of part (2) of the theorem. (You should find
that an argument very like that given for part (1) will be convincing; after all,
addition and subtraction are quite similar operations.)
Moving towards the use of this theorem now, notice for a start that (according
to part (3), and remembering from Example 2.7.2 that $\frac{1}{n}$ tends to zero)
\[ \lim \frac{1}{n^2} = \lim \left( \frac{1}{n} \cdot \frac{1}{n} \right) = \lim \frac{1}{n} \, \lim \frac{1}{n} = 0 \cdot 0 = 0 \]
and, consequently, that
\[ \lim \frac{1}{n^3} = \lim \left( \frac{1}{n^2} \cdot \frac{1}{n} \right) = \lim \frac{1}{n^2} \, \lim \frac{1}{n} = 0 \cdot 0 = 0 \]
and so on. Consequently $\dfrac{1}{n^k} \to 0$ for each positive integer $k$. (It is easy to prove this
formally by induction,12 if you wish to try it.) This observation, combined with
pieces of our main theorem, allows us to deal with a large number of limit problems
quickly and painlessly. Consider, for instance:
2.8.3 Example To establish the convergence of the sequence $(a_n)$ described by
\[ a_n = \frac{15n^2 + n + 1}{5n^2 - n - 2}. \]
Solution
Begin by dividing the numerator and denominator of the fraction formula by the
largest power of $n$ appearing, $n^2$ (which dividing, of course, doesn’t change the
fraction in the least):
\[ a_n = \frac{15n^2 + n + 1}{5n^2 - n - 2} = \frac{15 + n^{-1} + n^{-2}}{5 - n^{-1} - 2n^{-2}} \to \frac{15 + 0 + 0}{5 - 0 - 2(0)} = \frac{15}{5} = 3. \]
We are finished already! Notice, however, how many aspects of the key theorem
were used in that last line: part (6) to let us deal with the numerator and
denominator separately, parts (1) and (2) to add/subtract the separate parts of each
and part (4) to see that $\lim 2n^{-2} = 2 \lim n^{-2}$. It is not considered necessary to write
out all of these moves, but be aware of them.
12 If this technique is not familiar to you, wait until Section 4.2 where we shall discuss it in
detail.
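The limit found in Example 2.8.3 can be sanity-checked by evaluating a few terms exactly. This is a sketch for illustration only (the function name `a` and the sample values of $n$ are our own choices); it supports the answer 3 but of course proves nothing.

```python
from fractions import Fraction

def a(n):
    """n-th term of the sequence in Example 2.8.3, as an exact fraction."""
    return Fraction(15 * n**2 + n + 1, 5 * n**2 - n - 2)

for n in (10, 100, 1000, 10**6):
    gap = abs(a(n) - 3)          # distance from the claimed limit
    print(n, float(a(n)), float(gap))
```

A little algebra shows that the gap equals $(4n + 7)/(5n^2 - n - 2)$, which behaves roughly like $4/(5n)$, so the printed gaps should shrink by about a factor of ten each time $n$ grows tenfold.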
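2.8.4 Example To find the limit (as $n \to \infty$) of:
\[ \frac{n^2 - 2n + 5}{n^3 + n^2 + 7} - \left| \frac{n - \left| \frac{2 - 3n}{4 + 5n} \right|}{3n + \pi} \right|. \]
Solution
(Despite its forbidding appearance, this expression has been built up layer by
layer from simple pieces, and we only need to ‘shadow’ its construction to get the
answer.) To begin with, dividing top and bottom by $n$:
\[ \frac{2 - 3n}{4 + 5n} = \frac{2n^{-1} - 3}{4n^{-1} + 5} \to \frac{2(0) - 3}{4(0) + 5} = \frac{-3}{5} \]
and therefore (using part 5 of the theorem)
\[ \left| \frac{2 - 3n}{4 + 5n} \right| \to \left| \frac{-3}{5} \right| = \frac{3}{5}. \]
Next, again dividing top and bottom by $n$:
\[ \frac{n - \left| \frac{2 - 3n}{4 + 5n} \right|}{3n + \pi} = \frac{1 - n^{-1} \left| \frac{2 - 3n}{4 + 5n} \right|}{3 + \pi n^{-1}} \to \frac{1 - 0 \left( \frac{3}{5} \right)}{3 + \pi(0)} = \frac{1}{3} \]
and consequently
\[ \left| \frac{n - \left| \frac{2 - 3n}{4 + 5n} \right|}{3n + \pi} \right| \to \left| \frac{1}{3} \right| = \frac{1}{3}. \]
Moving on (but still digesting fractional formulas by dividing top and bottom
by the largest power involved, currently $n^3$):
\[ \frac{n^2 - 2n + 5}{n^3 + n^2 + 7} = \frac{n^{-1} - 2n^{-2} + 5n^{-3}}{1 + n^{-1} + 7n^{-3}} \to \frac{0 - 2(0) + 5(0)}{1 + 0 + 7(0)} = \frac{0}{1} = 0. \]
Finally, we put the whole construction together:
\[ \frac{n^2 - 2n + 5}{n^3 + n^2 + 7} - \left| \frac{n - \left| \frac{2 - 3n}{4 + 5n} \right|}{3n + \pi} \right| \to 0 - \left| \frac{1}{3} \right| = -\frac{1}{3}. \]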
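The layer-by-layer ‘shadowing’ used in Example 2.8.4 translates naturally into small functions, one per layer. The sketch below is a floating-point illustration under our own naming (`inner`, `middle`, `first`, `whole` are hypothetical helper names), and its output is only suggestive numerical evidence for the limit $-\frac{1}{3}$.

```python
from math import pi

def inner(n):
    # |(2 - 3n)/(4 + 5n)|, which the example shows tends to 3/5
    return abs((2 - 3 * n) / (4 + 5 * n))

def middle(n):
    # |(n - inner(n))/(3n + pi)|, which tends to 1/3
    return abs((n - inner(n)) / (3 * n + pi))

def first(n):
    # (n^2 - 2n + 5)/(n^3 + n^2 + 7), which tends to 0
    return (n**2 - 2 * n + 5) / (n**3 + n**2 + 7)

def whole(n):
    # The full expression: first layer minus the outer modulus
    return first(n) - middle(n)

for n in (10, 1000, 10**6):
    print(n, inner(n), middle(n), first(n), whole(n))
```

Each function mirrors one step of the solution, so a disagreement between a printed column and the corresponding hand-computed limit would point at the layer where an error was made.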

2.8.5 EXERCISE Use the algebra of limits to find the limit of each of the sequences
whose nth terms are given by the following formulae:
1. $\dfrac{1 - 2n + 3n^2 - 4n^3}{-5 - 6n + 7n^2 + 8n^3}$
2. $\left( \dfrac{11 - 6n}{5n + 2} \right)^5$
3. $\dfrac{9 - 4n}{1 + n + 7n^2} - \dfrac{9 - 4n + n^2}{1 - n + 7n^2}$

4. $\dfrac{\pi^2 n^2}{\pi n^2 + \pi^3} + \left| \dfrac{3 - n^4}{3 + n^4} \right| - 3\,\dfrac{1 - \pi n}{2 + \pi n^2}$
Remark
In the first of these, be sure to check that the denominator of the given fraction
is never zero (because otherwise the relevant part of the key theorem could not
be used). In the fourth of these exercises, keep in mind the traditional ‘order of
precedence’ of the arithmetical operations: for example, that multiplications and
divisions are always done before additions and subtractions, except where brackets
or other ‘enclosing’ operations dictate otherwise (since material inside brackets and
the like needs to be evaluated first). In this context, pairs of modulus signs behave
like brackets.
2.8.6 EXERCISE Prove part (5) of the algebra of limits theorem. That is, given
that $a_n \to \ell$, show that $|a_n| \to |\ell|$.
Draft solution
For a slick solution, almost all you will need is the reverse triangle inequality 1.3.4
\[ \bigl| |x| - |y| \bigr| \le |x - y| \]
(where $x$, $y$ are any real numbers).
2.8.7 A look forward What we have done so far about detecting and proving
limits of sequences works fine provided that either we can sensibly guess the limit
and then come up with an overestimate of the error term that is simple enough
to work with, or else we can break the typical term down into simpler pieces
whose separate limits we already know. Unfortunately, many important and useful
sequences don’t fall into either of those categories. For instance (look back to a
previous example 2.2.1 (7)), although it is not very hard to guess the limit of $n^{1/n}$,
it is then far from obvious how we could estimate the error-gap between that guess
and the typical term. Again, in the cases of $\left( \left( 1 + \frac{1}{n} \right)^n \right)_{n \ge 1}$ and of
\[ \left( 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots + \frac{1}{n} \right)_{n \ge 1}, \]
although a few minutes’ roughwork with a calculator will strongly suggest that the
nth term of each is getting steadily bigger as n increases, it is hardly self-evident
whether or not they are getting steadily closer to some limiting ‘ceiling’ or, if so,
what number that ceiling might be. In short, we need more techniques, more
analytic technology, to tackle such questions. The aim of the next chapter is to
develop and deploy some of that technology.
2.9 POSTSCRIPT: to infinity

It is built into our definition of convergence of a sequence (xn )n∈N to a limit,
as n → ∞, that the limit is always a real number: it is not a mysterious symbol
such as ∞ or −∞. Indeed, many useful theorems would collapse if we did not
define convergence in this fashion: in particular, several of the algebra of limits
results would have fallen into chaos by inviting us to calculate such expressions as
∞/∞, ∞ − ∞ and 0 times ∞. Yet there are circumstances in which the idea of a
sequence ‘moving towards infinity or minus infinity’ can be useful and can be made
precise. Indeed, we have used one such scenario many times already, for in every
sequential convergence question the label n that serves to identify the nth term xn
does indeed ‘→ ∞’ as the familiar notation says.13 Hopefully it has been clear that
the phrase ‘n → ∞’ simply means that n becomes enormously big, bigger than any
possible upper bound or ceiling n0 that could have been suggested. It is this insight
that allows us to give a clear definition of a sequence tending to infinity also:
2.9.1 Definition A sequence (xn )n∈N is said to tend to infinity or diverge to

infinity if:
for each K > 0 there is some positive integer nK such that
xn > K for every n ≥ nK .
When this is so, we write xn → ∞ or even limn→∞ xn = ∞, but we must take

care not to treat limn→∞ xn in such cases as if it were a number (for instance, by
using it in an arithmetical calculation). Avoid saying that ‘(xn ) converges to infinity’
since, in fact, (xn ) is not convergent at all.
In a similar way, we can define the idea of a sequence tending to minus infinity:
13 Likewise, it is unlikely to surprise anyone if we claim that $n^2 \to \infty$ and $n^3 \to \infty$ (as
$n \to \infty$).
2.9.2 Definition A sequence (xn )n∈N is said to tend to minus infinity or diverge to
minus infinity if:
for each K < 0 there is some positive integer nK such that
xn < K for every n ≥ nK .
When this is so, we write xn → −∞ or even limn→∞ xn = −∞.
2.9.3 Example To show that $n^3 - 5n^2 - 2n - 17 \to \infty$ as $n \to \infty$.
Roughwork
Given $K > 0$, we need to guarantee that, for all values of $n$ that are big enough,
$n^3 - 5n^2 - 2n - 17 > K$. Now (to simplify the algebra, just as we did for convergent
sequences)14
1. $5n^2 \le \frac{1}{10} n^3$ provided that $n \ge 50$,
2. $2n \le \frac{1}{10} n^3$ provided that $20 \le n^2$ and therefore surely if $n \ge 5$,
3. $17 \le \frac{1}{10} n^3$ provided that $170 \le n^3$, and therefore surely if $n \ge 6$.
Hence, $n \ge 50$ will force
\[ n^3 - 5n^2 - 2n - 17 \ge n^3 - \frac{1}{10} n^3 - \frac{1}{10} n^3 - \frac{1}{10} n^3 = \frac{7}{10} n^3. \]
Then, to force $\frac{7}{10} n^3 > K$, it is good enough to take $n > \sqrt[3]{\frac{10K}{7}}$ …
Solution
Given $K > 0$, choose an integer $n_K > \max\left\{ 50, \sqrt[3]{\frac{10K}{7}} \right\}$. Then (as shown in the
roughwork) $n \ge n_K$ will guarantee that
\[ n^3 - 5n^2 - 2n - 17 \ge \frac{7}{10} n^3 > K. \]
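The recipe for $n_K$ in Example 2.9.3 is explicit enough to compute. The sketch below is illustrative only: the names `p` and `n_K` are ours, and the cube root is avoided by testing the equivalent integer inequality $7n^3 > 10K$, which is an implementation choice rather than anything prescribed by the text.

```python
def p(n):
    """The n-th term n^3 - 5n^2 - 2n - 17 of Example 2.9.3."""
    return n**3 - 5 * n**2 - 2 * n - 17

def n_K(K):
    """An integer exceeding both 50 and the cube root of 10K/7.

    The condition n > (10K/7)^(1/3) is tested in the equivalent form
    7 n^3 > 10 K, so that no floating-point cube root is needed.
    """
    n = 51                       # already greater than 50
    while 7 * n**3 <= 10 * K:
        n += 1
    return n

for K in (1, 1000, 10**6, 10**9):
    N = n_K(K)
    # Finite spot-check of both steps of the solution for n >= n_K.
    ok = all(10 * p(n) >= 7 * n**3 and p(n) > K for n in range(N, N + 200))
    print(K, N, ok)
```

Checking $10\,p(n) \ge 7n^3$ separately from $p(n) > K$ mirrors the two links of the chain $n^3 - 5n^2 - 2n - 17 \ge \frac{7}{10} n^3 > K$.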
2.9.4 EXERCISE Show that
\[ \frac{3n^4 - 7n^2}{2n^2 + n - 1} \to \infty \text{ as } n \to \infty. \]
14 Sometimes, even the roughwork needs roughwork. Here, as $n$ becomes bigger, the $n^3$ will
become much bigger and more important than the other pieces. To bring this out, we ought to
force each of $5n^2$, $2n$ and 17 to be less than a small fraction of the $n^3$ … for instance, less than one
tenth of it. How do we make $5n^2$ less than a tenth of $n^3$? By ensuring that $50n^2 < n^3$ … and any
value of $n$ over 50 will do that. Then the same kind of discussion will handle $2n$ and 17. At this
point you can rejoin the text and see why we restrict $n$ as we do.
2.9.5 EXERCISE Write out a formal proof of the following (almost immediate)
consequence of the definitions: xn → −∞ if and only if −xn → ∞.
2.9.6 Example To verify that
\[ \frac{100 - n^5}{200 + n^2 + n^4} \to -\infty \text{ as } n \to \infty. \]
2.9.7 Roughwork By the previous exercise, it is equivalent to show that
\[ \frac{n^5 - 100}{200 + n^2 + n^4} \to \infty \text{ as } n \to \infty \]
(which will be slightly easier since there are fewer minuses involved). Given $K > 0$,
we therefore need to arrange (just by taking large enough values of $n$) that
\[ \frac{n^5 - 100}{200 + n^2 + n^4} > K. \]
This is too hard an inequality to solve directly, so we should try to replace
$\frac{n^5 - 100}{200 + n^2 + n^4}$ by a smaller but simpler fraction that we can still make greater than
$K$: and this will involve replacing the numerator $n^5 - 100$ by something smaller,
but the denominator $200 + n^2 + n^4$ by something bigger. As $n$ becomes large, it
will be the $n^5$ that matters more in the numerator – for instance, if $n$ is at least 4,
then the 100 will be smaller than one tenth of $n^5$, and the numerator will then be
bigger than nine tenths of $n^5$.
In the denominator, it will be the $n^4$ (again, the biggest power of $n$ appearing)
that matters most – for instance, if $n$ is at least 2 then $n^2$ will be smaller than $n^4$,
and if $n$ is at least 4 then 200 will be smaller than $n^4$. So $n \ge 4$ will guarantee that
$200 + n^2 + n^4$ is less than $n^4 + n^4 + n^4$.
Summary so far: provided that $n \ge 4$, we have that
\[ \frac{n^5 - 100}{200 + n^2 + n^4} > \frac{0.9n^5}{n^4 + n^4 + n^4} = \frac{0.9n^5}{3n^4} = \frac{3n}{10}. \]
It will be easy to ensure that this final fraction is greater than $K$: we need only make
$n$ exceed $10K/3$. A formal solution can now be compiled as follows:
2.9.8 Solution Given $K > 0$, we choose $n_K$ to be any positive integer that is greater
than both 4 and $\frac{10K}{3}$. Then (as seen in the roughwork) for any $n \ge n_K$, we shall
have
\[ \frac{n^5 - 100}{200 + n^2 + n^4} > \frac{0.9n^5}{n^4 + n^4 + n^4} = \frac{0.9n^5}{3n^4} = \frac{3n}{10} \ge \frac{3}{10} n_K > K. \]
This shows
\[ \frac{n^5 - 100}{200 + n^2 + n^4} \to \infty \]
which, by the preceding exercise, is equivalent to the question posed.
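The choice of $n_K$ in Solution 2.9.8 can also be tested mechanically. In this sketch (names `q` and `n_K` are our own, and $K$ is assumed to be a positive integer or `Fraction` so that the arithmetic stays exact) we pick the smallest integer exceeding both 4 and $10K/3$ and spot-check the chain of inequalities on a finite range.

```python
from fractions import Fraction

def q(n):
    """(n^5 - 100)/(200 + n^2 + n^4) as an exact fraction."""
    return Fraction(n**5 - 100, 200 + n**2 + n**4)

def n_K(K):
    """Smallest integer greater than both 4 and 10K/3.

    Assumes K is a positive int or Fraction, so floor division is exact.
    """
    return max(4, (10 * K) // 3) + 1

for K in (1, 7, 100, 12345):
    N = n_K(K)
    # q(n) > 3n/10 > K is the chain used in the solution.
    ok = all(q(n) > Fraction(3 * n, 10) > K for n in range(N, N + 300))
    print(K, N, ok)
```

Because $-q(n)$ is the original expression of Example 2.9.6, the same thresholds show that expression falling below $-K$.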
2.9.9 EXERCISE Show from the definitions that:

1. if xn ≥ yn for all n ≥ some n0 , and limn→∞ yn = ∞, then limn→∞ xn = ∞
also.
2. if xn ≤ yn for all n ≥ some n0 , and limn→∞ yn = −∞, then
limn→∞ xn = −∞ also.
Roughwork towards part 1

The task is, given K > 0, to obtain evidence that xn > K for all sufficiently large
values of n. What we know is that yn > K for all sufficiently large values of n, and
also that xn ≥ yn for every n ≥ n0 . . .
Some fragments of the algebra of limits are still valid for sequences that diverge
to ∞ or −∞; for instance:
2.9.10 Theorem
1. if xn → ∞ and yn → ∞ (as n → ∞) then xn + yn → ∞ also;

2. if xn → ∞ (as n → ∞) and there is some positive constant A such that
−A ≤ yn ≤ A for every n ∈ N 15 then xn + yn → ∞ also;
3. if xn → ∞ and A is a positive constant then Axn → ∞;
4. if xn → ∞ and A is a negative constant then Axn → −∞;
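5. if $x_n \to \infty$ then $\dfrac{1}{x_n} \to 0$ (as $n \to \infty$);
6. if $x_n \to -\infty$ then $\dfrac{1}{x_n} \to 0$ (as $n \to \infty$);
7. if $x_n > 0$ for all $n \ge$ some $n_0$ and $x_n \to 0$ then $\dfrac{1}{x_n} \to \infty$ (as $n \to \infty$);
8. if $x_n < 0$ for all $n \ge$ some $n_0$ and $x_n \to 0$ then $\dfrac{1}{x_n} \to -\infty$ (as $n \to \infty$).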
(Notice that in 5 and 6 we have been slightly cavalier about never dividing by
zero. If some of the xn were zero then the corresponding 1/xn would not be defined.
However, in each case xn is eventually becoming very big (whether positive or
negative) and therefore certainly non-zero; to recover the result in all cases, we
only have to ignore the first few terms where zero might have occurred and, as
usual, ignoring finitely many terms has no effect upon a limit.)
15 that is, (yn )n∈N is bounded: see paragraph 4.1.4 for more on this.
Sample proofs
All of the above are proved in a routine16 way from the definitions, so we shall only
demonstrate a few.
Roughwork towards 1.
Given K > 0, we need to show that xn + yn > K for all n ≥ some threshold
value. Now we are told that xn > K for all n ≥ some n0 , and also that yn > K
for all n ≥ some possibly different n1 . What if we combine these inequalities while
n ≥ max{n0 , n1 }? Remember that K is positive, so 2K > K.17
Proof of 2.
Begin by choosing A as in 2., and let K be a given positive constant.
(Roughwork: we need to arrange that xn + yn > K (for large values of n) and,
knowing that yn ≥ −A in any case, it will be enough to get xn > K + A; but this
we can do, because xn → ∞.)
Since xn → ∞, we can find a positive integer n0 such that, for each n ≥ n0 ,
xn > K + A. Since we also know that yn ≥ −A, adding the two inequalities gives
xn + yn > K. Therefore xn + yn → ∞.
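To see the proof of part 2 in action, here is a small numerical sketch under assumptions of our own: we take $x_n = n$ (which tends to infinity) and $y_n = \sin n$ (bounded, with $A = 1$), neither of which is singled out by the text. The helper `n0_for` is a hypothetical name for the threshold beyond which $x_n > K + A$.

```python
from math import floor, sin

A = 1.0   # bound for y_n = sin(n): -A <= y_n <= A

def x(n):
    # Sample sequence diverging to infinity (our assumption).
    return float(n)

def y(n):
    # Sample bounded sequence (our assumption).
    return sin(n)

def n0_for(K):
    """A positive integer n0 with x(n) > K + A whenever n >= n0."""
    return max(1, floor(K + A) + 1)

for K in (0.5, 10, 1000.25):
    n0 = n0_for(K)
    # Adding x_n > K + A and y_n >= -A gives x_n + y_n > K.
    ok = all(x(n) + y(n) > K for n in range(n0, n0 + 500))
    print(K, n0, ok)
```

The point of the design is that the threshold is found by looking at $(x_n)$ alone; the bounded sequence contributes only the constant $A$.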
Proof of 4.
Suppose that $x_n \to \infty$ and that $A$ is a negative constant. Given $K < 0$, we need
to think how to ensure that $A x_n < K$ and, because negative multipliers can cause
confusion in inequalities, it is safer to write that as $-A x_n > -K$, that is, as $|A| x_n > |K|$,
or as $x_n > \frac{|K|}{|A|}$.
Since $x_n \to \infty$ and $\frac{|K|}{|A|}$ is positive, there exists18 $n_0 \in \mathbb{N}$ such that $x_n > \frac{|K|}{|A|}$ for all
$n \ge n_0$: that is, $|A| x_n > |K|$ or $-A x_n > -K$ or $A x_n < K$. Therefore $A x_n \to -\infty$.
16 Incidentally, routine is not the same as easy! By calling an argument routine, all we mean is
that it is built up by putting together the definitions and the ‘obvious’ results in a rather predictable
fashion, without depending on surprise insights. That may or may not be brief or easy – in some
cases, it is neither.
17 Incidentally, a slightly more elegant solution could begin with xn > K/2 for all n ≥ some
n0 and yn > K/2 for all n ≥ some n1 .
18 This is similar to the trick we discussed in the roughwork to 2.7.12; once we are told that
xn → ∞, then xn can be made larger not only than a particular positive constant such as |K| that
we are challenged with, but also than any positive expression built from that constant that suits
our purpose, such as |K|/|A|.
Proof of 7.
Suppose that $x_n > 0$ for all $n \ge$ some $n_0$ and that $x_n \to 0$, and let $K$ be any given
positive constant. Then $\frac{1}{K}$ is positive19 so there exists $n_1 \in \mathbb{N}$ such that $|x_n - 0| < \frac{1}{K}$
for all $n \ge n_1$: indeed, if $n \ge n_0$ as well, then $0 < x_n < \frac{1}{K}$ for all such $n$. Therefore
\[ n \ge \max\{n_0, n_1\} \Rightarrow 0 < x_n < \frac{1}{K} \Rightarrow \frac{1}{x_n} > K \]
which, since $K$ can be arbitrarily big, delivers $1/x_n \to \infty$.
2.9.11 EXERCISE Choose any other two parts of this theorem and write out
proofs for them.
2.9.12 Example To obtain an easier proof that
\[ \frac{100 - n^5}{200 + n^2 + n^4} \to -\infty \text{ as } n \to \infty. \]
2.9.13 Solution Once we divide the top and bottom lines by $n^5$, the biggest power
of $n$ appearing, it is immediate from the algebra of limits that
\[ \frac{200 + n^2 + n^4}{100 - n^5} \to 0 \text{ as } n \to \infty \]
and we also notice that the expression is negative for all $n \ge 3$. The conclusion
follows from part 8 above.
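Both hypotheses of part 8 used in Solution 2.9.13 (negativity from $n = 3$ onwards, and convergence to zero) are easy to probe with exact arithmetic. This is an illustrative sketch only; the name `x` for the reciprocal expression is our own.

```python
from fractions import Fraction

def x(n):
    """(200 + n^2 + n^4)/(100 - n^5): the reciprocal of the original term."""
    return Fraction(200 + n**2 + n**4, 100 - n**5)

# Sign check: positive for n = 1, 2 and negative from n = 3 onwards.
print([x(n) > 0 for n in (1, 2, 3, 4)])

# Size check: |x(n)| shrinks, while 1/x(n) becomes large and negative.
for n in (10, 100, 1000):
    print(n, float(x(n)), float(1 / x(n)))
```

The denominator $100 - n^5$ is never zero for integer $n$, because 100 is not a fifth power, so every term is defined.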
2.9.14 EXERCISE Decide (with proof) whether each of the following is true or
false:
• If xn → 0 and no xn is exactly zero, then either 1/xn → ∞ or 1/xn → −∞.
• If xn → 0 and no xn is exactly zero, then 1/|xn | → ∞.
2.9.15 Remark Once we had checked that $1/n \to 0$, we learned from the algebra
of limits that its various positive-integer powers
\[ \frac{1}{n^2}, \ \frac{1}{n^3}, \ \frac{1}{n^4}, \ \frac{1}{n^5}, \]
and so on, also converged to zero. This conclusion is not, however, restricted to
integer powers:
19 Where did that come from? Once again, it comes from thinking about what we need to show.
In order to prove that $\frac{1}{x_n} \to \infty$, we need to get $\frac{1}{x_n} > K$ for big values of $n$ and, provided that $x_n$ is
positive, that is the same as asking for $x_n < \frac{1}{K}$. Can we make that happen? Yes, because $x_n \to 0$.
2.9.16 Example To use part (5) of the preceding theorem to show that, for any
positive real number $a$,
\[ \frac{1}{n^a} \to 0. \]
Solution
By the referenced part (5), it is good enough to show that $n^a \to \infty$.
(Roughwork: for each given positive $K$, we need to arrange that $n^a > K$, that is,
that $n > \sqrt[a]{K}$ . . .)
Given $K > 0$, we note that $\sqrt[a]{K}$ is also merely a real number, so we can find a
positive integer $n_0$ that is larger: $n_0 > \sqrt[a]{K}$. Then for any $n \ge n_0$ we have $n > \sqrt[a]{K}$
and therefore20 $n^a > K$. In other words, the sequence $(n^a)$ diverges to $\infty$, as
desired.
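The threshold $n_0 > \sqrt[a]{K}$ of Example 2.9.16 can be computed approximately in floating point. The sketch below is hedged accordingly: `n0_for` is a name of our own, the $a$th root is taken as `K ** (1 / a)`, and a short loop guards against the rounding of that root, so the result is reliable only up to floating-point accuracy.

```python
from math import floor

def n0_for(K, a):
    """A positive integer n0 with n0**a > K, for K > 0 and a > 0.

    Starts just above the floating-point a-th root of K and steps upward
    if rounding left the candidate marginally too small.
    """
    n = max(1, floor(K ** (1.0 / a)) + 1)
    while n ** a <= K:
        n += 1
    return n

for K, a in ((10, 2), (100, 0.5), (5, 1.7), (8, 3)):
    n0 = n0_for(K, a)
    # Spot-check n^a > K, equivalently 1/n^a < 1/K, on a finite range.
    ok = all(n ** a > K for n in range(n0, n0 + 100))
    print(K, a, n0, ok)
```

The spot-check relies, as the solution itself does, on larger positive numbers having larger $a$th powers.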
2.10 Important note on ‘elementary functions’

The ideal way to develop any mathematical text is, of course, to begin with
elementary material that everyone already agrees upon and, using that as basis, to
work step-wise through more sophisticated matters, always establishing the truth
of any newly encountered result as a consequence of earlier ones. While it would
be possible to do precisely this in an account of real analysis, and while it is indeed
essential as regards the evolution of the theory, the suite of examples that rightfully
illustrate that theory would be somewhat sterile and flavourless if we adhered too
rigidly to the policy.
The reason is that the so-called elementary functions21 such as $\sin x$, $\cos x$, $\ln x$
and $e^x$, and their basic properties, will already be familiar to anyone who sets out to
read this text, and we can greatly increase the illustrative power of our examples22
by using them: but a logically sound definition of these functions requires a
surprising deal of preparation, and so therefore does proper mathematical proof
that they do behave, in all circumstances, in the ways in which those readers believe
that they do. It will not, in fact, be until Chapter 18 that we set out definitions of
the four functions just named.
20 Here we are taking it for granted that larger positive numbers have larger $a$th powers which,
although true, will not be amenable to formal demonstration until we have properly defined $x^a$
for all real $a$ and positive real $x$.
21 Note that there are a number of notations in use for the natural logarithm function in
particular; we have chosen to use ‘ln’, but ‘log’ and ‘$\log_e$’ are also widespread.
22 We should stress that expressions such as $\sin x$ and $e^x$ are functions rather than sequences,
since it is to be understood that their control variable $x$ is a real number rather than just a
positive integer. However, we can use such functions to build a wide variety of sequences, such
as $(\sin n)$, $\left( n \sin \frac{1}{n} \right)$, $\left( e^{1 + \sin(\pi n/4)} \right)$, $(\ln(1 + 1/n))$ and so on, $n$ being a positive integer in each
case.
Pending our future encounter with those definitions and proofs, it seems
right that we allow ourselves to mobilise certain basic information concerning
$\sin x$, $\cos x$, $\ln x$ and $e^x$ provided that we take care not to use it in the development
of the theory, but only in examples, and provided that we do eventually get around
to showing that this information is reliable. The following summary lists explicitly
the details, concerning the four functions, that we temporarily accept for use in
examples (and that we promise, in the long run, to establish).
1. • $\sin : \mathbb{R} \to [-1, 1]$ is an odd, periodic function in the sense that
$\sin(-x) = -\sin x$ and $\sin(x + 2\pi) = \sin x$ in all cases.
• $-1 \le \sin x \le 1$ for all real $x$.
• The derivative of $\sin x$ is $\cos x$.
• $\sin 0 = \sin \pi = 0$, $\sin(\pi/2) = 1$, $\sin(-\pi/2) = -1$.
2. • $\cos : \mathbb{R} \to [-1, 1]$ is an even, periodic function in the sense that
$\cos(-x) = \cos x$ and $\cos(x + 2\pi) = \cos x$ in all cases.
• $-1 \le \cos x \le 1$ for all real $x$.
• The derivative of $\cos x$ is $-\sin x$.
• $\cos 0 = 1$, $\cos \pi = -1$, $\cos(\pi/2) = \cos(-\pi/2) = 0$.
• $\sin^2 x + \cos^2 x = 1$.
• $\cos x = \sin\left( \frac{\pi}{2} - x \right)$ for all $x$.
3. • $\ln : (0, \infty) \to \mathbb{R}$ is an increasing function.
• $\ln 1 = 0$, $\ln(xy) = \ln x + \ln y$, $\ln(x/y) = \ln x - \ln y$, $\ln(x^y) = y \ln x$.
• The derivative of $\ln x$ is $1/x$.
• For positive values of $x$ very close to 0, $\ln x$ is enormous but negative.
• For sufficiently big positive values of $x$, $\ln x$ is enormous and positive.
4. • The exponential function $e^x$ is an increasing function from $\mathbb{R}$ to $(0, \infty)$: all
its values are strictly positive.
• It equals its own derivative.
• $e^0 = 1$, $e^{x+y} = e^x e^y$, $e^{x-y} = e^x / e^y$, $(e^x)^y = e^{xy}$.
• For very large negative $x$, $e^x$ is extremely small (but still positive).
• For large positive $x$, $e^x$ is extremely big (and, of course, positive).
• $\ln(e^x) = x$ for all real $x$, $e^{\ln x} = x$ for all positive $x$.
3 Interlude: different kinds of numbers
3.1 Sets
It is convenient for us to use a little of the language and symbolism of set theory,
although the theory itself lies beyond the scope of this text. By the term set we
mean a well-defined collection of distinct objects that are called its elements. By
well-defined we mean that each object either definitely is an element of the set in
question, or definitely is not: there must be no borderline cases. (So, for instance, we
have to avoid ideas such as ‘the set of all very large integers’ or ‘the set of numbers
that are extremely close to 3’: for it would be a matter of opinion and context which
objects were to belong to such ill-defined collections.) By distinct we mean in
practice that repetitions among the elements of a set are not allowed, so that, for
instance, the set of prime factors of 360 ($= 2^3 \times 3^2 \times 5$) has only the three elements
2, 3 and 5, although the list (2, 2, 2, 3, 3, 5) of its prime factors comprises six items.
3.1.1 Notation If S is a set then

• x ∈ S says that x is one of the elements of S,
• x ∉ S says that x is not one of the elements of S.
Enclosing a list of its elements within curly brackets is a quick way to create a
symbol for a small, simple set. For instance, {2, 3, 5} denotes the set of prime factors
of 360, and {1, 4, 9, 16, 25, 36, 49, 64, 81} is a reasonably tidy notation for the set
of perfect squares that are less than one hundred. The same style can be used for
larger sets, even including some infinite sets, provided that the pattern within the
listing is really obvious: most people will accept {1, 2, 3, 4, 5, 6, · · · 98, 99, 100}
as ‘clearly’ meaning the set of the first one hundred positive integers, and
{2, 4, 8, 16, 32, 64, · · · } as the (infinite) set of all the powers of 2.1 For more
complicated sets, however, this type of notation really doesn’t work.
The sets we particularly work with are sets of real numbers – either real numbers
of a particular type, or a selection of real numbers lifted out for some particular
discussion. Some of these have turned out to be useful so often in the past that
there are now standard symbols for them that should be known and recognised:
1 More precisely, the powers of two in which the index is a positive whole number.
3.1.2 Number systems
1. N is the set of positive integers,2 the set of whole numbers that are greater than
zero. In this case, the curly bracket notation is clear enough:
N = {1, 2, 3, 4, 5, · · · }.
2. Z is the set of all integers – positive, negative and zero whole numbers. The
symbol comes from the German word Zahl meaning number. Sometimes we
stretch the list-in-curly-brackets notation and display it as
Z = {· · · − 3, −2, −1, 0, 1, 2, 3, 4, · · · }.
3. Q is the set of rational numbers, those that can be expressed exactly as a
fraction in the usual sense of the word, that is, as an integer divided by a
non-zero integer. For instance, $\frac{2}{3} \in \mathbb{Q}$, $1.4 \in \mathbb{Q}$ because $1.4 = \frac{7}{5}$ exactly,
$-3 \in \mathbb{Q}$ because $-3 = \frac{-3}{1}$ (or, if you feel it is somehow cheating to have 1 as
the unnecessary denominator of a fraction, then $-3 = \frac{-6}{2}$ is an alternative
reason).
4. R is the set of all real numbers. (So, for instance, $\pi/e \in \mathbb{R}$ but $\sqrt{-1} \notin \mathbb{R}$.)
It is not obvious to common sense alone that there are any real numbers that
are not rational. You may have seen proofs that surds such as $\sqrt{2}$, $\sqrt{5}$, $\sqrt[3]{12}$ and
so on are not rational but, if not, here is a sample argument that is worth reading
carefully as an illustration of proof by contradiction. We shall take it as given that
every positive integer n (except 1) can be expressed as a product of powers of prime
numbers, and that (apart from the order in which these are written) such a prime
decomposition is unique for each n.
2 We have avoided using the phrase ‘natural numbers’. Some writers use the term natural
numbers as a synonym for the positive integers, and others take it to mean the set comprising
the positive integers and 0 itself. In addition, some writers use the symbol N to mean the set of
natural numbers rather than the set of positive integers. So $0 \in \mathbb{N}$ in some books and $0 \notin \mathbb{N}$ in
others! Be aware of this possibly confusing point if you are reading a range of textbooks.
3.1.3 Proposition The real number $\sqrt{35}$ is not a rational number.
Proof
Suppose it were. Then it must be possible to find two integers $p$ and $q$ such that
$\sqrt{35} = \frac{p}{q}$. Then $35 = \frac{p^2}{q^2}$ or, more simply,
\[ p^2 = 35q^2. \]
(Now pick on one of the primes that appear to be involved in that last equation –
say, the 5. The 7 would do equally well.)
Let $5^a$ be the power of 5 that appears in the prime decomposition of $p$, and $5^b$ be
the power of 5 that appears in the prime decomposition of $q$. (We are not ruling out
that 5 may not be a prime factor at all of $p$ or of $q$, for $a$ or $b$ might be zero.) Then $5^{2a}$
is the power appearing in (the decomposition of) $p^2$, and $5^{2b}$ the power appearing
in $q^2$. Also the power appearing in $35q^2 = 5 \times 7 \times q^2$ will be $5^{2b+1}$. However, $p^2$
and $35q^2$ are the same number so, from the uniqueness of prime decomposition,
$5^{2a}$ and $5^{2b+1}$ must be the same thing – which tells us that $2a = 2b + 1$ and that
\[ a - b = \frac{1}{2}. \]
Since $a$ and $b$ are integers, that is impossible. The contradiction completes the proof
(for it shows that it is impossible for $\sqrt{35}$ to be rational).
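The parity argument at the heart of this proof can be watched on actual numbers. The sketch below is only an illustration over a finite range (the helper name `exponent_of` and the range of $p$, $q$ tested are our own choices); the proof above is what settles the matter for all integers.

```python
def exponent_of(prime, n):
    """Largest e such that prime**e divides the positive integer n."""
    e = 0
    while n % prime == 0:
        n //= prime
        e += 1
    return e

# The power of 5 in p^2 is always even (it is 2a) ...
even_side = all(exponent_of(5, p * p) % 2 == 0 for p in range(1, 500))

# ... while the power of 5 in 35 q^2 is always odd (it is 2b + 1).
odd_side = all(exponent_of(5, 35 * q * q) % 2 == 1 for q in range(1, 500))

# Consequently p^2 = 35 q^2 has no solutions in this range.
no_solution = all(p * p != 35 * q * q
                  for p in range(1, 200) for q in range(1, 200))

print(even_side, odd_side, no_solution)
```

Replacing 5 by 7 in the two parity checks gives the alternative version of the argument mentioned in the proof.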
Real numbers that are not rational are called irrational. We shall explore a few
aspects of the relationship between reals, rationals and irrationals in the third
section of this chapter.
3.1.4 Definition If A and B are sets, and if every element of A is also an element
of B, then A is called a subset of B, and we write A ⊆ B. For example, N, Z and
Q are subsets of R, while N and Z are subsets of Q, and N is a subset of Z. Notice
that the wording of the definition makes every set a subset of itself: A ⊆ A merely
because every element of A is (of course) an element of A.
There are ways of combining two (or more) sets that will also help us to discuss
some matters in analysis:
3.1.5 Definition Let A and B be any sets. We define

1. their union A ∪ B to be the set of all objects that are elements of A or of B (or
of both A and B),
2. their intersection3 A ∩ B to be the set of all objects that are elements both of A
and of B,
3. the set difference A \ B to be the set of all objects that are elements of A but
not elements of B.
We can usefully re-write those definitions, concentrating on which objects are
elements of the three defined sets:
1. x ∈ A ∪ B says precisely that x ∈ A or x ∈ B (or both),
2. x ∈ A ∩ B says precisely that x ∈ A and x ∈ B,
3. x ∈ A \ B says precisely that x ∈ A but x ∉ B.
It is perfectly possible for the intersection of two sets not to include any elements
at all. For this reason among others, it is useful to admit the idea of an empty set –
a set that does not include any objects as elements. The standard symbol for such
a set is ∅. Naturally, most of the sets we deal with in practice are non-empty.
3 (sometimes informally called their overlap)
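For small finite sets, the three operations of Definition 3.1.5 can be tried out directly with Python's built-in `set` type. This is a sketch for illustration: the particular sets are taken from the examples earlier in this section, and the variable names are our own.

```python
# The list of prime factors of 360 has six items, but the set has three
# elements, because repetitions are not allowed in a set.
prime_factor_list = (2, 2, 2, 3, 3, 5)
A = set(prime_factor_list)

# The perfect squares that are less than one hundred.
B = {k * k for k in range(1, 10)}

union = A | B            # elements of A or of B (or of both)
intersection = A & B     # elements both of A and of B
difference = A - B       # elements of A that are not elements of B

print(sorted(A), sorted(B))
print(sorted(union), sorted(intersection), sorted(difference))

# Membership tests correspond to the symbols ∈ and ∉.
print(5 in A, 4 in A)
```

Here the intersection happens to be empty, which is exactly the situation that motivates admitting the empty set.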

3.1.6 Selection Whenever A is a set and P(x) is a statement that makes sense4 for
each element x of A, the notation
{x ∈ A : P(x)} or {x ∈ A | P(x)}
means the subset of A comprising just those elements of A for which the statement
in question is true. This is a much more versatile style of notation than the
list-in-curly-brackets that we previously presented. The whole symbol is usually
pronounced as ‘the set of all x in A such that P(x) is true’ (and the words ‘is true’
can be left out). Whether you use a colon (:) or a vertical (|) half-way through
the symbol is a matter of taste and readability; for instance, if P(x) begins with
something like |x| or already involves a colon, then using the other divider will
help the eye to take in quickly what is written. Of course, this selection notation only
applies to sets that are subsets of some pre-existing set, but that is not a problem for
us since virtually all the sets we need to work with are subsets of the real number
system R. Here are a few illustrations:

• $\left\{ x \in \mathbb{R} \;\middle|\; x = \frac{p}{q} \text{ for some integers } p, q \text{ where } q \neq 0 \right\}$
– is the definition of the set Q of rationals.
• $\{ x \in \mathbb{R} : x \notin \mathbb{Q} \}$
– is the set R \ Q of irrationals.
• $\{ x \in \mathbb{R} \mid x = 2^n, \text{ some } n \in \mathbb{N} \}$
– is the set we previously (and a little clumsily) wrote as {2, 4, 8, 16, 32, 64, · · · } .
• $\{ x \in \mathbb{R} : |x| < 3 \}$
– is the ‘solid block’ of real numbers lying between −3 and 3. We shall next turn
our attention towards such unbroken ranges of real numbers.
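The selection notation $\{x \in A : P(x)\}$ has a close counterpart in Python's set comprehensions, provided that the pre-existing set $A$ is finite. The sketch below makes that restriction explicit: instead of R, it selects from small finite ranges of integers, which is our own simplifying assumption.

```python
# A finite stand-in for the pre-existing set A (integers from -10 to 10).
A = range(-10, 11)

# {x in A : |x| < 3}
block = {x for x in A if abs(x) < 3}

# {x in {1, ..., 100} : x = 2^n for some positive integer n}
powers_of_two = {x for x in range(1, 101)
                 if any(x == 2**n for n in range(1, 8))}

print(sorted(block))
print(sorted(powers_of_two))
```

In each comprehension the part after `if` plays the role of the statement $P(x)$, and the part after `in` names the set from which elements are selected.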
3.2 Intervals, max and min, sup and inf

An interval is a subset of the real line R that – intuitively speaking – has no gaps
in it, but stretches unbroken across a connected region of R. It is easy to turn that
intuitive impression into a sharp definition:
3.2.1 Definition A non-empty subset I of R is an interval if x ∈ I, z ∈ I, x < y < z

together imply that y ∈ I.
4 That is to say, for each individual x ∈ A, P(x) is either true or false – once again, there must
be no borderline cases.
There are several different kinds of interval, depending on whether the subset
extends limitlessly far up or down the real line, and on whether it includes or
excludes points ‘at the edge’ (properly called endpoints of the interval), and we list
all of them here, together with an exact set-theoretic description of each type (in
each case, a and b denote arbitrary real numbers, with a < b if appropriate):
1. (a, b) = {x ∈ R : a < x < b}
2. [a, b] = {x ∈ R : a ≤ x ≤ b}
3. (a, b] = {x ∈ R : a < x ≤ b}
4. [a, b) = {x ∈ R : a ≤ x < b}
5. [a, a] = {x ∈ R : a ≤ x ≤ a} = {a}
6. (a, ∞) = {x ∈ R : a < x}
7. [a, ∞) = {x ∈ R : a ≤ x}
8. (−∞, b) = {x ∈ R : x < b}
9. (−∞, b] = {x ∈ R : x ≤ b}
10. (−∞, ∞) = R
(Incidentally, some texts do not class case (5) as an interval at all, and some
others call it a degenerate interval.)
The numbers a and b appearing in these descriptions are referred to as the
endpoints of the relevant interval, and all the other points of an interval are called
its interior points. It is important to bear in mind that an endpoint of an interval
may or may not itself belong to that interval. Also notice that the symbols ∞ and
−∞ are not counted as endpoints, mainly because (whatever they are) they are
not real numbers: their purpose in these notations is just to draw attention to the
absence of a right-hand or a left-hand endpoint.
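The ten descriptions in the list can be condensed into a single membership test. The sketch below is one possible encoding, under our own assumptions: the function name in_interval and its two boolean flags are hypothetical, and math.inf is used merely as a marker for a missing endpoint, in the same spirit as the symbols ∞ and −∞.

```python
import math


def in_interval(x, a, b, left_closed, right_closed):
    """Decide whether the real number x lies in the interval from a to b.

    Pass a = -math.inf or b = math.inf when there is no endpoint on that
    side; such a marker is never treated as a point of the interval,
    whatever the flags say. (Hypothetical helper, for illustration only.)
    """
    if left_closed and a != -math.inf:
        above_left = a <= x
    else:
        above_left = a < x
    if right_closed and b != math.inf:
        below_right = x <= b
    else:
        below_right = x < b
    return above_left and below_right


# (0, 3] contains 3 but (0, 3) does not.
print(in_interval(3, 0, 3, False, True), in_interval(3, 0, 3, False, False))
# [2, 2] = {2}
print(in_interval(2, 2, 2, True, True))
# (-inf, 0) contains every negative number.
print(in_interval(-10 ** 9, -math.inf, 0, False, False))
```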
The first five cases in our list are called bounded intervals. Cases (6) and (7) are
called bounded below (but not bounded above) while cases (8) and (9) are called
bounded above (but not bounded below). These ideas can be extended to apply to
any subsets of the real line:
3.2.2 Definition An upper bound of a set A of real numbers means a number u

such that a ≤ u for every a ∈ A. The set A is called bounded above if it has an upper
bound. Of course, if u is an upper bound of A then any number bigger than u is
also an upper bound of A.
3.2.3 Definition A lower bound of a set A of real numbers means a number l such
that l ≤ a for every a ∈ A. The set A is called bounded below if it has a lower bound.
Of course, if l is a lower bound of A then any number smaller than l is also a lower
bound of A.
3.2.4 Definition The set A is called bounded if it is both bounded above and
bounded below: that is, if it has an upper bound and a lower bound.
You should check back to our list of types of interval and see that the way we used
the terms ‘bounded’, ‘bounded above’ and ‘bounded below’ there is consistent with
the definitions that we have just given.
[Figure: Bounds for a set. A number line showing the set A, with a lower bound of A (and other lower bounds of A) to its left and an upper bound of A (and other upper bounds of A) to its right.]
3.2.5 EXERCISE Verify that a subset A of the real numbers is:

• bounded if and only if it is contained in some bounded interval,
• bounded above if and only if it is contained in some bounded-above interval,
• bounded below if and only if it is contained in some bounded-below interval,
• bounded if and only if there is some constant K such that, for every a ∈ A,
|a| ≤ K.
Specimen solution
Consider the last of these four assertions. If there does exist a constant K as
described, then −K ≤ a ≤ K for every a ∈ A, so −K and K are respectively
lower and upper bounds for A, and A is therefore bounded. Conversely, if A is
indeed bounded, and we then choose a lower bound l and an upper bound u for it,
then for each a ∈ A we have
l ≤ a ≤ u ≤ |u|, −a ≤ −l ≤ | − l| = |l|
so both a and −a are less than or equal to max{|u|, |l|}. Put K = max{|u|, |l|} and
we have (for each a ∈ A) |a| ≤ K.
When I is an interval that is bounded above, that is, one of the form (a, b) or
[a, b] or [a, b) or (a, b] or (−∞, b) or (−∞, b], it is obvious where the ‘right-hand
edge’ of I is (namely, the endpoint b) and it is obvious from the shape of the closing
bracket whether that edge point belongs to the interval or not. These are things also
worth asking about sets that are more complicated than intervals, but we need to
be more careful about defining ‘right-hand edge’ for a set that is not just an interval.
3.2.6 Definition If A is a set of real numbers and m is a particular real number

then we say that m is the maximum element of A if
• m ∈ A and
• x ≤ m for every x ∈ A
(in other words, if m is both an element of A and an upper bound of A). Notice
immediately that many sets do not possess a maximum element: for instance,
(a, b), (0, 1) ∪ (2, 4), N and {1/2, 3/4, 7/8, 15/16, 31/32, · · · } do not. On the other hand, (a, b],
(0, 1) ∪ (2, 4], {−n : n ∈ N} and {1/(2n) : n ∈ N} do have (fairly obvious) maximum
elements. Informally, we often use terms such as biggest, largest, greatest or top
element instead of maximum element.
3.2.7 Definition The supremum of a non-empty set A of real numbers that is

bounded above means the least⁵ upper bound of A. It is often written briefly as
sup A.
This definition will be better understood if we unpack it a little. To say that a

number t is the supremum of a set A says two things: firstly, that t is one of A’s
upper bounds and, secondly, that no smaller number can be. That is, t ≥ x for
every x ∈ A but, for any positive number ε, t − ε fails to be greater than or equal
to all of the elements of A, that is, it is strictly less than at least one element of A.
In summary, t = sup A says:
• t ≥ x for every x ∈ A, and
• for each ε > 0 there exists xε ∈ A such that t − ε < xε .
3.2.8 EXERCISE
• Show that if a set A possesses a maximum element, then this maximum

element is the supremum of A.
• Show that if sup A belongs to A as an element, then it is the maximum element
of A.
• For each interval that has a right-hand endpoint, show that the supremum is
that right-hand endpoint.
• For the set
A = {−3/(n + 1) + −5/(m + 4) : n ∈ N, m ∈ N}
prove that sup A = 0.
Specimen solution
Consider, for example, the fourth of these assertions. It is clear that every element
of the set in question is negative, so 0 is an upper bound of the set. If ε is any given
positive number, then we can choose (after some roughwork⁶) an integer p that is
larger than 8/ε. Now choose n, m both greater than p and we see that
5 We are using the word ‘least’ in its common-sense meaning here; if in any doubt, please refer
forward to paragraph 3.2.9.
6 We need to find an element of A that is greater than −ε. Looking at what a typical element
of A is, and getting rid of the complicating minuses, that says we want 3/(n + 1) + 5/(m + 4) < ε. We shall
achieve that if both 1/(n + 1) and 1/(m + 4) are less than ε/8, so those bottom-line integers will need to be
greater than 8/ε …
n + 1 > n > p > 8/ε,   1/(n + 1) < ε/8,   −3/(n + 1) > −3ε/8,
m + 4 > m > p > 8/ε,   1/(m + 4) < ε/8,   −5/(m + 4) > −5ε/8
and by adding these two lines we find a particular element of A:
−3/(n + 1) + −5/(m + 4) > (−3 − 5)ε/8 = 0 − ε
thus confirming that 0 is the supremum (because no smaller number 0 − ε exceeds

all of A’s elements).
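The recipe in this specimen solution (pick an integer p larger than 8/ε, then take n and m larger than p) can be tried out numerically. The sketch below uses exact fractions; the function name witness and the particular choice n = m = p + 1 are our own assumptions, since any n, m greater than p would serve equally well.

```python
from fractions import Fraction
import math


def witness(eps):
    """Return (n, m, element) where element lies in A and exceeds -eps.

    Here A = { -3/(n+1) + -5/(m+4) : n, m positive integers }.
    `witness` is a hypothetical name; eps should be a positive Fraction.
    """
    p = math.floor(8 / eps) + 1   # an integer p that is larger than 8/eps
    n = m = p + 1                 # any n, m greater than p would do
    element = Fraction(-3, n + 1) + Fraction(-5, m + 4)
    return n, m, element


for eps in (Fraction(1), Fraction(1, 10), Fraction(1, 1000)):
    n, m, element = witness(eps)
    # element is negative (so 0 is an upper bound) yet greater than 0 - eps
    print(eps, n, m, -eps < element < 0)
```

A finite run like this proves nothing by itself, of course; it only illustrates that the choice of p made in the roughwork does what the proof says it does.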
The preceding three paragraphs can be replicated ‘in the opposite direction’
to investigate left-hand edges of bounded-below sets. Here are the appropriate
definitions:
3.2.9 Definition If A is a set of real numbers and l is a particular real number then
we say that l is the minimum element of A if
• l ∈ A and
• l ≤ x for every x ∈ A
(in other words, if l is both an element of A and a lower bound of A).
Notice that many sets do not possess a minimum element: for instance,
(a, b), (0, 1) ∪ (2, 4), {−n : n ∈ N} and {0.1, 0.01, 0.001, 0.0001, · · · } do not. On
the other hand, [a, b), [0, 1) ∪ (2, 4), N and {−1, 1/2, −1/3, 1/4, −1/5, 1/6, −1/7, · · · } do
have (fairly obvious) minimum elements. Informally, we often use terms such as
smallest, least or bottom element instead of minimum element.
3.2.10 Definition The infimum of a non-empty set A of real numbers that is

bounded below means the greatest lower bound of A. It is often written briefly
as inf A.
We shall again unpack that definition a little. To say that a number t is the
infimum of a set A says two things: firstly, that t is one of A’s lower bounds and,
secondly, that no greater number can be. That is, t ≤ x for every x ∈ A but, for any
positive number ε, t + ε fails to be less than or equal to all of the elements of A, that
is, it is strictly bigger than at least one element of A. In summary, t = inf A says:
• t ≤ x for every x ∈ A, and
• for each ε > 0 there exists xε ∈ A such that t + ε > xε .
3.2.11 EXERCISE
• Show that if a set A possesses a minimum element, then this minimum element
is the infimum of A.
• Show that if inf A belongs to A as an element, then it is the minimum element

of A.
• For each interval that has a left-hand endpoint, show that the infimum is that
left-hand endpoint.
• For the set
A = {7/(n + 3) + 2/(m + 12) : n ∈ N, m ∈ N}
prove that inf A = 0.
Partial solution
For instance, let us consider the first of these assertions.
Let z denote the minimum element of the set A. That is, z belongs to A as an
element, and z ≤ x for every x in A. The second of those observations tells us that
z is one of the lower bounds of A. On the other hand, for any ε > 0 we can indeed
find an element xε of A that is less than z + ε, namely xε = z. So z is the infimum
of A.
We have made the point several times that many sets do not have maximum
or minimum elements. The vital point about sups and infs⁷ is that, in contrast,
these virtually always exist within the real numbers – provided only that the set in
question does not ‘stretch off towards infinity or minus infinity’ and is not merely
the empty set. This is, in many ways, the most critical property of R:⁸
3.2.12 The completeness principle for the real number system
Every non-empty set of real numbers that is bounded above has a supremum.
Every non-empty set of real numbers that is bounded below has an infimum.
It is possible to construct the real number system within a framework of set
theory and to establish this key completeness property, but such a construction
lies outside the scope of this text so we must ask you to take it on trust at present.
When you have seen how powerful it is, you will have better reasons for going
deeper into set theory with a view to understanding such a construction.
Note once again that the sup and the inf of a set A might or might not belong
to A as elements. Note also, as a point of interest, that each of the two sentences in
our statement of the completeness principle logically implies the other,⁹ so we did
not really need to state both of them.
7 The official Latin plurals are suprema and infima, but it is common practice to speak of sups
and infs, and also of sup and inf in less formal discussions.
8 But not a property of Q! We shall return to this issue in 3.3.9.
9 Supposing that each non-empty bounded-above subset of R has a supremum, and that A ⊆ R
is non-empty and bounded below, put B = the set of lower bounds of A. Then B is non-empty and
bounded above (by any element of A that you choose to consider) so it possesses a supremum s.
Now it is routine to confirm that s is the infimum of A.
We’ll finish off this section with a few results that combine or compare sup and
inf of different sets.
3.2.13 Lemma Suppose that A and B are two non-empty subsets of R, each
bounded above. Let A + B mean the set {a + b : a ∈ A, b ∈ B}. Then A + B
is also bounded above, and sup(A + B) = sup A + sup B.
3.2.14 Lemma Suppose that A is a non-empty subset of R and is bounded above,

and that k is a positive real number. Let kA mean the set {ka : a ∈ A}. Then kA is
bounded above, and sup(kA) = k sup A.
Proof
Let s be a temporary symbol for sup A. We know that (for each a ∈ A) a ≤ s and
therefore ka ≤ ks, so at least ks is one of the upper bounds of the set kA. Given
ε > 0, we see that ε/k is also positive, therefore s − ε/k < a for some a ∈ A. Hence
ks − ε < ka where ka is an element of kA. This establishes ks as the supremum
of kA.
3.2.15 Lemma Suppose that A is a non-empty subset of R and is bounded above,

and that k is a negative real number. Let kA mean the set {ka : a ∈ A}. Then kA is
bounded below, and inf(kA) = k sup A.
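For finite non-empty sets the supremum is simply the largest element and the infimum the smallest, so the three lemmas can at least be sanity-checked on small samples. The sketch below does this with exact fractions; the sample sets and the helper names add_sets and scale are our own assumptions, and a check on finite samples is of course no substitute for the proofs.

```python
from fractions import Fraction


def add_sets(A, B):
    """The set A + B = {a + b : a in A, b in B} (hypothetical helper)."""
    return {a + b for a in A for b in B}


def scale(k, A):
    """The set kA = {ka : a in A} (hypothetical helper)."""
    return {k * a for a in A}


# Finite samples chosen only for illustration.
A = {Fraction(1, n + 1) for n in range(1, 50)}   # largest element 1/2
B = {Fraction(-1, m) for m in range(1, 50)}      # largest element -1/49

# 3.2.13: sup(A + B) = sup A + sup B
check_sum = max(add_sets(A, B)) == max(A) + max(B)
# 3.2.14 with k = 3 > 0: sup(kA) = k sup A
check_pos = max(scale(3, A)) == 3 * max(A)
# 3.2.15 with k = -4 < 0: inf(kA) = k sup A
check_neg = min(scale(-4, A)) == -4 * max(A)

print(check_sum, check_pos, check_neg)
```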
3.2.16 EXERCISE
1. Prove 3.2.13.
2. State and prove modifications of 3.2.13 and 3.2.14 for infima instead of
suprema.
3. Prove 3.2.15.
4. State and prove a modification of 3.2.15 for a set kA where A is bounded below
(and k is negative).
5. Use these lemmata¹⁰ to determine the supremum of the set

C = {3/(n + 1) − 4/(m + 2) : m ∈ N, n ∈ N}.
6. Determine the supremum and the infimum of the set

D = {5 + 12/(n² + 1) − 2/(m² + 3m + 5) : m ∈ N, n ∈ N}.
10 The official plural of the Greek word lemma is lemmata, but it is perfectly ok to use the
anglicised plural ‘lemmas’ instead.
Partial solutions
In part (1), it is purely routine to check that sup A + sup B is an upper bound of
the set A + B.
Now if ε > 0 is given, note that ε/2 is also positive, so there are elements
a ∈ A, b ∈ B greater than sup A − ε/2 and sup B − ε/2 respectively. Combine
these observations.
For part (5), the notation set up in the lemmata lets us express the set C as
3A + (−4)B where

A = {1/(n + 1) : n ∈ N},   B = {1/(m + 2) : m ∈ N}.
It is easy to see that the biggest element (and therefore the supremum) of A is 1/2
and that the infimum of B is 0. Using the machinery set up by the lemmata, we
therefore find
sup C = sup(3A + (−4)B) = sup(3A) + sup(−4B)
= 3 sup A − 4 inf B = 3(1/2) − 4(0) = 3/2.
3.3 Denseness
Our main objective in this section is to establish (and to use) the fact that between
each two distinct real numbers, there is a rational number. We should begin,
however, by looking a little more closely at (what appears to be) the simplest
number system of all,¹¹ that of the positive integers N.
Suppose that a is a particular positive integer and that b1 is another that is less
than a. Since the differences between integers have to be integers, it follows that b1
is at most a − 1. Likewise, if a > b1 > b2 (all three being positive integers) then
b2 is at most b1 − 1 and, consequently, at most a − 2. Continuing this argument,
a > b1 > b2 > b3 will guarantee that b3 ≤ a − 3 provided that all these numbers
are positive integers.
Repeating this argument a times, we find that a > b1 > b2 > b3 > · · · > ba will
guarantee that ba ≤ a − a = 0 if all the numbers involved are positive integers: but
this is impossible, since the positive integer ba cannot be ≤ 0. The contradiction
shows that no strictly decreasing succession of positive integers, starting with a,
can contain more than a terms.
This insight can be presented as a statement about sets of positive integers, as
follows:
11 Actually, N is not as simple as it appears to be. In particular, a complete logical account of the
positive integer system would need to justify the idea of carrying out some procedure an arbitrary
positive-integer-number of times, as we describe in this discussion and in the next proof. But this
is not a textbook on mathematical logic, and we shall accept some intuitive input into our view
of N, just as we did – and continue to do – concerning the real number system R.
3.3.1 Proposition Every non-empty subset of N possesses a least element.
Proof
Given non-empty A ⊆ N, suppose that A does not possess a least element – that
is, for each element of A that we look at, there will always be a smaller element of
A. Since A is not empty, we can choose an element a somewhere in A.
Since a is not the least element of A, we can find some a1 in A that is smaller.
Since a1 is not the least element of A, we can find some a2 in A smaller than a1 .
Since a2 is not the least element of A, we can find a3 in A smaller than a2 ,
and so on.
Run that argument a times, and we shall have created a strictly decreasing
succession of a + 1 positive integers beginning with a. This contradicts what we
observed above.
3.3.2 Theorem: Q is dense in R If c < d are any two distinct real numbers, then
there is a rational number q such that c < q < d.
Roughwork
The informal idea is this. We choose a positive integer n so big that 1/n is smaller
than the gap d − c between c and d, and we think about all the rational fractions
whose denominator is n. These are evenly spaced out across the entire real line at
intervals of 1/n, and some of them lie to the left of c, and some of them lie to the
right of d. If we imagine switching our attention from one that lies > d, step by
step toward the left with strides of length 1/n until we eventually reach one that lies
< c then, because each step that we took was shorter than the gap between c and
d, one of them must have fallen into that gap. The first one that does this is the
rational q that we were looking for.
[Figure: Hunting for rationals between c and d. A number line marked with the fractions 1/n, m/n and k/n and with the points c and d.]
Proof
Case 1: assume that d > 1.
Choose a positive integer n that is greater than 1/(d − c). Then 1/n < d − c.
Choose next a positive integer k that is greater than nd. This ensures that k/n > d,
and therefore that the set

M = {k ∈ N : k/n ≥ d}
of positive integers is non-empty. Appealing to the above proposition, let m be the

smallest element of M. (Now the intuition is that m/n is the smallest fraction with
denominator n that lies at or to the right of d, so one further step to the left will drop
us into the gap between c and d.)
It cannot be the case that m = 1: because m/n ≥ d, and we assumed that
d > 1 ≥ 1/n. So m − 1 is still a positive integer, but is not in M. That means
(m − 1)/n < d. In addition,
(m − 1)/n = m/n − 1/n > m/n − (d − c) ≥ d − (d − c) = c
that is, the rational (m − 1)/n lies strictly between c and d.
Case 2: now suppose that d is any real number at all.

Choose a positive integer p that is greater than 1 − d. Then c + p < d + p and
d + p > 1; so, by Case 1, there is a rational number q such that c + p < q < d + p. It
follows that c < q − p < d where q − p, a rational minus an integer, is still rational.
Hence the result.
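The proof is constructive enough to be turned into a small procedure. The sketch below follows its two cases step by step; it takes c and d as exact fractions purely so that the arithmetic is exact (the theorem itself concerns arbitrary real numbers), and the function name rational_between is our own.

```python
from fractions import Fraction
import math


def rational_between(c, d):
    """Return a rational q with c < q < d, imitating the proof of 3.3.2.

    c and d are assumed to be Fractions with c < d; this restriction is an
    assumption of the sketch, made only to keep the arithmetic exact.
    """
    if not c < d:
        raise ValueError("need c < d")
    # Case 2: shift by a positive integer p > 1 - d, so that d + p > 1.
    p = max(1, math.floor(1 - d) + 1)
    c, d = c + p, d + p
    # Case 1: choose a positive integer n > 1/(d - c), so that 1/n < d - c.
    n = math.floor(1 / (d - c)) + 1
    # m is the smallest positive integer with m/n >= d.
    m = math.ceil(n * d)
    # One step to the left lands in the gap; then undo the shift.
    return Fraction(m - 1, n) - p


q = rational_between(Fraction(-7, 3), Fraction(-2))
print(q, Fraction(-7, 3) < q < Fraction(-2))
```

Using math.ceil to find m replaces the appeal to the least-element property of N in the proof; that is a computational shortcut of ours, not a different argument.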
3.3.3 Note It is now easy to see that, between any two distinct real numbers c
and d, there are actually infinitely many rationals: because if not, then the finite set
Q ∩ (c, d) would have a smallest element q′, and then there would be no further
rationals between c and q′, contradicting denseness. (Alternatively, once we have
one rational q between c and d, then denseness says there is another q1 between c
and q, and another q2 between c and q1 , and another q3 between c and q2 , and so
on endlessly.)
3.3.4 EXERCISE
1. If a and b are rational, b ≠ 0, and x is irrational, show that a + bx must be

irrational.
2. Given distinct real numbers c < d, show that there is an irrational number
lying between them.
Partial solution
1. Use proof by contradiction: assume that a + bx equals a rational number,
rearrange to obtain a formula for x and conclude that x is actually rational (in
contradiction to what was given).
2. Use denseness of Q twice to find rationals a and b such that c < a < b < d
and then consider the number a + (1/2)(b − a)√2.
3.3.5 Note Now it follows (by just the same style of argument as in the previous
Note) that between any two distinct real numbers there are actually infinitely
many irrational numbers, as well as infinitely many rational numbers: both Q
and R \ Q have this denseness property. The informal mental picture we should
now be building is that, no matter how small a segment of R we look at and no
matter how high a magnification we use, we shall always see an interleaved mix
of rationals and irrationals – indeed, infinitely many of each of them. This has
important consequences for limits of sequences and for sups and infs:
3.3.6 Proposition
1. Every real number is the limit of a sequence of rationals.

2. Every real number is the limit of a sequence of irrationals.
Proof
For any x ∈ R and each n ∈ N in turn, we can use denseness to find a rational
number between x − 1/n and x + 1/n: call this rational number q_n since it may well
depend on n. So
x − 1/n < q_n < x + 1/n,
that is, |x − q_n| < 1/n.¹² Given ε > 0, if we choose an integer n_0 > 1/ε, it follows that
n ≥ n_0 will guarantee that |x − q_n| < ε, so q_n → x as n → ∞. The proof of the
second part is almost identical.
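To see such a sequence concretely, take x = √2. The sketch below picks, for each n, one particular rational with denominator n that lies within 1/n of √2; the choice via integer square roots is our own illustrative assumption (the proof only asserts that some suitable rational exists), and all comparisons are done on squares so that no rounding is involved.

```python
from fractions import Fraction
import math


def q(n):
    """A rational q_n = k/n with q_n^2 <= 2 < (q_n + 1/n)^2.

    Hence sqrt(2) - 1/n < q_n <= sqrt(2), so |sqrt(2) - q_n| < 1/n.
    (Hypothetical helper: one possible choice among infinitely many.)
    """
    k = math.isqrt(2 * n * n)   # the largest integer k with k^2 <= 2n^2
    return Fraction(k, n)


for n in (1, 10, 100, 1000):
    print(n, q(n), float(q(n)))
```

Each of these q_n is in fact less than √2, so the same sequence also illustrates approximation from below.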
3.3.7 EXERCISE Given any real number x, show that

1. x is the limit of a sequence of rationals each of which is less than x;
2. x is the limit of a sequence of irrationals each of which is greater than x.
3.3.8 Proposition
1. Every real number x is the supremum of the set of rationals that are less than x.
2. Every real number x is the infimum of the set of rationals that are greater
than x.
3. Every real number x is the supremum of the set of irrationals that are less
than x.
4. Every real number x is the infimum of the set of irrationals that are greater
than x.
12 The next chapter will provide us with a slick and tidy way to finish the argument from that
point: see 4.1.18.
Proof
Let A = Q ∩ (−∞, x) comprise all the rationals less than x. Certainly x is an upper
bound of that set. Also, for any ε > 0, denseness says that there is a rational q
between x − ε and x. This q belongs to A, and q > x − ε. Hence x is the supremum
of A. The other three parts are proved in just the same way.
3.3.9 Note The following statement is untrue: ‘every non-empty subset of Q that
is bounded above in Q has a least upper bound in Q’. For instance, consider the set
A of rationals whose squares are less than 2. It is clearly non-empty, and bounded
above in Q by, for example, 3/2. Now suppose it did have a least upper bound λ
in Q.
• If λ < √2, then we can (due to denseness) find a rational q such that
λ < q < √2 which gives q² < 2. This shows that q belongs to A and yet
exceeds the alleged upper bound λ for A.
• If λ > √2, then we can find rational r such that √2 < r < λ. Any rational a
that is ≥ r will have a² ≥ r² > 2 and therefore cannot belong to A; in other
words, every element of A must be less than r. Consequently r is an upper
bound in Q for A and yet is less than the least such upper bound.
• If λ = √2 then √2 has to be rational.
All three cases have now run into contradiction, and the demonstration¹³ is
complete.
This outcome contrasts strongly with the completeness principle for real num-
bers, which can be expressed as ‘every non-empty subset of R that is bounded above
in R has a least upper bound in R’: in other words, Q is not complete but R is.
In many ways, this is the most important difference between how R works as a
number system and how Q does.
3.3.10 Example To verify that the infimum of the set B = {q : q is rational and
q² ≤ 2} is −√2.
Solution
(a) −√2 is not rational so it cannot belong to B.
Any rational q < −√2 has |q| > √2 and q² = |q|² > 2, so it cannot belong
to B either.
Hence every element of B must be greater than −√2, that is, −√2 is a lower
bound for B.
13 It is possible to revamp this argument in a way that avoids all mention of √2, and that
therefore takes place entirely within the family of rational numbers.
(b) If ε > 0 then ε′ = min{ε, √2} is also¹⁴ greater than 0 so, by denseness, there
is a rational r between −√2 and −√2 + ε′.
Our choice of ε′ ensures that this r will be negative, and
−√2 < r < 0 ⇒ |r| < √2 ⇒ r² = |r|² < 2,
that is, r ∈ B.
Hence −√2 is inf B.
3.3.11 EXERCISE
• Verify that the supremum of {q : q is rational and q² ≤ 2} is √2.
14 The step that most often puzzles the reader is the replacement of ε by ε′ at this point. Why
is something like this necessary at all? Because if ε were too big, the next step might go wrong. If,
for instance, ε were 3 then, when we chose a rational r between −√2 and −√2 + ε, our otherwise
random choice of rational lies in the interval (−1.414, +1.586) (to three decimal places) and, for
all we know, r could be +1.5. This number has a square greater than 2 and therefore fails to lie
within B, destroying the punch-line of our demonstration. Making sure that the ‘new epsilon’ is
no bigger than √2 guarantees that this will not happen.
.........................................................................
4 Up and down —
increasing and
decreasing sequences
.........................................................................
If a sequence converges to a limit, then it is perfectly possible that it does not do

so steadily. For instance, the sequence (1/1, 1/3, 1/2, 1/4, 1/3, 1/5, 1/4, 1/6, · · · ) does tend to 0 as
limit, but in a ‘big step forward, small step back’ fashion: as you scan along the
list of terms, you find that you are moving away from 0 half the time. On the
other hand, the sequence (1/1, −1/2, 1/3, −1/4, 1/5, −1/6, 1/7, −1/8, · · · ) converges to zero ‘in
both directions at once’, alternately over- and underestimating the eventual limit.
However, many important sequences move steadily in one direction (up or down)
and, for those that do, it is often easier to determine whether they converge or not.
The opening section of this chapter will focus on sequences of this kind.
4.1 Monotonic bounded sequences must converge

4.1.1 Definition A sequence (x_n)_{n≥1} is called
• an increasing sequence if x_n ≤ x_{n+1} for every n;
• a decreasing sequence if x_n ≥ x_{n+1} for every n;
• a monotonic (or monotone) sequence if it is either increasing or decreasing.¹
So these are the sequences that ‘move steadily in one direction – up or down’
as you scan along the list of terms. There are some very obvious examples, such as
the decreasing sequences (1/n), (1/n²), (1/n³), (−n) and (−√n), and the increasing
sequences (n), (n²) and (1 − 1/n). For less transparent examples, it is often useful
to calculate the difference² x_n − x_{n+1} to see whether it is always positive (in
which case the sequence is decreasing) or always negative (in which case it is
increasing) or sometimes positive and sometimes negative (in which case it cannot
be monotonic).
1 Notice that, since the inequalities are non-strict (that is, they allow equality), a constant
sequence is both increasing and decreasing.
2 or sometimes – provided that the terms are positive – the ratio x_{n+1}/x_n if that might simplify
through a lot of cancelling: the ratio will be always greater than or equal to 1 if the sequence is
increasing, but less than or equal to 1 if it is decreasing.
4.1.2 Example With
x_n = (6n − 1)/(4n + 3)
decide whether (x_n) is or is not monotonic.
Solution
In this case,
x_n − x_{n+1} = (6n − 1)/(4n + 3) − (6(n + 1) − 1)/(4(n + 1) + 3) = (6n − 1)/(4n + 3) − (6n + 5)/(4n + 7)
which, when you bring it to a common denominator, comes to
−22/((4n + 3)(4n + 7)).
This is obviously negative for all n, so the sequence is increasing (and thus
monotonic).
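The common-denominator step is easy to get wrong by hand, so it is worth a quick check with exact arithmetic. The sketch below compares the difference of consecutive terms with the closed form −22/((4n + 3)(4n + 7)) for the first thousand values of n; the range tested is an arbitrary choice of ours, and a finite check supports the algebra without replacing it.

```python
from fractions import Fraction


def x(n):
    """The nth term (6n - 1)/(4n + 3) of the sequence in Example 4.1.2."""
    return Fraction(6 * n - 1, 4 * n + 3)


formula_ok = all(
    x(n) - x(n + 1) == Fraction(-22, (4 * n + 3) * (4 * n + 7))
    for n in range(1, 1001)
)
always_negative = all(x(n) - x(n + 1) < 0 for n in range(1, 1001))

print(formula_ok, always_negative)
```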
4.1.3 Example With
x_n = 3 + 4/n − 2/n²
show that (x_n) is a decreasing sequence.
Solution
In the present case,

x_n − x_{n+1} = (3 + 4/n − 2/n²) − (3 + 4/(n + 1) − 2/(n + 1)²)
which, when it has all been brought to a common denominator, simplifies to
(4n² − 2)/(n²(n + 1)²).
Since n is at least 1, this expression is always positive, so (xn ) is decreasing, as

predicted.
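The same difference test can be applied mechanically to any sequence given by a formula. The sketch below tabulates the differences between consecutive terms for the sequence of this example and compares them with the simplified expression (4n² − 2)/(n²(n + 1)²); the helper name differences and the number of terms inspected are our own assumptions.

```python
from fractions import Fraction


def x(n):
    """The nth term 3 + 4/n - 2/n^2 of the sequence in Example 4.1.3."""
    return 3 + Fraction(4, n) - Fraction(2, n * n)


def differences(seq, N):
    """Return [seq(n) - seq(n+1) for n = 1, ..., N] (hypothetical helper)."""
    return [seq(n) - seq(n + 1) for n in range(1, N + 1)]


diffs = differences(x, 500)
matches_formula = all(
    diffs[n - 1] == Fraction(4 * n * n - 2, n * n * (n + 1) ** 2)
    for n in range(1, 501)
)
all_positive = all(d > 0 for d in diffs)

print(matches_formula, all_positive)
```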
A sequence is not, of course, the same thing as a set. For instance, sets are not
allowed to include repeated elements but sequences may; also, sequences present
their elements in a particular order but sets do not. However, every sequence
(xn )n∈N gives rise to a set, namely the set {xn : n ∈ N} of all its terms, and it is
sometimes helpful to look at the two together. For instance, this gives us an easy
way to talk about the boundedness of a sequence:
4.1.4 Definition A sequence is said to be bounded if the set of all its terms is
bounded.
Likewise, a sequence is said to be bounded above if the set of all its terms is bounded
above, and bounded below if the set of all its terms is bounded below.
In view of our earlier discussions on boundedness (paragraph 3.2.5 in particu-

lar), there are several slightly different ways to recognise when this happens:
4.1.5 Lemma A sequence (xn )n∈N is bounded if and only if one of the following
(equivalent) conditions holds:
1. Some bounded closed interval [a, b] includes xn for every n,
2. Some bounded interval of the form [−K, K] includes xn for every n,
3. There is some constant K > 0 such that |xn | < K for every n,
4. All terms of the sequence lie within some fixed distance from 0,
5. All terms of the sequence lie within some fixed distance from a number c ∈ R.
There are important connections between convergence and boundedness of a

sequence, one of which is:
4.1.6 Lemma Every convergent sequence is bounded.
Proof
Suppose that (x_n)_{n∈N} converges and that its limit is ℓ. By the definition (and
choosing ε = 1 for convenience), there is a positive integer n_0 such that all the
terms of the sequence from the n_0th onwards lie between ℓ − 1 and ℓ + 1, that is, less than
1 unit distant from ℓ. The earlier terms x_1, x_2, x_3, · · · , x_{n_0−1} may well be further
away from ℓ, but there are only a finite number of them: so we can find the biggest
distance from one of them to ℓ …call it M. If we now let M′ = max{M, 1}, then
every x_n lies within the distance M′ from ℓ, and so (x_n)_{n∈N} is bounded.
4.1.7 Alert The converse of this lemma is certainly not true! For instance, the
‘alternating sequence’ ((−1)n )n∈N is bounded (it lies entirely inside [−1, 1], for
example) but not convergent. However, we’ll now demonstrate a correct partial
converse, namely the result embedded in the name of this section:
4.1.8 Theorem
1. A sequence that is increasing and bounded above must converge. (The limit is
the supremum of the set of all its terms.)
2. A sequence that is decreasing and bounded below must converge. (The limit is
the infimum of the set of all its terms.)
Proof
(1) Suppose that (xn )n∈N is increasing and bounded above. The set X = {xn : n ∈ N}
is non-empty and bounded above so, by the completeness principle, its supremum
(let’s denote it by ℓ) does exist. The definition of supremum then tells us that, if ε
is any positive number:
• x_n ≤ ℓ for every n ≥ 1, and
• ℓ − ε < x_m for at least one positive integer m.
Now use the fact that the sequence is increasing, and we get:
ℓ − ε < x_m ≤ x_{m+1} ≤ x_{m+2} ≤ x_{m+3} ≤ x_{m+4} ≤ · · · ≤ ℓ,
in other words, every term of the sequence from number m onwards lies between
ℓ − ε and ℓ, whence |x_n − ℓ| < ε for every n ≥ m. Thus, (x_n)_{n∈N} converges to ℓ.
[Figure: Increasing plus bounded implies convergent. A number line showing the terms x_1, x_2, x_3, … increasing towards ℓ = supremum, with x_m, x_{m+1}, x_{m+2}, … lying between ℓ − ε and ℓ + ε.]
Exercise
Prove part (2) of this theorem: the proof will closely resemble what we have just
set out, but a lot of the inequalities will be the other way around.
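The mechanism of the proof of part (1) can be watched at work on a simple example. The sketch below assumes the particular increasing sequence with terms 1 − 1/n, whose set of terms has supremum 1; for a given ε it produces an index m whose term exceeds 1 − ε, after which every later term is trapped between 1 − ε and 1. The sequence and the function name first_index_within are our own choices for illustration.

```python
from fractions import Fraction
import math


def x(n):
    """An increasing sequence, bounded above, whose terms have supremum 1."""
    return 1 - Fraction(1, n)


def first_index_within(eps):
    """Return an index m with 1 - eps < x(m) (hypothetical helper).

    Taking m > 1/eps gives 1/m < eps, which is exactly what is needed.
    """
    return math.floor(1 / eps) + 1


eps = Fraction(1, 50)
m = first_index_within(eps)
trapped = all(1 - eps < x(n) <= 1 for n in range(m, m + 200))
print(m, trapped)
```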
4.1.9 Example To use the ‘bounded + monotonic’ theorem to show that the
following sequence (xn ) is convergent: where, for each positive integer n, xn is
defined to be the product of fractions
x_n = 3/4 × 8/9 × 15/16 × · · · × (n² − 1)/n².
Solution
Since all the fractions involved here are positive, it is clear that xn > 0 for all n,
and so the sequence (xn ) is bounded below. Also, comparing the formulae for xn
and for x_{n+1}, we see that x_{n+1} = x_n × ((n + 1)² − 1)/(n + 1)² = x_n × (1 − 1/(n + 1)²).
Since the extra multiplier 1 − 1/(n + 1)² is positive and less than 1, it follows
that x_{n+1} < x_n, that is, that the sequence (x_n) is decreasing. According to the last
theorem (part (2)) it must converge.
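The two facts used here (positive terms, each smaller than the one before) are easy to observe numerically. The sketch below builds the products with exact fractions, reading the product as starting from the factor 3/4 = (2² − 1)/2², so that n runs from 2 upwards; that reading, and the number of terms inspected, are assumptions of the sketch.

```python
from fractions import Fraction


def x(n):
    """The product (3/4)(8/9)(15/16)...((n^2 - 1)/n^2), for n >= 2."""
    product = Fraction(1)
    for k in range(2, n + 1):
        product *= Fraction(k * k - 1, k * k)
    return product


terms = [x(n) for n in range(2, 60)]
bounded_below = all(t > 0 for t in terms)
decreasing = all(later < earlier for earlier, later in zip(terms, terms[1:]))

print(bounded_below, decreasing)
print([float(t) for t in terms[:5]])
```

The theorem guarantees that a limit exists without naming it; printing a few terms at least gives a feel for where that limit might lie.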
4.1.10 Example To find the infimum of the set

X = {3 + 4/n − 2/n² : n ∈ N}.
n n
Solution
As pointed out already, the sequence whose nth term is the typical number in this
set is decreasing and tends to 3. By the small print of the ‘bounded + monotonic’
theorem, 3 has to be the infimum of the set.
4.1.11 EXERCISE If
a_n = (2n³ − 5n² − 4n − 2)/(2n³ + n²)
find the supremum of the set {an : n ∈ N}.
Hint
Most of the work consists in checking that the sequence (an ) is increasing. Verify
first that
1 − 2/n² − 6/(2n + 1) = a_n.
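Before attempting the algebra, it can be reassuring to confirm that the two expressions for a_n really do agree. The sketch below compares them with exact fractions over a range of n and also checks that the terms inspected are increasing; the names a and a_hint and the range tested are our own, and the identity still needs to be verified by hand as the hint requests.

```python
from fractions import Fraction


def a(n):
    """a_n as defined in the exercise."""
    return Fraction(2 * n ** 3 - 5 * n ** 2 - 4 * n - 2, 2 * n ** 3 + n ** 2)


def a_hint(n):
    """The rearranged form 1 - 2/n^2 - 6/(2n + 1) suggested in the hint."""
    return 1 - Fraction(2, n * n) - Fraction(6, 2 * n + 1)


identity_holds = all(a(n) == a_hint(n) for n in range(1, 301))
increasing = all(a(n) < a(n + 1) for n in range(1, 301))

print(identity_holds, increasing)
```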
4.1.12 EXERCISE Show that the sequence (n − √n) is increasing but not
bounded.
4.1.13 Example Let t ∈ (0, 1) be a constant. We show that the sequence (t^n)_{n∈N}
converges to 0.
Solution
Since 0 < t < 1, all the powers of t are positive, and t^n · t < t^n · 1 = t^n, that is, t^{n+1} < t^n
for every n. So the given sequence is decreasing and bounded below by zero, and
therefore must converge to some limit ℓ. Now we need to identify ℓ.
The sequence (t^{n+1})_{n∈N} = (t², t³, t⁴, t⁵, · · · ) is the original sequence with its
first term removed so (see earlier comment) it also converges to (the same) ℓ. Yet
also (by part (4) of the algebra of limits) t^{n+1} = t × t^n has to converge to tℓ. That
gives ℓ = tℓ (since limits are unique when they exist) which, since t ≠ 1, forces
ℓ = 0 as predicted.
4.1.14 EXERCISE Let t ∈ (1, ∞) be a constant. Show that the sequence $(t^n)_{n \in \mathbb{N}}$
diverges.
Partial solution
Use proof by contradiction. If this sequence did not diverge, it would have to
converge to a limit $\ell$ which (as in the last example) would have to satisfy $\ell = t\ell$.
Check that the sequence is increasing, and consider the consequences for the
number $\ell$.
We have delayed proving part (3) of the algebra of limits theorem until now,
because we wanted the ‘convergent sequences are bounded’ theorem to help in the
demonstration. This proof is harder than most we have done so far, so we shall
first roughwork our way through what needs to be shown and how we might show
it, and then crystallise out a proper proof from that discussion.
4.1.15 Limit of a product If $a_n \to \ell$ and $b_n \to m$ then $a_n b_n \to \ell m$.

Roughwork
Knowing only that the separate ‘error’ terms $|a_n - \ell|$ and $|b_n - m|$ can be made
as small as we wish just by taking n sufficiently large, we must show the same kind
of smallness for $|a_n b_n - \ell m|$. Unfortunately for us, $a_n b_n - \ell m$ will not factorise or
simplify at all. We might, however, get somewhere if we un-simplify it by bringing
in an extra term that has something in common with each half, such as $a_n m$ or $\ell b_n$.
Try that out:
\[ a_n b_n - \ell m = a_n b_n - a_n m + a_n m - \ell m = a_n (b_n - m) + (a_n - \ell) m. \]
Because both $|a_n - \ell|$ and $|b_n - m|$ can be made really small, we could now force
each half of this to be smaller than $\varepsilon/2$. For $(a_n - \ell)m$ this will be easy: just insist
that $a_n - \ell$ is smaller in modulus than $\frac{\varepsilon}{2m}$ (but notice the danger of dividing by m
in the case where m = 0). For $a_n (b_n - m)$, however, it is not so straightforward:
insisting that $b_n - m$ be smaller (in modulus) than $\frac{\varepsilon}{2a_n}$ will not work because that
quantity varies with n. We need to replace that $a_n$ by a constant somehow, and it
needs to be a constant big enough to deal with all of the $a_n$s at once. That is exactly
what we wanted boundedness of $(a_n)$ for: find a positive constant K so that $|a_n|$ is
always ≤ K, then insist that $b_n - m$ shall be smaller (in modulus) than $\frac{\varepsilon}{2K}$, and
everything ought to work. Now let’s see if it does:
Proof
Given ε > 0, use first the fact that $a_n \to \ell$ to find $n_0$ such that³
$|a_n - \ell| < \frac{\varepsilon}{2|m| + 1}$ whenever $n \geq n_0$. Next, use the fact that $(a_n)$, being
convergent, must also be bounded, so we can find a constant K > 0 such that
$|a_n| < K$ for every value of n. Now use the convergence of $b_n$ to m to find another
integer $n_1$ such that $|b_n - m| < \frac{\varepsilon}{2K}$ whenever $n \geq n_1$.
For each $n \geq \max\{n_0, n_1\}$ we now have
\[ |a_n b_n - \ell m| = |a_n b_n - a_n m + a_n m - \ell m| = |a_n (b_n - m) + (a_n - \ell) m| \leq |a_n (b_n - m)| + |(a_n - \ell) m| \]
(using the triangle inequality there)
\[ = |a_n| |b_n - m| + |a_n - \ell| |m| \leq K \frac{\varepsilon}{2K} + \frac{\varepsilon}{2|m| + 1} |m| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
The proof is finished. (Notice that we were more careful with the modulus signs in
the final proof than we had been in the opening roughwork.)
3 The +1 on the bottom line has been put in purely to make sure that we are at no risk of
dividing by zero, and to avoid having to treat m = 0 as a special case.
4.1.16 EXERCISE Fill in the details in the following outline proof of part (6) of
the algebra of limits theorem. If $b_n \to m$, and neither m nor any of the $b_n$ is zero,
then
\[ \frac{1}{b_n} \to \frac{1}{m}. \]
Outline proof
Notice first that
\[ \left| \frac{1}{b_n} - \frac{1}{m} \right| = \frac{|b_n - m|}{|m| |b_n|}. \]
That last expression could get into deep trouble⁴ if $b_n$ were to become close to zero,
so we need to prevent that.⁵ Find $n_1$ so that $\frac{|m|}{2} < |b_n| < \frac{3|m|}{2}$ whenever $n \geq n_1$.
For such values of n, check that
\[ \frac{|b_n - m|}{|m| |b_n|} < \frac{2 |b_n - m|}{m^2}. \]
Next, find $n_2$ for which $|b_n - m| < \frac{m^2 \varepsilon}{2}$ whenever $n \geq n_2$.
Put all the pieces together.
Lastly, it is now quite easy to prove part (6) from part (3) and the above: begin
by writing $\frac{a_n}{b_n}$ as $a_n \times \frac{1}{b_n}$ and using part (3) on that product.
4 In order to make a fraction small (in modulus), we need to make its top line small but prevent
its bottom line from becoming too small.
5 Since |bn | → |m|, we can keep |bn | close enough to |m| – say, between one half of |m| and
three halves of |m| – to keep it well away from zero.
We’ll conclude this section with two more results that connect limits with
inequalities where, this time, the inequalities are between the terms of two or more
sequences rather than, as above, between the terms of a single sequence.
4.1.17 Theorem: limits across an inequality If $(a_n)$ and $(b_n)$ are two convergent
sequences such that $a_n \leq b_n$ for every n ∈ N, then $\lim a_n \leq \lim b_n$.
Proof
For brevity, put $\ell_1 = \lim a_n$, $\ell_2 = \lim b_n$. If $\ell_1$ were not ≤ $\ell_2$ then the number
$\varepsilon = \frac{\ell_1 - \ell_2}{2}$ would be strictly greater than zero. Convergence tells us that, for
sufficiently large n, both $|a_n - \ell_1|$ and $|b_n - \ell_2|$ will be smaller than ε, that is,
\[ \ell_1 - \varepsilon < a_n < \ell_1 + \varepsilon, \qquad \ell_2 - \varepsilon < b_n < \ell_2 + \varepsilon. \]
Our choice of ε, however, arranges that $\ell_1 - \varepsilon$ and $\ell_2 + \varepsilon$ are the same number
(indeed, that is precisely why we chose it so). Therefore (for large values of n like
this) $b_n < a_n$, and this contradiction establishes the result.
Remarks
1. Be careful not to use this result on sequences about whose convergence you are
unsure. For instance, $(-1)^n$ is certainly less than 2 for every value of n . . . but
does this tell us that $\lim (-1)^n \leq 2$? No, because $\lim (-1)^n$ does not exist.
2. Also be aware that the strict inequality < is not preserved under limits in this
way: that is, $a_n < b_n$ for all n does not guarantee that $\lim a_n < \lim b_n$. A simple
illustration of this is that $-\frac{1}{n}$ is certainly strictly less than $+\frac{1}{n}$ for every n, and
each tends to 0, but it would be foolish to claim that 0 < 0 as a consequence.
4.1.18 Theorem: the ‘sandwich’, or the ‘squeeze’ Of three sequences $(a_n)$, $(b_n)$,
$(c_n)$ suppose we know that $a_n \leq b_n \leq c_n$ for all n, and also that $(a_n)$, $(c_n)$ converge
to the same limit $\ell$. Then also $b_n \to \ell$.
Proof
Given any ε > 0 we can first use the given convergence to find positive integers
$n_a$, $n_c$ such that $|a_n - \ell| < \varepsilon$ for $n \geq n_a$ and $|c_n - \ell| < \varepsilon$ for $n \geq n_c$. Then for each
$n \geq \max\{n_a, n_c\}$ we shall have both of these inequalities true at once, and so
\[ \ell - \varepsilon < a_n < \ell + \varepsilon, \qquad \ell - \varepsilon < c_n < \ell + \varepsilon. \]
Combining pieces of this display with the given inequality $a_n \leq b_n \leq c_n$ we see
that:
\[ \ell - \varepsilon < a_n \leq b_n \leq c_n < \ell + \varepsilon \]
(for all $n \geq \max\{n_a, n_c\}$), which places $b_n$ between $\ell - \varepsilon$ and $\ell + \varepsilon$, as required.
4.1.19 Examples
1. To find the limit of the sequence whose nth term is
\[ \frac{6n - 5 \sin\left( \sqrt{n^2 + \pi n} \right)}{2n + 3}. \]
Solution
The awkward-looking trigonometric term must lie between −5 and +5, so the
nth term here lies between $\frac{6n - 5}{2n + 3}$ and $\frac{6n + 5}{2n + 3}$. Since each of these converges
to 3 via the algebra of limits, so must the given sequence.
2. To find the limit of the sequence whose nth term is
\[ \frac{n^2}{n^2 + 3n \cos(n^3 + n + 1) - 2 \sin(\pi \ln 5n - 16)}. \]
Solution
Take care with the inequalities when estimating bottom lines of fractions. We
know −1 ≤ cos θ ≤ +1 and −1 ≤ sin θ ≤ +1 no matter what (real) number
θ may be, so
\[ n^2 - 3n - 2 \leq n^2 + 3n \cos(n^3 + n + 1) - 2 \sin(\pi \ln 5n - 16) \leq n^2 + 3n + 2 \]
is guaranteed. Taking reciprocals, and assuming n ≥ 4 to avoid problems with
$n^2 - 3n - 2$ and other terms being possibly negative, we now get
\[ \frac{1}{n^2 - 3n - 2} \geq \frac{1}{n^2 + 3n \cos(n^3 + n + 1) - 2 \sin(\pi \ln 5n - 16)} \geq \frac{1}{n^2 + 3n + 2} \]
(note the reversal of the inequalities) and therefore
\[ \frac{n^2}{n^2 - 3n - 2} \geq \frac{n^2}{n^2 + 3n \cos(n^3 + n + 1) - 2 \sin(\pi \ln 5n - 16)} \geq \frac{n^2}{n^2 + 3n + 2}. \]
Since (via the algebra of limits) the first and third of these expressions
converge to 1, so must the given sequence that is squeezed between them. The
fact that we ignored the first three terms has no effect on limiting
behaviour, of course.
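The squeeze in this example can also be watched numerically. The Python sketch below is our own illustration, and the names `squeeze_bounds` and `squeezed_term` are hypothetical. Because the argument only uses the fact that the cosine and sine values lie in [−1, 1], the sketch replaces them by arbitrary stand-in values c and s from that interval rather than committing to the particular trigonometric arguments.

```python
import random

def squeeze_bounds(n):
    """Lower and upper bounds n^2/(n^2+3n+2) and n^2/(n^2-3n-2), valid for n >= 4."""
    if n < 4:
        raise ValueError("the upper bound needs n^2 - 3n - 2 > 0, that is, n >= 4")
    return n**2 / (n**2 + 3 * n + 2), n**2 / (n**2 - 3 * n - 2)

def squeezed_term(n, c, s):
    """n^2 / (n^2 + 3n*c - 2*s), where c and s stand in for any cosine and sine values."""
    return n**2 / (n**2 + 3 * n * c - 2 * s)

rng = random.Random(0)  # fixed seed so that the run is repeatable
for n in (4, 10, 100, 10_000):
    lower, upper = squeeze_bounds(n)
    c, s = rng.uniform(-1, 1), rng.uniform(-1, 1)
    print(n, lower, squeezed_term(n, c, s), upper)
```

As n grows, both bounds close in on 1, and whatever values c and s take, the middle quantity is trapped between them.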
4.1.20 EXERCISE For each positive integer n, let:
\[ t_n = \frac{4n^2}{7n^3} + \frac{4n^2 - 1}{7n^3 + 3} + \frac{4n^2 - 2}{7n^3 + 6} + \frac{4n^2 - 3}{7n^3 + 9} + \cdots + \frac{4n^2 - n}{7n^3 + 3n}. \]
Investigate the limiting behaviour of the sequence (tn )n≥1 .

Partial solution
As we scan along the list of n + 1 separate fractions whose sum defines tn , the
numerators decrease and the denominators increase; consequently the largest of
these fractions is the first and the smallest is the last. Therefore
\[ \frac{4n^2 - n}{7n^3 + 3n} (n + 1) < t_n < \frac{4n^2}{7n^3} (n + 1). \]
4.2 Induction: infinite returns for finite effort

Mathematics – to make a terribly obvious point – is peculiarly full of ‘universal’
statements: statements that claim to be true not just for particular values of the
unknowns that they contain, nor even for an overwhelming majority of values, but
for all of them that, in context, make sense. Some such statements will be extremely
familiar to you; to give a few examples:
• Every positive integer greater than 1 can be expressed as a product of prime
numbers.
• Every right-angled triangle has the square on its longest side equal in area to
the total of the squares on the other two sides.
• Every quadratic equation has two (real or complex) solutions (counting by
multiplicity).
• $(x - 1)(x^4 + x^3 + x^2 + x + 1) = x^5 - 1$ for every real (or complex) value of x.
• For any real number x we can find a positive integer n that is bigger than x.
• For each real number t between −1 and +1 we can find θ ∈ [−π/2, π/2] such
that sin(θ) = t.
• Each bijective mapping has a unique inverse.
Universal statements are (relatively) difficult to prove, because you must provide
a proof that works for every scenario that the statement claims to work for.
(Untrue universals are, on the other hand, (relatively) easy to disprove because
you only need to find one instance that they claim to work for, but in which
they give a false result.) Sometimes we may be lucky enough to find a single
demonstration that deals with all values of the variables at once: for instance, the
fourth statement above can be verified just by multiplying out the brackets on
the left-hand side, cancelling everything you can, and looking at what is left over
(noticing incidentally that the actual value of x makes no difference); the seventh
statement can be confirmed by merely constructing what the desired inverse has
to be, and then checking that it works (and noticing that which bijective map we
started with does not affect the construction in any way). On other occasions,
though, we find that we need to break up the demonstration into a number of
cases, depending on the value(s) of the variable(s). For example, the proof we gave
of the result
\[ \lim (k x_n) = k \lim x_n \]
divided into the two cases k = 0 and k ≠ 0 and, later, the discussion of whether
$(x^n)$ converges or not usually splits into cases such as 0 < x < 1, −1 < x < 0, x = 0,
x = 1, x = −1, |x| > 1 because either the result or the argument (or both) will run
differently depending on the variable’s value.
In this area, the ‘worst conceivable situation’ is that in which we appear to be
forced to consider an infinite number of cases. At the time of writing, the notorious
‘3n + 1’ problem seems to be stuck in this nightmare zone. The ‘3n + 1’ problem is
this: given a positive integer n, either divide it by 2 (if it is even) or multiply it by 3
and add 1 (if it is odd); now repeat that process on your answer, and on the answer
to that, and so on. Question: do you always get to 1 in a finite number of moves?
Here is an illustration, starting on 58:
58 → 29 → 88 → 44 → 22 → 11 → 34 → 17 → 52 → 26 → 13 → 40
→ 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
Well, we reached 1 that time. Does that always happen, no matter what n
we start with? Nobody knows, perhaps in part because although an enormous
number of individual initial n’s have been checked out, and many special cases
have been successfully handled, nobody has yet devised a finite list of special-
case arguments that comprehensively covers all positive integers. Warning: do not
invest a disproportionate amount of your time into exploring this problem; you
have plenty of other things to do.
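For readers who want to try a few starting values without doing the arithmetic by hand, here is a minimal Python sketch of the ‘3n + 1’ process (our own illustration; the name `collatz_path` is hypothetical). Since nobody knows whether 1 is always reached, there is no guarantee in principle that the loop stops for every starting value, although it does for every value that has ever been tested.

```python
def collatz_path(n):
    """List the numbers visited by the '3n + 1' process from n until it first reaches 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    path = [n]
    while n != 1:
        # Halve an even number; send an odd number n to 3n + 1.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        path.append(n)
    return path

print(" → ".join(str(v) for v in collatz_path(58)))
```

Starting from 58, this reproduces the chain displayed above.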
(Mathematical) induction is a pattern of proof that is highly successful in estab-
lishing universal statements that are controlled by a positive integer. Its strength lies
in the fact that a (usually quite routine) demonstration along generally predictable
lines will cover all positive integer cases at once: so that when it works (which is not
always, but very frequently), it proves the truth of an infinite number of statements
all at once. Here in outline is what the pattern of proof by induction is:
• Step 0: express the result that you are trying to prove as a sequence (S(n))n∈N of
statements, where S(n) involves the typical positive integer n.
• Step 1: check that the first statement S(1) is actually true.
• Step 2: assume the truth of a particular (but unspecified) S(k).
• Step 3: deduce from this that the next statement S(k + 1) is also true.
That’s all you need to do. At that point, induction says that all of the statements
S(n) are true statements.
4.2.1 Example To show using induction that $7^{2n-1} + 5^{2n+1} + 12$ is exactly divisible
by 24, for every positive integer n.
Solution
We’ll follow slavishly the pattern of proof set out above; once you are familiar with
induction, you can take some shortcuts.
• Step 0: For each n ∈ N let S(n) be the statement: $7^{2n-1} + 5^{2n+1} + 12$ is exactly
divisible by 24.
• Step 1: S(1) says that 7 + 125 + 12 is divisible by 24. Since the total is 144, this
is indeed true.
• Step 2: Assume the truth of a particular S(k); that is, that $7^{2k-1} + 5^{2k+1} + 12$
really is divisible by 24.
• Step 3: Now $7^{2(k+1)-1} + 5^{2(k+1)+1} + 12 - (7^{2k-1} + 5^{2k+1} + 12)$ simplifies to
$7^{2k-1}(49 - 1) + 5^{2k+1}(25 - 1)$ which certainly is a multiple of 24 (write it as
24m, say) because 49 − 1 and 25 − 1 are. Therefore
\[ 7^{2(k+1)-1} + 5^{2(k+1)+1} + 12 = (7^{2k-1} + 5^{2k+1} + 12) + 24m \]
which, using Step 2, is the total of two multiples of 24, and therefore itself a
multiple of 24. In other words, S(k + 1) is also true.
By induction, all of the statements are true: that is, $7^{2n-1} + 5^{2n+1} + 12$ is exactly
divisible by 24 for every positive integer n.
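A machine check of the first few cases is no replacement for the induction, but it is a cheap way of catching a mis-copied exponent. The following Python sketch is our own illustration (the names `expression` and `step_difference` are hypothetical); it tests divisibility by 24 directly and also confirms the simplification used in Step 3.

```python
def expression(n):
    """The quantity 7^(2n-1) + 5^(2n+1) + 12 from the example."""
    return 7 ** (2 * n - 1) + 5 ** (2 * n + 1) + 12

def step_difference(k):
    """Statement k+1 minus statement k, in the simplified form found in Step 3."""
    return 7 ** (2 * k - 1) * (49 - 1) + 5 ** (2 * k + 1) * (25 - 1)

for n in range(1, 6):
    print(n, expression(n), expression(n) % 24)
```

Python integers have unlimited size, so the remainders are exact however large the powers become.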
4.2.2 Example: the sum of the first n perfect squares We show that, for every
positive integer n,
\[ 1^2 + 2^2 + 3^2 + 4^2 + \cdots + n^2 = \frac{n(n + 1)(2n + 1)}{6}. \]
Solution
• Step 0: For each n in turn let S(n) be the statement:
\[ 1^2 + 2^2 + 3^2 + 4^2 + \cdots + n^2 = \frac{n(n + 1)(2n + 1)}{6}. \]
• Step 1: S(1) says that $1^2 = \frac{(1)(1 + 1)(2 + 1)}{6}$ which is certainly true.
• Step 2: Assume the truth of a particular S(k); that is, that
\[ 1^2 + 2^2 + 3^2 + 4^2 + \cdots + k^2 = \frac{k(k + 1)(2k + 1)}{6}. \]
• Step 3: Adding the next perfect square to each side will not damage the
equation, so
\[
\begin{aligned}
1^2 + 2^2 + 3^2 + 4^2 + \cdots + k^2 + (k + 1)^2 &= \frac{k(k + 1)(2k + 1)}{6} + (k + 1)^2 \\
&= \frac{k + 1}{6} \left( k(2k + 1) + 6(k + 1) \right) = \frac{k + 1}{6} \left( 2k^2 + k + 6k + 6 \right) \\
&= \frac{k + 1}{6} \left( 2k^2 + 7k + 6 \right) = \frac{k + 1}{6} \left( (k + 2)(2k + 3) \right) \\
&= \frac{(k + 1)(k + 1 + 1)(2(k + 1) + 1)}{6}.
\end{aligned}
\]
In other words, S(k + 1) is also true.
By induction, all of the statements are true.
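The formula is easy to spot-check by brute force. This short Python sketch is our own illustration, with hypothetical function names; it compares the sum computed term by term with the closed form.

```python
def sum_of_squares_direct(n):
    """Add up 1^2 + 2^2 + ... + n^2 one term at a time."""
    return sum(i * i for i in range(1, n + 1))

def sum_of_squares_formula(n):
    """The closed form n(n+1)(2n+1)/6; the product is always divisible by 6."""
    return n * (n + 1) * (2 * n + 1) // 6

for n in (1, 2, 3, 10, 100):
    print(n, sum_of_squares_direct(n), sum_of_squares_formula(n))
```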
4.2.3 Example: Bernoulli’s inequality Let x be a real number with x ≥ −1. We
show that, for every positive integer n, $(1 + x)^n \geq 1 + nx$.
Solution
• Step 0: For each n in turn let S(n) be the statement: $(1 + x)^n \geq 1 + nx$.
• Step 1: S(1) says that $(1 + x)^1 \geq 1 + x$ which is not very interesting but
certainly true.
• Step 2: Assume the truth of a particular S(k); that is, that $(1 + x)^k \geq 1 + kx$.
• Step 3: Now because x ≥ −1, we know that (1 + x) is positive or zero, so it is
safe to multiply a non-strict inequality by it. Thus:
\[ (1 + x)(1 + x)^k \geq (1 + x)(1 + kx) = 1 + x + kx + kx^2 \]
and therefore, since $kx^2$ cannot be negative,
\[ (1 + x)^{k+1} \geq 1 + x + kx = 1 + (k + 1)x. \]
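To see why the condition x ≥ −1 matters, it helps to test the inequality on either side of that boundary. The Python sketch below is our own illustration (the name `bernoulli_holds` is hypothetical); it uses exact rational arithmetic so that cases of equality are not spoiled by rounding.

```python
from fractions import Fraction

def bernoulli_holds(x, n):
    """True when (1 + x)^n >= 1 + n*x, evaluated exactly with rational numbers."""
    x = Fraction(x)
    return (1 + x) ** n >= 1 + n * x

# Sample x = -1, -3/4, ..., 3 and n = 1, ..., 30: the inequality should hold throughout.
print(all(bernoulli_holds(Fraction(j, 4), n) for j in range(-4, 13) for n in range(1, 31)))
# For x = -4 and n = 3 the left side is -27 and the right side is -11.
print(bernoulli_holds(-4, 3))
```

The second call shows that the inequality can fail once x drops below −1, which is exactly where Step 3 would be multiplying an inequality by a negative number.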

4.2.4 Example If n + 1 distinct straight lines are drawn on a plane surface, we

show that they cannot have more than n(n + 1)/2 crossing points.
Solution
• Step 0: For each value of n in turn let S(n) be the statement: n + 1 distinct
straight lines can’t cross at more than n(n + 1)/2 points.
• Step 1: S(1) says that 2 distinct straight lines cannot cross at more than
1(2)/2 = 1 point, which is a simple geometric truth.
• Step 2: Assume the truth of a particular S(k); that is, that k + 1 such lines have
at most k(k + 1)/2 crossing points.
• Step 3: If we are now given (k + 1) + 1 = k + 2 such lines, imagine looking at
the first k + 1 of them. By step 2, those lines have at most k(k + 1)/2 crossing
points. Imagine we now draw in the last (the (k + 2)th ) line: it hits each of the
previous k + 1 lines at most once, so the total number of crossing points now is,
at the most, k(k + 1)/2 + (k + 1). This rearranges easily as (k + 1)(k + 2)/2,
which is the same as (k + 1)((k + 1) + 1)/2. In other words, S(k + 1) is also true.
4.2.5 EXERCISES
1. Show by induction that, for every positive integer n:
\[ 1^3 + 2^3 + 3^3 + \cdots + n^3 = \frac{n^2 (n + 1)^2}{4}. \]
2. Use induction to verify that, whenever n is a positive integer:
\[ 1 \times 2 \times 3 + 2 \times 3 \times 4 + 3 \times 4 \times 5 + \cdots + n(n + 1)(n + 2) = \frac{n(n + 1)(n + 2)(n + 3)}{4}. \]
3. Verify that 12n − 23n + 34n is divisible by 22 for all positive integer values of n.
4.2.6 Comments
• It’s a very common experience, when you first meet induction, to feel that it is
cheating in some sense! It can seem that, instead of proving the wished-for
result properly, you are just assuming (at step 2) that it is true already. This is
not, however, what is going on. The desired result is something that claims to
work for all positive integers and, in that phrase, the most important word is
all. At step 2, what we are assuming is definitely not that the relevant statement
is true for all positive integers but merely for one particular positive integer. This
is perfectly reasonable: in fact, at step 1 we already confirmed that it actually is
true for n = 1, so there is nothing outrageous or illogical about supposing that
it might be true for some (other) values. This is all that we are doing at step 2.
• Step 3 is the only part of the argument at which you usually have to pause and
think a bit. The question to be pondered is: how am I going to turn statement
number k into statement number k + 1? – and there is no all-purpose answer: it
will depend on what these statements are trying to say. Looking back at our
four little case studies above, in the first one the two lumps of algebra were very
similar in appearance, and it was a reasonable guess that if we subtracted them
we might see in a convenient form what the difference was. For the second,
adding in the next perfect square was the natural way to trade up from the sum
of k squares to the sum of k + 1 of them. In the third, powers of (1 + x) were
the essential ingredient, and we had to think how to turn $(1 + x)^k$ into
$(1 + x)^{k+1}$ . . . to which the simple answer, once the question is posed, is:
multiply by another (1 + x), and first ask yourself whether it is actually safe to
do that to an inequality. In the fourth example, how can we predict the
behaviour of k + 2 straight lines when we already know only how k + 1 lines
behave? How else, apart from keeping one line aside for the moment, letting the
remaining k + 1 lines do what we know they can, and then bringing in the last
line to see how it might interact with the others? In many cases, then, there is a
kind of inevitability about how you trade up from statement k to statement
k + 1, but you may need to look quite carefully at those statements before you
see what it is.
• As to why induction is a valid method of proof, it may help if you imagine the
various component statements S(1), S(2), S(3), . . . stacked one above the
previous one, like the rungs of an (endless) ladder ‘heading off to infinity’. By
checking the truth of S(1) you are, almost literally, getting your foot on the
bottom rung of the ladder – testing that it is strong enough to take the weight
of careful inspection. The main part of the induction process, then, the
demonstration that S(k + 1) follows as a logical deduction from S(k), says that
you can always climb from any ‘sound’ rung to the one above it. So, start
climbing: the first rung is strong/valid/true, therefore the one above (that is, the
second rung) is also. From that observation, it follows that the one above that
(the third rung) is equally sound. From that, so is the fourth. From that, so is
the fifth . . . when is this process going to stop? Never! The way in which the
positive integers are naturally ordered is that any particular one of them can be
reached in a finite number of steps starting at 1 and increasing by 1 each time.
For that reason, any particular S(n) is accessible by the process set out in the
induction template, and must therefore be a true statement.
• If you really want to understand why induction works, think back to paragraph
3.3.1 (every non-empty set of positive integers possesses a least element). Once
we know that S(1) is true and that S(k) implies S(k + 1) for each k ≥ 1, then
suppose that some of the S(n)’s are not true: that is, that the set
W = {n ∈ N : S(n) is false}
is not empty. By 3.3.1, W has to possess a least element w. Now w cannot be 1

since S(1) is known to be true, so w − 1 is still a positive integer and it is strictly
smaller than the smallest element of W: therefore w − 1 is not in W, which tells
us that S(w − 1) must be a true statement. Yet since S(w − 1) implies S(w), this
guarantees that S(w) is also true, which contradicts w being an element of W. In
consequence of that contradiction, none of the S(n)s can have been false.
• Occasionally, n = 1 is not the best place at which to start an induction
argument. Just as in the case of (other) sequences, it can sometimes be
convenient to begin at n = 0, or at n = 2, or at some other initial value. In fact,
our fourth case study above would have been slightly easier to read if we had
expressed it as: ‘If n distinct straight lines are drawn on a plane surface, where
n ≥ 2, show that they cannot have more than n(n − 1)/2 crossing points’. The
only change in procedure that such a twist requires is that Step 1 ought to alter
to become: ‘Step 1: check that the statement S(n) with the lowest possible value
of n is actually true ’. (Plus, of course, that we keep in mind that the core
argument S(k) implies S(k + 1) only needs to work for the realistic values of k.)
We shall finish the section by doing two further examples: one that illustrates
how to carry out these minor changes when ‘n = 1’ is not the right starting
point, and one that observes that what we called Step 0 can on occasions be a
little tricky to get right.
4.2.7 Example For each integer n ≥ 9, we show that $n! > 4^n$.
Solution
• Step 0: For each n = 9, 10, 11, 12, · · · let S(n) be the statement: $n! > 4^n$.
• Step 1: S(9) says that $9! > 4^9$. A little calculation shows that the left-hand side is
362880 and the right-hand side is 262144, so this statement is correct.
• Step 2: Assume the truth of a particular S(k): that is, that $k! > 4^k$.
• Step 3: (In order to turn k! into the expected left-hand side (k + 1)! of statement
k + 1, we need to multiply by k + 1, which is, of course, positive . . . indeed, it is
at least 10 since k ≥ 9.)
\[ k! > 4^k \Rightarrow (k + 1) k! > (k + 1) 4^k \geq (10) 4^k > (4) 4^k = 4^{k+1} \]
in other words, S(k + 1) is also true.
By induction, all of the statements are true from n = 9 onwards.
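Why start at 9? A quick computation shows where the inequality begins to hold. The Python sketch below is our own illustration (the name `factorial_beats_power` is hypothetical).

```python
import math

def factorial_beats_power(n):
    """True when n! > 4^n."""
    return math.factorial(n) > 4 ** n

for n in range(1, 13):
    print(n, math.factorial(n), 4 ** n, factorial_beats_power(n))
```

The table shows the inequality failing for n = 1, ..., 8 and holding from n = 9, which is why Step 1 had to be checked at 9 rather than at 1.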
4.2.8 Example To verify that every integer ≥ 2 can be expressed as a product of

prime factors.
Attempted solution
• Step 0: For each integer n ≥ 2 let S(n) be the statement ‘n can be expressed as a
product of primes’. (That’s the obvious way to break the claimed result into
layers, isn’t it?)
• Step 1: S(2) says that 2 is a product of primes; but 2 is itself a prime, so this
statement is trivially true: 2 = 2 gives the prime factorisation of 2.
• Step 2: Assume the truth of a particular S(k); that is, k can be expressed as a
product of primes.
• Step 3: Suddenly we hit a snag. There is no evident way to get from the prime
factors of k to the prime factors of k + 1. Indeed, no prime factor of k can
possibly divide into k + 1 since they differ by 1.
However, we can re-word Step 0 and try again:
Reattempted solution
• Step 0: For each integer n ≥ 2 let S(n) be the statement ‘each integer from 2 up
to n can be expressed as a product of primes’. (If we can prove all of those true,
we shall have what we want.)
• Step 1: S(2) says just that 2 is a product of primes; but, as before, this statement
is trivially true: 2 = 2 gives the prime factorisation of 2.
• Step 2: Assume the truth of a particular S(k); that is, that each integer from 2 up
to k can be expressed as a product of primes.
• Step 3: Now with a view to S(k + 1), we know from Step 2 that each integer
from 2 to k can be prime-factorised, and we only still need to look at k + 1. If
k + 1 happens to be prime, there is no need to do anything: k + 1 = k + 1 is a
trivial prime factorisation. Otherwise, k + 1 is not prime and (by definition of
prime) can be written as the product of two smaller numbers, say, k + 1 = a.b
where a, b are at least 2 but less than k + 1. Yet then Step 2 tells us that each of
a and b can be written as the product of a list of primes and, putting the two lists
together, we have a prime factorisation of a.b = k + 1. So S(k + 1) is confirmed.
By induction, all the statements S(n) are true, and so all integers from 2 upwards
can be prime-factorised.
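The reattempted proof has the same shape as a recursive procedure: to factorise k + 1, either recognise it as prime or split it as a.b and factorise the two smaller pieces, which is legitimate because both pieces lie between 2 and k. The Python sketch below is our own illustration of that idea (the name `prime_factors` is hypothetical), not an efficient factorisation method.

```python
import math

def prime_factors(n):
    """Return a list of primes whose product is n, for any integer n >= 2."""
    if n < 2:
        raise ValueError("n must be an integer that is at least 2")
    for a in range(2, math.isqrt(n) + 1):
        if n % a == 0:
            # n = a * b with 2 <= a, b < n: factorise each piece and join the two lists.
            return prime_factors(a) + prime_factors(n // a)
    # No divisor between 2 and sqrt(n), so n is prime and is its own factorisation.
    return [n]

print(prime_factors(360))
```

The recursive calls play the role of the strengthened Step 2: they rely on every smaller integer from 2 upwards, not just the immediately preceding one, having a factorisation.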
4.2.9 EXERCISES
1. Verify that $n < 2^n$ for every positive integer n.
2. Show that, for each integer n ≥ 3:
\[ 3 (2n)! < 2^{2n} (n!)^2. \]
3. Provided that n ≥ 4, show that $3^n > n^3$.
In roughwork for the last of these three problems, the step from truth for k to truth
for k + 1 amounts to showing that $3k^3 > (k + 1)^3$, that is, that $3 > (1 + \frac{1}{k})^3$. Since
k is at least 4, the right-hand side here is ≤ $(1.25)^3$ which satisfactorily calculates
out at a little under 2.
4.2.10 Note: binomial coefficients It may be useful to round off this section by
revising the binomial theorem and the coefficients that appear in connection with
it. Whenever n is a positive integer and k is an integer such that 0 ≤ k ≤ n, the
symbol
\[ \binom{n}{k} = \frac{n!}{k! (n - k)!} \]
is called a binomial coefficient. It is, amongst other interpretations, the number
of different possible selections of k objects that can be chosen⁶ from n distinct
objects. Straightforward calculations confirm that (for all relevant n and k)
$\binom{n}{0} = \binom{n}{n} = 1$, $\binom{n}{1} = n$, $\binom{n}{n-k} = \binom{n}{k}$ and, most importantly:
\[ \binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k}. \]
6 This is why it is usually pronounced as ‘n choose k’.
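The identities listed here are easy to confirm by machine for small cases. The following Python sketch is our own illustration; it relies on the standard-library function `math.comb` (available from Python 3.8), and the name `pascal_identity_holds` is hypothetical.

```python
import math

def pascal_identity_holds(n, k):
    """Check C(n, k-1) + C(n, k) == C(n+1, k) for 1 <= k <= n."""
    return math.comb(n, k - 1) + math.comb(n, k) == math.comb(n + 1, k)

n, k = 7, 3
print(math.comb(n, k), math.factorial(n) // (math.factorial(k) * math.factorial(n - k)))
print(all(pascal_identity_holds(n, k) for n in range(1, 40) for k in range(1, n + 1)))
```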
We shall, on several occasions, make use of the theorem itself:
4.2.11 The binomial theorem For each n ∈ N,
\[ (1 + x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k. \]
Proof
• Step 0: For each n ∈ N in turn, let S(n) denote the statement:
$(1 + x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k$.
• Step 1: S(1) says that $(1 + x)^1 = \binom{1}{0} + \binom{1}{1} x$ which is trivially correct.
• Step 2: Assume the truth of a particular S(j); that is, that $(1 + x)^j = \sum_{k=0}^{j} \binom{j}{k} x^k$.
(Notice that we are, as usual, taking care not to use the same symbol with more
than one meaning.)
• Step 3: In order to turn $(1 + x)^j$ into the expected left-hand side $(1 + x)^{j+1}$ of
statement number j + 1, we need to multiply by another (1 + x) and carefully
gather up⁷ each power of x that appears:
\[
\begin{aligned}
(1 + x)^{j+1} &= (1 + x) \sum_{k=0}^{j} \binom{j}{k} x^k \\
&= (1 + x) \left( 1 + \binom{j}{1} x^1 + \binom{j}{2} x^2 + \binom{j}{3} x^3 + \cdots + \binom{j}{j-1} x^{j-1} + x^j \right) \\
&= 1 + \overline{\binom{j}{1} + 1}\,(x^1) + \overline{\binom{j}{2} + \binom{j}{1}}\,(x^2) + \overline{\binom{j}{3} + \binom{j}{2}}\,(x^3) + \cdots + \overline{1 + \binom{j}{j-1}}\,(x^j) + x^{j+1} \\
&= 1 + \overline{\binom{j}{1} + \binom{j}{0}}\,(x^1) + \overline{\binom{j}{2} + \binom{j}{1}}\,(x^2) + \overline{\binom{j}{3} + \binom{j}{2}}\,(x^3) + \cdots + \overline{\binom{j}{j} + \binom{j}{j-1}}\,(x^j) + x^{j+1} \\
&= 1 + \binom{j+1}{1} x^1 + \binom{j+1}{2} x^2 + \binom{j+1}{3} x^3 + \cdots + \binom{j+1}{j} x^j + x^{j+1} \\
&= \sum_{k=0}^{j+1} \binom{j+1}{k} x^k
\end{aligned}
\]
using the identity from 4.2.10 at the last-but-one line. In other words,
S(j + 1) is also true.
7 We are using overscoring as another form of bracketing, to try to improve readability in these
few lines of algebra.
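Step 3 of this proof can be mimicked on a list of coefficients: multiplying a polynomial by (1 + x) adds each coefficient to its left-hand neighbour, which is precisely the gathering-up of like powers carried out above. The Python sketch below is our own illustration (the names `times_one_plus_x` and `expand_power` are hypothetical); it builds the coefficients of (1 + x)^n by repeated multiplication so that they can be compared with the binomial coefficients.

```python
import math

def times_one_plus_x(coeffs):
    """Multiply the polynomial sum(coeffs[k] * x^k) by (1 + x)."""
    padded = coeffs + [0]    # 1 * polynomial, with room for the new top power
    shifted = [0] + coeffs   # x * polynomial: every power moves up by one
    return [p + s for p, s in zip(padded, shifted)]

def expand_power(n):
    """Coefficients of (1 + x)^n for n >= 1, lowest power first."""
    coeffs = [1, 1]          # the case n = 1
    for _ in range(n - 1):
        coeffs = times_one_plus_x(coeffs)
    return coeffs

n = 6
print(expand_power(n))
print([math.comb(n, k) for k in range(n + 1)])
```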
4.3 Recursively defined sequences

Up to this point, every individual sequence that we have worked on has been
specified by writing down either a formula for its nth term or a list whose pattern
was obvious enough that we could find such a formula easily. Not all important
sequences work like this: for instance, although the pattern in the Fibonacci
sequence 1, 1, 2, 3, 5, 8, 13, 21, 34, · · · is clear enough, it is far from obvious how
to obtain an explicit formula for its nth term; rather, the pattern is expressed by
noting that each term from the third onwards is the sum of the preceding two:
\[ f_n = f_{n-1} + f_{n-2} \quad \text{provided that } n \geq 3 \]
and, to complete the definition, $f_1 = 1$, $f_2 = 1$. This is an instance of what is called
recursive definition.
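A recursive definition translates almost word for word into a loop: keep the list of terms found so far and append the sum of the last two. The Python sketch below is our own illustration (the name `fibonacci_terms` is hypothetical).

```python
def fibonacci_terms(count):
    """Return the first `count` Fibonacci numbers f_1, f_2, ..., starting 1, 1."""
    terms = [1, 1]
    while len(terms) < count:
        # Each new term is the sum of the preceding two.
        terms.append(terms[-1] + terms[-2])
    return terms[:count]

print(fibonacci_terms(9))
```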
Another instance that you are likely to have met is the Newton – Raphson
approximation process. In this, faced with an equation of the form f (x) = 0 that
we cannot solve precisely, but in which the function f can be differentiated (see
Chapter 12 if this idea is unfamiliar), we make an initial rough guess x = x1 at the
solution, perhaps via a sketch graph, and then improve that guess to a second one:
\[ x_2 = x_1 - \frac{f(x_1)}{f'(x_1)} \quad \text{(where $f'$ denotes the derivative).} \]
[Figure: the curve f(x), with the desired solution and the point P(x1, f(x1)) marked. Tangent to curve at P crosses x-axis at x2.]
In general, $x_2$ will be significantly closer to the true solution than $x_1$ was, and we
can now repeat the improvement process
\[ x_3 = x_2 - \frac{f(x_2)}{f'(x_2)}, \qquad x_4 = x_3 - \frac{f(x_3)}{f'(x_3)}, \qquad x_5 = x_4 - \frac{f(x_4)}{f'(x_4)} \]
and so on. In most cases this sequence of improving approximations converges to
a limit whose exact value is a solution to the original equation. The definition of
this sequence
\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, \]
together with a random (but not grossly inaccurate) guess value for $x_1$, is another
example of recursive definition.
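The Newton–Raphson recursion is short enough to try out directly. The Python sketch below is our own illustration: the function name `newton_raphson`, the test equation x² − 2 = 0 and the starting guess 1.5 are all our choices rather than anything taken from the text. The sketch does no safeguarding, so it will fail with a division by zero if the derivative vanishes at some approximation.

```python
def newton_raphson(f, f_prime, x1, steps):
    """Return [x_1, x_2, ..., x_{steps+1}] where x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    xs = [x1]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / f_prime(x))
    return xs

# Illustrative equation (our assumption): x^2 - 2 = 0, whose positive solution is sqrt(2).
approximations = newton_raphson(lambda x: x * x - 2, lambda x: 2 * x, 1.5, 6)
for x in approximations:
    print(x)
```

Each printed value should agree with the square root of 2 to more decimal places than the one before it, which is the ‘improving approximations’ behaviour described above.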
The essential characteristic of this style of definition is that, instead of telling us
explicitly what the nth term is, it provides us with an algorithm for determining it
from the value(s) of one or more previous terms in the list. In time, that means that
we could in principle work out any particular term that was of interest to us …but
how could we hope to find the limit of such a sequence if, indeed, it had one?
4.3.1 Example To identify, if it exists, the limit of the sequence $(x_n)_{n \geq 1}$ defined
recursively by:
\[ x_1 = 12; \qquad x_{n+1} = \sqrt{3x_n + 28} \quad (n \geq 1). \]
Draft solution
With so little given information, we need to start by calculating the first few terms.
They work out (to four decimal places) as follows:
12, 8, 7.2111, 7.0451, 7.0097, 7.0021, 7.0004, · · ·
It must be stressed that this is very little evidence as to what happens as n goes to
infinity! Nevertheless, it is enough to let us make a clutch of informed guesses: we
guess that all the terms lie between 12 and 7, that the sequence is decreasing, and
that the limit is 7. Now we have a definite proposal to try to establish:
Solution
• Step 0: For each n ≥ 1 let S(n) be the statement: $7 < x_n \leq 12$.
• Step 1: S(1) says that 7 < 12 ≤ 12, which is true.
• Step 2: Assume the truth of a particular S(k); that is, that $7 < x_k \leq 12$.
• Step 3: (How are we to get from $x_k$ to the next term? As the recursive definition
told us, we multiply by 3, add 28, and take the square root. So:)
\[ 21 < 3x_k \leq 36, \qquad 49 < 3x_k + 28 \leq 64, \qquad 7 < \sqrt{3x_k + 28} = x_{k+1} \leq 8 \leq 12. \]
By induction, all of the statements S(n) are true, that is, all terms of the sequence
do lie between 7 exclusive and 12 inclusive.
(Next, to compare $x_n$ with $x_{n+1}$, the square root makes it less easy to see what
is going on, so let us instead square both sides and actually compare $x_n^2$ with $x_{n+1}^2$.
Keep in mind that 0 < a < b implies that $\sqrt{a} < \sqrt{b}$, as we pointed out in paragraph
1.2; we shall use this several times in the next few pages.)
Notice that
\[ x_n^2 - x_{n+1}^2 = x_n^2 - 3x_n - 28 = (x_n - 7)(x_n + 4) \]
and because we already know that all terms lie between 7 and 12, that is a product of
two positive numbers, therefore positive itself. So $x_n^2 > x_{n+1}^2$ and, taking (positive)
square roots, we get $x_n > x_{n+1}$ for all values of n, that is, the sequence is decreasing.
We now know that $(x_n)$ is both bounded and decreasing, and must therefore
converge to some limit $\ell$. Also $(x_{n+1})$, being merely the sequence $(x_n)$ without its
first term, converges to the same $\ell$, and $(x_n^2)$ converges to $\ell^2$ by algebra of limits.
Take limits across the equation
$$x_{n+1}^2 - 3x_n - 28 = 0 \quad \text{for all } n$$
and we get the quadratic $\ell^2 - 3\ell - 28 = 0$ which factorises predictably into
$(\ell - 7)(\ell + 4) = 0$. Thus, the only conceivable values of $\ell$ are 7 and −4. Yet
$7 < x_n \le 12$ (for all n) informs us[8] that $7 \le \ell \le 12$, so −4 is really not a
possible value, and only 7 remains. We conclude that $(x_n)$ must indeed converge
to the limit 7.
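The informed guesses that opened this example can be checked by machine. The following is a minimal Python sketch (the function name `recursive_terms` is our own choice, not part of the text) that generates terms from $x_1 = 12$ and $x_{n+1} = \sqrt{3x_n + 28}$. Such a computation only supplies evidence for the bounds, the monotonicity and the limit; the proof is the argument given above.

```python
import math

def recursive_terms(first, count):
    """Return `count` terms of x_1 = first, x_{n+1} = sqrt(3*x_n + 28)."""
    terms = [first]
    while len(terms) < count:
        terms.append(math.sqrt(3 * terms[-1] + 28))
    return terms

terms = recursive_terms(12.0, 15)

# First seven terms, rounded to four decimal places as in the text.
print([round(x, 4) for x in terms[:7]])

# Numerical evidence only: every computed term lies in (7, 12] ...
print(all(7 < x <= 12 for x in terms))
# ... and each computed term is smaller than the one before it.
print(all(a > b for a, b in zip(terms, terms[1:])))
```

Only fifteen terms are generated on purpose: much further along, the terms become indistinguishable from 7 in floating-point arithmetic, and the strict inequalities could no longer be observed.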
We’ll do a second example, but this time cutting the ‘running commentary’
back to the extent that is normally expected in a solution.
4.3.2 Example For the sequence $(a_n)$ specified by the formulae $a_1 = 0$, $a_{n+1} = \sqrt{4a_n + 77}$:
1. Show that 0 ≤ an < 11 for every n ≥ 1,
2. Show that the sequence is increasing,
3. Explain why it must possess a limit, and evaluate that limit.
Solution
• Step 0: For each n let S(n) be the statement: 0 ≤ an < 11.
• Step 1: S(1) says 0 ≤ 0 < 11 which is true.
• Step 2: Assume that 0 ≤ ak < 11 for some particular k.
• Step 3: Then $4(0) + 77 \le 4a_k + 77 < 4(11) + 77$, and so
$0 \le \sqrt{77} \le \sqrt{4a_k + 77} = a_{k+1} < \sqrt{121} = 11$. Therefore S(k + 1) is also true.
By induction, all the statements S(n) are true. This proves (1).
[8] See ‘taking limits across an inequality’: Theorem 4.1.17.

Using (1), $a_n^2 - a_{n+1}^2 = a_n^2 - 4a_n - 77 = (a_n + 7)(a_n - 11)$ is the product of a
positive and a negative, therefore negative. That is, $a_n^2 < a_{n+1}^2$ and so $a_n < a_{n+1}$ for
each n. This proves (2).
Since $(a_n)$ is now bounded and increasing, it must converge. Let $\ell$ be its limit.
Also $a_{n+1} \to \ell$. Taking limits across $a_{n+1}^2 - 4a_n - 77 = 0$ (for all n) gives
$\ell^2 - 4\ell - 77 = (\ell + 7)(\ell - 11) = 0$, so $\ell$ can only be −7 or 11. Yet $\ell = -7$
is impossible because every term is at least 0. Therefore $a_n \to 11$.
4.3.3 EXERCISE For the sequence $(a_n)$ specified by the formulae $a_1 = 16$, $a_{n+1} = \sqrt{20 + a_n}$:
1. Show that 5 < an ≤ 16 for every n ≥ 1,
2. Show that the sequence is decreasing,
4.3.4 EXERCISE Find (if it exists) the limit of the indicated sequence:
$$\sqrt{5},\ \sqrt{5 + \sqrt{5}},\ \sqrt{5 + \sqrt{5 + \sqrt{5}}},\ \sqrt{5 + \sqrt{5 + \sqrt{5 + \sqrt{5}}}},\ \cdots.$$
Draft solution
We first need a clearer idea of what this sequence is. The pattern is that, for each
term in turn, the next one is created by adding 5 and taking the square root. In other
words, $a_{n+1} = \sqrt{5 + a_n}$. Also $a_1 = \sqrt{5}$ to get the process started. This flags up the
recursive nature of the problem. Calculate the first few terms and they seem to be
increasing towards a limit of approximately 2.791. What number could that be?
If there is indeed a limit $\ell$ then $(a_{n+1})$ will also converge to $\ell$ so, from
$a_{n+1}^2 = 5 + a_n$ we get $\ell^2 = 5 + \ell$. This quadratic has solutions $(1 \pm \sqrt{21})/2 =$
2.7913, −1.7913 approximately, so now we can see exactly what number that limit
is going to be.
Break the problem into the same three sections as before:
1. Show that $\sqrt{5} \le a_n < (1 + \sqrt{21})/2$ for every n ≥ 1,
2. Show that the sequence is increasing,
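The suggestion to ‘calculate the first few terms’ can be carried out as follows. This is only a sketch: the helper `iterate` and the name `candidate` are our own labels, and the printed figures are evidence for the two claims listed above, not a proof of them.

```python
import math

def iterate(step, first, count):
    """Return `count` terms of the sequence a_1 = first, a_{n+1} = step(a_n)."""
    terms = [first]
    while len(terms) < count:
        terms.append(step(terms[-1]))
    return terms

# Positive root of l^2 = 5 + l, the only candidate for the limit.
candidate = (1 + math.sqrt(21)) / 2

terms = iterate(lambda a: math.sqrt(5 + a), math.sqrt(5), 12)

# Print n, a_n, and the remaining gap between a_n and the candidate limit.
for n, a in enumerate(terms, start=1):
    print(n, round(a, 6), round(candidate - a, 6))
```

The third column shrinks rapidly but stays positive for these terms, which is what the proposed bound and the proposed monotonicity would lead us to expect.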
4.3.5 EXERCISE Investigate the following sequence:
$$4,\ \sqrt{3(4) - 2},\ \sqrt{3\sqrt{3(4) - 2} - 2},\ \sqrt{3\sqrt{3\sqrt{3(4) - 2} - 2} - 2},\ \sqrt{3\sqrt{3\sqrt{3\sqrt{3(4) - 2} - 2} - 2} - 2},\ \cdots.$$
4.3.6 EXERCISE Investigate the following sequence:
$$1.5,\ \sqrt{3(1.5) - 2},\ \sqrt{3\sqrt{3(1.5) - 2} - 2},\ \sqrt{3\sqrt{3\sqrt{3(1.5) - 2} - 2} - 2},\ \sqrt{3\sqrt{3\sqrt{3\sqrt{3(1.5) - 2} - 2} - 2} - 2},\ \cdots.$$
4.3.7 EXERCISE
1. The sequence $(c_n)_{n \in \mathbb{N}}$ is defined recursively by the two formulae
$$c_1 = 2, \qquad c_{n+1} = \frac{2 + 2c_n}{3 + c_n} \quad (\text{while } n \in \mathbb{N}).$$
Show that
• 1 < cn ≤ 2 for all n ∈ N,
• (cn )n∈N is a decreasing sequence,
• (cn )n∈N converges: and determine what its limit is.
2. Alter part 1 by changing only the first term, to c1 = 2/3, and re-work the
problem (changing whatever has to be changed).
4.4 POSTSCRIPT: The epsilontics game (the ‘fifth factor of difficulty’)
You have now seen two versions of the ‘challenge-response game’ that is endemic
in analytic arguments and that, for many students, contributes to the perception
that analysis is difficult. Look again at the definition (2.7.1) of limit of a sequence,
and the standard way (3.2.7) of identifying the supremum of a set:
For each ε > 0 there is some positive integer nε such that · · ·
For each ε > 0 there exists xε ∈ A such that · · ·
In most cases, the actual mathematical calculation of finding or estimating nε or
xε ∈ A is pretty routine. The difficulty, such as it is, lies in the English words ‘For
each’, ‘there is’, ‘there exists’ or in the logic that is represented by them.
It is important to bear in mind that each argument such as these is, in some
sense, a game – a competition. Our imaginary opponent puts forward a value of
ε as a challenge. In order to win the point, we have to come up with a response
(nε , xε , …) that satisfies the requirements of the game. A complete argument (a
‘proof ’, a ‘solution’) is then simply a winning strategy: a procedure that will find a
winning response for every legitimate challenge – ‘for every positive epsilon’.
Having worked through the examples and exercises in the first few chapters, you
should by now be reasonably acclimatised to the interactive nature of arguments
of this kind, but they do pose an initial barrier to the casual reader.
We shall encounter several more variations of this ‘challenge-response’ epsilontics
game as we proceed.
5 Sampling a sequence: subsequences
5.1 Introduction
If we sample an (infinite) sequence by picking out a finite number of its terms,
it would be unreasonable to expect such a sample to be at all ‘representative’ in
the sense of telling us anything useful about the sequence as a whole. Certainly it
will not tell us anything about its possible limit: for we have pointed out several
times that changing or deleting any finite number of terms does not affect limiting
behaviour in any way. What if, instead, we sample an infinite selection of terms?
Then at least our selection (in the original order) will constitute a sequence in its
own right, and we might reasonably expect that its behaviour will tell us something,
but not everything, about the original sequence; on the other hand, knowledge
about the whole sequence is likely to give us all the information we might need
about the newly selected one. This short chapter gives a more precise description
of the idea that we are sketching here, and develops and applies a few rather
predictable results plus one that is less obvious and much more powerful (the
Bolzano-Weierstrass theorem) and which will play a key role later in the text.
5.2 Subsequences
Informally, a subsequence of a sequence is an endless list of its terms in their
original order. So if (xn )n≥1 is any sequence, and we imagine its terms strung out
in an unending list
x1 , x2 , x3 , x4 , x5 , x6 , x7 , x8 , x9 , x10 , · · ·
then a subsequence will be created by scanning along that list and lifting out an
unending selection of what we find; for instance,
x7 , x16 , x21 , x22 , x39 , x122 , · · ·
or
x4 , x8 , x13 , x400 , x401 , x605 , x677 , x759 , · · · .
How should we best write a general subsequence when we don’t know in advance
which terms are to be picked out? We could relabel the first item chosen as y1
and the second chosen as y2 and the third as y3 and so on, and the resulting
symbol (yn )n∈N is certainly a perfectly good notation for a sequence, but it has lost
any visible connection with the original sequence that we started with. A better
method is to give labels to the places in the original sequence where we find the
subsequence’s items: if our first choice occurred at the $n_1$th place in the original list,
and our second at the $n_2$th place, and our third at the $n_3$th place and so on, then the
chosen numbers that build up the subsequence are
xn1 , xn2 , xn3 , xn4 , · · ·
and the entire subsequence can now be written as (xnk )k≥1 . This is a slightly
cluttered symbol, but it succeeds in capturing two of the important aspects of a
subsequence: that it actually is an infinite sequence, and that its terms are some of
the terms from the original (xn ). The third important aspect – that the order has
to be the same as was in the original (no back-tracking allowed) – is captured by
insisting that n1 < n2 < n3 < n4 < · · · , in other words, that the sequence of labels
(n1 , n2 , n3 , n4 , · · · ) has to be strictly increasing.
With that discussion behind us, we can now write down concisely what a
subsequence is (and how to denote it):
5.2.1 Definition Suppose we are given a sequence (xn )n≥1 .

For any strictly increasing sequence (nk )k≥1 of positive integers –
that is, n1 < n2 < n3 < · · · < nk < · · · –
the sequence (xnk )k ≥ 1 = (xn1 , xn2 , xn3 , · · · , xnk , · · · ) is a subsequence of it.
Notice how we used different symbols, in the sequence and in the subsequence,
for ‘the thing that is going to infinity’. Indeed, we must be careful not to use a symbol
with two meanings at once anywhere in mathematics, and especially not when
working with subsequences: if we had denoted our subsequence by such a notation
as (xnn )n≥1 then confusion would be practically guaranteed. Of course there is no
need to use the particular letter k that we chose here: (xni )i≥1 or (xnp )p≥1 would
have done equally well.
5.2.2 Examples
1. For any sequence $(x_n)_{n \ge 1}$, the following are a few of its subsequences: the
sequence of even-numbered terms $(x_{2k})_{k \ge 1}$, the sequence of odd-numbered
terms $(x_{2k-1})_{k \ge 1}$, the sequence $(x_{k^2})_{k \ge 1} = (x_1, x_4, x_9, x_{16}, x_{25}, \cdots)$, the
sequence $(x_{k!})_{k \ge 1} = (x_1, x_2, x_6, x_{24}, x_{120}, \cdots)$, the sequence
$(x_{2^k})_{k \ge 1} = (x_2, x_4, x_8, x_{16}, x_{32}, \cdots)$.
2. In particular, $(5k - 1)$ is a subsequence of $(n)$, $(k^{-2})$ is a subsequence of $(n^{-1})$,
$\left(\frac{1}{k!}\right)$ is a subsequence of $\left(\frac{1}{n}\right)$, $(\sin(7k^2 + 3k + 6))$ is a subsequence of $(\sin n)$,
$(\sqrt[3k]{3k})$ is a subsequence of $(\sqrt[n]{n})$.
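3. The sequence
$$\left((-1)^n \left(2 + \frac{1}{n}\right)\right)_{n \in \mathbb{N}} = \left(-2 - \frac{1}{1},\ 2 + \frac{1}{2},\ -2 - \frac{1}{3},\ 2 + \frac{1}{4},\ \cdots\right)$$
does not converge, but two of its subsequences (those consisting of the odd terms and
the even terms) are
$$\left(-2 - \frac{1}{1},\ -2 - \frac{1}{3},\ -2 - \frac{1}{5},\ -2 - \frac{1}{7},\ \cdots\right) = \left((-1)^{2k-1} \left(2 + \frac{1}{2k-1}\right)\right)_{k \in \mathbb{N}} = \left(-\left(2 + \frac{1}{2k-1}\right)\right)_{k \in \mathbb{N}}$$
and
$$\left(2 + \frac{1}{2},\ 2 + \frac{1}{4},\ 2 + \frac{1}{6},\ 2 + \frac{1}{8},\ \cdots\right) = \left((-1)^{2k} \left(2 + \frac{1}{2k}\right)\right)_{k \in \mathbb{N}} = \left(2 + \frac{1}{2k}\right)_{k \in \mathbb{N}}$$
both of which do converge (to −2 and to 2 respectively).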

4. The sequence $(n(1 + (-1)^n))_{n \in \mathbb{N}} = (0, 4, 0, 8, 0, 12, 0, 16, \cdots)$ is unbounded,
but at least one of its subsequences (that consisting of the odd terms) is not
only bounded but convergent; indeed, it is constant at 0. Another of its
subsequences is $(2k(1 + (-1)^{2k})) = (4k) = (4, 8, 12, 16, \cdots)$ which is
unbounded.
5. The sequence (sin(nπ/2)) is the ‘endlessly recycling’ sequence
(1, 0, −1, 0, 1, 0, −1, 0, 1, · · · )
from which we could evidently extract a subsequence converging to 1, another

converging to −1 and a third converging to 0 (as well as many others that do
not converge).
6. Any subsequence of a subsequence of (xn )n≥1 is itself a subsequence of
(xn )n≥1 .
Let us formalise the insight that convergence of the whole sequence implies
convergence of every subsequence:
5.2.3 Theorem Each subsequence of a convergent sequence converges, and to the

same limit as does the original sequence.
Proof
Suppose that $x_n \to \ell$ and that $(x_{n_k})_{k \ge 1}$ is a subsequence of $(x_n)_{n \ge 1}$. We need to
show that $x_{n_k} \to \ell$ (as k → ∞).
Notice that, because n1 < n2 < n3 < · · · < nk < · · · , we get
n1 ≥ 1, n2 ≥ 2, n3 ≥ 3, · · ·
and, in general,
nk ≥ k.
If ε > 0 is given, $x_n \to \ell$ tells us that there is a positive integer $n_0$ such that $n \ge n_0$
forces $|x_n - \ell|$ to be < ε. Now notice that $k \ge n_0$ forces $n_k \ge k \ge n_0$, that is, $n_k \ge n_0$,
from which we get $|x_{n_k} - \ell| < \varepsilon$. (Less formally, big enough values of k guarantee
that the error $|x_{n_k} - \ell|$ will be smaller than ε.) Hence $x_{n_k} \to \ell$, as required.

5.2.4 Example To show that the sequence $\left((-1)^n \left(2 + \frac{1}{n}\right)\right)_{n \in \mathbb{N}}$ is divergent.
Solution
In 5.2.2 (3) we noticed that this sequence has a subsequence whose limit is −2 and
another whose limit is 2. By the theorem, this could not happen if the full sequence
converged. Hence the result.
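For readers who like to see the two subsequences side by side, here is a small Python sketch. The function name `x` and the cut-off of 2000 terms are arbitrary choices of ours; the output merely illustrates the two different limits on which the argument relies.

```python
def x(n):
    """n-th term of the sequence ((-1)^n (2 + 1/n))."""
    return (-1) ** n * (2 + 1 / n)

# Odd-numbered terms x_1, x_3, x_5, ... and even-numbered terms x_2, x_4, x_6, ...
odd_terms = [x(2 * k - 1) for k in range(1, 2001)]
even_terms = [x(2 * k) for k in range(1, 2001)]

# The first few terms of each subsequence, then a term far along it.
print(odd_terms[:4], odd_terms[-1])
print(even_terms[:4], even_terms[-1])
```

The odd-numbered terms settle near −2 and the even-numbered terms near 2, which is exactly the situation that Theorem 5.2.3 rules out for a convergent sequence.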
Here is a kind of weak converse to Theorem 5.2.3:
5.2.5 Example Suppose we are given a sequence $(x_n)_{n \ge 1}$ and a number $\ell$ such
that the subsequence of odd-numbered terms $(x_{2k-1})_{k \ge 1}$ and the subsequence of
even-numbered terms $(x_{2k})_{k \ge 1}$ both converge to $\ell$. To show that $(x_n)_{n \ge 1}$ itself
converges to $\ell$.
Solution
Given ε > 0, the convergence of the two subsequences tells us that there are positive
integers $k_o$ and $k_e$ such that
$|x_{2k-1} - \ell| < \varepsilon$ whenever $k \ge k_o$, and
$|x_{2k} - \ell| < \varepsilon$ whenever $k \ge k_e$.
Put $n_0 = \max\{2k_o - 1, 2k_e\}$. It follows that when $n \ge n_0$ (whether n is odd or
even) we have $|x_n - \ell| < \varepsilon$. Hence $x_n \to \ell$.
5.2.6 HARDER EXERCISE It is not difficult to modify 5.2.5 to show that, if we

break up a sequence into three, or four, or indeed any finite number of subsequences
(using all of the terms) and find that all these subsequences converge to the
same limit, then so must the original sequence. (Suggestion: induction.) However,
this does not work if we break it into an infinite number of subsequences: see if
you can devise a sequence $(y_n)_{n \ge 1}$ such that
• for each prime number p, the subsequence $(y_{p^k})_{k \ge 1}$ converges to zero, and
• all of the terms yn for which n is not a power of a prime equal zero, and yet
• the whole sequence (yn )n≥1 does not converge to zero.
5.2.7 EXERCISE Prove that

• each subsequence of an increasing sequence is increasing,
• each subsequence of a decreasing sequence is decreasing.
Partial proof
If $(a_n)_{n \in \mathbb{N}}$ is increasing and $(a_{n_k})_{k \in \mathbb{N}}$ is one of its subsequences then, for each k, we
have $n_k < n_{k+1}$. Fill in the integers that lie between them, and we see
$$n_k < n_k + 1 < n_k + 2 < \cdots < n_{k+1} - 1 < n_{k+1}.$$
Since the original sequence was increasing, that yields
$$a_{n_k} \le a_{n_k + 1} \le a_{n_k + 2} \le \cdots \le a_{n_{k+1} - 1} \le a_{n_{k+1}},$$
so $a_{n_k} \le a_{n_{k+1}}$.
Expressing the last exercise briefly as ‘sequence is monotonic implies subsequence
is monotonic’, it would be foolish to expect anything like a full converse
(along the lines of ‘subsequence is monotonic implies sequence is monotonic’)
since a single subsequence simply does not contain enough information about the
parent sequence for such a conclusion to be at all plausible. All the same, it would
not be unreasonable of us to expect some kind of partial converse, in which the
monotonicity of a subsequence told us something about the ordering of the terms
of the entire sequence. It is therefore a little surprising to see, from the next result,
that possession of a monotonic subsequence tells us absolutely nothing about the
parent sequence:
5.2.8 Theorem Every sequence has a monotonic subsequence.
Proof
Let (xn )n∈N be any sequence. To help us (almost literally) see through this curious
proof, let us call a positive integer m farsighted if xm is greater than all the later
terms in the sequence, that is, if xm > xq for every q > m. (Imagine that you are
standing on xm and trying to see off to infinity over the heads of all the later xq ’s in
the sequence; if you can do that, then the m that you are using is farsighted.) Now
either there are infinitely many farsighted integers, or else there are only finitely
many (perhaps even none).
In the first case, there is an (endless) succession $m_1 < m_2 < m_3 < \cdots$ of farsighted
integers, and by definition of farsighted we get
$$x_{m_1} > x_{m_2} > x_{m_3} > \cdots$$
that is, a decreasing subsequence of $(x_n)_{n \in \mathbb{N}}$.

In the second case we can count through to an integer r that is bigger than every
farsighted integer and know that, from r onwards, no integer is farsighted, that
is, for every integer s we encounter (every s ≥ r), there will be a greater integer s′ for which
$x_s \le x_{s'}$. (Every $x_s$ has its view of infinity obstructed by some later $x_{s'}$, so to speak.)
This yields firstly $x_r \le x_{r_1}$ for some $r_1 > r$, and then in turn $x_{r_1} \le x_{r_2}$ for some
$r_2 > r_1$, $x_{r_2} \le x_{r_3}$ for some $r_3 > r_2$ and so on without end. Look: we are forming
an increasing subsequence
$$x_r \le x_{r_1} \le x_{r_2} \le x_{r_3} \le \cdots$$
this time. The demonstration is complete.
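The notion of a farsighted index can be explored on a finite list, although with an important caveat: the theorem is about infinite sequences, and in a finite list the last index is always (vacuously) farsighted. The Python sketch below, in which the function name `farsighted_indices` and the sample data are our own inventions, therefore only illustrates the first case of the proof: the terms sitting at farsighted positions form a strictly decreasing list.

```python
def farsighted_indices(xs):
    """Indices m (0-based) with xs[m] > xs[q] for every later q in the finite list xs."""
    result = []
    best_later = float("-inf")          # largest value lying to the right of position m
    for m in range(len(xs) - 1, -1, -1):
        if xs[m] > best_later:          # xs[m] can 'see over' everything after it
            result.append(m)
            best_later = xs[m]
    return result[::-1]                 # report the indices in increasing order

sample = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
peaks = farsighted_indices(sample)
print(peaks, [sample[m] for m in peaks])
```

Scanning from the right keeps the work proportional to the length of the list, since each term need only be compared with the largest value already seen to its right.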
5.2.9 EXERCISES Find the limit of each of the following sequences:
1. $\left(0.9^{\,n^2 + 5n + 17}\right)$,
2. $\left(\frac{1}{n^3}\right)$,
3. $\left(\frac{(5n^4 + 2n^2 - 1)^3 - 1}{3(5n^4 + 2n^2 - 1)^3 + 2}\right)$.
Draft solution
In each case, it will be enough to recognise the given sequence as a subse-
quence of some (simpler) sequence whose limit you already know or can easily
calculate.
5.2.10 EXERCISES Show that the following sequences are divergent:

1. $\left((-1)^n \, \frac{3 + 4n}{2 + n}\right)_{n \in \mathbb{N}}$.
2. $\left(5 \sin\frac{n\pi}{7} + 3 \cos\frac{n\pi}{11}\right)_{n \in \mathbb{N}}$.
3. The sequence (1/pmax(n))n≥2 given by pmax(n) = the largest prime factor of

n (assuming n ≥ 2).
Fragment of solution
In the second of these, you may save yourself a good deal of trigonometry by
noticing the value of the nth term when n is an odd number times 77, and again
when n is an even number times 77. (The idea of focusing on 77 is driven simply
by a wish to avoid fractions if we can reasonably do so without losing too much
information.)
5.3 Bolzano-Weierstrass: the overcrowded interval

There is an ancient mathematical insight sometimes called the pigeonhole principle:
if you put n + 1 pigeons into n pigeonholes, then at least one pigeonhole will
contain more than one pigeon. (More broadly, if you put kn + 1 or more letters
into n mailboxes, where k and n are positive integers, then at least one mailbox
must contain k + 1 or more letters.) The Bolzano-Weierstrass result starts out as
an infinite version of this: if you distribute an infinite number of pigeons across
two pigeonholes, then at least one pigeonhole will end up containing an infinite
number of pigeons.
Let’s re-express that in terms of sequences. If all the terms of a sequence lie within
the union A ∪ B of two sets A, B of real numbers, then at least one of the two sets
must include an infinite number of those terms.[1] Formally: if (xn )n∈N is a sequence
of elements of A ∪ B, then either A or B (or both) will include the number xn for
infinitely many values of n.
Hopefully, that sounds like little more than common sense …and yet it has an
un-obvious and powerful consequence:
5.3.1 The Bolzano-Weierstrass theorem Every bounded sequence has a convergent
subsequence.
Proof
Let (xn )n∈N be any bounded sequence. By its boundedness, we can find a closed
interval I0 = [−M, M] (for some positive M) that includes every term xn .
Now I0 is the union of its left half and its right half: indeed, we can write
I0 = [−M, 0] ∪ [0, M]. So we can pick one of the two halves that includes xn for
infinitely many values of n. Call whichever half we pick I1 , and select also n1 for
which xn1 ∈ I1 .
Next, repeat this argument upon I1 : for I1 is the union of its left half and its right
half, and we can pick one of the two halves that includes xn for infinitely many
values of n. Call whichever half we pick this time I2 , and select also n2 greater than
n1 for which xn2 ∈ I2 . The phrase greater than n1 is legitimate because we have an
infinite number of possible n2 s to pick from, and can therefore surely arrange to
make that choice larger than our previous n1 , whatever it was.
Next, repeat this argument upon I2 : I2 is the union of its left half and its right
half, and we can pick one of the two halves that includes xn for infinitely many
values of n. Call whichever half we pick this time I3 , and select also n3 greater than
n2 for which xn3 ∈ I3 .
[1] Unlike real pigeonholes, the sets A and B do not have to be disjoint; unlike real pigeons, the
terms of the sequence do not have to be all distinct from one another.
And so on without end (and you can make an appeal to induction if you feel that
it’s necessary).
This generates a sequence $(I_1, I_2, I_3, \cdots, I_k, \cdots)$ of closed intervals, each contained
in the previous one and exactly half of its length, and also a subsequence
$(x_{n_1}, x_{n_2}, x_{n_3}, \cdots, x_{n_k}, \cdots)$ of the original sequence, such that (for all k) $x_{n_k}$ belongs
to Ik . All we still need to do is to use the ‘shrinking’ nature of the intervals to show
that the subsequence converges.
Write the typical $I_k$ as $[a_k, b_k]$ and notice that, since the length of $I_1$ was M and
the (half) length of $I_2$ was M/2 and the length of $I_3$ was M/4 and so on, $b_k$ is actually
$a_k + M/2^{k-1}$. Since $I_{k+1}$ is either the left or the right half of $I_k$, we also have
$$a_k \le a_{k+1} \le b_{k+1} \le M$$
for all k, so $(a_k)$ is an increasing bounded sequence, and therefore converges to
some limit $\ell$. Finally,
$$a_k \le x_{n_k} \le b_k = a_k + M/2^{k-1}$$
lets us appeal to the squeeze because, since $2^{-k} \to 0$ as k → ∞ (see 4.1.13), $a_k$ and
$a_k + M/2^{k-1}$ both converge to (the same) $\ell$. Thus we have $(x_{n_k})$ converging to $\ell$.
An alternative approach to proving Bolzano-Weierstrass is outlined in paragraph
5.3.7.
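The halving construction in the proof of 5.3.1 can be imitated on a computer, with one unavoidable compromise that we flag loudly: ‘contains $x_n$ for infinitely many n’ cannot be tested on a finite list, so the sketch below keeps whichever half holds more of the sampled terms instead. The function name `bisection_subsequence`, the sample sequence $(\sin n)$ and the sample size are all our own assumptions, chosen only for illustration.

```python
import math

def bisection_subsequence(xs, M, steps):
    """Imitate the halving argument on a finite sample xs lying in [-M, M].

    Assumption of this sketch: the half holding more sampled terms stands in
    for 'the half containing x_n for infinitely many n'.
    Returns a list of (index, lo, hi) with strictly increasing indices and
    xs[index] inside the interval [lo, hi] chosen at that step.
    """
    lo, hi = -M, M
    candidates = list(range(len(xs)))   # indices of terms in the current interval
    picked, last = [], -1
    for _ in range(steps):
        mid = (lo + hi) / 2
        left = [i for i in candidates if xs[i] <= mid]
        right = [i for i in candidates if xs[i] >= mid]
        if len(left) >= len(right):
            candidates, hi = left, mid
        else:
            candidates, lo = right, mid
        later = [i for i in candidates if i > last]
        if not later:                   # the finite sample has run dry
            break
        last = later[0]                 # smallest usable index beyond the previous pick
        picked.append((last, lo, hi))
    return picked

xs = [math.sin(n) for n in range(1, 20001)]
for index, lo, hi in bisection_subsequence(xs, 1.0, 10):
    print(index + 1, round(xs[index], 5), (lo, hi))
```

Each printed line shows a chosen position n, the term found there, and the interval that contains it. The intervals halve in length at every step, which is the ‘shrinking’ behaviour that the squeeze exploits in the proof.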
For the purposes of this text, the really important applications of Bolzano-
Weierstrass happen only after we have defined continuous functions. Until we reach
that point, 5.3.2 and 5.3.3 will provide a little insight into how it may be used. In
any case, please take note of 5.3.5 and 5.3.6 which have a direct bearing on our
ongoing study of sequences.
5.3.2 Example Let us (temporarily) call a sequence (xn )n∈N channelled if
|x1 − x2 | < 1/2, |x2 − x3 | < 1/3, |x3 − x4 | < 1/4, · · ·
and, in general, |xn − xn+1 | < 1/(n + 1) for every positive integer n. To prove that
every bounded sequence has a channelled subsequence.
Solution
Thanks to what Bolzano-Weierstrass tells us, it will be enough just to show that
every convergent sequence has a channelled subsequence; so let us tackle that.
Suppose $(y_n)_{n \in \mathbb{N}}$ converges to $\ell$. Then there is $n_1 \in \mathbb{N}$ such that $|y_n - \ell| < \frac{1}{4}$
whenever $n \ge n_1$. There is also $n_2 \in \mathbb{N}$ such that $|y_n - \ell| < \frac{1}{6}$ whenever $n \ge n_2$,
and we can make sure that $n_2 > n_1$ (just by increasing $n_2$ if necessary). Then there
is also $n_3 \in \mathbb{N}$ such that $|y_n - \ell| < \frac{1}{8}$ whenever $n \ge n_3$, and we can make sure that
$n_3 > n_2$ (just by increasing $n_3$ if necessary).
We are now into an induction process, and it will generate more and more
positive integers $n_3 < n_4 < n_5 < n_6 < \cdots$ such that (in the sequence $(y_n)$):
• each term from number $n_4$ onwards will be less than $\frac{1}{10}$ away from $\ell$,
• each term from number $n_5$ onwards will be less than $\frac{1}{12}$ away from $\ell$,
• each term from number $n_6$ onwards will be less than $\frac{1}{14}$ away from $\ell$
and so on. Using these distance estimates (and the triangle inequality), we
see that
• $|y_{n_1} - y_{n_2}| \le |y_{n_1} - \ell| + |\ell - y_{n_2}| < 1/4 + 1/4 = 1/2$,
• $|y_{n_2} - y_{n_3}| \le |y_{n_2} - \ell| + |\ell - y_{n_3}| < 1/6 + 1/6 = 1/3$,
• $|y_{n_3} - y_{n_4}| \le |y_{n_3} - \ell| + |\ell - y_{n_4}| < 1/8 + 1/8 = 1/4$,
• $|y_{n_4} - y_{n_5}| \le |y_{n_4} - \ell| + |\ell - y_{n_5}| < 1/10 + 1/10 = 1/5$
and so on. In other words, the subsequence $(y_{n_1}, y_{n_2}, y_{n_3}, y_{n_4}, \cdots)$ is channelled.
5.3.3 EXERCISE Let us (equally temporarily) call a sequence $(x_n)_{n \in \mathbb{N}}$ superchannelled if
$$|x_1 - x_2| < 10^{-1}, \quad |x_2 - x_3| < 10^{-2}, \quad |x_3 - x_4| < 10^{-3}, \quad \cdots$$
and, in general, $|x_n - x_{n+1}| < 10^{-n}$ for every positive integer n. Prove that every
bounded sequence has a superchannelled subsequence.
5.3.4 HARDER EXERCISE Suppose we are given a sequence of points

(P1 , P2 , P3 , · · · ) in the coordinate plane, each lying inside the square whose corners
have coordinates (0, 0), (1, 0), (1, 1) and (0, 1), and suppose also that for every
positive ε there is at least one of these points that lies below the horizontal line
y = ε. Show that there is a point on the x-axis such that every open disc centred
on that point includes infinitely many of the points Pn .
Draft solution
Think of the coordinates $(x_n, y_n)$ of the typical point $P_n$ in the sequence. For
each positive integer k (thinking $\varepsilon = \frac{1}{k}$) we get a positive integer $n_k$ such that
$0 < y_{n_k} < \frac{1}{k}$. Can we use Bolzano-Weierstrass on the sequence $(x_{n_k})$ of all the
x-coordinates of the associated points? What does it tell us if we do?
5.3.5 Example Let (an )n∈N be a bounded sequence that is not convergent. We
show that it must have two convergent subsequences possessing different limits.
Solution
Find M > 0 such that [−M, M] contains the entire sequence. By Bolzano-Weierstrass,
there is a subsequence $(a_{n_k})_{k \in \mathbb{N}}$ that converges to a limit $\ell$. Yet $(a_n)_{n \in \mathbb{N}}$
itself does not converge to $\ell$: which means that there is some positive number ε
so small that $a_n$ never settles permanently between $\ell - \varepsilon$ and $\ell + \varepsilon$. This in turn
means that, whichever $n_0$ in N we think of, there is some greater $n > n_0$ for which
$a_n$ is outside those borderlines. Put more tidily, there is a whole subsequence
of $(a_n)_{n \in \mathbb{N}}$ lying in $[-M, \ell - \varepsilon] \cup [\ell + \varepsilon, M]$. By pigeonholing, one of the two
intervals here contains a subsequence. Use Bolzano-Weierstrass on this and it
gives us another sub-subsequence (still a subsequence of the original, of course)
converging to a limit which, since it has to lie in $[-M, \ell - \varepsilon] \cup [\ell + \varepsilon, M]$ (by
the theorem on taking limits across an inequality; see 4.1.17), cannot be the same
number as $\ell$.
That is actually a more useful result than it initially looks. Establishing that
a sequence converges by the original definition is, of course, seldom easy . . .but
using that same definition to show that a sequence diverges can be very awkward
indeed. What the theorems on ‘convergent implies bounded’ and ‘convergence of
subsequences’ plus the last example, put together, tell us is that it is never necessary.
Of course a sequence that is unbounded or that possesses two subsequences with
different limits cannot be convergent (by earlier results), but now we see that the
converse is also valid:
5.3.6 Proposition A sequence is divergent if and only if it is unbounded or has

two subsequences with different limits.
Proof
A sequence that is unbounded, or possesses subsequences with differing limits,
must be divergent by Lemma 4.1.6 and Theorem 5.2.3. The converse – that a
divergent sequence must have one of these characteristics – is what Example 5.3.5
demonstrated.
5.3.7 EXERCISE Use the ‘every sequence has a monotonic subsequence’ theorem
(5.2.8) to give an alternative proof of the Bolzano-Weierstrass result.
(This is quite a simple exercise to carry out, and most people regard this
alternative proof as both shorter and easier to follow than the one we presented
earlier, and yet somehow less informative, more like ‘rabbit out of the hat’ show-
off maths. Of course, you are free to use whichever works better for you.)
5.3.8 A look forward By this point, we have built up a good range of techniques
for establishing convergence and evaluating limits that are capable of working
across a wide variety of sequences. A wide variety, but by no means all – for
there are many important and useful sequences that need some additional (and
sometimes quite individual) attention. The business of our next chapter is to gather
together and explore many of these ‘routine-procedure-resistant’ examples.
6 Special (or specially awkward) examples
6.1 Introduction
We apologise in advance for the rather fragmentary character of this chapter but,
as we pointed out at the end of Chapter 5, there are many convergent sequences
that do not readily give up their secrets under routine uses of the techniques we
have so far developed, and you will need to become acquainted with their limits at
some point. It might as well be now. Keep in mind the squeeze, which will often
turn out to play a role here.
6.2 Important examples of convergence

6.2.1 Geometric sequences For which real values of x does the geometric
sequence
$$(x^n)_{n \in \mathbb{N}}$$
converge?
Solution
We have already seen (see 4.1.13 and 4.1.14) that for 0 < x < 1 we get $x^n \to 0$
and that for x > 1 we get $(x^n)_{n \in \mathbb{N}}$ divergent, and an appeal to 2.7.13 shows that for
−1 < x < 0 we get $x^n \to 0$ again. Now we only need to address the few missing
cases.
If x = 1 then it is obvious that $x^n \to 1$, and if x = 0 then it is obvious that
$x^n \to 0$.
If x = −1 then the odd and even powers of x are (respectively) −1 and 1, so
evidently $(x^n)_{n \in \mathbb{N}}$ is again divergent.
In the final case x < −1, if $(x^n)_{n \in \mathbb{N}}$ were to converge to some limit $\ell$ then
(algebra of limits) $|x^n| = |x|^n \to |\ell|$ which is impossible since |x| > 1.
We conclude that $(x^n)_{n \in \mathbb{N}}$ converges only when −1 < x ≤ 1.
6.2.2 Negative powers of n For any t > 0 we have $n^{-t} \to 0$.
Solution
Since $n^{-t}$ is certainly positive, we need only ensure that (given ε > 0) we can
find $n_0 \in \mathbb{N}$ so large that $n^{-t} < \varepsilon$ will always happen once $n \ge n_0$. Straightforward
roughwork will show how big $n_0$ needs to be:
$$n^{-t} < \varepsilon \iff n^t > 1/\varepsilon$$
$$\iff t \ln n > \ln(1/\varepsilon)$$
$$\iff \ln n > \ln(1/\varepsilon)/t$$
$$\iff n > e^{\ln(1/\varepsilon)/t} \ (= 1/\sqrt[t]{\varepsilon}).$$
A formal proof is now easy to construct, beginning with: ‘Given ε > 0, choose
an integer $n_0$ greater than $e^{\ln(1/\varepsilon)/t}$. Then for any $n \ge n_0$, we have . . . ’.
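The roughwork amounts to a recipe for answering any challenge ε with a suitable $n_0$, and the recipe is easy to try out. In the Python sketch below the function name `n0_for` and the sample values of ε and t are our own; the sample values were picked so that the threshold is not a whole number, which keeps floating-point rounding from blurring the strict inequality.

```python
import math

def n0_for(eps, t):
    """An integer n0 strictly greater than exp(ln(1/eps)/t), as in the roughwork."""
    return math.floor(math.exp(math.log(1 / eps) / t)) + 1

for eps, t in [(0.003, 2.0), (0.003, 0.5), (0.05, 0.3)]:
    n0 = n0_for(eps, t)
    # Report the response n0 and whether n0^(-t) really is below the challenge eps.
    print(eps, t, n0, n0 ** (-t) < eps)
```

Notice how sharply the required $n_0$ grows as t shrinks: small positive powers of n do tend to infinity, but they take their time about it.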
6.2.3 nth roots of a constant If a > 0, how does the sequence $(\sqrt[n]{a})_{n \in \mathbb{N}}$ behave?
Solution
In the special case a = 1 it is clear that this sequence converges to 1.
Now if a > 1 we shall have $\sqrt[n]{a} > 1$ for every n (for the nth power of a positive
number less than 1 will still be less than 1), so let us write
$$\sqrt[n]{a} = 1 + h_n$$
where $h_n$ is positive, and its subscript n just serves to remind us that its actual value
may vary with n. Raise both sides to the power of n and think what the binomial
theorem tells us:
$$a = (\sqrt[n]{a})^n = (1 + h_n)^n = 1 + n(h_n) + \frac{n(n-1)}{2!}(h_n)^2 + \cdots$$
where there are several more terms to come, but all we need to know about them
is that they are all positive. Each term on the right-hand side is therefore smaller
than a, and we focus on the second one:
$$n(h_n) < a.$$
This rearranges to give
$$0 < h_n < \frac{a}{n}$$
and an easy application of the squeeze gives us $h_n \to 0$, and $\sqrt[n]{a} = 1 + h_n \to 1$.
In the third case 0 < a < 1, notice that $a^{-1}$ is greater than 1 so, by the second
case, we already know that $\sqrt[n]{a^{-1}} \to 1$. By the algebra of limits it follows that
$\sqrt[n]{a} \to 1^{-1} = 1$ so the desired limit is once again 1.
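The squeeze used in the case a > 1 can be watched in action. The following Python sketch (the function name `h` and the choice a = 123 are ours) tabulates $h_n = \sqrt[n]{a} - 1$ against the bound a/n obtained above; it offers numerical reassurance only.

```python
def h(a, n):
    """h_n defined by a**(1/n) = 1 + h_n."""
    return a ** (1 / n) - 1

a = 123.0
for n in (1, 2, 10, 100, 1000, 10000):
    # Columns: n, the actual excess h_n, and the (much cruder) bound a/n.
    print(n, h(a, n), a / n)
```

The bound a/n is very generous compared with the true size of $h_n$, but generosity is harmless here: all the squeeze needs is that the bound tends to zero.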
6.2.4 Example To find the limit of the sequence whose nth term is
$$x_n = \sqrt[3n^2 + n - 2]{123}.$$
Solution
This is a subsequence of the sequence $(\sqrt[n]{a})$ (for a = 123) whose limit we know to
be 1. Therefore $x_n \to 1$ also.
6.2.5 EXERCISE Determine the limit of $(a_n)_{n \in \mathbb{N}}$ where $a_n = \sqrt[n]{3^n + 7^n + 9^n}$.
Draft solution
Begin with the obvious remark that
$$9^n < 3^n + 7^n + 9^n < 9^n + 9^n + 9^n = (3)9^n.$$
Take nth roots right across and think squeeze.

Somewhat surprisingly, almost exactly the same argument works on what
appears at first sight to be a much harder problem:
6.2.6 nth root of n The sequence $(\sqrt[n]{n})$ converges to 1.
Solution
Ignoring the case n = 1 (which, as usual, cannot affect the limiting behaviour) we
see that $\sqrt[n]{n} > 1$ for every other n, so let us write
$$\sqrt[n]{n} = 1 + h_n$$
where $h_n$ is positive and will vary with n. Raise both sides to the power of n and
use the binomial theorem:
$$n = (\sqrt[n]{n})^n = (1 + h_n)^n = 1 + n(h_n) + \frac{n(n-1)}{2!}(h_n)^2 + \cdots$$
where there are several more terms to come and they are all positive. Each term on
the right-hand side is therefore smaller than n, but this time we focus on the third
one:
$$\frac{n(n-1)}{2!}(h_n)^2 < n.$$
Dividing both sides by n and remembering that n is at least 2, this rearranges to
give
$$(h_n)^2 < \frac{2}{n-1}$$
and so
$$0 < h_n < \sqrt{\frac{2}{n-1}}.$$
Now it is fairly obvious (or see paragraph 2.7.12) that $\sqrt{2/(n-1)} \to 0$, the squeeze
gives us $h_n \to 0$, and $\sqrt[n]{n} = 1 + h_n \to 1$.
6.2.7 Example To determine the limit of $(a_n)$ where $a_n = \sqrt[5n+2]{7n + 10}$.
Solution
The nth term broadly resembles $\sqrt[n]{n}$ but, of course, we have to do better than broad
resemblance. One approach (assuming n > 2) is:
$$\sqrt[5n+2]{5n + 2} < \sqrt[5n+2]{7n + 10} < \sqrt[3.5n+5]{7n + 10} = \left(\sqrt[7n+10]{7n + 10}\right)^2$$
(think carefully about the use of the index laws[1] in that calculation) at which point
we see that the first and last items involve subsequences of $(\sqrt[n]{n})$ and therefore
converge to 1, after which the squeeze tells us that $a_n \to 1$ also.
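Since the chain of inequalities leans on the index laws and on the restriction n > 2, a quick numerical check may be reassuring. In this Python sketch the function name `chain` is our own, and the three returned values are the first, second and last items of the displayed chain.

```python
def chain(n):
    """The outer and middle quantities of the displayed inequality, for a given n > 2."""
    lower = (5 * n + 2) ** (1 / (5 * n + 2))            # (5n+2)-th root of 5n+2
    middle = (7 * n + 10) ** (1 / (5 * n + 2))          # the term a_n itself
    upper = ((7 * n + 10) ** (1 / (7 * n + 10))) ** 2   # square of the (7n+10)-th root of 7n+10
    return lower, middle, upper

for n in (3, 10, 100, 10000):
    print(n, chain(n))
```

For each n printed, the middle value sits strictly between the other two, and all three creep towards 1 as n grows.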
6.2.8 EXERCISE
1. Determine the limit of $\sqrt[n]{20n - 15}$. (It may help to begin by observing that
5n ≤ 20n − 15 < 20n for each n.)
2. Determine the limit of $(a_n)_{n \in \mathbb{N}}$ where
$$a_n = (n^{-1}) \sqrt[n]{1^n + 2^n + 3^n + \cdots + n^n}.$$
6.2.9 Factorials grow faster than powers For any constant t, the sequence
$$\left(\frac{t^n}{n!}\right)_{n \in \mathbb{N}}$$
converges to zero. This may seem contrary to common sense at first sight; for
example, try t = 10, calculate the first few terms
10, 50, 166.666 . . . , 416.666 . . . , 833.333 . . . · · ·
1 In particular, you should take note that if x > 1 and a xb .
This is another of those plausible-looking arithmetical results whose full understanding will have
to wait until we have properly defined the elementary functions, including general powers, in
Chapter 18.
– and there is certainly no sign yet that they are drifting closer to zero. But we need
to look at what happens for seriously big values of n, and the early, small values
may be misleading.
Solution
We can assume that t is positive. (Because, firstly, in the case where t = 0 the result is immediate; and, secondly, once we have proved it in the case t > 0, if we are then challenged with a sequence
\[ (a_n)_{n \in \mathbb{N}} = \left( \frac{u^n}{n!} \right)_{n \in \mathbb{N}} \]
where u is negative, we shall already know that
\[ \frac{|u|^n}{n!} \to 0. \]
That is, $|a_n| \to 0$. Now an earlier exercise (2.7.13) tells us that $a_n \to 0$ also.)
Comparing $a_{n+1}$ with $a_n$, we find that
\[ a_{n+1} = \frac{t}{n+1}\, a_n, \]
and that fraction multiplier t/(n + 1) will be small once n becomes substantially bigger than t. More precisely, once n exceeds 2t, the fraction multiplier will be less than a half and, therefore, each term will be more than 50% smaller than the one before it. This lets us set up a clear proof that the terms tend to zero:
As soon as the integer n exceeds 2t, we have
\[ \frac{t^{n+1}}{(n+1)!} = \frac{t}{n+1} \cdot \frac{t^n}{n!} < \frac{t}{2t} \cdot \frac{t^n}{n!} = \frac{1}{2} \cdot \frac{t^n}{n!} \]
– that is, $a_{n+1} < 0.5\,a_n$. So, picking an integer $n_0 > 2t$, we have
\[ a_{n_0+1} < 0.5\,a_{n_0}, \quad a_{n_0+2} < (0.5)^2 a_{n_0}, \quad a_{n_0+3} < (0.5)^3 a_{n_0}, \quad \cdots \quad a_{n_0+k} < (0.5)^k a_{n_0} \quad (k \ge 1). \]
Since $(0.5)^k \to 0$ as $k \to \infty$ and all the terms are positive, the sequence $(a_n)$ converges to zero (by the squeeze) as claimed. (Notice that we entirely ignored the first $n_0$ terms but, as usual, this does not affect limiting behaviour.)
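To see the early growth and the eventual collapse side by side, here is a small sketch for the case t = 10. It is an illustration only: the helper name `terms` is our own, and exact fractions are used so that no rounding clouds the comparison $a_{n+1} < 0.5\,a_n$.

```python
from fractions import Fraction

def terms(t: int, count: int) -> list:
    """First `count` terms of t**n / n!, built with a_{n+1} = a_n * t / (n + 1)."""
    a = Fraction(t)              # a_1 = t
    out = [a]
    for n in range(1, count):
        a = a * t / (n + 1)      # apply the fraction multiplier t / (n + 1)
        out.append(a)
    return out

a = terms(10, 60)
print([round(float(x), 3) for x in a[:5]])        # the misleading early terms
print(a.index(max(a)) + 1, float(max(a)))         # where the largest term sits
print(float(a[-1]))                               # term number 60
```

The list index is one less than the term number, so `a[n]` holds $a_{n+1}$.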
6.2.10 How does the nth root of n! behave? It gets seriously big. More precisely, the sequence $(x_n)_{n \ge 1}$ given by
\[ x_n = \frac{1}{\sqrt[n]{n!}} \]
converges to 0.
Solution
For any positive constant ε put t = 1/ε > 0 and call in the previous result, that
\[ \frac{t^n}{n!} \to 0. \]
In particular we can find a positive integer $n_0$ such that $n \ge n_0$ guarantees that
\[ \frac{t^n}{n!} < 1 \]
which in turn gives
\[ \frac{1}{n!} < t^{-n} = \varepsilon^n \]
and
\[ \frac{1}{\sqrt[n]{n!}} < \varepsilon \]
therefore establishing the claim.
6.2.11 EXERCISE Investigate the limiting behaviour of the sequence whose nth term, for n > 10, is
\[ \frac{1}{\sqrt[n]{(10)(11)(12)(13) \cdots (n-2)(n-1)n}}. \]
6.2.12 EXERCISE Determine the limit, as n → ∞, of:

2
106n 2n 1
(i) , (ii) √ , (iii) √ .
(n2 )! n! + n! n!
(n!)!
Remark
Watch out for subsequences, and for the squeeze.
6.2.13 Difference between large square roots Determine the limiting behaviour of $\sqrt{n+a} - \sqrt{n+b}$ for positive constants a and b.
Solution
Arrange the labelling so that a > b (for if a and b are equal, the sequence is constantly zero). Notice first (difference of two squares) that
\[ (\sqrt{n+a} - \sqrt{n+b})(\sqrt{n+a} + \sqrt{n+b}) = (\sqrt{n+a})^2 - (\sqrt{n+b})^2 = (n+a) - (n+b) = a - b \]
from which we find that
\[ 0 < \sqrt{n+a} - \sqrt{n+b} = \frac{a-b}{\sqrt{n+a} + \sqrt{n+b}}. \]
The bottom line of that last quotient is more than $\sqrt{n} + \sqrt{n} = 2\sqrt{n}$ so it can be made arbitrarily large. It follows that the quotient itself tends to zero, and the squeeze gives us
\[ \sqrt{n+a} - \sqrt{n+b} \to 0. \]
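The rewriting as a quotient can also be tried out numerically. The sketch below is illustrative only: the constants a = 5 and b = 2 are our own choice, and the function names are hypothetical. It evaluates the difference both as written and in its rationalised form, alongside the bound $(a-b)/(2\sqrt{n})$.

```python
import math

def difference_direct(n, a, b):
    """sqrt(n + a) - sqrt(n + b), computed exactly as written."""
    return math.sqrt(n + a) - math.sqrt(n + b)

def difference_rationalised(n, a, b):
    """The same quantity rewritten as (a - b) / (sqrt(n + a) + sqrt(n + b))."""
    return (a - b) / (math.sqrt(n + a) + math.sqrt(n + b))

a, b = 5.0, 2.0   # illustrative constants with a > b
for n in (1, 100, 10**4, 10**8, 10**12):
    upper = (a - b) / (2 * math.sqrt(n))   # quotient with the smaller bottom line
    print(n, difference_direct(n, a, b), difference_rationalised(n, a, b), upper)
```

For very large n the direct subtraction loses digits to cancellation in floating-point arithmetic, whereas the quotient form does not; that is a practical bonus of the same algebra.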
6.2.14 EXERCISE
1. Use the same kind of algebraic reorganisation to show that, if $x_n \to \ell$ where each $x_n$ and $\ell$ itself are positive, then $\sqrt{x_n} \to \sqrt{\ell}$.
2. (Using the algebra of limits when it becomes appropriate) determine the limit (as n → ∞) of
\[ \sqrt{4n^2 + 2n + 1} - \sqrt{4n^2 - n + 5}. \]
6.2.15 How does the sequence $\left( (1 + \frac{1}{n})^n \right)_{n \ge 1}$ behave? This is a classic instance of common sense misleading us. Thinking informally (and sloppily) leads us to expect that, since $1 + \frac{1}{n}$ converges to 1, this is essentially a large power of 1 and therefore, going to the limit, is also 1. The weakest point in that draft argument is that we have to allow n to go off to infinity before regarding $1 + \frac{1}{n}$ as being 1, and therefore we cannot talk about its nth power at all after we have done that.
What’s actually going on in this sequence is a contest between $1 + \frac{1}{n}$ itself
getting smaller as n increases while, on the other hand, the fact that we are
simultaneously raising it to higher and higher powers is attempting to make it
bigger. The first step towards a proper understanding of this example is to admit
that it is far from obvious which of these two tendencies is going to win.
The second step is to swallow our pride and calculate the first few terms:
2, 2.25, 2.370370, 2.441406, 2.488320, 2.521626, 2.546500, 2.565785, · · ·
(where the answers have been reported to six decimal places). It appears that this is
an increasing sequence so far, but the numerical value of the limit (if it even exists)
is not yet obvious. Still, we now have a possible strategy: to try to show that the
sequence is, indeed, always increasing, and perhaps bounded as well?
At a first or second reading you might choose to ignore the proofs of the next
three paragraphs, and you will be none the worse for it. Do not, however, ignore
the result that emerges.
6.2.16 Recall, from an earlier example (4.2.3), that if x ≥ −1 and n is any positive integer, then
\[ (1 + x)^n \ge 1 + nx. \]
This is the vital ingredient in the demonstration of the following unexpected and unattractive lemma:
6.2.17 Lemma For each $n \in \mathbb{N}$ we have
\[ \left( 1 - \frac{1}{(n+1)^2} \right)^n \ge \frac{n+1}{n+2}. \]
Proof
Take $x = -\frac{1}{(n+1)^2}$ in the preceding Recall, and simplify the resulting algebra:
\[ \begin{aligned} \left( 1 - \frac{1}{(n+1)^2} \right)^n &\ge 1 - \frac{n}{(n+1)^2} \\ &= 1 - \frac{n}{n^2 + 2n + 1} > 1 - \frac{n}{n^2 + 2n} = 1 - \frac{1}{n+2} \\ &= \frac{(n+2) - 1}{n+2} = \frac{n+1}{n+2}. \end{aligned} \]
Carefully note the disturbance of inequality caused by ditching the +1 from the
bottom line. This shrinks the bottom line, and therefore increases the fraction; but
the fraction has an overall minus attached, so the nett result is to decrease the total.
Now it turns out that this ugly duckling of a lemma is precisely what is needed
to prove the result that we want:

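6.2.18 Proposition The sequence $\left( (1 + \frac{1}{n})^n \right)_{n \ge 1}$ is increasing.
Proof
Denoting the typical term by $x_n$, we carefully simplify the ratio² $x_{n+1}/x_n$ and seek evidence that the answer is greater than 1.
\[ \begin{aligned} \frac{x_{n+1}}{x_n} &= \frac{\left(1 + \frac{1}{n+1}\right)^{n+1}}{\left(1 + \frac{1}{n}\right)^n} = \frac{\left(\frac{n+2}{n+1}\right)^{n+1}}{\left(\frac{n+1}{n}\right)^n} \\ &= \frac{n+2}{n+1} \times \left(\frac{n+2}{n+1}\right)^n \left(\frac{n}{n+1}\right)^n \\ &= \frac{n+2}{n+1} \left(\frac{n^2 + 2n}{(n+1)^2}\right)^n \\ &= \frac{n+2}{n+1} \left(\frac{(n^2 + 2n + 1) - 1}{(n+1)^2}\right)^n \\ &= \frac{n+2}{n+1} \left(1 - \frac{1}{(n+1)^2}\right)^n \\ &\ge \frac{n+2}{n+1} \cdot \frac{n+1}{n+2} = 1 \end{aligned} \]
by the lemma. Hence the result, that $x_{n+1} \ge x_n$ for all n ≥ 1.
2 This is a case where subtracting $x_n$ from $x_{n+1}$ would not readily have helped us.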

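6.2.19 Proposition The sequence $\left( (1 + \frac{1}{n})^n \right)_{n \ge 1}$ converges to a limit that is somewhere between 2.5 and 3.
Proof
Since we know that it is increasing and that the sixth term already exceeds 2.5, we need only confirm that it is bounded above by 3. That is disarmingly simple, using the binomial theorem again:
\[ \begin{aligned} \left(1 + \frac{1}{n}\right)^n &= 1 + \frac{n}{1}\left(\frac{1}{n}\right)^1 + \frac{n(n-1)}{2!}\left(\frac{1}{n}\right)^2 + \frac{n(n-1)(n-2)}{3!}\left(\frac{1}{n}\right)^3 + \cdots + \left(\frac{1}{n}\right)^n \\ &< 1 + \frac{n}{1}\left(\frac{1}{n}\right)^1 + \frac{n(n)}{2!}\left(\frac{1}{n}\right)^2 + \frac{n(n)(n)}{3!}\left(\frac{1}{n}\right)^3 + \cdots + \frac{1}{n!} \\ &= 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!} + \cdots + \frac{1}{n!} \\ &< 1 + 1 + \frac{1}{2} + \frac{1}{2 \cdot 2} + \frac{1}{2 \cdot 2 \cdot 2} + \cdots + \frac{1}{2^{n-1}} \\ &= 1 + \frac{1 - \left(\frac{1}{2}\right)^n}{1 - \frac{1}{2}} < 1 + 2 = 3 \end{aligned} \]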
and the proof is complete. Notice that we assumed n to be big enough that all the
terms of the binomial expansion that we listed actually came into play: but that
is harmless since an upper bound for later terms in this (increasing) sequence is
certainly an upper bound also for the earlier, smaller ones.
6.2.20 Important notes
1. Taking more care in our estimations will let us find much more accurate values
for the limit in question. On the lower side, the limit of this increasing
sequence has to exceed each term that we choose to calculate exactly, such as
term number 10 (which is 2.593742 to six decimal places), term number 20 at

2.653298, term number 100 at 2.704814. On the upper side, we could sharpen
the above argument along the following lines:

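\[ \begin{aligned} \left(1 + \frac{1}{n}\right)^n &< 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!} + \cdots + \frac{1}{n!} \\ &= 1 + 1 + \frac{1}{2} + \frac{1}{6} + \frac{1}{24} + \frac{1}{120}\left(1 + \frac{1}{6} + \frac{1}{6 \times 7} + \frac{1}{6 \times 7 \times 8} + \cdots + \frac{1}{6 \times 7 \times 8 \times \cdots \times n}\right) \\ &< 1 + 1 + \frac{1}{2} + \frac{1}{6} + \frac{1}{24} + \frac{1}{120}\left(1 + \frac{1}{6} + \frac{1}{6 \times 6} + \frac{1}{6 \times 6 \times 6} + \cdots + \frac{1}{6^{n-5}}\right) \\ &= \frac{65}{24} + \frac{1}{120}\left(\frac{1 - \left(\frac{1}{6}\right)^{n-4}}{1 - \frac{1}{6}}\right) < \frac{65}{24} + \frac{1}{120} \cdot \frac{1}{1 - \frac{1}{6}} = 2.718333 \text{ (to 6 decimal places).} \end{aligned} \]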
2. The limit is actually the irrational number written as e, as in $\log_e$ (that is, ln) and $e^x$ and exponential growth. Its numerical value is approximately 2.71828.
3. Summary:
\[ \left(1 + \frac{1}{n}\right)^n \to e. \]
4. This turns out to be a special case (the case x = 1) of a highly important limit that you need to know, but whose detailed proof will have to wait until much later (Chapter 18, in fact; paragraph 18.2.16) in this account:
\[ \left(1 + \frac{x}{n}\right)^n \to e^x. \]
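The numerical values quoted in note 1 can be reproduced with exact rational arithmetic. The following is a minimal sketch under our own naming (`x` for the typical term, `UPPER` for the sharpened upper estimate $\frac{65}{24} + \frac{1}{120} \cdot \frac{1}{1 - 1/6}$); it is offered as a check on the arithmetic, not as part of the proof.

```python
from fractions import Fraction

def x(n: int) -> Fraction:
    """Exact value of (1 + 1/n) ** n."""
    return Fraction(n + 1, n) ** n

# Sharpened upper estimate from note 1: 65/24 + (1/120) * 1 / (1 - 1/6)
UPPER = Fraction(65, 24) + Fraction(1, 120) / (1 - Fraction(1, 6))

for n in (1, 2, 6, 10, 20, 100):
    print(n, f"{float(x(n)):.6f}")
print("upper estimate:", f"{float(UPPER):.6f}")
```

Every term printed sits below the upper estimate, and the terms increase with n, in line with 6.2.18 and 6.2.19.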
6.2.21 Examples To find the limits of the sequences whose nth terms are as follows:
\[ (1 + 2n^{-1})^n, \quad \left(1 - \frac{\pi}{n}\right)^n, \quad \left(1 + \frac{1}{n! + 1}\right)^{n!+1}, \quad \left(1 + \frac{1}{n^2}\right)^{n^2+n}, \quad \left(\frac{n}{n+3}\right)^{n+2}. \]
Solution
The first three are immediate from the above theorem and its subsequent notes: they are $e^2$, $e^{-\pi}$ and e (since the third is a subsequence of $\left( (1 + \frac{1}{n})^n \right)_{n \ge 1}$).
The fourth one – let us denote it by $x_n$ – needs a little more attention. We can express $x_n$ as
\[ \left(1 + \frac{1}{n^2}\right)^{n^2} \left(1 + \frac{1}{n^2}\right)^n \]
– and the first of these factors is easy enough to deal with: it converges to e because it represents a subsequence of $\left( (1 + \frac{1}{n})^n \right)$ (which, of course, tends to e). The remaining problem is to estimate the second factor which (check the index laws) is the nth root of the first factor. Now the (convergent) first factor is bounded: it lies always between two (evidently positive) constants a and b:
\[ a \le \left(1 + \frac{1}{n^2}\right)^{n^2} \le b \]
for all n and so, now taking nth roots:
\[ \sqrt[n]{a} \le \left(1 + \frac{1}{n^2}\right)^n \le \sqrt[n]{b} \]
for all n. Since the nth roots of a positive constant converge to 1, the squeeze at this point gives us $\left(1 + \frac{1}{n^2}\right)^n \to 1$ and so
\[ x_n = \left(1 + \frac{1}{n^2}\right)^{n^2} \left(1 + \frac{1}{n^2}\right)^n \to e \times 1 = e. \]
Number five also needs some rearrangement of its algebra:
\[ \left(\frac{n}{n+3}\right)^{n+2} = \left(\frac{n+3}{n}\right)^{-n-2} = \left( \left(1 + \frac{3}{n}\right)^n \right)^{-1} \left(1 + \frac{3}{n}\right)^{-2} \to (e^3)^{-1} \times (1)^{-2} = e^{-3}, \]
using, incidentally, parts of the algebra of limits.
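A rough numerical cross-check of these five limits is sketched below. The helper `power` is our own device, not something from the text: it evaluates $(1 + h)^m$ as $\exp(m \log(1 + h))$ using `math.log1p`, which we assume keeps enough accuracy when h is tiny. For the third sequence we sample only the single value n = 12.

```python
import math

def power(increment: float, exponent: float) -> float:
    """Evaluate (1 + increment) ** exponent via log1p, accurate for tiny increments."""
    return math.exp(exponent * math.log1p(increment))

n = 10**6
m = math.factorial(12) + 1   # one sample index of the form n! + 1

# Each entry pairs an approximation at large n with the claimed limit.
approximations = {
    "(1 + 2/n)^n":           (power(2 / n, n), math.e ** 2),
    "(1 - pi/n)^n":          (power(-math.pi / n, n), math.exp(-math.pi)),
    "(1 + 1/(n!+1))^(n!+1)": (power(1 / m, m), math.e),
    "(1 + 1/n^2)^(n^2+n)":   (power(1 / n**2, n**2 + n), math.e),
    "(n/(n+3))^(n+2)":       (power(-3 / (n + 3), n + 2), math.exp(-3)),
}
for name, (approx, limit) in approximations.items():
    print(f"{name:<24} {approx:.6f}  claimed limit {limit:.6f}")
```

Agreement to several decimal places is evidence only, of course; the argument above is what establishes the limits.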
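6.2.22 EXERCISE Determine the limits of:
\[ \left(\frac{n+3}{n+6}\right)^{5n+2}, \quad \left(\frac{n^2+5}{n^2}\right)^{n^2}, \quad \left(1 + \frac{1}{n^2+3n}\right)^{n^2+4n-1}, \quad \left(\frac{n^2+2n+8}{n^2+n+7}\right)^n, \]
\[ \left(1 - \frac{0.5}{n}\right)^{n^2+n+10}, \quad \left(1 - \frac{4}{n^2}\right)^n. \]
(The fourth one is not at all easy.)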
6.2.23 EXERCISE Use induction to show that, for each positive integer n:
\[ n! \ge 3 \left(\frac{n}{3}\right)^n. \]
(You may find that the result established in the proof of 6.2.19 is useful here.)
The final case study in this set – unlike the one we have just done – is included
merely for interest and, if you are short of time, you may safely leave it out. It
concerns the Fibonacci sequence (fn )n∈N that we mentioned briefly in our work
on recursively defined sequences, the sequence
1, 1, 2, 3, 5, 8, 13, 21, 34, · · ·
that is created by choosing the first and second terms to be 1 and, from then on,
letting fn+2 = fn+1 + fn (n ≥ 1), that is, creating each term by adding the
two that are immediately before it. It seems highly unlikely at first sight that the
sequence of numbers (fn ) is settling towards a limit (indeed, it would be fairly easy
to show that it is unbounded, and therefore divergent) but its rate of growth is quite
a different matter.
Even a casual look at the sequence (fn ) will show that the terms are growing at
quite a steady rate, increasing by about 60% each time once the pattern is securely
established. This is rather curious …why should a sequence formed by adding turn
out to be one that is propagated by multiplying by about 1.6, and what is the limiting
value of this multiplier if, indeed, it has a limiting value? That is, can we find the
limit of the ratio, the ‘growth rate’ $f_{n+1}/f_n$? Note that our usual trick of trying to show
that the sequence (of ratios) is monotonic and bounded will not work this time: a
glance at the first few ratios
1, 2, 1.5, 1.666667, 1.6, 1.625, 1.615385, 1.619048, · · ·
shows that it is neither increasing nor decreasing, but is apparently ‘homing in

from both sides’ towards a presumed (but still unproven) limit.
Let us rough-work our way blindly towards a possible solution. If (and that might be a big if because we are still only guessing) $f_{n+1}/f_n$ converges to some limit $\ell$, then certainly $\ell$ is no less than 1 since the Fibonacci sequence is increasing, so $\ell \ne 0$, and $1/\ell$ makes sense. Look at the Fibonacci formation law
\[ f_{n+2} = f_{n+1} + f_n, \]
divide across by $f_{n+1}$ to get
\[ \frac{f_{n+2}}{f_{n+1}} = 1 + \frac{f_n}{f_{n+1}}, \]
and observe that the left-hand side also converges to $\ell$ (for it is just the ratio sequence with its first term left out). Taking limits on both sides, we deduce that $\ell = 1 + \frac{1}{\ell}$ or, more readably:
\[ \ell^2 - \ell - 1 = 0. \]
This quadratic will not factorise, so we solve it instead by the quadratic formula to obtain $\frac{1+\sqrt{5}}{2}$ and $\frac{1-\sqrt{5}}{2}$, that is, approximately 1.618034 and −0.618034. At this point we can be pretty confident that we know what number the limit is bound to be, but bear in mind that we have not yet proved that any limit exists for the growth-rate sequence.
Let’s try a different approach. A sequence whose growth-rate is constant (say,
permanently equal to x), as opposed to merely settling towards a constant as limit,
can only take the form
\[ a, \; ax, \; ax^2, \; ax^3, \; \cdots \; ax^{n-1}, \; \cdots . \]
Is it at all possible for such a sequence to satisfy the Fibonacci recurrence relation? Let’s see:
\[ f_{n+2} = f_{n+1} + f_n \quad \text{now says} \quad ax^{n+1} = ax^n + ax^{n-1} \]
which cancels down to $x^2 = x + 1$, that is, to the same quadratic equation $x^2 - x - 1 = 0$ that we met a moment ago, and whose two possible solutions we calculated. To save some writing, let us use temporary symbols for those solutions, say
\[ \alpha = \frac{1+\sqrt{5}}{2} \quad \text{and} \quad \beta = \frac{1-\sqrt{5}}{2}. \]
(Keep in mind that, because of the equation that they satisfy, $\alpha + 1 = \alpha^2$ and $\beta + 1 = \beta^2$.) So the only constant-growth-rate sequences that satisfy the Fibonacci recurrence look like $a\alpha^{n-1}$ or $b\beta^{n-1}$.
Wild surmise: is it possible that the real Fibonacci sequence is a combination of these? Something like $a\alpha^{n-1} + b\beta^{n-1}$?
If this were true then, to make the first and second terms both equal 1, we’d have to replace a and b by numbers p and q that make the tentative formula yield exactly these values for n = 1 and n = 2, that is:
\[ p + q = 1, \quad p\alpha + q\beta = 1. \]
Yet these are very easy equations to solve for p and q! We get
\[ p = \frac{1-\beta}{\alpha-\beta}, \quad q = \frac{\alpha-1}{\alpha-\beta} \]
which, when you substitute in the values we have for α and β, simplify to
\[ p = \frac{5+\sqrt{5}}{10}, \quad q = \frac{5-\sqrt{5}}{10} \]
(approximately 0.7236 and 0.2764 respectively).

After so much conjecture, we at last have a proposal to set up and defend:
6.2.24 Proposition With the parameters α, β, p and q as evaluated above, the Fibonacci sequence is explicitly described by the formula
\[ f_n = p\alpha^{n-1} + q\beta^{n-1} \quad (n \ge 1). \]
Proof
The (originally described) Fibonacci sequence is fully specified by the conditions
f1 = 1, f2 = 1, fn+2 = fn+1 + fn (n ≥ 1). That is, once we have agreed that
these conditions are to hold, there is no ambiguity as to what every term in that
sequence has to be. (You may take that as obvious, on the grounds that we could
calculate from these conditions the value of any particular term that we wanted.
If you are not convinced by that, an induction argument upon the statement ‘all
the terms up to and including the nth term are completely specified by the given
conditions’ can easily be constructed.)
Yet the (possibly new?) sequence $g_n = p\alpha^{n-1} + q\beta^{n-1}$ (n ≥ 1) does satisfy $g_1 = 1$ and $g_2 = 1$ since we picked the numbers p and q expressly so as to make that happen, and also
\[ \begin{aligned} g_{n+1} + g_n &= p\alpha^n + q\beta^n + p\alpha^{n-1} + q\beta^{n-1} = p\alpha^{n-1}(\alpha + 1) + q\beta^{n-1}(\beta + 1) \\ &= p\alpha^{n-1}(\alpha^2) + q\beta^{n-1}(\beta^2) = p\alpha^{n+1} + q\beta^{n+1} \\ &= g_{n+2}. \end{aligned} \]
That is, (gn ) satisfies all of the conditions that completely specified the Fibonacci
sequence. This can only mean that (gn ) actually is the Fibonacci sequence, and our
proof is complete.
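6.2.25 Note For ease of use, we can slightly simplify the formula just obtained for $f_n$ as follows:
\[ \begin{aligned} f_n &= p\alpha^{n-1} + q\beta^{n-1} \\ &= \frac{5+\sqrt{5}}{10}\left(\frac{1+\sqrt{5}}{2}\right)^{n-1} + \frac{5-\sqrt{5}}{10}\left(\frac{1-\sqrt{5}}{2}\right)^{n-1} \\ &= \frac{\sqrt{5}(1+\sqrt{5})}{5 \times 2}\left(\frac{1+\sqrt{5}}{2}\right)^{n-1} - \frac{\sqrt{5}(1-\sqrt{5})}{5 \times 2}\left(\frac{1-\sqrt{5}}{2}\right)^{n-1} \\ &= 5^{-1/2}\left(\frac{1+\sqrt{5}}{2}\right)^n - 5^{-1/2}\left(\frac{1-\sqrt{5}}{2}\right)^n \\ &= 5^{-1/2}\alpha^n - 5^{-1/2}\beta^n. \end{aligned} \]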
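6.2.26 Example The limit of the growth-rate in the Fibonacci sequence is $\alpha = \frac{1+\sqrt{5}}{2}$.
Solution
Using the formula for the nth Fibonacci number $f_n$ obtained in the Note, we have:
\[ \frac{f_{n+1}}{f_n} = \frac{\alpha^{n+1} - \beta^{n+1}}{\alpha^n - \beta^n} = \frac{\alpha - \beta\left(\frac{\beta}{\alpha}\right)^n}{1 - \left(\frac{\beta}{\alpha}\right)^n} \]
which, because the number $\frac{\beta}{\alpha}$ is numerically less than 1 (in fact it is approximately −0.382), converges to
\[ \frac{\alpha - 0}{1 - 0} = \alpha = \frac{1+\sqrt{5}}{2} \]
as we claimed.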
6.2.27 Postscript Of the two components $5^{-1/2}\alpha^n$ and $5^{-1/2}\beta^n$ of our formula for the nth Fibonacci number, the first is by far the more important. For example, if we put n = 15, the formula yields (to five decimal places) $f_{15} = 609.99967 + 0.00033$. The reason is that β has modulus smaller than 1, and therefore its powers tend to zero rather rapidly. More exactly, when we regard $5^{-1/2}\alpha^n$ as an approximation to $f_n$, the error term is
\[ |f_n - 5^{-1/2}\alpha^n| = 5^{-1/2}|\beta|^n \]
which decreases to the limit zero. Notice also that even for n = 1, the error term $5^{-1/2}|\beta|^1$ is only about 0.2764, so all of the error terms are less than $\frac{1}{2}$. It follows that $f_n$ is always the integer closest to $5^{-1/2}\alpha^n$. An additional detail – taking note of the sign of $5^{-1/2}\beta^n$ – is that
• for odd values of n, $f_n$ is the integer just greater than $5^{-1/2}\alpha^n$, whereas
• for even values of n, $f_n$ is the integer just less than $5^{-1/2}\alpha^n$.
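The closed formula and the nearest-integer observation are easy to test against the recurrence itself. The sketch below is illustrative only: the function names are hypothetical, and floating-point arithmetic limits it to moderate values of n.

```python
import math

SQRT5 = math.sqrt(5.0)
ALPHA = (1 + SQRT5) / 2
BETA = (1 - SQRT5) / 2

def fib_by_recurrence(count: int) -> list:
    """First `count` Fibonacci numbers from f1 = f2 = 1 and f(n+2) = f(n+1) + f(n)."""
    f = [1, 1]
    while len(f) < count:
        f.append(f[-1] + f[-2])
    return f[:count]

def fib_closed_form(n: int) -> float:
    """The formula 5**(-1/2) * alpha**n - 5**(-1/2) * beta**n."""
    return (ALPHA ** n - BETA ** n) / SQRT5

def fib_nearest_integer(n: int) -> int:
    """The integer closest to 5**(-1/2) * alpha**n."""
    return round(ALPHA ** n / SQRT5)

fibs = fib_by_recurrence(40)
for n in (1, 2, 3, 10, 15, 40):
    print(n, fibs[n - 1], fib_closed_form(n), fib_nearest_integer(n))
print(fibs[39] / fibs[38], ALPHA)   # growth rate against alpha
```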
6.2.28 EXERCISES
1. With $f_n$ continuing to denote the nth Fibonacci number, what is the limiting behaviour of $\frac{f_{n+2}}{f_n}$?
2. If we were to alter the defining conditions of the Fibonacci sequence by
changing only the first two terms, say, to
g1 = 7, g2 = 4, gn+2 = gn+1 + gn (n ≥ 1)
or to
g1 = a, g2 = b, gn+2 = gn+1 + gn (n ≥ 1)
for any constants a and b, what effect would that have on the limiting
behaviour of the growth rate?
3. Investigate the sequence (hn )n≥1 defined by
h1 = 1, h2 = 1, hn+2 = 3hn+1 + 4hn (n ≥ 1).

.........................................................................
7 Endless sums — a first

look at series
.........................................................................
7.1 Introduction
Every ten-year-old school child knows that zero point endlessly many threes means
one third. This is not in doubt. What does need some critical analysis, however, is
whether zero point endlessly many threes is a legitimate symbol at all.
We are so over-familiar with the decimal system of representing numbers that
it is all too easy to forget what a superb invention it was. Its beauty and power
reside in the ability it gives us to write down any whole number whatsoever, and
a great many non-integers too, using only twelve symbols: the digits 0, 1, 2, 3, 4,
5, 6, 7, 8 and 9, the decimal point (or, if you prefer, the decimal comma) and the
minus sign. The power derives from the fact that it is a positional system: each
symbol carries information not only from its shape but also from where it occurs
in relation to the other symbols (especially the decimal point). Thus, for instance,
12825 actually means $1(10)^4 + 2(10)^3 + 8(10)^2 + 2(10)^1 + 5(10)^0$ and the two 2s have different meanings, different significances, because of where they sit. In the same way, positive numbers less than 1 can be denoted by placing digits of lower and lower significance to the right of the decimal point: 0.4703 means $4(10)^{-1} + 7(10)^{-2} + 0(10)^{-3} + 3(10)^{-4}$.
Seeking to extend that notation to non-terminating decimals1 raises an issue that
virtually no ten-year-old is in a position to handle with full rigour. The phrase zero
point endlessly many threes suggests that we write down, or at least imagine writing
down,
0.33333333 . . . and so on for ever,
and that this ought to mean
3 3 3 3 3
+ + + + + · · · and so on for ever.
10 100 1000 10000 100000
The first (practical) problem here is that no-one has ever lived long enough to write
down an infinite list of threes, nor indeed will the universe last long enough for
this to occur; but the deeper (conceptual) difficulty is that this is not how addition
works. Adding is essentially a finite procedure: we know what adding two numbers means, and from that we can add three, or four, or a million, just by grouping them together in pairs or by implementing some kind of induction argument; but adding an infinite list of numbers does not make sense.
1 which, at the least, obliges us to admit one more symbol, namely the row of dots · · ·
However, we have already faced and overcome this difficulty while talking about
where the idea of sequence limits comes from. Instead of grappling with a virtual
symbol such as that invoked by the slightly mystical phrase zero point endlessly
many threes, look instead at the sequence of perfectly ordinary numbers
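\[ (0.3, \; 0.33, \; 0.333, \; 0.3333, \; 0.33333, \; \cdots) \]
\[ = \left( \frac{3}{10}, \;\; \frac{3}{10} + \frac{3}{100}, \;\; \frac{3}{10} + \frac{3}{100} + \frac{3}{1000}, \;\; \frac{3}{10} + \frac{3}{100} + \frac{3}{1000} + \frac{3}{10000}, \;\; \frac{3}{10} + \frac{3}{100} + \frac{3}{1000} + \frac{3}{10000} + \frac{3}{100000}, \;\; \cdots \right). \]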
If this has a limit, then that limit will be the natural way to interpret zero point
endlessly many threes. It does, and – to the undoubted satisfaction of many former
ten-year-olds – the limit is one third.
More importantly, though, this discussion provides a fruitful suggestion as to
how we might seek to make sense of the sum of an arbitrary endless list of numbers,
namely:
• don’t attempt to add all of them,
• just add the first n
• and then look for a limit of that partial total as n → ∞.
7.2 Definition and easy results

Informally, a series arises whenever we try to add all the terms of a sequence. So,
if (ak )k∈N is any sequence, the associated series question is: can we make sense of
a1 + a2 + a3 + · · · + an + · · ·?
As we indicated in the introduction, the standard way to attempt to do this is to

create a second sequence
(a1 , a1 + a2 , a1 + a2 + a3 , a1 + a2 + a3 + a4 , · · · )
and seek a limit for it.

Some more notation at this point will help to reduce how much we have to write: the sum of the first n terms of $(a_k)_{k \in \mathbb{N}}$ is called the nth partial sum and denoted by $\sum_{k=1}^{k=n} a_k$ rather than $a_1 + a_2 + a_3 + \cdots + a_n$, and we’ll often denote it even more briefly by $s_n$ or $S_n$. The series itself – the issue that we are trying to resolve – is denoted by $\sum_{k=1}^{\infty} a_k$ or $\sum_1^{\infty} a_k$.

The capital Greek letter $\Sigma$ is sigma, and the attached k = 1 and k = n are usually abbreviated to merely 1 and n, or even left out altogether if the context makes it clear what range of values is involved. Just as for sequences, it is perfectly fine to start off at k = 0 or at k = 2 or elsewhere if k = 1 is not the best first step: for instance, $8 + 16 + 32 + 64 + \cdots + 2^n$ can be expressed either as $\sum_{k=3}^{n} 2^k$ or as $\sum_{k=1}^{n-2} 2^{k+2}$.
(There is nothing special about the choice of the letter k here – you may use just about any symbol you please since it is a ‘dummy variable’ having no effect on the outcome, but don’t use n or it will create confusion. Symbols such as $\sum_{n=1}^{n} 2^n$ really don’t mean anything unambiguous – if we are told to begin with n = 1 and continue until n = n, where are we meant to stop? Not until n = n? But n is always equal to n, isn’t it . . . ? This is the kind of vicious circle that may drag us in if we employ the same symbol with two different meanings at the same time.)
7.2.1 Definition A series is a pair of sequences $(a_k)_{k \in \mathbb{N}}$ and $(s_n)_{n \in \mathbb{N}}$ linked together by the conditions $s_n = \sum_{k=1}^{n} a_k$ for each n. The first is called the sequence of terms and the second (as we already said) is called the sequence of partial sums. The series itself is denoted by the symbol $\sum_1^{\infty} a_k$ or $\sum_1^{\infty} a_n$.
It is the second of the two sequences whose limiting behaviour we have to focus on when we examine a series – so much so that, for nearly all practical purposes, it is legitimate to shorten the above official definition to:
Shortened definition: a series $\sum_1^{\infty} a_k$ is the sequence $(s_n)_{n \in \mathbb{N}}$ of partial sums (where $s_n$ means $\sum_{k=1}^{n} a_k = a_1 + a_2 + a_3 + \cdots + a_n$).
An advantage of the shortened definition is that it makes the rest of this
definition paragraph so obvious as to be scarcely worth saying:
7.2.2 Continuing definition A series is said to be

• convergent if the sequence of partial sums is convergent,
• divergent if the sequence of partial sums is divergent,
• bounded if the sequence of partial sums is bounded,
and so on. One slight disturbance of this pattern is important: when the sequence of partial sums does converge to some limit $\ell$, we don’t call $\ell$ the limit of the series; we call it the sum to infinity of the series (or, more briefly, the sum of the series).
Irritatingly, and despite what we said above about never using the same symbol with two different meanings at the same time, for historical reasons the symbol $\sum_1^{\infty} a_k$ is often used to mean both the series and (provided it does converge) the sum to infinity of that series. We apologise for this, but you may need to know it when you read textbooks. So, for example, the statements
\[ \sum_1^{\infty} \left(\frac{1}{2}\right)^k \text{ converges, and its sum is } 1 \]
and
\[ \sum_1^{\infty} \left(\frac{1}{2}\right)^k = 1 \]
should be viewed as saying exactly the same thing, although in the first the sigma
symbol means the series itself, while in the second the sigma symbol means the
sum (to infinity) of that series.
One more ambiguity alert: when working on a series, be careful not to use too
many pronouns! You are dealing with two sequences at once, so try to avoid phrases
like ‘it tends to zero’ or ‘it is bounded’ or ‘it converges’ unless the context really
does make it clear which of the two you mean.
7.2.3 Example: geometric series For any x ∈ (−1, 1), the geometric series
\[ \sum_{k=0}^{\infty} x^k \]
converges, and its sum is $\frac{1}{1-x}$.
Solution
If we multiply out the product
\[ (1 - x)(1 + x + x^2 + x^3 + \cdots + x^{n-1}) \]
we see that all of the terms cancel in pairs except for 1 and $-x^n$. Dividing across by (non-zero) (1 − x) shows that the nth partial sum of this series is
\[ s_n = 1 + x + x^2 + x^3 + \cdots + x^{n-1} = \frac{1 - x^n}{1 - x}. \]
Taking limits (as n → ∞), and using the fact that $x^n \to 0$ when |x| < 1, we find that $s_n \to \frac{1}{1-x}$ as predicted.
(Note also the effect of starting such a geometric series at a point other than k = 0: for an integer m ≥ 0 the series
\[ x^m + x^{m+1} + x^{m+2} + x^{m+3} + \cdots = \sum_{k=m}^{\infty} x^k \]
can be thought of as
\[ x^m \sum_{r=0}^{\infty} x^r = x^m \cdot \frac{1}{1-x} = \frac{x^m}{1-x}. \]
This is legitimate because each partial sum can be factorised in just such a fashion, after which we can let the number of terms tend to infinity and obtain the claimed conclusion in the limit.)
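The partial-sum formula can be checked exactly with rational arithmetic. In the sketch below the function names and the sample values of x are our own choices, made purely for illustration.

```python
from fractions import Fraction

def partial_sum(x: Fraction, n: int) -> Fraction:
    """s_n = 1 + x + x**2 + ... + x**(n-1), added term by term."""
    return sum(x ** k for k in range(n))

def closed_form(x: Fraction, n: int) -> Fraction:
    """(1 - x**n) / (1 - x), valid whenever x != 1."""
    return (1 - x ** n) / (1 - x)

for x in (Fraction(1, 3), Fraction(-2, 5)):
    for n in (1, 5, 20, 60):
        s = partial_sum(x, n)
        print(x, n, s == closed_form(x, n), float(s), float(1 / (1 - x)))
```

The last two columns show the partial sums closing in on $\frac{1}{1-x}$ as n grows.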
7.2.4 Example We confirm that the recurring decimal indicated by the phrase
zero point endlessly many nines represents the number 1.
Solution
The phrase actually means the limit – if it has a limit – of the sequence
0.9, 0.99, 0.999, 0.9999, · · · ,
in other words, the sum of the series $\sum_{k=0}^{\infty} 0.9(1/10)^k$.
To investigate this, we must look at the nth partial sum
\[ s_n = \sum_{k=0}^{n-1} 0.9(1/10)^k = 0.9 \sum_{k=0}^{n-1} (1/10)^k. \]
By the above example, the limit of this is
\[ 0.9 \cdot \frac{1}{1 - \frac{1}{10}} = \frac{0.9}{0.9} = 1 \]
as predicted.
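As a small sanity check, the partial sums 0.9, 0.99, 0.999, ... can be generated as exact fractions, and the gap between each of them and 1 read off. The function name `nines` is our own, and this is a sketch for illustration rather than part of the argument.

```python
from fractions import Fraction

def nines(n: int) -> Fraction:
    """n-th partial sum 0.9 + 0.09 + ...: the decimal 0.99...9 with n nines (n >= 1)."""
    return sum(Fraction(9, 10) * Fraction(1, 10) ** k for k in range(n))

for n in (1, 2, 3, 10):
    s = nines(n)
    print(n, s, 1 - s)   # the gap 1 - s_n is a power of 1/10
```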
7.2.5 EXERCISE
1. Determine the meaning (and the numerical value) of
−3.2283737373737 · · · .
2. Think about how you would prove that every recurring decimal represents a
rational number.

If x ≥ 1 or x ≤ −1 then the geometric series $\sum_{k=0}^{\infty} x^k$ does not converge, but diverges. You can check this out by examining the nth partial sum and using the definition, but it is easier and quicker to engage the following little theorem instead:
7.2.6 Theorem If a series $\sum_{k=1}^{\infty} x_k$ converges, then $x_k \to 0$.
Proof
That the series converges tells us that the nth partial sum $s_n$ converges to some limit $\ell$. Also $s_{n+1}$ converges to the same limit since loss of its first term has no bearing on the limit. Then $x_{n+1} = s_{n+1} - s_n \to \ell - \ell = 0$. Hence the result. (We lost $x_1$ in that demonstration but, once again, early terms don’t have any impact on the limit of a sequence.)
7.2.7 Alert: The converse of this result is not true. That is, $x_k \to 0$ is not enough to guarantee that $\sum_{k=1}^{\infty} x_k$ converges. We’ll do an example next to demonstrate this.
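7.2.8 Example: the harmonic series diverges The series
\[ \sum_1^{\infty} \frac{1}{k} = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots \]
diverges.
Solution
Scan along the list of fractions and you will realise that, since they are steadily decreasing, the biggest in any block is the first in that block and the smallest is the last. So, for instance,
\[ \frac{1}{3} + \frac{1}{4} > \frac{1}{4} + \frac{1}{4} = 2\left(\frac{1}{4}\right) = \frac{1}{2}, \]
\[ \frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8} > 4\left(\frac{1}{8}\right) = \frac{1}{2}, \]
\[ \frac{1}{9} + \frac{1}{10} + \cdots + \frac{1}{15} + \frac{1}{16} > 8\left(\frac{1}{16}\right) = \frac{1}{2} \]
and so on. The pattern emerging here (and writing $s_n$ for the nth partial sum as usual) is
\[ s_4 > 1 + \frac{1}{2} + \frac{1}{2} = 1 + 2\left(\frac{1}{2}\right), \quad s_8 > 1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} = 1 + 3\left(\frac{1}{2}\right), \]
\[ s_{16} > 1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} = 1 + 4\left(\frac{1}{2}\right) \]
and, in general,
\[ s_{(2^n)} > 1 + n\left(\frac{1}{2}\right) \]
for each n ≥ 2.
We see from this that the subsequence $(s_{(2^n)})$ of $(s_n)$ is unbounded,² so the partial-sum sequence $(s_n)$ itself is also unbounded and therefore cannot converge.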
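The block-by-block estimate can be watched in action. The sketch below (function name ours, exact fractions assumed affordable for these small sizes) computes the partial sums of the harmonic series at the powers of two and compares each with 1 + n/2.

```python
from fractions import Fraction

def harmonic_at_powers_of_two(max_exponent: int) -> dict:
    """Return {n: s_(2**n)} for n = 0..max_exponent, using exact fractions."""
    result, total, k = {}, Fraction(0), 0
    for n in range(max_exponent + 1):
        while k < 2 ** n:        # extend the running sum up to the term 1 / 2**n
            k += 1
            total += Fraction(1, k)
        result[n] = total
    return result

sums = harmonic_at_powers_of_two(10)
for n in range(2, 11):
    print(n, 2 ** n, float(sums[n]), 1 + n / 2)
```

The partial sums grow very slowly, yet they stay ahead of 1 + n/2, which is itself unbounded.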
7.2.9 EXERCISE
1. Find a positive integer N for which the N th partial sum of the harmonic series
is greater than 2018.
2. Let $(a_k)_{k \in \mathbb{N}}$ be any given sequence of positive numbers converging to a limit $\ell > 0$. Show that the following series diverges:
\[ \sum_1^{\infty} \frac{a_k}{k} \]
For the first part, use an inequality from the solution of the previous example. For the second part, begin by noting that there is a positive integer $n_0$ such that $a_k > \ell/2$ for each $k \ge n_0$, then observe that there is a power of 2 (say, $2^m$ for some $m \in \mathbb{N}$) that is greater than $n_0$, and then revisit the estimation process that established the previous example.
2 If this subsequence were bounded, we could find a constant M such that, for every positive integer n, $s_{(2^n)} < M$. The inequality $s_{(2^n)} > 1 + n/2$ now gives us 1 + n/2 < M, that is, n < 2M − 2 for every positive integer n, which is absurd.
Generally speaking, series whose terms are exclusively positive are easier to work
with, as we shall see in the next two sections. There is, however, a class of series that
have many negative terms but whose convergence is nevertheless easy to establish:
7.2.10 The alternating series test Suppose that $(a_k)_{k \ge 1}$ is a decreasing sequence of positive numbers that converges to zero. Then the ‘alternating series’
\[ \sum_1^{\infty} (-1)^{k-1} a_k = a_1 - a_2 + a_3 - a_4 + a_5 - \cdots \]
converges.
Proof
Since the terms $a_k$ are positive and getting steadily smaller, the typical ‘even’ partial sum
\[ s_{2n} = a_1 + (-a_2 + a_3) + (-a_4 + a_5) + (-a_6 + a_7) + \cdots + (-a_{2n-2} + a_{2n-1}) - a_{2n} \]
is less than $a_1$ (and is therefore bounded above). Furthermore,
\[ s_{2n+2} - s_{2n} = a_{2n+1} - a_{2n+2} > 0 \]
so $(s_{2n})$ is also an increasing sequence. Therefore it converges to some limit $\ell$.
For very much the same reasons, the ‘odd’ partial sums form a bounded, decreasing sequence $(s_{2n-1})$ which also converges to some limit $\ell'$.
Lastly, $s_{2n} = s_{2n-1} - a_{2n}$ and, taking limits across that line, we get $\ell = \ell' - 0$, in other words, $\ell$ and $\ell'$ are the same number. It follows that the entire sequence $(s_n)$ of partial sums converges to $\ell$ (see 5.2.5 and 5.2.6 for more discussion).
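The two interleaved monotonic subsequences in this proof can be exhibited for a concrete case. The sketch below takes $a_k = 1/k$ as an illustrative choice of ours and uses exact fractions; the names are hypothetical.

```python
from fractions import Fraction

def partial_sums(count: int) -> list:
    """Exact partial sums s_1, ..., s_count of 1 - 1/2 + 1/3 - 1/4 + ..."""
    sums, total = [], Fraction(0)
    for k in range(1, count + 1):
        total += Fraction((-1) ** (k - 1), k)
        sums.append(total)
    return sums

s = partial_sums(200)
evens = s[1::2]   # s_2, s_4, s_6, ...
odds = s[0::2]    # s_1, s_3, s_5, ...
print([float(v) for v in evens[:4]], "...", float(evens[-1]))
print([float(v) for v in odds[:4]], "...", float(odds[-1]))
print(odds[-1] - evens[-1])   # the gap s_199 - s_200, which equals a_200
```

The even partial sums climb, the odd ones descend, and the gap between neighbours is exactly the term $a_{2n}$, which tends to zero.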
7.2.11 Example
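1. The ‘alternating harmonic series’
\[ \sum_1^{\infty} (-1)^{k-1} \frac{1}{k} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots \]
converges.
2. For any a > 0 the series
\[ \sum_1^{\infty} (-1)^{k-1} \frac{1}{k^a} = 1 - \frac{1}{2^a} + \frac{1}{3^a} - \frac{1}{4^a} + \frac{1}{5^a} - \frac{1}{6^a} + \cdots \]
converges. (Recall paragraph 2.9.16.)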
Solution
Both of these are immediate from the alternating series test.
7.2.12 EXERCISE Decide whether the following are convergent or divergent:

1. $\displaystyle \sum_1^{\infty} (-1)^{k-1} \frac{2k+1}{3k^2-1}$;
2. $\displaystyle \sum_1^{\infty} (-1)^{k-1} \left(1 + \frac{1}{3k}\right)^{-2k}$.
7.2.13 HARDER EXERCISE We define a sequence $(b_n)$ by the formulae:
\[ b_{2k-1} = \frac{1}{k}; \quad b_{2k} = \frac{1}{2k+2}, \]
noting that the odd- and even-numbered terms have different descriptions.
• Is it legitimate to apply the alternating series test to $\sum_1^{\infty} (-1)^{k-1} b_k$?
• Does $\sum_1^{\infty} (-1)^{k-1} b_k$ converge or diverge?
7.2.14 Note Some parts of the algebra of limits transfer immediately from sequences to series. If two series $\sum a_k$ and $\sum b_k$ converge, with sums $s_a$ and $s_b$, say, then the series $\sum (a_k + b_k)$ converges to the sum $s_a + s_b$ simply because its nth partial sum $\sum_{k=1}^{n} (a_k + b_k)$ can be rearranged as $\sum_{k=1}^{n} a_k + \sum_{k=1}^{n} b_k$ and therefore does converge to $s_a + s_b$. We write this briefly as
\[ \sum (a_k + b_k) = \sum a_k + \sum b_k \]
(provided, of course, that the two sums on the right-hand side do exist). In the same way, and subject to similar provisos:
\[ \sum (a_k - b_k) = \sum a_k - \sum b_k, \]
\[ \sum (C a_k) = C \sum a_k \]
where C is any constant. On the other hand, no such result is available for multiplication: we do not obtain a partial sum for $\sum a_k b_k$ by multiplying partial sums for $\sum a_k$ and $\sum b_k$!
7.3 Big series, small series: comparison tests

We suggested earlier that series whose terms are all positive are easier to deal with.
Here is the basic and simple reason why:
7.3.1 Theorem A series of non-negative terms is convergent if and only if its

sequence of partial sums is bounded.
Proof
If a series $\sum_{1}^{\infty} a_k$ has $a_k \ge 0$ for all values of $k$ then, considering its partial sums $s_n$:
\[ s_{n+1} = s_n + a_{n+1} \ge s_n \]
for all n, that is, (sn ) is an increasing sequence. Therefore (sn ) will converge if it is
bounded, and vice versa.
In less formal language, this result converts the relatively difficult idea of con-
vergence (for this class of series only) into the relatively easy one of boundedness:
you will get convergence precisely when the terms are so small that, no matter
how many of them you add together, there is some absolute upper ceiling to
how big a total you accumulate. ‘Small series converge, big series diverge’. This
insight, in turn, we can sharpen up into a group of results called in general series
comparison tests:
7.3.2 The direct comparison test Suppose $(a_k)_{k\in\mathbb{N}}$ and $(b_k)_{k\in\mathbb{N}}$ are two
sequences of non-negative terms and $a_k \le b_k$ for every $k \in \mathbb{N}$. Then if
$\sum_{1}^{\infty} b_k$ converges, $\sum_{1}^{\infty} a_k$ must also converge.
(Equivalently, if $\sum_{1}^{\infty} a_k$ diverges, $\sum_{1}^{\infty} b_k$ must also diverge.)
Proof

If $\sum_{1}^{\infty} b_k$ converges, the previous theorem tells us that there is some upper bound
(call it $M$) for all of its partial sums: that is, for every $n \in \mathbb{N}$ we have
\[ \sum_{1}^{n} b_k \le M. \]
Yet the fact that $a_k \le b_k$ for every $k \in \mathbb{N}$ assures us that
$\sum_{1}^{n} a_k \le \sum_{1}^{n} b_k$ by simple addition, so
\[ \sum_{1}^{n} a_k \le M \]
also. Now the same theorem gives us the convergence of $\sum_{1}^{\infty} a_k$.
7.3.3 EXERCISE: the direct comparison test with scaling Suppose $(a_k)_{k\in\mathbb{N}}$ and
$(b_k)_{k\in\mathbb{N}}$ are two sequences of non-negative terms, and we can find a positive
constant $C$ such that $a_k \le C b_k$ for every $k \in \mathbb{N}$. Then if $\sum_{1}^{\infty} b_k$
converges, $\sum_{1}^{\infty} a_k$ must also converge.
Comment
You will find that the proof is virtually identical to the previous one: at the line
before the last you’ll get $\sum_{1}^{n} a_k \le CM$, but $CM$ is also just a constant.
7.3.4 Remark To increase further the usefulness of comparison between two

series, we need to think about the effect of changing or ignoring a few terms at
the start. By way of example, consider a geometric series whose sum we already
know:
\[ 1 + 0.1 + (0.1)^2 + (0.1)^3 + (0.1)^4 + \cdots + (0.1)^n + \cdots = \frac{1}{1 - 0.1} = \frac{10}{9}. \]
What does that tell us about
\[ (10 + 1) + 0.1 + (300 + (0.1)^2) + (0.1)^3 + (0.1)^4 + \cdots + (0.1)^n + \cdots \; ? \]
Well, as long as we are sure that only the first and third terms have been altered, the
answer ought to be obvious: the altered series converges to a sum of $10 + 300 + \frac{10}{9}$
simply because every partial sum from the third one onwards has been increased
by 310, and that will feed through to the limit. So we see that, unlike the similar
scenario in sequences, changing a finite number of terms in a series does affect
its convergence behaviour, but only in quite a predictable fashion: the total of the
changes that you added to individual terms gets added onto the sum-to-infinity. In
particular, if the original series did converge, then so must the altered one . . . and
vice versa, because additions can be cancelled out by adding their negatives. We
conclude that:
If a series converges, then after alteration or omission of a finite number of terms,
the new series also converges, and vice versa.
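The geometric example of this remark can be checked with exact rational arithmetic. The Python sketch below is only an illustration; the function names and the choice of 30 terms are our own assumptions.

```python
from fractions import Fraction

def geometric_term(k):
    """k-th term (k = 0, 1, 2, ...) of the series 1 + 0.1 + (0.1)^2 + ..."""
    return Fraction(1, 10) ** k

def altered_term(k):
    """The same series with 10 added to the first term and 300 to the third."""
    extra = {0: 10, 2: 300}.get(k, 0)
    return geometric_term(k) + extra

n = 30
original = sum(geometric_term(k) for k in range(n))
altered = sum(altered_term(k) for k in range(n))

print(altered - original)                   # total of the alterations
print(float(Fraction(10, 9) - original))    # how far the original partial sum is from 10/9
```

Every partial sum from the third onwards differs from the original one by the same fixed amount, so the two sequences of partial sums converge or diverge together.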
More simply, it is always safe to alter or delete a finite number of terms from a
series provided that we only want to know whether or not it converges (and do not
care what particular sum it converges to). This allows us to modify the previous
two results as follows:
7.3.5 The direct comparison test with alterations/omissions Suppose $(a_k)_{k\in\mathbb{N}}$
and $(b_k)_{k\in\mathbb{N}}$ are two sequences of non-negative terms, and that we can find a
positive integer $k_0$ such that $a_k \le b_k$ for every $k \ge k_0$. Then if $\sum_{1}^{\infty} b_k$
converges, $\sum_{1}^{\infty} a_k$ must also converge.
7.3.6 The direct comparison test with scaling and alterations/omissions Suppose
$(a_k)_{k\in\mathbb{N}}$ and $(b_k)_{k\in\mathbb{N}}$ are two sequences of non-negative terms and we can
find a positive constant $C$ and a positive integer $k_0$ such that $a_k \le C b_k$ for every
$k \ge k_0$. Then if $\sum_{1}^{\infty} b_k$ converges, $\sum_{1}^{\infty} a_k$ must also converge.
7.3.7 Examples For each of the following definitions of $a_k$, decide whether
$\sum_{1}^{\infty} a_k$ converges or diverges.
1. \[ a_k = \frac{5 + 3\sin(k\sqrt{k\pi})}{2^k}; \]
2. \[ a_k = \frac{k^2 - 30k\cos(k + 2\sqrt{k})}{4k^3 + 7}. \]
Solution
1. The top line of the fraction that defines $a_k$ always lies between 2 and 8 so,
firstly, all the terms are positive (and therefore we can use the theory
developed so far) and, secondly, $a_k \le 8(\frac{1}{2})^k$.
Since (the geometric series) $\sum (\frac{1}{2})^k$ converges, so does $\sum a_k$ (by the direct
comparison test with scaling).
2. There is a risk that several terms here may be negative because of the minus on
the top line. However, if $k \ge 31$ then
\[ k^2 = k \times k > 30k \ge 30k\cos(\text{anything}) \]
so, from that point on, the top line and $a_k$ itself are definitely positive: so we
shall just ignore the first 30 terms. Furthermore, if $k \ge 60$ then
\[ \frac{1}{2}k^2 = \frac{1}{2}k \times k \ge 30k \ge 30k\cos(\text{anything}) \]
and therefore $k^2 - 30k\cos(k + 2\sqrt{k})$ is at least $\frac{1}{2}k^2$ which, in turn, gives us
\[ a_k \ge \frac{k^2}{2(4k^3 + 7)} \ge \frac{k^2}{2(4k^3 + 7k^3)} = \frac{1}{22}\cdot\frac{1}{k} \quad \text{provided } k \ge 60. \]
Since the harmonic series $\sum \frac{1}{k}$ diverges, so must the ‘bigger’ series $\sum a_k$ by
‘comparison’, along with scaling by $\frac{1}{22}$ and omission of terms.
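The lower bound used in part 2 can be spot-checked numerically. The Python sketch below is an illustration only: the function name `a` and the range of $k$ tested are our own choices, and floating-point evaluation stands in for exact arithmetic.

```python
import math

def a(k):
    """Term of the second series: (k^2 - 30k cos(k + 2 sqrt(k))) / (4k^3 + 7)."""
    return (k**2 - 30 * k * math.cos(k + 2 * math.sqrt(k))) / (4 * k**3 + 7)

# The solution claims a_k >= 1/(22k) for k >= 60, that is, 22k * a_k >= 1.
worst_ratio = min(22 * k * a(k) for k in range(60, 20001))

# Some of the first 30 terms really are negative, which is why they are set aside.
early_negative = [k for k in range(1, 31) if a(k) < 0]
print(worst_ratio, early_negative)
```

A finite check of this kind proves nothing about all $k$, but it is a quick way to catch a slip in the algebra before relying on an estimate.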
7.3.8 EXERCISE
1. Given a sequence $(t_n)_{n\in\mathbb{N}}$ of real numbers, not all of them positive, that
converges to a limit $\ell > 0$, show that $\sum_{1}^{\infty} t_k (0.99)^k$ converges and that
$\sum_{1}^{\infty} t_k \frac{1}{k}$ diverges.
2. Provided with the extra information (which we shall confirm soon; see
7.3.13) that $\sum k^{-2}$ converges, show that both
\[ \sum \frac{6k - 5}{k(k^2 + 17)} \quad \text{and} \quad \sum \frac{6k + 2}{k(8k^2 - 5)} \]
also converge.
There is another variety of comparison test that, in some cases at least, saves us
the bother of ignoring initial terms and guessing about scaling constants:
7.3.9 The limit comparison test (sometimes denoted by LCT) Suppose that
$(a_k)_{k\in\mathbb{N}}$ and $(b_k)_{k\in\mathbb{N}}$ are two sequences of positive terms and that $\frac{a_k}{b_k}$ tends to a
non-zero limit. Then either the two series both converge, or they both diverge.
Proof
Let $\ell$ denote the (non-zero) limit of the ratio of $a_k$ to $b_k$. Using $\varepsilon = \ell/2$ in the
definition of limit, there is a positive integer $k_0$ such that, for $k \ge k_0$, we have
\[ \frac{\ell}{2} < \frac{a_k}{b_k} < \frac{3\ell}{2}. \]
The right-hand portion of that inequality rearranges to give $a_k < (3\ell/2) b_k$ for
large values of $k$, so the direct comparison test with scaling $3\ell/2$ (and ignoring
terms up to the $k_0$th) tells us that if $\sum b_k$ converges then so must $\sum a_k$.
Now the left-hand portion of the displayed line rearranges to produce
$b_k < (2/\ell) a_k$ for large values of $k$, so the same argument establishes that if $\sum a_k$
converges then so must $\sum b_k$.
7.3.10 Examples To decide whether the following converge or diverge:
1. $\sum a_k$ where
\[ a_k = \frac{k^4 - 3k^3 + 8k^2 - 5}{1 + k + k^2 + k^3 + k^4 + 2k^5}; \]
2. $\sum a_k$ where
\[ a_k = \left(\frac{2k - 1}{3k}\right)^k. \]
Solution
1. (Roughly speaking, the biggest power of $k$ will dominate each line of the
fraction, that is, the top line will be dominated by $k^4$ and the bottom line
dominated by $2k^5$. So $a_k$ resembles $\frac{k^4}{2k^5} = \frac{1}{2k}$. Therefore . . . )
Let us consider $b_k = \frac{1}{k}$. We see that
\[ \frac{a_k}{b_k} = \frac{k^5 - 3k^4 + 8k^3 - 5k}{1 + k + k^2 + k^3 + k^4 + 2k^5} = \frac{1 - 3k^{-1} + 8k^{-2} - 5k^{-4}}{k^{-5} + k^{-4} + k^{-3} + k^{-2} + k^{-1} + 2} \]
whose limit is 0.5 which is not zero. By the LCT, both series must converge or
else both series must diverge. Yet the harmonic series $\sum b_k$ diverges, and
therefore so does the given series $\sum a_k$.
2. In the second example, try $b_k = (\frac{2}{3})^k$. Then
\[ \frac{a_k}{b_k} = \left(\frac{k - 1/2}{k}\right)^k = \left(1 + \frac{-0.5}{k}\right)^k \]
whose limit is $e^{-0.5}$ which is not zero. Since the geometric series $\sum b_k$
converges, the LCT tells us that $\sum a_k$ must do so also.
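Before committing to a comparison series it can be reassuring to watch the ratio $a_k / b_k$ settle down. The following Python sketch tabulates both ratios from these two examples; the function names are our own, and the second ratio is coded in its simplified form $((2k-1)/(2k))^k$ so that neither numerator nor denominator underflows.

```python
import math

def ratio_1(k):
    """a_k / b_k for the first example, with b_k = 1/k."""
    a_k = (k**4 - 3 * k**3 + 8 * k**2 - 5) / (1 + k + k**2 + k**3 + k**4 + 2 * k**5)
    return a_k / (1 / k)

def ratio_2(k):
    """a_k / b_k for the second example, with b_k = (2/3)^k, after cancelling."""
    return ((2 * k - 1) / (2 * k)) ** k

for k in (10, 1000, 100000):
    print(k, ratio_1(k), ratio_2(k))
print("claimed limits:", 0.5, math.exp(-0.5))
```

Numerical agreement of this kind suggests that the guessed comparison series is a sensible one; the limit itself still has to be justified by the algebra of limits.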
7.3.11 EXERCISE Revisit Exercise 7.3.8 and use the limit comparison test to get
quicker, easier solutions of each of the problems that it posed.
7.3.12 Example: a telescoping series To show, by directly calculating the typical
partial sum, that the series $\sum_{1}^{\infty} \frac{1}{k(k+1)}$ converges.
Solution
A quick answer depends on noticing³ that
\[ \frac{1}{k} - \frac{1}{k+1} = \frac{k + 1 - k}{k(k+1)} = \frac{1}{k(k+1)}, \]
so the $n$th partial sum of this series
\[ \frac{1}{1(1+1)} + \frac{1}{2(2+1)} + \frac{1}{3(3+1)} + \frac{1}{4(4+1)} + \cdots + \frac{1}{n(n+1)} \]
\[ = \frac{1}{1} - \frac{1}{2} + \frac{1}{2} - \frac{1}{3} + \frac{1}{3} - \frac{1}{4} + \frac{1}{4} - \frac{1}{5} + \cdots + \frac{1}{n} - \frac{1}{n+1} \]
cancels down almost completely (telescopes) to $\frac{1}{1} - \frac{1}{n+1}$,
whose limit is obviously 1.
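The telescoping identity is easy to confirm with exact fractions. This Python sketch (the helper name `partial_sum` is our own) compares the directly computed partial sum with the closed form $1 - \frac{1}{n+1}$.

```python
from fractions import Fraction

def partial_sum(n):
    """Exact n-th partial sum of the series with terms 1/(k(k+1))."""
    return sum(Fraction(1, k * (k + 1)) for k in range(1, n + 1))

for n in (1, 2, 10, 1000):
    print(n, partial_sum(n), 1 - Fraction(1, n + 1))
```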
7.3.13 Example To show that the series $\sum k^{-2}$ converges.
7.3.14 Solution Use the preceding example and the limit comparison test: if we
put
\[ a_k = \frac{1}{k^2}, \qquad b_k = \frac{1}{k(k+1)} \]
then it is immediate that $\frac{a_k}{b_k} \to 1 \ne 0$, so the convergence of $\sum b_k$ proves the
convergence of $\sum a_k$.
³ The so-called theory of partial fractions, which you may have come across, helps one to notice
such things (especially in more complicated examples).
7.3.15 EXERCISE Find constants $a$ and $b$ such that
\[ \frac{1}{k(k+2)} = \frac{a}{k} + \frac{b}{k+2} \]
and hence show that the series $\sum_{1}^{\infty} \frac{1}{k(k+2)}$ converges (because its partial sums
collapse ‘telescopically’).
7.3.16 HARDER EXERCISE Let $t > 1$ be a constant. Prove that the series
\[ \sum_{k=1}^{\infty} k^{-t} = 1 + \frac{1}{2^t} + \frac{1}{3^t} + \frac{1}{4^t} + \frac{1}{5^t} + \frac{1}{6^t} + \frac{1}{7^t} + \cdots \]
converges.
Partial solution
We can use the same kind of estimation of partial sums that we employed for the
harmonic series (in paragraph 7.2.8) but, instead of grouping the terms in blocks
that end with a negative power of 2, this time we make them start with a negative
power of 2, thus:
\[ \frac{1}{2^t} + \frac{1}{3^t} < \frac{1}{2^t} + \frac{1}{2^t} = 2\cdot\frac{1}{2^t} = \frac{1}{2^{t-1}}, \]
\[ \frac{1}{4^t} + \frac{1}{5^t} + \frac{1}{6^t} + \frac{1}{7^t} < 4\cdot\frac{1}{4^t} = \frac{1}{4^{t-1}}, \]
\[ \frac{1}{8^t} + \frac{1}{9^t} + \cdots + \frac{1}{14^t} + \frac{1}{15^t} < 8\cdot\frac{1}{8^t} = \frac{1}{8^{t-1}} \]
and so on. The pattern emerging this time concerning the partial sums is:
\[ s_{2^n - 1} < 1 + \frac{1}{2^{t-1}} + \frac{1}{4^{t-1}} + \cdots + \frac{1}{(2^{n-1})^{t-1}} \]
for each $n \ge 2$. Now, recognise the right-hand side of the last display as being
a partial sum of a convergent geometric series, and therefore bounded above by
some constant $M$ (the sum-to-infinity of that geometric series). Lastly, argue that
each partial sum of the original series is at most $s_{2^n - 1}$ for a suitably chosen value
of $n$, and is therefore less than $M$. An appeal to Theorem 7.3.1 (convergent equals
bounded for series of positive terms) will now complete the argument.
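The block estimate can be tried out for a particular exponent. In the Python sketch below, the sample value $t = 1.5$ and the helper names are our own assumptions; floating-point arithmetic is used, so this is a plausibility check rather than a proof.

```python
def partial_sum(t, m):
    """Partial sum 1 + 2^(-t) + ... + m^(-t)."""
    return sum(k ** -t for k in range(1, m + 1))

def block_bound(t, n):
    """Right-hand side 1 + 1/2^(t-1) + 1/4^(t-1) + ... + 1/(2^(n-1))^(t-1)."""
    return sum((2 ** j) ** -(t - 1) for j in range(n))

t = 1.5                    # sample exponent; any t > 1 could be used instead
r = 2 ** -(t - 1)          # common ratio of the comparison geometric series
M = 1 / (1 - r)            # its sum-to-infinity
for n in range(2, 15):
    print(n, partial_sum(t, 2 ** n - 1), block_bound(t, n), M)
```

Each row shows $s_{2^n-1}$ sitting below the geometric partial sum, which in turn sits below the fixed ceiling $M$.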
7.3.17 Note Another consequence of the (very limited) effect of deleting a finite
number of terms from a series concerns its so-called tails. For a series $\sum_{k=1}^{\infty} x_k$ and
a positive integer $n$, the series
\[ \sum_{k=n+1}^{\infty} x_k = x_{n+1} + x_{n+2} + x_{n+3} + x_{n+4} + \cdots \]
is called a tail of $\sum_{1}^{\infty} x_k$, more precisely its $n$th tail. Since this is formed by
omitting the first $n$ terms, part at least of the following example should be quite
obvious.
7.3.18 Example Given a series $\sum_{1}^{\infty} x_k$, to show that the following three
statements are equivalent:
1. $\sum_{1}^{\infty} x_k$ converges,
2. every tail of $\sum_{1}^{\infty} x_k$ converges,
3. at least one tail of $\sum_{1}^{\infty} x_k$ converges.
Solution
• Suppose that statement 1 is true. For each $n \ge 1$, the $n$th tail is formed by
omitting the first $n$ terms of the series and therefore, by Remark 7.3.4, it is itself
a convergent series, that is, 2 is true.
• Suppose that statement 2 is true. Then the truth of 3 is immediate.
• Suppose that statement 3 is true. So we can find a positive integer $n$ such that
$\sum_{k=n+1}^{\infty} x_k$ converges to some limit $s$. (For $N > n$) the $N$th partial sum of the
given series can now be written as
\[ \sum_{1}^{N} x_k = (x_1 + x_2 + x_3 + \cdots + x_n) + \sum_{k=n+1}^{N} x_k \]
and, just by the algebra of limits for sequences,
\[ \sum_{1}^{N} x_k \to (x_1 + x_2 + x_3 + \cdots + x_n) + s \quad (\text{as } N \to \infty). \]
Thus, $\sum_{1}^{\infty} x_k$ is a convergent series.
7.3.19 Note It may be helpful to draw attention to the structure of that last
demonstration. To claim that a number of statements are equivalent is to say that if
any one of them is true then so are all the others. So when we said that statements
1, 2 and 3 were equivalent, we were asserting that 1 implies 2, 2 implies 1, 2 implies
3, 3 implies 2, 1 implies 3 and 3 implies 1. Fortunately there is no need to give a
separate demonstration of each of these six implications: for instance, once one has
established both 1 implies 2 and 2 implies 3, then 1 implies 3 follows immediately.
The usual way to set up an efficient proof of equivalence for three statements is to
confirm either that
1⇒2⇒3⇒1
or that
1⇒3⇒2⇒1
that is, to set out a cyclical proof. This is what we did in 7.3.18 above. As another
illustration, if we wished to write out in full detail why the five ‘equivalent condi-
tions’ in 4.1.5 actually are equivalent, a cyclical proof along the lines of 1 ⇒ 2 ⇒
3 ⇒ 4 ⇒ 5 ⇒ 1 or 1 ⇒ 4 ⇒ 3 ⇒ 5 ⇒ 2 ⇒ 1 would be an efficient strategy.
7.3.20 EXERCISE Given a convergent series $\sum_{1}^{\infty} x_k$, let $s_n$ denote the sum of the
$n$th tail (which, by 7.3.18, is necessarily convergent). Show that $\lim_{n\to\infty} s_n = 0$.
7.4 The root test and the ratio test

The two most powerful tests to be discussed in this chapter are called the root
test (or the nth root test) and the ratio test (or d’Alembert’s ratio test). They both
require the terms of a series to be non-negative before you can apply them to it.
They are quite similar in effectiveness and in the details of their proofs, but in most
applications one of them is likely to work much more easily than the other, so it is
important to make a sensible choice of which one to try. We’ll return later to how
you should make that choice.
7.4.1 The nth root test Suppose that $\sum_{n=1}^{\infty} a_n$ is a series of non-negative terms
and that $\sqrt[n]{a_n}$ converges to a limit $\ell$ (as $n \to \infty$). Then:
1. if $\ell < 1$ then the series converges,
2. if $\ell > 1$ then $a_n$ does not tend to zero, and therefore the series diverges.
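Proof
1. Assuming $\ell < 1$, we consider the number half-way between $\ell$ and 1; we can
write this either as $\ell + \varepsilon$ or as $1 - \varepsilon$ if we choose $\varepsilon$ to be half the length of the
gap, that is, $\varepsilon = \frac{1 - \ell}{2}$.
[Figure: number line marking $(\ell - \varepsilon)$, $\ell$, $\ell + \varepsilon = 1 - \varepsilon$ and 1, with the gap of length $2\varepsilon$ between $\ell$ and 1. Caption: Limit is less than 1]
Because $\sqrt[n]{a_n} \to \ell$ we can find $n_0$ such that, for every $n \ge n_0$:
\[ \sqrt[n]{a_n} < \ell + \varepsilon = 1 - \varepsilon, \text{ and therefore } a_n < (1 - \varepsilon)^n. \]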

Since the geometric series $\sum (1 - \varepsilon)^n$ converges, so does the ‘smaller’ series
$\sum a_n$ according to the comparison test (with alteration/omissions, since we
‘lost’ the first $n_0$ terms here).
2. Now assuming $\ell > 1$ (and thinking $\varepsilon = \ell - 1 > 0$) we can find $n_1$ such that,
for every $n \ge n_1$:
\[ \sqrt[n]{a_n} > \ell - \varepsilon = 1, \text{ and therefore } a_n > 1. \]
[Figure: number line marking $1 = \ell - \varepsilon$, $\ell$ and $(\ell + \varepsilon)$. Caption: Limit is greater than 1]
That certainly guarantees that $a_n \to 0$ cannot be true, so by an earlier theorem
(7.2.6) we get $\sum a_n$ to be divergent.
7.4.2 Remark You may well have noticed that we have gone back to using n
instead of k for the label on the typical term. This is safe because, at the moment, we
are not looking both at terms and at partial sums in the same paragraph, so there
are not two different labels to keep separate in our minds. Of course, it is perfectly
acceptable to use k or another letter instead of n.
7.4.3 Warning If the limit of the $n$th root of the $n$th term is exactly 1, this
test tells us nothing at all: for all it knows, the series could be convergent or
divergent. So we will need to look for a different test when such a borderline case
arises. As illustration of this point, notice that $\sum n^{-1}$ is divergent and $\sum n^{-2}$ is
convergent, but in both cases you get a limit of 1 for the $n$th root of the $n$th term.
(Note that $\sqrt[n]{n^2} = n^{2/n} = (\sqrt[n]{n})^2$, and recall the result of 6.2.6.)
7.4.4 Example Does the series
\[ \sum \frac{(1 + \frac{3}{n})^{n^2}}{(1 + \frac{\pi}{n})^{n^2}} \]
converge or diverge?
Solution
The typical term is positive so we can try the root test. The $n$th root of the $n$th term is
\[ \frac{(1 + \frac{3}{n})^{n}}{(1 + \frac{\pi}{n})^{n}} \]
which converges to $\frac{e^3}{e^\pi} = e^{3-\pi}$ which is less than 1, so the given series converges.
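The limit claimed in this solution can be sanity-checked in a few lines. The Python sketch below (the function name is our own) evaluates the $n$th root of the $n$th term directly and compares it with $e^{3-\pi}$.

```python
import math

def nth_root_of_term(n):
    """n-th root of the n-th term: (1 + 3/n)^n / (1 + pi/n)^n."""
    return ((1 + 3 / n) / (1 + math.pi / n)) ** n

limit = math.exp(3 - math.pi)
for n in (10, 1000, 100000):
    print(n, nth_root_of_term(n), limit)
```

Since $3 < \pi$ the limit lies below 1, and that is the only feature of it that the root test needs.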
7.4.5 Example For precisely which positive values of $t$ does the series
\[ \sum \left(\frac{3n^2 - 1}{2n^2 - 1}\right)^n t^n \]
converge?
Solution
All terms are positive, and the $n$th root of the $n$th term is
\[ \left(\frac{3n^2 - 1}{2n^2 - 1}\right) t \]
which has a limit of $\frac{3t}{2}$ so, by the root test,
1. for $0 < t < 2/3$ the limit is $< 1$ and so the series converges,
2. for $t > 2/3$ the limit is $> 1$ and the $n$th term does not tend to zero and the
series must diverge.
It remains to ponder what happens when $t$ is exactly 2/3. Luckily, in that borderline
case the $n$th term itself is
\[ \left(\frac{3n^2 - 1}{2n^2 - 1}\right)^n \left(\frac{2}{3}\right)^n = \left(\frac{6n^2 - 2}{6n^2 - 3}\right)^n \]
which is (just) greater than 1 (in the final fraction, the top line exceeds the bottom
line by 1). That shows once again that the $n$th term cannot tend to zero and the
series diverges.
We conclude that the given series converges precisely when $0 < t < 2/3$.
7.4.6 EXERCISE
1. Determine whether or not the series
\[ \sum \left(1 - \frac{1}{3n - 1}\right)^{n^2 + 2} \]
converges.
2. For which positive values of $t$ does the series
\[ \sum \frac{(3n + 1)^n \, t^n}{n^{n-1}} \]
converge, and for which does it diverge?

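7.4.7 D’Alembert’s ratio test Suppose that $\sum_{n=1}^{\infty} a_n$ is a series of positive terms
and that the growth rate $\frac{a_{n+1}}{a_n}$ converges to a limit $\ell$ (as $n \to \infty$). Then:
1. if $\ell < 1$ then the series converges,
2. if $\ell > 1$ then $a_n$ does not tend to zero, and therefore the series diverges.
Proof
1. Assuming $\ell < 1$, we again consider the number $\ell + \varepsilon = 1 - \varepsilon$ half-way
between $\ell$ and 1, where $\varepsilon = \frac{1 - \ell}{2}$.
[Figure: number line marking $(\ell - \varepsilon)$, $\ell$, $\ell + \varepsilon = 1 - \varepsilon$ and 1, with the gap of length $2\varepsilon$ between $\ell$ and 1. Caption: Limit is less than 1]
Because $\frac{a_{n+1}}{a_n} \to \ell$ we can find $m$ such that, for every $n \ge m$:
\[ \frac{a_{n+1}}{a_n} < \ell + \varepsilon = 1 - \varepsilon, \text{ and therefore } a_{n+1} < (1 - \varepsilon) a_n. \]
Therefore
\[ a_{m+1} < (1 - \varepsilon) a_m, \quad a_{m+2} < (1 - \varepsilon) a_{m+1} < (1 - \varepsilon)^2 a_m, \]
\[ a_{m+3} < (1 - \varepsilon) a_{m+2} < (1 - \varepsilon)^3 a_m, \ \cdots \]
and the pattern emerging is that
\[ a_{m+k} < (1 - \varepsilon)^k a_m \quad \text{for each } k \ge 1. \]
Since the geometric series $\sum (1 - \varepsilon)^k$ converges, so does the ‘smaller’ series
$\sum_{k=1}^{\infty} a_{m+k}$ according to the comparison test with scaling (by $a_m$). We ‘lost’
the first $m$ terms this time, but that doesn’t prevent the entire series $\sum a_n$ from
converging also.
2. Now assuming $\ell > 1$ (and thinking $\varepsilon = \ell - 1 > 0$) we can find $p$ such that,
for every $n \ge p$:
\[ \frac{a_{n+1}}{a_n} > \ell - \varepsilon = 1, \text{ and therefore } a_{n+1} > a_n. \]
[Figure: number line marking $1 = \ell - \varepsilon$, $\ell$ and $(\ell + \varepsilon)$. Caption: Limit is greater than 1]
So $(a_n)$ is an increasing sequence of positive terms if we disregard the first $p$
terms. That guarantees that its limit (if it even has one) cannot be 0, and so
once again $\sum a_n$ is divergent.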
7.4.8 Warning If the limit of the growth rate is exactly 1, this test too tells us
nothing at all, and we must seek a different test to analyse such a borderline case.
For instance, the divergent series $\sum n^{-1}$ and the convergent series $\sum n^{-2}$ both
have limits of 1 for their growth rates.
7.4.9 Example To decide whether the following series converges or diverges:
\[ \sum v_n \quad \text{where} \quad v_n = \frac{(n!)^3 \, 2^{5n}}{(3n)!}. \]
Solution
Carefully cancel all you can in the ratio $\frac{v_{n+1}}{v_n}$ and you should find that the growth
rate is
\[ \frac{(n + 1)^3 \, 2^5}{(3n + 1)(3n + 2)(3n + 3)}. \]
Using the algebra of limits, we see that this fraction converges to 32/27 which is
greater than 1, so $\sum v_n$ diverges.
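Cancelling factorials by hand is error-prone, so it can be worth checking the cancelled growth rate against the raw ratio. The Python sketch below does this with exact integer and rational arithmetic; the function names `v` and `growth_rate_formula` are our own.

```python
from fractions import Fraction
from math import factorial

def v(n):
    """Exact value of v_n = (n!)^3 * 2^(5n) / (3n)!."""
    return Fraction(factorial(n) ** 3 * 2 ** (5 * n), factorial(3 * n))

def growth_rate_formula(n):
    """The cancelled form of v_(n+1) / v_n quoted in the solution."""
    return Fraction((n + 1) ** 3 * 2 ** 5, (3 * n + 1) * (3 * n + 2) * (3 * n + 3))

for n in (1, 10, 100, 1000):
    print(n, float(v(n + 1) / v(n)), float(growth_rate_formula(n)))
print("claimed limit:", 32 / 27)
```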
7.4.10 Example For exactly which positive values of $x$ does the following series
converge?
\[ \sum w_n \quad \text{where} \quad w_n = \frac{(n + 1)!\,(2n + 2)!\; x^n}{(3n + 3)!} \]
Solution
The growth rate $\frac{w_{n+1}}{w_n}$ cancels to
\[ \frac{(n + 2)(2n + 3)(2n + 4)x}{(3n + 4)(3n + 5)(3n + 6)} = \frac{(2n + 3)(2n + 4)x}{3(3n + 4)(3n + 5)} \]
whose limit is $4x/27$. By the ratio test, therefore:
1. for $0 < x < 27/4$ the limit is less than 1 and the series converges, but
2. for $27/4 < x$ the limit exceeds 1, the $n$th term does not tend to zero and the
series diverges.
It remains unclear at first what will happen in the borderline case $x = 27/4$.
Notice, however, that when $x = 27/4$, the growth rate is actually
\[ \frac{(2n + 3)(2n + 4)\,27}{12(3n + 4)(3n + 5)} = \frac{(6n + 9)(6n + 12)}{(6n + 8)(6n + 10)} \]
which is greater than 1 (look at the individual factors in the top and bottom lines).
Thus $w_{n+1} > w_n$, the terms are increasing, the terms do not converge to zero and
the series again diverges.
We conclude that this series converges only when $0 < x < 27/4$.
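The borderline case $x = 27/4$ is where the ratio test itself falls silent, so the direct look at the growth rate carries the whole argument there. The Python sketch below checks that step exactly; the names `w` and `borderline_rate` are our own labels for the quantities in the solution.

```python
from fractions import Fraction
from math import factorial

def w(n, x):
    """Exact value of w_n = (n+1)! (2n+2)! x^n / (3n+3)!."""
    return Fraction(factorial(n + 1) * factorial(2 * n + 2), factorial(3 * n + 3)) * x ** n

def borderline_rate(n):
    """Growth rate at x = 27/4, in the form quoted in the solution."""
    return Fraction((6 * n + 9) * (6 * n + 12), (6 * n + 8) * (6 * n + 10))

x = Fraction(27, 4)
for n in (1, 5, 50):
    print(n, w(n + 1, x) / w(n, x), borderline_rate(n))
```

Each factor on top exceeds the matching factor underneath, so the rate is above 1 for every $n$ even though its limit is exactly 1.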

7.4.11 EXERCISE Determine whether or not the series $\sum a_n$ converges, where:
\[ a_n = \frac{n^n \times n!}{(2n - 1)!}. \]
7.4.12 EXERCISE Determine the range of values of the real number $x$ for which
the series
\[ \sum \frac{(n!)^4 \, x^{2n}}{(4n)!} \]
converges. (Note that the wording of the question allows $x$ itself to be negative.)
7.4.13 Remark Notice that the ratio test is particularly suitable for series whose
terms involve several factorials, simply because massive cancellation occurs. For
instance,
\[ \frac{(n + 1)!}{n!} = \frac{1 \times 2 \times 3 \times 4 \times \cdots \times n \times (n + 1)}{1 \times 2 \times 3 \times 4 \times \cdots \times n} = n + 1, \]
\[ \frac{(3n)!}{(3n + 3)!} = \frac{1 \times 2 \times 3 \times 4 \times \cdots \times 3n}{1 \times 2 \times 3 \times 4 \times \cdots \times (3n) \times (3n + 1) \times (3n + 2) \times (3n + 3)} = \frac{1}{(3n + 1)(3n + 2)(3n + 3)} \]
and so on.
Take care that, when writing down the formula for term number $n + 1$, you do
so by replacing $n$ by $n + 1$ at each of its appearances in term number $n$ so that, for
example, $2n + 1$ turns into $2n + 3$ (that is, $2(n + 1) + 1$). Don’t just ‘add 1 to each
bracket’.
Series in which the $n$th term’s formula is dominated by $n$th powers are likely
candidates for simplification by the $n$th root test, of course. Where the formula
contains both factorials and $n$th powers, the decision is less clear-cut, but unless
and until you can access some useful information about the $n$th root of factorial $n$,
the ratio test is still probably the better bet.
8 Continuous functions
— the domain thinks
that the graph
is unbroken
8.1 Introduction
In order to study continuity, we need to make sure that we understand the ideas of
function (mapping, map, transformation), domain, codomain (target), range, one-
to-one (injective, 1 – 1), onto (surjective), composite (composition) and inverse
function. Let us begin by revising the most basic points now, and promising to
review other concepts through this chapter as we come to need them.
There are two styles of definition of the term function that you need to be aware
of. We’ll begin with the informal one, which is the one that we shall actually use
almost all of the time. A function from a set D to a set C is any rule, however it may
be expressed, that for each element of D allows us to determine a single associated
element of the set C. If the letter f stands for the function – the rule – then for
each x ∈ D the associated element of C is written as f (x). To describe a particular
function, you need to identify both sets D and C and, most importantly, to specify
the rule in enough detail to allow the reader to work out what f (x) is for each
possible x in D.
If you find that definition rather unsatisfactory, then your unease is justified.
Apart from being somewhat vague (which is bad enough already), it suffers from
the more serious flaw that it does not spell out the meaning of the words rule or
associated, and these are at least as needful of definition as the word function was.¹
The way around that is to ground the ideas entirely in set theory, as in the following
formal definition.
¹ This is more than a little like defining a circle to mean a perfectly round figure, and then
going on to define ‘round’ to mean ‘shaped like a circle’. We end up merely noting the equivalence
of two ideas (which, of course, is better than nothing) without actually succeeding in defining
either of them.
A function, also called a mapping, map or transformation, consists of a list of
three sets called (respectively) the domain, the codomain (or target) and the graph,
which satisfy a particular condition that we now describe. If the sets are denoted
by $D$, $C$ and $\Gamma$ respectively then, firstly, $\Gamma$ is a subset of the cartesian product set
$D \times C$ (that is, $\Gamma$ consists entirely of ordered pairs of elements $(a, b)$ where each $a$
belongs to $D$ and each $b$ belongs to $C$) and, secondly:
for each $x \in D$ there is a unique $y \in C$ such that $(x, y) \in \Gamma$.
The function is usually represented by a single symbol such as f . Then the entire
phrase
f is a function whose domain is D and whose codomain is C
is abbreviated to
f :D→C
and usually spoken as
f maps D to C.
For each x in D, the object represented above by y is conventionally written as f (x)
and called the value of f at x or the image of x under f .
Whether, in a particular question, you choose to think of the informal or the
formal definition, let us be clear that a function involves three objects: the domain,
the codomain, and the process of converting each element of the domain into an
element of the codomain. If you change any one of these, then you are looking at
a different function.
To convince yourself that the formal and informal definitions are essentially
saying the same thing, consider this: if we had drawn the graph of some (formally
defined) function f with perfect accuracy then, for each individual x in its domain,
we could trace vertically up or down the graph paper until we found a point on the
graph whose first coordinate was x, because this was guaranteed by the condition.
Furthermore, the same condition also guaranteed that only one such point could
exist. Then the second coordinate of that point is the (single) value of f (x), and we
have uncovered ‘the rule’ which the informal definition spoke of. Yet, conversely,
if we begin with exact knowledge of the rule, and then precisely mark the point
(x, f (x)) for every single value of x in the domain, we shall have drawn the graph
with absolute precision. Thus, perfect knowledge of the rule and perfect knowledge
of the graph determine one another. In the ideal world of infinite precision (which
is where definitions live), a function is its graph.
A real function is a function whose domain and codomain are both subsets of
the real line R. Since all of the functions discussed in this text are real functions,
we shall drop the qualifier real and simply refer to them as functions. In most
examples,² the domain will be either an interval or a union of intervals, and the
individual function itself will be spelled out by presenting some sort of formula
or algorithm for calculating $f(x)$ for each permissible input value $x$. When this is
so, by way of default the domain will comprise all the real numbers $x$ for which
the formula for $f(x)$ makes sense, and the default codomain will be $\mathbb{R}$ itself. For
example, the phrase
f is the function given by $f(x) = \sqrt{4 - x^2}$
or, in a more abbreviated style,
the function $f(x) = \sqrt{4 - x^2}$
or, in even more tightly compressed style,
the function $\sqrt{4 - x^2}$
will mean that $f : [-2, 2] \to \mathbb{R}$ and that, for each $x$ in the domain, the associated
element $f(x)$ of $\mathbb{R}$ is $\sqrt{4 - x^2}$. Likewise,
the function $g(x) = (x - 1)^{-2}(x + 3)^{-1}$
will mean the function $g : (-\infty, -3) \cup (-3, 1) \cup (1, \infty) \to \mathbb{R}$ defined by the
formula $g(x) = (x - 1)^{-2}(x + 3)^{-1}$. Note, in each case, how the domain has been
defaulted to be as extensive as possible, subject to the formula always returning a
real number.
² An important exception is that of a sequence: formally, a sequence is simply a real function
$x : \mathbb{N} \to \mathbb{R}$ whose domain is $\mathbb{N}$, although the usual $x_n$ notation for its typical value instead of
$x(n)$ rather obscures this.
You will already be familiar with the graphs of common and important functions
such as sin x, cos x and ex , of quadratics with formulas of the form ax2 + bx + c and
of ‘straight line’ functions of the form mx + c. Graphs, even rather rough sketch
graphs, are a useful way of storing and presenting information about functions and,
although they are not in themselves proofs, they often enable us to make sensible
guesses about how particular functions behave. Indeed, a decent sketch graph
frequently helps us to build up a sound, logical proof by supporting and guiding
our intuition in a visual way. Accordingly, we shall begin our investigation of
continuous functions by a short series of sketch graphs aimed at visually explaining
what it is that we are trying to define.
8.2 An informal view of continuity

Take a careful look at these functions and at the back-of-the-envelope graphs we
have drawn for them.
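\[ f : (-1, 5) \to \mathbb{R},\ f(x) = x(x-1)(x-2)(x-4); \qquad g : (1, 3) \to \mathbb{R},\ g(x) = x^2 \lfloor x \rfloor. \]
[Figure: Graph of f]
[Figure: Graph of g, with x = 2 marked on the horizontal axis]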
The graph of the first, f , is a classic example of an unbroken ‘continuous’ curve:

we could draw the whole thing without lifting our pen or pencil off the paper. In
contrast, g has a broken graph – there is an obvious gap between the part that lies
to the left of x = 2 and the part that lies to the right. Certainly we could draw
pieces of this graph without lifting our pen, but not the part that is close to x = 2.
Common sense suggests we could say that f is continuous at each point we look
at, but that g is not continuous at 2. One of our immediate tasks is to describe
the difference between these two situations in mathematically precise terms that
do not depend upon our limited artistic skills or on our ability to interpret visual
clues, while taking care not to lose the intuition embedded in such diagrams.
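\[ h : (-3, 0) \to \mathbb{R},\quad h(x) = \frac{x^5 + 1}{x^4 - 1} \text{ if } x \ne -1,\quad h(-1) = -1; \]
\[ k : (-3, 0) \to \mathbb{R},\quad k(x) = \frac{x^5 + 1}{x^4 - 1} \text{ if } x \ne -1,\quad k(-1) = -5/4; \]
[Figure: Graph of h, with x = −1 marked]
[Figure: Graph of k, with x = −1 marked]
\[ m : (-3, -1) \cup (-1, 0) \to \mathbb{R},\quad m(x) = \frac{x^5 + 1}{x^4 - 1}. \]
[Figure: Graph of m, with x = −1 marked]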
As far as visual intuition goes, k looks to be continuous at x = −1 and h does not:

the graph of h looks as if one of its points has been placed incorrectly. In any case,
it seems impossible to draw the part of the graph of h that lies near to −1 without
lifting our pen. It is tempting to say that m also is non-continuous at x = −1
because of the (tiny) break in its graph there, but that is misleading: since −1 is not
actually in the domain of m, the graph of m cannot properly be said to possess a
break at x = −1: rather, at the point where x = −1, the graph of m does not exist
at all. In this sense, the graph of m appears to be unbroken or continuous at every
point at which it makes sense.
Here, for future use, is another trio of graphs that collectively make the same
points:
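\[ p : (-7, 0) \cup (0, 7) \to \mathbb{R},\quad p(x) = \frac{\sin x}{x}; \]
\[ q : (-7, 7) \to \mathbb{R},\quad q(x) = \frac{\sin x}{x} \text{ if } x \ne 0,\quad q(0) = 2; \]
\[ r : (-7, 7) \to \mathbb{R},\quad r(x) = \frac{\sin x}{x} \text{ if } x \ne 0,\quad r(0) = 1. \]
[Figure: Graph of p]
[Figure: Graph of q]
[Figure: Graph of r]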
We see that p is not defined at all at x = 0 so the question of it being continuous

here really does not arise. In the other two cases, a value has been assigned to the
function at x = 0 and – continuing to think informally – it appears that q is not
continuous at 0 because of the break in its graph, but that r (due to a ‘wiser’ choice
of value at 0) is continuous here.
\[ s : (-2, 2) \to \mathbb{R},\quad s(x) = 1 \text{ if } -2 < x \le 0,\quad s(x) = x^{-1} \text{ if } 0 < x < 2. \]
[Figure: Graph of s]
In this case, there is what we might be inclined to call an infinite gap at x = 0

and, unlike the previous situation involving p, q and r, there is no way to remedy
that by making a judicious choice of a value for the function at x = 0.
Now, how can we turn these informal diagrammatic insights into a proper
mathematical definition?
To make sure that the graph of a function f does not have any kind of gap or
break at a point p in its domain, we need the values of f (x) to approximate f (p)
very closely whenever x (in the domain of f ) is very close to p. The values of x and
of f (x), however, do not constitute sequences – endless lists of one number after
another – rather, they range over solid blocks or intervals of numbers, and this
is unfortunate for us because we have spent the last hundred or so pages learning
how to work with sequences, and would like to be able to capitalise on the skills that
we have developed. We could, though, use sequences as probes within these solid
blocks to seek out breaks. For instance, in the case of the function, sketched above,
that we called $g : (1, 3) \to \mathbb{R}$, $g(x) = x^2 \lfloor x \rfloor$, if we probe near the point $x = 2$ by
using the sequence $(x_n)_{n\ge 1} = \left(2 - \frac{1}{n+1}\right)_{n\ge 1}$ which surely converges to 2, we find
that $g(2) = 8$ but that (for every $n$) $g(x_n) = \left(2 - \frac{1}{n+1}\right)^2$ which converges to 4, and
so does not converge to $g(2)$. This bears witness to the presence of some kind of
break in the graph at 2, and it does so in a way that is independent of any attempt
we might make to draw the graph.
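[Figure: the probe points $x_1, x_2, x_3, \ldots$ with $x_n \to 2$; the values $g(x_1), g(x_2), g(x_3), \ldots$ stay away from $g(2) = 8$ (BREAK!), so $g(x_n) \not\to g(2)$. Caption: A sequence (approaching a ‘break point’) detects the break]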


Of course, if we had used the different probe-sequence $(y_n)_{n\ge 1} = \left(2 + \frac{1}{n+1}\right)_{n\ge 1}$
which also converges to 2, this time (for all $n$) $g(y_n) = 2\left(2 + \frac{1}{n+1}\right)^2$ which does
converge to $8 = g(2)$, so we observe that not every probe near to 2 will succeed in
identifying the break.
[Figure: the probe points $\ldots, y_3, y_2, y_1$ with $y_n \to 2$; the values $g(y_1), g(y_2), g(y_3), \ldots$ satisfy $g(y_n) \to g(2)$. Caption: A sequence (approaching a ‘break point’) fails to detect the break]
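The two probes of $g$ near $x = 2$ can be replayed exactly. The Python sketch below uses rational arithmetic so that the floor is never affected by rounding; the names `x_probe` and `y_probe` are our own labels for the two sequences.

```python
from fractions import Fraction
from math import floor

def g(x):
    """g(x) = x^2 * floor(x), intended for x in the interval (1, 3)."""
    return x * x * floor(x)

def x_probe(n):
    """Approaches 2 from the left: 2 - 1/(n+1)."""
    return 2 - Fraction(1, n + 1)

def y_probe(n):
    """Approaches 2 from the right: 2 + 1/(n+1)."""
    return 2 + Fraction(1, n + 1)

for n in (1, 10, 1000):
    print(n, float(g(x_probe(n))), float(g(y_probe(n))))
print("g(2) =", g(2))
```

The left-hand probe settles near 4 while the right-hand probe settles near $g(2) = 8$: one probe exposes the break and the other misses it.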
On the other hand, look again at the first function whose graph we sketched
above, the one described by f : (−1, 5) → R, f (x) = x(x − 1)(x − 2)(x − 4)
and which we thought ‘looked continuous’ at, for instance, the point where x = 3.
If we send in absolutely any probing sequence (zn ) that converges to 3, we see
from the algebra of sequence limits that f (zn ) = zn (zn − 1)(zn − 2)(zn − 4) →
3(3 − 1)(3 − 2)(3 − 4) which is precisely the value f (3) of f at 3: so no sequence
probe finds any evidence of a break in the graph at x = 3.
These examples – and you might usefully try a few more like them just to
reinforce the point – allow us to see a way to identify graphs that are continuous
at a point x = p by using sequences, and without needing to depend on imprecise
and time-consuming sketches: if at least one sequence probe (xn ) (in the domain
of f ) that converges to p finds that f (xn ) does not converge to f (p), then there is
some kind of continuity-fracturing break at p; on the other hand, if all sequences
(xn ) that converge to p find that f (xn ) → f (p), then there is no such break, and we
have continuity.
This is, incidentally, not the only way to define and identify continuity for real
functions, but in the present context it is the quickest to understand and the easiest
to use, so we shall run with it:
8.3 Continuity at a point

8.3.1 Definition A function f : D → C is continuous at a point p of its domain
D when, for every sequence (xn ) in D that converges to p, we have f (xn ) → f (p).
8.3.2 Example To show that the (polynomial) function described by
$p(x) = x^3 - 6x^2 + 17x + 11$ is continuous at x = 4.
Solution
Let $(x_n)_{n\ge 1}$ be any sequence in the domain (R) of p that converges to 4 as limit.
Then, using the algebra of limits:
$$p(x_n) = x_n^3 - 6x_n^2 + 17x_n + 11 \to 4^3 - 6(4^2) + 17(4) + 11 = p(4).$$
Hence the result.
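As a sanity check on the calculation, here is a minimal sketch that follows one particular probe sequence numerically. The sequence $x_n = 4 + (-1)^n/n$ is a hypothetical choice of ours; the argument above covers every sequence converging to 4, whereas a computation can only ever sample one.

```python
from fractions import Fraction


def p(x):
    """p(x) = x^3 - 6x^2 + 17x + 11."""
    return x**3 - 6 * x**2 + 17 * x + 11


def probe(n):
    """Hypothetical probe sequence x_n = 4 + (-1)^n / n, which converges to 4."""
    return 4 + Fraction((-1) ** n, n)


# The gap |p(x_n) - p(4)| should shrink as n grows.
for n in (1, 10, 100, 1000):
    print(n, float(abs(p(probe(n)) - p(4))))
```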
8.3.3 EXERCISE Show that the (rational) function defined by the formula
$$r(x) = \frac{x^4 + 3x^3 + 8x + 24}{x^4 + x^2 - 20}$$
is continuous at x = −1. Why can your argument not be modified to show that it
is also continuous at x = −2?
8.3.4 Example To show that the ‘floor’ function $j(x) = \lfloor x \rfloor$ is not continuous at
x = 2, but is continuous at x = 2.2.
Solution
To take the second part of the question first . . . if $(x_n)$ is any sequence whose limit
is 2.2 then, for all n ≥ some $n_0$, we shall have $|x_n - 2.2| < 0.2$, that is, $2.0 < x_n < 2.4$,
which implies that $j(x_n) = \lfloor x_n \rfloor = 2$. Ignoring the first $n_0 - 1$ terms of the sequence
(which, as usual, has no effect on its limiting behaviour) we get $j(x_n) \to 2 = j(2.2)$.
Therefore j is continuous at 2.2.
In contrast, if we take the particular sequence $(y_n)$ described by
$y_n = 2 - (n+1)^{-1}$ then certainly $y_n \to 2$. However, since each term of this sequence
lies between 1.5 and 2 and therefore has floor 1, we find that $j(y_n) \to 1 \ne 2 = j(2)$.
The discovery of even one sequence convergent to 2 whose convergence is not
preserved by j shows that j is not continuous at 2.
8.3.5 EXERCISE Verify that the function $s(x) = \lfloor x^2 \rfloor - \lfloor x \rfloor^2$ is not continuous
at $x = \sqrt{3}$.
8.3.6 Example To show that the function f defined by
$f(x) = |x^2 + 3x|$ if x is rational; $f(x) = -|x^2 + 3x|$ if x is irrational
is continuous at x = 0, but not continuous at x = 1.
Solution
Firstly, the domain of f is the whole of R so 0 is a point of its domain. Also note
that, since 0 is rational, $f(0) = |0^2 + 3(0)| = 0$.
For any x we have $-|x^2 + 3x| \le f(x) \le |x^2 + 3x|$ so, if $(x_n)$ is any sequence in
R that converges to zero, we get
$$-|x_n^2 + 3x_n| \le f(x_n) \le |x_n^2 + 3x_n|.$$
Now (as n → ∞) $|x_n^2 + 3x_n| \to 0$ and $-|x_n^2 + 3x_n| \to 0$. The squeeze then tells
us that $f(x_n) \to 0$ also. By our definition, f is continuous at 0.
Secondly, since 1 is rational, $f(1) = |1^2 + 3(1)| = 4$.
We can easily devise a sequence of irrationals that converges to 1: for instance,
$y_n = 1 + \frac{\sqrt{2}}{n}$ will do fine. Then $f(y_n) = -|y_n^2 + 3y_n| \to -|1^2 + 3(1)| = -4$. Since
4 and −4 are different, the definition says that f is not continuous at 1.
(It is quite routine to modify that argument to show that f is not continuous
anywhere except at x = 0 and at x = −3.)
8.3.7 EXERCISE Assuming elementary trigonometric properties, show that the
function f that is defined by
$f(x) = x \sin(x^{-1})$ if $x \ne 0$; f(0) = 0
is continuous at x = 0.
By far the most useful, best behaved functions are those that are continuous not
just at individual points, but at every point in their domains. This is where we shall
now concentrate our attention.
8.4 Continuity on a set

8.4.1 Definition
• A function f : D → C is said to be continuous (or, for emphasis, continuous on

D) if it is continuous at every point of its domain D.
• A function f : D → C is said to be continuous on a set S (where S is a subset of
D) if the restriction of f to S is continuous.
Recall, at this point, that the restriction of f to S is ‘the same function as f was’
except that it is only defined at the points of S. That is, it is the function $f|_S : S \to C$
described by $f|_S(x) = f(x)$ for every x ∈ S. By way of example, the function
$f(x) = \lfloor x \rfloor$ is – as we saw earlier – not continuous at x = 2, but it is continuous
on [2, 3) because it is actually constant there (and, as we shall soon easily verify,
constant functions are always continuous). The difference³ arises because, when
we investigate f itself near x = 2, we must examine what f does to all sequences
that converge to 2 and, since the domain of f is R, that includes sequences that
approach 2 from the left as well as sequences that approach 2 from the right; in
contrast, when we investigate f on the set [2, 3), we only examine how it behaves
on that set, and not therefore what might be happening to the left of 2.
For convenience, we combine definitions 8.3.1 and 8.4.1 into one:
8.4.2 Definition A function f : D → C is continuous precisely when:

• for every sequence (xn ) in D that converges to an element p of D, we have
f (xn ) → f (p).
The displayed condition here is often rendered in English as ‘f preserves limits of

sequences’.
Some functions, naturally enough, are extremely simple to prove continuous:
8.4.3 Example A constant function (say, f : D → C where f (x) = some constant

k for all x ∈ D) is continuous.
Solution
For each p ∈ D and each sequence (xn ) in D that converges to p, we have
$\lim_{n\to\infty} f(x_n) = \lim_{n\to\infty} k = k = f(p)$.
8.4.4 Example An identity function (that is, a function f : D → C where

f (x) = x for all x ∈ D) is continuous.
Solution
For each p ∈ D and each sequence (xn ) in D that converges to p, we have
$\lim_{n\to\infty} f(x_n) = \lim_{n\to\infty} x_n = p = f(p)$.
3 If it helps you to understand this point, feel free to think in terms of the informal description
of continuous functions as those whose domains believe that their graphs are unbroken. The
domain of f includes 2 and numbers to the right and to the left of 2, so it is able to ‘see’ the abrupt
change in height of the graph, from 1 immediately to the left of 2, to 2 at and immediately to the
right of 2; on the other hand, the domain of $f|_{[2,3)}$ is only [2, 3) and, consequently, all it ‘sees’ of the
graph is an unbroken horizontal line.
At this point, if we knew that it was safe to add and multiply continuous
functions and be certain that the results were continuous, we could begin to
build up a useful catalogue of basic continuous functions such as $kx$, $x^2$, $kx^2$,
$kx^2 + mx$, $kx^3 + mx^2 + qx + r$ and many others . . .
8.4.5 Theorem Suppose that the functions f and g are continuous on a set D; then
so are
1. f + g,
2. f − g,
3. their product fg,
4. kf for any constant k,
5. f /g provided that $g(x) \ne 0$ for each x ∈ D, and
6. |f |.
Proof
All six parts are proved in the same rather predictable fashion, so we shall only
demonstrate the (very slightly trickier) part 5.
Let (xn ) be any sequence in D that converges to an element p of D. Because f and
g are continuous on D, we know that f (xn ) → f (p) and that g(xn ) → g(p). We
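additionally know that none of the $g(x_n)$ can be zero, and neither can g(p), because
p and every $x_n$ lie in D. By the algebra of limits for
sequences yet again, $f(x_n)/g(x_n) \to f(p)/g(p)$, that is,
$$\left(\frac{f}{g}\right)(x_n) \to \left(\frac{f}{g}\right)(p).$$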
Thus f /g is continuous.
8.4.6 Corollary
1. Every polynomial is continuous (on its domain R).

2. Every rational function is continuous (on its domain).
Proof
The typical polynomial
$$p(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_n x^n$$
can be built up in a finite number of moves from the basic constituents of x (that
is, the identity function on R) and constants by adding, multiplying and scaling.
Parts 1, 3 and 4 of the theorem assure us that continuity will not be lost in that
process.
The typical rational function is one polynomial divided by another
$$r(x) = \frac{p_1(x)}{p_2(x)}$$
and its domain comprises all real numbers except those at which the denominator
$p_2(x)$ takes the value zero. Once we avoid those points (which, of course, are not
in the domain of r anyway), the first part of the corollary says that r(x) is one
continuous function divided by another that is nonzero, and part 5 of the theorem
tells us that r(x) is continuous.
There is a way of combining functions that simply does not apply to sequences
and, therefore, was not mentioned in the earlier chapters of this text. If f : D → C
and g : C → B are two functions such that the codomain of f equals⁴ the domain
of g then, for each x ∈ D, we see that f (x) lies in the domain of g and therefore
g(f (x)) makes sense and is an element of B. Thus the correspondence of x to g(f (x))
has created a function from D to B. It is called the composite or the composition of
f and g, and denoted by g ◦ f in most textbooks. In summary:
If f : D → C and g : C → B then their composite is the function
g◦f :D→B
specified by
(g ◦ f )(x) = g(f (x)) for each x ∈ D.
8.4.7 Warning Beware that in some books the composite is written as gf and, in
this case, you must take great care not to confuse it with the product g times f .
Just to illustrate this: if $f(x) = \sqrt[3]{x}$ and $g(x) = x^2 + 5$ then the ordinary product
gf is given by $(gf)(x) = g(x)f(x) = (x^2 + 5)\sqrt[3]{x}$, but the composite g ◦ f by
$(g \circ f)(x) = g(\sqrt[3]{x}) = (\sqrt[3]{x})^2 + 5$. Notice also that the composite the other way
round, f ◦ g, is given by $(f \circ g)(x) = f(x^2 + 5) = \sqrt[3]{x^2 + 5}$ and is a completely
different function from g ◦ f .
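The distinction between product and composite is easy to see in code. The following is a minimal sketch, assuming $f(x) = \sqrt[3]{x}$ and $g(x) = x^2 + 5$ as in the warning; the helper names product, g_after_f and f_after_g are our own labels, and the real cube root is computed via copysign so that negative inputs are handled.

```python
import math


def f(x):
    """Real cube root of x (works for negative x as well)."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def g(x):
    """g(x) = x^2 + 5."""
    return x * x + 5


def product(x):
    """The ordinary product: (gf)(x) = g(x) * f(x)."""
    return g(x) * f(x)


def g_after_f(x):
    """The composite g o f: apply f first, then g."""
    return g(f(x))


def f_after_g(x):
    """The composite f o g: apply g first, then f."""
    return f(g(x))


# Three genuinely different functions, compared at the sample point x = 8.
print(product(8.0), g_after_f(8.0), f_after_g(8.0))
```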
The relevance of the composition idea at this point is:
8.4.8 Theorem: continuity of composites If f : D → C and g : C → B are both

continuous then so is their composite
g ◦ f : D → B.
Proof
Let $(x_n)$ be any sequence in D that converges to an element p of D. Because f is
continuous, we know that $f(x_n) \to f(p)$. Yet this convergence takes place inside
the domain of the continuous function g, and therefore $g(f(x_n)) \to g(f(p))$. In
other words, $(g \circ f)(x_n) \to (g \circ f)(p)$. Therefore g ◦ f is continuous.
4 This definition works equally well if the codomain of f is merely a subset of the domain of g.
8.4.9 A look forward Once we get around to checking that $\sqrt[3]{x}$ is continuous, this
theorem will assure us that both of the composites in the above Warning paragraph
are continuous.
More generally, once we have confirmed continuity for functions such as
sin, cos, $e^x$ and so on, Theorem 8.4.8 will help to generate a huge and diverse
array of functions whose continuity we shall know in advance: expressions such as
$$e^{\sin x},\quad \sin(e^x),\quad \cos(\sin(e^{\cos x})),\quad e^{\sin x + e^{\cos x}},\quad e^{\frac{1+x+x^2}{2+3x+4x^2}},\quad \frac{33 - 7\sin^2 x \cos^4 x}{\sin x + \cos x + 5},\quad \cdots$$
will be immediately seen to be continuous for no more complicated reason than
that they have been created out of known basic continuous components by combining
them through virtually any algebraic processes (including composition) that
took care only to avoid dividing by zero. In short: continuity, at nearly any point,
of nearly any formula that you can write down, will be a foregone conclusion except,
of course, where division by zero raises its ugly head.⁵
8.5 Key theorems on continuity

Next, we set out to justify the claim that continuous functions are easier to work
with, and ‘behave better’ than others.
8.5.1 Lemma If s is the supremum of a set A of real numbers, then there is a

sequence of points of A that converges to s.
Proof
For each positive integer n, the definition of supremum tells us that there is an
element $a_n$ of A such that $s - \frac{1}{n} < a_n \le s$. Now the squeeze shows that $a_n \to s$.
8.5.2 The intermediate value theorem (‘IVT’) Let f be continuous on (at least)
a closed bounded interval [a, b] and suppose that either f (a) < λ < f (b) or
f (a) > λ > f (b). Then there is a number c ∈ (a, b) such that f (c) = λ.
(In other words, any number that lies intermediate between two values of a
continuous function on an interval actually is a value of that function.)
5 And provided, of course, that you never try to apply a function to a number that does not lie
in its domain: taking square roots or logarithms of a negative number, for instance, will destroy
not only the continuity of an expression, but also its very meaning.
Proof
Take the first case⁶ f (a) < λ < f (b). We put
X = {x ∈ [a, b) : f (x) < λ},
and we see that X is not empty (for a at least belongs to it) and is bounded above
by b. Put s = the supremum of X.
By the lemma, there is a sequence (an ) in X such that an → s. Continuity gives
us $f(a_n) \to f(s)$ and, since each $f(a_n) < \lambda$, therefore (using 4.1.17)
$$f(s) = \lim_{n\to\infty} f(a_n) \le \lambda. \qquad (1)$$
In particular, since f (s) < f (b), s must be strictly less than b, and the whole of
(s, b] lies outside X: that is, f (y) ≥ λ at every point y of (s, b].
Now choose any sequence⁷ $(y_n)$ in (s, b] that converges to s and, just as before,
we get
$$f(s) = \lim_{n\to\infty} f(y_n) \ge \lambda. \qquad (2)$$
From (1) and (2) we conclude that f (s) = λ.
Notice that the variant of the IVT that uses non-strict inequalities is also perfectly
valid (and doesn’t count as a different theorem):
8.5.3 The intermediate value theorem (again) Let f be continuous on (at least)
a closed bounded interval [a, b] and suppose that either f (a) ≤ λ ≤ f (b) or
f (a) ≥ λ ≥ f (b). Then there is a number c ∈ [a, b] such that f (c) = λ.
Proof
If f (a) < λ < f (b) or f (a) > λ > f (b) then there is nothing new to prove, because
the original IVT applies. Yet if λ equals f (a) or f (b), the result is trivial anyway:
c = a or b will satisfy.
6 For the second case, apply the conclusion of the first case to the continuous function (−f ).
7 For instance, $y_n = s + \frac{b-s}{n}$ will do.
8.5.4 Example To show that the equation $p(x) = x^2(x^2 - 4)(x - 3) - 1 = 0$ has
at least one (real) solution.
Proof
Since p is continuous (being just a polynomial) and p(0) = −1 is negative, all
we need to do is to find a value of x such that p(x) is positive: for then zero will lie
between two values of p, and the IVT will guarantee that zero actually is a value of p,
as required. A little experimentation will readily find, for example, that p(1) = 5
which is positive. Therefore by 8.5.2 there is some c ∈ (0, 1) such that p(c) = 0.
8.5.5 EXERCISE
1. Fill in the details in the following alternative (but heavier-handed) solution of
the last example: if n is a positive integer then
$$\frac{p(n)}{n^5} \to 1 \text{ as } n \to \infty$$
so $p(n)/n^5$ will be positive for all sufficiently large values of n. Therefore we
can find a positive integer $n_2$ such that $p(n_2)$ is positive. Similarly we can find a
negative integer $n_1$ such that $p(n_1)$ is negative. Now the IVT says that there is
$c \in (n_1, n_2)$ such that p(c) = 0.
2. Think how you could modify the demonstration of part 1 to prove that every
polynomial equation of odd degree (that is, one in which the highest power of
x appearing is an odd power) has at least one real solution.
8.5.6 Example To show that the equation $p(x) = x^2(x^2 - 4)(x - 3) - 1 = 0$ has
five (real) solutions.
Roughwork
This needs rather more trial-and-error . . . more precisely, it requires enough
number-crunching to find not just one but five intervals over which p(x) changes
sign. We eventually discovered this:
p(−2) = −1; p(−1) = +11; p(0) = −1;

p(1) = +5; p(2) = −1; p(4) = +191.
Now we can use the same argument as in the previous example:
Solution
Since p is a continuous polynomial and p(−2) = −1 < 0 < 11 = p(−1), there
exists c1 in (−2, −1) such that p(c1 ) = 0.
Since p is a continuous polynomial and p(−1) = 11 > 0 > −1 = p(0), there
exists c2 in (−1, 0) such that p(c2 ) = 0.
Since p is a continuous polynomial and p(0) = −1 < 0 < 5 = p(1), there exists
c3 in (0, 1) such that p(c3 ) = 0.
Since p is a continuous polynomial and p(1) = 5 > 0 > −1 = p(2), there exists
c4 in (1, 2) such that p(c4 ) = 0.
Since p is a continuous polynomial and p(2) = −1 < 0 < 191 = p(4), there
exists c5 in (2, 4) such that p(c5 ) = 0.
Because of the intervals in which they lie, c1 , c2 , c3 , c4 and c5 must all be different
solutions of the equation. (You probably also know that a polynomial equation
of degree 5, such as this, can never have more than five solutions: indeed, that a
polynomial equation of degree n can never have more than n distinct solutions.)
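The number-crunching in the roughwork can be delegated to a few lines of code. This is a minimal sketch, assuming only the formula for p given in the example; the list of trial points is the one used above, and integer arithmetic keeps every value exact.

```python
def p(x):
    """p(x) = x^2 (x^2 - 4)(x - 3) - 1."""
    return x**2 * (x**2 - 4) * (x - 3) - 1


# Trial points from the roughwork, in increasing order.
points = [-2, -1, 0, 1, 2, 4]
values = [p(x) for x in points]

# Consecutive pairs of trial points across which p changes sign.
sign_changes = [
    (a, b) for a, b in zip(points, points[1:]) if p(a) * p(b) < 0
]

print(values)
print(sign_changes)
```

Each listed pair is an interval to which the IVT applies, so each contains a solution of p(x) = 0.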
8.5.7 Example To show that a function that is continuous on an interval and all
of whose values are rational must actually be constant on that interval.
Solution
Let f : I → C be continuous on the interval I. If it were not constant, we could
find $x_1$, $x_2$ in I such that $f(x_1) \ne f(x_2)$. Suppose, to make the picture more definite,
that $x_1 < x_2$ and that $f(x_1) > f(x_2)$ (the other cases will work out in a very similar
manner). We know that $f(x_1)$, $f(x_2)$ are rational, but we can choose an irrational
number j that lies between them.⁸ By the IVT, j must be a value of f at a point
somewhere in the interval I: but this contradicts what we were told about its values
all being rational.
8.5.8 Example Given a continuous function f on the interval [0, 3] such that
f (0) = −f (3), to show that the equation f (x) + f (x + 2) = f (x + 1) has a solution
in [0, 1].
Solution
We create a new function (suggested by the equation we are trying to solve)
g(x) = f (x) − f (x + 1) + f (x + 2). This is defined on [0, 1] and is continuous
there (because it has been built from continuous components f , x + 1 and x + 2).
Notice that
g(0) = f (0) − f (1) + f (2)
and
g(1) = f (1) − f (2) + f (3) = f (1) − f (2) − f (0) = −(f (0) − f (1) + f (2))
have opposite signs.⁹ Thus 0 lies intermediately between two values g(0), g(1) of
continuous g and therefore is a value of it: 0 = g(t) for some t ∈ [0, 1], that is,
0 = f (t) − f (t + 1) + f (t + 2) or f (t) + f (t + 2) = f (t + 1).
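To see the construction at work on a concrete case, here is a minimal sketch. The sample function $f(x) = x^3 - 13.5$ is a hypothetical choice of ours (it satisfies f(0) = −f(3)), and zero_by_bisection is an illustrative helper that repeatedly halves an interval on which g changes sign.

```python
def f(x):
    """Hypothetical sample function on [0, 3]; note f(0) = -13.5 = -f(3)."""
    return x**3 - 13.5


def g(x):
    """The auxiliary function from the solution, defined for x in [0, 1]."""
    return f(x) - f(x + 1) + f(x + 2)


def zero_by_bisection(h, lo, hi, steps=60):
    """Approximate a zero of a continuous h on [lo, hi], given h(lo) <= 0 <= h(hi)."""
    if not (h(lo) <= 0 <= h(hi)):
        raise ValueError("need h(lo) <= 0 <= h(hi)")
    for _ in range(steps):
        mid = (lo + hi) / 2
        if h(mid) < 0:
            lo = mid   # the sign change is in the right-hand half
        else:
            hi = mid   # the sign change is in the left-hand half
    return hi


t = zero_by_bisection(g, 0.0, 1.0)
# At t the two sides of f(x) + f(x + 2) = f(x + 1) should agree (up to rounding).
print(t, f(t) + f(t + 2), f(t + 1))
```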
8 For instance, $f(x_2) + \frac{f(x_1) - f(x_2)}{\sqrt{2}}$ would do.
9 Unless, of course, both are equal to zero: but then the result is immediate.
8.5.9 EXERCISE Given a continuous function f : [0, 1] → [0, 1], show that the
equation
$$(f(x))^2 = x^5$$
must have at least one solution in [0, 1].
Roughwork
Does $g(x) = (f(x))^2 - x^5$ define a continuous function? Also pay attention to the
codomain of f this time.
8.5.10 Optional extra – another proof of IVT This proof ought to remind you of
how we proved Bolzano-Weierstrass . . . which is very timely: for very shortly we
are going to be making serious use of Bolzano-Weierstrass at last.
8.5.11 Lemma Suppose that f : [a, b] → R is continuous on the interval [a, b],
and f (a) < 0, and f (b) ≥ 0. Then there is c ∈ (a, b] such that f (c) = 0.
Proof
As a piece of temporary jargon, let us call [a, b] a signchange interval for f to mean
that f (a) < 0 and f (b) ≥ 0. We are going to look for smaller signchange intervals
for this function.
Consider the midpoint m = (a + b)/2 of [a, b]. If f has a negative value here,
then [m, b] is a signchange interval for f ; if not, then [a, m] is a signchange interval
for f ; in each case we have found one half of [a, b] – label this half [a1 , b1 ] – that is
a signchange interval for f .
Repeat the process: we shall find one half of [a1 , b1 ] – label this half [a2 , b2 ] –
that is a signchange interval for f .
Repeat the process: we shall find one half of [a2 , b2 ] – label this half [a3 , b3 ] –
that is a signchange interval for f .
Continue indefinitely.
We are producing two sequences (an ) and (bn ) in [a, b] and, because each
interval contains the next one, they satisfy
$$a_1 \le a_2 \le a_3 \le a_4 \le \cdots < b; \qquad b_1 \ge b_2 \ge b_3 \ge b_4 \ge \cdots > a.$$
So these two sequences are monotonic and bounded, and therefore converge:
$$a_n \to c \ (\text{as } n \to \infty); \qquad b_n \to c' \ (\text{as } n \to \infty)$$
for some $c$, $c'$ in [a, b]. But also, since each interval has just half the length of the
previous one:
$$b_n - a_n = (b - a)/2^n.$$
Taking limits there, we see that $c' - c = 0$: in other words, $c = c'$.
Now f is negative at the left endpoint of each signchange interval so $f(a_n) < 0$
for all n, whence (using continuity at last) $f(c) = \lim_{n\to\infty} f(a_n) \le 0$.
Equally, $f(b_n) \ge 0$ for all n, whence (via continuity) $f(c') = \lim_{n\to\infty} f(b_n) \ge 0$.
Since $c = c'$, when we combine these we get f (c) = 0 as desired.
(Also, since f (a) < 0 and f (c) = 0, a and c cannot be equal; so c ∈ (a, b].)
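The halving procedure in this proof is, in effect, an algorithm. The following is a minimal sketch of it, with the infinite process cut off after a fixed number of steps; the function name bisect_signchange and the default of 60 steps are our own choices, and the polynomial p from Example 8.5.4 is used purely as a test case.

```python
def bisect_signchange(f, a, b, steps=60):
    """Halve a signchange interval [a, b] (f(a) < 0 <= f(b)) `steps` times.

    Returns the endpoints of the final signchange interval.
    """
    if not (f(a) < 0 <= f(b)):
        raise ValueError("[a, b] is not a signchange interval for f")
    for _ in range(steps):
        m = (a + b) / 2
        if f(m) < 0:
            a = m   # f negative at the midpoint: [m, b] is a signchange interval
        else:
            b = m   # otherwise [a, m] is a signchange interval
    return a, b


def p(x):
    """Test case: the polynomial of Example 8.5.4, with p(0) < 0 <= p(1)."""
    return x**2 * (x**2 - 4) * (x - 3) - 1


left, right = bisect_signchange(p, 0.0, 1.0)
print(left, right, p(right))
```

In floating-point arithmetic the two endpoints eventually become adjacent machine numbers, so the code yields only an approximation to the point c whose existence the lemma guarantees.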
8.5.12 Theorem – the IVT yet again Let f be continuous on a closed bounded
interval [a, b] and suppose that either f (a) ≤ λ ≤ f (b) or f (a) ≥ λ ≥ f (b). Then
there is a number c ∈ [a, b] such that f (c) = λ.
Proof
If λ = f (a) or f (b), the result is immediate. Otherwise, apply the last lemma to the
function f (x) − λ (or to the function −f (x) + λ for the case f (a) > λ > f (b)).
8.5.13 Definition If A is a subset of the domain of a function f : D → C,

the notation f (A) means the set of all the values f (x) as x varies over A. More
formally,
f (A) = { f (x) : x ∈ A}
and is called the image of A or, sometimes, the range of f over the set A. Since it is a
set of real numbers, it may be bounded above, or bounded below, or bounded, or
have a maximum, or a minimum, or a supremum, or an infimum. By convention,
these various properties are then also ascribed to f : we speak of the function being
bounded above, or bounded below, or bounded, or having a maximum value,
or a minimum value, or a supremum, or an infimum on the set A. Once again
it is perfectly possible for a well-behaved function not to have a maximum or a
minimum value, or even not to be bounded at all: easy examples can be set up to
show this. However, for a continuous function on a closed, bounded interval, no
such eccentric behaviour can occur – as we shall now show.
8.5.14 Theorem – continuous functions on closed bounded intervals are

bounded If f : [a, b] → R is continuous, then it is bounded on [a, b].
Proof
Suppose firstly that it were not bounded above. Then, for each positive integer n,
n cannot be an upper bound for the range f ([a, b]), so there is a point xn ∈ [a, b]
such that f (xn ) > n.
According to Bolzano-Weierstrass, the bounded sequence $(x_n)$ thus created has
a convergent subsequence: $x_{n_k} \to p$ for some p ∈ [a, b]. Now continuity tells
us that $f(x_{n_k}) \to f(p)$ so, in particular, $(f(x_{n_k}))$ is a convergent sequence, and
therefore bounded. Yet it is not: for we arranged that, for each positive integer k,
$f(x_{n_k}) > n_k \ge k$. The contradiction shows that f must have been bounded above.
Secondly, much the same argument will yield a contradiction from supposing
that f were not bounded below. (Alternatively, since we now know that continuous
functions on [a, b] are always bounded above, apply that fact to the continuous
function (−f ): for ‘(−f ) bounded above’ and ‘f bounded below’ say exactly the
same thing.)
This theorem entitles us to speak of the supremum and the infimum of any
continuous function on a closed bounded interval, and yet there is better news
than that – the sup and the inf are actually values of the function: so we can safely
speak of the function’s biggest and smallest values instead.
8.5.15 Theorem – sup and inf are attained If f : [a, b] → R is continuous, then
it possesses a maximum and a minimum value on [a, b].
Proof
Knowing from the previous theorem that f ([a, b]) is bounded and therefore has
a supremum $f_{\sup}$ and an infimum $f_{\inf}$, we need to find $x_0$, $x_1$ in [a, b] such that
$f(x_0) = f_{\inf}$ and $f(x_1) = f_{\sup}$.
For each n ∈ N, the definition of supremum tells us that there is a point $y_n$ in
[a, b] such that
$$f_{\sup} - \frac{1}{n} < f(y_n) \le f_{\sup}.$$
Once more we have a bounded sequence $(y_n)$, and once more Bolzano-Weierstrass
promises us a convergent subsequence: say
$$y_{n_k} \to p \text{ as } k \to \infty$$
where also p ∈ [a, b]. Appealing to continuity, we find that $f(p) = \lim_{k\to\infty} f(y_{n_k})$.
Yet, when we take limits (as k → ∞) across the inequality
$$f_{\sup} - \frac{1}{n_k} < f(y_{n_k}) \le f_{\sup}$$
we get $f_{\sup} - 0 \le f(p) \le f_{\sup}$, in other words, $f(p) = f_{\sup}$ as required.

A similar argument will show that finf is a value (and therefore the least value)
of f over the interval. (Alternatively, we could apply what we just proved to show
that (−f ) attains a greatest value and, as is easily seen, minus the greatest value of
(−f ) is, precisely, the least value of f .)
8.5.16 Optional extra – an alternative proof that suprema are attained Again
let f : [a, b] → R be continuous, and suppose that it does not attain a maximum
value. In that case, $f_{\sup}$ is always strictly greater than the values of f , so the function
$$g(x) = \frac{1}{f_{\sup} - f(x)}$$
is defined and continuous everywhere on [a, b]. By Theorem 8.5.14, g(x) is bounded: there
is a positive constant K such that, for every x ∈ [a, b]:
$$g(x) \le K, \text{ that is, } \frac{1}{f_{\sup} - f(x)} \le K, \text{ that is, } f_{\sup} - f(x) \ge \frac{1}{K}, \text{ that is, } f(x) \le f_{\sup} - \frac{1}{K}.$$
This, however, contradicts $f_{\sup}$ being the supremum (the least possible upper
bound) of the values of f .
8.5.17 EXERCISE Show by means of examples (preferably simple ones) that

1. A continuous function on a bounded open interval can fail to be bounded,
2. A continuous function on a bounded open interval, even if it is bounded, can
fail to have a maximum value and can fail to have a minimum value,
3. A continuous function on an unbounded closed interval can fail to be
bounded,
4. A continuous function on an unbounded closed interval, even if it is bounded,
can fail to have a maximum value and can fail to have a minimum value,
5. A discontinuous function on a bounded closed interval can fail to be bounded,
6. A discontinuous function on a bounded closed interval, even if it is bounded,
can fail to have a maximum value and can fail to have a minimum value.
8.5.18 Example Given a continuous function f : [0, ∞) → [0, ∞) such that:
for each ε > 0 there is K > 0 such that f (x) < ε whenever x > K,
show that f has a maximum value.
Solution
(Note that we cannot immediately use the ‘supremum is attained’ theorem since
the domain of f here is an unbounded interval. But it would be good if we could
somehow force the action to take place on a closed bounded interval and, since f (x)
is nearly 0 for very big values of x, that perhaps could be arranged . . . )
In the special case where f is constant at 0, the result is trivial.
If not, then we can find a ∈ [0, ∞) such that f (a) > 0.
Then the given condition on f tells us that we can find a positive number K
so that:
for every x > K, we get 0 ≤ f (x) < f (a)/2.
It should be obvious that K ≥ a. Now on the closed bounded interval [0, K], f
must have a biggest value (f (b), say), and this is at least as big as f (a). It is therefore
bigger than every value that f can take on (K, ∞) since they are all smaller than
f (a). In other words, f (b) is the maximum value that f takes anywhere in [0, ∞).
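The strategy of the solution can be illustrated numerically. In this minimal sketch the sample function $f(x) = x e^{-x}$, the point a = 2 and the cut-off K = 3.5 are all hypothetical choices of ours (this f decreases on [1, ∞), and f(3.5) is already below f(a)/2). A crude grid search stands in for the ‘supremum is attained’ theorem, so the output is only suggestive.

```python
import math


def f(x):
    """Hypothetical sample: continuous, non-negative on [0, inf), tending to 0."""
    return x * math.exp(-x)


a = 2.0   # a point with f(a) > 0
K = 3.5   # assumed cut-off: f(x) < f(a)/2 for every x > K

# Crude search for the biggest value of f on the closed bounded interval [0, K],
# sampling at multiples of 0.001.
grid = [i / 1000 for i in range(3501)]
b = max(grid, key=f)

# f(b) is at least f(a), hence bigger than anything f takes beyond K.
print(b, f(b), f(a))
```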
8.5.19 EXERCISE (Assuming standard information about the exponential function)
given a continuous function f : [0, 10000] → (0, ∞), show that there is a
positive constant b such that
for every x ∈ [0, 10000], $f(x) > b e^x$.
(One approach is to consider $f(x)e^{-x}$. Pay attention to the codomain of f .)

8.5.20 EXERCISE Let f : R → R be any continuous function. We define a new

continuous function g : R → R like this:
g(x) = f (5 sin x + 7 cos x), (x ∈ R).
Show that there is a positive constant K such that |g(x)| < K for every x ∈ R. (You
can assume standard facts about the trig functions.)
8.6 Continuity of the inverse

Not all continuous functions have inverses but, for those that do, the inverse is
also continuous in most of the cases that we encounter. Let us first revise the idea
of inverse mapping. If f : A → B is a mapping or map or function (of any kind, not
only one of the real functions that we are otherwise exclusively focusing upon)
then:
• We call it one-to-one or injective if $x \ne x'$ in A implies that $f(x) \ne f(x')$ in B,
that is, if $f(x) = f(x')$ happens only in the obvious special case where $x = x'$;
• We call it onto or surjective if its range is the whole of its codomain (not just a
subset of it), that is, if every element y of B is f (x) for some suitably chosen x
in A;
• We call it bijective or a bijection¹⁰ if it is both one-to-one and onto.
• An identity mapping is a map idA : A → A whose domain and codomain are
identical and which leaves every element unchanged: that is, idA (x) = x for
every x in A.
• If (given a map f : A → B) there is another map g : B → A such that
$g \circ f = \mathrm{id}_A$ and $f \circ g = \mathrm{id}_B$, in other words, such that g(f (x)) = x for every x in
A and f (g(y)) = y for every y in B then¹¹ f is said to be invertible, g is called the
inverse of f , and g is usually written as $f^{-1}$.
• It is important not to confuse the inverse map $f^{-1}$ with the reciprocal $\frac{1}{f}$: they
are very different ideas. For instance, if f is the function described by the
formula $x^2 + 1$ then its reciprocal $\frac{1}{f}$ is, of course, defined by the formula
$\frac{1}{x^2 + 1}$; yet f is not bijective, so the inverse map $f^{-1}$ does not exist (see the next
bullet point). Again, if g is defined by the formula $x^3 + 1$ then, as is readily
checked, the inverse map is that defined by $\sqrt[3]{x - 1}$ with domain R, which is
entirely different from the reciprocal function $\frac{1}{x^3 + 1}$ with domain
(−∞, −1) ∪ (−1, ∞).
• The key result from the basic theory of sets and mappings is that the invertible
mappings are precisely the bijective mappings: f has an inverse if and only if it is
both one-to-one and onto.
10 or just plain one-to-one onto
11 less formally, such that f and g completely cancel out one another’s effect
That last result is important even in very elementary algebra. For instance, what
is the inverse of $f(x) = x^2$? The short answer is that it doesn’t have one . . . if, by
that brief formula, we mean the function
f : R → R given by $f(x) = x^2$:
because it is obvious that this function (now that we have described it fully) is
neither one-to-one nor onto.¹² If we modify the definition to
f : R → [0, ∞) given by $f(x) = x^2$
then at least it becomes onto, since every element of [0, ∞) is the square of some
real number, but it is still not one-to-one. However, if we modify the definition
again to read
f : [0, ∞) → [0, ∞) given by $f(x) = x^2$
then what we are looking at now is both one-to-one and onto, so it does possess an
inverse.
The inverse, naturally, is the square root function – the function
g : [0, ∞) → [0, ∞) given by $g(x) = \sqrt{x}$ – because it is clear that $\sqrt{x^2} = x$ and
$(\sqrt{y})^2 = y$ for all non-negative x and y. This pattern of seeking an inverse for
some important function that initially did not have one, by restricting the domain
and/or codomain of its defining formula until it becomes one-to-one and onto, is
common and valuable – we shall meet it again in Chapter 18.
Incidentally, it is important to keep in mind that the last three display lines
defined three different functions, even though we (rather incorrectly) used the
same letter f to stand for all of them.
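The effect of restricting the domain can be checked on a handful of sample points. This is a minimal sketch, assuming the squaring function and the square root function as described above; the names square and sqrt_inverse and the list of samples are our own, and floating-point results are compared with a small tolerance.

```python
import math


def square(x):
    """The formula f(x) = x^2, with no domain restriction built in."""
    return x * x


def sqrt_inverse(y):
    """The square root function g : [0, inf) -> [0, inf)."""
    if y < 0:
        raise ValueError("y must lie in [0, inf)")
    return math.sqrt(y)


# On [0, inf) the two maps undo one another (checked at a few sample points).
samples = [0.0, 0.5, 1.0, 2.0, 9.0]
round_trips_ok = all(
    math.isclose(sqrt_inverse(square(x)), x, abs_tol=1e-12)
    and math.isclose(square(sqrt_inverse(x)), x, abs_tol=1e-12)
    for x in samples
)

# On all of R the squaring map is not one-to-one, and the round trip fails.
collision = square(1.0) == square(-1.0)
round_trip_fails = sqrt_inverse(square(-3.0)) != -3.0

print(round_trips_ok, collision, round_trip_fails)
```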
Just as, amongst sequences, the monotonic ones were often easier to work with
than the rest, functions whose values steadily increase or steadily decrease have
some desirable and useful properties, and it will pay us to give clear definitions to
these classes of function now:
8.6.1 Definition A (real) function f : D → C is said to be
• increasing (on D) if x < y in D ⇒ f (x) ≤ f (y),
• strictly increasing (on D) if x < y in D ⇒ f (x) < f (y),
• decreasing (on D) if x < y in D ⇒ f (x) ≥ f (y),
• strictly decreasing (on D) if x < y in D ⇒ f (x) > f (y),
• monotonic if it is either increasing or decreasing,
• strictly monotonic if it is either strictly increasing or strictly decreasing.
(These properties are usually easy to visualise in a sketch graph, simply by
looking to see whether the graph climbs steadily up the page or down as we scan
across from left to right.)
12 For example, because f (1) = f (−1), and because −3 is in the codomain but is not in the
range of f .
It turns out that a continuous function on an interval is one-to-one if and only
if it is either strictly increasing or strictly decreasing. For that reason, when we set
out to establish results concerning inverses of important functions, we really lose
nothing in the way of generality if we deal only with strictly monotonic functions.
8.6.2 Lemma If f : D → f (D) is strictly monotonic then it is also one-to-one and
possesses an inverse. The inverse is strictly increasing if f is strictly increasing, and
is strictly decreasing if f is strictly decreasing.
Proof
We’ll consider only the case where f is strictly increasing – the other is very similar.
If $x \neq y$ in D then either x < y or y < x. Accordingly either f (x) < f (y) or
f (y) < f (x): and in both cases, $f(x) \neq f(y)$, as required for injectivity. The choice
of codomain has ensured that f is also onto, so it is invertible.
Given p < q in f (D) choose x, y in D such that p = f (x) and q = f (y), that is,
$x = f^{-1}(p)$ and $y = f^{-1}(q)$. If it were true that x ≥ y then the increasing nature of
f would yield the contradiction p = f (x) ≥ f (y) = q. Hence we must have x < y,
that is, $f^{-1}(p) < f^{-1}(q)$. Therefore $f^{-1}$ is a strictly increasing function.
8.6.3 Lemma Let f : I → R be a continuous function on an interval I. Then its
range f (I) is also an interval.
Proof
Any number y that lies between two elements f (x1 ), f (x2 ) of the range is, according
to the IVT, a value of f , that is, another element of the range: but this is the defining
characteristic of an interval.
8.6.4 Lemma Let f : I → R be a continuous strictly monotonic function on an
open interval I. Then its range f (I) is also an open interval.
Proof
For any element f (x) of the range, x ∈ I cannot be an endpoint of the (open)
interval I, so we can find $x'$, $x''$ in I such that $x' < x < x''$. Depending on whether
f is (strictly) increasing or decreasing, it follows that either $f(x') < f(x) < f(x'')$ or
$f(x') > f(x) > f(x'')$. In both cases, f (x) fails to be an endpoint for the interval f (I),
which therefore cannot include any of its endpoints: hence the result.
8.6.5 EXERCISE Confirm that

• if f is an increasing (respectively, decreasing) function then −f is a decreasing
(respectively, increasing) function,
• if f is a strictly increasing (respectively, strictly decreasing) function then −f is
a strictly decreasing (respectively, strictly increasing) function,
• if f and g are both decreasing (on the same domain) then so is f + g but
• there is an example of two strictly decreasing functions f and g (on the same
domain) whose product fg is strictly increasing.
8.6.6 Remark – optional extra Once we think in more detail about what a con-
tinuous strictly monotonic function can do to intervals of various types, a picture
emerges that is rather more complicated than the last lemma suggests. As a way of
building intuition about this, you could check out the details summarised in the
following table:
Table 8.1. Possible forms of f (I) for interval I and continuous strictly monotonic f
I | f (I) if f is continuous and strictly increasing | f (I) if f is continuous and strictly decreasing
[a, b] | [f (a), f (b)] | [f (b), f (a)]
(a, b) | (c, d) or (c, ∞) or (−∞, d) or (−∞, ∞) | (c, d) or (c, ∞) or (−∞, d) or (−∞, ∞)
[a, b) | [f (a), d) or [f (a), ∞) | (c, f (a)] or (−∞, f (a)]
(a, b] | (c, f (b)] or (−∞, f (b)] | [f (b), d) or [f (b), ∞)
[a, ∞) | [f (a), d) or [f (a), ∞) | (c, f (a)] or (−∞, f (a)]
(a, ∞) | (c, d) or (c, ∞) or (−∞, d) or (−∞, ∞) | (c, d) or (c, ∞) or (−∞, d) or (−∞, ∞)
(−∞, b] | (c, f (b)] or (−∞, f (b)] | [f (b), d) or [f (b), ∞)
(−∞, b) | (c, d) or (c, ∞) or (−∞, d) or (−∞, ∞) | (c, d) or (c, ∞) or (−∞, d) or (−∞, ∞)
(−∞, ∞) | (c, d) or (c, ∞) or (−∞, d) or (−∞, ∞) | (c, d) or (c, ∞) or (−∞, d) or (−∞, ∞)
8.6.7 EXERCISE – optional extra Confirm that the table above is complete and
correct. That is, for each row, verify that when the interval I has the indicated form
and the function f : I → R is continuous and strictly increasing, f (I) must take
one of the listed forms, and confirm by examples that each listed form actually can
occur; then repeat the exercise for f continuous and strictly decreasing.
Partial solution
Let us consider just the fifth row, the one in which I is of the form [a, ∞).
When f is strictly increasing, since a is the least element of I = [a, ∞)
the interval f (I) must have a least element, namely, f (a). Also, no element
f (x) of f (I) can be a greatest element of f (I) because $x \in I \Rightarrow \exists x' \in I$ with
$x < x' \Rightarrow f(x) < f(x') \in f(I)$. Therefore [f (a), d) and [f (a), ∞) are the only
possible forms for f (I).
The continuous strictly increasing function $f(x) = x^2$ on [0, ∞) maps [0, ∞)
onto [0, ∞).
The continuous strictly increasing function $f(x) = -x^{-1}$ on [1, ∞) maps [1, ∞)
onto [−1, 0).
The third column entries can be confirmed by noting that if f is continuous and
strictly decreasing then −f is continuous and strictly increasing, and then using
what we just established in the second column.
8.6.8 Proposition Suppose that f : I → f (I) is strictly increasing on the interval
I, and continuous at a point p ∈ I. Then the inverse $f^{-1} : f(I) \to I$ is continuous
at the point f (p).
Proof
The inverse exists and is strictly increasing by 8.6.2.
Firstly, we shall look in detail at the case where p is not an endpoint of I
(and, consequently, f (p) is not an endpoint of the interval f (I)). Let $(y_n)$ be any
sequence in f (I) that converges to f (p); we need to show that $(f^{-1}(y_n))$ converges
to $f^{-1}(f(p)) = p$.
For each n ∈ N, $y_n = f(x_n)$ for some (unique) $x_n \in I$, namely $x_n = f^{-1}(y_n)$. Let
ε > 0 be given. There is no loss of generality in assuming that ε is small enough
to ensure that the interval [p − ε, p + ε] lies inside I: for were it not so, we would
replace ε by a smaller number that does ensure this. Consequently, we can talk
about f (p − ε) and f (p + ε), and know that the first is smaller than f (p) and the
second is greater than f (p). Since $y_n \to f(p)$, we see that¹³ there will be a positive
integer $n_0$ such that
$n \geq n_0 \Rightarrow y_n \in (f(p - \varepsilon), f(p + \varepsilon))$
which, in turn, implies that
$x_n = f^{-1}(y_n) \in (p - \varepsilon, p + \varepsilon)$
merely because $f^{-1}$ is strictly increasing. So $n \geq n_0$ forces $|x_n - p| < \varepsilon$, and we
have $x_n \to p$ as required.
Secondly, if I possesses a left-hand endpoint and p happens to be that endpoint,
we need to make small changes to the argument of the last paragraph. This time
we ensure that the interval [p, p + ε] lies inside I. Then f (p + ε) makes sense and
is greater than f (p), so $y_n$ must belong to [f (p), f (p + ε)) for all n ≥ some $n_0$. It
follows that (for these large values of n) $x_n = f^{-1}(y_n)$ belongs to [p, p + ε), and we
again conclude that $x_n \to p$.
The third possibility – that I possesses a right-hand endpoint and p happens to
be that endpoint – works out in a manner very similar to the second.
13 If this step is not sufficiently clear to you, try putting δ equal to the smaller of the two
numbers f (p) − f (p − ε) and f (p + ε) − f (p), and notice that (for sufficiently large values of
n) we shall have $|y_n - f(p)| < \delta$.
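[Figure: Continuity of an inverse mapping. The horizontal axis is marked at p − ε, $x_n$, p and p + ε; the vertical axis is marked at f (p − ε), $y_n$, f (p) and f (p + ε); here $x_n = f^{-1}(y_n)$.]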
Almost exactly the same argument will show that this also works for decreasing
in place of increasing:
8.6.9 Proposition Suppose that g : I → g(I) is strictly decreasing on the interval I,
and continuous at a point p ∈ I. Then the inverse $g^{-1} : g(I) \to I$ is continuous at
the point g(p).
(Alternatively, you may be able to see how to prove the second proposition for
free, just by applying the first proposition to the increasing function −g.)
Combining the two propositions (invoked at each point of the domain) and
Lemma 8.6.2, we obtain the continuous inverse theorem in the form that is usually
most useful:
8.6.10 Theorem A real function, defined on an interval, that is continuous and
strictly increasing on that interval, possesses an inverse (defined upon its range)
that is also continuous and strictly increasing.
A real function, defined on an interval, that is continuous and strictly decreasing
on that interval, possesses an inverse (defined upon its range) that is also
continuous and strictly decreasing.
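The theorem guarantees that an inverse exists, but it supplies no formula for it. One way to see the inverse numerically is bisection: if f is continuous and strictly increasing on [a, b], then for each y between f (a) and f (b) the unique solution of f (x) = y can be trapped by repeated halving. The sketch below is only an illustration under those assumptions; the helper name `inverse_by_bisection`, the tolerance and the sample function x³ + x are our own choices and do not come from the text.

```python
def inverse_by_bisection(f, y, a, b, tol=1e-12):
    """Approximate the x in [a, b] with f(x) = y.

    Assumes (without verifying it) that f is continuous and strictly
    increasing on [a, b]; y must then lie in [f(a), f(b)].
    """
    if not (f(a) <= y <= f(b)):
        raise ValueError("y is outside the range [f(a), f(b)]")
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid      # the solution lies to the right of mid
        else:
            hi = mid      # the solution lies at mid or to its left
    return (lo + hi) / 2

def cube_plus_x(x):
    """A continuous, strictly increasing sample function on R."""
    return x ** 3 + x

x_for_10 = inverse_by_bisection(cube_plus_x, 10.0, 0.0, 3.0)   # f(2) = 10
x_for_2 = inverse_by_bisection(cube_plus_x, 2.0, 0.0, 3.0)     # f(1) = 2
```

Notice that the computed inverse values come out in the same order as the inputs 2 < 10, in line with the statement that the inverse is itself strictly increasing.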
8.6.11 EXERCISE Verify that
• for each odd positive integer n, the function $f(x) = \sqrt[n]{x}$ is continuous on R,
• for each even positive integer n, the function $f(x) = \sqrt[n]{x}$ is continuous on
[0, ∞).
.........................................................................
9 Limit of a function
.........................................................................
9.1 Introduction
As in the previous chapter, it may be useful to begin by outlining informally and
visually the next topic that we are going to define and investigate. Let’s start by
reviewing the sketch graphs we drew in our first attempt to explain continuity.
[Figures: sketch graphs of the functions f, g, h, k, m, p, q, r and s]
We initially presented continuity (at a particular point x = a) as a way to
distinguish between functions such as f , k and r whose graphs did not seem to
possess any sort of break at the point of interest, and the others whose graphs did –
in some sense – break there. Amongst the discontinuous functions, however, some
of those graphs are seen to be more severely broken than others: g, for instance,
has a gap of 4 units between the left and right portions of its graph, and s has what
appears to be an infinite gap. In contrast, h and m could be thought of as failing
continuity on a mere technicality: either because the function has not been defined
at all at x = a, or because it has, but in a way that is incompatible with its behaviour
close to x = a. Similar remarks apply to p and q.
One way to make explicit the difference between g and s on the one hand, and
h, m, p and q on the other, is to ask how small a change in the function’s definition
would alter it into a continuous function. In the case of m, simply writing in the
extra phrase ‘m(−1) = −5/4’ makes the modified function continuous (because
it turns m into k); in the case of h, altering the value of h(−1) from −1 to −5/4
would have the same effect. That is, altering (or creating) the value of the function
at one single point is all it takes to turn m or h into a continuous function. Similar
remarks – again – apply to p and q. In sharp contrast, there is no way to convert g
or s to continuity by intervening at a single value – the gaps in their graphs simply
cannot be bridged by moving (or filling in) just one point.
This insight also gives us a way to build a proper mathematical definition of the
distinction between functions like g and s, and functions such as h, m, p and q. Since
a function of the second type (let us now denote it by f : D → R) fails continuity
at x = a only because its value at a is either undefined or, in some sense, ‘wrongly
defined’, we can use sequences to probe its behaviour near to a exactly as we did
for ordinary continuity but consistently avoiding x = a. So, starting with Chapter
8’s definition of continuity at the point a:
for every sequence (xn ) in D such that xn → a, we find that f (xn ) → f (a)
we should firstly replace the (perhaps undefined) f (a) by a symbol for the limiting
number to which all the sequences (f (xn )) need to converge, and secondly prevent
the sequences (xn ) from including a as one or more of their terms. This suggests
that the defining characteristic of functions of the second type is this: there is a real
number $\ell$ such that
for every sequence $(x_n)$ in D \ {a} such that $x_n \to a$, we find that $f(x_n) \to \ell$.
When this is the case, the number $\ell$ which the values of f are approximating
better and better as we approach a (but without actually reaching a, of course) is
called the limit of f (x) as x tends to a or the limit of f at a. In this language, looking
back at our sketch graphs, we intend to say that both h and m have limits of −5/4
at −1, that both p and q have limits of 1 as we approach 0, but that g does not
have a limit as we approach 2 and that s does not have a limit at 0. (Of course, we
still need to show that these are true statements – but at least we now possess a
logically sound definition against which to test their truthfulness.) We also gain
from the discussion an alternative definition of continuity, namely: a function f
is continuous at a point a of its domain if the limit of f as we approach a is
precisely f (a).

[Figure: Suggested definition of function limit. As $x_n \to a$ along the horizontal axis, the values $f(x_1), f(x_2), f(x_3), \ldots$ on the vertical axis approach $\ell$.]
Next, a technical warning: since the definition that we are setting up (of limit
of f (x) as x approaches a) is entirely dependent on what happens to sequences in
D \ {a} that converge to a, we must take care never to use it if, in fact, there are no
such sequences! For instance, any attempt to find a limit of ln x as x approaches −1,
or of $\sqrt{x}$ at x = −0.3, or of arcsin x as x tends to 2, is doomed to fail since these
functions are undefined close to the point that we are claiming to approach. For
a subtler example, think about the factorial function f (x) = x! . Now, we don’t
often consider ‘factorial’ as a real function at all since it only handles non-negative
integers but, nevertheless, it does satisfy the requirements of our definition of a real
function (with domain N ∪ {0}). Let us ask, then: what is the limit of this function
x! as x tends to, say, 2? The domain D of this function is {0, 1, 2, 3, 4, 5, · · · }, so
D \ {2} is {0, 1, 3, 4, 5, · · · }, and the definition needs us to look at a typical sequence
in {0, 1, 3, 4, 5, · · · } that converges to 2 . . . but no such sequences exist: a sequence in
that set never gets within the distance 1 of 2, so it cannot converge to 2. We conclude
that the limit of x! as we approach 2 (or, indeed, as we approach any other number)
cannot be defined.
The final matter that we ought to stress before concluding this introductory
section is that, although functions such as m and h appear somewhat artificial at
first sight – contrived examples, designed to deliver a teaching point rather than
practical, useful algebra or calculus – questions concerning the limiting values
of non-continuous functions do turn up in a very large number of important
application-oriented problems, and we should outline a couple of these before
starting our serious study of function limits. The first is one that we have touched
on already (see the functions p, q, r again), and the second introduces an idea that
is fundamental to differential calculus, which we shall work on in Chapter 12.
9.1.1 Example When x is interpreted as an angle in radians, the ratio $\frac{\sin x}{x}$
calculates out very close to 1 when x is small. This is the basis of many arguments
in Science as well as in Mathematics proper, in which sin x is replaced by (the much
easier to handle) x as a high-quality approximation. Let us ask, then: when is $\frac{\sin x}{x}$
exactly 1?
Reply
Never. There is no value of x for which $\frac{\sin x}{x} = 1$. Of course, as is widely known,
the ratio is very close to 1 provided that x is sufficiently close to 0, but a careful
examination shows that the only solution of the equation sin x = x is x = 0, and we
cannot replace x by 0 in the ratio since $\frac{0}{0}$ is meaningless. What this example is
informally expressing is that the limit of the function $\frac{\sin x}{x}$, as x approaches 0, is 1.
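A quick numerical experiment makes the behaviour of this ratio visible. The sketch below tabulates sin x / x for a few small non-zero values of x on either side of 0 and then confirms that the formula cannot be evaluated at x = 0 itself. The names `ratio`, `xs`, `table` and `defined_at_zero` are ours, and the sample points are an arbitrary choice.

```python
import math

def ratio(x):
    """The ratio sin(x)/x, which is undefined at x = 0."""
    return math.sin(x) / x

# Approach 0 through non-zero values from both sides.
xs = [10.0 ** (-k) for k in range(1, 6)]
table = [(x, ratio(x), ratio(-x)) for x in xs]
for x, right, left in table:
    print(f"x = {x:.0e}   sin(x)/x = {right:.12f}   sin(-x)/(-x) = {left:.12f}")

# At x = 0 itself the formula breaks down: Python refuses to evaluate 0/0.
try:
    ratio(0.0)
    defined_at_zero = True
except ZeroDivisionError:
    defined_at_zero = False
```

A word of caution: for values of x much smaller than those used here, floating-point rounding makes the computed ratio print as exactly 1.0. That is an artefact of machine arithmetic, not a counterexample to the Reply above.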
9.1.2 Example The point P = (3, 9) lies on the graph of the quadratic function
$f(x) = x^2$. In an attempt to evaluate the slope of the graph exactly at the point
P, we take a nearby point on the graph – say, the point $Q = (3 + h, (3 + h)^2)$ –
and work out the gradient of the straight line PQ. Provided that the number h is
really small, that straight line should hug the curve closely enough to ensure that
the gradient of PQ will be a good approximation to the gradient of the curve itself
and – thinking imprecisely for a moment – when h = 0, the approximation ought
to become perfect. What, then, actually does happen to the gradient of PQ when
we allow h to become zero?
Reply
It ceases to have any meaning whatsoever. Since the gradient of the straight line
PQ (change in y-coordinate divided by change in x-coordinate) is actually
$$\frac{(3 + h)^2 - 3^2}{(3 + h) - 3} = \frac{6h + h^2}{h},$$
replacing h by 0 gives, once again, the meaningless symbol $\frac{0}{0}$. Notice that we cannot
simply cancel an h top and bottom in the previous display unless we write $h \neq 0$
into the contract, because cancelling just means dividing top and bottom lines by h,
and this is illegal precisely in the special case h = 0 that we really wanted to get
at. What the example is trying to work towards is not the value of $\frac{(3 + h)^2 - 3^2}{(3 + h) - 3}$
at h = 0, but its limit at that point: for this is what will give us the gradient of the
curve itself at the point P.
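The same point can be checked in exact arithmetic. In the sketch below (our own illustration; the names `secant_gradient`, `steps`, `gradients` and `meaningful_at_zero` are hypothetical), the gradient of PQ is computed with rational numbers for several small non-zero h, and the attempt to use h = 0 is shown to fail.

```python
from fractions import Fraction

def secant_gradient(h):
    """Gradient of the line PQ, with P = (3, 9) and Q = (3 + h, (3 + h)**2)."""
    h = Fraction(h)
    return ((3 + h) ** 2 - 3 ** 2) / ((3 + h) - 3)

# Non-zero steps h = 1/10, 1/100, 1/1000, 1/10000.
steps = [Fraction(1, 10 ** k) for k in range(1, 5)]
gradients = [secant_gradient(h) for h in steps]
# Exact arithmetic shows that each gradient equals 6 + h for these h.

# With h = 0 the quotient is 0/0, and the computation is rejected.
try:
    secant_gradient(0)
    meaningful_at_zero = True
except ZeroDivisionError:
    meaningful_at_zero = False
```

The gradients 6 + h settle towards 6 as h shrinks, even though the expression has no value at h = 0.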
9.2 Limit of a function at a point

As the ‘technical warning’ pointed out, we must first think about the kind of point
at which calculation of a function limit makes sense:
9.2.1 Definition Let S be a set of real numbers and p a real number. We call p a
limit point¹ of S if there is at least one sequence of elements of S \ {p} that converges
to p. (Note that p may or may not be an element of S.)
9.2.2 Notes
1. The only case that will occupy our attention is that in which S is the domain of
some function. Then p being a limit point of S is exactly what is needed in
order that we can sensibly try to find a limit of that function at p.
2. If S happens to be an interval then it is easy to see that the limit points of S are
precisely the points of S and the endpoints of S (one or both of which, of
course, might be elements of S already).
3. In nearly all the examples in this text, the domain of a function will be either a
non-degenerate interval or the union of a finite list of non-degenerate
intervals. For such a domain D it is again easy to see that the limit points are
simply the points of D together with all the endpoints of those intervals. (The
recent exception was x!, whose domain N ∪ {0} had no limit points
whatsoever.)
4. In the event that you might need to deal with some function whose domain is
more complicated, the following lemma may be useful in identifying limit
points:
1 also called an accumulation point

9.2.3 Lemma If S ⊆ R and p ∈ R, then p is a limit point of S if and only if:
for each ε > 0, the interval (p − ε, p + ε) includes a point of S that is different from p.
Proof
If p is a limit point of S, choose a sequence (xn ) in S\{p} such that xn → p. Then for
any choice of ε > 0 there are, in fact, infinitely many xn in the interval (p−ε, p+ε),
and they are all different from p.
Conversely, suppose that the displayed condition holds. Then, choosing ε = 1/n
for each positive integer n in turn, we can find a point of S (call it xn ) in (p−ε, p+ε)
but distinct from p. The sequence (xn ) of elements of S \ {p} thus created satisfies
p − 1/n < xn < p + 1/n so, using the squeeze, it converges to p. Hence p is indeed
a limit point of S.
9.2.4 Optional exercise Verify that

• the limit points of the set of rational numbers include all real numbers,
• the set of integers has no limit points,
• every positive integer is a limit point of the set
$\{n + (m + 1)^{-1} : n \in \mathbb{N},\ m \in \mathbb{N}\}$.
9.2.5 Definition Suppose that f : D → C, that p is a limit point of its domain D
and that $\ell \in \mathbb{R}$. Then we say that f (x) converges to the limit $\ell$ as x → p if:
for every sequence $(x_n)$ in D \ {p} such that $x_n \to p$, we find that $f(x_n) \to \ell$.
It is also common practice to call $\ell$ the limit of f at p, and to write all this as
$$\lim_{x \to p} f(x) = \ell$$
or to abbreviate it as $f(x) \to \ell$ (as x → p).
9.2.6 Example To show that the function f given by
$$f(x) = \frac{x^2 - 9}{x - 3}$$
converges to a limit of 6 as x → 3.
Solution
The domain of f is D = (−∞, 3) ∪ (3, ∞) so it has 3 as a limit point.
Let $(x_n)$ be any sequence in D \ {3} such that $x_n \to 3$. Then
$$f(x_n) = \frac{x_n^2 - 9}{x_n - 3} = \frac{(x_n - 3)(x_n + 3)}{x_n - 3} = x_n + 3 \to 6.$$
(There is an important point near the end of the line which it is all too easy to miss.
When we cancel $x_n - 3$, what we are doing is dividing top and bottom by $x_n - 3$. Of
course we dare not divide by zero . . . but since $x_n$ belongs to D \ {3}, we know that
$x_n - 3$ is definitely non-zero; it is precisely this detail that allows us to cancel, and
thus saves us from hitting a nonsensical conclusion, such as: that the limit is $\frac{0}{0}$.)
Hence f (x) → 6 as x → 3.
Alternative solution
To see more easily what is happening when x is close to 3, it often helps to put
x = 3 + h and then consider h → 0 instead. Thus, given any sequence $(x_n)$ in
D \ {3} such that $x_n \to 3$, if we set $x_n = 3 + h_n$ for each n, then we see that:
$$f(x_n) = f(3 + h_n) = \frac{(3 + h_n)^2 - 9}{(3 + h_n) - 3} = \frac{6h_n + h_n^2}{h_n} = 6 + h_n \to 6.$$
(Please note again that the cancellation is legal precisely because we are not
cancelling or dividing zeros: $h_n$ can never be exactly zero because $x_n$ is never
exactly 3.)
We conclude that $\lim_{x \to 3} f(x) = 6$.
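The sequence definition can be probed numerically, although no finite experiment can replace the argument above, which covers every admissible sequence at once. The sketch below evaluates f exactly along three sample sequences in D \ {3} that converge to 3. The three sequences and the names `sequences` and `final_gaps` are our own illustrative choices.

```python
from fractions import Fraction

def f(x):
    """f(x) = (x**2 - 9)/(x - 3), defined for every x except 3."""
    return (x * x - 9) / (x - 3)

# Three sample sequences in D \ {3} converging to 3: from the right,
# from the left, and alternating between the two sides.
sequences = {
    "right": [3 + Fraction(1, n) for n in range(1, 2001)],
    "left": [3 - Fraction(1, n * n) for n in range(1, 2001)],
    "alternating": [3 + Fraction((-1) ** n, n) for n in range(1, 2001)],
}

# Distance from f(x_n) to the claimed limit 6 at the last term of each sequence.
final_gaps = {name: abs(f(xs[-1]) - 6) for name, xs in sequences.items()}
```

Because no term of these sequences equals 3, the division inside `f` never meets a zero denominator, which mirrors the remark about cancellation in the solution.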
For a fairly straightforward question such as the previous one, there was very
little to choose between the two different solutions we demonstrated. However, if
the algebra is more complicated and the possibility of cancelling less obvious, then
the trick of putting x = a + h to explore close to x = a can save you quite a bit of
time and effort.
9.2.7 Example To determine the limit, as x → −1, of the function
$$f(x) = \frac{x^5 + 1}{x^4 - 1}.$$
Solution
Since the only real numbers x for which $x^4 = 1$ are 1 and −1, the bottom line is
zero only at 1 and −1. Hence the domain D of this function is
(−∞, −1) ∪ (−1, 1) ∪ (1, ∞), and −1 is a limit point of D.
Given any sequence $(x_n)$ in D \ {−1} whose limit is −1, put $x_n = -1 + h_n$ for
each n, and notice that
$$f(x_n) = f(-1 + h_n) = \frac{(-1 + h_n)^5 + 1}{(-1 + h_n)^4 - 1} = \frac{h_n^5 - 5h_n^4 + 10h_n^3 - 10h_n^2 + 5h_n}{h_n^4 - 4h_n^3 + 6h_n^2 - 4h_n} = \frac{h_n^4 - 5h_n^3 + 10h_n^2 - 10h_n + 5}{h_n^3 - 4h_n^2 + 6h_n - 4}$$
(noting that $h_n \neq 0$ is what allows us that cancellation, and that to keep the new
bottom line non-zero we also need to prevent $x_n = 1$, that is, avoid letting $h_n = 2$).
Now since $h_n \to 0$ we see that, by ignoring the first few terms if necessary, we
can be sure that $h_n \neq 2$. Then
$$f(x_n) = \frac{h_n^4 - 5h_n^3 + 10h_n^2 - 10h_n + 5}{h_n^3 - 4h_n^2 + 6h_n - 4} \to \frac{5}{-4} = -\frac{5}{4}.$$
Hence $\lim_{x \to -1} f(x) = -5/4$.
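The expansion and cancellation in this solution are easy to get wrong by hand, so an exact arithmetic check can be reassuring. The sketch below (our own illustration, with the hypothetical names `cancelled`, `hs`, `agree` and `gaps`) compares the original quotient at x = −1 + h with the cancelled form for several small non-zero h, and measures the distance to −5/4.

```python
from fractions import Fraction

def f(x):
    """f(x) = (x**5 + 1)/(x**4 - 1), defined except at x = 1 and x = -1."""
    return (x ** 5 + 1) / (x ** 4 - 1)

def cancelled(h):
    """The quotient after cancelling one factor of h (valid for h != 0, h != 2)."""
    return (h**4 - 5*h**3 + 10*h**2 - 10*h + 5) / (h**3 - 4*h**2 + 6*h - 4)

# Small non-zero steps of both signs: 1/10, -1/10, 1/100, -1/100, ...
hs = [Fraction(s, 10 ** k) for k in range(1, 7) for s in (1, -1)]

# The two expressions agree exactly wherever both are defined.
agree = all(f(-1 + h) == cancelled(h) for h in hs)

# Distance from f(-1 + h) to the claimed limit -5/4.
gaps = [abs(f(-1 + h) - Fraction(-5, 4)) for h in hs]
```

The entries of `gaps` shrink roughly in proportion to |h|, which is consistent with the limit found above.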
9.2.8 EXERCISE The point P = (2, 2) lies on the curve described by $y = f(x) =
x^3 - 3x^2 + 6$. Determine the gradient of this curve at the point P. (Hint: set the
problem up as in 9.1.2, and deal with the limit as in 9.2.6.)
9.2.9 Example The function f is defined as follows:
$f(x) = 0$ when x = 2 or x = 5/2; otherwise $f(x) = \frac{6x^2}{2x - 5}$.
Determine its limiting behaviour as x → 2.
Solution
The domain is R since, although the fraction formula fails to make sense at x = 5/2,
a separate definition of f (5/2) has been provided. (Why a separate definition of
f (2) has also been provided is not obvious, but the definition of f as a whole is
unambiguous.)
We consider any sequence $(x_n)$ in R \ {2} whose limit is 2, and we put $x_n = 2 + h_n$
for each n, where $h_n \to 0$. By ignoring (if necessary) the first few terms, we can
arrange that $|x_n - 2| < 1/2$, that is, $3/2 < x_n < 5/2$: therefore the separate definition
of f at 5/2 is no longer relevant, and
$$f(x_n) = \frac{6x_n^2}{2x_n - 5} = \frac{6(2 + h_n)^2}{2(2 + h_n) - 5} = \frac{24 + 24h_n + 6h_n^2}{2h_n - 1}$$
whose limit (using again the algebra of limits for sequences) is −24. Therefore
$$\lim_{x \to 2} f(x) = -24.$$
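To see the two special values of f at work, the sketch below evaluates f exactly along one sample sequence that converges to 2 and deliberately begins at 5/2. The sequence and the names `xs` and `values` are our own choices for illustration; a single sequence cannot establish the limit, but it shows the mechanism.

```python
from fractions import Fraction

def f(x):
    """Example 9.2.9: 0 at x = 2 and x = 5/2, otherwise 6x^2/(2x - 5)."""
    if x == 2 or x == Fraction(5, 2):
        return Fraction(0)
    return 6 * x * x / (2 * x - 5)

# A sequence in R \ {2} that converges to 2 and starts exactly at 5/2.
xs = [2 + Fraction(1, 2 * n) for n in range(1, 5001)]
values = [f(x) for x in xs]
```

The first value is the specially defined f (5/2) = 0, yet the later values still approach −24: an early term at 5/2 has no influence on the limiting behaviour, and the value f (2) = 0 is never used because no term equals 2.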
9.2.10 Remark Two general points that emerge from the last example are worth
stressing. Firstly, although the value ascribed to f (2) was a distinctly odd choice,
this had absolutely no effect upon the limit as x approached 2 because, when
exploring limiting behaviour of f (x) as x approaches a point p, we don’t care what
(if anything) f (x) does when x exactly equals p – only how it behaves when x is very
close to but distinct from p.
Secondly, the separate definition given to f (5/2) was also essentially irrelevant
to the problem since any investigative sequence (xn ) converging to 2 will eventually
be closer to 2 than 5/2 is, so that the limiting behaviour of (f (xn )) cannot be
influenced by how f behaves at 5/2 or, indeed, at any significant distance from
2. This insight can even be rephrased into an occasionally useful theorem:
9.2.11 Theorem: limits are locally determined Let f : D → C be a function,
p be a limit point of its domain D and η a positive real number. Now let
g : D ∩ (p − η, p + η) → C be the same function² as f except that it is only
defined on D ∩ (p − η, p + η). Then $f(x) \to \ell$ as x → p if and only if $g(x) \to \ell$ as
x → p.
Proof
Suppose that $g(x) \to \ell$ as x → p. For any sequence $(x_n)$ in D \ {p} that converges
to p, since η > 0 we can find $n_0$ such that $n \geq n_0$ implies that $p - \eta < x_n < p + \eta$,
which in turn shows that
$f(x_n) = g(x_n) \to \ell$ as n → ∞
and hence $f(x) \to \ell$ as x → p.
The converse is simpler because any sequence in (D \ {p}) ∩ (p − η, p + η)
(converging to p) already is a sequence in D \ {p} (converging to p).
Comment
What that result says is that, if you wish to determine the limit of f as x approaches
p, it is good enough to narrow your attention to what f does in any open interval
centred on p. By way of illustration, if you were asked to find the limit at x = 3.3
of the following function
$$h(x) = \begin{cases} x^{0.37}\sqrt{5x^4 + 29}\,\sin(3x^2 + \pi/17) & \text{if } x < 3, \\ 2x & \text{if } x \geq 3 \end{cases}$$
then you could choose to work with h(x) as if it were only defined on, say, (3.1, 3.5).
Yet on that interval, h(x) = 2x is just twice an identity function, so its limit is almost
immediately seen to be 6.6.
2 Such a function g is more properly called a restriction of f : in this case, the restriction of f to
D ∩ (p − η, p + η).
The usefulness of the insight, that the limit as x → 3.3 is only influenced by
what happens locally at 3.3, for example, on the interval (3.1, 3.5), is that it allows
us to ignore completely the more complicated behaviour of the function outside
that locality.
9.2.12 EXERCISE Show that $\lfloor x \rfloor$, the floor of x,
1. possesses a limit as x tends to any real number p that is not an integer,
2. does not possess a limit as x tends to any integer. (Hint: if, for an integer m, the
function f given by $f(x) = \lfloor x \rfloor$ does have a limit $\ell$ as x → m, then for every
sequence $(x_n)$ in R \ {m} such that $x_n \to m$ we must have $f(x_n) \to \ell$. Try this
with, for instance, $x_n = m + (n + 1)^{-1}$ and then again with
$x_n = m - (n + 1)^{-1}$ to seek a contradiction.)
9.2.13 EXERCISE The function g is defined as follows:
$g(x) = 0$ when x is any integer; otherwise $g(x) = \frac{x^3}{x + 2}$.
Determine its limiting behaviour as x → 13 .
The next example is something of a trick question but, nevertheless, it makes an

important point.
9.2.14 Example To investigate the limit, as x → 0, of the function
$f(x) = (\sqrt{x^4 - x^2})^2$.
Solution
(Aside: yes, of course we want to ‘cancel’ the squaring and the square root, but this
will be legal only when the square root exists as a real number.)
Now $x^4 - x^2$ factorises easily into $x^2(x + 1)(x - 1)$, from which we see that it is
non-negative when x ≤ −1 and when x = 0 and when x ≥ 1, but negative when
x lies in the open interval (−1, 0), and negative again when x lies in (0, 1). Under
our convention about the domain of a formula-defined function comprising all the
real numbers for which the formula delivers a real answer, f (x) exists and is $x^4 - x^2$
on (−∞, −1] ∪ {0} ∪ [1, ∞) but is undefined on (−1, 0) ∪ (0, 1).
Since 0 is the only member of the domain of f that lies in, for instance, the
interval (−0.5, 0.5), 0 is not a limit point of the domain, so the limit of f (as we
approach 0) is not defined.
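The isolation of 0 within the domain can be observed by simply trying to evaluate the formula. The sketch below samples the interval from −0.5 to 0.5 on an evenly spaced grid and records where the formula delivers a real answer; the helper `in_domain` and the grid spacing are our own illustrative choices, and a grid check is of course weaker than the factorisation argument above.

```python
import math

def f(x):
    """(sqrt(x**4 - x**2))**2, defined only where x**4 - x**2 >= 0."""
    return math.sqrt(x ** 4 - x ** 2) ** 2

def in_domain(x):
    """True when the formula for f delivers a real answer at x."""
    try:
        f(x)
        return True
    except ValueError:      # math.sqrt rejects negative arguments
        return False

# Sample [-0.5, 0.5] at 1001 equally spaced points.
grid = [k / 1000 for k in range(-500, 501)]
points_in_domain = [x for x in grid if in_domain(x)]
```

Only the grid point 0 survives, so no sequence of sampled domain points other than 0 itself can approach 0, in agreement with the conclusion that 0 is not a limit point of the domain.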
[Figures: graph of $x^4 - x^2$; graph of $\sqrt{x^4 - x^2}$; graph of $(\sqrt{x^4 - x^2})^2$]
9.2.15 EXERCISE Investigate the limit, as x → 1, of the function $g(x) = e^{\ln(x - 1)}$.
(Hint: begin by identifying the domain of g.)
At this stage we can start developing basic theorems about function limits that
will allow us to handle them more efficiently than by the definition alone. Some
of these will strike you as very predictable, given what we have already seen about
sequences and continuous functions. We start with an observation that justifies
our use of the word the whenever we talk about the limit of a function at a point:
9.2.16 Theorem A function cannot converge to two or more different limits at a
limit point of its domain.
Proof
With a view to a contradiction, suppose that f : D → C, that p is a limit point of D,
that $f(x) \to \ell_1$ and $f(x) \to \ell_2$ as x → p, and that $\ell_1 \neq \ell_2$. Pick any sequence $(x_n)$
in D \ {p} that converges to p as limit. Then $(f(x_n))$ has to converge both to $\ell_1$ and
to $\ell_2$: which is impossible by 2.7.10.
9.2.17 The algebra of limits for functions Suppose that f : D → C and g : B → A
are two functions, that p is a limit point of the intersection D ∩ B of their domains,
and that (as x → p) $f(x) \to \ell$ and $g(x) \to m$. Then, as x → p:
1. $f(x) + g(x) \to \ell + m$,
2. $f(x) - g(x) \to \ell - m$,
3. $f(x)g(x) \to \ell m$,
4. for constant k, $kf(x) \to k\ell$,
5. provided that $m \neq 0$, $\frac{f(x)}{g(x)} \to \frac{\ell}{m}$,
6. $|f(x)| \to |\ell|$.
Proof
These can all be proved in the same (and hopefully obvious) way. For example, let’s
do numbers (3) and (5):
Limit of a product In the above notation, let $(x_n)_{n \in \mathbb{N}}$ be an arbitrary sequence in
$(D \cap B) \setminus \{p\}$ that converges to p. Then (via the algebra of sequence limits)
$$(fg)(x_n) = f(x_n)g(x_n) \to \ell m$$
and therefore $(fg)(x) \to \ell m$ as x → p.
Limit of a quotient (This is the only proof among the six that needs a little extra
caution: because we need to avoid any risk of dividing by zero.)
In the above notation, let $(x_n)_{n \in \mathbb{N}}$ be an arbitrary sequence in $(D \cap B) \setminus \{p\}$ that
converges to p.
Because $g(x) \to m \neq 0$ as x → p, we get $g(x_n) \to m \neq 0$ as n → ∞, so there
is $n_0 \in \mathbb{N}$ such that $n \geq n_0 \Rightarrow g(x_n) \neq 0$. Now (ignoring the first $n_0$ terms, which
has no effect on sequence limits)
$$\left(\frac{f}{g}\right)(x_n) = \frac{f(x_n)}{g(x_n)} \to \frac{\ell}{m} \quad (\text{as } n \to \infty)$$
and therefore
$$\left(\frac{f}{g}\right)(x) = \frac{f(x)}{g(x)} \to \frac{\ell}{m} \quad (\text{as } x \to p)$$
as required.
9.2.18 EXERCISE Choose two other parts of this theorem and write out proofs
for them.
9.2.19 Theorem: a squeeze or sandwich rule for function limits Suppose that
$f : D \to \mathbb{R}$, $g : D' \to \mathbb{R}$, $h : D'' \to \mathbb{R}$ are three functions, that $D' \subseteq D \cap D''$ (that
is, wherever g is defined, so are f and h), that p is a limit point of $D'$, that
$f(x) \leq g(x) \leq h(x)$ for each $x \in D'$ and that
$$\lim_{x \to p} f(x) = \lim_{x \to p} h(x) = \ell, \text{ say.}$$
Then also
$$\lim_{x \to p} g(x) = \ell.$$
9.2.20 EXERCISE Construct a proof of this theorem.
We said in this chapter’s introduction that a limit of a continuous function would

always turn out to be simply the value of the function at the appropriate point. This,
together with its converse, is a highly useful characterisation of continuity that we
shall use many times. The only tricky detail in confirming its correctness is that,
when we test for continuity at a point and for limit at that point, we need to use
two slightly different families of sequences.
9.2.21 Theorem Consider a real function f : D → C and a point p of D that is

also a limit point of D. Then the following are equivalent:
1. f is continuous at p;
2. f (x) → f (p) as x → p.
Proof
First, suppose that (1) holds. With a view to establishing (2), let $(x_n)$ be any
sequence in D \ {p} that converges to p. Then merely because $(x_n)$ is a sequence in
D that converges to p, continuity gives $f(x_n) \to f(p)$ as n → ∞. Since $(x_n)$ was
arbitrary, f (x) → f (p) as x → p as we wanted.
The converse is a little less straightforward. Suppose this time that (2) holds, and
let $(x_n)$ be any sequence in D that converges to p.
• If there are only finitely many values of n for which $x_n = p$ then we can ignore
them without implication for limiting processes, and thus regard $(x_n)$ as a
sequence in D \ {p} that converges to p. By supposition, $f(x_n) \to f(p)$ as
n → ∞.
• At the other extreme, if there are only finitely many values of n for which
$x_n \neq p$, then we can equally well ignore them, regard $(x_n)$ as a constant
sequence (p), and immediately have $f(x_n) = f(p) \to f(p)$ as n → ∞.
• In the remaining case, (xn ) divides up into two (infinite) subsequences, one of
which (call it (yn )) lies in D \ {p} and converges to p, while the other (call it
(zn )) is constant at p. Since both (f (yn )) and (f (zn )) converge to f (p) – the first
via condition (2) and the second because it is constant, an exercise in Chapter 5
(see 5.2.5, 5.2.6) tells us that f (xn ) → f (p) once again.
In each of the three possible scenarios, we have what we needed in order to
conclude that f is continuous at p.
Once more we find that a benefit of possessing a battery of basic theorems is that
we can tackle examples without having to fall back on the definitions:
9.2.22 Example A function f : (0, ∞) → R is defined³ thus: for each irrational
x, put f(x) = 0; for each rational x = p/q where the fraction has been expressed in
its lowest terms, put f(x) = 5/q if q is prime but f(x) = −7/q if q is not prime. To
show that f(x) → 0 as x → 0.
Solution
Notice that |f (x)| ≤ 7x in all cases. That is:
−7x ≤ f (x) ≤ 7x, all x ∈ (0, ∞).
Now the functions 7x and −7x are both continuous (on R) so, using 9.2.21,
limx→0 7x = 0 and limx→0 (−7x) = 0. Using the ‘new squeeze’ 9.2.19 on the
previous display gives limx→0 f (x) = 0 as required.
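As an informal numerical check on the bound used in this solution (not a substitute for the proof), the following Python sketch evaluates f at many positive rationals and confirms that −7x ≤ f(x) ≤ 7x. It reads the definition in 9.2.22 as f(p/q) = 5/q for prime q and −7/q otherwise; the names is_prime, f, samples and small, and the choice of sample grid, are our own illustrative assumptions.

```python
from fractions import Fraction


def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for small denominators."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def f(x: Fraction) -> Fraction:
    """Value of f at a positive rational x (Fraction stores it in lowest terms)."""
    q = x.denominator
    return Fraction(5, q) if is_prime(q) else Fraction(-7, q)


# Exact check of the squeeze inequality on a grid of positive rationals p/q.
samples = [Fraction(p, q) for q in range(1, 60) for p in range(1, 120)]
assert all(-7 * x <= f(x) <= 7 * x for x in samples)

# Along the rationals 1/n the values of |f| shrink together with x = 1/n.
small = [abs(f(Fraction(1, n))) for n in (10, 100, 1000)]
print(small)
```

Exact rational arithmetic is used so that the inequality is tested without rounding error; irrational inputs need no test because f is zero there.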
9.2.23 Example To revisit the determination of the limit, as x → −1, of the
function
f(x) = (x⁵ + 1)/(x⁴ − 1).
Since what happens at −1 is irrelevant to this limit, we can assume that x ≠ −1.
Then (with a little effort of factorisation):
lim f(x) = lim (x⁵ + 1)/(x⁴ − 1)
= lim (x + 1)(x⁴ − x³ + x² − x + 1)/((x + 1)(x − 1)(x² + 1))
= lim (x⁴ − x³ + x² − x + 1)/((x − 1)(x² + 1))
= lim(x⁴ − x³ + x² − x + 1) / lim((x − 1)(x² + 1))
= 5/((−2)(2)) = −5/4
(noting that all the limits are as x → −1, the cancellation was legitimate since
x + 1 was non-zero, the algebra of limits allowed us to operate on the top and
bottom lines separately, and the fact that top and bottom lines were continuous
gave us the numerical values of those two limits immediately.)
3 This is an example in which any attempt at sketching the graph is likely to waste quite a lot
of time!
9.2.24 Example To determine the limit, as x → 4, of the function f described by
f(x) = (√x − 2)/(x − 4) when x ≠ 4; f(4) = 1.
Solution
We can safely assume x ≠ 4 since behaviour at 4 has no consequences for a limit
while approaching 4. With that proviso:
f(x) = (√x − 2)/(x − 4) = (√x − 2)/((√x − 2)(√x + 2)) = 1/(√x + 2) → 1/(2 + 2) = 1/4
because the function √x is continuous (see 8.6.10) and so its limit as we approach
4 is merely its value √4 at 4 (and because we can use the algebra of limits theorem).
9.2.25 EXERCISE Evaluate the limits
lim_{x→3} (x⁴ − 81)/(x⁴ − 8x² − 9) and lim_{x→3} (x³ − 3x² − 9x + 27)/(x⁶ − 243).
9.2.26 HARDER EXERCISE See if you can determine the limit of the function
discussed in 9.2.22 as x → 23 , and its limit as x → π . Don’t worry if this turns
out to be difficult – we shall encounter, in the next chapter, an alternative method
that works more easily for certain questions, including ones like this.
10 Epsilontics and functions
10.1 The epsilontic view of function limits

Up to this point, we have exclusively employed sequences to define continuity
and function limits and to develop their theory. There is, however, an alternative
approach that is very widely used and that works more smoothly for certain types
of question. We need to become familiar with this before going any further.
The basic idea underlying limiting processes is that of approximation: to say
that some (variable) quantity X approaches a limit ℓ asserts that values of X can be
generated that are extremely good approximations to ℓ in the sense that the error
|X − ℓ| is not merely small, but can be made smaller than any required tolerance
just by continuing far enough with the approximation-generating procedure. We
have dealt pretty thoroughly with the case in which the approximation process is
a sequence – an unending list of one number after another – but how much has
to change when X is instead a function, say, a function f(x) where x is tending
towards a limit point p of the domain of f? It still makes perfectly good sense to
call |f(x) − ℓ| the error of the approximation and to insist that this can be made
smaller than any given tolerance ε > 0; the only item that needs to be re-thought
is what controls the approximation-generating procedure this time and how far
we need to go with it: we now need the ‘best’ approximations to be the ones that
correspond to values of x that are very close to p – say, those in which x lies within
a sufficiently small distance δ from p. This discussion suggests a possible definition
of f(x) converging to the limit ℓ as x → p, as follows:
for each ε > 0 there exists δ > 0 such that |x − p| < δ implies that |f(x) − ℓ| < ε
which captures the intuition of high-quality approximation without needing to

call in sequences as a probing mechanism. Two details of this proposed definition
still need refinement, however. Firstly, we need to write in the requirement that
x shall belong to the domain of the function f before we even mention f (x).
Secondly, to acknowledge that what f does or does not do at the exact point x = p
has absolutely no effect on its limit, we should prevent |x − p| from equalling
zero. The epsilontic definition of limit of a function at a point therefore takes the
following final form:
10.1.1 Alternative definition of function limit If f : D → C is a real function and
p is a limit point of its domain D, we say that f(x) converges to the limit ℓ
(as x → p) if
for each ε > 0 there exists δ > 0 such that
x ∈ D and 0 < |x − p| < δ together imply that |f(x) − ℓ| < ε.
As before, we can write this symbolically either as lim_{x→p} f(x) = ℓ or as f(x) → ℓ
as x → p, and we can leave out the phrase ‘x → p’ whenever the context makes it
clear enough that this is what is intended.
Now it is perfectly possible to develop the entire theory of function limits and,
in turn, the entire theory of continuous functions, starting from this definition.
We shall do a little of this just to illustrate that it can be done, but it is a common
student experience that 10.1.1 is a harder definition to use than 9.2.5 (and also a
common lecturer experience that it is a harder definition to teach) which is why
we opted to make 9.2.5 our primary definition here.
A task that we cannot shirk is to show that 9.2.5 and 10.1.1 really are logically
equivalent. This is quite a sophisticated proof, and you might want to omit it on
a first reading, but it is important and urgent to take on board what the result
is saying: that despite their apparent differences, 10.1.1 and 9.2.5 are completely
interchangeable: if either one of them is satisfied, then so must the other be. You
are therefore free to use whichever of the two definitions you please (other things
being equal) in any given argument.
10.1.2 Theorem Let f : D → C be a function and p a limit point of its domain

D. Then the following are equivalent:
1. for every ε > 0 there is δ > 0 such that x ∈ D, 0 < |x − p| < δ together imply
that |f(x) − ℓ| < ε.
2. for every sequence (xn)n∈N of elements of D \ {p} that converges to p, we have
f(xn) → ℓ.
Proof
(I): (1) implies (2).
• Suppose that condition (1) is satisfied.
• Let (xn )n∈N be an arbitrary sequence of elements of D \ {p} that converges
to p.
• For a given positive value of ε, use condition (1) to obtain a number δ > 0
such that whenever x ∈ D and 0 < |x − p| < δ, we have |f(x) − ℓ| < ε.
• Because xn → p, there is a positive integer n0 such that n ≥ n0 will
guarantee that |xn − p| < δ.
• Also 0 < |xn − p| because xn is not equal to p.
• Therefore
n ≥ n0 ⇒ 0 < |xn − p| < δ ⇒ |f(xn) − ℓ| < ε.
• That is, f(xn) → ℓ as required.
• Since (xn ) was any sequence in D \ {p} that happened to converge to p,
condition (2) is now proved.
TIMEOUT
Although the mathematics in the upcoming proof of the converse is reasonably
straightforward, the logical content is easily the most sophisticated that we have
dealt with so far, so let us call ‘timeout’ and pick our way through it one small step
at a time. We’ll also continue, for the moment, to bullet-point out those steps, to
try for a little added clarity.
• Instead of trying to show directly that (2) implies (1), we shall call in the logical
device of contraposition and show instead (but equivalently) that NOT-(1)
implies NOT-(2).
• Condition (1) says that, for every ε that we are challenged with, we can find a
positive δ that ‘works’.
• If this is not true then there is some special and awkward value of ε for which
no value of δ that we choose to try will ‘work’.
• In particular, if we pick a positive integer n and try δ = 1/n, it will not ‘work’.
• In other words, the implication
0 < |x − p| < δ = 1/n ⇒ |f(x) − ℓ| < ε
(for x ∈ D) is not true.
• So there must exist an x in D that satisfies 0 < |x − p| < δ = 1/n and yet the
conclusion |f(x) − ℓ| < ε is false.
• Thus x ∈ D and 0 < |x − p| < δ = 1/n, but |f(x) − ℓ| ≥ ε.
• A refinement of notation: the x that we just discovered will almost certainly
depend on the value of n that we picked several steps ago, so we had better label
it in a way that renders that visible – say, instead of plain x we should write x(n)
or, more simply perhaps, xn .
• Now we suddenly find ourselves in possession of a sequence (xn)n∈N, and we
know the following details about it:
  ◦ each xn ∈ D,
  ◦ each xn is distinct from p because 0 < |xn − p|,
  ◦ (xn) converges to p because |xn − p| < 1/n and 1/n → 0 (look back at 2.7.14
    here if necessary),
  ◦ the sequence (f(xn)) does not converge to ℓ because |f(xn) − ℓ| ≥ ε for all n.
• That is to say, condition (2) – which claims validity for every appropriate
sequence that converges to p, is NOT TRUE.
• The proof, that NOT-(1) implies NOT-(2), is therefore now complete.
Once you have managed to follow the details of that expanded argument, you
should be able to grasp the sort of condensed version that typically appears in
textbooks:
(II): (2) implies (1).
• Suppose that condition (1) is not satisfied.
• That is, there exists a value of ε > 0 for which no suitable δ > 0 can be
found.
• In particular, for each n ∈ N, δ = 1/n is not suitable…
• …and so there is xn ∈ D such that 0 < |xn − p| < 1/n and yet
|f(xn) − ℓ| ≥ ε.
• Therefore (xn)n∈N converges to p (with each xn ∈ D \ {p}), and yet
(f(xn))n∈N does not converge to ℓ.
• That is, condition (2) is not satisfied.
• By contraposition, (2) implies (1).
10.1.3 Example To show that
f(x) = (x³ − 1000)/(x² − 100) → 15 as x → 10
directly from the epsilon-delta definition 10.1.1.
Solution
We can assume that x ≠ 10 since behaviour at 10 is irrelevant to the limit, and so
(x³ − 1000)/(x² − 100) − 15 = (x − 10)(x² + 10x + 100)/((x − 10)(x + 10)) − 15
= (x² + 10x + 100)/(x + 10) − 15
= (x² − 5x − 50)/(x + 10) = (x + 5)(x − 10)/(x + 10).
Let ε > 0 be given. We need to decide: how close to 10 must we take x in order
that this error term shall be less than ε in modulus?
• If we make |x − 10| < 1, that is, 9 < x < 11, then |x + 5| < 16 and
|x + 10| > 19, so |f(x) − 15| < (16/19)|x − 10|.
• If we also make |x − 10| < 19ε/16 then (16/19)|x − 10| < ε.
Therefore choose δ = min{1, 19ε/16} and we find that
0 < |x − 10| < δ ⇒ |f(x) − 15| < (16/19)|x − 10| < ε, as required.
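The choice δ = min{1, 19ε/16} can be spot-checked numerically. The sketch below, a sanity check under our own assumptions rather than part of the proof, samples points with 0 < |x − 10| < δ and tests |f(x) − 15| < ε in exact rational arithmetic. The function names delta_for and delta_works and the sampling scheme are hypothetical choices made for illustration.

```python
from fractions import Fraction


def f(x):
    """The function of Example 10.1.3, defined for x != 10 and x != -10."""
    return (x**3 - 1000) / (x**2 - 100)


def delta_for(eps):
    """The delta chosen in the text: min{1, 19*eps/16}."""
    return min(Fraction(1), Fraction(19, 16) * eps)


def delta_works(eps, samples=500):
    """Test |f(x) - 15| < eps at sample points with 0 < |x - 10| < delta."""
    d = delta_for(eps)
    for k in range(1, samples + 1):
        offset = d * Fraction(k, samples + 1)  # strictly between 0 and d
        for x in (10 - offset, 10 + offset):
            if abs(f(x) - 15) >= eps:
                return False
    return True


for eps in (Fraction(1), Fraction(1, 10), Fraction(1, 1000)):
    print(eps, delta_for(eps), delta_works(eps))
```

A finite sample cannot prove the implication for every x, of course; it can only fail to find a counterexample, which is why the inequalities in the solution are still needed.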
10.1.4 Example To use the epsilon-delta definition of limit to show that a function
cannot converge to two or more different limits at a limit point of its domain.
Proof
With a view to a contradiction, suppose that f : D → C, that p is a limit point of D,
that f(x) → ℓ1 and f(x) → ℓ2 as x → p, and that ℓ1 ≠ ℓ2. Arrange the labelling
so that ℓ1 < ℓ2 for convenience, and put ε = (ℓ2 − ℓ1)/2 > 0. Using 10.1.1, there
exist two positive numbers δ1 and δ2 such that
x ∈ D, 0 < |x − p| < δ1 together imply |f(x) − ℓ1| < ε, and
x ∈ D, 0 < |x − p| < δ2 together imply |f(x) − ℓ2| < ε.
Now since δ1 and δ2 are each positive, so is the lesser of the two (whichever one
it is). Call that lesser δ3. Because p is a limit point of D there must actually be¹ an
element of D – let us call it x′, for instance – that satisfies 0 < |x′ − p| < δ3. So
both of the displayed lines apply to x′, and we therefore know that
|f(x′) − ℓ1| < ε and |f(x′) − ℓ2| < ε.
Now invoke the triangle inequality:
|ℓ1 − ℓ2| = |ℓ1 − f(x′) + f(x′) − ℓ2| ≤ |ℓ1 − f(x′)| + |f(x′) − ℓ2| < ε + ε,
that is, 2ε < 2ε, which is as crisp a contradiction as one can ask for.
10.1.5 EXERCISE Directly use the epsilon-delta definition of limit (10.1.1) to
show that
lim_{x→−6} (x² − 36)/(x³ + 216) = −1/9.
10.1.6 EXERCISE Directly use the epsilon-delta definition of limit to prove that
if p is a limit point of D, and the functions f , g both have D as their domain, and
f(x) → ℓ1 and g(x) → ℓ2 as x → p, then f(x) + g(x) → ℓ1 + ℓ2 as x → p.
In all fairness to the epsilon-delta definition of limit, here is an example in

which it works more easily and more naturally than our primary, sequence-based
definition. The function involved is one we began to study in paragraph 9.2.22.
1 See 9.2.3
10.1.7 Example The function f : (0, ∞) → R is defined thus: for each irrational
x, put f(x) = 0; for each rational x = p/q where the fraction has been expressed in
its lowest terms, put f(x) = 5/q if q is prime but f(x) = −7/q if q is not prime. To
show that f(x) → 0 as x → π.
Solution
Let ε > 0 be given. Consider a positive integer N. The interval (π − 1, π + 1)
includes π of course, and it includes only a finite² number of rationals whose
(lowest-terms) denominators lie in the range 2 to N so, putting δ = the shortest
distance from π to one of these, we get δ > 0 and every rational r in (π − δ, π + δ)
has a denominator greater than N. It follows that |f(r)| < 7/N and, since f is
exactly zero at each irrational, we get
0 < |x − π| < δ ⇒ x ∈ (π − δ, π + δ) ⇒ |f(x) − 0| < 7/N.
Therefore if we choose N so large that 7/N < ε, we have what is required in order
to show that the limit of f at π is 0.
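To get a feel for how small the δ of this argument actually is, one can compute it for particular N. The following sketch is illustrative only: it uses the floating-point value math.pi in place of π, and it scans denominators from 1 (rather than 2) up to N, which can only make δ smaller. The names delta_for and smallest_N are our own hypothetical helpers.

```python
import math


def delta_for(N: int) -> float:
    """Shortest distance from pi to a rational p/q in (pi - 1, pi + 1) with q <= N."""
    best = 1.0
    for q in range(1, N + 1):
        # Numerators p for which p/q can lie inside (pi - 1, pi + 1).
        lo = math.floor((math.pi - 1) * q)
        hi = math.ceil((math.pi + 1) * q)
        for p in range(lo, hi + 1):
            dist = abs(p / q - math.pi)
            if dist < best:
                best = dist
    return best


def smallest_N(eps: float) -> int:
    """An integer N with 7/N < eps, as required at the end of the solution."""
    return math.floor(7 / eps) + 1


for eps in (1.0, 0.5, 0.1):
    N = smallest_N(eps)
    print(eps, N, delta_for(N))
```

For N = 7 the nearest such rational is the familiar approximation 22/7, so delta_for(7) is about 0.00126; larger N forces δ to shrink further.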
10.1.8 EXERCISE Use 10.1.1 to investigate the limit of this function at 23 .
10.2 The epsilontic view of continuity

Intuitively, it is not difficult to guess what the epsilon-delta alternative definition
of continuity at a point should be: for f to be continuous at p, we need f (x) to
have a limit equal to f (p), and there is no longer any need to prevent x from
equalling p since, if that happens, it certainly doesn’t prevent f (x) from being a
good approximation to f (p). We should therefore expect the formal alternative
definition to take this shape:
10.2.1 Alternative definition of continuity If f : D → C is a real function and

p ∈ D, we say that f is continuous at the point p when
for each ε > 0 there exists δ > 0 such that

x ∈ D and |x − p| < δ together imply that |f (x) − f (p)| < ε.
If you carefully compare this with 10.1.1, it will strike you that we have left out
any reference to p being a limit point of D. This is not a casual oversight, for it
turns out that 10.2.1 is entirely equivalent to our original definition of continuity
whether or not p is a limit point of D. The following proof of this assertion is so
like that of 10.1.2 that, were it a bit shorter or a bit less complicated, we would have
asked you to check it out as an exercise (and you might still decide to try that if you
have already properly grasped the argument of 10.1.2).
2 Since the length of this open interval is 2, it cannot include more than 4 rationals of
denominator 2, nor more than 6 rationals of denominator 3, nor more than 8 rationals of
denominator 4, and so on.
10.2.2 Theorem Let f : D → C be a function and p ∈ D. Then the following are

equivalent:
1. for every ε > 0 there is δ > 0 such that x ∈ D, |x − p| < δ together imply that
|f (x) − f (p)| < ε.
2. for every sequence (xn )n∈N of elements of D that converges to p, we have
f (xn ) → f (p).
Proof
(I): (1) implies (2).
• Let (xn)n∈N be an arbitrary sequence of elements of D that converges to p.
• For a given positive value of ε, use condition (1) to obtain a number δ > 0
such that whenever x ∈ D and |x − p| < δ, we have |f(x) − f(p)| < ε.
• Because xn → p, there is a positive integer n0 such that n ≥ n0 will
guarantee that |xn − p| < δ.
• Therefore
n ≥ n0 ⇒ |xn − p| < δ ⇒ |f(xn) − f(p)| < ε.
• That is, f(xn) → f(p) as required.
• Since (xn) was any sequence in D that happened to converge to p,
condition (2) is now proved.
(II): (2) implies (1).
• Suppose that condition (1) is not satisfied.
• That is, there exists a value of ε > 0 for which no suitable δ > 0 can be
found.
• In particular, for each n ∈ N, δ = 1/n is not suitable…
• …and so there is xn ∈ D such that |xn − p| < 1/n and yet
|f(xn) − f(p)| ≥ ε.
• Therefore (xn)n∈N converges to p (with each xn ∈ D), and yet (f(xn))n∈N
does not converge to f(p).
• That is, condition (2) is not satisfied.
• By contraposition, (2) implies (1).
We shall again present a couple of samples of how to use the alternative definition
in arguments:
10.2.3 Example To show, using the epsilontic definition, that a composite of

continuous functions is continuous.
Solution
Suppose that f : D → C and g : C → B are both continuous. We need to verify
that their composite g ◦ f : D → B is continuous. So let p ∈ D and ε > 0 be given.
Since g is continuous at the point f (p) of its domain, there is δ1 > 0 such that
y ∈ C, |y − f (p)| < δ1 together imply |g(y) − g(f (p))| < ε.
Since f is continuous at p (and δ1 > 0), there is δ > 0 such that x ∈ D, |x−p| < δ
together imply |f (x) − f (p)| < δ1 .
Thus, for x ∈ D and |x − p| < δ, we get |f (x) − f (p)| < δ1 and, consequently,
|g(f (x)) − g(f (p))| < ε, that is, |(g ◦ f )(x) − (g ◦ f )(p)| < ε. So g ◦ f is (by the
epsilon-delta definition) continuous at each element of its domain.
10.2.4 Example We define a function f : [0, ∞) → R by:
f(0) = 1, f(x) = x⌊1/x⌋ for x > 0.
To show (using 10.2.1) that f is continuous at 0.
Solution
We start from the fact that (for all real t) t ≥ ⌊t⌋ > t − 1 simply by the definition
of floor or ‘integer part’. So, for x > 0,
1 = x(1/x) ≥ x⌊1/x⌋ = f(x) > x(1/x − 1) = 1 − x.
In particular, |f(x) − 1| < x. Therefore, given ε > 0, if we choose δ = ε, we get
(for x > 0 in the domain, the case x = 0 being trivial)
|x − 0| < δ ⇒ |f(x) − f(0)| = |f(x) − 1| < x < δ = ε
which is our requirement for continuity at 0.
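The key inequality |f(x) − 1| < x for x > 0 is easy to test by machine. Here is a minimal sketch, assuming the reading f(x) = x⌊1/x⌋ of the definition above and using exact fractions so that the floor is computed without rounding error. The sample grid is an arbitrary illustrative choice.

```python
import math
from fractions import Fraction


def f(x: Fraction) -> Fraction:
    """f(0) = 1 and f(x) = x * floor(1/x) for x > 0."""
    if x == 0:
        return Fraction(1)
    return x * math.floor(1 / x)


# Sample points k/1000 for 0 < x <= 3; exact arithmetic throughout.
points = [Fraction(k, 1000) for k in range(1, 3001)]
bound_holds = all(abs(f(x) - 1) < x for x in points)
print(bound_holds)

# With delta = eps, points within delta of 0 give values within eps of f(0) = 1.
eps = Fraction(1, 50)
close = [x for x in points if x < eps]
print(all(abs(f(x) - f(Fraction(0))) < eps for x in close))
```

Note that f(x) = 0 once x > 1, so the bound |f(x) − 1| < x still holds there, although it is only informative for small x.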
10.2.5 EXERCISE Using the epsilon-delta definition of continuity, show that
f(x) = 7x − 5 if x is rational,
f(x) = 4x + 4 if x is irrational
defines a function that is continuous at 3.

10.2.6 EXERCISE Verify, using the epsilon-delta definition of continuity, that the
function discussed in paragraph 10.1.7 is continuous at every positive irrational
number, but discontinuous at every positive rational number.
10.2.7 EXERCISE Show, without using sequences, that the following function
g(x) = x² + 4x − 1 if x < 3,
g(x) = x³ − x² + 3 if x ≥ 3
does not have a limit as x → 3.
10.3 One-sided limits
[Figure: two sketch graphs near x = a, captioned "f(x) → ℓ1 as x → a" and "g(x) → ℓ2 as x → a". How best to describe this?]
There were a few examples in the preceding section (⌊x⌋ close to an integer value
for x and 10.2.7 are cases in point) where the reader can be forgiven for thinking
informally along the following lines: ‘this function f(x) does not have a limit as
x → p … and yet, if we were allowed to look only at values of x close to but just
less than p, it does appear to be settling to a limiting value … and likewise if we
look only at values of x close to but just greater than p.’ This is a perfectly legitimate
idea, and to explore and develop it we need first to formulate a clear definition of
such one-sided limits. By this point in the text you are probably able to guess even
the definitions with high accuracy.
10.3.1 Definition Suppose that f : D → C and that p is a limit point of D ∩ (p, ∞).
Then we call a number ℓ the right-hand limit of f(x) at p, or the limit on the right of
f(x) at p, and say that f(x) converges to ℓ as x → p from the right (or from above)
if, for each ε > 0, there is δ > 0 such that
x ∈ D, p < x < p + δ together imply |f(x) − ℓ| < ε.
This is also written as
lim_{x→p+} f(x) = ℓ or as f(x) → ℓ as x → p+.
[Figure: f(x) tends to ℓ as x tends to p from the right; between p and p + δ the graph lies between ℓ − ε and ℓ + ε]
10.3.2 Definition Suppose that f : D → C and that p is a limit point of
D ∩ (−∞, p). Then we call a number ℓ the left-hand limit of f(x) at p, or the limit
on the left of f(x) at p, and say that f(x) converges to ℓ as x → p from the left (or
from below) if, for each ε > 0, there is δ > 0 such that
x ∈ D, p − δ < x < p together imply |f(x) − ℓ| < ε.
This is also written as
lim_{x→p−} f(x) = ℓ or as f(x) → ℓ as x → p−.
[Figure: f(x) tends to ℓ as x tends to p from the left; between p − δ and p the graph lies between ℓ − ε and ℓ + ε]
10.3.3 Remark You should keep in mind that one-sided limits, just like other
types of limit, can fail to exist. For instance, the function
f(x) = x⁻¹ if x < 0,
f(x) = sin(x⁻¹) if x > 0
does not have a left-hand limit as x → 0 because the function is unbounded close
to 0 on the left, and also does not have a right-hand limit here (informally speaking,
because sin(1/x) oscillates wildly as x → 0+ rather than settling towards a stable
value). To prove that second point properly, you can use an argument involving
two sequences of positive numbers both homing in on x = 0 but at which the sine-
one-over-x function gives two streams of values with different limits. The overall
cautionary comment is that we must never assume that a function has limits of
any kind unless there is enough information given to guarantee that it has. In this
connection, also note Exercise 10.3.5.
[Figure: sketch graph of f]
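The two-sequence argument mentioned in this remark can be previewed numerically. In the sketch below, the particular sequences 1/(nπ) and 1/((2n + 0.5)π) are one convenient choice of ours (other pairs would serve equally well), and floating-point arithmetic means the computed values are only approximately 0 and 1.

```python
import math


def g(x: float) -> float:
    """sin(1/x) for x > 0, the right-hand branch of the function in the remark."""
    return math.sin(1 / x)


# Two sequences of positive numbers, both tending to 0.
xs = [1 / (n * math.pi) for n in range(1, 200)]
ys = [1 / ((2 * n + 0.5) * math.pi) for n in range(1, 200)]

# Along xs the values are (numerically) 0; along ys they are (numerically) 1.
max_dev_xs = max(abs(g(x)) for x in xs)
max_dev_ys = max(abs(g(y) - 1) for y in ys)
print(xs[-1], ys[-1], max_dev_xs, max_dev_ys)
```

Since the two streams of values settle at different numbers, no single right-hand limit can exist, which is exactly what the sequence description of one-sided limits makes rigorous.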
10.3.4 Example In reviewing the work we did earlier on the floor or integer part
⌊x⌋ of x we notice that, in the new notation, our observations amounted to:
lim_{x→p+} ⌊x⌋ = p, lim_{x→p−} ⌊x⌋ = p − 1
for every integer p.
10.3.5 EXERCISE Recall that a function f on an interval (a, b) is said to be

increasing if (throughout the interval)
x < y ⇒ f (x) ≤ f (y).
Show that if f is both increasing and bounded on (a, b) then limx→b− f (x) and
limx→a+ f (x) must both exist.
Partial solution
Think about the supremum and the infimum of the values of f (x) on the interval,
and argue as in the proof that a bounded monotonic sequence has to converge.
10.3.6 Theorem – a sequence description of right-hand limits Let f : D → C

be a function and p a limit point of D ∩ (p, ∞). Then the following are equivalent:
1. lim_{x→p+} f(x) = ℓ,
2. for every sequence (xn)n∈N of elements of D ∩ (p, ∞) that tends to p, we have
f(xn) → ℓ.
Proof
The proof amounts to little more than a re-run of that of 10.1.2 but ensuring that
every xn shall be greater than p.
10.3.7 The algebra of limits for functions as x → p+ Suppose that f : D → C

and g : B → A are two functions, that p is a limit point of the intersection
D ∩ B ∩ (p, ∞), and that (as x → p+) f(x) → ℓ and g(x) → m. Then (as x → p+):
1. f(x) + g(x) → ℓ + m,
2. f(x) − g(x) → ℓ − m,
3. f(x)g(x) → ℓm,
5. f(x)/g(x) → ℓ/m (provided that m ≠ 0),
6. |f(x)| → |ℓ|.
Partial proof
As a sample, let us set up a proof of part (3). For any sequence (xn) in D ∩ B ∩ (p, ∞)
such that xn → p we know (from 10.3.6) that f(xn) → ℓ and that g(xn) → m. The
algebra of limits for sequences tells us that f(xn)g(xn) → ℓm. Thus we see (from
10.3.6 again) that f(x)g(x) → ℓm as x → p+.
10.3.8 Theorem: a squeeze or sandwich rule for function limits as x → p+ Suppose
that f : D → R, g : D′ → R, h : D″ → R are three functions, that
D′ ∩ (p, ∞) ⊆ D ∩ D″ (that is, wherever g is defined at a number greater than p,
so are f and h), that p is a limit point of D′ ∩ (p, ∞), that f(x) ≤ g(x) ≤ h(x) for
each x ∈ D′ ∩ (p, ∞) and that
lim_{x→p+} f(x) = lim_{x→p+} h(x) = ℓ, say.
Then also
lim_{x→p+} g(x) = ℓ.
EXERCISE
Write out a proof of 10.3.8.
All of this material can be tweaked routinely to apply also to left-hand limits,
and the proofs are routine variations of what we have already done.
The last result we set out in this chapter concerns how the two one-sided limits
(of a function, at a point), if they both exist, can either agree to create a limit in the
full sense, or disagree to prevent a limit (in the full sense) existing.
10.3.9 Theorem Let f : D → C be a function and p be a limit point both of
D ∩ (−∞, p) and of D ∩ (p, ∞). Then f(x) → ℓ as x → p if, and only if, both
of the one-sided limits
lim_{x→p−} f(x), lim_{x→p+} f(x)
exist and are equal to ℓ.
Proof
Suppose that f(x) → ℓ as x → p. Given positive ε,³ it is therefore possible to find
δ > 0 such that
x ∈ D, 0 < |x − p| < δ together imply |f(x) − ℓ| < ε.
In particular,
x ∈ D, p < x < p + δ together imply |f(x) − ℓ| < ε,
and since ε was arbitrary, we deduce that f(x) → ℓ as x → p+. Yet it is equally a
consequence that
x ∈ D, p − δ < x < p together imply |f(x) − ℓ| < ε,
and so f(x) → ℓ as x → p− also.
Conversely, suppose that f(x) → ℓ both as x → p− and as x → p+. Given
ε > 0, we use these facts to identify δ′ > 0 and δ″ > 0 for which
x ∈ D, p − δ′ < x < p ⇒ |f(x) − ℓ| < ε and x ∈ D, p < x < p + δ″ ⇒ |f(x) − ℓ| < ε.
Put δ = min{δ′, δ″} and we see that
x ∈ D, 0 < |x − p| < δ ⇒ |f(x) − ℓ| < ε
whether x is less than p or greater. Therefore f(x) → ℓ as x → p.
10.3.10 Example We show that the following function
g(x) = 3x² + 5x − 1 if x < −2,
g(x) = 5x + 11 if x ≥ −2
possesses a limit at −2.
Solution
For any sequence (xn) of numbers less than −2 that converges to −2, we have
g(xn) = 3xn² + 5xn − 1 → 3(−2)² + 5(−2) − 1 = 1
by the algebra of limits for sequences, and therefore lim_{x→−2−} g(x) exists and
equals 1 via the natural sequence description of left-hand limits. Likewise, for any
sequence (xn) of numbers greater than −2 that converges to −2, we have
g(xn) = 5xn + 11 → 5(−2) + 11 = 1.
By the preceding Theorem 10.3.9, the ‘two-sided limit’ lim_{x→−2} g(x) exists (and
equals 1).
Alternatively: for x < −2, the function g and the (continuous) quadratic 3x² + 5x − 1 are
identical, so
lim_{x→−2−} g(x) = lim_{x→−2−} (3x² + 5x − 1) = lim_{x→−2} (3x² + 5x − 1) = 1
(where we tacitly used 10.3.9 to switch from lim_{x→−2−} to lim_{x→−2} for the
quadratic).
Similarly
lim_{x→−2+} g(x) = lim_{x→−2+} (5x + 11) = lim_{x→−2} (5x + 11) = 1.
Now 10.3.9 shows that g(x) → 1 as x → −2.
3 Somewhat unusually, the ε-style definition is more convenient in this demonstration than
the sequence-style alternative.
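A quick numerical look at the two one-sided approaches can make the agreement of the limits visible. The following is a rough sketch under our own choice of test points −2 ± 10⁻ᵏ; it illustrates the conclusion and does not replace the argument via 10.3.9.

```python
def g(x: float) -> float:
    """The piecewise function of Example 10.3.10."""
    if x < -2:
        return 3 * x**2 + 5 * x - 1
    return 5 * x + 11


# Approach -2 from the left and from the right along x = -2 -/+ 10**(-k).
left = [g(-2 - 10.0**-k) for k in range(1, 7)]
right = [g(-2 + 10.0**-k) for k in range(1, 7)]

for k, (a, b) in enumerate(zip(left, right), start=1):
    print(k, a, b)
```

Both columns drift towards 1, the left-hand values from above (they equal 1 + 7h + 3h² for x = −2 − h) and the right-hand values also from above (they equal 1 + 5h for x = −2 + h).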
10.3.11 EXERCISE Given that the function
f(x) = 1/(x² + 1) if x < 2,
f(x) = ax + b if 2 ≤ x ≤ 5,
f(x) = x²/(x² − 21x + 110) if 5 < x < 10
has limits at x = 2 and at x = 5, determine the values of the constants a and b.

11 Infinity and function limits
To speak of infinity in connection with limits can seem almost contrary to the basic
meaning of the word, since its usual import is the complete absence of any limit.
Nevertheless there are many natural and simple functions f in which the value
f (x) settles towards some kind of stable or equilibrium state not as x approaches
a particular (finite) number p, but as x becomes enormously big and positive (or
enormously big and negative). In many ways this is actually closer to the limiting
behaviour of sequences that we first studied, where the focus of attention was on
how the typical term xn behaved as n tended to infinity and, indeed, it is scarcely
possible to draw sketch graphs of functions such as x⁻¹ or x⁻² or arctan x or
(x² − 1)/(x² + 1)
without some phrase about their behaviour as x tends to infinity coming to mind.
Our first objective in this chapter is to formulate clear definitions of these ideas
and to develop some basic theory concerning them. This will offer us very little
difficulty provided we resist the temptation to regard ∞ and −∞ as numbers, or
to use senseless symbols such as |x − ∞| to assess how ‘close’ x is to infinity.
11.1 Limit of a function as x tends to infinity or minus infinity
[Figure: f(x) tends to limit ℓ as x → ∞. For each challenge ε > 0 there is a response K > 0.]
11.1.1 Definition Suppose that f : D → C and that its domain D is not bounded
above.¹ Then we say that f(x) converges to the limit ℓ (or tends to ℓ) as x → ∞ if,
for each ε > 0, there is K ∈ R such that
x ∈ D, x > K together imply |f(x) − ℓ| < ε.
This is also written as
lim_{x→∞} f(x) = ℓ or as f(x) → ℓ (as x → ∞).
It is always safe to assume that K is positive in the above definition: for if K were
negative or zero, then x > |K| + 1 ⇒ x > K ⇒ |f(x) − ℓ| < ε while x ∈ D, and
|K| + 1 certainly is positive.
[Figure: f(x) tends to limit ℓ as x → −∞. For each challenge ε > 0 there is a response K < 0.]
11.1.2 Definition Suppose that f : D → C and that its domain D is not bounded
below.² Then we say that f(x) converges to the limit ℓ (or tends to ℓ) as x → −∞
if, for each ε > 0, there is K ∈ R such that
x ∈ D, x < K together imply |f(x) − ℓ| < ε.
This is also written as
lim_{x→−∞} f(x) = ℓ or as f(x) → ℓ (as x → −∞).
1 This is just to ensure that there are arbitrarily big (positive) values of x for which f (x) makes
sense.
2 This is just to ensure that there are arbitrarily big negative values of x for which f (x) makes
sense.
It is always safe to assume that K is negative in this second definition: for if K were
positive or zero, then x < −|K| − 1 ⇒ x < K ⇒ |f(x) − ℓ| < ε while x ∈ D, and
−|K| − 1 certainly is negative.
11.1.3 Example To show that the function f described by the formula
f(x) = x³/(1 − x³)
converges to −1 as x → ∞, and also converges to −1 as x → −∞.
Solution
The domain of f is (−∞, 1) ∪ (1, ∞) which is neither bounded above nor bounded
below, so both questions are legitimate.
Provided that x > 1, we have
|f(x) − (−1)| = |1/(1 − x³)| < ε ⇔ x³ − 1 > ε⁻¹ ⇔ x > ∛(1 + ε⁻¹)
so, for a given ε > 0, if we choose K = max{1, ∛(1 + ε⁻¹)},³ we get |f(x) − (−1)| < ε
whenever x > K, as required to show f(x) → −1 as x → ∞.
Provided that x < 1, we next have
|f(x) − (−1)| = |1/(1 − x³)| < ε ⇔ 1 − x³ > ε⁻¹ ⇔ x < ∛(1 − ε⁻¹)
so, with ε > 0 given, if we choose K = min{1, ∛(1 − ε⁻¹)},⁴ we find
|f(x) − (−1)| < ε whenever x < K, as needed to demonstrate that
f(x) → −1 as x → −∞.
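The two choices of K in this solution can be tested numerically. The sketch below is a floating-point spot check under our own assumptions: the helper cbrt (a real cube root that also handles negative arguments), the names K_plus, K_minus and works, and the list of test offsets are all illustrative, not part of the text.

```python
import math


def f(x: float) -> float:
    """The function of Example 11.1.3, defined for x != 1."""
    return x**3 / (1 - x**3)


def cbrt(v: float) -> float:
    """Real cube root, valid for negative v as well."""
    return math.copysign(abs(v) ** (1 / 3), v)


def K_plus(eps: float) -> float:
    """Threshold for x -> infinity: max{1, cbrt(1 + 1/eps)}."""
    return max(1.0, cbrt(1 + 1 / eps))


def K_minus(eps: float) -> float:
    """Threshold for x -> -infinity: min{1, cbrt(1 - 1/eps)}."""
    return min(1.0, cbrt(1 - 1 / eps))


def works(eps: float) -> bool:
    """Check |f(x) + 1| < eps at sample points beyond each threshold."""
    offsets = (0.001, 0.1, 1.0, 10.0, 1000.0)
    above = all(abs(f(K_plus(eps) + t) + 1) < eps for t in offsets)
    below = all(abs(f(K_minus(eps) - t) + 1) < eps for t in offsets)
    return above and below


for eps in (2.0, 0.5, 0.01):
    print(eps, K_plus(eps), K_minus(eps), works(eps))
```

As the footnotes to the example observe, the max and min are redundant in exact arithmetic, and the computed thresholds reflect this: K_plus always exceeds 1 and K_minus is always below 1.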
11.1.4 EXERCISE Confirm that the function tanh x defined by the formula
tanh x = (eˣ − e⁻ˣ)/(eˣ + e⁻ˣ)
has a limit of 1 as x → ∞, and has a limit of −1 as x → −∞. (You may assume
the well-known properties of exponential and logarithmic functions.)
We could explore more technical examples of function convergence as x tends to

infinity or minus infinity but, once again, it is largely unnecessary to do so because
these ideas can be converted conveniently into sequence convergence instead. In
the case of limits as x → ∞, the key theorem is as follows:
3 Actually, this piece of notation is heavier than it need be, since ∛(1 + ε⁻¹) is greater than 1.
4 Again, this notation is unnecessarily heavy-handed, since ∛(1 − ε⁻¹) is clearly less than 1.
11.1.5 Theorem Let f : D → C be a function whose domain D is not bounded

above. Then the following are equivalent:
1. lim_{x→∞} f(x) = ℓ,
2. for every sequence (xn)n∈N of elements of D that tends to ∞, we have
f(xn) → ℓ.
Proof
(I): (1) implies (2).
• Let (xn )n∈N be an arbitrary sequence of elements of D such that xn → ∞.
• For a given positive value of ε, use condition (1) to obtain a number K
such that whenever x ∈ D and x > K, we have |f(x) − ℓ| < ε.
• Because xn → ∞, there is a positive integer n0 such that n ≥ n0 will
guarantee that xn > K.
• Therefore
n ≥ n0 ⇒ xn > K ⇒ |f(xn) − ℓ| < ε.
• That is, f(xn) → ℓ as required.
• Since (xn) was any sequence in D that happened to tend to ∞, condition
(2) is now proved.
(II): (2) implies (1).
• Suppose that condition (1) is not satisfied.
• That is, there exists a value of ε > 0 for which no suitable K can be found.
• In particular, for each n ∈ N, K = n is not suitable…
• …and so there is xn ∈ D such that xn > n and yet |f(xn) − ℓ| ≥ ε.
• Therefore (xn)n∈N tends to ∞ (with each xn ∈ D), and yet (f(xn))n∈N
does not converge to ℓ.
• That is, condition (2) is not satisfied.
• By contraposition, (2) implies (1).
Use of this theorem is often a convenient way to deal with a problem on function
limits as x → ∞, and also to prove the (predictable) theorems about those limits,
such as:
11.1.6 The algebra of limits for functions as x → ∞ Suppose that f : D → C and

g : B → A are two functions, that the intersection D ∩ B of their domains is not
bounded above, and that (as x → ∞) f(x) → ℓ and g(x) → m. Then (as x → ∞):
1. f(x) + g(x) → ℓ + m,
2. f(x) − g(x) → ℓ − m,
3. f(x)g(x) → ℓm,
5. f(x)/g(x) → ℓ/m (provided that m ≠ 0),
6. |f(x)| → |ℓ|.
11.1.7 EXERCISE Select any part of this theorem and give a proof of it using
sequences.
11.1.8 Remark It is, of course, perfectly possible for the limit of a function f (x)
as x → ∞ or as x → −∞ not to exist at all. For instance (and assuming the basic
behaviour of trigonometric functions) both of the sequences
(nπ )n∈N and ((2n + 0.5)π )n∈N
tend to ∞, while (sin(nπ ))n∈N converges to 0 whereas (sin((2n + 0.5)π ))n∈N

converges to 1. In the light of Theorem 11.1.5, this shows that limx→∞ sin x does
not exist.
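The two sequences in this remark can also be inspected numerically. The following is only a floating-point sketch (the sample indices in `n_values` are our own illustrative choice), so the first column shows tiny rounding residues rather than exact zeros; it illustrates the argument above but does not replace it.

```python
import math

# Illustrative sample indices (an assumption; any increasing list would do).
n_values = [1, 10, 100, 1000]

# Values of sin along x_n = n*pi: mathematically every term is 0.
along_n_pi = [math.sin(n * math.pi) for n in n_values]

# Values of sin along x_n = (2n + 0.5)*pi: mathematically every term is 1.
along_shifted = [math.sin((2 * n + 0.5) * math.pi) for n in n_values]

for n, a, b in zip(n_values, along_n_pi, along_shifted):
    print(f"n = {n:5d}   sin(n*pi) = {a: .3e}   sin((2n+0.5)*pi) = {b:.12f}")
```

Both index sequences tend to ∞, yet the two columns settle near different values, which is exactly the situation that Theorem 11.1.5 rules out when the limit exists.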
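11.1.9 Theorem: a squeeze or sandwich rule for function limits as x → ∞ Suppose
that $f : D \to \mathbb{R}$, $g : D' \to \mathbb{R}$, $h : D'' \to \mathbb{R}$ are three functions, that $D' \subseteq D \cap D''$
(that is, wherever g is defined, so are f and h), that $D'$ is not bounded above, that
$f(x) \le g(x) \le h(x)$ for every $x \in D'$, and that
\[ \lim_{x \to \infty} f(x) = \ell = \lim_{x \to \infty} h(x). \]
Then also
\[ \lim_{x \to \infty} g(x) = \ell. \]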
11.1.10 EXERCISE Construct a proof of this theorem.
The analogous results for function limits as x → −∞ are almost too obvious
even to state but, for the sake of completeness, here is the basic set. If you wish, feel
free to prove any of them as an additional exercise. (The first theorem is the one
that would be most worthwhile to try proving, since the others are very routine.)

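11.1.11 Theorem Let f : D → C be a function whose domain D is not bounded
below. Then the following are equivalent:
1. $\lim_{x \to -\infty} f(x) = \ell$,
2. for every sequence $(x_n)_{n \in \mathbb{N}}$ of elements of D that tends to −∞, we have
$f(x_n) \to \ell$.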
11.1.12 The algebra of limits for functions as x → −∞ Suppose that f : D → C
and g : B → A are two functions, that the intersection D ∩ B of their domains
is not bounded below, and that (as x → −∞) $f(x) \to \ell$ and $g(x) \to m$. Then (as
x → −∞):
1. $f(x) + g(x) \to \ell + m$,
2. $f(x) - g(x) \to \ell - m$,
3. $f(x)g(x) \to \ell m$,
$\dfrac{f(x)}{g(x)} \to \dfrac{\ell}{m}$ (provided that $m \ne 0$),
6. $|f(x)| \to |\ell|$.
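11.1.13 Theorem: a squeeze or sandwich rule for function limits as x → −∞
Suppose that $f : D \to \mathbb{R}$, $g : D' \to \mathbb{R}$, $h : D'' \to \mathbb{R}$ are three functions, that $D' \subseteq D \cap D''$
(that is, wherever g is defined, so are f and h), that $D'$ is not bounded below, that
$f(x) \le g(x) \le h(x)$ for every $x \in D'$, and that
\[ \lim_{x \to -\infty} f(x) = \ell = \lim_{x \to -\infty} h(x). \]
Then also
\[ \lim_{x \to -\infty} g(x) = \ell. \]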
It is sometimes useful to convert a problem concerning a limit ‘at infinity’ or
‘at minus infinity’ into a one-sided limit at an ordinary real number. For instance,
to examine the limiting behaviour of $\sin\frac{1}{x}$ as x → ∞, it is tempting to argue
informally that when x is very big, $\frac{1}{x}$ is very small (but positive) and so we are
actually dealing with sin(t) as $t \to 0^+$, and it is now easy to determine this limit.
(See 11.1.15 for another illustration of why this can be convenient.) Here is a
suggested exercise – there are several such possibilities – that legitimises this kind
of translation between function limits ‘at infinity’ and ‘at zero’.
11.1.14 EXERCISE (I) Given a function f : D → C whose domain D is unbounded
above, we define an associated function g as follows:
\[ g(x) = f\!\left(\frac{1}{x}\right). \]
Show that the following are equivalent:
1. $f(x) \to \ell$ as $x \to \infty$,
2. $g(x) \to \ell$ as $x \to 0^+$.
(II) Given a function f : D → C whose domain D is unbounded both above and
below, we define an associated function g as follows:
\[ g(x) = f\!\left(\frac{1}{x}\right). \]
Show that the following are equivalent:
1. $f(x) \to \ell$ as $x \to \infty$ and $f(x) \to \ell$ as $x \to -\infty$,
2. $g(x) \to \ell$ as $x \to 0$.
11.1.15 Example To show that $\dfrac{1}{x^2} \to 0$ as $x \to \infty$, and to deduce the limit (as
$x \to \infty$) of the function
\[ q(x) = \frac{x^2 + \sin x}{x^2 + \cos x}, \quad x \in (1, \infty). \]
Solution
By part (I) of 11.1.14, the statement $x^{-2} \to 0$ as $x \to \infty$ is equivalent to the
statement $(1/x)^{-2} \to 0$ as $x \to 0^+$, that is, to $x^2 \to 0$ as $x \to 0^+$. Yet the truth of
the latter is immediate from the continuity of $x^2$.
Next, for the usual trigonometric reasons,⁵ we have (for x > 1)
\[ \frac{x^2 - 1}{x^2 + 1} \le q(x) \le \frac{x^2 + 1}{x^2 - 1}. \]
Also, using parts of the algebra of limits theorem 11.1.6:
\[ \frac{x^2 - 1}{x^2 + 1} = \frac{1 - x^{-2}}{1 + x^{-2}} \to \frac{1 - 0}{1 + 0} = 1 \text{ as } x \to \infty, \]
\[ \frac{x^2 + 1}{x^2 - 1} = \frac{1 + x^{-2}}{1 - x^{-2}} \to \frac{1 + 0}{1 - 0} = 1 \text{ as } x \to \infty. \]
Lastly, the 11.1.9 version of the squeeze gives us the desired conclusion $q(x) \to 1$
as $x \to \infty$.
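As a sanity check on the squeeze just used, here is a small numerical sketch. The function names `q`, `lower`, `upper` and the sample points in `xs` are our own labels for the three expressions in the example, not notation from the text, and a table of values is of course evidence rather than proof.

```python
import math

def q(x):
    # The function from Example 11.1.15, for x > 1.
    return (x**2 + math.sin(x)) / (x**2 + math.cos(x))

def lower(x):
    # Lower squeeze bound (x^2 - 1)/(x^2 + 1).
    return (x**2 - 1) / (x**2 + 1)

def upper(x):
    # Upper squeeze bound (x^2 + 1)/(x^2 - 1).
    return (x**2 + 1) / (x**2 - 1)

# Illustrative sample points in (1, oo); an arbitrary choice.
xs = [2.0, 10.0, 100.0, 1000.0]
rows = [(x, lower(x), q(x), upper(x)) for x in xs]

for x, lo, mid, hi in rows:
    print(f"x = {x:7.1f}   {lo:.8f} <= {mid:.8f} <= {hi:.8f}")
```

In each row the middle value is trapped between two bounds that both approach 1, which is the pattern that 11.1.9 turns into a proof.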
11.1.16 EXERCISE Determine the limit as x → ∞, and the limit as x → −∞, of
\[ \frac{x^3 + 5 + (3x^2 - 7x - 2)\sin x}{x^3 + 5 + (3x^2 - 7x - 2)\cos x}. \]
5 and using multiplication of inequalities, as flagged up in the checklist of 1.2

11.2 Functions tending to infinity or minus infinity

Just as, at the end of Chapter 2, we needed to give proper definitions to the ideas of
sequences tending to infinity or to minus infinity, it is now useful to do the same
for functions; indeed, it is even more obviously necessary for functions than it was
for sequences – it is barely possible to draw a sketch graph of, say, tan x just to the
left of x = π/2, or $x^{-1}$ just to the right or just to the left of zero, or $e^x$ for large
values of x, without at least some informal notion of the expression ‘heading off to
infinity or minus infinity’ arising.
A complicating factor this time is that, when we want to discuss limiting
behaviour of a function f (x), there are (as we have seen) at least five different
kinds of thing that the ‘control’ variable x might be doing: tending to p, one-sidedly
tending to p from the left, or from the right, tending to infinity, or tending to minus
infinity. Combine those with the two outcome scenarios in which f (x) tends to
infinity and in which it tends to minus infinity, and we face the prospect of an array
of ten more very similar definitions! We shall, of course, list all ten of them for the
sake of completeness, but from the reader’s point of view it is much more important
to understand how they are put together, rather than to spend time attempting to
memorise them.
It may be useful to imagine that restricting the permitted values of x is a ‘cause’
and observing the resulting range of possible values of f (x) is an ‘effect’. The interplay
between cause and effect is then a mental image for any limiting scenario, including
the ones that we first discussed in Chapter 2 and Chapter 9. When we defined the
idea ‘$f(x) \to \ell$ as $x \to p$’, what we effectively said was that the effect ‘f (x) shall be as
close to $\ell$ as is demanded’ can be brought about by the cause ‘x is sufficiently close
to, but distinct from, p’. More precisely, for any challenge ε > 0 we can somehow
determine a response δ > 0 so that the desired effect
$|f(x) - \ell| < \varepsilon$
can be brought about by the (carefully designed) cause
0 < |x − p| < δ.
Now that we have divided out the two aspects of any limiting process, we can
look separately at the conditions that can be imposed upon them to express various
convergence/divergence behaviours of functions, including those that we have
already studied.
For the effects:
• f (x) converges to $\ell$ means f (x) gets very close to $\ell$: $|f(x) - \ell| < \varepsilon$ for any given
positive ε;
• f (x) diverges to ∞ means f (x) gets extremely big and positive: f (x) > K for
any given real K;
• f (x) diverges to −∞ means f (x) gets extremely big but negative: f (x) < K for
any given real K.
For the causes:
• x tends to p means x gets very close to but distinct from p: $0 < |x - p| < \delta$ for a
suitably chosen positive δ;
• x tends to $p^-$ means x gets very close to p from the left but remains distinct
from p: $p - \delta < x < p$ for a suitably chosen positive δ;
• x tends to $p^+$ means x gets very close to p from the right but remains distinct
from p: $p < x < p + \delta$ for a suitably chosen positive δ;
• x tends to infinity means x becomes very big and positive: $x > K'$ for a suitably
chosen real number $K'$ (which we can assume to be positive);
• x tends to minus infinity means x becomes very big but negative: $x < K'$ for a
suitably chosen real number $K'$ (which we can assume to be negative).
(Be careful not to use the same symbol K in cause and effect if you are combining
an infinity-type cause with an infinity-type effect. This is why we used $K'$ instead
of K on the last few lines.)
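To see how mechanically the divergence definitions are put together, one can let a short program pair each infinity-type effect with each cause. This is only an illustrative sketch: the dictionaries `EFFECTS` and `CAUSES`, the helper `definition`, and the ASCII spellings (`oo` for infinity, `delta`, `K'`) are our own hypothetical conventions, not part of the text.

```python
# Hypothetical encoding of the two infinity-type effects (challenge K).
EFFECTS = {
    "oo": "f(x) > K",
    "-oo": "f(x) < K",
}

# Hypothetical encoding of the five causes: (response, condition on x).
CAUSES = {
    "p": ("delta > 0", "0 < |x - p| < delta"),
    "p+": ("delta > 0", "p < x < p + delta"),
    "p-": ("delta > 0", "p - delta < x < p"),
    "oo": ("K' in R", "x > K'"),
    "-oo": ("K' in R", "x < K'"),
}

def definition(effect, cause):
    """Assemble one definition from an effect and a cause."""
    response, condition = CAUSES[cause]
    return (f"f(x) -> {effect} as x -> {cause} if, for each K in R, "
            f"there is some {response} such that {condition} => {EFFECTS[effect]}.")

# Two effects times five causes gives the ten definitions.
ALL_TEN = [definition(e, c) for e in EFFECTS for c in CAUSES]

for line in ALL_TEN:
    print(line)
```

The point of the exercise is that nothing needs memorising: each statement is one challenge (from the effect) answered by one response (from the cause).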
Now let’s assemble one of the new definitions: say, that of f (x) diverging to minus
infinity as x approaches p one-sidedly from the right. The desired effect is f (x) < K
for any given K. The appropriate cause is $p < x < p + \delta$ for some suitable δ > 0
chosen in response to the challenge K. Combining:
[Figure: f (x) tends to −∞ as x → p+]
11.2.1 Definition We say that f (x) → −∞ as x → p+ if, for each K ∈ R, there is

some δ > 0 such that
p < x < p + δ ⇒ f (x) < K.
We also then write limx → p+ f (x) = −∞.

(The definition assumes that the domain of f includes points in every open
interval that has p as its left-hand endpoint, in order that f (x) shall make sense
for some x such that p < x < p + δ no matter how small we choose δ. In other
words, p is assumed to be a limit point of D ∩ (p, ∞) where D is the domain of f .)
Another: what should be the official definition of f (x) diverging to infinity as

x tends to minus infinity? The desired effect is f (x) > K for any given K. The
appropriate cause is $x < K'$ for some suitable $K'$ chosen in response to the challenge
K. Combining:
[Figure: f (x) tends to ∞ as x → −∞. For each challenge K there is a response $K'$.]
11.2.2 Definition We say that f (x) → ∞ as x → −∞ if, for each K ∈ R, there is
some $K' \in \mathbb{R}$ such that
\[ x < K' \Rightarrow f(x) > K. \]
We also then write $\lim_{x \to -\infty} f(x) = \infty$.
(The definition assumes that the domain of f is not bounded below, in order that
f (x) shall make sense for some values of x that are arbitrarily big and negative.)
11.2.3 Example Verify that $\dfrac{x}{1-x} \to -\infty$ as $x \to 1^+$.
Roughwork
Since $\dfrac{x}{1-x} = -1 + \dfrac{1}{1-x}$, we ask, given K and thinking of x as being slightly
greater than 1, how do we contrive that $-1 + \dfrac{1}{1-x} < K$? That is, $\dfrac{1}{1-x} < K + 1$?
Make sure for a start that K is strictly less than −1 to keep the signs unambiguous,
and this is then the same as $1 - x > (1+K)^{-1}$, that is, $x < 1 - \dfrac{1}{1+K} = 1 + \dfrac{1}{|K| - 1}$.
Solution
Given K, we first choose $K^* = \min\{-2, K\}$ (just to guarantee that the ‘new’ K shall
be strictly less than −1). Then, as the roughwork shows, the choice $\delta = \dfrac{1}{|K^*| - 1}$
will ensure that
\[ 1 < x < 1 + \delta \Rightarrow \frac{x}{1-x} < K^* \le K \]
in line with the requirements.
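The choice of δ in this solution can be spot-checked by machine. The sketch below samples points of (1, 1 + δ) and tests the inequality; the names `f`, `delta_for` and `check`, and the number of sample points, are our own assumptions, and sampling finitely many points is merely a plausibility check on the algebra.

```python
def f(x):
    # The function of Example 11.2.3.
    return x / (1 - x)

def delta_for(K):
    # The response chosen in the solution: K* = min{-2, K}, delta = 1/(|K*| - 1).
    K_star = min(-2.0, K)
    return 1.0 / (abs(K_star) - 1.0)

def check(K, samples=1000):
    # Test f(x) < K at evenly spaced points strictly inside (1, 1 + delta).
    d = delta_for(K)
    xs = [1 + d * j / (samples + 1) for j in range(1, samples + 1)]
    return all(f(x) < K for x in xs)

# Illustrative challenges, including some that are not below -1.
for K in [-1000.0, -2.0, -1.5, 0.0, 5.0]:
    print(f"K = {K:8.1f}   delta = {delta_for(K):.6f}   inequality holds: {check(K)}")
```

Notice that challenges K ≥ −2 all receive the same response δ = 1, which is the effect of replacing K by K* at the start of the solution.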
11.2.4 Example Verify that $x^2 \to \infty$ as $x \to -\infty$.
Solution
Given K ∈ R (and arranging if necessary that K > 0 so that the next step makes
sense), we choose $K' = -\sqrt{K}$ (< 0). Then
\[ x < K' \ (< 0) \Rightarrow x^2 > (K')^2 = K \]
in line with the requirements.

Provided that you understand how we engineered the last two definitions, you
probably don’t even need to read the other eight. For the sake of completeness,
however, we shall set out the full collection of ten. In each case, the definition tacitly
assumes that the function f is defined on a subset of R that is extensive enough for
the conclusion about f (x) to make sense.
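11.2.5 Definition We say that f (x) → ∞ as x → p if, for each K ∈ R, there is
some δ > 0 such that
\[ 0 < |x - p| < \delta \Rightarrow f(x) > K. \]
We also then write $\lim_{x \to p} f(x) = \infty$.
11.2.6 Definition We say that f (x) → ∞ as x → p+ if, for each K ∈ R, there is
some δ > 0 such that
\[ p < x < p + \delta \Rightarrow f(x) > K. \]
We also then write $\lim_{x \to p^+} f(x) = \infty$.
11.2.7 Definition We say that f (x) → ∞ as x → p− if, for each K ∈ R, there is
some δ > 0 such that
\[ p - \delta < x < p \Rightarrow f(x) > K. \]
We also then write $\lim_{x \to p^-} f(x) = \infty$.
11.2.8 Definition We say that f (x) → ∞ as x → ∞ if, for each K ∈ R, there is
some $K' \in \mathbb{R}$ such that
\[ x > K' \Rightarrow f(x) > K. \]
We also then write $\lim_{x \to \infty} f(x) = \infty$.
11.2.9 Definition We say that f (x) → ∞ as x → −∞ if, for each K ∈ R, there is
some $K' \in \mathbb{R}$ such that
\[ x < K' \Rightarrow f(x) > K. \]
We also then write $\lim_{x \to -\infty} f(x) = \infty$.
11.2.10 Definition We say that f (x) → −∞ as x → p if, for each K ∈ R, there is
some δ > 0 such that
\[ 0 < |x - p| < \delta \Rightarrow f(x) < K. \]
We also then write $\lim_{x \to p} f(x) = -\infty$.
11.2.11 Definition We say that f (x) → −∞ as x → p+ if, for each K ∈ R, there
is some δ > 0 such that
\[ p < x < p + \delta \Rightarrow f(x) < K. \]
We also then write $\lim_{x \to p^+} f(x) = -\infty$.
11.2.12 Definition We say that f (x) → −∞ as x → p− if, for each K ∈ R, there
is some δ > 0 such that
\[ p - \delta < x < p \Rightarrow f(x) < K. \]
We also then write $\lim_{x \to p^-} f(x) = -\infty$.
11.2.13 Definition We say that f (x) → −∞ as x → ∞ if, for each K ∈ R, there
is some $K' \in \mathbb{R}$ such that
\[ x > K' \Rightarrow f(x) < K. \]
We also then write $\lim_{x \to \infty} f(x) = -\infty$.
11.2.14 Definition We say that f (x) → −∞ as x → −∞ if, for each K ∈ R, there
is some $K' \in \mathbb{R}$ such that
\[ x < K' \Rightarrow f(x) < K. \]
We also then write $\lim_{x \to -\infty} f(x) = -\infty$.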
11.2.15 EXERCISE Show that
\[ \lim_{x \to 2} \frac{x + 1}{(x - 2)^2} = \infty. \]
11.2.16 EXERCISE (Assuming basic facts about the sine function, including that
it is continuous) show that the function
\[ \operatorname{cosec} x = \frac{1}{\sin x} \]
diverges to −∞ as $x \to 0^-$, and diverges to ∞ as $x \to 0^+$.
By analogy with our key theorems about characterising convergence of functions
in terms of limits of sequences, every one of these ten divergence definitions
can be translated into sequential limit terms. Rather than plough through ten more
theorems of high similarity, we shall set out details of a couple, and invite the
interested reader to explore one or two further exemplars.

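11.2.17 Theorem Let f : D → C be a function whose domain D is not bounded
above. Then the following are equivalent:
1. $\lim_{x \to \infty} f(x) = \infty$,
2. for every sequence $(x_n)_{n \in \mathbb{N}}$ of elements of D that tends to ∞, we have
$f(x_n) \to \infty$.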
Proof
(I): (1) implies (2).
• Let (xn )n∈N be an arbitrary sequence of elements of D such that xn → ∞.
• For a given K, use condition (1) to obtain a number $K'$ such that whenever
x ∈ D and $x > K'$, we have f (x) > K.
• Because $x_n \to \infty$, there is a positive integer $n_0$ such that $n \ge n_0$ will
guarantee that $x_n > K'$.
• Therefore
$n \ge n_0 \Rightarrow x_n > K' \Rightarrow f(x_n) > K$.
• That is, $f(x_n) \to \infty$ as required.
• Since $(x_n)$ was any sequence in D that happened to tend to ∞, condition
(2) is now proved.
(II): (2) implies (1).
• Suppose that condition (1) is false.
• That is, there exists a value of K ∈ R for which no suitable $K' \in \mathbb{R}$ can be
found.
• In particular, for each n ∈ N, $K' = n$ is not suitable…
• …and so there is $x_n \in D$ such that $x_n > n$ and yet $f(x_n) \le K$.
• Therefore $(x_n)_{n \in \mathbb{N}}$ tends to ∞ (with each $x_n \in D$), and yet $(f(x_n))_{n \in \mathbb{N}}$ does
not tend to ∞.
• So condition (2) is false as well; by contraposition, (2) implies (1).
11.2.18 Theorem Let f : D → C be a function and p a limit point of D ∩ (p, ∞).

Then the following are equivalent:
1. $\lim_{x \to p^+} f(x) = -\infty$,
2. for every sequence (xn )n∈N of elements of D ∩ (p, ∞) that tends to p, we have
f (xn ) → −∞.
Proof
(I): (1) implies (2).
• Let (xn )n∈N be an arbitrary sequence of elements of D ∩ (p, ∞) such that
xn → p.
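• For a given value of K, use condition (1) to obtain a number δ > 0 such
that whenever x ∈ D and p < x < p + δ, we have f (x) < K.
• Because $x_n \to p$, there is a positive integer $n_0$ such that $n \ge n_0$ will
guarantee that $x_n < p + \delta$.
• Therefore
$n \ge n_0 \Rightarrow p < x_n < p + \delta \Rightarrow f(x_n) < K$.
• That is, $f(x_n) \to -\infty$ as required.
• Since $(x_n)$ was any sequence in D ∩ (p, ∞) that happened to tend to p,
condition (2) is now proved.
(II): (2) implies (1).
• Suppose that condition (1) is false.
• That is, there exists a value of K ∈ R for which no suitable δ > 0 can be found.
• In particular, for each n ∈ N, $\delta = n^{-1}$ is not suitable…
• …and so there is $x_n \in D$ such that $p < x_n < p + 1/n$ and yet $f(x_n) \ge K$.
• Therefore $(x_n)_{n \in \mathbb{N}}$ tends to p (with each $x_n \in D \cap (p, \infty)$), and yet $(f(x_n))_{n \in \mathbb{N}}$ does
not diverge to −∞.
• So condition (2) is false as well; by contraposition, (2) implies (1).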

11.2.19 EXERCISE Select two more of the various definitions for f (x) diverging
to infinity or minus infinity under constraints upon x, and formulate for each a
theorem (like the last two) characterising this divergence in terms of the behaviour
of sequences. Give a detailed proof of one of them. Do not expect to enjoy it
disproportionately.
11.2.20 EXERCISE Give sequence-based proofs of the last two worked examples
11.2.3 and 11.2.4.
11.2.21 Example To establish the following variant of 11.1.14: if the domain of f

is not bounded above, and we define g(x) = f (1/x), to show that the following are
equivalent:
• f (x) → ∞ as x → ∞,
• g(x) → ∞ as x → 0+ .
Solution
Suppose firstly that f (x) → ∞ as x → ∞. Given K ∈ R, we can therefore find
$K' \in \mathbb{R}$ such that $x > K' \Rightarrow f(x) > K$. Without loss of generality, $K' > 0$. Then
put $\delta = 1/K' > 0$. We have
\[ 0 < x < \delta \Rightarrow x^{-1} > K' \Rightarrow g(x) = f(x^{-1}) > K, \]
therefore g(x) → ∞ as $x \to 0^+$.
Suppose secondly that g(x) → ∞ as $x \to 0^+$. Given K ∈ R, we can find δ > 0
such that 0 < x < δ ⇒ g(x) > K. Put $K' = \delta^{-1}$. Then
\[ x > K' \Rightarrow 0 < x^{-1} < (K')^{-1} = \delta \Rightarrow g(x^{-1}) > K \Rightarrow f(x) = f\big((x^{-1})^{-1}\big) = g(x^{-1}) > K. \]
That is, f (x) → ∞ as x → ∞.
11.2.22 Example To establish this variant of 11.1.13: given that f (x) ≥ Cg(x) for
all x ∈ (a, b) where C is a positive constant, and that g(x) → ∞ as x → a+ , to show
that f (x) → ∞ as x → a+ .
Solution
Given K ∈ R, we use the fact that g(x) → ∞ to find δ > 0 such that
a < x < a + δ ⇒ g(x) > K/C. It follows that a < x < a + δ ⇒ f (x) ≥
Cg(x) > C(K/C) = K. Thus f (x) → ∞ as x → a+ as required.
11.2.23 EXERCISE Use the preceding material 11.2.21 and 11.2.22 to show that
\[ \lim_{x \to 0^+} \left( \frac{1}{x^2} - \frac{1}{x} \right) = \infty. \]
(Hint: on the interval $(0, \tfrac{1}{2})$, $x^{-2} > 2x^{-1}$.)

.........................................................................
12 Differentiation — the
slope of the graph
.........................................................................
12.1 Introduction
In everyday English, we call a line straight if its direction of travel is the same
wherever we choose to inspect it, and curved if the apparent direction of travel
varies as we shift the focus of our attention from one part of it to another. As usual,
those ideas need to be sharpened up before we can do significant work with them
but, at least in the case of straight lines on a plane surface, this is very elementary:
impose a grid of (cartesian) coordinates on the surface, identify two distinct points
$(x_1, y_1)$ and $(x_2, y_2)$ on the line, define the gradient or slope between them to be
\[ \frac{y_2 - y_1}{x_2 - x_1}, \]
and the informal idea of straightness corresponds tidily to the fact that the
numerical value of this ratio is the same no matter which two points you have
chosen.
In the case of a curved line in the coordinate plane – in particular, of the graph
y = f (x) of some function – we can still define the gradient between two points on
the graph in exactly the same way, but the notion of gradient at a typical point
is harder to make precise, partly because it is expected to vary with the point.
One approach is to draw, or to imagine drawing, a ‘tangent’ straight line that just
skims the curve at a chosen point, and whose gradient then gives a reasonable
interpretation of ‘gradient of the curve’ at that point . . . but such a procedure will
always be subject to error, to our limited drawing skill, even to the precision of our
instruments and the sharpness of our pencil. Besides, it is very time-consuming to
implement, even at a dozen or so points.
What would be really useful here is some routine procedure that could be applied
to the formula f (x) – if the curve indeed has such a formula – and that could
derive from it another formula for the gradient at any point. Now it is highly
probable that you have done enough calculus to be aware of such a procedure,
called differentiation, that works excellently upon a wide range of formulas and has
several different routines for dealing with the internal structure of formulas that are
modestly complicated, and also of important applications such as the identification
of regions where a function is increasing or decreasing, and of stationary points,
and maximizing or minimizing variable quantities. Our task in the present chapter
is to revisit this idea and look under the bonnet: Why does it work? What functions
does it work on? How do we proceed if we cannot access a suitable formula? Are
there functions where not only the familiar procedures fail, but the very idea of
gradient loses its meaning? What further applications can be anticipated? We make
no pretence to give complete answers to such questions, for calculus is a very large
field, but we shall make a start (and later chapters will continue aspects of the
study).
To determine the slope of the graph at a particular point (say, at the point
P = (a, f (a))) without having to depend on the uncontrolled approximation of
our limited draftsmanship, we consider a nearby point on the same graph (say,
Q = (a + h, f (a + h))) and the exact gradient of the straight line PQ. This is
\[ \frac{f(a + h) - f(a)}{(a + h) - a} = \frac{f(a + h) - f(a)}{h} \]
and, if the curve is reasonably smooth (an idea which we also need to clarify) we
should expect this ratio to be an approximation to the gradient of the curve (that
is, of the ideal tangent line to the curve) at P, and that it should become a better
and better approximation as the horizontal width h of the segment PQ becomes
smaller and smaller. This leads us to define the gradient of the curve at P to be the
limit of this ratio as h → 0, and to replace the intuitive term ‘smooth’ by the simple
mathematical demand that this limit shall exist. (You may also find it useful to re-
read an earlier example – see 9.1.2 and 9.2.6 – in which we actually did all this with
the curve $y = f(x) = x^2$ at the point (3, 9).) This is precisely what we do with a
general function f :
[Figure: Line PQ approximates ‘tangent’ at P]
[Figure: The approximation improves as h → 0]
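The shrinking-chord idea can be watched numerically for the curve y = x² at the point (3, 9) mentioned above. This is a rough floating-point sketch; the helper names `secant_gradient` and `square` and the particular widths in `hs` are our own choices.

```python
def secant_gradient(f, a, h):
    # Gradient of the chord PQ from (a, f(a)) to (a + h, f(a + h)).
    return (f(a + h) - f(a)) / h

def square(x):
    return x * x

# Illustrative widths h = 0.1, 0.01, ..., 0.000001.
hs = [10.0 ** (-k) for k in range(1, 7)]
gradients = [secant_gradient(square, 3.0, h) for h in hs]

for h, m in zip(hs, gradients):
    print(f"h = {h:.6f}   gradient of PQ = {m:.6f}")
```

For this particular curve the chord gradient is exactly 6 + h (up to rounding), so the printed values approach 6 as h shrinks.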
12.2 The derivative

12.2.1 Definition Given a function f whose domain includes a bounded open
interval of the form (p − d, p + d) centred on a point p, we say that f is differentiable
at p if the limit¹
\[ \lim_{h \to 0} \frac{f(p + h) - f(p)}{h} \]
exists. We then call this limit the derivative of f at p, and it is denoted by $f'(p)$
or $\dfrac{df}{dx}(p)$ (or, indeed, several other notations depending on context or on the
preferences of the writer). Any reference to the gradient or slope or direction of
the curve or of its tangent at P means this derivative; indeed, we could then define
the tangent to be the straight line through P whose gradient is this derivative. The
process of obtaining a derivative is called differentiation.
Many functions turn out to be differentiable not merely at one or several points,
but at every point of an interval or even of the entire domain of the function. When
this is the case, we say that f is differentiable on the interval or domain in question,
and the derivative $f'(x)$ at a typical point x where it makes sense is then itself a
function (sometimes also called the derived function of f ) defined on that interval
or on that domain. When no interval or domain is mentioned, the phrase ‘f is
differentiable’ usually means that f is differentiable at every point of its domain of
definition.
1 or, equivalently, the limit
\[ \lim_{x \to p} \frac{f(x) - f(p)}{x - p} \]
We establish differentiability of a handful of simple functions:
12.2.2 Proposition A constant function is differentiable, and its derivative is
(constantly) zero.
Proof
When f (x) = k for every x in some open interval centred on p, k being a constant,
the limit
\[ \lim_{h \to 0} \frac{f(p + h) - f(p)}{h} = \lim_{h \to 0} \frac{k - k}{h} = \lim_{h \to 0} \frac{0}{h} = 0 \]
as predicted. (Notice that we had to assume that |h| was less than the half-width of
the open interval, but that this is not a problem since h is tending to zero.)
12.2.3 Proposition An identity function f (x) = x is differentiable, and its derivative
is (constantly) 1.
Proof
When f (x) = x for every x in some open interval centred on p, we get
\[ \lim_{h \to 0} \frac{f(p + h) - f(p)}{h} = \lim_{h \to 0} \frac{(p + h) - p}{h} = \lim_{h \to 0} \frac{h}{h} = 1 \]
as predicted. (Again, there is the tacit assumption that |h| is small enough to put
p + h inside the interval where we know f is the identity.)
12.2.4 Proposition The function $f(x) = x^2$ is differentiable, and its derived function
is $f'(x) = 2x$.
Proof
Calculation of the relevant limit at a typical point p gives
\[ \lim_{h \to 0} \frac{f(p + h) - f(p)}{h} = \lim_{h \to 0} \frac{(p + h)^2 - p^2}{h} = \lim_{h \to 0} \frac{2ph + h^2}{h} = \lim_{h \to 0} (2p + h) = 2p. \]
Re-writing that to express the discovery that the derivative at any point is twice the
x-value there, we normally write $f'(x) = 2x$ (although $f'(p) = 2p$ for each p ∈ R
is equally correct).
12.2.5 Proposition For any positive integer n, the function $f(x) = x^n$ is differentiable,
and its derived function is $f'(x) = nx^{n-1}$.
Proof
It’s really the same argument as the others, except that this time we need the
binomial theorem to unscramble the algebra:
\[ \begin{aligned} \frac{(p + h)^n - p^n}{h} &= \frac{p^n + np^{n-1}h + \frac{1}{2}n(n-1)p^{n-2}h^2 + \cdots + h^n - p^n}{h} \\ &= \frac{np^{n-1}h + \frac{1}{2}n(n-1)p^{n-2}h^2 + \cdots + h^n}{h} \\ &= np^{n-1} + \frac{1}{2}n(n-1)p^{n-2}h + \cdots + h^{n-1} \end{aligned} \]
whose limit, as h → 0, is
\[ np^{n-1} + 0 + 0 + \cdots + 0 = np^{n-1}. \]
So the derived function, the derivative, is $f'(p) = np^{n-1}$ for each p ∈ R or, in
slightly more familiar function notation, $f'(x) = nx^{n-1}$.
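The binomial algebra in this proof can be checked exactly with rational arithmetic. In the sketch below, `quotient_direct` and `quotient_expanded` are our own names for the two sides of the displayed identity, and the sample values of p, h and n are arbitrary illustrative choices.

```python
from fractions import Fraction
from math import comb

def quotient_direct(p, h, n):
    # The difference quotient ((p + h)^n - p^n) / h, computed as it stands.
    return ((p + h) ** n - p ** n) / h

def quotient_expanded(p, h, n):
    # The same quotient after the binomial theorem and cancelling h:
    # sum over j = 1..n of C(n, j) * p^(n-j) * h^(j-1).
    return sum(comb(n, j) * p ** (n - j) * h ** (j - 1) for j in range(1, n + 1))

p = Fraction(3, 2)          # illustrative point
h = Fraction(1, 10 ** 6)    # illustrative small increment

for n in range(1, 7):
    gap = quotient_expanded(p, h, n) - n * p ** (n - 1)
    print(f"n = {n}   quotient - n*p^(n-1) = {float(gap):.3e}")
```

Because `Fraction` arithmetic is exact, the two forms of the quotient agree exactly, and the printed gap is precisely the sum of the terms that carry a factor of h.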
We could continue this catalogue of special cases, but once again it will be more
profitable and efficient to develop rules for processing ‘built-up’ functions based on
the derivatives of the simple components out of which they have been assembled.
Most of these rules should already be familiar to you, but possibly not the reasons
why they are valid, so we shall discuss them in fair detail.
12.2.6 Differentiation of a sum, of a difference, and of a scaled function If f and
g are both differentiable at p, and k is a constant, then
1. f + g is differentiable at p, and its derivative is $f'(p) + g'(p)$;
2. kf is differentiable at p, and its derivative is $kf'(p)$;
3. f − g is differentiable at p, and its derivative is $f'(p) - g'(p)$.
Proof
For 1, notice that (for any sufficiently small $h \ne 0$)
\[ \begin{aligned} \frac{(f + g)(p + h) - (f + g)(p)}{h} &= \frac{f(p + h) + g(p + h) - f(p) - g(p)}{h} \\ &= \frac{f(p + h) - f(p)}{h} + \frac{g(p + h) - g(p)}{h} \end{aligned} \]
whose limit, as h → 0, is $f'(p) + g'(p)$, as expected.

EXERCISE: check part 2 (on very similar lines).
Part 3 follows from parts 1 and 2, by expressing f − g as f + (−1)g.
To prepare for the next two rules, we need to connect with material from
Chapter 8 (and to keep in mind that a function f is continuous at a suitable point
p precisely when the limit of f (x) as x → p coincides with the value f (p) of f at p
– see 9.2.21 for this important insight):
12.2.7 Theorem – differentiable implies continuous If a function f is differentiable
at a point p, then f is continuous at p.
Proof
Given that the limit
\[ \lim_{h \to 0} \frac{f(p + h) - f(p)}{h} = f'(p) \]
exists, all we need do is to notice that
\[ \frac{f(p + h) - f(p)}{h} \times h + f(p) = f(p + h), \]
and now take limits (as h → 0) across this equality. We discover that
\[ f(p + h) \to f'(p) \times 0 + f(p) = f(p) \text{ as } h \to 0, \]
which, letting x stand for p + h, is the same as saying that f (x) → f (p) as x → p.
12.2.8 EXERCISE Show that the converse of this result is not true, by checking
that the function m(x) = |x| is continuous at x = 0 but not differentiable at
x = 0. You will almost certainly use one-sided limits (both of m(x) itself, and of
(m(0 + h) − m(0))/h) since the behaviour of m (that is, m(x) = x if x ≥ 0, but
m(x) = −x if x < 0) is different on the two sides of x = 0.
12.2.9 Differentiation – the product rule If f and g are both differentiable at p,
then so is their product fg, and its derivative there is
\[ (fg)'(p) = f(p)g'(p) + f'(p)g(p). \]
Proof
We need to evaluate the limit of
\[ \frac{f(p + h)g(p + h) - f(p)g(p)}{h} \]
and our initial problem is that the two halves of the top line have nothing in
common. Remembering what happened in a similar impasse many pages ago, we
bring in an extra term that does have a common factor with each half, thus:
\[ \begin{aligned} &= \frac{f(p + h)g(p + h) - f(p + h)g(p) + f(p + h)g(p) - f(p)g(p)}{h} \\ &= f(p + h)\,\frac{g(p + h) - g(p)}{h} + g(p)\,\frac{f(p + h) - f(p)}{h}. \end{aligned} \]
Here is the moment when we need the ‘differentiable implies continuous’ theorem.
As h → 0, f (p + h) → f (p), and the other components of the last display have
their more obvious limits; so we get
\[ \frac{f(p + h)g(p + h) - f(p)g(p)}{h} \to f(p)g'(p) + g(p)f'(p) \]
as expected.
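The ‘extra term’ manoeuvre in this proof is a purely algebraic identity, so it can be confirmed exactly on a concrete pair of functions. The sketch below assumes, for illustration only, f(x) = x² and g(x) = x³ (whose derivatives 2x and 3x² are supplied by 12.2.5); the function names and the sample point are our own.

```python
from fractions import Fraction

def f(x):
    return x ** 2

def g(x):
    return x ** 3

def product_quotient(p, h):
    # Difference quotient of the product fg at p.
    return (f(p + h) * g(p + h) - f(p) * g(p)) / h

def split_quotient(p, h):
    # The same quotient after inserting -f(p+h)g(p) + f(p+h)g(p) and regrouping.
    return f(p + h) * (g(p + h) - g(p)) / h + g(p) * (f(p + h) - f(p)) / h

def product_rule_value(p):
    # f(p) g'(p) + f'(p) g(p), using f'(x) = 2x and g'(x) = 3x^2.
    return f(p) * 3 * p ** 2 + 2 * p * g(p)

p = Fraction(2)
hs = [Fraction(1, 10 ** k) for k in range(1, 6)]
gaps = [abs(product_quotient(p, h) - product_rule_value(p)) for h in hs]

for h, gap in zip(hs, gaps):
    print(f"h = {float(h):.5f}   |quotient - product rule value| = {float(gap):.6f}")
```

The two quotient forms agree exactly for every h, and the distance from the product-rule value shrinks with h, as the limit argument predicts.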
12.2.10 Differentiation – the quotient rule If f and g are both differentiable at
p and $g(p) \ne 0$, then their quotient f /g is differentiable at p, and its derivative
there is
\[ \left(\frac{f}{g}\right)'(p) = \frac{g(p)f'(p) - f(p)g'(p)}{(g(p))^2}. \]
Partial proof
We need to evaluate the limit of
\[ \frac{\dfrac{f(p + h)}{g(p + h)} - \dfrac{f(p)}{g(p)}}{h} = \frac{f(p + h)g(p) - f(p)g(p + h)}{h\,g(p + h)\,g(p)} \]
and, firstly, we must make sure that no division by zero can happen. It is built into
the definition of differentiability that there are already two open intervals centred
on p in which f and g are defined. Also, using again the ‘differentiable implies
continuous’ theorem, g is continuous at p and $g(p) \ne 0$, so there is² a third open
interval centred on p throughout which g does not take the value zero. Take now
the smallest of the three intervals: here, f and g are defined and g is non-zero so,
once h is small enough to put p + h into that interval, no risk remains of our piece
of algebra failing to make sense.
Next, we again introduce an extra term that has something in common with
each half of the top line:
\[ \begin{aligned} \cdots &= \frac{f(p + h)g(p) - f(p)g(p) + f(p)g(p) - f(p)g(p + h)}{h\,g(p + h)\,g(p)} \\ &= \frac{\{f(p + h)g(p) - f(p)g(p)\} + \{f(p)g(p) - f(p)g(p + h)\}}{h\,g(p + h)\,g(p)}. \end{aligned} \]
The rest of the proof proceeds on the same lines as did that of the product rule.
2 If you do not find this to be sufficiently convincing, here is a fuller argument. Since |g(p)| > 0
and g(p + h) → g(p) as h → 0, there is δ > 0 such that whenever |h| < δ we get $|g(p + h) - g(p)| < |g(p)|$.
That last inequality forces $g(p + h) \ne 0$, and it holds whenever p + h lies in the interval
(p − δ, p + δ).
12.2.11 EXERCISE Complete this proof. (You can expect to use the ‘differentiable
implies continuous’ theorem yet again.)
Next, corresponding to the fact that the composite of two continuous functions
is continuous, we have that the composite of two differentiable functions is
differentiable.
12.2.12 The chain rule – differentiating a composite function If f is differentiable
at p, and q = f (p) and g is differentiable at q, then g ◦ f is differentiable at p
and its derivative there is $g'(f(p))f'(p)$.
Partial proof
As h → 0, the quantity k = f (p + h) − f (p) also converges to 0 since the
(differentiable) function f is continuous. Consider the equation
\[ \begin{aligned} \frac{(g \circ f)(p + h) - (g \circ f)(p)}{h} &= \frac{(g \circ f)(p + h) - (g \circ f)(p)}{f(p + h) - f(p)} \times \frac{f(p + h) - f(p)}{h} \\ &= \frac{g(f(p + h)) - g(f(p))}{f(p + h) - f(p)} \times \frac{f(p + h) - f(p)}{h} \\ &= \frac{g(f(p) + k) - g(f(p))}{k} \times \frac{f(p + h) - f(p)}{h} \\ &= \frac{g(q + k) - g(q)}{k} \times \frac{f(p + h) - f(p)}{h}. \end{aligned} \]
Now as h → 0 and, in consequence, k → 0 also, this converges to $g'f'$ where the
derivative $g'$ is taken at the point q = f (p), and the derivative $f'$ is taken at p. In
other words, the derivative of the composite has been established as $g'(f(p))f'(p)$
as expected.
Unfortunately, this is not yet a general proof – because at our first step of
multiplying by
\[ \frac{f(p + h) - f(p)}{f(p + h) - f(p)} \]
we tacitly assumed that f (p + h) − f (p) was not zero. This is a safe assumption
provided that $f'(p) \ne 0$ because, in that case, the ratio
\[ \frac{f(p + h) - f(p)}{h} \]
will be converging to a non-zero limit, and must therefore itself be non-zero for
sufficiently small non-zero h. In the general case where $f'(p)$ might be zero, the result is still
true, but we shall need to find a different way of proving it.³ (See the upcoming
12.5.2 for a ‘tidy’ proof that works in all cases.)
12.2.13 Note The chain rule, as we have presented it so far, applies only to the
composite of two differentiable functions, but it extends readily to three or more.
This is a place where the older style of notation df /dx, rather than $f'(p)$, makes it
easier to explain what is going on. Suppose we are given three functions f , g, h with
domains such that the composite h ◦ g ◦ f makes sense, and each of the three is
differentiable at appropriate points (starting with p for f ). Temporarily write y = f (x),
z = g(y), u = h(z). Then
\[ \frac{dy}{dx} \text{ means } f', \qquad \frac{dz}{dy} \text{ means } g', \qquad \frac{du}{dz} \text{ means } h'. \]
(The weakness of this notation is that it does not explicitly name the individual
points at which these derivatives are evaluated – these have to be judged by
context.)
Then, using the chain rule,
\[ z = g(f(x)) \text{ so } \frac{dz}{dx} \text{ is } (g \circ f)'(p) = g'(f(p))f'(p) \]
and, continuing,
\[ u = h((g \circ f)(x)) \text{ so } \frac{du}{dx} \text{ is } h'(g(f(p)))(g \circ f)'(p) = h'(g(f(p)))\,g'(f(p))\,f'(p). \]
Thus the derivative of the three-way composite exists, and the two alternative
notations that we have for it are
\[ (h \circ g \circ f)'(p) = h'(g(f(p)))\,g'(f(p))\,f'(p) \quad \text{and} \quad \frac{du}{dx} = \frac{du}{dz}\,\frac{dz}{dy}\,\frac{dy}{dx}. \]
Despite its reluctance to name points, the second strikes many people as much
more readable. It is also useful in helping us to remember what the chain rule
says, because it looks as if the two dz and the two dy cancel out – of course,
that is emphatically not what is really happening, since du/dx and its cousins
are not fractions. As an aide memoire, however, the fact that they multiply as
if they were fractions makes it easy to keep track of what the chain rule is
telling us.
3 Meanwhile, here is an ad hoc proof for the awkward case $f'(p) = 0$. The proof is really
intricate, and it will probably be wise to skip it on a first (and, indeed, on a second and a third)
reading.
Suppose $f'(p) = 0$, q = f (p), $g'(q) = M$. For small values of h, put k = f (p + h) − f (p), so that
f (p + h) = f (p) + k, that is, f (p + h) = q + k. Let ε > 0 be given. Then:
\[ \text{there is } \delta_1 > 0 \text{ such that } 0 < |k| < \delta_1 \Rightarrow \frac{g(q + k) - g(q)}{k} \in [M - 1, M + 1] \Rightarrow |g(q + k) - g(q)| \le (1 + |M|) \times |k|. \]
This also holds when k = 0. Continuing:
\[ \text{there is } \delta_2 > 0 \text{ such that } 0 < |h| < \delta_2 \Rightarrow \left| \frac{f(p + h) - f(p)}{h} \right| < \frac{\varepsilon}{1 + |M|} \Rightarrow |f(p + h) - f(p)| < \frac{\varepsilon |h|}{1 + |M|} \Rightarrow |k| < \frac{\varepsilon |h|}{1 + |M|}. \]
Now f is continuous at p, so
there is $\delta_3 > 0$ such that $|h| < \delta_3 \Rightarrow |f(p + h) - f(p)| < \delta_1$, that is, $|k| < \delta_1$.
Finally, if $0 < |h| < \min\{\delta_2, \delta_3\}$, we get:
\[ |k| < \delta_1, \text{ therefore } |g(q + k) - g(q)| \le (1 + |M|) \times |k| < (1 + |M|) \times \frac{\varepsilon |h|}{1 + |M|} = \varepsilon |h| \]
which implies
\[ \left| \frac{g(f(p + h)) - g(f(p))}{h} \right| < \varepsilon. \]
Hence, since ε was arbitrary,
\[ \lim_{h \to 0} \frac{g(f(p + h)) - g(f(p))}{h} = 0 = g'(f(p))f'(p). \]
12.2.14 Example (Assuming for the moment that the derivatives of $\sin x$ and $e^x$
are $\cos x$ and $e^x$ respectively) we differentiate the function $\sin((x^3 + e^x)^7)$.
Solution (modern notation)
Unpick the given formula into its components so that we can perceive it as a
composite: if $f(x) = x^3 + e^x$ and $g(x) = x^7$ and $h(x) = \sin x$ then the function
described by the given formula is $j(x) = h(g(f(x)))$, so (for any $x$)
\[
j'(x) = h'(g(f(x)))\,g'(f(x))\,f'(x) = \cos(g(f(x))) \times 7(f(x))^6 \times (3x^2 + e^x)
= \cos((x^3 + e^x)^7) \times 7(x^3 + e^x)^6 \times (3x^2 + e^x).
\]
Solution ('heritage' notation)
Unpick $y = \sin((x^3 + e^x)^7)$ into its components
\[
y = \sin u, \quad u = v^7, \quad v = x^3 + e^x.
\]
Then
\[
\frac{dy}{du} = \cos u, \quad \frac{du}{dv} = 7v^6, \quad \frac{dv}{dx} = 3x^2 + e^x, \quad\text{so}
\]
\[
\frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dv}\,\frac{dv}{dx} = \cos u \times 7v^6 \times (3x^2 + e^x)
= \cos((x^3 + e^x)^7) \times 7(x^3 + e^x)^6 \times (3x^2 + e^x).
\]
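As a sanity check on the formula just obtained, the following minimal Python sketch compares the chain-rule derivative with a symmetric difference quotient at a few sample points. The function names, the step size `h` and the sample points are illustrative choices, not part of the text, and a numerical comparison of this kind is only a plausibility check, not a proof.

```python
import math

def j(x):
    """The composite sin((x^3 + e^x)^7) from Example 12.2.14."""
    return math.sin((x**3 + math.exp(x)) ** 7)

def j_prime(x):
    """The derivative predicted by the chain rule."""
    v = x**3 + math.exp(x)
    return math.cos(v**7) * 7 * v**6 * (3 * x**2 + math.exp(x))

def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient; h is an assumed small step."""
    return (func(x + h) - func(x - h)) / (2 * h)

sample_points = (-1.0, -0.5, 0.0, 0.2)
for x in sample_points:
    print(x, j_prime(x), central_difference(j, x))
```

The two printed columns should agree to several decimal places at each sample point.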
12.2.15 Note For the sake of (relative) completeness we should also discuss here
the rule for differentiating an inverse function (see Chapter 8 for more detailed
comments on inverse functions, including their continuity) and we shall first do
so rather informally.
If $f : (a, b) \to \mathbb{R}$ is strictly increasing or strictly decreasing and $f'(p) \ne 0$ for
a point $p \in (a, b)$ then, merely because $f$ is one-to-one from $(a, b)$ onto its range
$f((a, b))$, the inverse mapping $f^{-1} : f((a, b)) \to (a, b)$ exists. Better than that,
though, $f^{-1}$ is differentiable at $f(p)$ and its derivative is
\[
(f^{-1})'(f(p)) = \frac{1}{f'(p)}.
\]
In what we called 'heritage' notation in the last paragraph, this can be expressed by
saying that if $y = f(x)$ is strictly monotonic on an open interval and $dy/dx$ is non-zero
at a point, then the inverse $x = g(y)$ is also differentiable at the corresponding
point and
\[
\frac{dx}{dy} = \frac{1}{\;\dfrac{dy}{dx}\;}
\]
(provided we carefully keep track of the points at which the derivatives are
calculated). Once again, although these symbols are not fractions, we see that they
can be manipulated as if they were, and the observation helps us to hold the result
in mind.
As a small illustration, we determine the derivative of $\sqrt[3]{x}$ at any point. Starting
with the function $f(x) = x^3$, which is strictly increasing on $\mathbb{R}$ and has derivative
$3p^2$ at each point $p$, we see that the inverse is given by $f^{-1}(x) = \sqrt[3]{x}$, and its
derivative at $f(p) = p^3$ is
\[
\frac{1}{f'(p)} = \frac{1}{3p^2} = \frac{p}{3p^3}
\]
provided we avoid division by zero, of course. Putting $x = p^3$, that is, $p = \sqrt[3]{x}$, this
says (more readably) that the derivative of $\sqrt[3]{x}$ at any $x$ except 0 is
\[
\frac{p}{3p^3} = \frac{\sqrt[3]{x}}{3x} = \frac{1}{3}\,x^{-2/3}.
\]
Switching to the alternative view using heritage notation: if $y = \sqrt[3]{x}$ then $x = y^3$
and so $dx/dy = 3y^2$. Provided that this is non-zero, we therefore get
\[
\frac{dy}{dx} = \frac{1}{\;\dfrac{dx}{dy}\;} = \frac{1}{3y^2} = \frac{1}{3}\,y^{-2} = \frac{1}{3}\,(\sqrt[3]{x})^{-2} = \frac{1}{3}\,x^{-2/3} \quad (x \ne 0)
\]
(if we take care to track the points at which the derivatives are calculated).
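The inverse-function rule for the cube root can be spot-checked numerically. The sketch below is an illustration under stated assumptions: the helper `cbrt`, the step `h` and the sample values of `p` are our own choices. At the point $x = p^3$ it compares a symmetric difference quotient of the cube root with the predicted value $1/(3p^2)$.

```python
import math

def cbrt(x):
    """Real cube root, defined for negative x as well as positive x."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def predicted_derivative(p):
    """Value 1 / f'(p) = 1 / (3 p^2) given by the inverse-function rule."""
    return 1.0 / (3.0 * p**2)

def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient; h is an assumed small step."""
    return (func(x + h) - func(x - h)) / (2 * h)

sample_p = (-2.0, 0.5, 1.5)  # p = 0 is excluded: division by zero
for p in sample_p:
    x = p**3
    print(p, predicted_derivative(p), central_difference(cbrt, x))
```

The value $p = 0$ is deliberately left out, matching the proviso in the text that $dx/dy$ must be non-zero.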
Here is a proof of the result that we have outlined over the last page.
12.2.16 Theorem If $f : I \to f(I)$ is a continuous strictly increasing function on
an interval $I$, and $f$ is differentiable at a non-endpoint point $p$ of $I$, and $f'(p) \ne 0$,
then the inverse function $f^{-1}$ is differentiable at $f(p)$, and
\[
(f^{-1})'(f(p)) = \frac{1}{f'(p)}.
\]
Proof
Differentiability at $p$ tells us that $f$ is defined (and continuous) on a small open
interval centred on $p$, so there is a small open interval centred on $f(p)$ contained
in the range (see Lemma 8.6.4 if this is not clear). Thus, for all sufficiently small
non-zero $k$, $f(p) + k$ is in that range and we can find $h \ne 0$ such that
\[
f(p) + k = f(p + h).
\]
(To be fussy, $h$ depends on $k$ so we should really write it as $h(k)$, but that would
make the algebra harder to read.)
Now, again using $g$ to stand for the inverse of $f$, to investigate $g'$ at $f(p)$ we must
look for a limit (as $k \to 0$) of:
\[
\frac{g(f(p) + k) - g(f(p))}{(f(p) + k) - f(p)}
= \frac{g(f(p+h)) - g(f(p))}{k}
= \frac{p + h - p}{f(p+h) - f(p)}
= \frac{h}{f(p+h) - f(p)}
= \left(\frac{f(p+h) - f(p)}{h}\right)^{-1}.
\]
As $k \to 0$, so does $h$ because (see footnote 4) $g$ is continuous at $f(p)$. Then the content of the final
large pair of brackets tends to (non-zero) $f'(p)$, and the overall answer is $(f'(p))^{-1}$,
as predicted.
4 $h = g(f(p) + k) - g(f(p))$, which $\to 0$ as $k \to 0$.

12.3 Up and down, maximum and minimum: for differentiable functions
Since the derivative $f'(p)$ gives the slope of the curve $y = f(x)$ at a typical
point (p, f (p)) on its graph, it is hardly surprising that positive gradients are
associated with upward-sloping, rising graphs and with functions that increase as
you increase the input value x, and negative gradients with downward-sloping,
falling graphs and functions that decrease. Analysing the precise nature of these
associations requires some care and, in the course of exercising that care, we shall
encounter a couple of theorems that have much wider application and significance
than our short-term aims suggest.
You may find it useful at this point to review the ideas of increasing and
decreasing functions as discussed in Chapter 8: especially paragraphs 8.6.1 and
8.6.3.
12.3.1 Lemma
1. If $f$ is increasing on an open interval $I$ and differentiable at a point $p$ of $I$, then
$f'(p) \ge 0$.
2. If $f$ is decreasing on an open interval $I$ and differentiable at a point $p$ of $I$, then
$f'(p) \le 0$.
Proof
By assumption,
\[
\frac{f(x) - f(p)}{x - p}
\]
converges to the limit $f'(p)$ as $x \to p$. Choose a sequence $(x_n)$ of numbers greater
than $p$ that converges to $p$ and we get both $f(x_n) - f(p) \ge 0$ and $x_n - p > 0$, and
therefore also
\[
\frac{f(x_n) - f(p)}{x_n - p} \ge 0
\]
for all $n$. Therefore $f'(p)$ is the limit of a sequence of non-negative numbers, and
so is non-negative itself (via 'taking limits across an inequality', Theorem 4.1.17).
The proof of part 2 is very similar.
12.3.2 Notes
1. Although it is tempting to believe that if $f$ is strictly increasing on an open
interval $I$ and differentiable at a point $p$ of $I$, then $f'(p)$ should be strictly
greater than 0, it is not true. For example, the function $f(x) = x^3$ is strictly
increasing on $(-1, 1)$ and differentiable at 0, and yet $f'(0) = 0$ exactly.
2. Although it is reasonable and, indeed, correct, to expect some sort of converse
to the lemma, the sign of the derivative at a single point is not enough to
guarantee that the function shall increase or decrease over a small enough
interval. For a relatively difficult exercise, you might like to investigate the
following assertion: that the function $f$ defined by
\[
f(x) = x^2 \sin(x^{-1}) + 0.5x \ \text{ if } x \ne 0, \qquad f(0) = 0
\]
is differentiable at every point of the real line, and $f'(0) = +0.5$, and yet in
every interval of the form $(-\delta, \delta)$ there are points at which $f'$ is strictly less
than 0, and sub-intervals on which $f$ is strictly decreasing.
3. We shall find converses along the lines of: if $f'(x) \ge 0$ at every point of an
interval, then $f$ is increasing on that interval (and so on).
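The assertion in Note 2 can be explored numerically before attempting a proof. The sketch below assumes the formula $f'(x) = 2x\sin(1/x) - \cos(1/x) + 0.5$ for $x \ne 0$ (obtained from the product and chain rules) and evaluates it at the points $x_n = 1/(2\pi n)$, which are our own choice of test points; there $\cos(1/x_n) = 1$ and $\sin(1/x_n) = 0$, so the derivative is close to $-0.5$ however near to 0 we go. It also checks that the difference quotient at 0 stays within $|h|$ of 0.5. This is an illustration only, not a substitute for the exercise.

```python
import math

def f(x):
    """f(x) = x^2 sin(1/x) + 0.5 x for x != 0, and f(0) = 0."""
    if x == 0:
        return 0.0
    return x * x * math.sin(1.0 / x) + 0.5 * x

def f_prime(x):
    """Derivative: product/chain rule away from 0; the value 0.5 at 0."""
    if x == 0:
        return 0.5
    return 2 * x * math.sin(1.0 / x) - math.cos(1.0 / x) + 0.5

# Points x_n = 1/(2 pi n) shrink to 0, yet f' stays near -0.5 at each.
negative_slopes = [f_prime(1.0 / (2 * math.pi * n)) for n in (1, 10, 100, 1000)]

# Difference quotients at 0: (f(h) - f(0))/h = h sin(1/h) + 0.5.
quotients = [(h, f(h) / h) for h in (1e-1, 1e-3, 1e-5, -1e-5)]

print(negative_slopes)
print(quotients)
```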
12.3.3 Definitions Suppose that f is defined on an interval (open or closed) I.

• We say that f has a local maximum at p ∈ I if, for some δ > 0,
f (p) ≥ f (x) for every x ∈ I ∩ (p − δ, p + δ);
we also call the point (p, f (p)) a local maximum point on the graph of f .
• We say that f has a local minimum at p ∈ I if, for some δ > 0,
f (p) ≤ f (x) for every x ∈ I ∩ (p − δ, p + δ);
we also call the point (p, f (p)) a local minimum point on the graph of f .
Bear in mind that a local maximum may very well not show us an overall
maximum value that the function might reach in the interval: for one thing, there
may be several local maxima; for another, it is possible (depending partly on
the type of interval) that the function is unbounded above, or never attains the
supremum of its values. Consider, for instance, the functions whose graphs are
sketched here:
[Figure: three sketch graphs, captioned 'Two local maxima', 'Local maximum, but function unbounded' and 'Local maximum, but overall maximum not attained']
In the same way, the value at a local minimum might not be an overall, global
minimum value on the interval.
Also be aware that, at an endpoint of the interval I (in the case where it has an
endpoint that belongs to I), the local-maximum and local-minimum criteria only
pay attention to what happens on one side of the point (since the function is not
defined on the other side). For instance, $f(x) = x^2$ on the interval $[-2, 3]$ has a
local maximum at $x = -2$ because, for instance, $f(-2) \ge f(x)$ for every $x$ within
$[-2, 3] \cap (-2 - 0.5, -2 + 0.5)$ even though that intersection, namely $[-2, -1.5)$,
only contains points at and on the right of $-2$.
12.3.4 Lemma If $f : I \to \mathbb{R}$ has a local maximum (or a local minimum) at $p$, and
is differentiable at $p$, then $f'(p) = 0$.
Proof
Remember that differentiability at $p$ includes the fact that $f$ is defined throughout
some open interval $(p - \delta, p + \delta)$ centred on $p$ and, if $f$ also has a local maximum
at $p$, we can make that number $\delta$ small enough to ensure that $f(p) \ge f(x)$ for every
$x$ in $(p - \delta, p + \delta)$. Pick a sequence $(x_n)$ of numbers greater than $p$ in that interval
that converges to $p$, and note that (for each $n$)
\[
f(x_n) - f(p) \le 0, \quad x_n - p > 0 \quad\text{and so}\quad \frac{f(x_n) - f(p)}{x_n - p} \le 0.
\]
Since $f'(p)$ is the limit of that last fraction, we now get $f'(p) \le 0$ also.
Repeat the argument with a sequence $(y_n)$ of numbers less than $p$ that converges
to $p$, and we find that $f'(p) \ge 0$. To reconcile the two findings, we must have
$f'(p) = 0$.
To establish the result for a local minimum, apply what we have just discovered
to the function $(-f)$.
Recall that if a function f is continuous on a bounded closed interval [a, b], then
it must reach a greatest value (and a smallest value) somewhere in that interval.
From the previous lemma there are three possibilities about a point where it does
this: this point could be a point where $f'$ takes the value zero, or a point where $f$ is
not differentiable, or an endpoint of the interval. Each of the three possibilities can
actually occur, as even a few rough sketch graphs will readily indicate:
[Figure: three sketch graphs, captioned 'Overall maximum at a point where $f' = 0$', 'Overall maximum at a point where $f'$ does not exist' and 'Overall maximum at an endpoint of the domain']
12.3.5 Definition A stationary point (also called critical point) of a function $f$ is
a number $p$ for which $f'(p) = 0$. (The term is also sometimes applied to the point
$(p, f(p))$ on the graph of $f$.)
We can re-cast the preceding discussion as follows:
12.3.6 Theorem Let f be continuous on [a, b] and differentiable on (a, b). Then
f takes its greatest value (and its smallest value) either at a stationary point of f , or
at an endpoint of the interval.
12.3.7 Note Stationary points do not have to be local (or global) maxima or
minima: for instance, $x = 0$ is a stationary point for $f(x) = x^3$ on the interval
$[-1, 1]$, but it is not a maximum/minimum of any kind. For a more complicated
example,
\[
f(x) = x^2 \sin(x^{-1}) \ \text{ when } x \ne 0; \qquad f(0) = 0
\]
gives a function differentiable on the whole of $\mathbb{R}$ that has $f'(0) = 0$, and yet there
are points arbitrarily near to $x = 0$ at which $f$ is greater than $0 = f(0)$, and points
arbitrarily near to $x = 0$ at which $f$ is less than 0.
12.3.8 Example (Assuming for the moment that the derivative of $e^x$ is $e^x$) find the
greatest and least values of $f(x) = xe^{-x}$ on the interval $[0.5, 2]$.
Solution
Using both the product rule and the chain rule, we find that
$f'(x) = 1 \cdot e^{-x} + xe^{-x}(-1) = (1 - x)e^{-x}$, which is zero only at $x = 1$. The greatest
and least values of $f$ can therefore only occur at 1 or at an endpoint of the interval.
Since the values of $f(x)$ at $x = 0.5, 1, 2$ (respectively) are $0.5e^{-0.5}$, $e^{-1}$, $2e^{-2}$ and
evaluate approximately to 0.303, 0.368, 0.271, it is clear that the maximum value
on the interval is $e^{-1}$ and that the minimum value is $2e^{-2}$.
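The procedure used in this solution (evaluate the function at the stationary points and at the endpoints, then compare) is easy to mirror in code. The sketch below hard-codes the candidate list from the worked example; the variable names are illustrative assumptions.

```python
import math

def f(x):
    """f(x) = x e^{-x}, as in Example 12.3.8."""
    return x * math.exp(-x)

# Candidates: the two endpoints of [0.5, 2] and the only stationary point, x = 1.
candidates = [0.5, 1.0, 2.0]
values = {x: f(x) for x in candidates}

x_max = max(values, key=values.get)
x_min = min(values, key=values.get)

for x in candidates:
    print(f"f({x}) = {values[x]:.3f}")
print("greatest value at x =", x_max, "and least value at x =", x_min)
```

The comparison step relies on Theorem 12.3.6, so it is only legitimate for a function continuous on the closed bounded interval and differentiable on its interior.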
12.3.9 EXERCISE Let $f(x) = \dfrac{x}{x^2 + 1}$ for each real number $x$. Find the largest and
smallest values that $f(x)$ can attain while $x$ ranges over:
1. the interval $[-10, 0]$,
2. the interval $[-\frac{1}{2}, 2]$,
3. the interval $[2, 6]$,
4. the whole real line (if indeed such largest and smallest values exist: note that
12.3.6 does not directly apply on this unbounded interval).
Now we meet the two theorems that will have significant roles to play later in the
text, as well as being useful in getting the converse implications that we mentioned
earlier. The first is actually a special case of the second, but is the version that we
are able to prove almost immediately from our earlier work.
12.3.10 Rolle's theorem ('RT') Let $f$ be continuous on $[a, b]$ and differentiable on
$(a, b)$. If also $f(a) = f(b)$, then there is at least one point $c$ in $(a, b)$ such that
$f'(c) = 0$ (a stationary point in the open interval).
Proof
In the special case where $f$ is constant on the whole of $[a, b]$, this result is trivial
and immediate: any point $c \in (a, b)$ will do equally well.
If not, then either $f$ takes somewhere in $(a, b)$ values that are strictly greater
than $f(a)$, or $f$ takes somewhere in $(a, b)$ values that are strictly smaller than $f(a)$
(or both, of course). If the former, then the greatest value of $f$ on $[a, b]$ is
attained neither at $a$ nor at $b$, but at a point $c$ in $(a, b)$: and an earlier result (12.3.4) tells us
that $f'(c) = 0$.
The latter case is similar.
[Figure: sketch graphs illustrating Rolle's theorem with $f(a) = f(b)$: case 1 (constant $f$, any $c$ between $a$ and $b$ will do) and the two remaining cases (some $c$ between $a$ and $b$)]
12.3.11 Example To show (assuming basic results about trig functions) that the
equation
\[
(1 + 3x^2) \sin x + (x + x^3) \cos x = 0
\]
has at least one solution in the interval $(0, \pi)$.
Solution
The function $(x + x^3) \sin x$ is continuous on $[0, \pi]$ and differentiable on $(0, \pi)$, and
takes equal values (zero) at 0 and at $\pi$, so RT applies and tells us that its derivative,
namely $(1 + 3x^2) \sin x + (x + x^3) \cos x$, is zero at some point between 0 and $\pi$, as
the question required.
12.3.12 Example To show using Rolle's theorem (and assuming basic results
about trig functions) that the equation $\tan x = \dfrac{2}{x}$ has at least one solution in the
interval $(0, \pi/2)$.
Solution
(The difficulty here is to decide what function to apply the theorem to. Re-writing
the equation as $x \tan x = 2$, and then as $x^2 \tan x = 2x$, and then as
$x^2 \sin x / \cos x = 2x$, and finally as $x^2 \sin x - 2x \cos x = 0$ reveals $x^2 \cos x$ as
the key formula.)
The function $x^2 \cos x$ is continuous on $[0, \pi/2]$ and differentiable on $(0, \pi/2)$,
and takes equal values (zero) at 0 and at $\pi/2$, so RT applies and tells us that its
derivative, namely $-x^2 \sin x + 2x \cos x$, is zero at some point strictly between 0
and $\pi/2$, whence the result follows (noting that, on the open interval $(0, \pi/2)$,
neither $x$ nor $\cos x$ is zero, so dividing by them as we unpick the roughwork is
legitimate).
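Rolle's theorem guarantees that a solution exists but does not say where it is. To locate one approximately, the sketch below swaps in a different technique, bisection (which rests on the intermediate value theorem rather than on Rolle's theorem), applied to $x \sin x - 2\cos x$; on $(0, \pi/2)$ this vanishes exactly where $\tan x = 2/x$. The function names and the iteration count are illustrative assumptions.

```python
import math

def g(x):
    """x sin x - 2 cos x: zero exactly where tan x = 2/x on (0, pi/2)."""
    return x * math.sin(x) - 2.0 * math.cos(x)

def bisect(func, lo, hi, iterations=60):
    """Plain bisection, assuming func(lo) < 0 < func(hi)."""
    assert func(lo) < 0 < func(hi)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c = bisect(g, 0.0, math.pi / 2)

# The derivative of x^2 cos x from the solution, evaluated at c.
rolle_derivative = -c**2 * math.sin(c) + 2 * c * math.cos(c)

print("c =", c)
print("tan(c) - 2/c =", math.tan(c) - 2.0 / c)
print("derivative of x^2 cos x at c =", rolle_derivative)
```

Both printed residuals should be negligibly small, which is consistent with $c$ being a stationary point of $x^2 \cos x$ as the solution predicts.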
12.3.13 EXERCISE Using Rolle's theorem (and proof by contradiction) show that,
whatever constants $a$ and $b$ are selected, the graph of
\[
y = f(x) = x^4 + 4x^3 + 12x^2 + ax + b
\]
cannot have more than one stationary point.
12.3.14 EXERCISE Show that there is a sequence $(c_n)$ in the interval $(0, \pi/2)$ such
that, for each $n \in \mathbb{N}$:
\[
\tan(c_n) = \frac{n}{c_n}.
\]
(Hint: for each positive integer $n$, consider the function described by the formula
$x^n \cos x$.)
12.3.15 The first mean value theorem ('FMVT') Let $f$ be continuous on $[a, b]$ and
differentiable on $(a, b)$. Then there is at least one point $c$ in $(a, b)$ such that
\[
f'(c) = \frac{f(b) - f(a)}{b - a}.
\]
Proof
(The idea is to modify the given function, by subtracting a multiple of $x$, to make
it satisfy all three conditions of Rolle's theorem instead of just the first two.)
We seek a constant $\lambda$ for which the function $g(x) = f(x) - \lambda x$ (which will at
least be continuous on $[a, b]$ and differentiable on $(a, b)$ because $f$ was) also obeys
$g(a) = g(b)$, that is, $f(a) - \lambda a = f(b) - \lambda b$. Easy algebra gives the answer that
\[
\lambda = \frac{f(b) - f(a)}{b - a}
\]
and now Rolle applied to $g$ gives us $c \in (a, b)$ such that $0 = g'(c) = f'(c) - \lambda$, as
was required.
12.3.16 Remark Since $\dfrac{f(b) - f(a)}{b - a}$ is precisely the gradient of the straight line
that joins the first and last points $P = (a, f(a))$ and $Q = (b, f(b))$ on the graph of $f$
over $[a, b]$, this result has easy geometrical interpretations: there is a point on the
graph (not at its first or last points) where the tangent to the curve runs parallel to
the straight line $PQ$; since $PQ$ gives a kind of 'overall slope' for this section of the
graph, that says there is a point where the 'instantaneous slope' (of the curve itself)
equals the average slope, the mean value of the gradient over the whole interval.
There may, of course, be more than one such point.
[Figure: First mean value theorem, with the point $c$ marked]
12.3.17 Example To show (assuming standard facts about the trig functions) that
the equation
\[
\pi(1 + x^3) \cos(\pi x) + 3x^2 \sin(\pi x) = \frac{9}{4}
\]
has at least one solution in the interval $(0, \frac{1}{2})$.
Solution
We notice that the left-hand side of the equation is the derivative of
$f(x) = (1 + x^3) \sin(\pi x)$ (using the product and chain rules). Now $f(x)$ is continuous
on $[0, \frac{1}{2}]$ and differentiable on $(0, \frac{1}{2})$, and $f(0) = 0$ and $f(\frac{1}{2}) = (\frac{9}{8}) \sin(\frac{\pi}{2}) = \frac{9}{8}$, so
the FMVT guarantees at least one solution in $(0, \frac{1}{2})$ of the equation
\[
f'(x) = \frac{\frac{9}{8} - 0}{\frac{1}{2} - 0} = \frac{9}{4},
\]
which is equivalent to what we were asked.
12.3.18 EXERCISE Let n be a positive integer. Use the two different methods
indicated to verify that the equation
√
nxn−1 = 1 − xn
has a solution in (0, 1):

1. by applying the FMVT to a suitable function,
2. by applying the intermediate value theorem to a suitable function.
12.3.19 Theorem: sign of derivative indicates monotonicity of function Let $f$ be
continuous on $[a, b]$ and differentiable on $(a, b)$. Then
1. if $f'(x) > 0$ for each $x \in (a, b)$ then $f$ is strictly increasing on $[a, b]$,
2. if $f'(x) \ge 0$ for each $x \in (a, b)$ then $f$ is increasing on $[a, b]$,
3. if $f'(x) < 0$ for each $x \in (a, b)$ then $f$ is strictly decreasing on $[a, b]$,
4. if $f'(x) \le 0$ for each $x \in (a, b)$ then $f$ is decreasing on $[a, b]$.
Proof
All four proofs are really the same argument (see footnote 5). For instance, if $f'(x) > 0$ for each
$x \in (a, b)$ then, for any choice of $p < q$ in $[a, b]$ we can apply FMVT to $f$ on the
interval $[p, q]$ to obtain
\[
\frac{f(q) - f(p)}{q - p} = f'(c), \quad\text{some } c \in (p, q) \subseteq (a, b)
\]
and so $f(q) - f(p) = (q - p)f'(c)$ is strictly positive; hence $f(q) > f(p)$ and we
conclude that $f$ is strictly increasing on $[a, b]$.
Here is a rather classy application of the FMVT that uses limit of a derivative
to prove existence of a derivative at a single point at which differentiability was in
doubt:
12.3.20 Theorem Suppose we are told that a function $f$ is continuous on
$(a - \delta, a + \delta)$ (for some positive $\delta$), differentiable on $(a - \delta, a)$ and on $(a, a + \delta)$
(but not necessarily at the point $a$), and that $\lim_{x \to a} f'(x)$ exists (equal to $\ell$, say).
Then $f$ is also differentiable at $a$, and $f'(a) = \ell$.
Proof
For $h$ positive and smaller than $\delta$, the function $f$ satisfies the FMVT conditions on
the interval $[a, a + h]$ so there is a point $c_h$ in $(a, a + h)$ such that
\[
\frac{f(a + h) - f(a)}{h} = f'(c_h).
\]
Now $\ell$ is the limit of $f'$ as we approach $a$ so, given $\varepsilon > 0$, we can find
positive $\delta_1$ smaller than $\delta$ such that, in particular, $|f'(x) - \ell| < \varepsilon$ whenever
$x \in (a, a + \delta_1)$. If now $0 < h < \delta_1$, we see that $c_h$ lies in that interval, so
$|f'(c_h) - \ell| < \varepsilon$. Thus $f'(c_h) \to \ell$ as $h \to 0^+$. In other words,
\[
\lim_{h \to 0^+} \frac{f(a + h) - f(a)}{h} = \ell.
\]
Essentially the same argument, applied to small negative values of $h$, yields
\[
\lim_{h \to 0^-} \frac{f(a + h) - f(a)}{h} = \ell,
\]
and the equality of the one-sided limits gives (see Theorem 10.3.9) what we wanted.
5 With a little care, the results can be extended to apply to unbounded intervals also: see
Example 12.3.21.
12.3.21 Example To show that a function f that is continuous on [a, ∞) and has
positive derivative at every point of (a, ∞) must be strictly increasing on [a, ∞).
Solution
For any p, q in [a, ∞) such that p < q, we observe that f is continuous on [p, q] and
differentiable (with positive derivative) on (p, q). By 12.3.19, f is strictly increasing
on [p, q]. In particular, f (p) < f (q): hence the result.
12.3.22 A specimen theorem on bounded monotonic functions If $f$ is increasing
on $(a, b)$ and bounded above, then the one-sided limit $\lim_{x \to b^-} f(x)$ exists.
Likewise if $f$ is decreasing on $(a, b)$ and bounded below.
Proof
(This result and the next exercise are simply extensions of paragraph 10.3.5.) Put
$M = \sup\{f(x) : x \in (a, b)\}$ (which must exist since the set of values here is non-empty
and bounded above). Given $\varepsilon > 0$, by the definition of supremum we can
find $x' \in (a, b)$ for which $f(x') > M - \varepsilon$. Now since $f$ is increasing:
\[
x' < x < b \;\Rightarrow\; M - \varepsilon < f(x') \le f(x) \le M
\]
and it follows that
\[
b - (b - x') < x < b \;\Rightarrow\; |M - f(x)| < \varepsilon
\]
so the one-sided limit exists, and equals $M$.
The proof of the second claim is similar.
12.3.23 EXERCISE Show that if $f$ is increasing on $(a, b)$ and bounded below, or if
$f$ is decreasing on $(a, b)$ and bounded above, then the one-sided limit $\lim_{x \to a^+} f(x)$
exists.
12.3.24 Example (Assuming for the moment that $\ln x$ is differentiable on $(0, \infty)$
and that its derivative is $x^{-1}$) find the smallest value of the function $f(x) = x \ln x$
on $(0, \infty)$, and verify that
\[
\lim_{x \to 0^+} f(x)
\]
exists.
Proof
Since $x^{-1}$ is always positive on $(0, \infty)$, $\ln x$ is strictly increasing there. By the
product rule, $f'(x) = x(x^{-1}) + (\ln x) \times 1 = 1 + \ln x$, which is positive on $(1/e, \infty)$
since $\ln x > -1 = \ln(e^{-1})$ there; therefore $f$ is strictly increasing on $[1/e, \infty)$.
Likewise, $f'(x)$ is negative on $(0, 1/e)$ since $\ln x < -1 = \ln(e^{-1})$ there; therefore
$f$ is strictly decreasing on $(0, 1/e)$. That already tells us that the smallest value that
$f$ can take is $f(1/e) = -1/e$.
(The fact that $f'(1/e) = 0$ flags up $1/e$ as the only stationary point of $f$ is also
informative; on its own, though, it doesn't indicate what sort of a stationary point
occurs here.)
Secondly, the observation that $f$ is negative on $(0, 1/e)$, that is, bounded above
by 0 (as well as decreasing) allows us to invoke the previous Exercise 12.3.23 to
conclude that $\lim_{x \to 0^+} f(x)$ exists.
12.4 Higher derivatives

12.4.1 Definitions When, as often happens, the derivative $f'$ of a function $f$ exists
at every point of an open interval or of a union of open intervals, so that $f'(x)$
emerges as a properly defined (derived) function in its own right, it may well be
worth asking whether $f'$ can itself be differentiated. If it can (either at a point or
across a union of one or more open intervals) then its derivative is called the second
derivative of $f$ and is written as $f''$ or, occasionally, as $\dfrac{d^2 f}{dx^2}$. In turn, if $f''$ turns out
to be differentiable, then its derivative $f'''$ is known as the third derivative of $f$, and
so on.
Beyond the third, the multiple dashes notation becomes clumsy and hard
to read, and very few texts use the notation $f''''$, preferring either to borrow
roman numerals ($f^{iv}$) or additional brackets ($f^{[4]}$ or $f^{(4)}$) or even just $f^4$. In all
cases one must be careful not to confuse a symbol for a higher derivative with
a symbol for a power of the function, either by repeated multiplication or by
repeated composition. Indeed, this is an area in which mathematical notation is
by no means set in concrete, and will vary from writer to writer and from text
to text depending on precisely what mathematical construction is being written
about. If, for example, $f(x) = \sin x$, then the notation $f^5(x)$ could conceivably
refer to any of $f'''''(x)$, $(\sin x)^5$ or $\sin(\sin(\sin(\sin(\sin x))))$, and so could $f^{[5]}(x)$
or $f^{v}(x)$, so it will pay you to read carefully what a particular text actually
means by it.
In the present work, we shall explicitly refer to higher derivatives before using
any such cluster of symbols. The higher derivatives are especially important in
Taylor’s theorem (see Chapter 16), but they also have a neat and well-known
application to the classification of stationary points which we shall deal with now.
12.4.2 Theorem: the second derivative test for local extrema Suppose that a
function $f$ is twice differentiable on an open interval including the number $p$, that
$f'(p) = 0$ and that $f''(x)$ is continuous at $p$. Then
• if $f''(p) > 0$ then $f$ has a local minimum at $p$,
• if $f''(p) < 0$ then $f$ has a local maximum at $p$.
(This is the result that has embedded itself in the minds of generations of
mnemonic-laden school pupils through the slogan 'POS MIN, NEG MAX'.)
Proof
We shall deal only with the first scenario since the second is so similar.
Since $f''$ is continuous at $p$ and $f''(p) > 0$, we can choose $\delta > 0$ such that $f''$ is
positive at every point of the interval $(p - \delta, p + \delta)$.
Now we work backwards from knowledge of $f''$ to knowledge of $f'$: for each
$x \in (p - \delta, p)$ apply FMVT to $f'$ on $[x, p]$ and we find that
\[
\frac{f'(p) - f'(x)}{p - x} = f''(y) \quad\text{for some } y \in (x, p),
\]
that is, $0 - f'(x) = f''(y)(p - x)$ is positive, and so $f'(x)$ itself is negative. Since this
holds for every $x \in (p - \delta, p)$, $f'$ is negative at every
point in $(x, p)$. This implies (see 12.3.19) that $f$ is (strictly) decreasing on $[x, p]$ so,
in particular, $f(x) > f(p)$.
In the same fashion we show that for every $x \in (p, p + \delta)$ we again get
$f(x) > f(p)$. Hence the claimed result.
The reader will almost certainly be familiar with standard exercises such as the
following:
12.4.3 Example Find and classify the stationary points on the graph of the
function $f(x) = x^4 - 8x^3 - 8x^2 + 96x + 144$.
Solution
It is easy to differentiate $f$ two (or more) times:
\[
f'(x) = 4x^3 - 24x^2 - 16x + 96 = 4(x^3 - 6x^2 - 4x + 24),
\]
\[
f''(x) = 12x^2 - 48x - 16 = 4(3x^2 - 12x - 4).
\]
To find the stationary points we solve $f'(x) = 0$, that is, $x^3 - 6x^2 - 4x + 24 = 0$.
A little back-of-the-envelope work yields $x = 2$ as one solution, so $x - 2$ is a factor
of the left-hand side, which now factorises into
\[
x^3 - 6x^2 - 4x + 24 = (x - 2)(x^2 - 4x - 12) = (x - 2)(x + 2)(x - 6)
\]
and at this stage, we know that 2, $-2$ and 6 are the $x$-coordinates of the three
stationary points. We find their $y$-coordinates by substituting these numbers into
the $f(x)$ formula, and identify
\[
A = (-2, 0), \quad B = (2, 256), \quad C = (6, 0)
\]
as the points in question. Now $f''$ is certainly continuous, and
\[
f''(-2) = +128, \quad f''(2) = -64, \quad f''(6) = +128.
\]
The second derivative test informs us that $A = (-2, 0)$ and $C = (6, 0)$ are local
minima, while $B = (2, 256)$ is a local maximum.
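The arithmetic in this example is easy to confirm mechanically. The following sketch, in which the function names are illustrative and the stationary points are taken from the factorisation above rather than computed, evaluates $f$, $f'$ and $f''$ in exact integer arithmetic and applies the second derivative test.

```python
def f(x):
    return x**4 - 8 * x**3 - 8 * x**2 + 96 * x + 144

def f1(x):
    """First derivative of f."""
    return 4 * x**3 - 24 * x**2 - 16 * x + 96

def f2(x):
    """Second derivative of f."""
    return 12 * x**2 - 48 * x - 16

def classify(p):
    """Second derivative test at a stationary point p."""
    assert f1(p) == 0, "p must be a stationary point"
    if f2(p) > 0:
        return "local minimum"
    if f2(p) < 0:
        return "local maximum"
    return "test inconclusive"

results = {p: (f(p), f2(p), classify(p)) for p in (-2, 2, 6)}
for p, (y, second, kind) in results.items():
    print(f"x = {p}: y = {y}, f''(x) = {second}, {kind}")
```

The 'test inconclusive' branch covers the case of a zero second derivative, about which Theorem 12.4.2 says nothing.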
12.4.4 EXERCISE Devise and solve a similar question based on a fourth-degree

polynomial of your own choice. (If you are wise, you will reverse-engineer the
problem a little by making sure that the cubic equation that you will have to
solve for the three x-coordinates actually does have three real and arithmetically
simple solutions.)
12.5 Alternative proof of the chain rule

To try to reduce the amount of algebra in this proof, let us introduce some
temporary jargon. We shall call a function j(x) a junk function if
• it is defined on (at least) some open interval centred on 0,
• j(0) = 0, and
• as x → 0, j(x) → 0.
(This is less unprofessional than it probably looks: it is quite common in analytic
arguments to refer to some complicated piece of algebra as ‘junk’ if we cannot be
bothered to work out its value exactly and we are sure that it will be too small to
make any difference to the answer that we seek.)
For instance, if $f$ is defined at and on both sides of a number $p$ and is continuous
at $p$, then $f(p + h) - f(p)$ is a junk function (of $h$). Again, if $f$ is differentiable at $p$,
then
\[
\varepsilon(h) = \frac{f(p + h) - f(p)}{h} - f'(p)
\]
would be a junk function of $h$, except that it fails to be defined at $h = 0$. We can
easily fix this: the improved description
\[
\varepsilon(h) = \frac{f(p + h) - f(p)}{h} - f'(p) \ \text{ if } h \ne 0; \qquad \varepsilon(0) = 0
\]
remedies this flaw, and makes $\varepsilon$ into a junk function of $h$. This, in turn, gives rise
to an alternative description of differentiability:
12.5.1 Lemma A function $f$ is differentiable at a point $p$, and its derivative there
is $\ell$, if and only if (for all sufficiently small values of $h$)
\[
f(p + h) = f(p) + h(\ell + \varepsilon(h))
\]
for some junk function $\varepsilon$.
Proof
If $f$ is differentiable, then the above discussion shows how to define a suitable junk
function. Conversely, if such an $\varepsilon$ does exist, then (for sufficiently small non-zero $h$)
\[
\frac{f(p + h) - f(p)}{h} = \ell + \varepsilon(h),
\]
which does converge to $\ell$ as $h \to 0$, that is, $f$ is differentiable at $p$ and $\ell$ is its
derivative.
For most people, the condition in this lemma is less intuitive as a description
of differentiability than the one we first gave, but it has one important and
occasionally redeeming feature: it does not involve dividing by $h$, and for that
reason, on the few occasions when we opt to use it, we do not have to treat $h = 0$
as a special case.
A few moments' thought should be enough to persuade you that
• if $\varepsilon$ and $\eta$ are junk functions, then so is $\varepsilon + \eta$,
• if $\varepsilon$ is a junk function, then so is $K\varepsilon$ for any constant $K$,
• if $\varepsilon$ and $\eta$ are junk functions, then so is their product $\varepsilon\eta$,
• if $\varepsilon$ and $\eta$ are junk functions, then so is their composite $\varepsilon \circ \eta$ (that is, the
function $\varepsilon(\eta(x))$).
12.5.2 The chain rule (again) If $f$ is differentiable at $p$, and $q = f(p)$ and $g$ is
differentiable at $q$, then $g \circ f$ is differentiable at $p$ and its derivative there is
$g'(f(p))\,f'(p)$.
Alternative proof
Let $\ell$ and $m$ stand for the numbers $f'(p)$ and $g'(q)$. By the lemma, there are two
junk functions $\varepsilon$ and $\eta$ such that (for sufficiently small $h$ and $k$)
\[
f(p + h) = f(p) + h(\ell + \varepsilon(h))
\]
and
\[
g(q + k) = g(q) + k(m + \eta(k)).
\]
We put $k = f(p + h) - f(p)$, noting that $k$ is itself a junk function of $h$ because the
differentiable function $f$ is necessarily continuous. Now
\begin{align*}
(g \circ f)(p + h) &= g(f(p + h)) \\
&= g(f(p) + k) \\
&= g(q + k) \\
&= g(q) + k(m + \eta(k)) \\
&= g(f(p)) + (f(p + h) - f(p))(m + \eta(k)) \\
&= (g \circ f)(p) + h(\ell + \varepsilon(h))(m + \eta(k)),
\end{align*}
that is,
\[
(g \circ f)(p + h) = (g \circ f)(p) + h(\ell m + \text{junk})
\]
where junk $= m\varepsilon(h) + \ell\eta(k) + \varepsilon(h)\eta(k)$ is a junk function by the comments made
above. Using the lemma again, this completes the demonstration that $g \circ f$ is
differentiable and that its derivative is $\ell m = f'(p)\,g'(q) = g'(f(p))\,f'(p)$.
13 The Cauchy condition: sequences whose terms pack tightly together
13.1 Cauchy equals convergent

As we saw in Chapter 2, establishing convergence of a sequence by the definition
alone is viable only when we know in advance – or can make a well-informed
guess at – what number its limit is. In many cases this is difficult or practically
impossible. The insight that a bounded monotonic sequence must always converge
allowed us to bypass this difficulty in many important examples, but the fact
remains that a wide variety of sequences are not monotonic and, nevertheless, do
converge. How can we handle such sequences? What is needed is a way to recognise
convergence or divergence that works for all sequences (not just special cases like
the monotonic ones) but which makes no explicit mention of the limit. The most
effective recognition criterion of this kind is the so-called Cauchy condition, which
we shall now investigate.
13.1.1 Definition A sequence $(x_n)_{n \in \mathbb{N}}$ is called a Cauchy sequence if
for each $\varepsilon > 0$ there is $n_\varepsilon \in \mathbb{N}$ such that
\[
|x_m - x_n| < \varepsilon \ \text{ for all } m \ge n_\varepsilon \text{ and all } n \ge n_\varepsilon.
\]
13.1.2 Notes
• By convention, the last line here is usually written as
\[
|x_m - x_n| < \varepsilon \ \text{ for all } m, n \ge n_\varepsilon
\]
which is safe provided you understand that both $m$ and $n$ are restricted
to be $\ge n_\varepsilon$.
• Carefully compare this definition with the definition of convergence, and you
will see that there is only one significant difference. To be specific: if you replace
$x_m$ by $\ell$ (and, in consequence, remove the reference to $m$ in the final line), you
recover the original definition of $x_n \to \ell$. In other words:
– for a sequence to converge, we must be able to force its terms to lie within
any specified distance from the limit (just by going far enough along the
sequence);
– for a sequence to be Cauchy, we must be able to force its terms to lie within
any specified distance from one another (just by going far enough along the
sequence).
• Now here is the reason why this is important: Cauchy sequences and
convergent sequences turn out to be exactly the same objects (as we shall
show). It is in this sense that Cauchyness allows us to recognise exactly which
sequences have limits, without at any stage mentioning the limit itself.
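To get a feel for the Cauchy condition, one can measure how far apart the terms of a sequence lie beyond a given index. The sketch below uses two sequences of our own choosing (the partial sums of the alternating harmonic series and of the harmonic series) and computes the largest value of $|x_m - x_n|$ over a finite window of indices. A finite computation of this kind can only suggest, never prove, that a sequence is or is not Cauchy.

```python
def partial_sums(term, count):
    """Return [x_1, ..., x_count] where x_n = term(1) + ... + term(n)."""
    sums, total = [], 0.0
    for k in range(1, count + 1):
        total += term(k)
        sums.append(total)
    return sums

def tail_spread(xs, n0):
    """Largest |x_m - x_n| over n0 <= m, n <= len(xs); equals max minus min."""
    tail = xs[n0 - 1:]
    return max(tail) - min(tail)

N = 2000
alternating = partial_sums(lambda k: (-1) ** (k + 1) / k, N)
harmonic = partial_sums(lambda k: 1.0 / k, N)

for n0 in (10, 100, 1000):
    print(n0, tail_spread(alternating, n0), tail_spread(harmonic, n0))
```

For the alternating sums the spread shrinks as the starting index grows, as the Cauchy condition demands; for the harmonic sums the spread between indices $n_0$ and $2n_0$ stays close to $\ln 2$, so no starting index can work for small $\varepsilon$.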
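13.1.3 Lemma Every convergent sequence is Cauchy.
Proof
If $x_n \to \ell$ then, given $\varepsilon > 0$, we can find $n_0$ such that $|x_n - \ell| < \varepsilon/2$ whenever
$n \ge n_0$. Then, if both $m$ and $n$ are $\ge n_0$, we see that
\begin{align*}
|x_m - x_n| &= |x_m - \ell + \ell - x_n| \\
&\le |x_m - \ell| + |\ell - x_n| \quad \text{(why? see footnote 1)} \\
&< \varepsilon/2 + \varepsilon/2 = \varepsilon
\end{align*}
and so $(x_n)$ is also Cauchy.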

The first step in establishing the converse is to show that all Cauchy sequences
are bounded. (The following proof is very like the one we gave to show that all
convergent sequences are bounded.)
13.1.4 Lemma Every Cauchy sequence is bounded.
Proof
Suppose that $(x_n)_{n\in\mathbb{N}}$ is Cauchy. By the definition (and choosing $\varepsilon = 1$ for
convenience) there is a positive integer $n_0$ such that all the terms of the sequence
from the $n_0$th one onwards are separated by less than 1 unit so, in particular, they
are less than 1 unit distant from $x_{n_0}$. The earlier terms $x_1, x_2, x_3, \cdots, x_{n_0-1}$ may
well be further away from $x_{n_0}$, but there are only a finite number of them: so we
can find the biggest distance from one of them to $x_{n_0}$ . . . call it $M'$. If we now put
$M = \max\{M', 1\}$ then every $x_n$ lies within the distance $M$ from $x_{n_0}$, so $(x_n)_{n\in\mathbb{N}}$
is bounded.
1 via the triangle inequality

13.1.5 Lemma A Cauchy sequence cannot have two subsequences with different
limits.
Proof
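Suppose it did: that is, suppose $(x_n)_{n\in\mathbb{N}}$ is Cauchy, that subsequences $(x_{n_k})_{k\in\mathbb{N}}$ and
$(x_{m_j})_{j\in\mathbb{N}}$ converge (respectively) to $\ell$ and $m$, and that $\ell < m$. Take $\varepsilon = \dfrac{m - \ell}{3} > 0$.
[Figure: Subsequences of a Cauchy sequence (number line marking $\ell + \varepsilon$, $m - \varepsilon$ and $m$)]
Cauchyness tells us that we can find $n_\varepsilon$ such that $|x_p - x_q| < \varepsilon$ whenever
$p, q \ge n_\varepsilon$. Yet all but finitely many of the $x_{n_k}$ belong to the interval $(\ell - \varepsilon, \ell + \varepsilon)$
and all but finitely many of the $x_{m_j}$ belong to the interval $(m - \varepsilon, m + \varepsilon)$. So we
can find such values of $n_k$ and $m_j$ both bigger than $n_\varepsilon$, and then the gap between
$x_{n_k}$ and $x_{m_j}$ must exceed $\varepsilon$, because $x_{m_j} - x_{n_k} > (m - \varepsilon) - (\ell + \varepsilon) = \varepsilon$; yet
Cauchyness requires that gap to be less than $\varepsilon$: contradiction.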
13.1.6 Theorem A sequence converges if and only if it is Cauchy.
Proof
From a result in Chapter 5 (Proposition 5.3.6) that we have had little need to use
until now, in order to be divergent a sequence must either be unbounded or must
possess two subsequences that have different limits. By the second and third of the
above lemmata, a Cauchy sequence cannot do either of these, so it must converge.
The converse (convergent implies Cauchy) was the first lemma above.
We offer an alternative method of proof for this theorem, partly because it is such
a central result, and partly because it uses the following lemma which is useful in
its own right.
13.1.7 Lemma If a Cauchy sequence has even one subsequence that converges,
then the entire sequence converges also (and to the same limit).
Proof
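Suppose that $(x_n)_{n\in\mathbb{N}}$ is a Cauchy sequence and that one of its subsequences
$(x_{n_k})_{k\in\mathbb{N}}$ converges to $\ell$. Given $\varepsilon > 0$ we can use these two facts to find positive
integers $n_0$ and $k_\varepsilon$ such that
\[ m, n \ge n_0 \Rightarrow |x_m - x_n| < \varepsilon/2, \quad \text{and} \]
\[ k \ge k_\varepsilon \Rightarrow |x_{n_k} - \ell| < \varepsilon/2. \]
Now for any $n \ge n_0$, choose a value $k'$ of $k$ that is greater than both $n_0$ and $k_\varepsilon$. Then
$n_{k'} \ge k' \ge n_0$ and we find
\[ |x_n - \ell| = |x_n - x_{n_{k'}} + x_{n_{k'}} - \ell| \le |x_n - x_{n_{k'}}| + |x_{n_{k'}} - \ell| < \varepsilon/2 + \varepsilon/2 = \varepsilon. \]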
Hence the result.
13.1.8 Theorem – revisited A sequence converges if and only if it is Cauchy.
Proof – alternative
Suppose that (xn )n∈N is Cauchy (and therefore bounded, by Lemma 13.1.4).
According to Bolzano-Weierstrass, it has a convergent subsequence. According to
Lemma 13.1.7, it is itself convergent. The converse is handled as previously.
13.1.9 Example Given a sequence $(x_n)_{n\in\mathbb{N}}$ with the property that, for each
$n \in \mathbb{N}$, $|x_n - x_{n+1}| < 0.9^n$, we show that $(x_n)$ must be convergent.
Solution
We clearly do not have enough information to determine or guess the limit, nor
to use monotonicity, so the Cauchy criterion is the only trick available to us.
Whenever $n < m$, we see that
\[
\begin{aligned}
|x_n - x_m| &= |x_n - x_{n+1} + x_{n+1} - x_{n+2} + x_{n+2} - x_{n+3} + \cdots + x_{m-1} - x_m| \\
&\le |x_n - x_{n+1}| + |x_{n+1} - x_{n+2}| + |x_{n+2} - x_{n+3}| + \cdots + |x_{m-1} - x_m| \\
&< 0.9^n + 0.9^{n+1} + 0.9^{n+2} + \cdots + 0.9^{m-1} < \sum_{k=n}^{\infty} 0.9^k = \frac{0.9^n}{1 - 0.9} = 10(0.9)^n.
\end{aligned}
\]
Now $0.9^n \to 0$ so, given $\varepsilon > 0$, we can locate $n_0 \in \mathbb{N}$ such that, whenever $n \ge n_0$:
$10(0.9)^n < \varepsilon$. It follows that, provided that $n \ge n_0$ (and therefore also $m \ge n_0$), we
shall have $|x_m - x_n| < \varepsilon$. Therefore $(x_n)_{n\in\mathbb{N}}$, being Cauchy, must also be convergent.
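The estimate in this solution can be checked numerically. The following is a minimal Python sketch, not part of the original argument: the particular sequence (steps of size $0.5 \times 0.9^n$ with an irregular sign pattern) is an assumption chosen purely for illustration, and `build_sequence` and `worst_gap_from` are hypothetical helper names.

```python
# Illustrative sequence (an assumption, not from the text): x_1 = 0 and
# x_{n+1} = x_n + s_n * 0.5 * 0.9**n, where the signs s_n follow an irregular pattern.
# By construction |x_n - x_{n+1}| = 0.5 * 0.9**n < 0.9**n for every n.
def build_sequence(length):
    xs = [0.0]  # xs[0] plays the role of x_1
    for n in range(1, length):
        sign = 1 if (n * n) % 3 == 1 else -1
        xs.append(xs[-1] + sign * 0.5 * 0.9 ** n)
    return xs


def worst_gap_from(xs, n):
    """Largest |x_n - x_m| over the computed terms with m > n (n is 1-based)."""
    return max(abs(xs[n - 1] - x) for x in xs[n:])


xs = build_sequence(400)
for n in (1, 10, 50, 100):
    # Compare the observed worst gap with the bound 10 * 0.9**n from the solution.
    print(n, worst_gap_from(xs, n), 10 * 0.9 ** n)
```

In each printed row the observed gap should sit below the bound $10(0.9)^n$, which is exactly what forces the sequence to be Cauchy.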
13.1.10 Note Fairly slight changes to the above argument will show that if the step
|xn − xn+1 | from one term of a sequence to the immediately next one is less than a
constant times a power of t for some constant t ∈ (0, 1), then the sequence (xn ) is
Cauchy. Of course, in that circumstance, |xn − xn+1 | tends to zero. It is important,
however, to realise that the condition |xn − xn+1 | → 0 on its own is not enough to
guarantee that (xn ) shall be Cauchy. One straightforward way to illustrate this is to
let $x_n$ be the nth partial sum of the harmonic series
\[ x_n = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \cdots + \frac{1}{n}. \]
Then we know that $(x_n)$ is not convergent (since the harmonic series diverges) and
is therefore not Cauchy, and yet $|x_n - x_{n+1}| = \dfrac{1}{n+1}$ which certainly converges
to zero.
Informally, we sometimes say that the step in the latter case is tending to zero but
not rapidly enough, whereas in instances like Example 13.1.9 the step is decaying
geometrically or exponentially and that this is fast enough to force Cauchyness. You
can explore this area a little further in some of the additional Exercises (numbers
184 to 190).
13.1.11 EXERCISE Given a sequence (xn )n∈N with the properties that, for each
n ∈ N:
\[ |x_n - x_{n+2}| < 10(0.6)^n \quad \text{and} \quad |x_n - x_{n+5}| < 20(0.7)^n, \]
show that $(x_n)$ converges.
[Suggestion: $x_n - x_{n+5} + x_{n+5} - x_{n+3} + x_{n+3} - x_{n+1} = x_n - x_{n+1}$.]
13.1.12 EXERCISE Show that the series
\[ \sum_{k=1}^{\infty} \frac{\sin(k^3 + 5k^2 - 3) - 4\cos(k^2 + 3k - 7)}{(1.1)^k} \]
is convergent, by establishing that the sequence of its partial sums is Cauchy (and
assuming basic trigonometric facts).
‘Showing that the sequence of partial sums is Cauchy’ will turn out, in Chap-
ter 14, to be pretty much the fundamental tool for establishing convergence of
a series.
13.1.13 Example We use the Cauchy condition to revisit the proof that the
harmonic series $\sum n^{-1}$ diverges.
Solution
For any given positive integer $n_0$, first find a positive integer $m$ such that $2^m$ is
bigger than $n_0$. Then, with the usual notation for partial sums
\[ S_n = \sum_{k=1}^{n} \frac{1}{k} \]
we see that
\[ S_{2^{m+1}} - S_{2^m} = \sum_{k=2^m+1}^{2^{m+1}} \frac{1}{k} \]
is the total of $2^m$ fractions of which the smallest one is $1/2^{m+1}$. This total therefore
exceeds $1/2$. In other words, no matter how large we choose $n_0$, there will be partial
sums later than the $n_0$th that differ by more than 0.5; so the sequence of partial sums
here is not Cauchy, and cannot converge.
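The block estimate above can be confirmed with exact rational arithmetic. This is a minimal Python sketch under the obvious reading of the example; `partial_sum` and `block_gap` are hypothetical helper names introduced only for this illustration.

```python
from fractions import Fraction


def partial_sum(n):
    """Exact n-th partial sum S_n = 1 + 1/2 + ... + 1/n of the harmonic series."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def block_gap(m):
    """S_{2^(m+1)} - S_{2^m}, computed exactly as the sum of the 2^m terms involved."""
    return sum(Fraction(1, k) for k in range(2 ** m + 1, 2 ** (m + 1) + 1))


for m in range(1, 8):
    gap = block_gap(m)
    # Each gap should exceed 1/2, however far along the series we start.
    print(m, float(gap), gap > Fraction(1, 2))
```

Using `Fraction` rather than floating point avoids any doubt about rounding when comparing each gap with 1/2.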
13.1.14 Example Given that (xn )n∈N and (yn )n∈N are both Cauchy sequences, we
show that their ‘term-by-term product’ (xn yn )n∈N is also Cauchy.
Solution
(This can become quite messy if we try to argue from the definition of Cauchy, so
we won’t.)
Since $(x_n)$ and $(y_n)$ are both Cauchy, they must each be convergent to some
limit: say, $x_n \to \ell$ and $y_n \to m$ (as $n \to \infty$). Algebra of limits now tells us that
$x_n y_n \to \ell m$, so the product sequence converges, and is therefore Cauchy too.
13.1.15 EXERCISE Of the following two statements, just one is true in general.
Give a proof for the one that is true, and find a counterexample that disproves the
false one.
1. If (xn )n∈N and (yn )n∈N are both Cauchy sequences, and there is a (strictly)
positive number δ > 0 such that |yn | ≥ δ for all n ≥ 1, then the ‘term-by-term
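quotient’ sequence
\[ \left( \frac{x_n}{y_n} \right)_{n\in\mathbb{N}} \]
must also be Cauchy.
2. If (xn )n∈N and (yn )n∈N are both Cauchy sequences, and |yn | > 0 for all n ≥ 1,
then the ‘term-by-term quotient’ sequence
\[ \left( \frac{x_n}{y_n} \right)_{n\in\mathbb{N}} \]
must also be Cauchy.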
13.1.16 Example If f : [a, b] → R is a continuous function on a closed bounded

interval [a, b], and (xn )n∈N is a Cauchy sequence of elements of [a, b], we show that
(f (xn ))n∈N is also Cauchy (less formally, that the function f preserves Cauchyness.)
Solution
Because $(x_n)_{n\in\mathbb{N}}$ is Cauchy, it must converge: $x_n \to \ell$ for some $\ell$. Also, since
$a \le x_n \le b$ for all $n$, we have $a \le \lim x_n = \ell \le b$ also, that is, $\ell \in [a, b]$.
Because continuous functions preserve convergence, it follows that $f(x_n) \to f(\ell)$.
Thus (f (xn )) is a convergent sequence, and consequently Cauchy.
13.1.17 EXERCISE Of the following three statements, at least one is true in

general and at least one is false. Give a proof for each that is true, and find a
counterexample to disprove each false one.
1. If f : (a, ∞) → R is a continuous function on an unbounded open interval
(a, ∞), and (xn )n∈N is any Cauchy sequence of elements of (a, ∞), then
(f (xn ))n∈N must also be Cauchy.
2. If f : [a, ∞) → R is a continuous function on an unbounded closed interval
[a, ∞), and (xn )n∈N is any Cauchy sequence of elements of [a, ∞), then
(f (xn ))n∈N must also be Cauchy.
3. If f : R → R is a continuous function on the whole real line R, and (xn )n∈N is
any Cauchy sequence, then (f (xn ))n∈N must also be Cauchy.
.........................................................................
14 More about series

.........................................................................
14.1 Absolute convergence

Almost all the series that we have worked on so far have consisted exclusively
of non-negative terms (for the reasons set out in Chapter 7), the only significant
exception being the (very special) case of alternating series, where a simple test
applies. Our purpose now is to extend our knowledge to general series where a
mixture of positively- and negatively-signed terms must be expected, and where
the pattern of signs may well be unpredictable. Naturally, we try to do this by
capitalising on the work we have already done, by turning a general series into
a series of non-negative terms, and seeking to relate its behaviour to that of the
original. There is, therefore, a kind of inevitability about the first definition:
14.1.1 Definition We call a series $\sum x_k$ absolutely convergent when the series
(of non-negative terms) $\sum |x_k|$ is convergent.
14.1.2 Note
• Let us be clear immediately that convergence and absolute convergence are not
the same thing. For instance, we know from the alternating series test that the
‘alternating harmonic series’
\[ \sum (-1)^k \frac{1}{k} \]
is convergent; and yet
\[ \sum \left| (-1)^k \frac{1}{k} \right| = \sum \frac{1}{k} \]
is the (notoriously divergent) harmonic series. In the present terminology, the
alternating harmonic series is convergent, but it is not absolutely convergent.
• A series that is convergent but not absolutely convergent is called conditionally
convergent.
• On the other hand, you will never come across a series that is absolutely
convergent but is not convergent. No such series can exist: and the way to see
this important truth is to use the idea of Cauchy sequences that we encountered
just a few pages back. Recall that Cauchy and convergent are equivalent for
sequences, and so a (general) series converges if and only if its sequence of
partial sums is Cauchy.
• We also remind you again about the triangle inequality: that |a + b| ≤ |a| + |b|
for arbitrary numbers a and b. This basic form of the inequality extends
immediately to the three-term version
|a + b + c| (≤ |a + b| + |c|) ≤ |a| + |b| + |c|
and the four-term version
|a + b + c + d| (≤ |a + b + c| + |d|) ≤ |a| + |b| + |c| + |d|
and, through an easy induction, to the general form
|a1 + a2 + a3 + · · · + an | ≤ |a1 | + |a2 | + |a3 | + · · · + |an |
which we have occasionally used, and which features again in the next
demonstration:
14.1.3 Theorem Every absolutely convergent series is convergent.
Proof

Let $\sum x_k$ be absolutely convergent, that is, let $\sum |x_k|$ be convergent. Our task is to
show that $\sum x_k$ is convergent and, since that means studying two different partial-sum
sequences (one for each series), we must take care to have different notations
for the two of them. For instance, let us put
\[ S_n = x_1 + x_2 + x_3 + \cdots + x_n = \sum_{1}^{n} x_k, \quad \text{and} \]
\[ S'_n = |x_1| + |x_2| + |x_3| + \cdots + |x_n| = \sum_{1}^{n} |x_k|. \]
Convergence of $\sum |x_k|$ tells us that $(S'_n)$ is a (convergent, and therefore) Cauchy
sequence so, given $\varepsilon > 0$, we can find $n_0$ such that
\[ |S'_m - S'_n| < \varepsilon \quad \text{whenever } m, n \ge n_0. \]
Since the last line is empty of information when $m$ equals $n$, we may as well assume
$m \ne n$ here. Also, there is no harm in assuming that $m$ is the larger and $n$ is the
smaller, since if we swop them over, the modulus signs will ensure that $|S'_m - S'_n|$
will remain unaltered. Thus we can write the previous display in slightly more
convenient (but equivalent) forms:
\[ |S'_m - S'_n| < \varepsilon \quad \text{whenever } m > n \ge n_0, \text{ that is,} \]
\[ \left| \sum_{1}^{m} |x_k| - \sum_{1}^{n} |x_k| \right| < \varepsilon \quad \text{whenever } m > n \ge n_0, \text{ that is,} \]
\[ \left| \sum_{n+1}^{m} |x_k| \right| < \varepsilon \quad \text{whenever } m > n \ge n_0, \text{ that is,} \]
\[ \sum_{n+1}^{m} |x_k| < \varepsilon \quad \text{whenever } m > n \ge n_0. \]
This is where we get to use the ‘enhanced’ triangle inequality: whenever
$m > n \ge n_0$,
\[ |S_m - S_n| = \left| \sum_{1}^{m} x_k - \sum_{1}^{n} x_k \right| = \left| \sum_{n+1}^{m} x_k \right| \le \sum_{n+1}^{m} |x_k| \]
which, as we just saw, is $< \varepsilon$. In other words, $(S_n)$ is also a Cauchy sequence, and
therefore convergent. So $\sum x_k$ is, indeed, a convergent series.
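The key inequality of this proof, that a block of the series is dominated by the matching block of the modulussed series, can be observed numerically. The sketch below is only an illustration: the series with terms $x_k = \sin(k)/k^2$ is an assumed example (absolutely convergent by comparison with $\sum 1/k^2$), and `term` and `block` are hypothetical helper names.

```python
import math


def term(k):
    # Assumed example series (not from the text): x_k = sin(k) / k^2.
    return math.sin(k) / k ** 2


def block(n, m, absolute=False):
    """Sum of x_k (or of |x_k| when absolute=True) for k = n+1, ..., m."""
    values = (term(k) for k in range(n + 1, m + 1))
    return sum(abs(v) for v in values) if absolute else sum(values)


for n, m in [(10, 20), (100, 1000), (1000, 5000)]:
    # Left: |S_m - S_n| for the signed series; right: the same block of absolute values.
    print(n, m, abs(block(n, m)), block(n, m, absolute=True))
```

In every row the first number should not exceed the second, and both shrink as n grows, mirroring the Cauchy argument.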
14.1.4 Notes
1. What we now have is the basis of a strategy for deciding upon the convergence
or divergence of a general series (as opposed to a series of non-negatives). If
we are given such a series $\sum x_k$, we look instead at the modulussed series
$\sum |x_k|$ and examine it by the techniques we acquired in Chapter 7. If they
show that $\sum |x_k|$ is convergent, in other words, that $\sum x_k$ is absolutely
convergent, then the theorem tells us that the original $\sum x_k$ is convergent also,
and the task is completed. So far, so good.

2. However, what if our Chapter 7 skills tell us that $\sum |x_k|$ is NOT convergent?
Then there is more and different work to do because the discovery, that $\sum x_k$ is
not absolutely convergent, does not tell us whether it is convergent or not.
(Look again at the alternating harmonic series: it is not absolutely convergent,
but it is convergent; in contrast, a series such as $\sum (-1)^k \sqrt[k]{k}$ is not absolutely
convergent, and it is not convergent either.) In brief: non-(absolute
convergence) does not decide for us whether we do or do not have
convergence.
3. However, some of the main tests in Chapter 7 were designed to help us get
around this difficulty in many cases. Consider, for example, the nth root test
(for non-negative terms, of course). It did not simply say the following:
• if $\lim \sqrt[n]{x_n} < 1$ then $\sum x_n$ is convergent,
• if $\lim \sqrt[n]{x_n} > 1$ then $\sum x_n$ is divergent.
Instead, it said something rather less symmetrical and slightly more
awkward:
• if $\lim \sqrt[n]{x_n} < 1$ then $\sum x_n$ is convergent,
• if $\lim \sqrt[n]{x_n} > 1$ then $x_n$ does not converge to zero and therefore $\sum x_n$ is
divergent.
Perhaps you now see why it pays dividends to word it in this half-clumsy
fashion? If we use the root test on $\sum |x_k|$, discover that $\lim \sqrt[n]{|x_n|} > 1$, and
bring back merely the information that $\sum |x_k|$ is divergent, that leaves
unanswered the question of whether $\sum x_k$ converges or diverges . . . but that is
only part of what the test (as we expressed it) is telling us: it actually says that
$\sum |x_n|$ is divergent because $|x_n|$ does not converge to zero. Therefore $x_n$ does
not converge to zero either, and $\sum x_n$ cannot be convergent.
4. Consequently, the nth root test will deal with all the (general) series for which
$\lim \sqrt[n]{|x_n|}$ can be calculated and is not equal to 1.
5. A very similar discussion will clarify that the ratio test will deal with all series
(with no zero terms) for which the modulussed growth-rate limit $\lim \dfrac{|x_{n+1}|}{|x_n|}$
can be calculated and is not equal to 1.
By way of illustration, we’ll now re-work a couple of Chapter 7’s examples but
without the assumption that the parameter t or x is positive.
14.1.5 Example For precisely which real values of $t$ does the series
\[ \sum \left( \frac{3n^2 - 1}{2n^2 - 1} \right)^n t^n \]
converge?
Solution

Put $x_n$ = the nth term here, and consider instead $\sum |x_n|$. All its terms are
non-negative, and the nth root of $|x_n|$ is
\[ \frac{3n^2 - 1}{2n^2 - 1} \, |t| \]
which has a limit of $\dfrac{3|t|}{2}$ so, by the root test:
1. for $|t| < 2/3$ the limit is $< 1$ and so the series $\sum |x_n|$ converges, that is, $\sum x_n$
is absolutely convergent, and therefore also convergent;
2. for $|t| > 2/3$ the limit is $> 1$ so $|x_n|$ cannot tend to zero, and neither can $x_n$, so
the original series $\sum x_n$ must diverge.
It remains to ponder what happens when $t$ is exactly $\pm 2/3$. Luckily, in that
borderline case $|x_n|$ itself is
\[ \left( \frac{3n^2 - 1}{2n^2 - 1} \right)^n \left( \frac{2}{3} \right)^n = \left( \frac{6n^2 - 2}{6n^2 - 3} \right)^n \]
which is (just) greater than 1 (in the final fraction, the top line exceeds the bottom
line). This shows once again that neither $|x_n|$ nor $x_n$ can tend to zero, and so the
series $\sum x_n$ diverges.
We conclude that the given series $\sum x_n$ is (absolutely) convergent when
$-2/3 < t < 2/3$, and divergent for every other value of $t$.
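The quantities used in this example are easy to tabulate. The following Python sketch is an illustration only; `root_of_abs_term` and `abs_term` are hypothetical helper names, and exact fractions are used so that the borderline comparison with 1 is not blurred by rounding.

```python
from fractions import Fraction


def root_of_abs_term(n, t):
    """n-th root of |x_n| for x_n = ((3n^2 - 1)/(2n^2 - 1))^n * t^n, simplified by hand."""
    return Fraction(3 * n * n - 1, 2 * n * n - 1) * abs(Fraction(t))


def abs_term(n, t):
    """|x_n| itself, computed exactly."""
    return root_of_abs_term(n, t) ** n


# The n-th roots approach 3|t|/2: below 1 for |t| < 2/3, above 1 for |t| > 2/3.
for t in (Fraction(1, 2), Fraction(2, 3), Fraction(3, 4)):
    print(t, [float(root_of_abs_term(n, t)) for n in (1, 10, 100, 1000)])

# Borderline case t = 2/3: every |x_n| is (just) greater than 1, so x_n cannot tend to 0.
print([abs_term(n, Fraction(2, 3)) > 1 for n in range(1, 11)])
```

The same borderline check works for $t = -2/3$, since only $|t|$ enters the calculation.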
14.1.6 Example For exactly which real values of $x$ does the following series
converge?
\[ \sum w_n \quad \text{where} \quad w_n = \frac{(n+1)!\,(2n+2)!\,x^n}{(3n+3)!} \]
Solution
If $x = 0$ then, although a ratio test would not be legal, the series definitely
converges;¹ so from now on assume $x \ne 0$ and consider $\sum |w_n|$. Its growth rate
$|w_{n+1}|/|w_n|$ cancels to
\[ \frac{(n+2)(2n+3)(2n+4)|x|}{(3n+4)(3n+5)(3n+6)} = \frac{(2n+3)(2n+4)|x|}{3(3n+4)(3n+5)} \]
whose limit is $4|x|/27$. By the ratio test, therefore:
1. for $0 < |x| < 27/4$ the limit is less than 1 and $\sum |w_n|$ converges, that is, $\sum w_n$
converges absolutely, but
2. for $27/4 < |x|$ the limit exceeds 1, $|w_n|$ does not tend to zero, $w_n$ equally does
not tend to zero, and $\sum w_n$ diverges.
It remains unclear at first what will happen in the borderline cases $x = \pm 27/4$.
Notice, however, that when $x = \pm 27/4$, the growth rate for $|w_n|$ is actually
\[ \frac{(2n+3)(2n+4) \cdot 27}{12(3n+4)(3n+5)} = \frac{(6n+9)(6n+12)}{(6n+8)(6n+10)} \]
which is greater than 1 (look at the individual factors in the top and bottom lines).
Thus $|w_{n+1}| > |w_n|$, the terms of $\sum |w_n|$ are increasing and cannot converge to
zero, and thus $w_n$ cannot tend to zero either and $\sum w_n$ again diverges.
We conclude that $\sum w_n$ converges (absolutely) when $-27/4 < x < 27/4$ but
diverges in all other cases.
1 Every term is zero, every partial sum is zero, and the limit of the partial sums is zero (and
certainly exists)
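The cancellation of the growth rate in this example can be double-checked by computing the terms directly. This is a minimal Python sketch; `w`, `growth_rate` and `simplified_rate` are hypothetical helper names, and the test value $x = 5/2$ is an arbitrary assumption.

```python
from fractions import Fraction
from math import factorial


def w(n, x):
    """w_n = (n+1)! (2n+2)! x^n / (3n+3)!, computed exactly."""
    x = Fraction(x)
    return Fraction(factorial(n + 1) * factorial(2 * n + 2), factorial(3 * n + 3)) * x ** n


def growth_rate(n, x):
    """|w_{n+1}| / |w_n| taken straight from the definition (requires x != 0)."""
    return abs(w(n + 1, x)) / abs(w(n, x))


def simplified_rate(n, x):
    """The cancelled form (2n+3)(2n+4)|x| / (3(3n+4)(3n+5)) from the solution."""
    return Fraction((2 * n + 3) * (2 * n + 4), 3 * (3 * n + 4) * (3 * n + 5)) * abs(Fraction(x))


x = Fraction(5, 2)  # arbitrary non-zero test value
print(all(growth_rate(n, x) == simplified_rate(n, x) for n in range(1, 20)))
print(float(growth_rate(10 ** 3, x)), float(4 * x / 27))  # the rate approaches 4|x|/27
print(all(growth_rate(n, Fraction(27, 4)) > 1 for n in range(1, 20)))  # borderline case
```

The last line reflects the borderline observation: at $x = \pm 27/4$ every growth rate exceeds 1, so the terms increase in modulus.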
14.1.7 EXERCISE For which real values of $t$ does the series
\[ \sum \frac{(3n+1)^n \, t^n}{n^{n-1}} \]
converge?
14.1.8 EXERCISE Determine the range of values of the real number $x$ for which
the series
\[ \sum \frac{(n!)^6 \, x^{3n}}{(6n)!} \]
converges.
Many textbooks present the ratio and root tests as tests upon general series,
rather than as tests upon series of non-negative terms. Our preference is to proceed
as above, that is, consciously to switch from $\sum x_k$ to $\sum |x_k|$, use the appropriate test
there, and then switch back to see what we have learned about the original series.
(For one thing, this forces awareness of the important fact that we are dealing with
two distinct series, not one.) For the sake of completeness, however, here are the
two tests as applicable to general series:
14.1.9 The nth root test for general series Suppose that $\sum_{n=1}^{\infty} a_n$ is a series of
real terms and that $\sqrt[n]{|a_n|}$ converges to a limit $\ell$ (as $n \to \infty$). Then:
1. if $\ell < 1$ then the series converges absolutely,
2. if $\ell > 1$ then $a_n$ and $|a_n|$ do not tend to zero, and therefore the series diverges.
14.1.10 D’Alembert’s ratio test for general series Suppose that $\sum_{n=1}^{\infty} a_n$ is a
series of non-zero terms and that the growth rate $|a_{n+1}|/|a_n|$ converges to a limit $\ell$
(as $n \to \infty$). Then:
1. if $\ell < 1$ then the series converges absolutely,
2. if $\ell > 1$ then $a_n$ and $|a_n|$ do not tend to zero, and therefore the series diverges.
14.2 The ‘robustness’ of absolutely convergent series

For the learner, one of the most disturbing features of series is that it is sometimes
possible to take a convergent series, ‘add up’ all its terms in a different order, and get
a different sum to infinity (we shall see this soon). This outrageous behaviour flies
in the face of what we all learned about basic arithmetic in elementary school, and
one of the most reassuring features of absolutely convergent series is that they do
not behave badly like this. Indeed, they can even be multiplied together in a more-
or-less natural fashion. This section focusses on the ‘good’ behaviour of absolutely
convergent series when subjected to arithmetical processes such as re-ordering

and multiplying. Some of the proofs are relatively complicated and it will again
be perfectly acceptable if you omit them on a first reading – but do take on board
the results and the examples as soon as possible.
An easy topic to begin with (although not one that is really about absolute
convergence) is that of insertion and removal of brackets. If $\sum x_n = x_1 + x_2 +
x_3 + x_4 + \cdots$ is a convergent series, we can create many more series by imposing
brackets in an arbitrary manner upon the stream of numbers, for instance:
(x1 + x2 ) + x3 + (x4 + x5 + x6 ) + (x7 + x8 ) + · · · .
To explore the question of its convergence, we must examine its partial sums which,
in the present example, begin with
x1 +x2 , x1 +x2 +x3 , x1 +x2 +x3 +x4 +x5 +x6 , x1 +x2 +x3 +x4 +x5 +x6 +x7 +x8 , · · · .
Notice that these constitute a subsequence of the partial-sum sequence for the
original series – indeed, this was inevitable, since the nth partial sum of the
bracketed series is simply the m(n)th original partial sum where m(n) is the label on
the last term of the nth bracket (regarding each unbracketed term as sitting inside
an invisible pair of brackets on its own). That is the only insight needed to establish:
14.2.1 Theorem Any series arising by bracketing together blocks of terms in a

convergent series is convergent, and to the same sum-to-infinity.
Proof
The partial-sum sequence for the bracketed series is a subsequence of the partial-
sum sequence for the original series, and therefore converges to the same limit.
14.2.2 Example On the other hand, removal of existing brackets can completely
change the convergence status of a series. For a simple example, consider:
(1 − 1) + (2 − 2) + (3 − 3) + (4 − 4) + (5 − 5) + · · · .
Clearly this converges, since every single term (every single bracket) is zero, and
$\sum 0$ converges to 0. However, if we remove all² the brackets, it becomes
1 − 1 + 2 − 2 + 3 − 3 + 4 − 4 + 5 − 5 + ···
whose partial sums
1, 0, 2, 0, 3, 0, 4, 0, 5, 0, · · ·
don’t merely fail to converge to zero, they fail to converge at all since they are
unbounded (the (2n − 1)th partial sum is n, for each n ∈ N).
2 Indeed, a similar argument runs if we remove an infinite number of the brackets.
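The contrast between the bracketed and unbracketed partial sums can be listed directly. The short Python sketch below is an illustration only; `unbracketed_terms` is a hypothetical helper name.

```python
from itertools import accumulate


def unbracketed_terms(pairs):
    """Terms 1, -1, 2, -2, ... of the first `pairs` brackets with the brackets removed."""
    terms = []
    for k in range(1, pairs + 1):
        terms.extend([k, -k])
    return terms


terms = unbracketed_terms(5)
unbracketed_sums = list(accumulate(terms))  # partial sums after each single term
bracketed_sums = unbracketed_sums[1::2]     # partial sums after each complete bracket

print(unbracketed_sums)  # [1, 0, 2, 0, 3, 0, 4, 0, 5, 0]
print(bracketed_sums)    # [0, 0, 0, 0, 0]
```

The bracketed partial sums form a subsequence of the unbracketed ones (every second entry), which is the observation behind Theorem 14.2.1; here that subsequence converges although the full sequence is unbounded.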
14.2.3 EXERCISE In contrast, show that if a bracketed version of a series of non-

negative terms converges, then so did the original series (and to the same sum).
Turning next to the question of ‘robustness of convergence under rearrangement’,
we once again begin by tackling the easy special case of a series of non-negative
terms.
14.2.4 Roughwork Starting with a convergent series of non-negative terms
a1 + a2 + a3 + a4 + a5 + · · ·
let us imagine a typical rearranged series consisting of exactly the same terms but
in a different order, such as
a9 + a3 + a41 + a2 + a17 + · · · .
Look at the first few partial sums of the rearranged series (as indeed we must, if we
seek its sum to infinity):
a9 , a9 + a3 , a9 + a3 + a41 , a9 + a3 + a41 + a2 , a9 + a3 + a41 + a2 + a17 , · · · .
The bad news is that these are, of course, not partial sums of the original series.
Give them a temporary name, say, random handfuls.
The better news is the observation that this is an increasing sequence, just as was
the partial-sum sequence of the original series . . . and for increasing sequences,
limit and supremum are the same thing. Furthermore, each of the random
handfuls is part of a partial sum, and therefore less than or equal to a partial
sum since all the terms are non-negative.3 Each random handful is therefore ≤
the supremum of all the partial sums, that is, the limit of the original series, so
the supremum of the random handfuls (= the limit of the rearranged series) must
be ≤ the limit of the original series. Presumably we now only have to reverse the
argument to obtain the inequality the other way round?
14.2.5 Theorem Any rearrangement of a convergent series of non-negative terms

converges to the same sum.
3 For instance, $a_9 + a_3$ is less than or equal to $\sum_{1}^{9} a_k$, $a_9 + a_3 + a_{41}$ is less than or equal to $\sum_{1}^{41} a_k$ and so on.
Proof

Given that $\sum b_n$ is a rearrangement of a series $\sum a_n$ that converges to $\ell$, and in
which $a_n \ge 0$ for all $n$, recall that $\ell$ is the supremum of the partial sums of $\sum a_n$.
Each partial sum of $\sum b_n$ is the sum of finitely many $a_n$s scattered in some
unpredictable pattern within the series $\sum a_n$, so these $a_n$s must all occur before
some particular $a_m$ in the original sequence and their total is therefore ≤ the mth
partial sum of $\sum a_n$, and therefore also $\le \ell$.
Since the partial-sum sequence for $\sum b_n$ is also increasing, it converges to some
$\ell'$ where $\ell' \le \ell$.
Now $\sum a_n$ is equally a rearrangement of $\sum b_n$ so, by the identical argument,
$\ell \le \ell'$.
Hence $\ell = \ell'$.
14.2.6 Note In order to extend this conclusion to absolutely convergent series

whose terms are a mixture of positives and negatives, we simply segregate out the
positives from the negatives and think about the two streams separately. Given a

series $\sum a_n$, we introduce the notation (for each $n \in \mathbb{N}$):
\[ a_n^+ = a_n \text{ if } a_n \ge 0, \qquad a_n^+ = 0 \text{ if } a_n < 0; \]
\[ a_n^- = -a_n \text{ if } a_n < 0, \qquad a_n^- = 0 \text{ if } a_n \ge 0. \]
Then
\[ a_n = a_n^+ - a_n^-, \qquad |a_n| = a_n^+ + a_n^- \]
in all cases (just check it out for non-negative $a_n$ and for negative $a_n$: it works in
both cases) but, importantly, every $a_n^+$ and every $a_n^-$ is non-negative (and therefore
we can use the preceding theorem on them separately).

Think what the remark $a_n = a_n^+ - a_n^-$ does to a typical partial sum of $\sum a_n$
and you will understand what we meant by ‘segregating out the positives from the
negatives’; for instance:
(3 − 5 − 2 + 1 + 6 − 4 + 2 − 7 − 3)
= (3 + 0 + 0 + 1 + 6 + 0 + 2 + 0 + 0) − (0 + 5 + 2 + 0 + 0 + 4 + 0 + 7 + 3).
In general, what we get is
\[ \sum_{1}^{n} a_k = \sum_{1}^{n} a_k^+ - \sum_{1}^{n} a_k^- \]
and, likewise,
\[ \sum_{1}^{n} |a_k| = \sum_{1}^{n} a_k^+ + \sum_{1}^{n} a_k^-. \]
The last display line shows that if $\sum_{1}^{n} a_k^+$ and $\sum_{1}^{n} a_k^-$ both converge (as $n \to \infty$), then so must
$\sum_{1}^{n} |a_k|$; yet the converse is also true: if $\sum_{1}^{n} |a_k|$ converges then, because (for all $n$)
$0 \le a_n^+ \le |a_n|$ and $0 \le a_n^- \le |a_n|$, the comparison test tells us that both $\sum_{1}^{n} a_k^+$ and
$\sum_{1}^{n} a_k^-$ converge. Furthermore, their sums-to-infinity add in the obvious manner.
We have proved:

14.2.7 Lemma The series $\sum a_n$ is absolutely convergent if and only if both $\sum a_n^+$
and $\sum a_n^-$ converge. Furthermore, their sums then satisfy
\[ \sum |a_n| = \sum a_n^+ + \sum a_n^- \quad \text{and} \]
\[ \sum a_n = \sum a_n^+ - \sum a_n^-. \]
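The segregation into positive and negative parts is simple to carry out mechanically. The sketch below is a minimal Python illustration using the nine numbers from the worked example in Note 14.2.6; `positive_part` and `negative_part` are hypothetical helper names.

```python
def positive_part(a):
    """a^+ : equal to a when a >= 0, and 0 otherwise."""
    return a if a >= 0 else 0


def negative_part(a):
    """a^- : equal to -a when a < 0, and 0 otherwise (so it is never negative)."""
    return -a if a < 0 else 0


terms = [3, -5, -2, 1, 6, -4, 2, -7, -3]
plus = [positive_part(a) for a in terms]
minus = [negative_part(a) for a in terms]

print(plus)   # [3, 0, 0, 1, 6, 0, 2, 0, 0]
print(minus)  # [0, 5, 2, 0, 0, 4, 0, 7, 3]
print(sum(terms), sum(plus) - sum(minus))                 # -9 -9
print(sum(abs(a) for a in terms), sum(plus) + sum(minus))  # 33 33
```

The last two lines are finite versions of the two identities in the lemma: the signed total is the difference of the two streams, and the total of the moduli is their sum.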
14.2.8 Theorem Any rearrangement of an absolutely convergent series converges

to the same sum.
Proof

Rearranging a given absolutely convergent $\sum a_n$ into a new order $\sum b_n$ will
rearrange both $\sum a_n^+$ and $\sum a_n^-$ in exactly the same pattern. By the above, the two
latter series are convergent, and by the previous theorem, this does not alter their
sums, that is:
\[
\begin{aligned}
\sum a_n &= \sum a_n^+ - \sum a_n^- \\
&= \sum b_n^+ - \sum b_n^- \\
&= \sum b_n.
\end{aligned}
\]

14.2.9 IMPORTANT EXERCISE Show that if a series $\sum a_n$ is conditionally
convergent, that is, convergent but NOT absolutely convergent, then both $\sum a_n^+$ and
$\sum a_n^-$ diverge to ∞.
For most learners, the surprise is not that absolute convergence is robust under
rearrangement, but that non-absolute convergence isn’t. It turns out that this
may be demonstrated upon any convergent-but-not-absolutely-convergent series,
but we shall demonstrate using the most obvious such object – the alternating
harmonic series.
14.2.10 Example The series $\sum_{1}^{\infty} (-1)^{k-1} k^{-1}$ is well known to converge but not
absolutely. Furthermore, since in the expression
\[ \left(1 - \frac{1}{2}\right) + \left(\frac{1}{3} - \frac{1}{4}\right) + \left(\frac{1}{5} - \frac{1}{6}\right) + \left(\frac{1}{7} - \frac{1}{8}\right) + \left(\frac{1}{9} - \frac{1}{10}\right) + \cdots \]
every bracket is positive, the sum-to-infinity (let us denote it by $S$), whatever it is,
is more than 0.5. In particular, $S \ne 0$.
Suppose it were correct that every rearrangement of this series is also convergent
to $S$ (and now we shall seek a contradiction). In particular, the series
\[ 1 - \frac{1}{2} - \frac{1}{4} + \frac{1}{3} - \frac{1}{6} - \frac{1}{8} + \frac{1}{5} - \frac{1}{10} - \frac{1}{12} + \frac{1}{7} - \frac{1}{14} - \frac{1}{16} + \cdots \]
is a rearrangement since each original term appears once and once only (in the
pattern of one positive term and two negative terms alternating) so it also converges
to $S$.
From what we saw in 14.2.1 about imposing brackets, the modified series
\[ \left(1 - \frac{1}{2}\right) - \frac{1}{4} + \left(\frac{1}{3} - \frac{1}{6}\right) - \frac{1}{8} + \left(\frac{1}{5} - \frac{1}{10}\right) - \frac{1}{12} + \left(\frac{1}{7} - \frac{1}{14}\right) - \frac{1}{16} + \cdots \]
that is,
\[ \frac{1}{2} - \frac{1}{4} + \frac{1}{6} - \frac{1}{8} + \frac{1}{10} - \frac{1}{12} + \frac{1}{14} - \frac{1}{16} + \cdots \]
must also converge to $S$. Yet the last display is precisely one half of the original
series so it converges to $S/2$. We deduce that $S = S/2$ which, since $S$ is non-zero, is
absurd.
Conclusion: rearrangement of at least some convergent (non-absolutely conver-
gent) series can alter their convergence!
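A numerical experiment makes the phenomenon visible. The Python sketch below is an illustration only: the function names are hypothetical, floating-point arithmetic is used, and the identification of the original sum with $\ln 2$ is a standard fact that this example does not itself establish.

```python
import math


def alternating_harmonic_partial(n_terms):
    """Partial sum 1 - 1/2 + 1/3 - ... with n_terms terms."""
    return sum((-1) ** (k - 1) / k for k in range(1, n_terms + 1))


def rearranged_partial(blocks):
    """Sum of the first `blocks` groups 1/(2j-1) - 1/(4j-2) - 1/(4j) of the rearrangement."""
    total = 0.0
    for j in range(1, blocks + 1):
        total += 1 / (2 * j - 1) - 1 / (4 * j - 2) - 1 / (4 * j)
    return total


for blocks in (10, 1000, 100000):
    # Left: rearranged series after `blocks` groups; right: half of the original partial sum.
    print(blocks, rearranged_partial(blocks), 0.5 * alternating_harmonic_partial(2 * blocks))

print(math.log(2), math.log(2) / 2)  # for comparison, assuming the known value S = ln 2
```

The rearranged partial sums settle near half of the original sum, in line with the bracketing computation in the example.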
14.2.11 EXERCISE Devise a rearrangement of the alternating harmonic series

that diverges to ∞. Devise another that diverges to −∞.

14.2.12 HARDER EXERCISE Given a completely arbitrary series $\sum x_k$ that is
conditionally convergent, and a completely arbitrary real number $\ell$, think how you
could devise a rearrangement of $\sum x_k$ that converges to $\ell$.
Suggestion: the key ingredient in finding one is that both $\sum x_k^+$ and $\sum x_k^-$ diverge
to infinity. You might begin by taking just enough of the non-negative terms of the
series to make the running total greater than $\ell$.
14.2.13 Note Our last task in this section is to verify that absolutely convergent
series are robust under multiplication and, prior to that, we must define what we
mean by multiplying two series together. Tempting though it may at first appear to
multiply them ‘term by term’ as we did successfully for sequences (that is, to define
$\sum x_k$ times $\sum y_k$ to mean $\sum x_k y_k$), this definition completely fails to match how
series are actually used in practice, so we had better begin by a forward glance at
one of their key applications: power series representations of functions.
It is well known (and yes, we shall be checking this out in detail) that many
important functions can be represented, or even optimally defined, by power
series. For instance, you have probably encountered the following:
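\[ e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = \sum_{0}^{\infty} \frac{x^n}{n!}, \]
\[ \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots = \sum_{0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n+1)!}, \]
\[ \cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots = \sum_{0}^{\infty} (-1)^n \frac{x^{2n}}{(2n)!}. \]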
Much of the purpose of representing functions by power series is in order to be

able to manipulate the power series instead of the functions. So, for instance, if
two functions $f$ and $g$ are represented by power series $\sum a_k x^k$ and $\sum b_k x^k$, then
we expect/need the sum $f + g$ to be represented by $\sum a_k x^k + \sum b_k x^k$ and the
difference $f - g$ by $\sum a_k x^k - \sum b_k x^k$, and somebody should have taken care to
define the sum and the difference of power series so that this does happen. Luckily,
there is no problem here: we define
\[ \sum a_k x^k + \sum b_k x^k \quad \text{to be} \quad \sum (a_k + b_k) x^k, \]
we define
\[ \sum a_k x^k - \sum b_k x^k \quad \text{to be} \quad \sum (a_k - b_k) x^k, \]
and everything turns out to run so smoothly that there was really no need to make
a fuss about it.
In the case of multiplication, it is rather less obvious what to do. We need the
product function $fg$ to be represented by $\sum a_k x^k \times \sum b_k x^k$, but how then ought
that product to be defined?

We shall let our definition be guided by the simplest case: the case where f and
g are polynomials. Then the power series that ‘represent’ them are just f and g
themselves (or, to be really fussy, f and g with infinite strings of zero terms attached:
so that, for instance, the function $f(x) = 2 - 5x + 3x^2 - 4x^3$ is represented by the
power series
\[ 2 - 5x + 3x^2 - 4x^3 + 0x^4 + 0x^5 + 0x^6 + 0x^7 + \cdots \]
but, pragmatically, we are not going to waste paper and patience by writing out
endless strings of zero terms).
Take the case of two cubics, say,
\[ f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3, \qquad g(x) = b_0 + b_1 x + b_2 x^2 + b_3 x^3. \]
What power series $c_0 + c_1 x + c_2 x^2 + c_3 x^3 + \cdots$ shall represent their product $fg$?

The question virtually answers itself, because their product is itself a polynomial,
as a few tedious minutes with paper and pen will show you:
\[
\begin{aligned}
(fg)(x) = {} & a_0 b_0 + (a_0 b_1 + a_1 b_0)x + (a_0 b_2 + a_1 b_1 + a_2 b_0)x^2 \\
& + (a_0 b_3 + a_1 b_2 + a_2 b_1 + a_3 b_0)x^3 + (a_0 b_4 + a_1 b_3 + a_2 b_2 + a_3 b_1 + a_4 b_0)x^4 + \cdots
\end{aligned}
\]
(and two more terms; here $a_4 = b_4 = 0$, in line with the convention of attaching zero
terms to a polynomial). Suddenly we are left with no freedom of action about how
to multiply the series: the coefficient $c_0$ has to be $a_0 b_0$, $c_1$ has to be $a_0 b_1 + a_1 b_0$,
$c_2$ has to be $a_0 b_2 + a_1 b_1 + a_2 b_0$ and so on. Any other decision we might consider
making would create a definition that didn’t even work correctly for polynomials,
let alone for general (properly infinite) power series.
This is why the following definition,⁴ complicated though it looks, is the right
one for our purposes:
14.2.14 Definition The Cauchy product of two series⁵ $\sum_0^\infty a_k$ and $\sum_0^\infty b_k$ is the
series $\sum_0^\infty c_k$ defined by
$$c_0 = a_0 b_0, \quad c_1 = a_0 b_1 + a_1 b_0, \quad c_2 = a_0 b_2 + a_1 b_1 + a_2 b_0,$$
and, in general,
$$c_n = a_0 b_n + a_1 b_{n-1} + a_2 b_{n-2} + \cdots + a_n b_0 = \sum_{k=0}^{n} a_k b_{n-k}.$$
14.2.15 Theorem If the two series $\sum_0^\infty a_k$ and $\sum_0^\infty b_k$ are absolutely convergent,
with sums A and B, say, then their Cauchy product is also absolutely convergent,
and its sum is AB.
Roughwork

When we multiply a partial sum (call it $A_n = \sum_0^n a_k$) of the first series by a partial
sum $B_n$ of the second, the various fragments $a_i b_j$ do not naturally line up in a
sequence but, rather, in a two-dimensional grid such as
a0 b0   a1 b0   a2 b0   a3 b0   · · ·
a0 b1   a1 b1   a2 b1   a3 b1   · · ·
a0 b2   a1 b2   a2 b2   a3 b2   · · ·
a0 b3   a1 b3   a2 b3   a3 b3   · · ·
  ⋮       ⋮       ⋮       ⋮
and so on. There are several different ways to string that array out into a sequence
so that we can consider adding the terms up as a series. For one, we can create an
expanding list of ‘square shells’ starting at the top left hand corner (follow these on
the grid to see what we mean by that somewhat cryptic phrase):
a0 b0
+a0 b1 + a1 b1 + a1 b0
+a0 b2 + a1 b2 + a2 b2 + a2 b1 + a2 b0
+a0 b3 + a1 b3 + a2 b3 + a3 b3 + and so on · · ·
Notice that, at the end of each line, the running totals are A0 B0 , A1 B1 , A2 B2 , A3 B3
and so on – a sequence whose limit is easy to grasp.
4 We have formulated it for arbitrary series, not just for power series, so all the $x^n$s have
disappeared; but the main application we intend is still that of power series.
5 We are starting the labelling at k = 0 instead of at k = 1, again mainly because of the focus
on power series which do naturally begin with $a_0 x^0$ in order to accommodate a constant term.
On the other hand, if we sort out the array into a sequence/series by following
‘diagonal sweeps’ (again, please follow these on the grid to see what we mean), we
get instead:
a0 b0
+a0 b1 + a1 b0
+a0 b2 + a1 b1 + a2 b0
+a0 b3 + a1 b2 + a2 b1 + a3 b0
+a0 b4 + a1 b3 + and so on · · ·
and look: each line is now one of the Cauchy product coefficients – we are now
building up c0 + c1 + c2 + c3 + · · · as, indeed, we must do if we want to address
what this theorem claims.
What we now need is a guarantee that these quite different sorting processes will
give ultimately the same sum to infinity, that is, we need to be able to rearrange and
know that the sum is robust. For this, absolute convergence must be established
first; and for that, we have to begin with the same array but with modulus signs on
every term.
Proof
Let $A_n$, $B_n$ stand for the nth partial sums of $\sum_0^\infty |a_k|$ and $\sum_0^\infty |b_k|$ respectively
(which we know to be convergent series) and, since the partial-sum sequences are
bounded, find two positive constants P, Q such that (for all n ∈ N)
$$A_n \le P \quad \text{and} \quad B_n \le Q.$$
The various numbers |ai bj | that turn up when we multiply An and Bn together
present themselves naturally in a two-dimensional (infinite) grid:
|a0 b0|   |a1 b0|   |a2 b0|   |a3 b0|   · · ·
|a0 b1|   |a1 b1|   |a2 b1|   |a3 b1|   · · ·
|a0 b2|   |a1 b2|   |a2 b2|   |a3 b2|   · · ·
|a0 b3|   |a1 b3|   |a2 b3|   |a3 b3|   · · ·
   ⋮         ⋮         ⋮         ⋮
To be precise, the items in the first (n + 1) places of the first (n + 1) rows add up
to An Bn .
Any finite selection of terms from the grid will lie within ‘the first (n + 1) places
of the first (n + 1) rows’ if we choose n big enough, so the sum total of any finite
selection is less than or equal to An Bn for that n, and therefore cannot exceed PQ.
That is, no matter how we string these items |ai bj | together into a sequence, the
resulting series (of non-negatives) has its partial sums bounded above (by PQ) and
must therefore converge. In other words, if we strip out the modulus signs from the
grid and organise its entries into a sequence in any fashion whatsoever, the resulting
series is absolutely convergent to some sum S. Best of all: it is the same number S no
matter how we chose to organise them: for rearranging an absolutely convergent
series does not alter its sum.
So now consider the ‘un-modulussed’ grid:
a0 b0 a1 b0 a2 b0 a3 b0 · · ·
a0 b1 a1 b1 a2 b1 a3 b1 · · ·
a0 b2 a1 b2 a2 b2 a3 b2 · · ·
a0 b3 a1 b3 a2 b3 a3 b3 · · ·
: : : :
If, firstly, we choose to sort it into a sequence and then a series as follows:
a0 b0
+a0 b1 + a1 b1 + a1 b0
+a0 b2 + a1 b2 + a2 b2 + a2 b1 + a2 b0
+a0 b3 + a1 b3 + a2 b3 + a3 b3 + and so on · · ·
then its partial sums converge to S and, moreover, the subsequence comprising
partial sums number 1, 4, 9, 16, 25, · · · also converges to S. Yet partial sum number
$(n + 1)^2$ is exactly
$$(a_0 + a_1 + a_2 + \cdots + a_n)(b_0 + b_1 + b_2 + \cdots + b_n)$$
which converges to AB. We now know that S = AB.

Secondly, let us sort the grid into a different sequence and series like this:
a0 b0
+a0 b1 + a1 b0
+a0 b2 + a1 b1 + a2 b0
+a0 b3 + a1 b2 + a2 b1 + a3 b0
+a0 b4 + a1 b3 + and so on · · ·
This series also has to converge to S = AB, and so will the subsequence comprising
items 1, 3, 6, 10, 15, · · · of its partial sums (as indicated by the line-breaks here). Yet
these are precisely the Cauchy product numbers
c0 , c0 + c1 , c0 + c1 + c2 , c0 + c1 + c2 + c3
and so on.
We are (at last) able to conclude that the Cauchy product series converges to AB.
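A numerical experiment can make the conclusion of the theorem more tangible, although of course it proves nothing. The sketch below is our own illustration: the two geometric series (with sums A = 2 and B = 3/4) and the function names are assumptions chosen for the demonstration. It compares the ‘square shell’ running totals with the ‘diagonal sweep’ (Cauchy product) running totals.

```python
def a(k):
    """Term a_k = (1/2)^k of an absolutely convergent series with sum A = 2."""
    return 0.5 ** k


def b(k):
    """Term b_k = (-1/3)^k of an absolutely convergent series with sum B = 3/4."""
    return (-1.0 / 3.0) ** k


def square_shell_total(n):
    """Running total after n + 1 square shells: (a_0 + ... + a_n)(b_0 + ... + b_n)."""
    return sum(a(k) for k in range(n + 1)) * sum(b(k) for k in range(n + 1))


def cauchy_partial_sum(n):
    """Running total after n + 1 diagonal sweeps: c_0 + c_1 + ... + c_n."""
    total = 0.0
    for m in range(n + 1):
        total += sum(a(k) * b(m - k) for k in range(m + 1))  # this is c_m
    return total


for n in (5, 10, 20, 60):
    print(n, square_shell_total(n), cauchy_partial_sum(n))
```

Both columns settle down to the same number, AB = 1.5, although for small n the two sorting processes give visibly different running totals.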
14.2.16 EXERCISE We know that (for all x ∈ (−1, 1)) the series $1 + x + x^2 + x^3 + x^4 + \cdots$ converges to $\frac{1}{1-x}$ and the series $1 - x + x^2 - x^3 + x^4 - \cdots$ converges to
$\frac{1}{1+x}$. Calculate (and simplify as necessary) the Cauchy product of these two series
and confirm that, as predicted by the theorem, it converges to the product of the
two functions.
14.2.17 EXERCISE Assuming the correctness of
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = \sum_0^\infty \frac{x^n}{n!},$$
calculate (and simplify as necessary) the Cauchy product of the power series
representations of $e^x$ and of $e^y$, and confirm that it converges to the product of
the two functions.
14.3 Power series

A power series is a series of the form $\sum_0^\infty a_n (x - c)^n$ where c is a constant (the
‘centre’) and the ‘coefficients’ $a_n$ are also constants. Most of the time, we change
the variable by substituting, say, y = x − c so that the appearance of the series
simplifies to $\sum_0^\infty a_n y^n$ (and the centre becomes 0). Since this can always be done,
most of the theory assumes that it has already taken place. That is, in practice, a
power series is a series of the form $\sum_0^\infty a_n x^n$.
We need to be aware of which values of x make $\sum_0^\infty a_n x^n$ converge and which
make it diverge, and for a series of this type there are just three possible scenarios:
either it converges absolutely for all x, or only for x = 0, or (the most typical case)
there is a number D such that the series converges absolutely whenever |x| < D and
diverges whenever |x| > D. The number D is known as the radius of convergence,
mainly because all this theory can equally well be developed for the case in which x
is a complex number, and then |x| < D describes the inside of a circle centred at the
origin and of radius D. Since we are discussing only real functions, we shall have to
put up with the slightly odd use of the word ‘radius’ to describe half the length of
an interval (−D, D) (which is where we know that the real series converges). It is
generally more difficult to determine whether it converges at x = D or at x = −D.
The two extreme cases (when the series converges for all x, or only for x = 0)
are conventionally represented by saying that the radius of convergence is infinite,
or zero.
14.3.1 Lemma Every power series has a radius of convergence.
Proof

If $\sum_0^\infty a_n x^n$ converges only at x = 0, then D = 0. If not, pick any non-zero t such
that $\sum_0^\infty a_n t^n$ does converge. Then certainly $a_n t^n \to 0$ and is therefore bounded:
there is M > 0 such that $|a_n t^n| < M$ for all n.
For any number u in the interval (−|t|, +|t|), we have
$$|a_n u^n| \le \frac{M}{|t^n|}\,|u|^n = M \left|\frac{u}{t}\right|^n$$
and the last item belongs to a convergent geometric series so, by the comparison
test, $\sum |a_n u^n|$ is also convergent and $\sum a_n u^n$ is absolutely convergent. In other
words, whenever t is a point in the ‘convergence zone’ of the given power series,
then every point in (−|t|, +|t|) (every point that lies closer to zero than t does) is
also in the convergence zone.
The only subsets of R that possess this property are R itself and the intervals that
are centred on 0 (length 2D, say). Hence either ∞ or D acts as radius of convergence
for the series.
In many cases, the radius of convergence can be calculated quite easily:

14.3.2 Theorem For a given power series $\sum a_n x^n$:
1. If $\sqrt[n]{|a_n|}$ converges to a limit ℓ then
• ℓ > 0 implies that the radius of convergence is $\frac{1}{\ell}$,
• ℓ = 0 implies that the radius of convergence is ∞;
2. If $\frac{|a_{n+1}|}{|a_n|}$ converges⁶ to a limit ℓ then
• ℓ > 0 implies that the radius of convergence is $\frac{1}{\ell}$,
• ℓ = 0 implies that the radius of convergence is ∞.
Proof
Given that $\sqrt[n]{|a_n|}$ converges to ℓ, we see that $\sqrt[n]{|a_n x^n|} = \sqrt[n]{|a_n|}\,|x| \to \ell |x|$. The
root test (applied to $\sum |a_n x^n|$) tells us that if ℓ|x| < 1 then $\sum |a_n x^n|$ converges
(so $\sum a_n x^n$ converges absolutely) whereas if ℓ|x| > 1 then $|a_n x^n|$ does not tend
to zero, $a_n x^n$ also does not tend to zero, and $\sum a_n x^n$ diverges. Separating out the
cases ℓ > 0 and ℓ = 0, that proves the first part. The second emerges from the
d’Alembert test in the same way.
6 and, implicitly, an is non-zero for all sufficiently large values of n, of course.
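Before calculating a limit by hand, it can be helpful to look at how the relevant ratios or roots behave for a large value of n. The following sketch is our own illustration (the function names and the two sample coefficient sequences are assumptions, not taken from the text); a finite computation can only suggest a radius of convergence, never establish it.

```python
def ratio_estimate(coeff, n):
    """Return |a_n| / |a_(n+1)|, which approximates 1/l when |a_(n+1)|/|a_n| -> l > 0."""
    return abs(coeff(n)) / abs(coeff(n + 1))


def root_estimate(coeff, n):
    """Return 1 / (|a_n|)^(1/n), which approximates 1/l when the nth roots -> l > 0."""
    return 1.0 / abs(coeff(n)) ** (1.0 / n)


def powers_of_three(n):
    """Sample coefficients a_n = 3^n (the radius of convergence is 1/3)."""
    return 3 ** n


def n_times_powers_of_two(n):
    """Sample coefficients a_n = n * 2^n (the radius of convergence is 1/2)."""
    return n * 2 ** n


print(ratio_estimate(powers_of_three, 50), root_estimate(powers_of_three, 50))
print(ratio_estimate(n_times_powers_of_two, 1000))
```

For the second sequence the estimate at n = 1000 is close to, but not exactly, 1/2: the ratio is n/(2(n + 1)), which only tends to 1/2 in the limit.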

14.3.3 EXERCISE Determine the radius of convergence of each of the following:

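1. $(e^x =)\ 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = \sum_0^\infty \frac{x^n}{n!}$,
2. $(\sin x =)\ x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots = \sum_0^\infty (-1)^n \frac{x^{2n+1}}{(2n+1)!}$,
3. $(\cos x =)\ 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots = \sum_0^\infty (-1)^n \frac{x^{2n}}{(2n)!}$,
4. $\sum (30^n + n^{30})x^n$,
5. $\sum (n!)x^n$,
6. $\sum \frac{n!(n+1)!(n+2)!}{(3n+1)!}\, x^n$,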
xn
7. 2n2 .
1 + n3

It is important to realise that when, for each x ∈ (−D, D), a power series $\sum a_n x^n$
converges to a limit, then that limit will depend on the value of x, that is, it will
itself be a function of x. The whole concept of a sequence or series of functions
converging to a function is of great importance in analysis, but we can deal with
only a few key aspects of it in this text. Most particularly, we need to discuss
continuity and differentiability of the sum of a series of continuous or differentiable
functions, but only in the special case of convergent power series. Continuity is
fairly straightforward to check out:

14.3.4 Theorem When a power series $\sum a_n x^n$ has nonzero radius of convergence
D, then its sum $f(x) = \sum a_n x^n$ is continuous at every point of (−D, D). (This
includes the special case D = ∞: then f is continuous on the whole of R.)
Roughwork/preparation

Let us put $S_m(x) = \sum_0^m a_n x^n$, the mth partial sum. Since $\lim_{m\to\infty} S_m(x) = f(x)$,
Sm (x) will be a good approximation to f (x) (provided m is taken big enough) for
each individual x: but that is not strong enough to guarantee the approximation
to be equally good for all values of x at once – indeed, the endpoints ±D of the
interval could present serious problems since we do not even know whether the
series converges at all there. Common sense therefore suggests that we stay away
from ±D and work on a slightly smaller interval of the form [−ρ, ρ] for some
suitably chosen positive ρ < D.

Let ε > 0 be given. Since $\sum a_n \rho^n$ converges absolutely to f(ρ), the modulussed
series $\sum |a_n| \rho^n$ converges to some limit $\ell = \sum_0^\infty |a_n| \rho^n$, and so we can find a
positive integer m such that
$$\ell - \sum_0^m |a_n| \rho^n = \sum_{m+1}^\infty |a_n| \rho^n < \frac{\varepsilon}{3}.$$
Now for any x ∈ [−ρ, ρ], we have
$$\left| f(x) - \sum_0^m a_n x^n \right| = \left| \sum_{m+1}^\infty a_n x^n \right| \le \sum_{m+1}^\infty |a_n| |x|^n \le \sum_{m+1}^\infty |a_n| \rho^n < \frac{\varepsilon}{3}$$
that is, the mth partial sum $S_m$ is an $\frac{\varepsilon}{3}$-good approximation to f at every point of
[−ρ, ρ] simultaneously.
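The ‘simultaneously’ in that conclusion can be seen in a concrete case. The sketch below is an illustration of ours, not part of the argument: it takes the exponential series (so $a_n = 1/n!$), assumes ρ = 2 and m = 10, and checks on a grid of sample points that the single tail bound computed at ρ controls the error of $S_m$ across the whole of [−ρ, ρ].

```python
import math


def partial_sum_exp(x, m):
    """Return S_m(x) = sum of x^n / n! for n = 0, ..., m."""
    total, term = 0.0, 1.0
    for n in range(m + 1):
        total += term
        term *= x / (n + 1)   # turn x^n / n! into x^(n+1) / (n+1)!
    return total


def tail_bound(rho, m):
    """Return the tail sum of rho^n / n! over n > m, computed as e^rho - S_m(rho)."""
    return math.exp(rho) - partial_sum_exp(rho, m)


rho, m = 2.0, 10                                        # assumed sample values
bound = tail_bound(rho, m)
grid = [-rho + 2 * rho * i / 200 for i in range(201)]   # sample points in [-rho, rho]
worst = max(abs(math.exp(x) - partial_sum_exp(x, m)) for x in grid)
print(bound, worst)
```

The worst error found on the grid does not exceed the bound, and it is attained at the endpoint x = ρ, where all the terms of the tail are positive.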
Proof
Given any point x0 in (−D, D) and any ε > 0, choose ρ so that −ρ < x0 < + ρ < D,
and choose m ∈ N as in the roughwork/preparation. Since Sm (x) is continuous
(being just a polynomial) we can find δ > 0 so that
$$|S_m(x) - S_m(x_0)| < \frac{\varepsilon}{3} \ \text{ whenever } x \text{ lies in the interval } (x_0 - \delta, x_0 + \delta).$$
(We also take care that δ is small enough to fit (x0 − δ, x0 + δ) inside [−ρ, ρ].)
Now for any x ∈ (x0 − δ, x0 + δ):
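$$\begin{aligned}
|f(x) - f(x_0)| &= |(f(x) - S_m(x)) + (S_m(x) - S_m(x_0)) + (S_m(x_0) - f(x_0))| \\
&\le |f(x) - S_m(x)| + |S_m(x) - S_m(x_0)| + |S_m(x_0) - f(x_0)| \\
&< \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon.
\end{aligned}$$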
That is, f is continuous at x0 . Since x0 was an arbitrary element of (−D, D), the
proof is complete.
14.3.5 Notes
1. There are a couple of places in the above argument at which a conscientious
student might perfectly justifiably feel anxious. For one thing, we used an
infinite version of the triangle inequality although we have only ever proved
finite versions. Yet this is legitimate: if $(a_k)$ is a sequence then, for each m ∈ N,
we have already shown that
$$\left| \sum_1^m a_k \right| \le \sum_1^m |a_k|.$$
Now provided that $\sum a_k$ is absolutely convergent, the second summation
converges to its supremum ℓ, the number conventionally written as
$\ell = \sum_1^\infty |a_k|$. Thus, for every m:
$$-\ell \le \sum_1^m a_k \le \ell.$$
However, absolute convergence implies convergence, so $\sum_1^m a_k$ also converges
to its limit, the number conventionally written as $\sum_1^\infty a_k$. Taking limits across
the previous display, we therefore obtain $-\ell \le \sum_1^\infty a_k \le \ell$, that is,
$\left| \sum_1^\infty a_k \right| \le \ell$, or
$$\left| \sum_1^\infty a_k \right| \le \sum_1^\infty |a_k|,$$
as we desired.
2. Lines such as
$$f(x) = \sum_0^\infty a_n x^n \ \text{ therefore } \ f(x) - \sum_0^m a_n x^n = \sum_{m+1}^\infty a_n x^n,$$
plausible though they appear, should also create a pause for thought since two
limiting processes are involved. Remember that, once the integer m has been
fixed, $\sum_0^m a_n x^n$ is simply a real number, a constant.
If we take a convergent series $\sum_1^\infty a_k$, and add a constant K as a zeroth
term, every partial sum will increase by K and therefore so will the
sum-to-infinity. In other notation,
$$\lim_{n\to\infty} (K + a_1 + a_2 + a_3 + \cdots + a_n) = K + \lim_{n\to\infty} (a_1 + a_2 + a_3 + \cdots + a_n).$$
In the case where $K = -\sum_1^m a_k$, and restricting the discussion to n > m
which will not disturb limiting behaviour, this says
$$\lim_{n\to\infty} (a_{m+1} + a_{m+2} + a_{m+3} + \cdots + a_n) = K + \lim_{n\to\infty} (a_1 + a_2 + a_3 + \cdots + a_n),$$
confirming the legitimacy of steps such as
$$\sum_{m+1}^\infty a_k = -\sum_1^m a_k + \sum_1^\infty a_k.$$
3. The idea of finding an approximation to a limit function f (x) that is ‘ε-good’

simultaneously for an entire interval of values of x, instead of just for one
x-value that interested us, is called uniform convergence. We mobilised it just
once, in the roughwork preparation for the last theorem. The interested reader
(with considerable time to spare) will find a great deal more about this in the
literature.
The final result in the set, concerning how to differentiate the sum of a power
series, looks like little more than common sense at first sight. The proof, however,
is demanding, so we shall divide its burden across this chapter and Chapter 16.
Given that
$$f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_n x^n + \cdots$$
converges on an open interval (−D, D), the only reasonable guess that comes to
mind is that the so-called derived series
$$a_1 + 2a_2 x + 3a_3 x^2 + \cdots + n a_n x^{n-1} + \cdots$$
really ought to converge on the same interval, and its sum really ought to be the
derivative of f (x). This is, in fact, true. For the moment, we shall content ourselves
with proving only the first ‘really ought’.

14.3.6 Theorem If $\sum a_n x^n$ has radius of convergence D > 0, then the derived
series $\sum n a_n x^{n-1}$ also converges on (−D, D).
Proof
For arbitrary $x_0$ in (−D, D), again start by choosing positive ρ so that
$-\rho < x_0 < +\rho < D$. Since $\sum a_n \rho^n$ converges, its terms tend to zero and are
therefore bounded: there is a positive number M such that
$$|a_n \rho^n| < M, \ \text{ that is, } \ |a_n| < \frac{M}{\rho^n} \ \text{ for all } n \in \mathbb{N}.$$
Therefore
$$|n a_n x_0^{n-1}| \le n \frac{M}{\rho^n} |x_0|^{n-1} = \frac{M}{\rho} \times n \left( \frac{|x_0|}{\rho} \right)^{n-1}.$$
Now $\frac{M}{\rho}$ is merely a constant, $\frac{|x_0|}{\rho} < 1$, and the power series $\sum n x^{n-1}$ has radius of
convergence 1, so the final term in the display belongs to a convergent series. The
comparison test applies, and shows that $\sum n a_n x_0^{n-1}$ is also absolutely convergent.
Since $x_0$ was any element of (−D, D), the proof is complete.
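As a sanity check on the statement (not on the proof), one can watch the partial sums of a derived series stabilise. The sketch below is our own: it assumes the geometric series, with every $a_n = 1$ and D = 1, and the sample point $x_0 = 0.9$. The comparison value $1/(1 - x_0)^2$ relies on the standard closed form for $\sum n x^{n-1}$, which is the derivative of $1/(1 - x)$ and is quoted here without proof.

```python
def derived_partial_sum(coeff, x, m):
    """Return the sum of n * a_n * x^(n-1) for n = 1, ..., m."""
    return sum(n * coeff(n) * x ** (n - 1) for n in range(1, m + 1))


def geometric_coeff(n):
    """Coefficients a_n = 1 of the geometric series (radius of convergence 1)."""
    return 1.0


x0 = 0.9   # assumed sample point inside (-1, 1)
for m in (10, 100, 1000, 2000):
    print(m, derived_partial_sum(geometric_coeff, x0, m))
```

The early partial sums are far from the eventual value because $x_0$ is close to the edge of the interval of convergence, but by m = 1000 they have settled at (very nearly) 100.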

14.3.7 Remark Hence also the second derived series $\sum n(n-1) a_n x^{n-2}$, the third
derived series $\sum n(n-1)(n-2) a_n x^{n-3}$, and so on, all converge absolutely on the
interval (−D, D), where D is the radius of convergence of the original power series.
It can also be shown that the radius of convergence of the derived series (and of
the second, and of the third . . . ) is exactly D.
15 Uniform continuity —
continuity’s global
cousin
15.1 Introduction
Continuity is a local property of a function, not a global one.
It is all too easy to forget this, because most of the functions we meet in practice
are continuous at every point in their domain; this tends to create a misleading
impression that continuity is global in character. However, look at the definition of
continuity on a set: a function f : D → R is continuous on D if, for every individual
point p in D and every sequence (xn ) in D that converges to p, we find that the limit
of f (xn ) equals f (p). The phrase in italics reminds us that continuity upon a set
needs to be assessed at each individual element of that set: that makes it essentially
a ‘local’ property (because it is judged locally, one point at a time).
We can emphasise this further by reminding ourselves of the ε − δ, challenge-
response, input-output ‘game’ that we can play in order to confirm continuity,
namely:
for each output challenge ε > 0, there is an input response δ > 0

such that x ∈ D, |x − p| < δ together imply |f (x) − f (p)| < ε.
(Of course, we don’t usually write most of the English words in that display – we
are just reminding ourselves about the nature, the dynamics of the game.)
It goes without saying that the response depends on the challenge – that δ
depends upon (is a function of) ε. We could have been hyper-fussy and written
not δ but δε or δ(ε) to make this point visible, but we don’t: as was just remarked,
it goes without saying.
What also goes without saying, and is about to become important, is that δ is
also allowed to depend upon and vary with p. Because continuity is assessed by
the ε − δ game one point at a time, there is no requirement that (for a particular ε)
the δ you find at one point should equal the δ you find at another. It might happen,
but for ordinary continuity it doesn’t have to.
[Figure: Ordinary continuity at p: for each challenge ε there is a response δ > 0, and δ may well depend on p.]
[Figure: ‘Uniform’ continuity for a straight-line graph of positive gradient K: the optimal response is δ = ε/K, which does not depend on p.]
We can throw some more light onto this issue by looking at a few simple
illustrative examples. The straight-line function s : R → R given by s(x) = Kx
(where K is some positive constant) is just about the simplest of all functions upon
which to play the ε − δ game because, for a given ε > 0, the choice δ = ε/K is
optimal: it works, and it is the biggest possible choice of δ that does work.
Consider now a slightly more complicated function, say, the function
s* : [0, 4] → R whose graph consists of the four pieces of straight line joining
the points (0, 0) to (1, 1), (1, 1) to (2, 3), (2, 3) to (3, 6) and (3, 6) to (4, 10).
[Figure: Uniform continuity: a piecewise-linear function with four segments, joining (0, 0), (1, 1), (2, 3), (3, 6) and (4, 10), with gradients 1, 2, 3 and 4.]
Since their gradients are 1, 2, 3 and 4, playing the ε − δ game (for a small
value of ε) at 0.5 has an optimal choice of δ = ε/1; but playing instead at 1.5, the
optimal choice changes to δ = ε/2, and playing instead at 2.5 and at 3.5 changes it
to δ = ε/3 and to δ = ε/4. Of course, a general policy of choosing δ = ε/4 would
have dealt with all four of these points, since a choice of δ that is smaller than the
optimal choice is still a perfectly valid choice – a winning move in the game, so
to speak. Devoting a little extra care to the corners (because at the three junction
points (1, 1), (2, 3) and (3, 6), the graph actually doesn’t have a gradient), you can
readily check that the strategy δ = ε/4 will win the ε − δ game at every point
in [0, 4].
Summary so far: for the four-segment function s*, the natural choice of δ (for a given
positive value of ε) varies from one point of its domain to another, but there is
a uniform way to select δ that actually works irrespective of which point you
focus on.
If we step up the last example by adding straight line fragments of gradient
5, 6, 7, · · · , n, then the discussion barely changes: the ‘best’ choice of δ at a point on
the part of the graph that has gradient j is δ = ε/j, and that varies from one point
to another; but a systematic choice of δ = ε/n will work smoothly at every single
point of the domain [0, n].
Now take the final step in the direction in which we are travelling, by continuing
to add endlessly many pieces of straight line of ever steeper gradient to the graph
of s*.
[Figure: Uniform continuity fails: a piecewise-linear function with infinitely many segments, passing through (0, 0), (1, 1), (2, 3), (3, 6), (4, 10), (5, 15), (6, 21), · · · with gradients 1, 2, 3, 4, 5, 6, · · ·]
(If you wish, you can even obtain a formula for the function s** whose
graph this is:
$$s^{**}(x) = nx + \frac{n - n^2}{2} \ \text{ while } n - 1 \le x < n \ (n \in \mathbb{N})$$
but, pragmatically, it is the shape of the graph that will help you to see what is
happening, more than the formula does.)
For this function s** (the one with infinitely many segments) it remains true that the optimal choice of δ at a point on
the part of the graph that has gradient j is δ = ε/j, but now there is no one-size-fits-all
choice of δ that will work everywhere: if someone were to claim that (for
a particular ε) a magical choice of δ = δ′ would work at all points of the domain
[0, ∞), then we could disprove that claim merely by finding an integer q greater
than ε/δ′, shifting our attention to a point on the graph at which the gradient
is q, and observing that, at that point, the alleged all-purpose δ′ is bigger than the
optimal, the greatest acceptable value of δ (namely ε/q). Thus, for the function s**
(which certainly is continuous), not only does δ vary naturally with the point at
which you play the game, but IT HAS TO VARY: there is no way to pick a δ that will
work at all of the points in the domain of s**.
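The disproof just described can be turned into a small search for a counterexample. The sketch below is our own illustration: the names `s_inf` and `witness` are hypothetical, and for convenience it picks an integer gradient q greater than 2ε/δ (rather than merely greater than ε/δ) so that a step of half the claimed δ already produces an output gap of more than ε.

```python
import math


def s_inf(x):
    """Piecewise-linear function on [0, infinity) with gradient n on [n - 1, n)."""
    n = math.floor(x) + 1
    return n * x + (n - n * n) / 2


def witness(epsilon, delta):
    """Return points p, x with |x - p| < delta and |s_inf(x) - s_inf(p)| > epsilon."""
    d = min(delta, 1.0)                    # keep the step inside a single segment
    q = math.floor(2 * epsilon / d) + 1    # an integer gradient exceeding 2*epsilon/d
    p = float(q - 1)                       # left-hand end of the segment of gradient q
    x = p + d / 2                          # a nearby point on the same segment
    return p, x


p, x = witness(1.0, 0.01)
print(p, x, abs(x - p), abs(s_inf(x) - s_inf(p)))
```

Whatever positive values of ε and δ are supplied, the two points returned are closer together than δ and yet their images are more than ε apart, so no single δ can serve at every point of the domain.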
You may have noticed the word uniform sneaking into the discussion half a page
back, and this is precisely what it means when it refers to continuity: a function is
uniformly continuous on a set if not only can the ε − δ game be played and won
at each point of the set, but also there is (for each ε > 0) a way to choose δ > 0
that will work equally well at each and every point of the set; in other words, δ can
be chosen in a way that is independent of the point at which the ε − δ game is to
be played. So s and s* (as detailed above) were not just continuous but uniformly
continuous, whereas s** was continuous but not uniformly continuous. As George
Orwell didn’t quite get around to saying in Animal Farm, all (these) functions are
continuous but some are more continuous than others.
15.2 Uniformly continuous functions

15.2.1 Definition Let f : D → R, and A ⊆ D. We say that f is uniformly continuous
on A if
for all ε > 0 there exists δ > 0 such that

x ∈ A, y ∈ A, |x − y| < δ together imply |f (x) − f (y)| < ε.
(We can usually assume that A is the whole of D since, if it were not, we could
replace f by its restriction f |A to A and work with that function instead.)
15.2.2 Note Most people cannot, at first sight, see how this differs from ‘ordinary’
continuity on A but, hopefully, the introduction to this chapter will have clarified
the distinction for you. In ‘ordinary’ continuity, you start with a y and an ε, and
you go looking for a δ such that
|x − y| < δ forces |f (x) − f (y)| < ε · · · · · · (1).
In uniform continuity, on the other hand, you start only with an ε, and you go
looking for a δ such that
|x − y| < δ forces |f (x) − f (y)| < ε · · · · · · (2).
In the first case, then, the δ needs to work only for a particular y and ε; in the
second case, the δ has to work for a particular ε, but for all x and y that are δ-
close no matter where in A they lie. This is asking considerably more: despite the
apparent identity between (1) and (2), in (1) the y is fixed and only the x varies,
whereas in (2) the x and the y are both free to vary.
15.2.3 Example To show that the function f : [1, ∞) → R given by $f(x) = \sqrt{x}$ is
uniformly continuous (on [1, ∞)).
Solution
We need to compare |x − y| with $|f(x) - f(y)| = |\sqrt{x} - \sqrt{y}|$ and, fortunately, there
is a simple algebraic connection between them:
$$(\sqrt{x} - \sqrt{y})(\sqrt{x} + \sqrt{y}) = x - y$$
from which it follows (keeping in mind that x ≥ 1 and y ≥ 1 in the present
domain) that
$$|f(x) - f(y)| = |\sqrt{x} - \sqrt{y}| = \frac{|x - y|}{\sqrt{x} + \sqrt{y}} \le \frac{|x - y|}{2}.$$
So, if ε > 0 is given, we may choose δ = 2ε > 0 and see from the last display that
|x − y| < δ will ensure that |f(x) − f(y)| < ε no matter where x and y are in [1, ∞):
this is exactly what uniform continuity requires.
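A random spot-check of the choice δ = 2ε is easy to set up. The following sketch is ours, with an assumed sampling range [1, 100] and a hypothetical function name; passing such a test is evidence of nothing more than the absence of an arithmetical slip, since only the algebra above actually proves the claim.

```python
import math
import random


def delta_works_for_sqrt(epsilon, trials=10000, seed=0):
    """Sample pairs x, y in [1, 100] with |x - y| < 2*epsilon and test |sqrt x - sqrt y| < epsilon."""
    rng = random.Random(seed)            # fixed seed so that the run is repeatable
    delta = 2 * epsilon
    for _ in range(trials):
        x = rng.uniform(1.0, 100.0)
        y = x + rng.uniform(-delta, delta)
        if y < 1.0 or abs(x - y) >= delta:
            continue                     # pair lies outside the scope of the claim
        if abs(math.sqrt(x) - math.sqrt(y)) >= epsilon:
            return False                 # a counterexample to the choice of delta
    return True


print(delta_works_for_sqrt(0.01), delta_works_for_sqrt(0.001))
```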
15.2.4 EXAMPLE Use a similar argument to prove that the function g : [1, ∞) → R
given by $g(x) = \sqrt[3]{x}$ is uniformly continuous (on [1, ∞)). You may find it helpful
to use the factorisation $p^3 - q^3 = (p - q)(p^2 + pq + q^2)$.
Here is a small result that you would probably guess to be true, given what you
know about ‘ordinarily continuous’ functions:
15.2.5 Example To show that the composition of two uniformly continuous

functions must be uniformly continuous.
Solution
To set up enough notation to discuss the posed question, let f : A → B and
g : B′ → C both be uniformly continuous, where B ⊆ B′. Then the composite
map g ◦ f makes sense, and it is a function from A to C.
Let ε > 0 be given.
Since g is uniformly continuous, there is δ1 > 0 such that
p, q ∈ B′, |p − q| < δ1 together imply |g(p) − g(q)| < ε.
Since f is uniformly continuous and δ1 > 0, there is δ2 > 0 such that
x, y ∈ A, |x − y| < δ2 together imply |f (x) − f (y)| < δ1 .
Now for any x and y in A such that |x − y| < δ2 , the second display tells us that
|f (x) − f (y)| is less than δ1 so, putting p = f (x) and q = f (y) in the first display,
we see that |p − q| is less than δ1 , and therefore |g(f (x)) − g(f (y))| < ε. In other
words, |(g ◦ f )(x) − (g ◦ f )(y)| is less than the given ε.
Hence the result.
Recalling that the convergence of sequences gave us a particularly efficient
way to describe continuity, it might have been expected that something similar
would arise in uniform continuity. The following lemma provides just such a
characterization, and it is often helpful in sorting out the subtle but important
distinction between the two concepts.
15.2.6 Lemma: the two–sequence characterization Let f : A → R. The follow-

ing are equivalent:
1. f is uniformly continuous on A,
2. for every two sequences (xn ) and (yn ) in A such that |xn − yn | → 0, we have
that |f (xn ) − f (yn )| → 0 also.
SUGGESTION
Since it is clear that this lemma is a close relative of the corresponding (‘one-
sequence’) result that connects the sequence-style definition and the epsilontics-
style characterisation for ordinary continuity, you can reasonably expect its
demonstration to follow the pattern of that result’s proof. Try constructing such a
demonstration before you read the account that we present next.
Proof
(I): (1) implies (2).
• Let (xn )n∈N and (yn )n∈N be any two sequences of elements of A for which
|xn − yn | → 0.
• Given ε > 0, use condition (1) to obtain δ > 0 such that whenever x ∈ A
and y ∈ A and |x − y| < δ, we have |f (x) − f (y)| < ε.
• Since |xn − yn | → 0, there is n0 ∈ N such that n ≥ n0 guarantees
|xn − yn | < δ.
• Therefore n ≥ n0 ⇒ |xn − yn | < δ ⇒ |f (xn ) − f (yn )| < ε.
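• That is, |f (xn ) − f (yn )| → 0. This proves (2).
(II): (2) implies (1).
• Suppose that condition (1) is not satisfied: then there is some ε > 0 for which no suitable δ > 0 can be
found.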
• In particular, for each n ∈ N, δ = 1/n is not suitable…
• . . . and so there are points xn , yn ∈ A such that |xn − yn | < 1/n and yet
|f (xn ) − f (yn )| ≥ ε.
• Therefore |xn − yn | → 0, and yet |f (xn ) − f (yn )| does not converge to 0.
• In other words, condition (2) is not satisfied.
15.2.7 Corollary Any uniformly continuous function on A is continuous on A.
Proof
Suppose f : A → R to be uniformly continuous. Given a convergent sequence
xn → ℓ in A, let (yn ) be the constant sequence (ℓ, ℓ, ℓ, ℓ, · · · ). Obviously
|xn − yn | → 0, so the two-sequence lemma gives |f (xn ) − f (yn )| → 0, that is,
|f (xn ) − f (ℓ)| → 0. This is the same as saying f (xn ) → f (ℓ). Since f therefore
preserves limits of sequences, f is continuous at each point of A, as required.
Alternative proof
In the definition of uniform continuity, take the special case where y is held
constant, and you immediately get f continuous at y (for each y).
15.2.8 Example Of course $f(x) = x^2$ defines a continuous function on R. We
show that it is NOT uniformly continuous.
Solution:
The two-sequence lemma suggests we look for a pair of sequences that get close to
one another but whose squares do not, and the Introduction suggests we try to get
out into the high-gradient parts of the graph. So take $x_n = n$, $y_n = n + \frac{1}{n}$ for each
n ∈ N. That decision creates two sequences in [0, ∞) and, since $|x_n - y_n| = \frac{1}{n}$, we
certainly have $|x_n - y_n| \to 0$. However,
$$\left| n^2 - \left( n + \frac{1}{n} \right)^2 \right| = \left| n^2 - n^2 - 2 - \frac{1}{n^2} \right| = 2 + \frac{1}{n^2}$$
which does not converge to zero. By the two-sequence lemma, $x^2$ cannot be
uniformly continuous.
15.2.9 Remark Almost exactly the same proof will show that $x^2$ is not uniformly
continuous on any interval of the form [a, ∞), nor on any interval of the form
(a, ∞).
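The two sequences used in Example 15.2.8 can be tabulated exactly. The sketch below is our own illustration; it uses Python's `fractions.Fraction` so that no rounding error clouds the picture, and the function name `sequence_gaps` is hypothetical.

```python
from fractions import Fraction


def sequence_gaps(n):
    """Return (|x_n - y_n|, |x_n^2 - y_n^2|) for x_n = n and y_n = n + 1/n, exactly."""
    x = Fraction(n)
    y = x + Fraction(1, n)
    return abs(x - y), abs(x * x - y * y)


for n in (1, 10, 100, 1000):
    input_gap, output_gap = sequence_gaps(n)
    print(n, input_gap, output_gap, float(output_gap))
```

The first gap shrinks like 1/n while the second never drops below 2, which is exactly the behaviour that the two-sequence lemma forbids for a uniformly continuous function.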
15.2.10 EXERCISE
• Show that $f(x) = x^3$ fails to define a uniformly continuous function on R.
• Show that $f(x) = x\sqrt{x}$ fails to define a uniformly continuous function on
[0, ∞).
The relationship between uniformly continuous functions and Cauchy

sequences is close and very useful, but it is also a little complicated:
15.2.11 Theorem Uniformly continuous functions preserve Cauchyness. That is,

if f is uniformly continuous on a set A of real numbers and (xn )n∈N is a Cauchy
sequence in A, then (f (xn ))n∈N is also a Cauchy sequence.
Proof
Given ε > 0, we use uniform continuity to find δ > 0 so that |x − y| < δ, x ∈ A,
y ∈ A together imply |f (x) − f (y)| < ε. Since (xn )n ∈ N is Cauchy, now choose
n0 ∈ N so that m, n ≥ n0 will force |xm − xn | < δ. Then m, n ≥ n0 also forces
|f (xm ) − f (xn )| < ε and we have what we wanted.
The interesting question now is: is the converse true? And the irritating answer
is: sometimes. . .
15.2.12 Theorem Let f : A → R be a function defined on a bounded set A and

suppose that it preserves Cauchyness. Then f is uniformly continuous on A.
Proof
If not, then there is a positive number ε such that, no matter how we choose δ > 0,
there will be points of A within δ of one another whose f -values are at least ε apart.
In particular, for each positive integer n (and choosing $\delta = \frac{1}{n}$), there exist $x_n$, $y_n$ in
A for which $|x_n - y_n| < \frac{1}{n}$ and yet $|f(x_n) - f(y_n)| \ge \varepsilon$.
Since A is bounded, Bolzano-Weierstrass tells us that $(x_n)_{n \in \mathbb{N}}$ has a convergent
subsequence $(x_{n_k})_{k \in \mathbb{N}}$ converging to a limit ℓ. Since, for all k ≥ 1:
$$|y_{n_k} - \ell| \le |y_{n_k} - x_{n_k}| + |x_{n_k} - \ell| < \frac{1}{n_k} + |x_{n_k} - \ell| \to 0,$$
we see that $(y_{n_k})_{k \in \mathbb{N}}$ also converges to the same limit ℓ. If we now ‘interleave’ these
two sequences, we get
$$(x_{n_1}, y_{n_1}, x_{n_2}, y_{n_2}, x_{n_3}, y_{n_3}, x_{n_4}, y_{n_4}, \cdots)$$
again converging to ℓ (see Example 5.2.5), and therefore Cauchy, and yet
(f (xn1 ), f (yn1 ), f (xn2 ), f (yn2 ), f (xn3 ), f (yn3 ), f (xn4 ), f (yn4 ), · · · )
fails to be Cauchy since endlessly many pairs of its terms are at least ε apart: we
have thus achieved a contradiction.
15.2.13 Note If we were to drop the word ‘bounded’, this result would cease to
be true: for instance, we know that the $x^2$ function on (the unbounded set) R
is not uniformly continuous, and yet it is easy to check (do so) that it preserves
Cauchyness.
The next theorem is generally viewed as the most important basic result about
uniform continuity. We’ll offer two different proofs of it.
15.2.14 Key theorem Any continuous function f on a bounded closed interval

[a, b] is uniformly continuous there.
First proof — only using the definition

If not, then there exists ε > 0 for which NO choice of δ > 0 will work. In particular,
for each n ∈ N, the condition
\[ x \in [a, b], \quad y \in [a, b], \quad |x - y| < \frac{1}{n} \]
fails to force |f (x) − f (y)| < ε.

That is, there exist (for each n ∈ N) points xn , yn ∈ [a, b] such that
\[ |x_n - y_n| < \frac{1}{n} \quad \text{and yet} \quad |f(x_n) - f(y_n)| \ge \varepsilon \qquad (1) \]
By Bolzano-Weierstrass, some subsequence (xnk ) of (xn ) converges (and, since

a ≤ xnk ≤ b for each k, its limit must satisfy the same inequality):
\[ x_{n_k} \to x \in [a, b] \ \text{ as } k \to \infty \qquad (2) \]
Now
|x − ynk | ≤ |x − xnk | + |xnk − ynk | → 0 as k → ∞
that is, ynk → x also¹ (as k → ∞). By ordinary continuity, f (ynk ) → f (x) as
k → ∞, and also (using (2)) f (xnk ) → f (x) as k → ∞.
Subtract, and we get f (xnk ) − f (ynk ) → 0 as k → ∞.
This contradicts (1), and completes the proof.
Second proof — using Cauchy sequences

Let (xn )n∈N be any Cauchy sequence in [a, b].
Because it is Cauchy, xn → ℓ for some ℓ ∈ R.
Since a ≤ xn ≤ b for all n, also a ≤ ℓ ≤ b.
Since f is continuous on [a, b], and ℓ ∈ [a, b], f is in particular continuous at ℓ,
so f (xn ) → f (ℓ).
Because it is convergent, (f (xn ))n∈N is Cauchy.
Now f is Cauchy-preserving on a bounded set, so (by 15.2.12 above) it is
uniformly continuous.
15.2.15 Example Let interval I and uniformly continuous f : I → R be given.

1. If I is closed and bounded, we show that f² is uniformly continuous.
2. We show by example that (1) can fail to be true if we omit the words ‘and
bounded’.
Solution to 1: first method

(Roughwork: knowing that we can make |f (x) − f (y)| as small as we wish just by
taking x and y close together, here we need to make |f²(x) − f²(y)| small as well.
What connection can we see between them? Well, the second one factorises:
|f²(x) − f²(y)| = |f (x) + f (y)| |f (x) − f (y)|
and this will be less than M|f (x) − f (y)| provided that we can find a constant M that
is always bigger than |f (x) + f (y)|. Luckily, we can find some such constant since
the continuous function f will be bounded on the closed, bounded interval I. Then
forcing |f (x) − f (y)| to be smaller than ε/M will guarantee that |f²(x) − f²(y)| is
smaller than ε. Now let’s write that out properly. . .)
¹ Footnote to the first proof of 15.2.14: refer back to Exercise 2.7.14 if this is not clear enough.
Since f is uniformly continuous, it is certainly continuous on the closed,
bounded interval I, and therefore f itself is bounded: there exists K > 0 so that
|f (x)| ≤ K for all x ∈ I. Notice that (for any p and q)
|p² − q²| = |(p − q)(p + q)| ≤ (|p| + |q|)|p − q|.
Now f² is the function defined by f²(x) = (f (x))², x ∈ I, so
\[ |f^2(x) - f^2(y)| \le \bigl(|f(x)| + |f(y)|\bigr)\,|f(x) - f(y)| \le 2K\,|f(x) - f(y)|. \]
Given ε > 0 choose δ > 0 such that
\[ (x \in I,\ y \in I,\ |x - y| < \delta) \;\Rightarrow\; |f(x) - f(y)| < \frac{\varepsilon}{2K} \;\Rightarrow\; |f^2(x) - f^2(y)| < \varepsilon \]
as required.
Solution to 1: second method

Since f is uniformly continuous, it is continuous. By standard algebra of continuous
functions, f² is also continuous, and upon a closed, bounded interval. By the key
theorem, f² is uniformly continuous there.
Solution to 2:
With g : R → R defined by g(x) = x, it is really trivial that g is uniformly
continuous. Yet g² is now the x² function that we have shown not to be uniformly
continuous. So the boundedness of the interval I cannot be thrown away
in part 1.
Of course the key theorem is not able to prove a function to be uniformly
continuous on an unbounded interval; nevertheless, it can sometimes be employed
to carry out a significant part of that task; look first at the following:
15.2.16 EXERCISE Let I and J be two intervals that share an endpoint from
opposite sides: that is, I is either (−∞, b] or (a, b] or [a, b], while J is either [b, ∞)
or [b, c) or [b, c]. Let f : I ∪ J → R be uniformly continuous on I, and also
uniformly continuous on J. Show that f is uniformly continuous on I ∪ J.
Remarks
• This turns out to be valuable more often than you might expect, because the
most obvious reason why some function is uniformly continuous can vary
from one part of its domain to another. The above result allows us to ‘glue
together’ uniform continuity that has been ‘separately evidenced’ in different
parts of its domain. (Consider, as an illustration, the next example.)
• The only non-routine part of the proof is to show that nearby points x ∈ I
and y ∈ J have f -values that are suitably close together. To help with this, notice
that both x and b, and also b and y, will be nearby, and that
|f (x) − f (y)| ≤ |f (x) − f (b)| + |f (b) − f (y)|.
15.2.17 Example To prove that the formula f (x) = √x, x ∈ [0, ∞) defines a
uniformly continuous function on [0, ∞).
Solution
Since f is continuous on the closed, bounded interval [0, 1], the key theorem
provides evidence of its uniform continuity there. Also, an earlier example (15.2.3)
showed its uniform continuity on [1, ∞). Now we can appeal to 15.2.16 to deduce
that it is uniformly continuous on the union [0, 1] ∪ [1, ∞) = [0, ∞) as was
required.
15.2.18 EXERCISE Prove that the function f given by f (x) = ∛x, x ∈ [0, ∞) is
uniformly continuous on [0, ∞).
15.2.19 Examples To determine whether the following real functions are uniformly
continuous on the intervals indicated.
1. On (0, ∞) we define f by the formula f (x) = 1/x.
2. On [1, ∞) we define f by the formula f (x) = 1/x.
3. On [0, 10] we define f by the formula f (x) = sin(cos²(√(1 + x³) + e^(x + x⁴))).
4. On [0, 2] we define f (x) = ⌊x⌋, the floor of x.
5. On (0, 1] we define f (x) = (sin x)/x (you should assume that the limit of f (x) as
x → 0 is 1).
6. On [0, ∞) we define f (x) = e^x (you should assume basic facts about the
function ln, including that it is continuous on positive numbers).
Solution

1. The sequence (1/n)n≥1 in (0, ∞) is Cauchy because it converges (to 0). Yet
(f (1/n))n≥1 is the sequence (n) and that is not Cauchy: indeed, it is not even
bounded. So f , since it does not preserve Cauchyness, cannot be uniformly
continuous.
2. When x ≥ 1 and y ≥ 1, we see that
\[ |f(x) - f(y)| = \left| \frac{1}{x} - \frac{1}{y} \right| = \frac{|y - x|}{xy} \le |x - y|. \]
So, given ε > 0, if we choose δ = ε, we shall have |x − y| < δ implying
|f (x) − f (y)| < ε. Therefore f is uniformly continuous on this set.
3. Despite its complicated formula, this expression has been built up from
components that we know to be continuous; so f is therefore continuous.
Because [0, 10] is closed and bounded, the key theorem assures us that f is
uniformly continuous here.
4. As x → 1 this function does not have a limit, so it is not continuous and
therefore cannot possibly be uniformly continuous.
5. The question is more awkward than (3) because the domain is not closed.
However, f is again composed from continuous components at every point of
its domain and without division by zero, and is therefore continuous there.
Also, as x → 0, (sin x)/x → 1. Therefore, if we now define a ‘new’ function F on
the closed interval [0, 1] by the formula
F(x) = f (x) if 0 < x ≤ 1, F(x) = 1 if x = 0
then F will be continuous not only on (0, 1] but at 0 as well, that is, it is
continuous on closed, bounded [0, 1]. By the key theorem, F is uniformly
continuous on [0, 1] and, in particular, on its subset (0, 1]. Yet on this interval,
F and f are the same function – so f is uniformly continuous on its given
domain.
6. (After some trial-and-error along the lines of the roughwork thinking we did
towards showing that x² was not uniformly continuous), for each positive
integer n we try xn = ln(n) and yn = ln(n + 1). Then
|xn − yn | = ln(n + 1) − ln(n) = ln(1 + 1/n) and, as n → ∞, this expression
→ ln(1) = 0 since ln is continuous. On the other hand,
|f (xn ) − f (yn )| = |n − (n + 1)| = 1 which does not converge to 0. By the
two-sequence lemma, f is not uniformly continuous.
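The two negative answers above (parts 1 and 6) both rest on finding pairs of points that draw together while their f -values stay apart. Here is a small numerical sketch of that idea; the helper name gaps is ours, and for part 1 the second sequence yn = 1/(n + 1) is an assumption of ours (the solution itself argued via Cauchyness instead).

```python
import math

def gaps(f, xs, ys):
    """Pair up two sequences and return (|x - y|, |f(x) - f(y)|) for each pair."""
    return [(abs(x - y), abs(f(x) - f(y))) for x, y in zip(xs, ys)]

ns = [10, 100, 1000, 10000]

# Part 6: f(x) = e^x with x_n = ln(n), y_n = ln(n + 1), as in the solution.
exp_gaps = gaps(math.exp,
                [math.log(n) for n in ns],
                [math.log(n + 1) for n in ns])

# Part 1: f(x) = 1/x with x_n = 1/n and (our own choice) y_n = 1/(n + 1).
recip_gaps = gaps(lambda x: 1 / x,
                  [1 / n for n in ns],
                  [1 / (n + 1) for n in ns])

for n, (d, g) in zip(ns, exp_gaps):
    print(n, d, g)   # d tends to 0 while g stays (up to rounding) at 1
```

In both cases the first coordinate shrinks towards 0 while the second stays close to 1, which is exactly the behaviour that uniform continuity forbids.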
15.2.20 Example Suppose that f : I → R is uniformly continuous and never takes
the value 0 (where I is an interval).
1. If I is both bounded and closed, we show that 1/f is also uniformly continuous
on I.
2. We show by example that if I is closed but not bounded, then 1/f could fail to be
uniformly continuous on I.
3. We show by example that if I is bounded but not closed, then 1/f could fail to be
uniformly continuous on I.
Solution
1. Being uniformly continuous, f is certainly continuous. Since dividing among
continuous functions (but scrupulously avoiding division by 0) always gives
continuous functions, 1/f is continuous (on I). Since the interval I is bounded
and closed, 1/f is also uniformly continuous there by the key theorem.
2. For instance, on the closed unbounded interval I = [1, ∞), f : I → R
described by f (x) = 1/x² is non-zero and uniformly continuous (as a proof very
like that of Example 15.2.19, part 2, will show). But here, 1/f is the x² function
which we know how to prove to be not uniformly continuous.
3. For instance, on the bounded, non-closed interval (0, 1], it is very easy to
check that the function f (x) = x is uniformly continuous (and never exactly
zero). Yet, very much as we saw in an earlier example, its reciprocal, the 1/x
function, is not.
15.3 The bounded derivative test

Just as differentiability sometimes provides us with a quick way to confirm continuity,
the first mean value theorem sometimes gives us a quick way to confirm uniform
continuity. To add a little perspective, we first formulate a related definition:
15.3.1 Definition: Lipschitz functions A function f : I → R is said to be

Lipschitz, or to satisfy the Lipschitz condition on I, if there is a positive constant K
such that
|f (x) − f (y)| ≤ K|x − y| for all x, y ∈ I.
15.3.2 Lemma Lipschitz functions are uniformly continuous.
Proof
Given ε > 0, define δ = ε/K where K is the constant in the Lipschitz definition.
Then
x, y ∈ I, |x − y| < δ ⇒ |f (x) − f (y)| ≤ K|x − y| < Kδ = ε.
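The recipe δ = ε/K is concrete enough to try out. The following is a minimal sketch, under the assumption (a standard fact about the trig functions) that sin is Lipschitz with K = 1; the function names delta_for and check are ours. Random sampling can only fail to find a counterexample, of course: it is the lemma, not the program, that proves anything.

```python
import math
import random

def delta_for(eps, K):
    """The delta used in the proof of Lemma 15.3.2: delta = eps / K."""
    return eps / K

def check(f, K, eps, lo, hi, trials=10_000, seed=0):
    """Sample pairs in [lo, hi] with |x - y| < delta; report whether |f(x) - f(y)| < eps held every time."""
    rng = random.Random(seed)
    delta = delta_for(eps, K)
    for _ in range(trials):
        x = rng.uniform(lo, hi)
        # scale by 0.999 so that |x - y| stays strictly below delta
        y = x + rng.uniform(-delta, delta) * 0.999
        y = min(max(y, lo), hi)          # clamping can only bring y closer to x
        if abs(f(x) - f(y)) >= eps:
            return False
    return True

print(check(math.sin, K=1.0, eps=1e-3, lo=-50.0, hi=50.0))
```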
15.3.3 EXERCISE (Perhaps unfortunately,) not all uniformly continuous functions
are Lipschitz. For example, the function given by
f : [−1, 1] → R, f (x) = √(1 − x²)
is uniformly continuous by the key theorem, because it is continuous on the closed
bounded interval [−1, 1]. Prove that it is not Lipschitz.
15.3.4 Theorem: the bounded derivative test for uniform continuity Suppose
that the real function f is continuous on an interval I and differentiable at each
interior point of I, and that its derivative f ′(x) is bounded. Then f is uniformly
continuous on I.
Proof
For any a < b in I the conditions of the first mean value theorem are met on the
subinterval [a, b], so there is a point c ∈ (a, b) such that
\[ \frac{f(b) - f(a)}{b - a} = f'(c). \]
There is also a positive constant K (not depending on a or b) such that |f ′| < K at
every point of the interval (a, b), so
|f (b) − f (a)| ≤ K|b − a|.
Hence f is a Lipschitz function, and is therefore uniformly continuous.
15.3.5 Notes
• The converse of this theorem is not true: for instance, the uniformly continuous
function f (x) = √(1 − x²) has an unbounded derivative on (−1, 1).
• This may be a good moment at which to review a variety of uniformly
continuous functions, and observe how the bounded derivative test can make it
easier for us to see why they are so.
1. On the interval [1, ∞), f (x) = x⁻¹ is uniformly continuous because the
modulus of its derivative |−x⁻²| never exceeds 1.
2. On the interval [1, ∞), f (x) = x⁻² is uniformly continuous because the
modulus of its derivative |−2x⁻³| never exceeds 2.
3. If f (x) = ax + b then |f ′(x)| = |a| always, and is bounded, so f is uniformly
continuous on any interval.
4. If
f (x) = e^(sin x + cos x)
then (assuming basic results about the trig functions)
f ′(x) = (cos x − sin x) e^(sin x + cos x)
which cannot exceed 2e² in modulus, so this function is uniformly
continuous on any interval.
5. Here is another way to show that the function f (x) = √x is uniformly
continuous on [0, ∞). Firstly, it is continuous on [0, 1] and therefore, by the
key theorem, uniformly continuous there. Secondly, on [1, ∞) its derivative
is 1/(2√x) and therefore (in modulus) never more than 1/2, so by the bounded
derivative test it is uniformly continuous there also. Now we can invoke
Exercise 15.2.16 to see that it is uniformly continuous on the union
[0, 1] ∪ [1, ∞) = [0, ∞).
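Why did item 5 need to split the domain at all? A rough numerical look at the difference quotients of √x suggests the reason: near 0 they grow without bound (so no Lipschitz constant can serve on [0, 1], and the key theorem has to do the work there), whereas on [1, ∞) they never exceed 1/2. The sample points below are arbitrary choices of ours, and the helper name slope is hypothetical.

```python
import math

def slope(f, x, y):
    """Difference quotient |f(x) - f(y)| / |x - y| for x != y."""
    return abs(f(x) - f(y)) / abs(x - y)

# Near 0 the quotients of sqrt blow up (they equal 10^(k/2) here) ...
near_zero = [slope(math.sqrt, 0.0, 10.0 ** -k) for k in range(1, 7)]

# ... while on [1, infinity) they stay at or below 1/2, matching the derivative bound.
far_out = [slope(math.sqrt, x, x + h)
           for x in (1.0, 4.0, 100.0)
           for h in (0.5, 1.0, 10.0)]

print(near_zero)
print(max(far_out))
```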
15.3.6 EXERCISE Show that the function specified by f (x) = ∛(x²) is uniformly
continuous on R. Suggestion: break up the domain into
(−∞, −1] ∪ [−1, 1] ∪ [1, ∞).
15.3.7 EXERCISE Suppose that f : (0, ∞) → R is everywhere differentiable, and
that the derivative f ′(x) → ∞ as x → ∞ (so: the derivative is unbounded in a
rather extreme manner). Show that f cannot be uniformly continuous on (0, ∞).
15.3.8 Note In many areas of mathematics, an important question to ask about a

continuous function is whether it can be extended over a bigger domain without
destroying its continuity. Indeed, we recently saw a use of this idea in part 5 of
Example 15.2.19, where we proved uniform continuity of a function on the domain
(0, 1] by extending it to become a continuous function on the domain [0, 1]: even
that tiny augmentation of domain turned out to be strategically beneficial. There
are – as that quoted example already indicates – strong connections between this
kind of ‘continuous extensibility’ on the one hand, and uniform continuity on the
other. We shall round off the chapter by taking a brief look at this issue.
15.3.9 Proposition Let f : (a, b] → R. Then f can be extended to a continuous

real function on [a, b] if and only if f is uniformly continuous.
Proof
(I) Suppose firstly that f can be so extended: that is, there is a continuous function
F : [a, b] → R whose restriction to (a, b] is exactly f . By the key theorem, F is
uniformly continuous on [a, b] and therefore, in particular, uniformly continuous
on (a, b]. Yet F and f are identical on (a, b], so f is uniformly continuous.
(II) Conversely, suppose that f is uniformly continuous (on (a, b]). Choose a
sequence (yn ) in (a, b] whose limit is a (for example, yn = a + (b − a)/(n + 1) would
give one suitable choice). Since (convergent) (yn ) is Cauchy, we
know from 15.2.11 that (f (yn )) is also Cauchy, and consequently converges to some
limit – let us call it ℓ.
Now consider any sequence (xn ) in (a, b] that converges to a. If (as we did before)
we interleave the two sequences thus: (x1 , y1 , x2 , y2 , x3 , y3 , · · · ), we create a new
sequence converging to a, and therefore Cauchy, and we see from 15.2.11 that
(f (x1 ), f (y1 ), f (x2 ), f (y2 ), f (x3 ), f (y3 ), · · · )
is Cauchy and therefore has to tend to some limit (let us be cautious and call it ℓ′
for the moment). However, since the subsequence (f (y1 ), f (y2 ), f (y3 ) · · · ) must
also converge to ℓ′ but actually does converge to ℓ, the two numbers ℓ and ℓ′ are,
in fact, identical. Hence the ‘complementary’ subsequence (f (x1 ), f (x2 ), f (x3 ) · · · )
converges to ℓ also.
What the last paragraph shows us is that the function F : [a, b] → R defined by
the formula
F(x) = f (x) for x ∈ (a, b]; F(a) = ℓ
possesses a limit as x → a, and that this limit is ℓ which equals F(a). Therefore F is
continuous at a, as well as (trivially) continuous everywhere else in [a, b]. Thus we
have managed to find a continuous extension of f over [a, b].³
15.3.10 EXERCISE Think how much⁴ you would need to modify that argument
in order to show that a real function defined on a bounded open interval (a, b) or,
indeed, on a finite union of bounded open intervals (a1 , b1 )∪(a2 , b2 )∪· · ·∪(an , bn )
can be continuously extended over the corresponding closed interval(s) if and only
if it is uniformly continuous.
³ Incidentally, the same ‘last paragraph’ also shows that F is unambiguously defined in the sense
that the number ℓ we selected to be the value of F(a) does not depend on how we chose the
sequence (yn ): any different choice of (yn ) would have resulted in exactly the same number ℓ.
⁴ The short answer is: not very much!
16 Differentiation — mean
value theorems, power
series
16.1 Introduction
Recall Rolle’s theorem: a function that is continuous on a bounded closed interval,
differentiable on the corresponding open interval, and of equal value at the
endpoints must have zero derivative at one point (at least) in between.
Recall the first mean value theorem: a function that is continuous on a bounded
closed interval and differentiable on the corresponding open interval must, somewhere
between the endpoints, have derivative equal to the average (the mean)
gradient of its graph across the entire interval.
Given the use of the word first in the title, it will hardly surprise anyone to learn
that there are other ‘mean value’ theorems. This chapter is going to look at some
of the others: what they say, why they are true and what use can be made of them.
This study will lead us into questions of how to represent ‘highly differentiable’
functions by power series and thus, inevitably, back to the foreshadowed theorem
on the differentiation of power series themselves.
16.2 Cauchy and l’Hôpital

As we commented in an earlier chapter, Rolle can be seen as a special case of the
FMVT and, furthermore, the FMVT is most readily proved by re-engineering its
hypotheses into a form to which Rolle can apply. Analogous remarks apply to the
next result, which is a kind of ‘double’ FMVT that deals with two functions at once.
16.2.1 Cauchy’s Mean Value Theorem (‘CMVT’) Let f and g both be continuous
on [a, b] and differentiable on (a, b), with g′(x) non-zero at every point of (a, b).
Then there is (at least) one point c ∈ (a, b) such that
\[ \frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(c)}{g'(c)}. \]
Proof
We define a new function h by the formula h(x) = f (x)−λg(x) where the constant
λ will be chosen in such a way that RT can be applied to h.
First, how to choose λ? Certainly h will be continuous and differentiable where
f and g were, no matter how we decide to pick λ, so all we need to arrange is that
h(a) = h(b), that is
f (a) − λg(a) = f (b) − λg(b)
which solves easily to give
\[ \lambda = \frac{f(b) - f(a)}{g(b) - g(a)} \]
provided that the bottom line is non-zero. Fortunately, g(b) − g(a) cannot be zero
because, if it were, RT applied to g would tell us that g′ was zero somewhere, which
is explicitly not the case.
Now that λ has been thus chosen, RT applied to h gives us the existence of
c ∈ (a, b) for which 0 = h′(c) = f ′(c) − λg′(c) and so, again because g′ cannot
go zero,
\[ \frac{f'(c)}{g'(c)} = \lambda, \]
which is the declared result.
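To see the proof's recipe in action, here is a small sketch that computes λ and then hunts numerically for a point c with h′(c) = 0. The test functions f (x) = x², g(x) = x³ on [0, 1] are our own choice, and cmvt_point is a hypothetical helper (a coarse grid scan followed by bisection), not a method taken from the text.

```python
def cmvt_point(df, dg, lam, a, b, steps=1000, tol=1e-12):
    """Locate c in (a, b) with df(c) - lam * dg(c) = 0: scan a grid for a sign change, then bisect."""
    hp = lambda t: df(t) - lam * dg(t)          # this is h'(t) for h = f - lam * g
    grid = [a + (b - a) * i / steps for i in range(1, steps)]
    for lo, hi in zip(grid, grid[1:]):
        if hp(lo) == 0:
            return lo
        if hp(lo) * hp(hi) < 0:
            while hi - lo > tol:
                mid = (lo + hi) / 2
                if hp(lo) * hp(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            return (lo + hi) / 2
    return None  # no sign change detected on this grid

f, g = (lambda x: x ** 2), (lambda x: x ** 3)
df, dg = (lambda x: 2 * x), (lambda x: 3 * x ** 2)
a, b = 0.0, 1.0
lam = (f(b) - f(a)) / (g(b) - g(a))   # the lambda chosen in the proof
c = cmvt_point(df, dg, lam, a, b)
print(c, df(c) / dg(c), lam)
```

For these two functions the equation 2c − 3c² = 0 can also be solved by hand, giving c = 2/3, which is what the search should home in on.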
16.2.2 Remarks
1. In the special case where g(x) = x for all relevant x (and therefore g′(x) = 1
which is certainly non-zero) we get, from CMVT,
\[ \frac{f(b) - f(a)}{b - a} = \frac{f'(c)}{1}, \]
that is, the FMVT is a particular case of the CMVT.

2. Occasionally useful is a slightly different version of CMVT in which we do not
assume that g′ is non-zero on the relevant interval. It says:
‘Let f and g both be continuous on [a, b] and differentiable on (a, b). Then
there is (at least) one point c ∈ (a, b) such that
(f (b) − f (a))g′(c) = (g(b) − g(a))f ′(c).’
Proof
Case 1: if g(b) − g(a) ≠ 0 then essentially the proof we gave before still runs.
Case 2: if g(b) − g(a) = 0 then RT says there is a point c such that g′(c) = 0
and then the result is immediate.
16.2.3 Example Let f be any function that is continuous on [0, 1] and differentiable
on (0, 1), and n be any positive integer. We show that there must be a number
c in (0, 1) such that f ′(c) = nc^(n−1) (f (1) − f (0)).
Roughwork
Only one function f is visible here, but the group of symbols nc^(n−1) should make
us suspect that ‘the other function’ is x^n . Certainly g(x) = x^n is continuous and
differentiable wherever we need it to be, and its derivative goes to zero only at
x = 0, that is, not anywhere in (0, 1) . . .
Solution
The CMVT does apply to the two functions f and g where g(x) = x^n , and it says
there is c ∈ (0, 1) such that
\[ \frac{f(1) - f(0)}{1^n - 0^n} = \frac{f'(c)}{n c^{n-1}}. \]
This quickly rearranges into the form required.
16.2.4 EXERCISE The following alleged proof of CMVT is incorrect. Find out
precisely why.
(If necessary, try running the argument of the purported proof on a couple of
simple functions such as f (x) = x² and g(x) = x³ over [0, 1].)
‘Since f satisfies the conditions of FMVT over the interval [a, b], we know that
there exists c ∈ (a, b) such that
\[ f'(c) = \frac{f(b) - f(a)}{b - a}. \]
‘By exactly the same argument on g:
there exists c ∈ (a, b) such that
\[ g'(c) = \frac{g(b) - g(a)}{b - a}. \]
‘Dividing one by the other (and remembering that g(b) − g(a) cannot be zero, else
RT would give g′ = 0 somewhere, contradiction) we get
\[ \frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)} \]
as desired.’
One of the most immediate and useful applications of CMVT is a result that you
may very well have used already, called l’Hôpital’s Rule, for determining function
limits in the most awkward case where unaided common sense hits the nonsense
barrier of zero divided by zero.
16.2.5 L’Hôpital’s Rule Suppose that f and g are both differentiable on an open
interval (p − h, p + h) centred on a real number p, that both f (p) and g(p) are zero,
that g′ is non-zero here except possibly at p,¹ and that f ′(x)/g′(x) tends to a limit ℓ
as x → p. Then also
\[ \frac{f(x)}{g(x)} \to \ell \quad \text{as } x \to p. \]
Proof
(I) For k positive and less than h, the two functions f , g satisfy the conditions
of CMVT on the interval [p, p + k]. Therefore there is a number c such that
p < c < p + k and
\[ \frac{f'(c)}{g'(c)} = \frac{f(p + k) - f(p)}{g(p + k) - g(p)} = \frac{f(p + k)}{g(p + k)}. \]
As k → 0 (but through positive values) the fact that p < c < p + k gives us c → p
also, so (by hypothesis) f ′(c)/g′(c) → ℓ. Therefore f (p + k)/g(p + k) → ℓ also.
Let x stand for p + k here. In the language of one-sided limits (which is what we
are presently speaking, since we have so far only considered points just to the right
of p), what this establishes is that
\[ \lim_{x \to p^+} \frac{f(x)}{g(x)} = \ell. \]
(II) For k negative and lying between −h and 0, a virtually identical argument
upon the interval [p + k, p] shows that
\[ \lim_{x \to p^-} \frac{f(x)}{g(x)} = \ell. \]
Putting the two one-sided limits together, we get (as we wanted)
\[ \lim_{x \to p} \frac{f(x)}{g(x)} = \ell. \]
¹ Actually, we do not even need g′(p) to exist, so long as g is continuous at p.

There are many different versions of this Rule, of which the two most obvious
are what we actually proved above for one-sided limits:
16.2.6 L’Hôpital’s Rule: right-hand limits Suppose that f and g are both differentiable
on an open interval (p, p + h) and continuous on [p, p + h), that f (p) and
g(p) are zero, that g′ is non-zero on (p, p + h), and that f ′(x)/g′(x) tends to a limit ℓ
as x → p⁺. Then also
\[ \frac{f(x)}{g(x)} \to \ell \quad \text{as } x \to p^+. \]
Now the left-handed variety of this resembles it so closely that it is barely worth
stating:
16.2.7 L’Hôpital’s Rule: left-hand limits Suppose that f and g are both differentiable
on an open interval (p − h, p) and continuous on (p − h, p], that f (p) and
g(p) are zero, that g′ is non-zero on (p − h, p), and that f ′(x)/g′(x) tends to a limit ℓ
as x → p⁻. Then also
\[ \frac{f(x)}{g(x)} \to \ell \quad \text{as } x \to p^-. \]
Here is another in which the control variable tends to infinity:
16.2.8 L’Hôpital’s Rule: as x → ∞ Suppose that f and g are both differentiable on
an open interval (a, ∞), that both f (x) and g(x) converge to 0 as x → ∞, that g′
is non-zero on (a, ∞), and that f ′(x)/g′(x) tends to a limit ℓ as x → ∞. Then also
\[ \frac{f(x)}{g(x)} \to \ell \quad \text{as } x \to \infty. \]
Proof
We shall use 11.1.14 to switch between limits at infinity and limits at zero (and
there is no loss of generality in assuming a > 0).
Define two ‘new’ functions F, G on the interval (0, 1/a) by setting F(x) = f (1/x),
G(x) = g(1/x). Then F, G are differentiable and, if we additionally define
F(0) = 0, G(0) = 0, they also become continuous on [0, 1/a) (because their limiting
values, as x → 0⁺, are 0 and thus coincide with the values that we attributed to
them at 0). Furthermore, G′(x) = −g′(1/x) x⁻² is non-zero, and
\[ \lim_{x \to 0^+} \frac{F'(x)}{G'(x)} = \lim_{x \to 0^+} \frac{-f'(1/x)\,x^{-2}}{-g'(1/x)\,x^{-2}} = \lim_{x \to 0^+} \frac{f'(1/x)}{g'(1/x)} = \lim_{t \to \infty} \frac{f'(t)}{g'(t)} = \ell. \]
By a one-sided version of the Rule (16.2.6) that we have already established, that
gives
\[ \lim_{x \to 0^+} \frac{F(x)}{G(x)} = \ell, \]
which is equivalent to
\[ \lim_{x \to \infty} \frac{f(x)}{g(x)} = \ell. \]
16.2.9 Notes
1. Resist the temptation to think (or write) that the essence of the Rule is that (in
the zero-over-zero case)
\[ \lim \frac{f(x)}{g(x)} = \lim \frac{f'(x)}{g'(x)}. \]
While it is true that l’Hôpital does say this, the main point is that if the second
limit exists, then so must the first. Their numerical equality is secondary to that.
2. One more procedural detail before we settle down to a batch of examples.
The Rule is written as if you begin with knowledge of the limit of f ′/g′ and
proceed from there to knowledge of the limit of f /g, but that is not exactly what
happens in practice. We actually begin with curiosity about the limit of f /g,
turn it into curiosity about f ′/g′, solve that question if we can, and feed it back
into an answer for the limit of f /g.
3. The additional point is that if our first attack on the limit of f ′/g′ also hits the
nonsense wall of zero divided by zero, we ought not to give up the struggle:
we should, instead, drive the process further into curiosity about the limit of
f ″/g″. If that is answerable, then the answer we get pans back to one for f ′/g′
and, in turn, for f /g. If not, consider f ‴/g‴, and so on. (Of course, if the
derivatives are becoming unmanageable, this process should not be continued
past the point of reasonable hopes.) Take care to check that the conditions
of the Rule are fully satisfied each time you invoke it.
16.2.10 Example To determine, if it exists, the limit of (x⁴ − 16)/(x⁶ − 64) as x → 2.
Solution
An initial, common-sense attempt of replacing x by 2 gives the meaningless (but
encouraging) response of zero divided by zero, so an application of the Rule is
indicated.
Putting f (x) = x⁴ − 16 and g(x) = x⁶ − 64 we see that g′(x) = 6x⁵ is zero
only at x = 0 so, if we operate over (say) the interval (1, 3) then that derivative is
non-zero and
\[ \frac{f'(x)}{g'(x)} = \frac{4x^3}{6x^5} = \frac{2}{3x^2} \]
whose limit is obviously 1/6. Therefore the limit also of f (x)/g(x) exists, and equals 1/6.
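As a sanity check on the value 1/6 (not as a substitute for the Rule), one can evaluate the quotient exactly at rational points close to 2. The sketch below does so with Python's Fraction type; the sample points 2 + 10^(−k) are an arbitrary choice of ours.

```python
from fractions import Fraction

def ratio(x):
    """(x^4 - 16) / (x^6 - 64), evaluated exactly for rational x with x^6 != 64."""
    return (x ** 4 - 16) / (x ** 6 - 64)

sixth = Fraction(1, 6)
errors = []
for k in range(1, 6):
    x = 2 + Fraction(1, 10 ** k)        # approach 2 from the right
    errors.append(abs(ratio(x) - sixth))
    print(k, float(errors[-1]))         # the differences shrink towards 0
```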
16.2.11 Example (Assuming knowledge of how to differentiate basic exponential
and trigonometric functions), to determine whether or not the limit exists of
\[ \frac{x(e^x - 1)}{\sin^2 x} \]
as x → 0.
Roughwork
Putting x = 0 gives us zero divided by zero, which is not an answer, but suggests
we should try l’Hôpital.
Let
f (x) = x(e^x − 1), g(x) = sin² x.
Then (using product rule and chain rule)
f ′(x) = xe^x + (e^x − 1), g′(x) = 2 sin x cos x = sin(2x).
Now x = 0 in this still gives zero over zero, so try again:
f ″(x) = (1 + x)e^x + e^x , g″(x) = 2 cos(2x).
This time the limits (as x → 0) are 2 and 2, so we have ‘broken through the
nonsense wall’.
Solution
Putting f (x) = x(e^x − 1), g(x) = sin² x we see that f ′(x) = xe^x + e^x − 1,
g′(x) = sin(2x) and f ″(x) = (2 + x)e^x , g″(x) = 2 cos(2x). All are visibly
differentiable (and continuous).
Now on (−π/4, π/4) we have g″ non-zero, f ′ and g′ are zero at 0, and
f ″(x)/g″(x) → 2/2 = 1 as x → 0. By the Rule, f ′(x)/g′(x) → 1 also.
Secondly, on (−π/2, π/2) we have g′ non-zero except at 0, f and g are zero at 0, and
f ′(x)/g′(x) → 1 as x → 0. By the Rule, f (x)/g(x) → 1 also.
The desired limit does exist (and equals 1).
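The double application of the Rule can be watched numerically: all three quotients f /g, f ′/g′ and f ″/g″ should approach 1 as x → 0. This is a floating-point sketch only (the function names f1, g1, f2, g2 for the derivatives are ours), and math.expm1 is used for e^x − 1 to limit cancellation error near 0.

```python
import math

def f(x):
    return x * math.expm1(x)                      # x(e^x - 1)

def g(x):
    return math.sin(x) ** 2

def f1(x):
    return x * math.exp(x) + math.expm1(x)        # f'

def g1(x):
    return math.sin(2 * x)                        # g'

def f2(x):
    return (2 + x) * math.exp(x)                  # f''

def g2(x):
    return 2 * math.cos(2 * x)                    # g''

for x in (0.1, 0.01, 0.001, -0.001):
    # each of the three ratios drifts towards 1 as x approaches 0
    print(x, f(x) / g(x), f1(x) / g1(x), f2(x) / g2(x))
```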
16.2.12 EXERCISE Evaluate

√
x− x
lim .
x→4 4 − x
16.2.13 EXERCISE Assuming that the derivative of e^x is e^x , evaluate
\[ \lim_{x \to 3} \frac{e^x + (2 - x)e^3}{(x - 3)^2}. \]
16.2.14 EXERCISE
1. Use l’Hôpital’s Rule to investigate
\[ \lim_{x \to \infty} x \left( \frac{\pi}{2} - \arctan x \right). \]
(You can assume that the derivative of arctan x is (1 + x²)⁻¹.)
It may be useful to express the desired function in the form
\[ \frac{\frac{\pi}{2} - \arctan x}{\frac{1}{x}}. \]
2. Re-work the problem by putting t = 1/x and so converting it into
\[ \lim_{t \to 0^+} \frac{\frac{\pi}{2} - \arctan \frac{1}{t}}{t}. \]
16.3 Taylor series

If f is a function that is differentiable several times on an open interval including
a then we can easily write down a list of polynomials that – in some sense at least
– give better and better approximations to f itself. For, consider the following:
\[
\begin{aligned}
p_1(x) &= f(a) + f'(a)(x - a) \\
p_2(x) &= f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 \\
p_3(x) &= f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \frac{f'''(a)}{3!}(x - a)^3 \\
p_4(x) &= f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \frac{f'''(a)}{3!}(x - a)^3 + \frac{f^{(iv)}(a)}{4!}(x - a)^4
\end{aligned}
\]
. . . and so on. It is routine to check that the first of these, p1 , has the same
value and the same derivative as f had at a, the second p2 has the same value, the
same first derivative and the same second derivative as f had at a, the third has
additionally the same third derivative as f at a, and so on.
It would be nice to believe that these so-called Taylor polynomials were
approximating f better and better not only at a but on the interval as a whole.
Unfortunately, this is not always true. However, it is true in many important cases.
This is what Taylor’s theorem is about: it sets out to examine how well the list of
polynomials that we just described approximates f over an interval of values of x
around x = a. There are many slightly different versions of it, but we’ll focus on
just one of them:
16.3.1 Taylor’s theorem Suppose that f is differentiable at least k + 1 times on an
open interval including a and x. Let pk denote the kth Taylor polynomial
\[ p_k(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots + \frac{f^{(k)}(a)}{k!}(x - a)^k. \]
Then
\[ f(x) = p_k(x) + \frac{f^{(k+1)}(\xi)}{(k + 1)!}(x - a)^{k+1} \]
for some number ξ lying between a and x.
Proof
Consider the function
\[ F(t) = f(t) + f'(t)(x - t) + \frac{f''(t)}{2!}(x - t)^2 + \cdots + \frac{f^{(k)}(t)}{k!}(x - t)^k \]
where t varies over the interval in question. The first nice thing about this function
(please check this out) is just how its derivative simplifies:
\[ F'(t) = \frac{f^{(k+1)}(t)}{k!}(x - t)^k \]
and the second nice thing (check this one also) is that F(x) − F(a) simplifies to
f (x) − pk (x).
Now with G(t) = (x − t)^(k+1) , use the Cauchy mean value theorem on the
functions F, G on the interval joining a and x. It tells us that
\[ \frac{F(x) - F(a)}{G(x) - G(a)} = \frac{F'(\xi)}{G'(\xi)} \]
for some number ξ between a and x, that is,
\[ \frac{f(x) - p_k(x)}{-(x - a)^{k+1}} = \frac{\dfrac{f^{(k+1)}(\xi)}{k!}(x - \xi)^k}{-(k + 1)(x - \xi)^k} = -\frac{f^{(k+1)}(\xi)}{(k + 1)!} \]
which, cancelling the minuses, gives the declared result.

16.3.2 Notes
1. The series (of functions) whose partial sums are these Taylor polynomials is
known as the Taylor series of the function f at the point a.
2. For historical reasons, the special case a = 0 is named after Maclaurin as well
as after Taylor: Maclaurin’s theorem, Maclaurin polynomials, and so on.
3. Think about Taylor’s theorem as saying ‘original function = approximating
polynomial + error term’, where all three are functions of x, of course, and both
the polynomial and the error depend on k (on how far we have gone with the
approximation process). So the main point of the theorem above is to give us a
usable formula for the kth-stage error. Typical questions then are: does the
error always tend to zero, at least for each value of x across a range? Can we
make the error as small as we please over a range of x values simultaneously?
Determine a value of f to so-many decimal places (etc.)
4. The kth-stage error term is often called the remainder after k terms and
denoted by Rk (x):
f (x) = pk (x) + Rk (x).
16.3.3 Example Assuming that e^x is its own derivative, we use the theorem to
show that (for every real number x) e^x is the limit (as k → ∞) of
\[ 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots + \frac{x^k}{k!}. \]
Solution
Since the exponential function equals all of its own derivatives, and since they all
take the value 1 at 0, it is easy to see that the formula displayed here is just pk (x).
Our task, then, is only to show that the remainder tends to zero. Now, via Taylor’s
theorem,
\[ R_k(x) = \frac{e^{\xi}}{(k + 1)!}\,x^{k+1} \]
(for some ξ between 0 and x) whose modulus is at most e^|x| |x|^(k+1) /(k + 1)!, which does
indeed tend to zero (see paragraph 6.2.9).
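The estimate in this solution is easy to put to the test. In the sketch below (function names p and remainder_bound are ours, and the sample point x = 3 is arbitrary) we compare the true error |e^x − pk (x)|, using the library value of e^x as a stand-in for the exact one, against the bound e^|x| |x|^(k+1) /(k + 1)!.

```python
import math

def p(k, x):
    """k-th Maclaurin polynomial of e^x: 1 + x + x^2/2! + ... + x^k/k!."""
    term, total = 1.0, 1.0
    for j in range(1, k + 1):
        term *= x / j          # term is now x^j / j!
        total += term
    return total

def remainder_bound(k, x):
    """The bound e^|x| |x|^(k+1) / (k+1)! on |R_k(x)| from the solution."""
    return math.exp(abs(x)) * abs(x) ** (k + 1) / math.factorial(k + 1)

x = 3.0
for k in (5, 10, 20):
    # the actual error sits below the bound, and both shrink as k grows
    print(k, abs(math.exp(x) - p(k, x)), remainder_bound(k, x))
```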
16.3.4 Example We use Taylor’s theorem to show that, for the interval
J = [−1000, 1000], we can find a polynomial that differs from sin x at each point
of J by less than 0.000001. (Assume standard facts about the trig functions.)
Solution
In the theorem, take f (x) = sin x, a = 0. All the derivatives of f are either ± sin x
or ± cos x so they never exceed 1 in modulus. The remainder term
\[ |R_k(x)| = \left| \pm(\sin, \cos)(\xi)\, \frac{x^{k+1}}{(k+1)!} \right| \]
cannot exceed $1000^{k+1}/(k+1)!$, which tends to zero as before. Choosing a value
$k_0$ of k that makes the latter expression less than 0.000001, we then see that
$|\sin x - p_{k_0}(x)| = |R_{k_0}(x)|$ is less than 0.000001 at every point of J, that is, the Taylor
polynomial $p_{k_0}(x)$ is as good an approximation to sin x as the question wanted.
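To see that such a $k_0$ can actually be located, here is a small Python sketch using exact integer arithmetic. The function name `smallest_k` and its parameters are hypothetical, chosen only for illustration. It searches for the first k at which $1000^{k+1}/(k+1)!$ drops below $10^{-6}$.

```python
def smallest_k(radius=1000, tolerance_inverse=10**6):
    """Smallest k with radius**(k+1) / (k+1)! < 1 / tolerance_inverse (exact integers)."""
    power = radius      # holds radius**(k+1), starting at k = 0
    factorial = 1       # holds (k+1)!, starting at k = 0
    k = 0
    # The comparison is cross-multiplied so that no division is needed.
    while power * tolerance_inverse >= factorial:
        k += 1
        power *= radius
        factorial *= k + 1
    return k

k0 = smallest_k()
print(k0)
```

The ratio first grows (while k + 1 is below 1000) and then decreases steadily, so once it has fallen below the tolerance it stays below it for every larger k.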
16.3.5 Example Estimate ln(1.12) to four significant figures.
Solution
Take $f(x) = \ln x$, a = 1. The derivatives fall into an obvious pattern $f'(x) = x^{-1}$,
$f''(x) = -x^{-2}$, $f'''(x) = +2x^{-3}$, $f^{(iv)}(x) = -3!\,x^{-4}$ and, in general, we see that
$f^{(k)}(x) = (-1)^{k-1}(k-1)!\,x^{-k}$. Setting a = 1 in these and appealing to Taylor’s
theorem, we see that the Taylor polynomials take the form
\[ 0 + 1(x-1) - \frac{1}{2}(x-1)^2 + \frac{1}{3}(x-1)^3 - \frac{1}{4}(x-1)^4 + \cdots \]
to the appropriate number of terms, and that the remainder is
\[ R_k(x) = \frac{f^{(k+1)}(\xi)}{(k+1)!}\,(x-1)^{k+1} = \pm\frac{\xi^{-k-1}}{k+1}\,(x-1)^{k+1} \]
where we are about to replace x by 1.12 and know, therefore, that ξ lies between 1
and 1.12. Thus the modulus of the remainder after k terms cannot be more than
$\frac{1}{k+1}(0.12)^{k+1}$. Experimenting now with a calculator, we soon find that k = 4 gives
$\frac{1}{k+1}(0.12)^{k+1} = 0.000005$ approximately, which ought to be good enough for four
significant figures, and the 4th Taylor polynomial approximation gives
\[ p_4(1.12) = 0 + 1(1.12-1) - \frac{1}{2}(1.12-1)^2 + \frac{1}{3}(1.12-1)^3 - \frac{1}{4}(1.12-1)^4 \]
which calculates out as 0.113324…

We find an answer of 0.1133 to four significant figures.
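The arithmetic can be replayed in a few lines of Python. This is only a sketch: the name `p4` is ours, and `math.log` is used purely as an independent reference value.

```python
import math

def p4(x):
    """4th Taylor polynomial of ln about a = 1."""
    u = x - 1
    return u - u**2 / 2 + u**3 / 3 - u**4 / 4

approx = p4(1.12)
bound = 0.12**5 / 5   # (1/(k+1)) * 0.12**(k+1) with k = 4
# prints the approximation, the remainder bound, and the actual error
print(approx, bound, abs(math.log(1.12) - approx))
```

The actual error comes out just under the bound, as Taylor’s theorem promises.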
16.4 Differentiating a power series

What the previous section established can be expressed as follows: every infinitely
differentiable function (that is, every function that can be repeatedly differentiated
as often as you wish) for which it can be shown that the Taylor remainder after
k terms tends to zero in a satisfactory manner is the sum of a power series.
(The inclusion of the proviso about the remainder after k terms breaks up the
symmetry of what we are trying to say, but it cannot be avoided: there are infinitely
differentiable functions whose Taylor remainders totally fail to tend to zero and
which, therefore, are very different from the sum of their Taylor series!)
What we now seek is a converse, to the effect that every convergent power series
gives, as its limit, an infinitely differentiable function. (The only proviso needed
this time is that the radius of convergence shall be greater than zero, and this is
obviously necessary since a series whose radius of convergence was exactly zero
would define a sum function only at x = 0, so the issue of differentiability would
simply not arise.) We started this investigation back in Chapter 14, from which the
following revision material is taken:
Given that the power series
\[ a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_n x^n + \cdots \]
has radius of convergence D > 0 and therefore converges (absolutely) to a sum
f(x) at every point x of the open interval (−D, D), we can create another series,
called its derived series, by mindlessly differentiating each individual term in the
preceding one:
\[ a_1 + 2a_2 x + 3a_3 x^2 + \cdots + n a_n x^{n-1} + \cdots \]
and, continuing, another second derived series
\[ 2a_2 + 6a_3 x + \cdots + n(n-1) a_n x^{n-2} + \cdots \]
(and more such series, if the need arises) by exactly the same mechanical process.
We showed in Chapter 14 that all of these series also converge (absolutely) at every
point of (−D, D), and we expressed the aspiration that the sum function of the
derived series ‘ought’ to be the derivative $f'(x)$ of f(x). It is time to show that this
is not mere wishful thinking.
16.4.1 Theorem: power series can be differentiated term–by–term
Proof
This is, by a good margin, the biggest and most demanding proof in the entire
text, so
• we shall split it down into (hopefully) more comprehensible chunks,
• do not worry unduly if you don’t fully understand it,
• do pay careful attention to what the theorem says, which is both useful and
natural, even if you decide to shelve the proof till later,
• do read through it carefully at some stage, because understanding why a result
works gives a deeper understanding of what it can do.
STEP 1: CLARIFY THE TASK

We fix our attention at a typical point $x_0$ in (−D, D) and we choose r so that
$|x_0| < r < D$. (See the comments after the proof as to why it is important to do this.)
With
\[ f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_n x^n + \cdots \]
and, say,
\[ g(x) = a_1 + 2a_2 x + 3a_3 x^2 + \cdots + n a_n x^{n-1} + \cdots \]
we need to prove that the limit, as h → 0, of the expression
\[ E(h) = \frac{f(x_0+h) - f(x_0)}{h} - g(x_0) \]
is zero.
Let ε > 0 be given. We need to show that, for all sufficiently small (non-zero)
values of h, |E(h)| < ε.
STEP 2: BREAK THE TASK UP INTO MANAGEABLE PIECES
For each positive integer N we need to think about the Nth partial sum of what
is happening here, so let
\[ S_N(x) = \sum_{0}^{N} a_k x^k, \qquad T_N(x) = \sum_{N+1}^{\infty} a_k x^k \]
and use this notation to decompose E(h) into three fragments (with the intention
of handling each separately):
\[ E(h) = E_1(h) + E_2 + E_3(h) \]
where
\[ E_1(h) = \frac{S_N(x_0+h) - S_N(x_0)}{h} - S_N'(x_0), \qquad E_2 = S_N'(x_0) - g(x_0), \qquad\text{and}\qquad E_3(h) = \frac{T_N(x_0+h) - T_N(x_0)}{h}. \]
Fortunately, two of these three pieces are pretty easy to deal with. After all, $S_N$ is
just a polynomial, and polynomials surely are differentiable, so the limit (as h → 0)
of $E_1$ is zero; continuing, $S_N'$ is the Nth partial sum of the series that defines g, so
the limit (as N → ∞) of $E_2$ is also zero. The sole abiding difficulty is $E_3$.
STEP 3: DEAL WITH THE REMAINING DIFFICULT CASE

At this point it will pay us to recall a (rather forgettable) factorisation trick from
basic algebra:
\[ x^k - y^k = (x-y)(x^{k-1} + x^{k-2}y + x^{k-3}y^2 + \cdots + y^{k-1}). \]
If, in this, you replace x by $x_0 + h$ and y by $x_0$, it yields
\[ \frac{(x_0+h)^k - x_0^k}{h} = (x_0+h)^{k-1} + (x_0+h)^{k-2}x_0 + (x_0+h)^{k-3}x_0^2 + \cdots + x_0^{k-1} \]
so, if we can arrange that both $x_0$ and $x_0 + h$ have modulus less than r:
\[ \left| \frac{(x_0+h)^k - x_0^k}{h} \right| < r^{k-1} + r^{k-1} + r^{k-1} + \cdots + r^{k-1} = k r^{k-1} \]
(using the triangle inequality yet again). Now we are fully prepared to look more
closely at $E_3$:
\[ E_3(h) = \frac{1}{h}\left( \sum_{N+1}^{\infty} a_k (x_0+h)^k - \sum_{N+1}^{\infty} a_k x_0^k \right) = \frac{1}{h} \sum_{N+1}^{\infty} a_k \{(x_0+h)^k - x_0^k\}, \]
therefore
\[ |E_3(h)| = \left| \frac{1}{h} \sum_{N+1}^{\infty} a_k \{(x_0+h)^k - x_0^k\} \right| \le \sum_{N+1}^{\infty} |a_k| \times k r^{k-1}. \]
The reason why this is good news is that $\sum_{0}^{\infty} k a_k r^{k-1}$ converges absolutely (to g(r))
so the last summation on the line above is a tail of a convergent series, and therefore
must tend to zero (see 7.3.20) as N → ∞.
STEP 4: PUT THE PIECES TOGETHER (IN THE RIGHT ORDER)
Remember that ε > 0 was given some time ago. How small must we make
h in order that all the pieces of this jigsaw-puzzle of a proof shall come
together?
1. Because $S_N'(x_0)$ is the Nth partial sum of the series whose limit (by definition)
is $g(x_0)$, there is a positive integer $N_1$ such that, for every $N \ge N_1$, we
will get
\[ |E_2| = |S_N'(x_0) - g(x_0)| < \frac{\varepsilon}{3}. \]
2. Because $\sum_{0}^{\infty} k a_k r^{k-1}$ converges absolutely, there is a positive integer $N_2$ such
that, for every $N \ge N_2$, the ‘remainder’ $\sum_{N+1}^{\infty} |a_k| \times k r^{k-1}$ of the (modulussed) series
will be less than $\frac{\varepsilon}{3}$.
3. Choose now and fix a value of N that is bigger than each of $N_1$, $N_2$.
4. Because $S_N$ is merely a differentiable polynomial, the limit of $E_1$ is zero. Hence
there is $\delta_1 > 0$ such that $0 < |h| < \delta_1$ will guarantee that
\[ |E_1(h)| < \frac{\varepsilon}{3}. \]
5. Because $|x_0| < r < D$, if we pick $\delta_2 = r - |x_0|$ then $0 < |h| < \delta_2$ will
guarantee that (not only $|x_0|$ but also) $|x_0 + h|$ will be less than r: in
consequence of which, the first round of estimates in STEP 3 will work.
6. So now the second round of estimates in STEP 3 gives
\[ |E_3(h)| \le \sum_{N+1}^{\infty} |a_k| \times k r^{k-1} \]
from which (2) gives
\[ |E_3(h)| < \frac{\varepsilon}{3}. \]
Define $\delta = \min\{\delta_1, \delta_2\} > 0$. Then, provided only that 0 < |h| < δ, we get
\[ |E(h)| = |E_1(h) + E_2 + E_3(h)| \le |E_1(h)| + |E_2| + |E_3(h)| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon. \]
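A numerical illustration may help to make the statement concrete. The sketch below assumes the geometric series 1 + x + x² + ··· (radius of convergence 1) as a test case; the function names `f` and `g`, the truncation at 200 terms and the sample point 0.3 are all our own choices, not part of the proof. It evaluates E(h) for shrinking h and shows it heading towards zero.

```python
def f(x, terms=200):
    """Partial sum of the power series 1 + x + x^2 + ... (all coefficients equal to 1)."""
    return sum(x**n for n in range(terms))

def g(x, terms=200):
    """Partial sum of the derived series 1 + 2x + 3x^2 + ..."""
    return sum(n * x**(n - 1) for n in range(1, terms))

x0 = 0.3  # a typical point inside (-1, 1)
for h in (1e-2, 1e-3, 1e-4):
    E = (f(x0 + h) - f(x0)) / h - g(x0)   # the expression E(h) from STEP 1
    print(h, E)
```

Here f(x) is (up to a negligible truncation error) 1/(1 − x) and g(x) is 1/(1 − x)², so the shrinking values of E(h) agree with what ordinary differentiation would predict.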
16.4.2 Comments Why did we select a number r for which |x0 | < r < D?
• The superficial reason is that bad things can happen to power series at ±D, that
is, at the endpoints of the interval whose ‘radius’ is the radius of convergence of
the series. By selecting such an r and then only working inside [−r, r], we were
keeping these dangers at a small but safe distance away from our calculations.
• More precisely, the estimates taking place in STEP 3 were all rounded up, so to
speak, to the behaviour of the derived power series at the single point r. If the
power series had not converged at r, these estimates would have collapsed.
There was no guarantee that the derived series would converge at D itself: we
needed a point less than D upon which to hang these estimates.
• Furthermore, that same rounding-up process by which we estimated our
various power series by the absolutely convergent derived series at r shows that
they are all absolutely convergent also: and this we needed in order to be able to
rearrange them, which we did extensively in breaking up E(h) into
individually-handled fragments. Such a radical reorganisation of the terms of
those infinite series would have been illegal without a guarantee of their
absolute convergence.
17 Riemann integration —
area under a graph
17.1 Introduction
The words integrate, integral, integration occur very often in later school
mathematics with what seem to be, at first sight, two entirely different meanings.
Let us begin by reviewing the sorts of pre-university questions in this area that you
have certainly encountered many times before.
17.1.1 Example A To find an (indefinite) integral of the function f(x) = x cos x
(with respect to x).
Interpretation
This means: find, by any means whatsoever (not excluding trial and error) another
function g(x) whose derivative $g'(x)$ is exactly the given f(x). (This is sometimes
described informally as un-differentiation of f(x).)
Solution
You probably know tricks such as ‘integration by parts’ for tackling this sort of
question (and if so, just use them), but don’t undervalue trial and error either.
The presence of cos suggests that sin ought to be part of the answer, so x sin x is a
reasonable first guess. Differentiate (via the product rule) and see if we are close:
\[ \frac{d}{dx}\{x \sin x\} = x(\sin x)' + (\sin x)(x)' = x\cos x + \sin x. \]
Not bad, but we need to get rid of that last sin x. Of course, the derivative of cos x
is −sin x which would cancel it out, so . . . a second guess:
\[ \frac{d}{dx}\{x \sin x + \cos x\} = x(\sin x)' + (\sin x)(x)' - \sin x = x\cos x + \sin x - \sin x = x\cos x. \]
Success. An answer is g(x) = x sin x + cos x. Indeed, for any constant C that you
care to select, another answer is g(x) = x sin x + cos x + C since added-in constants
disappear under differentiation.
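Trial and error of this kind is easy to support numerically. The following Python sketch compares a symmetric difference quotient of the guessed g with f at a handful of points; the helper name `central_difference`, the step size and the sample points are illustrative assumptions of ours.

```python
import math

def f(x):
    return x * math.cos(x)

def g(x):
    return x * math.sin(x) + math.cos(x)

def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for the derivative."""
    return (func(x + h) - func(x - h)) / (2 * h)

for x in (-2.0, 0.0, 0.5, 3.0):
    # the last two columns should agree to many decimal places
    print(x, central_difference(g, x), f(x))
```

Agreement at sample points is evidence rather than proof, of course; the product-rule calculation above is what settles the matter.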
17.1.2 Example B To find the (definite) integral from x = 1 to x = 6 of the
function f described by
\[ f(x) = x - 1 \ \text{ while } 1 \le x \le 3; \qquad f(x) = 2 \ \text{ while } 3 < x \le 6. \]
Interpretation
This means: find the area of the region of the coordinate plane lying under the
graph of f , above the horizontal axis, and between the vertical lines x = 1 and x = 6.
Solution
Even a very rough sketch graph will clarify what this region is: it consists of a right-
angled triangle with vertices at (1, 0), (3, 0), (3, 2) and a rectangle with vertices at
(3, 0), (6, 0), (6, 2), (3, 2). Primary school mathematics is perfectly good enough to
identify its area as 2 + 6 = 8.
(Incidentally, the terms definite and indefinite are often left out, and so may be
the phrase with respect to x provided that there are no other variables in play.)
Now for an exercise that brings the two ideas together:
17.1.3 Example C To determine the area of the region lying under the graph of
f(x) = x cos x, above the x-axis and between x = 0 and x = π/4.
Procedure
• First, find an indefinite integral g of f. We’ll lift our earlier answer from
Example A: g(x) = x sin x + cos x.
• Second, calculate the change in value of g from x = 0 to x = π/4; this is often
denoted by $[g(x)]_{x=0}^{x=\pi/4}$:
\[ [g(x)]_{x=0}^{x=\pi/4} = g\!\left(\frac{\pi}{4}\right) - g(0) = \frac{\pi}{4}\sin\frac{\pi}{4} + \cos\frac{\pi}{4} - 0\sin(0) - \cos(0) = \frac{\pi\sqrt{2} + 4\sqrt{2} - 8}{8}. \]
• Answer: the area is
\[ \frac{\pi\sqrt{2} + 4\sqrt{2} - 8}{8}. \]
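One way to gain some confidence in the procedure before it is justified is to compare its answer with a direct rectangle-based estimate of the area. The sketch below does this in Python; the helper `midpoint_sum` and the choice of 100,000 rectangles are our own assumptions, used only for illustration.

```python
import math

def f(x):
    return x * math.cos(x)

def midpoint_sum(func, a, b, n=100_000):
    """Total area of n equal-width rectangles whose heights are sampled at midpoints."""
    width = (b - a) / n
    return sum(func(a + (i + 0.5) * width) for i in range(n)) * width

exact = (math.pi * math.sqrt(2) + 4 * math.sqrt(2) - 8) / 8
print(exact, midpoint_sum(f, 0.0, math.pi / 4))
```

The two printed values agree to many decimal places, which is consistent with the claimed area although it does not explain why the procedure works.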
17.1.4 Remarks Why we should expect that procedure to give the right answer
is very hard to explain in a school classroom, no matter how easy to apply and
immediately useful the result itself is. Part of our brief in this chapter is to present
evidence of its reliability. Before that, however, there are several other issues to
address. What exactly does ‘area under a graph’ mean? Does it have a logically
watertight meaning – a definition that agrees with common sense but does not
depend upon it – for the graph of every function? If not, for which ones? Can
we expect to be able to un-differentiate any given formula? How, if at all, can we
proceed with a function if it is not possible to un-differentiate it? Questions such
as these are normally side-stepped in school mathematics; over the next dozen or
so pages you will see why.
17.2 Riemann integrability — how closely can

rectangles approximate areas under graphs?
Throughout the next several paragraphs, f : [a, b] → R will be a given bounded
function on a closed bounded interval. We imagine the graph of a typical such
function:
[Figure: graph of f(x) over the interval [a, b]]
17.2.1 Definition: partition
1. A partition Δ of [a, b] is a finite set of elements in [a, b] that includes both the
endpoints a and b. Since Δ is finite, we can always label its elements as
$x_0, x_1, x_2, \cdots, x_n$ in such a way that
\[ a = x_0 < x_1 < x_2 < \cdots < x_{n-1} < x_n = b \]
and by default we shall always assume that this has been done. The effect of the
partition is to divide up [a, b] into a list of subintervals
\[ [x_0, x_1], [x_1, x_2], [x_2, x_3], \cdots, [x_k, x_{k+1}], \cdots, [x_{n-1}, x_n] \]
that overlap only at the shared endpoints.
2. If two partitions $\Delta_1$, $\Delta_2$ (of the same interval [a, b]) satisfy $\Delta_1 \subseteq \Delta_2$, we call
$\Delta_2$ a refinement of $\Delta_1$, and say that $\Delta_2$ is finer than $\Delta_1$. Since both of these
sets are finite, it is always legitimate to regard a refinement of $\Delta_1$ as having
been created by adding in the extra points one at a time, breaking up just one
of the subintervals into two sub-subintervals each time.
3. Of two arbitrary partitions $\Delta_1$, $\Delta_2$ it is by no means guaranteed that one of
them is a refinement of the other. For instance, consider the partitions
{0, 0.5, 0.75, 1} and {0, 0.25, 0.5, 1} of [0, 1].
4. If $\Delta_1$, $\Delta_2$ are any two partitions of [a, b] then their union $\Delta_3 = \Delta_1 \cup \Delta_2$ is a
partition also, and it is a refinement of each of them.
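Since partitions are just finite sets, these notions translate directly into set operations. The short Python sketch below (with hypothetical helper names `is_partition` and `is_refinement`) replays the example from item 3 and the union construction from item 4.

```python
def is_partition(points, a, b):
    """A partition of [a, b]: a finite set of points of [a, b] containing both a and b."""
    return a in points and b in points and all(a <= p <= b for p in points)

def is_refinement(finer, coarser):
    """`finer` is a refinement of `coarser` when it contains every point of `coarser`."""
    return coarser <= finer   # subset test on Python sets

d1 = {0, 0.5, 0.75, 1}
d2 = {0, 0.25, 0.5, 1}
d3 = d1 | d2                  # the union of the two partitions

print(is_refinement(d1, d2), is_refinement(d2, d1))   # neither refines the other
print(sorted(d3), is_refinement(d3, d1), is_refinement(d3, d2))
```

Neither of the two given partitions refines the other, but their union refines both.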
17.2.2 Definition: upper and lower Riemann sums
1. Given a partition $\Delta = \{a = x_0 < x_1 < x_2 < \cdots < x_{n-1} < x_n = b\}$ of [a, b],
since f is bounded on the whole of [a, b], it is certainly bounded on each of the
subintervals $[x_0, x_1], [x_1, x_2], [x_2, x_3], \cdots, [x_k, x_{k+1}], \cdots, [x_{n-1}, x_n]$ and
possesses an infimum and a supremum on each of them. Put
\[ m_k = \inf\{f(x) : x \in [x_k, x_{k+1}]\}, \qquad M_k = \sup\{f(x) : x \in [x_k, x_{k+1}]\} \]
for each k = 0, 1, 2, · · · , n − 1.
2. Notice that if f happens to have a smallest and/or a biggest value over
[xk , xk+1 ] then mk and/or Mk will be these values. In particular, this will
happen in cases where f is continuous and in cases where f is either increasing
or decreasing, and when it happens it simplifies the rest of the argument.
However, it fails to happen with many of the functions that we need to
consider.
[Figure: the sup and inf of f over a partition subinterval $[x_k, x_{k+1}]$, with $M_k$ and $m_k$ marked]
3. The lower Riemann sum and the upper Riemann sum (for this function, this
interval and this partition) are
\[ L(f, [a,b], \Delta) = \sum_{k=0}^{n-1} m_k (x_{k+1} - x_k); \qquad U(f, [a,b], \Delta) = \sum_{k=0}^{n-1} M_k (x_{k+1} - x_k). \]
Notice that $L(f, [a,b], \Delta) \le U(f, [a,b], \Delta)$ just because $m_k \le M_k$ for all
relevant k.
[Figure: a lower Riemann sum for f over [a, b]]
[Figure: an upper Riemann sum for f over [a, b]]
These sums have natural geometric interpretations (at least, in the case where
f (x) > 0 always) as the area of the largest ‘histogram’ figure (placed upon the
subintervals of the partition) that definitely fits under the graph, and the area
of the smallest ‘histogram’ figure (placed upon the subintervals of the
partition) that definitely fits over the graph. At this point, common sense
suggests that, whatever we eventually define the area under the graph to mean,
it should lie somewhere between these under- and overestimates.
4. In any extended argument in which f or [a, b] doesn’t change, it is normal
practice to stop labelling the lower and upper sums with them: so that
$L([a,b], \Delta)$, $U([a,b], \Delta)$ or $L(f, \Delta)$, $U(f, \Delta)$ or $L(\Delta)$, $U(\Delta)$
become legitimate abbreviations. Likewise, in a discussion involving several
functions but only one interval and only one partition, shorthand symbols
such as
L(f), L(g), L(h), · · · U(f), U(g), U(h) · · ·
are perfectly acceptable.
5. Informally, our overall position is this. There may be (indeed, there are) both
practical and philosophical difficulties involved in being clear about the area of
a region that has one or more curved or discontinuous edges . . . but there
certainly aren’t any about rectangles, nor about unions of rectangles that don’t
overlap except along their edges. The ‘histogram’ trick allows us to find such a
figure that approximates the region (whose area we would like to define) from
beneath, and another that does so from above, using the underlying partition
as a control of how fine the approximation is. As we move through successively
finer and finer underlying partitions, we can expect that the approximations
ought to get better and better (based on the rather imprecise insight that
opting for narrower and narrower rectangles should allow us to reduce the size
of the errors). This suggests that the area could be defined as the limit of one,
or perhaps both, of these streams of approximations as the partition tends
to . . . and there is the difficulty: the partition is not an integer (so we are
not grappling with the limit of a sequence of approximations) nor is it a real
number that we can allow to tend to infinity and calculate the limit of an
approximation function A(x) as x → ∞. Though this argument clearly points
to a limit of some kind, it is not a type of limit that we have met previously,
and we must take care not to assume that it works in precisely the same way
as our experience of sequence and real-function limits says it should.
17.2.3 Lemma If we add one extra point to a partition, it does not decrease the
lower Riemann sum nor increase the upper Riemann sum. That is, if $\Delta^+$ is Δ
together with one extra point y where, say, $x_k < y < x_{k+1}$, then
\[ L(f, [a,b], \Delta) \le L(f, [a,b], \Delta^+) \quad\text{and}\quad U(f, [a,b], \Delta) \ge U(f, [a,b], \Delta^+). \]
Proof
The change from $L(f, [a,b], \Delta)$ to $L(f, [a,b], \Delta^+)$ only alters the individual summand
$m_k(x_{k+1} - x_k)$ to $m'(y - x_k) + m''(x_{k+1} - y)$ where $m'$, $m''$ are the infima of f
on the sub-subintervals $[x_k, y]$ and $[y, x_{k+1}]$. Since, clearly, $m' \ge m_k$ and $m'' \ge m_k$,
this is a nett increase or, more precisely, cannot be a nett decrease.
A similar analysis deals with the upper sum.
17.2.4 Improved lemma If $\Delta'$ is a refinement of Δ, then
\[ L(f, [a,b], \Delta) \le L(f, [a,b], \Delta') \quad\text{and}\quad U(f, [a,b], \Delta) \ge U(f, [a,b], \Delta'). \]
Proof
We can evolve from Δ to $\Delta'$ in stages, adding one new point at a time. At each
stage, the first lemma tells us that the lower sum increases (or stays still) and that
the upper sum decreases (or stays still). Hence the final result.
17.2.5 Proposition Any lower sum is ≤ any upper sum.
Proof
What this says is that if $\Delta_1$ and $\Delta_2$ are any partitions at all of [a, b], then
$L(\Delta_1) \le U(\Delta_2)$. The proof consists of recalling that $\Delta_3 = \Delta_1 \cup \Delta_2$ is a refinement
of each of the given partitions, and then applying the Improved Lemma:
\[ L(\Delta_1) \le L(\Delta_3) \le U(\Delta_3) \le U(\Delta_2). \]
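The chain of inequalities can be watched in action on a concrete case. The Python sketch below assumes an increasing function (so that the infimum and supremum on each subinterval are simply the values at its left and right endpoints), takes f(x) = x² on [0, 1] as an arbitrary test function, and uses exact fractions; the helper name `lower_upper` is ours.

```python
from fractions import Fraction as F

def lower_upper(f, partition):
    """Lower and upper sums of an INCREASING function f over a partition (a set of points)."""
    xs = sorted(partition)
    pairs = list(zip(xs, xs[1:]))   # consecutive subintervals [left, right]
    lower = sum(f(left) * (right - left) for left, right in pairs)
    upper = sum(f(right) * (right - left) for left, right in pairs)
    return lower, upper

def square(x):
    return x * x

d1 = {F(0), F(1, 2), F(3, 4), F(1)}
d2 = {F(0), F(1, 4), F(1, 2), F(1)}
d3 = d1 | d2                        # common refinement of d1 and d2

(l1, u1), (l2, u2), (l3, u3) = (lower_upper(square, d) for d in (d1, d2, d3))
print(l1, l3, u3, u2)               # L(d1) <= L(d3) <= U(d3) <= U(d2)
```

The restriction to increasing functions is only a convenience for computing infima and suprema exactly; the proposition itself needs no such assumption.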
17.2.6 Definition
1. Let us put $A = \{L(f, [a,b], \Delta) : \text{all partitions } \Delta \text{ of } [a,b]\}$. From the
proposition, the non-empty set A of real numbers is bounded above by any
$U(f, [a,b], \Delta)$ and therefore has a supremum, a least upper bound
(write it temporarily as $J^-$) such that $J^- \le U(f, [a,b], \Delta)$ for every
partition Δ.
In turn, the set $B = \{U(f, [a,b], \Delta) : \text{all partitions } \Delta \text{ of } [a,b]\}$ is a
non-empty set of real numbers bounded below by $J^-$, so it has an infimum, a
greatest lower bound $J^+$ such that $J^- \le J^+$.
2. The numbers $J^-$ and $J^+$ are called the lower Riemann integral and the
upper Riemann integral of f over [a, b], and are more usually denoted by
$\underline{\int} f$ and $\overline{\int} f$.
3. What we know so far is that, for any partition Δ:
\[ L(f, [a,b], \Delta) \le \underline{\int} f \le \overline{\int} f \le U(f, [a,b], \Delta). \]
4. We call f Riemann integrable over [a, b] if $\underline{\int} f = \overline{\int} f$, and in that case, their
common value is called the Riemann integral of f over [a, b] and written as $\int f$.
Sometimes, a more elaborated symbol such as $\int_a^b f$ or $R\!\int f$ or $\int_a^b f(x)\,dx$ will
be employed if we feel a need to stress which interval we are operating over, or
which procedure (for there are others beyond Riemann’s) we are using, or
which variable is associated with the horizontal axis.
5. Since Riemann’s is the only integration procedure (apart from naïve
un-differentiation) being discussed in this chapter, we shall feel free to
abbreviate Riemann integrable to integrable, Riemann integral to integral, and
upper or lower Riemann sum to upper or lower sum when it makes the text
easier to read.
We have by now achieved a logically reliable definition of what the integral of a

bounded function over a closed bounded interval is, and when it exists, that does
not depend on intuiting areas. Unfortunately, this definition is cumbersome and
time-consuming to use, even for quite simple functions.
17.2.7 Example To show that a constant function is Riemann integrable over any
closed bounded interval, and to evaluate the integral.
Solution
We consider f : [a, b] → R given by f(x) = C, x ∈ [a, b] where C is a constant. In
this case, for any partition
\[ \Delta = \{a = x_0 < x_1 < x_2 < \cdots < x_{n-1} < x_n = b\}, \]
we see that $m_k = M_k = C$ for every k, so
\[ L(f, [a,b], \Delta) = \sum_{0}^{n-1} C(x_{k+1} - x_k) = C \sum_{0}^{n-1} (x_{k+1} - x_k) = C(b-a) \]
and, likewise, $U(f, [a,b], \Delta) = C(b-a)$. The sets we denoted by A and B in the
definition paragraph above each consists of the single number C(b − a), so it is
entirely trivial to determine inf and sup for them: $\underline{\int} f = C(b-a)$ and $\overline{\int} f = C(b-a)$.
Therefore f is integrable, and its integral over [a, b] is C(b − a).
17.2.8 Example Given b > 0, and f defined by f(x) = x on the interval [0, b], to
show that $R\!\int f$ exists and equals $\frac{b^2}{2}$.
Solution
For any partition
\[ \Delta = \{0 = x_0 < x_1 < x_2 < \cdots < x_{n-1} < x_n = b\} \]
of [0, b], the fact that f is increasing tells us that, on the typical subinterval
$[x_k, x_{k+1}]$, $f(x_k) = x_k$ is the least value of f and $f(x_{k+1}) = x_{k+1}$ is the greatest value
of f: that is, $m_k = x_k$ and $M_k = x_{k+1}$. Thus, typical lower and upper sums take
the form
\[ \sum_{0}^{n-1} x_k (x_{k+1} - x_k), \qquad \sum_{0}^{n-1} x_{k+1} (x_{k+1} - x_k). \]
Consider now the special-case partition $\Delta_n$ all of whose subintervals have the
same length b/n (where n is a particular positive integer). In this case, $x_k$ is simply
kb/n for each k, so the lower and upper sums become much more accessible to
calculation:
\[ L(f, [0,b], \Delta_n) = \sum_{0}^{n-1} x_k (x_{k+1} - x_k) = \sum_{0}^{n-1} (kb/n)\bigl((k+1)b/n - kb/n\bigr) \]
\[ = \sum_{0}^{n-1} (kb/n)(b/n) = (b^2/n^2) \sum_{0}^{n-1} k = (b^2/n^2)\,\frac{n(n-1)}{2} \]
\[ = b^2\,\frac{n-1}{2n} = \frac{b^2}{2}\left(1 - \frac{1}{n}\right) \]
and this is an increasing sequence with limit (and therefore supremum) $\frac{b^2}{2}$. Of
course, the $\Delta_n$s are only some of the possible partitions, so it is imaginable that the
supremum of all their lower sums might be different from the supremum of this
sample. Yet it certainly cannot be smaller (if $\emptyset \ne A_1 \subseteq A_2$ where $A_2$ is a
bounded-above set of real numbers, then $\sup A_1 \le \sup A_2$): so we learn that
$\underline{\int} f \ge \frac{b^2}{2}$.

Likewise for the upper sums:
\[ U(f, [0,b], \Delta_n) = \sum_{0}^{n-1} x_{k+1} (x_{k+1} - x_k) = \sum_{0}^{n-1} ((k+1)b/n)\bigl((k+1)b/n - kb/n\bigr) \]
\[ = \sum_{0}^{n-1} ((k+1)b/n)(b/n) = (b^2/n^2) \sum_{0}^{n-1} (k+1) = (b^2/n^2)\,\frac{n(n+1)}{2} \]
\[ = \frac{b^2}{2}\cdot\frac{n+1}{n} = \frac{b^2}{2}\left(1 + \frac{1}{n}\right) \]
which is a decreasing sequence with limit (and therefore infimum) $\frac{b^2}{2}$. The
infimum of all the upper sums might conceivably differ from the infimum of this
special sample, but it cannot be greater: therefore $\overline{\int} f \le \frac{b^2}{2}$.
Bearing in mind that $\underline{\int} f \le \overline{\int} f$ in all cases, we now deduce that
$\underline{\int} f = \overline{\int} f = \frac{b^2}{2}$, as expected.
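The closed forms obtained for these special partitions can be confirmed by brute force. The following Python sketch (exact rational arithmetic; the helper name `uniform_sums` and the choice b = 3 are illustrative assumptions) computes the lower and upper sums of f(x) = x directly from their definitions.

```python
from fractions import Fraction as F

def uniform_sums(b, n):
    """Lower and upper sums of f(x) = x on [0, b] over n subintervals of equal length."""
    xs = [F(k) * b / n for k in range(n + 1)]   # partition points k*b/n
    lower = sum(xs[k] * (xs[k + 1] - xs[k]) for k in range(n))      # m_k = x_k
    upper = sum(xs[k + 1] * (xs[k + 1] - xs[k]) for k in range(n))  # M_k = x_{k+1}
    return lower, upper

b = F(3)
for n in (1, 10, 100, 1000):
    lower, upper = uniform_sums(b, n)
    print(n, float(lower), float(upper))
```

As n grows, the two columns close in on b²/2 from below and from above respectively.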
17.2.9 Remark It took us a full typed page of calculations to establish the integral
of the absurdly simple function f (x) = x. Your rational, entirely legitimate response
to that observation should be one of bitter disappointment, combined with a
determination to get access to better methods as soon as possible. The first step
in that direction is the following lemma, strongly reminiscent of the Cauchy
condition’s ability to detect the existence of a limit (for a sequence) without any
need to know what number that limit might turn out to be.
17.2.10 Darboux’s (or Riemann’s) integrability criterion The function f is Riemann
integrable over [a, b] if and only if, for each ε > 0, there is a partition $\Delta_\varepsilon$ of
[a, b] for which
\[ U(f, [a,b], \Delta_\varepsilon) - L(f, [a,b], \Delta_\varepsilon) < \varepsilon. \]
Proof
If, for each ε > 0, such a partition exists, then our inequality
\[ L(f, [a,b], \Delta) \le \underline{\int} f \le \overline{\int} f \le U(f, [a,b], \Delta) \quad\text{for all partitions } \Delta \]
says that $\underline{\int} f$ and $\overline{\int} f$ differ, if at all, by less than ε. Since the lower and upper
integrals do not depend on ε, that can only happen if they are equal. Hence f is
integrable.
Conversely, suppose that f is integrable. Then $J = \underline{\int} f = \overline{\int} f$ is both the supremum
of the lower sums and the infimum of the upper sums. Given ε > 0, it is therefore
possible to find a partition $\Delta_1$ with $L(\Delta_1) > J - \frac{\varepsilon}{2}$, and a partition $\Delta_2$ with
$U(\Delta_2) < J + \frac{\varepsilon}{2}$. Put $\Delta_3 = \Delta_1 \cup \Delta_2$:
\[ J - \frac{\varepsilon}{2} < L(\Delta_1) \le L(\Delta_3) \le U(\Delta_3) \le U(\Delta_2) < J + \frac{\varepsilon}{2}. \]
Therefore the gap between $L(\Delta_3)$ and $U(\Delta_3)$ is smaller than ε, and the Darboux
criterion holds.
[Figure: Darboux criterion; the region whose area is U(f, Δ) − L(f, Δ) is shaded]
Although the criterion seems to concern only the existence of the integral of f ,
its numerical value quite often emerges from the same calculations. Note that:
17.2.11 Corollary If f is Riemann integrable over [a, b], then its integral is the
unique number J such that
\[ L(f, [a,b], \Delta) \le J \le U(f, [a,b], \Delta) \quad\text{for all partitions } \Delta. \]
Proof
The integral certainly does lie between every lower sum and every upper sum, by its
definition. If it were not unique in this respect, there would have to be two distinct
numbers $J_1$ and $J_2$ with
\[ L(f, [a,b], \Delta) \le J_1 < J_2 \le U(f, [a,b], \Delta) \quad\text{for all partitions } \Delta. \]
Yet this implies $U(\Delta) - L(\Delta) \ge \varepsilon = J_2 - J_1 > 0$ for every partition Δ, in
contradiction to the Darboux criterion.
As a first indication of the usefulness of the Darboux test, here is a worked
example of a function that is not Riemann integrable. Given how labour intensive
it seems to be to show that a simple function is integrable, you may be surprised

(or even irritated) to see how straightforward this is.
17.2.12 Example We define a function f : [0, 5] → R by f(x) = 2 if x is rational,
f(x) = 3 if x is irrational. Show that f is not Riemann integrable over [0, 5].
Solution
Let Δ be absolutely any partition of [0, 5]. Since, in every interval, there are both
rationals and irrationals, f will take both the value 2 and the value 3 somewhere
in each of the subintervals $[x_k, x_{k+1}]$ into which Δ carves up [0, 5]. So (for each k)
$m_k = 2$ and $M_k = 3$. Hence²
\[ L(f, \Delta) = \sum 2(x_{k+1} - x_k) = 2 \sum (x_{k+1} - x_k) = 2(5) = 10. \]
Likewise, $U(f, \Delta) = 15$. So the supremum of ‘all’ the lower sums is just the sup of
the one single number 10, namely 10. Likewise, the infimum of ‘all’ the upper sums
is 15.
We conclude that no partition can force $U(f, \Delta) - L(f, \Delta)$ to be smaller than
15 − 10 = 5, so Darboux alerts us that this function is not integrable. (Alternatively:
we have just shown that $\underline{\int} f = 10 \ne 15 = \overline{\int} f$, so $\int f$ does not exist via the definition.)
Clearly, the choice of the numbers 0, 5, 2 and 3 has no real bearing on the
outcome: a function that takes a constant value on the rationals and a different
constant value on the irrationals is not integrable over any non-degenerate interval.
Here is another worked example illustrating how we can use Darboux plus its
corollary (17.2.11) both to guarantee the existence of a Riemann integral and to
determine its numerical value.
17.2.13 Example Determine the Riemann integral of the function f : [1, 4] → R
described by
\[ f(x) = \begin{cases} 0 & \text{if } 1 \le x < 2, \\ 7 & \text{if } x = 2, \\ 0 & \text{if } 2 < x \le 4. \end{cases} \]
Solution
For any positive h < 1 let us consider the partition $\Delta_h = \{1, 2-h, 2+h, 4\}$ (whose
intention is to isolate the somewhat anomalous value x = 2 of the domain). We see
that $L(f, \Delta_h) = 0$ and that $U(f, \Delta_h) = 14h$.
Firstly, given ε > 0, if we choose h to be, say, $\min\{\frac{1}{2}, \frac{\varepsilon}{15}\}$, then $L(f, \Delta_h)$ and
$U(f, \Delta_h)$ differ by less than ε, so Darboux guarantees that the Riemann integral
$\int_1^4 f$ exists.

2 Note that $\sum (x_{k+1} - x_k)$ is always the total length of all the subintervals, that is, the length of
the whole interval in question.
Secondly, now that the integral is known to exist, Corollary 17.2.11 observes that
it is the unique number that lies between $L(f, \Delta)$ and $U(f, \Delta)$ for every partition
Δ. Since 0 is the only number lying between $L(f, \Delta_h) = 0$ and $U(f, \Delta_h) = 14h$
(for every h between 0 and 1) even for the particular partitions $\Delta_h$ that we have
examined, $\int_1^4 f$ can only be zero.
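The choice of h in the first part of the solution can be checked mechanically. In the Python sketch below, the suprema and infima on the three subintervals are entered by hand from the definition of f (they are not computed), the helper name `upper_lower_spike` is our own, and ε = 1/100 is an arbitrary sample value.

```python
from fractions import Fraction as F

def upper_lower_spike(h):
    """Upper and lower sums of the example's function over {1, 2 - h, 2 + h, 4}, 0 < h < 1."""
    points = [F(1), 2 - h, 2 + h, F(4)]
    sups = [0, 7, 0]   # sup of f on each subinterval; only the middle one contains x = 2
    infs = [0, 0, 0]   # inf of f on each subinterval
    widths = [right - left for left, right in zip(points, points[1:])]
    upper = sum(s * w for s, w in zip(sups, widths))
    lower = sum(i * w for i, w in zip(infs, widths))
    return upper, lower

epsilon = F(1, 100)
h = min(F(1, 2), epsilon / 15)
upper, lower = upper_lower_spike(h)
print(upper, lower, upper - lower < epsilon)
```

With this h the gap is 14h, which is at most 14ε/15 and therefore strictly less than ε.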
17.3 The integral theorems we ought to expect

As a second (and much larger) illustration of how we can use Darboux’s criterion,
here are a few of the several integration theorems that your pre-university experi-
ence probably leads you to expect to be true for the Riemann integral:
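17.3.1 Theorem: integral addition over back–to–back intervals If f is (Riemann)
integrable over [a, b] and over [b, c], where a < b < c, then f is integrable
over [a, c], and
\[ \int_a^c f = \int_a^b f + \int_b^c f. \]
Proof
Firstly, given ε > 0, use Darboux to find a partition $\Delta_1$ of [a, b] such that
\[ U([a,b], \Delta_1) - L([a,b], \Delta_1) < \frac{\varepsilon}{2}. \]
Likewise, use Darboux again to find a partition $\Delta_2$ of [b, c] such that
\[ U([b,c], \Delta_2) - L([b,c], \Delta_2) < \frac{\varepsilon}{2}. \]
Now $\Delta_3 = \Delta_1 \cup \Delta_2$ is a partition of [a, c] and, purely by the definitions,
\[ L([a,c], \Delta_3) = L([a,b], \Delta_1) + L([b,c], \Delta_2) \]
and
\[ U([a,c], \Delta_3) = U([a,b], \Delta_1) + U([b,c], \Delta_2). \]
It follows that
\[ U([a,c], \Delta_3) - L([a,c], \Delta_3) = U([a,b], \Delta_1) - L([a,b], \Delta_1) + U([b,c], \Delta_2) - L([b,c], \Delta_2) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
At this point, the Darboux test tells us that $\int_a^c f$ exists.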
Secondly, we know that
\[ L([a,b], \Delta_1) \le \int_a^b f \le U([a,b], \Delta_1), \qquad L([b,c], \Delta_2) \le \int_b^c f \le U([b,c], \Delta_2). \]
Adding these, we get
\[ L([a,b], \Delta_1) + L([b,c], \Delta_2) \le \int_a^b f + \int_b^c f \le U([a,b], \Delta_1) + U([b,c], \Delta_2) \]
that is, in the light of our comments above,
\[ L([a,c], \Delta_3) \le \int_a^b f + \int_b^c f \le U([a,c], \Delta_3). \]
Also, of course,
\[ L([a,c], \Delta_3) \le \int_a^c f \le U([a,c], \Delta_3). \]
Comparing these two displays, we see that $\int_a^b f + \int_b^c f$ and $\int_a^c f$ differ, if at all, by no
more than $L([a,c], \Delta_3)$ and $U([a,c], \Delta_3)$ differ: that is, by less than ε. Since they
are independent of ε, which is arbitrary – and could therefore be made arbitrarily
small – that can be true only if they are exactly equal.
It is very convenient to be able to jettison the requirement a < b < c from this
result, and a simple notational convention allows this to happen:
17.3.2 Definition We define (for arbitrary a ∈ R)
\[ \int_a^a f = 0 \]
and, whenever b < a, we define
\[ \int_a^b f = -\int_b^a f. \]
The effect of this (rather artificial seeming) convention is that the equality
\[ \int_a^c f = \int_a^b f + \int_b^c f \]
now becomes true no matter how the three numbers a, b, c are arranged on the
real line (always provided, of course, that the two integrals on the right-hand side
do exist). For instance, if a = c, the equality merely says that $0 = \int_a^b + \int_b^a$, that
is, that $\int_a^b = -\int_b^a$, which is correct by the convention. Again, if a < c < b then
the equality $\int_a^c = \int_a^b + \int_b^c$ decodes under the convention as $\int_a^c = \int_a^b - \int_c^b$, that
is, $\int_a^b = \int_a^c + \int_c^b$ which is the originally established version of the theorem when
the limits of integration occur in that order.
There is also a valid converse to 17.3.1: if a < b < c and f is integrable over [a, c],
then it is necessarily integrable also over [a, b] and over [b, c]. In fact, with really
no additional work we can prove something slightly more general:
17.3.3 Theorem: integrability over a subinterval Suppose that f is integrable
over [a, b] and that a ≤ c < d ≤ b; then f is integrable over [c, d].
Proof
Given ε > 0, first use Darboux to find a partition $\Delta$ of [a, b] for which
$U(f, [a, b], \Delta) - L(f, [a, b], \Delta) < \varepsilon$. Of course, it is perfectly possible that $\Delta$ will
not include the points c and d . . . but if we refine $\Delta$ by adding them in, the lower
sum will increase or stay still, and the upper sum will decrease or stay still: so, after
the refinement (if necessary) the gap between U(. . .) and L(. . .) will still be less
than ε. For that reason, we may as well assume that this has been done already,
and that $\Delta$ does include both c and d.
With that understanding, $\Delta' = \Delta \cap [c, d]$ is now a partition of [c, d], and
$U(f, [c, d], \Delta') - L(f, [c, d], \Delta')$ is just the sum of those expressions
\[ (M_k - m_k)(x_{k+1} - x_k) \]
for which the subinterval $[x_k, x_{k+1}]$ happens to lie within [c, d].
[Figure: Darboux: over subinterval [c, d] and over whole interval [a, b]]
It is therefore ≤ the total of all such expressions across the whole of [a, b]; that is:
\[ U(f, [c, d], \Delta') - L(f, [c, d], \Delta') \le U(f, [a, b], \Delta) - L(f, [a, b], \Delta) < \varepsilon. \]
By Darboux, f is integrable over [c, d].
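17.3.4 Theorem: integral addition of functions If f and g are each (Riemann)
integrable over [a, b] then so is f + g, and
\[ \int_a^b (f + g) = \int_a^b f + \int_a^b g. \]
Proof
Given ε > 0 we use Darboux (twice) to find partitions $\Delta_1$, $\Delta_2$ of [a, b] such that
\[ U(f, \Delta_1) - L(f, \Delta_1) < \frac{\varepsilon}{2}, \qquad U(g, \Delta_2) - L(g, \Delta_2) < \frac{\varepsilon}{2}. \]
Putting $\Delta_3 = \Delta_1 \cup \Delta_2$, it follows (because refining a partition never widens
the gap between the upper and lower sums) that
\[ U(f, \Delta_3) - L(f, \Delta_3) < \frac{\varepsilon}{2}, \qquad U(g, \Delta_3) - L(g, \Delta_3) < \frac{\varepsilon}{2}. \]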
Thinking back to the definitions of $m_k$ and $M_k$, and necessarily now enhancing that
notation to refer explicitly to the function it concerns, we have
\[
\begin{aligned}
m_k(f + g) &= \inf\{f(x) + g(x) : x_k \le x \le x_{k+1}\} \\
  &\ge \inf\{f(x) : x_k \le x \le x_{k+1}\} + \inf\{g(x) : x_k \le x \le x_{k+1}\} \\
  &= m_k(f) + m_k(g)
\end{aligned}
\]
for each k = 0, 1, 2, · · · , n − 1 and, likewise
\[ M_k(f + g) \le M_k(f) + M_k(g). \]
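Therefore
\[
\begin{aligned}
L(f + g, \Delta_3) &= \sum_{k=0}^{n-1} m_k(f + g)(x_{k+1} - x_k) \\
  &\ge \sum_{k=0}^{n-1} (m_k(f) + m_k(g))(x_{k+1} - x_k) \\
  &= \sum_{k=0}^{n-1} m_k(f)(x_{k+1} - x_k) + \sum_{k=0}^{n-1} m_k(g)(x_{k+1} - x_k) \\
  &= L(f, \Delta_3) + L(g, \Delta_3),
\end{aligned}
\]
and
\[
\begin{aligned}
U(f + g, \Delta_3) &= \sum_{k=0}^{n-1} M_k(f + g)(x_{k+1} - x_k) \\
  &\le \sum_{k=0}^{n-1} (M_k(f) + M_k(g))(x_{k+1} - x_k) \\
  &= \sum_{k=0}^{n-1} M_k(f)(x_{k+1} - x_k) + \sum_{k=0}^{n-1} M_k(g)(x_{k+1} - x_k) \\
  &= U(f, \Delta_3) + U(g, \Delta_3).
\end{aligned}
\]
From these, we get
\[
\begin{aligned}
U(f + g, \Delta_3) - L(f + g, \Delta_3) &\le U(f, \Delta_3) + U(g, \Delta_3) - L(f, \Delta_3) - L(g, \Delta_3) \\
  &= U(f, \Delta_3) - L(f, \Delta_3) + U(g, \Delta_3) - L(g, \Delta_3) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon,
\end{aligned}
\]
and it follows from Darboux that $\int (f + g)$ exists.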
Secondly, we know that
\[ L(f, \Delta_3) \le \int f \le U(f, \Delta_3) \quad \text{and} \quad L(g, \Delta_3) \le \int g \le U(g, \Delta_3) \]
so, adding,
\[ L(f, \Delta_3) + L(g, \Delta_3) \le \int f + \int g \le U(f, \Delta_3) + U(g, \Delta_3). \tag{1} \]
On the other hand, we learned above that
\[ L(f, \Delta_3) + L(g, \Delta_3) \le L(f + g, \Delta_3) \le \int (f + g) \le U(f + g, \Delta_3) \le U(f, \Delta_3) + U(g, \Delta_3). \tag{2} \]
Since the gap between $L(f, \Delta_3) + L(g, \Delta_3)$ and $U(f, \Delta_3) + U(g, \Delta_3)$ is smaller
than ε, we see from (1) and (2) that the difference (if there is one) between $\int f + \int g$
and $\int (f + g)$ must also be smaller than ε.
Because the integral expressions are independent of ε, which is arbitrary, this
can only be true if they are exactly equal.
17.3.5 Theorem: integral of a scaled function For any constant C, if f is
integrable over [a, b] then so is Cf, and
\[ \int Cf = C \int f. \]
Proof
• Case 1: C > 0. Given ε > 0, apply Darboux to f with the tolerance $\varepsilon / C$: there is a
partition $\Delta$ of [a, b] such that $U(f, \Delta) - L(f, \Delta) < \varepsilon / C$. Now (for this same
partition)
\[ m_k(Cf) = \inf\{Cf(x) : x_k \le x \le x_{k+1}\} = C \inf\{f(x) : x_k \le x \le x_{k+1}\} = C m_k(f) \]
and likewise $M_k(Cf) = C M_k(f)$, so
\[ L(Cf, \Delta) = \sum m_k(Cf)(x_{k+1} - x_k) = C \sum m_k(f)(x_{k+1} - x_k) = C L(f, \Delta) \]
and likewise $U(Cf, \Delta) = C U(f, \Delta)$. Therefore
\[ U(Cf, \Delta) - L(Cf, \Delta) = C(U(f, \Delta) - L(f, \Delta)) < C \cdot \frac{\varepsilon}{C} = \varepsilon, \]
showing via Darboux that Cf is integrable.
Secondly, now that we know $\int Cf$ exists,
\[ L(Cf, \Delta) \le \int Cf \le U(Cf, \Delta); \tag{3} \]
but also $L(f, \Delta) \le \int f \le U(f, \Delta)$ so, multiplying across by C, we find that
$C L(f, \Delta) \le C \int f \le C U(f, \Delta)$, that is,
\[ L(Cf, \Delta) \le C \int f \le U(Cf, \Delta). \tag{4} \]
Comparing (3) and (4), we again find that the difference between $C \int f$ and
$\int Cf$ is less than arbitrary ε, so the two expressions must coincide.
• Case 2: C = 0. Here, Cf is a constantly zero function, whose integral is zero, so
the result is entirely trivial.
• EXERCISE: check out the details of Case 3: C < 0. Be aware that scaling by a
negative number will swop over sups and infs: this time we have to anticipate
$m_k(Cf) = C M_k(f)$, $L(Cf, \Delta) = C U(f, \Delta)$, and so on.
Here is perhaps the easiest of this set of theorems to believe and to prove:
17.3.6 Theorem: integrating across an inequality If f (x) ≤ g(x) for every
x ∈ [a, b], and f and g are integrable over [a, b], then
\[ \int f \le \int g. \]
Proof
For any partition $\Delta$, any resulting subinterval $[x_k, x_{k+1}]$ and any x in that
subinterval, we know f (x) ≤ g(x). That feeds through the sups and infs to give us
$m_k(f) \le m_k(g)$ and $M_k(f) \le M_k(g)$, feeds through the formation of sums to give
us $L(f, [a, b], \Delta) \le L(g, [a, b], \Delta)$ and $U(f, [a, b], \Delta) \le U(g, [a, b], \Delta)$, and feeds
through more sups and infs to provide $\underline{\int} f \le \underline{\int} g$ and $\overline{\int} f \le \overline{\int} g$
(the lower and upper integrals). Since the functions
are integrable, that is equivalent to what we had to prove.
17.3.7 Corollary If K ≤ f (x) ≤ L for all x ∈ [a, b], where K and L are constants,
then
\[ K(b - a) \le \int f \le L(b - a). \]
Proof
Immediate upon applying the theorem to f together with each of the constant
functions K and L (whose integrals we determined some time ago).
17.3.8 Theorem: integral of the modulus If f : [a, b] → R is integrable then so
is |f |, and
\[ \int |f| \ge \left| \int f \right|. \]
Proof
All we have to show is that |f | can be integrated: because then
\[ \text{(for all } x\text{)} \quad f(x) \le |f(x)| \ \Rightarrow\ \int f \le \int |f|, \text{ and} \]
\[ \text{(for all } x\text{)} \quad -f(x) \le |f(x)| \ \Rightarrow\ -\int f = \int (-f) \le \int |f| \]
by the previous theorem. Then $\left| \int f \right|$ is either $\int f$ or $-\int f$ and, whichever of the
two it is, it is ≤ $\int |f|$.
Using Darboux again (and given ε > 0), the integrability of f says there is a
partition $\Delta$ for which (in the usual notation)
\[ U(f, \Delta) - L(f, \Delta) = \sum (M_k(f) - m_k(f))(x_{k+1} - x_k) < \varepsilon. \]
Consider the embedded expression $M_k(f) - m_k(f)$. If f happened to have
largest and smallest values $f(x')$ and $f(x'')$ over the subinterval $[x_k, x_{k+1}]$ then
that expression would have been $f(x') - f(x'')$, that is, the biggest difference-in-
value that two values of f could achieve in the subinterval. In general, of course,
‘biggest’ values may fail to happen, and we shall need to settle for the second-best
option, the supremum. What that indicates (and you can prove it formally without
much difficulty) is that $M_k(f) - m_k(f)$ is the supremum of the differences-in-value
|f (t) − f (u)| as t and u vary across the subinterval $[x_k, x_{k+1}]$.
Although that is a slightly awkward way to think about $M_k(f) - m_k(f)$ most of
the time, it is the optimal approach to take to it in this particular proof, because
the inverse triangle inequality:
\[ \big|\, |p| - |q| \,\big| \le |p - q|, \quad p, q \in \mathbb{R} \]
gives us a neat way to connect these values for f and for |f |; look:
\[ \big|\, |f(t)| - |f(u)| \,\big| \le |f(t) - f(u)|, \quad \text{all } t, u \text{ in the subinterval.} \]
Taking sups across that last line gives $M_k(|f|) - m_k(|f|) \le M_k(f) - m_k(f)$, and
therefore
\[ \sum (M_k(|f|) - m_k(|f|))(x_{k+1} - x_k) \le \sum (M_k(f) - m_k(f))(x_{k+1} - x_k), \]
that is,
\[ U(|f|, \Delta) - L(|f|, \Delta) \le U(f, \Delta) - L(f, \Delta) < \varepsilon. \]
Therefore, via Darboux once more, |f | is indeed integrable.
The last instalment in this catalogue of expected theorems is the one that says
that the product of two integrable functions is integrable. It is quite intricate to
prove this directly, so we shall instead sneak up on it from behind using the
following lemma as cover:
17.3.9 Lemma: integrating the square of a function If f : [a, b] → R is
integrable (over [a, b]) then so is $f^2$.
Proof
Think again about the quantity we denote by $M_k - m_k$ as being the supremum of all
the differences |f (t) − f (u)| as t and u vary over the kth subinterval of the partition.
We need to compare this quantity as calculated for f with the same quantity as
calculated for $f^2$, and this (keeping in mind that f is bounded, so |f (x)| < K for
some constant K and for all x) turns out to be rather easy:
\[
\begin{aligned}
|f^2(t) - f^2(u)| &= |(f(t) + f(u))(f(t) - f(u))| = |f(t) + f(u)|\,|f(t) - f(u)| \\
  &\le (|f(t)| + |f(u)|)\,|f(t) - f(u)| \le 2K\,|f(t) - f(u)|.
\end{aligned}
\]
Taking sups across that line, we find that (for each relevant k):
\[ M_k(f^2) - m_k(f^2) \le 2K(M_k(f) - m_k(f)) \]
and, in consequence,
\[ U(f^2, \Delta) - L(f^2, \Delta) \le 2K(U(f, \Delta) - L(f, \Delta)). \]
Now, given ε > 0, Darboux says that (when f is integrable) there is a partition $\Delta$
for which $U(f, \Delta) - L(f, \Delta) < \frac{\varepsilon}{2K}$. Feed that into the previous line, and we get
\[ U(f^2, \Delta) - L(f^2, \Delta) \le 2K(U(f, \Delta) - L(f, \Delta)) < 2K \cdot \frac{\varepsilon}{2K} = \varepsilon. \]
Now a second mobilisation of Darboux shows that $f^2$ is also integrable.
17.3.10 Theorem: multiplying two integrable functions Suppose that f and g are
both integrable over [a, b]. Then so is their product fg.
Proof
Forget Darboux for once: this is just basic school algebra. For any x ∈ [a, b], and
abbreviating f (x) to f and g(x) to g in order to minimise the clutter:
\[ \frac{(f + g)^2 - (f - g)^2}{4} = fg \]
and, from previous theorems, all of the following are integrable:
\[ f + g, \quad -g = (-1)g, \quad f - g = f + (-g), \quad (f + g)^2, \quad (f - g)^2, \]
\[ -(f - g)^2 = (-1)(f - g)^2, \quad (f + g)^2 - (f - g)^2 = (f + g)^2 + (-(f - g)^2) \]
and, finally, so is
\[ \frac{(f + g)^2 - (f - g)^2}{4}, \]
whence the result.
17.4 The fundamental theorem of calculus

Here we are, some twenty pages into the theory of the Riemann integral and, so
far, f (x) = x is the most complicated function that we have actually integrated. How
bad is that?3
3 Very.
The reality is that the Riemann definition of integral and integrability, successful
though it is in putting the intuitive idea of area-under-graph on a logically sound
basis, is extremely unwieldy on its own as a tool for actually calculating integrals.
However, the theorems you have seen developing from it over the last several pages
now allow us to construct a much more efficient and easier calculation method, not
for all integrable functions, but for a huge range of commonly occurring ones. More
specifically, we now set out to investigate whether the method shown in Example
C in the introduction to this chapter is valid for the Riemann integral. The label
‘the fundamental theorem of calculus’ is used to refer to both of the theorems in
this section (and, on occasions, to other similar results).
17.4.1 Theorem Given that f : [a, b] → R is integrable, we can4 define another
function F : [a, b] → R as follows:
\[ F(x) = \int_a^x f, \quad x \in [a, b]. \]
Then:
1. F is continuous on [a, b].
2. If f is continuous at a point p of (a, b), then F is differentiable at p, and
$F'(p) = f(p)$.
3. If f is continuous on [a, b], then $F' = f$ everywhere in (a, b).
Proof
Since f is integrable, it must be bounded. Choose therefore a positive constant K
such that |f (x)| ≤ K always.
1. For any x ∈ [a, b) let positive h be small enough to ensure that y = x + h also
belongs to [a, b]. Then
\[
\begin{aligned}
|F(x + h) - F(x)| &= \left| \int_a^{x+h} f - \int_a^x f \right| = \left| \int_x^{x+h} f \right| \\
  &\le \int_x^{x+h} |f| \\
  &\le \int_x^{x+h} K = K((x + h) - x) = Kh.
\end{aligned}
\]
Since the final item tends to zero as h → 0+, we get the one-sided limit
\[ \lim_{y \to x^+} F(y) = F(x). \]
Similarly, for any x ∈ (a, b], $\lim_{y \to x^-} F(y) = F(x)$.
The agreement of the two one-sided limits and of the value of F tells us that
F is continuous at each point of (a, b) (and we also got the correct one-sided
limits at a and at b, where it is only one-sided limits that are relevant). This
proves (1).
4 thanks to the theorem on integrability over subintervals

2. Again, let positive h be small enough to ensure that y = p + h also belongs to
[a, b]. Then
\[
\begin{aligned}
\frac{F(p + h) - F(p)}{h} - f(p) &= \frac{F(p + h) - F(p) - h f(p)}{h} \\
  &= \frac{\int_a^{p+h} f - \int_a^p f - h f(p)}{h} \\
  &= \frac{\int_p^{p+h} f - h f(p)}{h} \\
  &= \frac{\int_p^{p+h} f - \int_p^{p+h} f(p)}{h}
\end{aligned}
\]
(Remember that f (p) is just a constant.)
\[ = \frac{\int_p^{p+h} \{f(x) - f(p)\}}{h}. \]
(We’re writing in the (x) here just to stress that f varies across the interval of
integration, whereas f (p) does not.)
Next, take the modulus:
\[
\left| \frac{F(p + h) - F(p)}{h} - f(p) \right| = \frac{\left| \int_p^{p+h} \{f(x) - f(p)\} \right|}{h}
  \le \frac{\int_p^{p+h} |f(x) - f(p)|}{h}
\]
(Consider now the supremum of all the values of |f (x) − f (p)| as x varies over
[p, p + h]:)
\[
\le \frac{\int_p^{p+h} \sup |f(x) - f(p)|}{h} = \frac{h \sup |f(x) - f(p)|}{h}
  = \sup_{p \le x \le p+h} |f(x) - f(p)|.
\]
Now continuity of f at p tells us that this final supremum tends to zero as h → 0+.
Thus we have a one-sided limit
\[ \lim_{h \to 0^+} \left| \frac{F(p + h) - F(p)}{h} - f(p) \right| = 0, \]
which is equivalent to
\[ \lim_{h \to 0^+} \frac{F(p + h) - F(p)}{h} = f(p). \]
A very similar argument establishes the other one-sided limit:
\[ \lim_{h \to 0^-} \frac{F(p + h) - F(p)}{h} = f(p) \]
and completes the proof of part (2) since the two one-sided limits agree upon
the number f (p).
3. This is immediate from part (2).
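Part (2) can also be watched happening numerically. The following is a rough sketch under our own assumptions, not a substitute for the proof: F is approximated by the midpoint rule (an arbitrary choice of ours), the integrand is taken to be cos on [0, 2], and the names `area_function` and `difference_quotient` are invented for the illustration. As h shrinks, the difference quotient of F at p = 1 should settle towards f(1) = cos 1.

```python
import math


def area_function(f, a, x, n=20_000):
    """Midpoint-rule approximation to F(x) = integral of f over [a, x], for x >= a."""
    if x == a:
        return 0.0
    h = (x - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def difference_quotient(f, a, p, h):
    """(F(p + h) - F(p)) / h, the quantity whose limit as h -> 0 is F'(p)."""
    return (area_function(f, a, p + h) - area_function(f, a, p)) / h


if __name__ == "__main__":
    # Illustrative choice: f = cos on [0, 2], examined at the interior point p = 1.
    for h in (0.1, 0.01, 0.001, -0.001):
        print(h, difference_quotient(math.cos, 0.0, 1.0, h), math.cos(1.0))
```

Both positive and negative h are included, mirroring the two one-sided limits in the proof.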
17.4.2 Comments
• We actually proved a bit more than we claimed in part 2 above. Provided that
you interpret the phrase ‘F is differentiable on [a, b]’ to mean differentiability
on (a, b) plus existence of both
\[ \lim_{h \to 0^+} \frac{F(a + h) - F(a)}{h} \quad \text{and} \quad \lim_{h \to 0^-} \frac{F(b + h) - F(b)}{h} \]
then our proof established that F was differentiable on the closed interval [a, b]
rather than just on the open interval (a, b).
• Note that we just now showed that every continuous function is the derivative
of something. However, that is no guarantee that we can come up with a simple,
explicit formula for that something. Furthermore, the converse is false: there are
plenty of discontinuous functions that are derivatives of something.5
17.4.3 Theorem Given that f : [a, b] → R is continuous and integrable, suppose
that we can find a function G : [a, b] → R such that
• G is continuous on [a, b],
• G is differentiable on (a, b), and
• $G' = f$ on (a, b).
Then
\[ \int_a^b f = [G(x)]_{x=a}^{x=b} = G(b) - G(a). \]
5 For instance, differentiate $f(x) = x^2 \sin(x^{-1})$ for x ≠ 0, f (0) = 0, paying careful attention
to exactly what happens at the difficult point x = 0, and you should find that the derivative exists
at every point (including 0) but doesn’t even have a limit as x → 0.
Proof
With F as defined in the previous theorem, notice that F −G is continuous on [a, b]
and has zero derivative everywhere in (a, b). Therefore6 it is constant. In particular,
F(b) − G(b) = F(a) − G(a), which we rearrange into
F(b) − F(a) = G(b) − G(a).

However, $F(b) = \int_a^b f$ and $F(a) = \int_a^a f = 0$ by definition, so the previous display is
exactly what we had to prove.
17.4.4 Comment If T S Eliot is right when he says that . . . the end of all our
exploring will be to arrive where we started and know the place for the first time,
then that is just about the point we have reached in our brief exploration of the
integral as defined by the Riemann process: for the previous result now justifies
the way in which you have been calculating integrals up to now. In particular, the
phrase ‘suppose that we can find’ is, in a way, liberating: when you are looking
for an expression whose derivative is the given function f , you are free to use any
sixth-form tricks, any trial-and-error or sheer guesswork process, even mindlessly
looking up a cookbook table of standard derivatives and integrals, so long as you
check that the derivative of the thing you ‘found’ actually is f – and this check
is almost always a routine process since differentiating, unlike un-differentiating,
is normally a pretty algorithmic business. After that, the actual calculation of the
integral is just arithmetic.
17.4.5 Examples Assuming that the following expressions are integrable over the
indicated intervals, calculate their integrals. (You should assume, where appropri-
ate, basic properties of trig, logarithmic and exponential functions.)
1. $f(x) = x^n$ on [a, b] assuming that n ≠ −1;
2. $f(x) = x^2 (5 + 2x^3)^{3/4}$ over the interval [0, 1];
3. $f(x) = \sin^2 x \cos^3 x$ over [0, π/6];
4. $f(x) = x^2 e^{-x}$ on [0, ln 2].
Solution
1. Since the derivative of $\frac{1}{n+1} x^{n+1}$ is exactly $x^n$, the answer is
\[ \left[ \frac{1}{n+1} x^{n+1} \right]_a^b = \frac{1}{n+1} (b^{n+1} - a^{n+1}). \]
2. By some process (perhaps the method called ‘change of variable’ or
‘substitution’) we find that $G(x) = \frac{2}{21} (5 + 2x^3)^{7/4}$ does actually have $G' = f$.
Then the integral is
\[ [G(x)]_0^1 = G(1) - G(0) = \frac{2}{21} (5 + 2)^{7/4} - \frac{2}{21} (5)^{7/4} = \frac{2}{21} \left( 7^{7/4} - 5^{7/4} \right). \]
6 This is an easy consequence of the first mean value theorem.
3. Change of variable and the black arts of trigonometry will provide one way to
stumble into the function $G(x) = \frac{1}{3} \sin^3 x - \frac{1}{5} \sin^5 x$, whose derivative is easily
seen to be equal to f (x). Thus the answer is
\[
\begin{aligned}
[G(x)]_0^{\pi/6} &= G(\pi/6) - G(0) \\
  &= \frac{1}{3} \sin^3(\pi/6) - \frac{1}{5} \sin^5(\pi/6) - 0 = \frac{1}{3} \cdot \frac{1}{8} - \frac{1}{5} \cdot \frac{1}{32} = \frac{17}{480}.
\end{aligned}
\]
4. (Perhaps using integration by parts twice?) we come up with the function
$G(x) = -e^{-x}(x^2 + 2x + 2)$ whose derivative is readily checked to equal f (x).
Therefore the answer is
\[
\begin{aligned}
[G(x)]_0^{\ln 2} &= G(\ln 2) - G(0) \\
  &= -e^{-\ln 2} ((\ln 2)^2 + 2(\ln 2) + 2) - \{-e^{-0} (0^2 + 2(0) + 2)\} \\
  &= -\frac{1}{2} ((\ln 2)^2 + 2(\ln 2) + 2) + 2 \\
  &= 1 - \frac{1}{2} ((\ln 2)^2 + 2 \ln 2).
\end{aligned}
\]
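If you would like independent reassurance that these four answers are right, a crude numerical integration will provide it. The sketch below is only a cross-check under our own assumptions: it uses the midpoint rule (our choice), the helper names `midpoint_integral` and `example_checks` are invented here, and for Example 1 we have picked the particular values n = 3 and [a, b] = [1, 2] purely for illustration.

```python
import math


def midpoint_integral(f, a, b, n=20_000):
    """Midpoint-rule approximation to the integral of f over [a, b], a < b."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def example_checks():
    """Return (label, numerical value, closed-form value) for each example."""
    ln2 = math.log(2)
    cases = [
        # Example 1 with the illustrative choices n = 3 and [a, b] = [1, 2].
        ("x^3 on [1, 2]", lambda x: x**3, 1.0, 2.0, (2**4 - 1**4) / 4),
        ("x^2 (5 + 2x^3)^(3/4) on [0, 1]",
         lambda x: x**2 * (5 + 2 * x**3) ** 0.75, 0.0, 1.0,
         (2 / 21) * (7**1.75 - 5**1.75)),
        ("sin^2 x cos^3 x on [0, pi/6]",
         lambda x: math.sin(x) ** 2 * math.cos(x) ** 3, 0.0, math.pi / 6,
         17 / 480),
        ("x^2 e^(-x) on [0, ln 2]",
         lambda x: x**2 * math.exp(-x), 0.0, ln2,
         1 - 0.5 * (ln2**2 + 2 * ln2)),
    ]
    return [(label, midpoint_integral(f, a, b), exact)
            for label, f, a, b, exact in cases]


if __name__ == "__main__":
    for label, numerical, exact in example_checks():
        print(f"{label}: numerical {numerical:.8f}, closed form {exact:.8f}")
```

Agreement to several decimal places is, of course, evidence rather than proof; the proof is the theorem above.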
17.4.6 Another comment There is a strong temptation to ignore or omit the

phrase ‘assuming that the following expressions are integrable’ from that group of
examples, probably due to all our pre-university experiences that, once you can un-
differentiate f (x), there is nothing left to stop you evaluating the integral of f . This
is, however, not true in general: there are functions for which you can successfully
un-differentiate in the way we have been describing, and which are, nevertheless,
not Riemann integrable!
A relatively easy example to show the truth of this (rather annoying) reality is as
follows. Consider the function g : [0, 1] → R defined by
\[ g(x) = x^2 \sin(x^{-2}) \text{ if } x \ne 0, \quad g(0) = 0. \]
It is easy to see that g is differentiable at each point of (0, 1]: just use the various
rules of differentiation to find $g'(x)$ there. With a bit more caution, you can also
verify that $g'(0)$ exists (and equals 0, if you’re curious about it). So the function $g'$
upon the interval [0, 1] can certainly be un-differentiated. Yet, look at the formula
you get for $g'$ and you will see that it is unbounded, and therefore cannot be
Riemann integrated.
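For readers who want to see the unboundedness explicitly, here is the differentiation written out for x ≠ 0. This is our own working, and the sample points $x_n$ are simply a convenient choice of ours at which the sine term vanishes.

```latex
\[
g'(x) = 2x \sin(x^{-2}) + x^2 \cos(x^{-2}) \cdot (-2x^{-3})
      = 2x \sin(x^{-2}) - \frac{2}{x} \cos(x^{-2}) \qquad (x \neq 0).
\]
% At x_n = 1/\sqrt{2\pi n} (n = 1, 2, 3, ...) we have x_n^{-2} = 2\pi n,
% so \sin(x_n^{-2}) = 0 and \cos(x_n^{-2}) = 1, giving
\[
g'(x_n) = -\frac{2}{x_n} = -2\sqrt{2\pi n} \to -\infty \quad \text{as } n \to \infty,
\]
% even though every x_n lies in (0, 1] and x_n \to 0.
```

So the derivative takes arbitrarily large negative values on [0, 1], which is all that is needed to rule out Riemann integrability.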
In fact, much stranger things than that can happen. It is possible to define a
function that is differentiable everywhere in R, and whose derivative is bounded
everywhere in R, and yet that derivative is not Riemann integrable over a closed
interval. The construction used in the definition is seriously sophisticated: search
for Volterra’s function if you have a lot of time and patience to spare.
For our purposes, the main point here is that un-differentiability is not a good
enough reason to assert that a function can be integrated. Our final major task
for this chapter is to identify a good range of functions (but by no means all) that
definitely can.
17.4.7 Theorem Every continuous function on a closed bounded interval is

Riemann integrable.
Proof
If f : [a, b] → R is continuous then, by the key theorem on uniform continuity, it
is also uniformly continuous there. Let ε > 0 be given.
By definition, we can find δ > 0 such that any two points of [a, b] that are less
than δ apart have f -values that are less than ε apart. Choose any partition $\Delta$ of
[a, b] whose subintervals are each shorter than δ. On each subinterval $[x_k, x_{k+1}]$ the
continuous function f has biggest and smallest values (which will be the numbers
$M_k$ and $m_k$ used in the Riemann integral’s construction) and they will necessarily
differ by less than ε, that is: (for each k) $M_k - m_k < \varepsilon$, so
\[
U(f, \Delta) - L(f, \Delta) = \sum (M_k - m_k)(x_{k+1} - x_k)
  < \varepsilon \sum (x_{k+1} - x_k) = \varepsilon (b - a).
\]
Go back over the last paragraph and re-run it with ε replaced by $\frac{\varepsilon}{b - a}$ (this kind
of in-flight course correction in ‘epsilontics’ should be almost second nature to you
by now) and the revised conclusion $U(f, \Delta) - L(f, \Delta) < \varepsilon$ shows via Darboux
that f is integrable.
Now we know that it is OK to dump the phrase ‘assuming that the following
expressions are integrable’ out of the last batch of examples, because all the
expressions there presented actually were (fairly obviously) continuous.
There are also many functions that are not continuous but are nevertheless
Riemann integrable. Here is one source:
17.4.8 Theorem Every monotonic function on a closed bounded interval is

Riemann integrable.
Proof
Suppose that f : [a, b] → R is monotonically increasing (the decreasing case is
very similar or, in that scenario, you could choose to look at −f which is then
increasing . . . ). If f is actually constant then we already know it is integrable, so
suppose not: that is, suppose f (a) < f (b). Also let ε > 0 be given, so that we are
ready to try Darboux again.
On each subinterval $[x_k, x_{k+1}]$ created by an arbitrary partition $\Delta$, the increasing
function f has biggest value $f(x_{k+1})$ and smallest value $f(x_k)$ (which will be the
numbers $M_k$ and $m_k$). So
\[ U(f, \Delta) - L(f, \Delta) = \sum (M_k - m_k)(x_{k+1} - x_k) = \sum (f(x_{k+1}) - f(x_k))(x_{k+1} - x_k). \]
Pick a partition whose subintervals all have the same length (say, h) and we now get
\[
\begin{aligned}
U(f, \Delta) - L(f, \Delta) &= \sum (f(x_{k+1}) - f(x_k))(x_{k+1} - x_k) \\
  &= h \sum (f(x_{k+1}) - f(x_k)) = h(f(b) - f(a)).
\end{aligned}
\]
Now reverse-engineer the last paragraph by choosing $h = \frac{\varepsilon}{2(f(b) - f(a))}$, and
we find that
\[
U(f, \Delta) - L(f, \Delta) = h(f(b) - f(a))
  = \frac{\varepsilon}{2(f(b) - f(a))} (f(b) - f(a)) = \frac{\varepsilon}{2} < \varepsilon.
\]
(Again, the in-flight correction of retro-fitting h to the developing requirements is

the sort of thing we all need to do when working through an unfamiliar question.)
Darboux is now content to deliver the conclusion we asked for.
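The telescoping at the heart of this proof is easy to observe in practice. Here is a small sketch, offered as an illustration only: the function name `darboux_gap_increasing` and the sample function $x^3$ on [0, 2] are our own choices, and the code simply computes U − L for an increasing function on a partition into n equal pieces, so that the result can be compared with h( f (b) − f (a)).

```python
def darboux_gap_increasing(f, a, b, n):
    """U - L for an increasing f on the partition of [a, b] into n equal parts.

    For increasing f, the sup on [x_k, x_{k+1}] is f(x_{k+1}) and the inf is f(x_k).
    """
    h = (b - a) / n
    # Partition points; the last one is set to b exactly to avoid rounding drift.
    xs = [a + k * h for k in range(n)] + [b]
    upper = sum(f(xs[k + 1]) * h for k in range(n))
    lower = sum(f(xs[k]) * h for k in range(n))
    return upper - lower


if __name__ == "__main__":
    cube = lambda x: x**3
    a, b = 0.0, 2.0
    for n in (10, 100, 1000):
        h = (b - a) / n
        print(n, darboux_gap_increasing(cube, a, b, n), h * (cube(b) - cube(a)))
```

The two printed columns should agree up to rounding, and both shrink in proportion to h, which is exactly why a small enough h wins the Darboux game.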
There are also plenty of integrable functions that are neither continuous nor
monotonic. Here is an exercise to help you find some of them for yourself.
17.4.9 EXERCISE Let f : [a, b] → R be given by
f (x) = 0 if x ≠ c, f (c) = 1
where c is some particular point in [a, b]. Use Darboux to show that f is integrable
over [a, b] and check that its integral is zero.
(Paragraph 17.2.13 offers a useful approach. The cases where c equals either a or
b need a little extra attention.)
Extend this result (using whichever theorems help you) to show that a function
that is zero on a closed interval except at a finite number of particular points is
integrable.
Extend it further to verify that if two bounded functions f, g on [a, b] are equal
in value at all but a finite number of points, and one of them is integrable, then so
is the other one, and their integrals are equal.
17.4.10 EXERCISE
1. Given that f is bounded on [0, 2] and that, for each positive integer n, it is
integrable over $[0, 2 - \frac{1}{n}]$, show that f is also integrable over [0, 2], and that
\[ \int_0^2 f = \lim_{n \to \infty} \int_0^{2 - \frac{1}{n}} f. \]
2. Give an example of an integrable function on a bounded closed interval I that

is not monotonic and is discontinuous at an infinite number of points in I.
17.4.11 EXERCISE Show by example that the following assertion is false: if f , g
are integrable over a closed bounded interval I and g(x) > 0 at every point x of I,
then $\frac{f(x)}{g(x)}$ must be integrable over I also.
Think how you might slightly modify this false statement to make (and then
prove) a true one about integrability of the quotient of two integrable functions.
17.4.12 EXERCISE Calculate the integrals of each of the following expressions

over the interval indicated (assuming, where appropriate, basic properties of trig,
logarithmic and exponential functions).
1. $f(x) = x \ln x$ over [1, e];
2. $f(x) = (0.5 + x e^{x^2})(x + e^{x^2})^6$ over the interval [0, 1];
3. $f(x) = \sin^2 x \cos^2 x$ over $[0, \frac{\pi}{2}]$;
4. $f(x) = e^x \sin x$ over [0, π].
A final idea to examine in this chapter is a test that, in some sense, really
belongs in Chapter 14 since it concerns convergence of series, but which had to
wait until we had developed the idea of integration. There are, in fact, not very
many series problems for which it is useful; however, for those few, it is usually the
only reasonably obvious test that will work at all.
17.4.13 The integral test for series Suppose that f : [1, ∞) → R is continuous,
positive and decreasing.7 Then the following statements are equivalent:
1. the series $\sum_{k=1}^{\infty} f(k)$ is convergent;
2. the sequence $\left( \int_{x=1}^{n} f(x)\,dx \right)_{n \in \mathbb{N}}$ is convergent.
7 That is, 1 ≤ x < y implies that f (x) ≥ f (y).

Proof
For the integral just mentioned, f (2) + f (3) + f (4) + · · · + f (n) is a lower Riemann
sum, and f (1) + f (2) + f (3) + · · · + f (n − 1) is an upper Riemann sum (where
n ≥ 1 is an integer). If, as usual, we denote by $S_n$ the nth partial sum of the series
$\sum_{k=1}^{\infty} f(k)$, we therefore have:
\[ S_n - f(1) \le \int_{x=1}^{n} f(x)\,dx \le S_n - f(n). \]
This shows that if either of the sequences $\left( \int_{x=1}^{n} f(x)\,dx \right)$, $(S_n)$ is bounded above,
then so must the other be. Since both sequences are increasing, this is the same as
saying that if either of them is convergent, then so must the other be.
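The sandwich inequality in this proof can be checked directly for any particular f whose integral we happen to know. The sketch below is an illustration under our own assumptions: we take $f(x) = 1/x^2$, for which $\int_1^n f = 1 - 1/n$ by the fundamental theorem, and the helper name `integral_test_bounds` is invented for the purpose.

```python
def integral_test_bounds(f, integral_from_1_to, n):
    """Return (S_n - f(1), integral of f over [1, n], S_n - f(n)).

    S_n is the nth partial sum f(1) + f(2) + ... + f(n); the caller supplies
    the exact value of the integral as a function of n.
    """
    s_n = sum(f(k) for k in range(1, n + 1))
    return s_n - f(1), integral_from_1_to(n), s_n - f(n)


if __name__ == "__main__":
    # Illustrative choice: f(x) = 1/x^2, whose integral over [1, n] is 1 - 1/n.
    f = lambda x: 1.0 / (x * x)
    integral = lambda n: 1.0 - 1.0 / n
    for n in (2, 10, 100, 1000):
        low, middle, high = integral_test_bounds(f, integral, n)
        print(n, low, middle, high, low <= middle <= high)
```

Since the integrals here are bounded above by 1, the left-hand inequality bounds the partial sums by 1 + f (1) = 2, in line with the convergence of the series.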
17.4.14 Illustration An alternative proof that the harmonic series diverges: the
function $f(x) = \frac{1}{x}$ certainly is positive, continuous and decreasing on [1, ∞), and
(as is well known) $\int_1^n f = [\ln x]_1^n = \ln n - \ln 1 = \ln n$. Since the sequence $(\ln n)_{n \ge 1}$
is unbounded and therefore divergent, the integral test informs us that the series
\[ \sum_{k=1}^{\infty} f(k) = \sum_{k=1}^{\infty} \frac{1}{k} \]
(which is the harmonic series) diverges also.

17.4.15 Example Does the series $\sum_{k=2}^{\infty} a_k$, given by
\[ a_k = \frac{1}{k \ln k}, \]
converge or diverge?
Solution
In order to use the integral test, we need to consider the corresponding real
function f specified by
\[ f(x) = \frac{1}{x \ln x}. \]
Notice that this formula goes bad at x = 1 since ln 1 = 0, but that this does not
impede our progress since the series started at k = 2 (for the same necessitating
reason). We’ll therefore regard f as being defined on [2, ∞) (and we also need a
slightly modified version of the test, in which k = 1 and x = 1 are replaced by k = 2
and x = 2).
Now it needs a little bit of insight or experience (or luck) to notice that f (x) is
precisely the derivative of the function g(x) = ln(ln x), x ∈ [2, ∞). The rest of the
argument is routine:
\[ \int_2^n f(x)\,dx = [g(x)]_2^n = \ln(\ln n) - \ln(\ln 2) \]
which is unbounded8 and therefore divergent. By the integral test, the given series
is also divergent.
17.4.16 EXERCISE Show that9
\[ \sum_{n=3}^{\infty} \frac{1}{n \ln n \ln(\ln n)} \]
diverges.
17.4.17 EXERCISE
1. Extend the argument of the integral test to show that, for each (fixed) positive
integer $n_0$, if $\int_{n_0}^{n} f \to \ell$ as n → ∞, then
\[ \ell \le \sum_{k=n_0}^{\infty} f(k) \le \ell + f(n_0). \]
2. Estimate the sum of the (convergent) series $\sum_{k=1}^{\infty} \frac{1}{k^5}$ with an error less than
0.001.
17.4.18 Note It’s occasionally useful to notice that, in the context and notation
of the integral test, when the two (equivalent) conditions hold then f (x) has a
limit of 0 as x → ∞. For suppose that f : [1, ∞) → R is continuous,10 positive and
decreasing and that $\sum_{n=1}^{\infty} f(n)$ converges. Then certainly f (n) → 0 as n → ∞. For
any given ε > 0 we can therefore find $n_0 \in \mathbb{N}$ such that $f(n_0) < \varepsilon$. Thinking now
of f (x) decreasing (and also positive) as the real variable x increases, we see that
x ∈ R, x ≥ $n_0$ together imply that 0 < f (x) < ε. Hence f (x) → 0 as x → ∞, as
claimed.
8 (for instance) because, for any positive constant K, if we choose a value of n greater than $e^{e^K}$,
then we shall get g(n) > K and therefore g(n) − g(2) > K − ln(ln 2) > K.
9 Notice that the function $(x \ln x \ln(\ln x))^{-1}$ is positive and decreasing on [3, ∞) but not on
[2, ∞): indeed, it is not even defined at x = e.
10 actually, continuity does not play any role in this part of the argument.
18 The elementary functions revisited
18.1 Introduction
One of the benefits of now having a logically watertight definition of integral is
that we can at last provide reliable definitions of the so-called elementary functions
such as ln x, $e^x$ and sin x, and prove that the basic properties of these entities that we
have been cheerfully using throughout (in order to enrich our library of examples
and exercises) do actually hold good in all circumstances. If it seems surprising
and even counter-intuitive that integration theory should be required for this task,
pause and think about the useful facts that the area under the graph of $f(x) = \frac{1}{x}$
between x = 1 and x = a is ln a (for a > 1), that the area under the graph of $f(x) = e^x$ between
x = a and x = b is $e^b - e^a$ (assuming that a < b), and that the number π that is
so critical to trigonometry is the area of a circle of unit radius. Area is evidently
quite central to how these functions operate, so perhaps it is more natural than
it initially seems that area (interpreted as integral) should provide a means for
defining them in a way that is consistent with intuition and common sense, but
not dependent on either.
18.2 Logarithms and exponentials

18.2.1 Definition Consider the function f : (0, ∞) → R specified by $f(x) = \frac{1}{x}$.
We notice that f is positive, strictly decreasing, continuous and differentiable on
its domain, that f (x) → ∞ as x → 0+ , and that f (x) → 0 as x → ∞. This will let
us sketch its graph with decent accuracy.
Now for each t > 0 we define
\[ \ln t = \int_1^t f. \]
The fact that f is continuous on the interval from 1 to t guarantees that the integral
exists, and the following details about the so-called natural logarithm function thus
created are immediate:
1. ln 1 = 0;
2. if t > 1 then ln t > 0;
3. if t < 1 then $\ln t = \int_1^t f = -\int_t^1 f < 0$.
18.2.2 Lemma The function ln is differentiable at the point t (for each t > 0) and
its derivative there is $\frac{1}{t}$.
Proof
Immediate from the fundamental theorem of calculus.
18.2.3 Lemma If n ≥ 2 is a positive integer, then
\[ \sum_{k=1}^{n} \frac{1}{k} - 1 < \ln n < \sum_{k=1}^{n} \frac{1}{k} - \frac{1}{n}. \]
Proof
It is easy to ‘see’ this from a sketch graph:
[Figure: rectangles under the graph of 1/x on [1, n], illustrating $\ln n > \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}$]
[Figure: rectangles over the graph of 1/x on [1, n], illustrating $\ln n < 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n-1}$]
but a more logically robust reason is the fact that the first and third items in that
display are exactly the lower and upper Riemann sums for f using the partition
1 < 2 < 3 < 4 < · · · < n of the interval [1, n].
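The two bounds are easy to tabulate. In the sketch below (an illustration only) we lean on the standard library's `math.log` as a stand-in for the integral-defined ln of 18.2.1, which is an assumption on our part at this stage of the development; the helper names `harmonic` and `lemma_bounds` are likewise our own.

```python
import math


def harmonic(n):
    """The nth partial sum 1 + 1/2 + ... + 1/n of the harmonic series."""
    return sum(1.0 / k for k in range(1, n + 1))


def lemma_bounds(n):
    """Return (H_n - 1, ln n, H_n - 1/n), the three quantities in Lemma 18.2.3."""
    h_n = harmonic(n)
    return h_n - 1.0, math.log(n), h_n - 1.0 / n


if __name__ == "__main__":
    for n in (2, 3, 10, 100, 1000):
        lower, middle, upper = lemma_bounds(n)
        print(n, lower, middle, upper, lower < middle < upper)
```

Notice that the gap between the outer bounds is 1 − 1/n, so it never closes: the lemma pins ln n down only to within 1, which is nevertheless all that the next corollary requires.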
18.2.4 Corollary ln x → ∞ as x → ∞.
Proof
From the Lemma, and the fact that the harmonic series diverges, we get1 ln n → ∞
as n → ∞. Now for any real x we have $x \ge \lfloor x \rfloor$ so, since ln is increasing,2
$\ln x \ge \ln \lfloor x \rfloor$ and, letting x → ∞ (and consequently $\lfloor x \rfloor \to \infty$ also) we get
ln x → ∞.
Mildly reassuring though these details are, we are still missing the essential
point of what logarithms are for: their prime purpose, whether for calculation, for
algebraic simplification or for theoretical arguments, is to convert multiplication
into addition: ‘ln(xy) = ln x + ln y’. We next need to establish this fundamental
‘law’.
18.2.5 Theorem For any positive numbers x and y, ln(xy) = ln x + ln y.
Proof
Consider any real constant a > 0, and define a real function g : (0, ∞) → R by
the formula
g(x) = ln(ax) − ln x.
Using the chain rule (and our known derivative of ln), it is easy to differentiate this
at any positive x:
\[ g'(x) = a \cdot \frac{1}{ax} - \frac{1}{x} = 0. \]
It follows that g must be constant, and that its constant value is
g(1) = ln a − ln 1 = ln a,
in other words, that ln(ax) = ln x + ln a. Since a and x are arbitrary members of

(0, ∞), the proof is concluded.
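It can be instructive to watch this law emerge from the integral definition alone, without ever calling on a built-in logarithm to do the work. The following sketch is an illustration under our own assumptions: ln t is approximated by applying the midpoint rule (our choice) to $\int_1^t \frac{1}{x}\,dx$, the helper names are invented, and `math.log` appears only at the end as an independent comparison.

```python
import math


def integral_of_reciprocal(a, b, n):
    """Midpoint-rule approximation to the integral of 1/x over [a, b], 0 < a <= b."""
    h = (b - a) / n
    return h * sum(1.0 / (a + (k + 0.5) * h) for k in range(n))


def ln_by_integral(t, n=100_000):
    """Approximate ln t as the integral of 1/x from 1 to t, for t > 0."""
    if t <= 0:
        raise ValueError("ln t is only defined for t > 0")
    if t < 1:
        # Convention of 17.3.2: integrating 'backwards' from 1 to t changes the sign.
        return -integral_of_reciprocal(t, 1.0, n)
    return integral_of_reciprocal(1.0, t, n)


if __name__ == "__main__":
    x, y = 2.0, 3.0
    print("ln(xy)        ~", ln_by_integral(x * y))
    print("ln x + ln y   ~", ln_by_integral(x) + ln_by_integral(y))
    print("library value  ", math.log(x * y))
```

The first two printed numbers agree only approximately, since each is a numerical estimate; the theorem is what guarantees that the underlying exact values coincide.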
18.2.6 Corollary 1 For any y > 0, $\ln \frac{1}{y} = -\ln y$.
Proof
(Using the theorem):
\[ 0 = \ln 1 = \ln\left( y \cdot \frac{1}{y} \right) = \ln y + \ln \frac{1}{y}. \]
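1 See 2.9.9.
2 If 0 < a < b then $\ln b - \ln a = \int_1^b f - \int_1^a f = \int_a^b f > 0$.
18.2.7 Corollary 2 For any x > 0 and y > 0, $\ln \frac{x}{y} = \ln x - \ln y$.
Proof
(Using the theorem and its first corollary):
\[ \ln \frac{x}{y} = \ln\left( x \times \frac{1}{y} \right) = \ln x + \ln \frac{1}{y} = \ln x - \ln y. \]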
18.2.8 Corollary 3 As x → 0+ , ln x → −∞.
Proof
If x is a small positive number, then $x^{-1}$ is a large positive number. More
precisely, as x → 0+ , $x^{-1} \to \infty$ and (via 18.2.4) $\ln x^{-1} \to \infty$. Consequently
$\ln x = -\ln x^{-1} \to -\infty$. (If the last step does not appear obvious, you can confirm
it from the definitions.)
Our intuitive picture of the graph of ln is now reasonably complete:
[Figure: graph of ln x]
and an important detail emerges: the range of ln is the whole real line (−∞, ∞)
because, for any x ∈ R, we can use the behaviour of ln near to 0 and ‘near to ∞’ to
find a value ln a of ln that is less than x and a value ln b of ln that is greater than x.
Now since ln is continuous, the IVT tells us that x itself is a value of ln.
SUMMARY: ln is a (strictly increasing and therefore) one-to-one map from
(0, ∞) onto R = (−∞, ∞) and, of course, the vital thing about one-to-one onto
maps is that they possess inverses.
18.2.9 Definition We define a function, called the exponential function and (for
the moment) denoted by exp : R → (0, ∞), by declaring exp to be the inverse of ln.
We are reluctant to reach for the notation ln⁻¹ since the risk of consequent
confusion is high. Instead, let’s concentrate on what inverse function means in
this case:
• For each t > 0 we have exp(ln t) = t.
• For each x ∈ R we have ln(exp x) = x.
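The two bullet points say that computing exp x amounts to solving ln t = x for t. A minimal sketch of that idea follows, assuming the library function `math.log` as a stand-in for ln and using only the fact that ln is strictly increasing; the name `exp_by_inversion` and the tolerance are illustrative.

```python
import math

def exp_by_inversion(x, tol=1e-13):
    """Solve ln t = x for t > 0 by bisection, using only that ln is increasing."""
    lo, hi = 1.0, 1.0
    while math.log(lo) > x:      # move lo down until ln(lo) <= x
        lo /= 2.0
    while math.log(hi) < x:      # move hi up until ln(hi) >= x
        hi *= 2.0
    while hi - lo > tol * hi:    # shrink the bracket [lo, hi] around the solution
        mid = (lo + hi) / 2.0
        if math.log(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

for x in (-2.5, 0.0, 1.0, 3.0):
    t = exp_by_inversion(x)
    print(x, t, math.log(t))     # the last column should reproduce x
```

The bracket-doubling step relies on the range of ln being the whole real line, which is exactly what the IVT argument above established.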
At this point, look back to what we found out about inverses of continuous,
differentiable, strictly increasing functions in Chapters 8 and 12 (specifically,
paragraphs 8.6.8 and 12.2.16). From there, we know that exp is continuous and
differentiable, and we have a formula for its derivative at each point, namely:
$$\text{at each } p \in (0, \infty), \quad \exp'(\ln p) = \frac{1}{\ln'(p)}$$
which simplifies, since ln′(p) = 1/p, to exp′(ln p) = p. Substituting x for ln p, that is,
p = exp x, and noting that x ranges over the whole real line as p ranges over (0, ∞),
that says
$$\exp'(x) = \exp x \quad \text{for every } x \in \mathbb{R}$$
and we have recovered what is possibly the most important fact about exp: that it
is its own derivative. The other basic details follow easily enough:
18.2.10 Proposition
1. exp is strictly increasing.
2. exp x → 0 as x → −∞, exp x → ∞ as x → ∞.
3. For each x, y ∈ R, exp(x + y) = exp x × exp y.
4. exp(0) = 1.
5. For each y ∈ R, exp(−y) = 1/exp y.
6. For each x, y ∈ R, exp(x − y) = exp x / exp y.
Proof
1. Its derivative ( = itself) is always positive. (Alternatively, appeal to 8.6.8 or
8.6.10.)
2. Firstly, if ε > 0 is given, then (using part 1) x < ln ε implies
0 < exp(x) < exp(ln ε) = ε. (Recall that exp(x) is always positive.) Secondly,
given K > 0, then x > ln K implies exp(x) > K for the same reasons.
3. Let p = exp x, q = exp y, that is, x = ln p, y = ln q. Then
x + y = ln p + ln q = ln(pq) and therefore exp(x + y) = pq = exp x × exp y.
4. Because ln 1 = 0.
5. exp(−y) · exp(y) = exp(−y + y) = exp(0) = 1 (and exp(y) ≠ 0).
6. exp(x − y) = exp(x + (−y)) = exp(x) exp(−y) = exp(x) · (1/exp(y)) = exp(x)/exp(y).
The proposition guarantees that the overall appearance of the graph of exp is, as
is widely known:
[Figure: graph of e^x]
As a matter of notation, it is customary to put e = exp(1) and then write not
exp(x) but e^x. This is not, however, a trivial matter if we actually want to think
about powers of the number e or, indeed, of non-rational powers of any positive
base-number. The reader is invited to check out some details in order to understand
better what is going on here:
18.2.11 EXERCISE
1. Use induction to check that exp operates through all finite sums, in the sense
that $\exp\left(\sum_{i=1}^{n} x_i\right) = \exp(x_1)\exp(x_2)\exp(x_3)\cdots\exp(x_n)$ for all finite lists of real
numbers $x_1, x_2, x_3, \cdots, x_n$.
2. Show that exp(n) = e^n for every positive integer n.
3. Show that exp(1/m) = e^(1/m) for every positive integer m.
4. Show that exp(r) = e^r for every positive rational r.
5. Show that exp(q) = e^q for every rational q.
The essence of that little investigation is that exp(x) and e^x agree at all the values
of x for which common sense tells you the meaning of e^x; once x stops being
rational, e^x no longer possesses a ‘natural’ meaning, and this is nothing to do with
e, for expressions such as $2^{\sqrt{2}}$ are also beyond the grasp of basic algebra – they
require proper definition before we can study them with any degree of confidence.
18.2.12 Definition (Now that we have proper definitions of exp and ln,) for x ∈ R
and a > 0 we define
$$a^x = \exp(x \ln a).$$
Notice first that in the special case where a = e, we have actually defined
e^x to mean exp(x) because ln e = 1 (in turn, because exp(1) = e by definition).
As regards reconciling formal and informal definitions of powers of numbers
other than e, you may find the following ‘spiked’ version of the previous exercise
useful:
18.2.13 EXERCISE For any a > 0, show that

1. exp(n ln a) = a^n for every positive integer³ n.
2. exp((1/m) ln a) = a^(1/m) for every positive integer⁴ m.
3. exp(r ln a) = a^r for every positive rational⁵ r.
4. exp(q ln a) = a^q for every rational⁶ q.
This time, the message is that wherever a^x can be defined by common sense
and simple algebra (that is, whenever x is merely a rational number), then that
common-sense definition gives the same answer as the formal all-purpose definition
set out above.
Of course, we now need to confirm that these general powers obey the familiar
index laws: but this is reassuringly straightforward:
18.2.14 Proposition For any a > 0 and real numbers x and y:

1. a^x a^y = a^(x+y),
2. a^x / a^y = a^(x−y),
3. (a^x)^y = a^(xy).
Proof of 1.
a^x a^y = exp(x ln a) exp(y ln a)
= exp(x ln a + y ln a)
= exp((x + y) ln a)
= a^(x+y).
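Definition 18.2.12 translates directly into code. The sketch below assumes the library functions `math.exp` and `math.log` as stand-ins for exp and ln; the function name `power` and the sample values are illustrative only.

```python
import math

def power(a, x):
    """General power a^x for a > 0, defined as exp(x ln a)."""
    if a <= 0:
        raise ValueError("base must be positive")
    return math.exp(x * math.log(a))

a, x, y = 2.5, 1.3, -0.4
print(power(a, x) * power(a, y), power(a, x + y))   # index law 1: the two values agree
print(power(2.0, 0.5) ** 2)                         # a^(1/2) squared is close to a
print(power(2.0, math.sqrt(2.0)))                   # a non-rational power of 2
```

The second line of output illustrates the agreement between the formal definition and the common-sense meaning of a square root.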
18.2.15 EXERCISE Check out the proofs of 2 and 3 above.
3 The point is that we don’t need ln and exp in order to define a^n when n is a positive integer:
it just means write a down n times and multiply the lot.
4 Again, we don’t need ln and exp in order to define an mth root for a when m is a positive
integer: it just means the (positive) number whose mth power is a.
5 Similar remarks.
6 Similar remarks.
18.2.16 EXERCISE
• For a given real number a, use the one-sided version of l’Hôpital’s Rule (see
paragraph 16.2.6) to determine
$$\lim_{x \to 0^+} \frac{\ln(1 + ax)}{x}.$$
• Now use the sequence-based description of one-sided limits (see 10.3.6) to
deduce that
$$\lim_{n \to \infty} \frac{\ln\left(1 + \frac{a}{n}\right)}{\frac{1}{n}} = a.$$
• Lastly, use continuity of the exponential function to deduce that
$$\left(1 + \frac{a}{n}\right)^n \to e^a \quad \text{as } n \to \infty.$$
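A short numerical experiment shows how slowly the sequence in the last bullet point approaches its limit. This is only an illustration: the value a = 1.5, the chosen values of n and the name `compound` are arbitrary.

```python
import math

def compound(a, n):
    """The n-th term (1 + a/n)^n of the sequence in the last bullet point."""
    return (1.0 + a / n) ** n

a = 1.5
for n in (10, 1_000, 100_000, 10_000_000):
    term = compound(a, n)
    print(n, term, math.exp(a) - term)   # the gap to e^a shrinks roughly like 1/n
```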
18.2.17 EXERCISE Compute the Taylor series of ln x about x = 1, and confirm
that it converges (at least) everywhere in the interval (1/2, 3/2).
18.2.18 EXERCISE Compute the Taylor series of exp x about x = 0, and confirm
that it converges everywhere on the real line.
18.3 Trigonometric functions

Surprisingly, the elementary trig functions – which most of us thought we had
understood perfectly well at school – are harder to define properly than ln and exp
turned out to be. Indeed, for any sort of thorough-going treatment of them, we
need to start by unambiguously defining the number π . Of the two most obvious
ways to do this (area of unit circle, or half-circumference of unit circle) it is the
former that sits more easily with the Riemann approach to integral visualised as
area under graph. The unit circle, crisply described by the equation
$$x^2 + y^2 = 1,$$
can be seen as composed of the graphs of two functions: the upper semicircle
y = f₁(x) = √(1 − x²) lying above the horizontal axis, and the lower semicircle
y = f₂(x) = −√(1 − x²) lying below it. By the obvious symmetry, the whole area
is twice that of the region between the graph of f₁ and the horizontal axis, so
18.3.1 Definition
$$\pi = 2\int_{-1}^{1} f_1 = 2\int_{-1}^{1} \sqrt{1 - x^2}\,dx.$$
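Because this definition is an integral, it can be approximated directly. The sketch below uses the midpoint rule, which is an illustrative choice rather than the book's method; the name `pi_from_area` and the number of strips are assumptions, and `math.pi` appears only for comparison.

```python
import math

def pi_from_area(n=200_000):
    """Approximate 2 * (integral of sqrt(1 - x^2) over [-1, 1]) by the midpoint rule."""
    h = 2.0 / n                          # width of each strip
    total = 0.0
    for k in range(n):
        x = -1.0 + (k + 0.5) * h         # midpoint of the k-th strip
        total += math.sqrt(1.0 - x * x)
    return 2.0 * total * h

print(pi_from_area(), math.pi)
```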
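[Figure: the unit circle as the graphs y = f₁(x) and y = f₂(x) over [−1, 1]; π is twice the ‘shaded area’]
[Figure: the unit circle with centre O, vertical chord PP′ meeting the horizontal axis at Q, and angle θ = POQ; A is the ‘shaded area’]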
18.3.2 Comments To see the roughwork that lies behind the next definition, take
a look at the second diagram above, where the unit circle (centre 0, radius 1) is
cut by the vertical chord PP′ that crosses the horizontal axis at Q (so all three
points P, Q, P′ have the same first coordinate x), the sector POP′ (shaded) has area
A and the angle POQ is designated θ. Allowing ourselves to think, for just one more
paragraph, that we actually did understand basic trigonometry years ago, what is
the relationship between x and A? Well, θ is the angle (within the acceptable range)
whose cosine is x and, by the ‘½r²θ’ formula for sector area, A is ½ · 1² · (2θ) = θ.
Therefore
$$A = \cos^{-1} x.$$
This formula⁷ would be fine for determining A from x provided we knew
exactly what was meant by cosine but, at this point in the text, our reliable formal
knowledge is (thanks to Riemann integration again) how to define areas. So it is
really the A in that last display that we thoroughly understand, and we can now
view that understanding as being an equally precise definition of cos⁻¹. Once that
is clear (and once we check that the idea of inverse function is legitimate here) cos
can be defined as the inverse of cos⁻¹, and the rest of the details about the trig
functions will fall into place in a more straightforward manner.
Two more small simplifications before we proceed: firstly, the triangular area
POP′ could indeed be calculated via integration as twice the area lying under the
straight-line graph OP, but there is no need to use such heavy machinery since half
base times perpendicular height provides a perfectly adequate alternative; thus, the
area of the triangle is x√(1 − x²), and the sector area A is
$$x\sqrt{1 - x^2} + 2\int_x^1 \sqrt{1 - u^2}\,du.$$
(We have had to use a letter other than x for the variable of integration, since x has
been assigned already to denote the first coordinate of P.)
Secondly, although the above diagram (and our thinking that it supported)
tacitly assumed that θ was less than a right angle, the displayed formula for area
A remains valid if θ lies between π/2 and π: for now, the sector area is twice the
area under the graph of √(1 − x²) minus the triangular area POP′, and that minus
is picked up by the fact that the first x in the displayed formula is now negative
(please refer to the following diagram).
[Figure: the corresponding diagram with x < 0; A is again the ‘shaded area’]
7 Of course, it is common practice to use the notation arccos x instead of cos⁻¹ x, but here we
have opted for the latter in order to stress the importance of invertible functions in this approach.
18.3.3 Definition We define the function A : [−1, 1] → R via the formula
$$A(x) = x\sqrt{1 - x^2} + 2\int_x^1 \sqrt{1 - u^2}\,du, \qquad -1 \le x \le 1.$$
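The function A can be tabulated numerically and compared with a library inverse cosine. This is a sketch under stated assumptions: the midpoint rule and the name `sector_area` are illustrative, and `math.acos` is used purely as an independent check on the roughwork of 18.3.2.

```python
import math

def sector_area(x, n=20_000):
    """A(x) = x*sqrt(1 - x^2) + 2 * (integral from x to 1 of sqrt(1 - u^2) du).

    The integral is approximated by the midpoint rule with n strips.
    """
    if not -1.0 <= x <= 1.0:
        raise ValueError("A is defined on [-1, 1]")
    h = (1.0 - x) / n
    integral = sum(math.sqrt(1.0 - (x + (k + 0.5) * h) ** 2) for k in range(n)) * h
    return x * math.sqrt(1.0 - x * x) + 2.0 * integral

for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(x, sector_area(x), math.acos(x))   # the last two columns should nearly agree
```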
18.3.4 EXERCISE
1. Verify that, on the open interval (−1, 1), A is differentiable and its
derivative is
$$A'(x) = -\frac{1}{\sqrt{1 - x^2}}.$$
(You will need to use only the product rule and the chain rule to differentiate
x√(1 − x²). Then express $\int_x^1 \sqrt{1 - u^2}\,du$ as $-\int_1^x \sqrt{1 - u^2}\,du$, and appeal to the
fundamental theorem of calculus.)
2. Check that A(−1) = π and that A(1) = 0, and show that
$$\lim_{x \to -1^+} A(x) = \pi \quad \text{and} \quad \lim_{x \to 1^-} A(x) = 0.$$
18.3.5 Note The essence of the above Exercise is that A is the sort of function
to which we can apply the first mean value theorem: it is continuous on [−1, 1],
differentiable on (−1, 1) and its derivative is always less than zero here, so it is
strictly decreasing and therefore one-to-one, and its range is precisely the interval
[0, π ]. Viewing it therefore as a map
A : [−1, 1] → [0, π ],
it is a bijection, and possesses an (also strictly decreasing) inverse mapping A⁻¹
from [0, π] to [−1, 1]. This inverse, when we have appropriately extended it
over the whole real line, is our (logically watertight) definition of the cosine
function.
18.3.6 Definition The real function cos (cosine) is defined as follows (using the
notation from above):
1. For 0 ≤ x ≤ π, cos x = A⁻¹(x).
2. Then for −π ≤ x ≤ 0, cos x = cos(−x).
3. Then for each integer n, cos(x + 2nπ ) = cos x.
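The three stages of this definition can be followed in code. The sketch below is an illustration only: it approximates the integral in A by the midpoint rule and inverts A by bisection (possible because A is strictly decreasing); the names `A`, `PI` and `cosine` and the iteration counts are assumptions, and `math.cos` is used only for comparison.

```python
import math

def A(x, n=4_000):
    """Sector-area function A on [-1, 1]; the integral uses the midpoint rule."""
    h = (1.0 - x) / n
    integral = sum(math.sqrt(1.0 - (x + (k + 0.5) * h) ** 2) for k in range(n)) * h
    return x * math.sqrt(1.0 - x * x) + 2.0 * integral

PI = A(-1.0)   # A(-1) is twice the area of the upper half-disc, which is pi by definition

def cosine(x):
    """cos x assembled in the three stages of the definition."""
    x = (x + PI) % (2.0 * PI) - PI       # stage 3: reduce to [-pi, pi) by periodicity
    x = abs(x)                           # stage 2: cos x = cos(-x)
    lo, hi = -1.0, 1.0                   # stage 1: solve A(c) = x for c in [-1, 1]
    for _ in range(40):
        mid = (lo + hi) / 2.0
        if A(mid) > x:
            lo = mid                     # A is decreasing, so the solution lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0

for t in (0.0, 1.0, 2.0, -1.0, 7.0):
    print(t, cosine(t), math.cos(t))
```

The agreement is limited by the accuracy of the numerical integration, so only the first few decimal places should be expected to match.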
18.3.7 Remarks
1. Although it is not strictly part of the definition, you can follow the ‘evolution’
of cosine through points 1, 2 and 3 above by looking at the three phases of the
diagram supplied:
[Figure: three sketch graphs of cos x, first on [0, π], then on [−π, π], then extended periodically]
2. The inequality −1 ≤ cos x ≤ 1 (for all real x) is built into the definition, and so
is cos(x + 2nπ ) = cos x (for all real x and integer n).
3. We also record that cos, when restricted to the interval [0, π], has an inverse
cos⁻¹ : [−1, 1] → [0, π] (namely, the function A) whose derivative is
$$(\cos^{-1})'(x) = -\frac{1}{\sqrt{1 - x^2}}$$
for every x ∈ (−1, 1).
Having formally defined one of the trig functions, we can now create the others
from it quite routinely:
18.3.8 Definition The real function sin (sine) is defined as follows:

1. For 0 ≤ x ≤ π, sin x = √(1 − (cos x)²).
2. Then for −π ≤ x ≤ 0, sin x = −sin(−x).
3. Then for each integer n, sin(x + 2nπ ) = sin x.
18.3.9 Remarks
1. Again, it may be helpful to follow the evolution of sine through points 1, 2 and
3 above by looking at the three phases of the sketch graphs provided:
[Figure: three sketch graphs of sin x, first on [0, π], then on [−π, π], then extended periodically]
2. The inequality −1 ≤ sin x ≤ 1 (for all real x) is built into the definition, as are
the equalities sin(x + 2nπ ) = sin x (for all real x and integer n) and
(sin x)² + (cos x)² = 1 (for all real x).
3. It is, as you will almost certainly be aware, conventional to write sin x and cos x
rather than sin(x) and cos(x) (and similarly for the other trig functions, and
also for ln) provided that the argument x is a single letter, but beware of the
dangers of extending this custom to cases where the argument is
typographically complex. The symbol sin π x is capable of being interpreted
either as sin(π x) or as (sin π )x, so use bracketing to prevent the ambiguity. It
is also conventional (so long as n is a positive integer) to denote the nth power
of sin x and of cos x not as we have done above, but rather as sinⁿ x and cosⁿ x.
Clearly, one should carefully avoid creating confusion by doing this for
n = −1.
18.3.10 Theorem
1. For 0 < x < π, cos′(x) = −sin x.
2. For −π < x < 0, cos′(x) = −sin x.
3. For all real x except multiples of π, cos′(x) = −sin x.
4. For all real x, cos′(x) = −sin x.
Proof
1. For 0 < x < π , we need only invoke once more the theorem on differentiating
an inverse function to see that
$$\cos'(x) = (A^{-1})'(x) = \frac{1}{A'(\cos x)} = -\sqrt{1 - (\cos x)^2} = -\sin x.$$
2. For −π < x < 0, the result follows from part 1 because we defined cosine as
an ‘even’ function and sine as an ‘odd’ function.
3. For all real x except multiples of π , the periodicity of both sine and cosine
extends the validity of parts 1 and 2.
4. When x is a multiple of π , we can ‘patch’ the desired equality by an appeal to
Theorem 12.3.20.
18.3.11 EXERCISE Starting with sin x = √(1 − cos² x) on the interval (0, π), show
that sin′(x) = cos x for all real x. You may expect to have to extend the result from
(0, π) to R in stages, as in the preceding theorem.
The addition formulas for sine and cosine (the identities for sin(x + y) and
cos(x + y)) are not very obvious in the present approach, but can be established
indirectly by one of the nicest instances of rabbit-out-of-hat mathematics (the sort
of argument that is clear afterwards, but completely invisible beforehand) that you
are ever likely to see:
18.3.12 Lemma 1 Suppose that a function f is twice differentiable on R, and
that
• f″ + f = 0 everywhere in R,
• f(0) = 0, and
• f′(0) = 0.
Then f is the zero function (f is zero everywhere in R).
Proof
Consider the function g(x) = f(x)² + f′(x)². Routine differentiation shows that
g′(x) = 2f′(x)(f(x) + f″(x)) = 0 everywhere, so g is a constant function. Now the second and third bullet
points tell us that its constant value is zero. Since 0 ≤ f(x)² ≤ g(x) = 0 for every x, f is zero everywhere.
18.3.13 Lemma 2 Suppose that a function f is twice differentiable on R, and
that
• f″ + f = 0 everywhere in R,
• f(0) = a, and
• f′(0) = b
where a, b are constants. Then f(x) = a cos x + b sin x everywhere in R.
Proof
Consider the function h(x) = f (x) − a cos x − b sin x. Apply Lemma 1 to h and we
find that it is the zero function.
18.3.14 Theorem For all real x, y we have

1. sin(x + y) = sin x cos y + cos x sin y,
2. cos(x + y) = cos x cos y − sin x sin y.
Partial proof
Consider y as fixed for the moment and define f by f(x) = sin(x + y). Differentiating
twice shows that f″(x) = −sin(x + y) = −f(x), and also notice that f(0) = sin y and
f′(0) = cos y. By Lemma 2, f(x) = sin y cos x + cos y sin x for every real x. Since y
was arbitrary, part 1 is established.
18.3.15 EXERCISE Prove part 2 also.
18.3.16 Note At this point, the tasks of defining and differentiating the functions
tan, sec, cot and cosec, and their inverses where appropriate, are pedestrian and
can safely be left unless and until there is a need for them.
18.3.17 EXERCISE
1. Verify that the function tan, defined (initially) on the interval (−π/2, π/2) by the
formula
$$\tan x = \frac{\sin x}{\cos x},$$
is continuous and differentiable, with (positive) derivative (cos x)⁻², and has
range R; also that its inverse arctan or tan⁻¹ : R → (−π/2, π/2) is continuous
and differentiable, and that its derivative is given by
$$(\tan^{-1})'(x) = \frac{1}{1 + x^2}.$$
2. Check that the radius of convergence of the power series
$$1 - \frac{t}{3} + \frac{t^2}{5} - \frac{t^3}{7} + \frac{t^4}{9} - \cdots$$
is 1: so that, in particular, the series
$$1 - \frac{x^2}{3} + \frac{x^4}{5} - \frac{x^6}{7} + \frac{x^8}{9} - \cdots$$
and
$$x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \frac{x^9}{9} - \cdots$$
are absolutely convergent, the second one to some function f(x), on (at least)
the interval (−1, 1).
3. Appeal to the theorem on differentiation of power series to show that
$$f'(x) = 1 - x^2 + x^4 - x^6 + x^8 - \cdots = \frac{1}{1 + x^2} \qquad (-1 < x < 1).$$
4. Deduce that (for −1 < x < 1):
$$x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \frac{x^9}{9} - \cdots = \tan^{-1}(x).$$
5. Use this to justify what we claimed in paragraph 1.6. Also use the fact that
tan(π/6) = 1/√3 to obtain a (numerically simple) series whose sum is π/√12.
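Substituting x = 1/√3 into the series of part 4 and multiplying through by √3 gives 1 − 1/(3·3) + 1/(5·3²) − 1/(7·3³) + ···, whose sum should be (π/6)·√3 = π/√12. The sketch below sums the first few terms to check this numerically; the function name `pi_over_root12` and the number of terms are illustrative, and `math.pi` is used only for comparison.

```python
import math

def pi_over_root12(terms=40):
    """Partial sum of 1 - 1/(3*3) + 1/(5*3^2) - 1/(7*3^3) + ... ."""
    return sum((-1) ** k / ((2 * k + 1) * 3 ** k) for k in range(terms))

approx = pi_over_root12()
print(approx * math.sqrt(12.0), math.pi)   # the two values should agree closely
```

Each term is smaller than the previous one by a factor of more than 3, so a few dozen terms already exhaust double-precision accuracy.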
18.3.18 Valedictory Thank you for travelling with us!

19 Exercises: for additional practice
These further exercises are presented broadly in line with the order in which their
associated material occurs in the main text, but you should be aware that analysis
is a profoundly interconnected subject, so that ideas from an earlier or a later
section than the one that seems to be central to a particular question may well
turn out to be valuable in crafting a good answer. Specimen solutions to these
problems are available to instructors via the publishers: please visit the webpage
www.oup.co.uk/companion/McCluskey&McMaster to find out how to seek access
to these.
1. How far along the list of numbers
0.3, 0.33, 0.333, 0.3333, . . .
should we go so that, from that point onwards, every number we meet is an
approximation to 1/3 with error smaller than 10⁻⁸?
2. How far along the list of numbers
1 − 1/3 − 1/9, 1 − 1/5 − 1/25, 1 − 1/7 − 1/49, 1 − 1/9 − 1/81, . . .
can we go and be certain that, from then on, all the numbers we find are
approximations to 1 whose errors are less than 10⁻⁶?
3. For the list of numbers
10 + 1/2, 10 + 2/5, 10 + 3/10, 10 + 4/17, · · · , 10 + n/(1 + n²), · · ·
how far along should we go if we need to be sure that, from then on, each
number we encounter differs from 10 by less than 0.000 003?
4. Find a stage along the list of numbers

√(9 + 1⁻²), √(9 + 2⁻²), √(9 + 3⁻²), √(9 + 4⁻²), . . .
after which we can be sure that each of these approximations to 3 will have
error less than 10⁻⁵.
2 5
5. Prove, by the definition of limit of a sequence, that 2 − + 4 → 4 (as
n n
n → ∞).
6. Use the definition of convergence of a sequence to a limit to prove that
(a) (7 − 5n³)/(2n³) → −5/2,
(b) 3 + 4/n − 5/n² → 3.
7. Use the definition of convergence to a limit to prove that
(a) 1/(5n − 1) → 0,
(b) 1/(n√n) → 0,
(c) 1/(n² − 30πn) → 0.
8. Let (x_n)_{n∈N} be a given convergent sequence of real numbers whose limit is
ℓ. Prove, directly from the definition of convergence, that 6x_n → 6ℓ.
9. Show via the definition of convergence that if x_n ≥ 0 for every n ∈ N and
√(4 + x_n) → 2, then x_n → 0.
10. (The arithmetic mean – geometric mean inequality, more briefly called the
AM – GM inequality.)
(a) If x ≥ 0 and y ≥ 0, prove that √(xy) ≤ (x + y)/2.
(Hint: begin by noticing that (√x − √y)² ≥ 0.)
(b) Use part (a) to deduce that, for every four non-negative numbers w, x, y
and z, we have $\sqrt[4]{wxyz} \le (w + x + y + z)/4$.
(Hint: you can apply part (a) to w and x, and then to y and z. Can it then
be applied to √(wx) and √(yz)?)
(c) Note that these are particular cases of a more general result: for any
positive integer n and any list $a_1, a_2, a_3, \cdots, a_n$ of non-negative numbers,
we have
$$\sqrt[n]{a_1 a_2 a_3 \cdots a_n} \le (a_1 + a_2 + a_3 + \cdots + a_n)/n.$$
11. The harmonic mean of two positive numbers a and b is defined to be the
reciprocal of the arithmetic mean (that is, the average) of their reciprocals.
Investigate whether this is greater or smaller than their geometric mean √(ab).
12. Using various parts of the algebra of limits theorem, determine the limits (as
n → ∞) of the sequences whose nth terms are as follows:
(a) (7n³ − 4n² + 5)/(2 + 2n − n³),
1
(b) ,
+2
n2
2
1 2 3n + 1
(c) 1 + − 2 .
n n 2n2 − 1
13. Use the algebra of limits to determine limn→∞ an and limn→∞ bn where

$$a_n = \frac{3n + 4n^2 + 5n^4}{(6 + 7n^2)^2}, \qquad b_n = (23 - 7n + 2n^2)\left(\frac{2 - n}{5 - n^2}\right)^2.$$
14. Using various parts of the algebra of limits theorem, determine the limits (as
n → ∞) of the sequences whose nth terms are as follows:
(a) 3 + 5/n,
(b) (6n + π²)/(5n),
(c) 2 + 2/n − 3/n⁴,
(d) (6n³ + 4n² − 1)/(17 − 7n + 2n³),
(e) (1 + n)/(1 + n + n²),
(f) ((3n + 2)/(4 − n))⁵.
15. Put x_n = −1 − 3/n + 2/n² (for each positive integer n). By simplifying the
difference x_{n+1} − x_n, show that (x_n) is an increasing sequence.
16. Show that the sequence (c_n) described by c_n = 2 + 1/n − 1/n² is decreasing
provided that n ≥ 2.
17. Let us denote (for each positive integer n) by x_n the number 4 + 5/n − 2/n².
Show that every x_n satisfies the inequality −12 ≤ x_n ≤ +12. By simplifying
the difference x_{n+1} − x_n, show that (x_n) is a decreasing sequence.
18. Notice first that 2 < 3, 4 < 5, 8 < 9, 16 < 17, . . . , 2ⁿ < 2ⁿ + 1.
Consequently, 1/2 > 1/3, 1/4 > 1/5, 1/8 > 1/9, 1/16 > 1/17, . . . ,
2⁻ⁿ > 1/(2ⁿ + 1). Now put
x_n = 1/3 + 1/5 + 1/9 + 1/17 + . . . + 1/(2ⁿ + 1)
for each positive integer n. Show that the sequence (x_n)_{n∈N} is bounded. Also
check that it is increasing. Why must it converge?
19. Show that the sequence (n − √n)_{n∈N} is increasing, and not bounded.
20. Explain (in terms of sequence limits) the meaning of the recurrent decimals
0.44444 . . . and 0.2136363636 . . . and express each of them as a rational
number (a fraction in the usual sense of that word).
21. Explain the meaning of (and evaluate as a fraction) the recurring decimal
1.281818181 . . .
22. Prove that a decreasing sequence that is bounded below must converge, and
that its limit is the infimum of the set of all its terms.
23. Use the squeeze to find the limits of the sequences whose typical terms are:
(a) $x_n = \sqrt[n]{4^n + 6^n}$,
(b) $y_n = \dfrac{3n + 5\sin(n^2 + 2)}{1 - 6n}$.
24. Use the squeeze to show that the sequences whose nth terms are as follows
are convergent:
(−1)n +4n sin(n12 − 2n7 )
2
3n
(a) + ,
1 − 4n
√ n
(b) $\sqrt[n]{3^n + 5^n + 8^n}$, assuming that $\sqrt[n]{a} \to 1$ for each positive constant a.
(We prove this result in paragraph 6.2.3.)
25. Find the limit as n → ∞ of the sequence (√(5n + 9) − √(5n + 4)).
26. Find the limit (as n → ∞) of √(3n + 2) − √(3n − 2).
27. We are given three sequences (a_n)_{n∈N}, (b_n)_{n∈N} and (c_n)_{n∈N} and we are told
only that a_n → ℓ, c_n → 0 and |a_n − b_n| ≤ |c_n| for every n ∈ N. Prove that
b_n → ℓ.
28. Use the squeeze to show that the sequences whose nth terms are as follows
are convergent:
(a) π/4 + (−1)ⁿ/√n,
(b) $\sqrt[n]{1000 + 3^n}$. You may assume that, for any positive constant k that you
choose, we have $\sqrt[n]{k} \to 1$.
29. (a) Let there be given a sequence (x_n)_{n∈N} for which the subsequence
(x_{2n−1})_{n∈N} of all odd-numbered terms and the subsequence (x_{2n})_{n∈N} of
all even-numbered terms both converge to the same limit ℓ. Prove that
the entire sequence (x_n)_{n∈N} also converges to ℓ.
(b) For the sequence (a_n)_{n∈N} described by
$$a_n = \begin{cases} \dfrac{3 + 7n - n^2}{2n^2 + n + 12} & \text{if } n \text{ is odd}, \\[2ex] \dfrac{(0.7)^n - 1}{(0.6)^n + 2} & \text{if } n \text{ is even}, \end{cases}$$
use part (a) to prove that it converges and to determine its limit.
30. Consider the sequence (x_n) described by x_n = 1 − 1/k if n is the kth prime
number, x_n = 1 + 1/k if n is the kth non-prime positive integer. Write down
the first twelve terms of the sequence (x_n). How large a value of n will
guarantee that |x_n − 1| < 0.01?
31. Show that the sequences whose nth terms are as follows are unbounded:
(a) (−1)ⁿ√n,
(b) (n² + 1)/(n + 5).
32. Show that the sequences whose nth terms are as follows are unbounded:
√
(a) 3 n − 1000,
(b) (1 + n²)/(5 − 8n).
33. Show that the following sequences are unbounded:
(a) 2n + 5 + 8 sin(nπ/17),
(b) (1 − n²)/(1 + 2n).
34. Write down the total of the following list of numbers:
1 + 1/3 + 1/9 + 1/27 + . . . + 1/3ⁿ
for an arbitrary positive integer n. Using this (or otherwise) show that the
sequence (x_n) defined by the formula
$$x_n = 1 + \frac{2}{3^2} + \frac{8}{3^4} + \frac{26}{3^6} + \ldots + \frac{3^n - 1}{3^{2n}}$$
is bounded. Now use this to prove that it must be convergent.

35. Put x_n = 1/1 + 1/3 + 1/5 + 1/7 + . . . + 1/(2n − 1) (for each positive integer n). By
showing that (x_n) is not bounded, prove that it cannot be convergent.
36. Show that each of the sequences whose nth terms are as follows is divergent
(hint: consider suitable subsequences of each in turn):
(a) ((−2)ⁿ − 1)/(2ⁿ + 1),
(b) sin(nπ/2).
37. Show that the following sequences are divergent:
(a) cos(3nπ/4),
(b) $(-1)^n + (-1)^{\lfloor n/2 \rfloor}$, where ⌊t⌋ denotes the floor (the integer part) of t.
38. Show that each of the sequences whose nth terms are as follows is divergent
(hint: consider suitable subsequences of each in turn):

(a) 2π + (−1)ⁿ (3 − 2/(n + 5)),
(b) 1 + cos((n + 3)π/4).
39. Construct a sequence (xn )n≥1 such that there are ten different numbers that
are limits of subsequences of (xn ).
40. The sequence (b_n)_{n∈N} is defined recursively by the two formulae
b_1 = 3, b_{n+1} = √(6b_n − 8). Show that
(a) 3 ≤ bn < 4 for all n ∈ N,
(b) (bn )n∈N is an increasing sequence,
(c) (bn )n∈N converges,
and determine what its limit is.
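A quick numerical experiment can make the claims in parts (a) to (c) plausible before proving them. The sketch below assumes the recursion reads b_{n+1} = √(6b_n − 8), the reading consistent with the bound 3 ≤ b_n < 4; the number of iterations and the variable names are arbitrary choices.

```python
import math

b = 3.0                      # b_1
terms = [b]
for _ in range(30):
    b = math.sqrt(6.0 * b - 8.0)   # assumed recursion b_{n+1} = sqrt(6 b_n - 8)
    terms.append(b)

print(terms[:5])             # the first few terms increase
print(terms[-1])             # later terms settle towards a single value
```

Numerical evidence of this kind suggests what to prove; the proof itself still needs the induction and the monotone convergence argument.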
41. The sequence (b_n)_{n∈N} is defined recursively by the two formulae
b_1 = 6, b_{n+1} = √(4b_n + 32) (each n ∈ N). Show that
(a) 6 ≤ bn < 8 for all n ∈ N,
(b) (bn )n∈N is an increasing sequence,
(c) (bn )n∈N converges;
also determine its limit.
42. The sequence (xn ) is defined recursively thus:

x_1 = 20, x_{n+1} = √(13x_n − 36) (n ≥ 1).
Show that 9 < xn ≤ 20 for all n and that (xn ) is a decreasing sequence. Then
show that it converges and determine its limit.
43. Does the following sequence converge? If so, what is its limit?
$$\left(\sqrt{12},\ \sqrt{12 + \sqrt{12}},\ \sqrt{12 + \sqrt{12 + \sqrt{12}}},\ \sqrt{12 + \sqrt{12 + \sqrt{12 + \sqrt{12}}}},\ \ldots\right)$$
44. The sequence (dn )n∈N is defined recursively by the two formulae
d_1 = 2/3, d_{n+1} = (2 + 2d_n)/(3 + d_n) (for each n ∈ N).
Show that
(a) 2/3 ≤ d_n < 1 for all n ∈ N,
(b) (d_n)_{n∈N} is an increasing sequence,
(c) (d_n)_{n∈N} converges, and determine its limit.
45. The sequence (xn )n∈N is defined recursively by the two formulae
x_1 = 2, x_{n+1} = 2/(1 + x_n) (for each n ∈ N).
Obtain a formula for xn+2 in terms of xn . Think what this tells you about the
subsequence of odd-numbered terms, and about the subsequence of
even-numbered terms. Use Exercises 44 and 29 to determine the limit of the
sequence (xn )n∈N .

46. Give an example of two divergent series ∑ a_n and ∑ b_n for which
∑ (a_n + b_n) is convergent.
47. Find (if it exists) the limit (as n → ∞) of
nπ
(a) n 100 + sin v
17
2n
(b) √
n! + n!
√
(c) n 60.5n + 3n
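(d) $\left(\dfrac{n^3 + n}{n^3}\right)^{2n^2 + n}$
(Hint: the tricky part is to investigate $\left(1 + \frac{1}{n^2}\right)^{n}$. Once you have shown
that $\left(1 + \frac{1}{n^2}\right)^{n^2}$ converges, you know that it is bounded: there is a
constant K such that $\left(1 + \frac{1}{n^2}\right)^{n^2} < K$ for all n; therefore
$\left(1 + \frac{1}{n^2}\right)^{n} < \sqrt[n]{K}$ . . .)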
48. Determine the limit as n → ∞ of the sequences whose nth terms are as
given:

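(a) $\left(1 - \dfrac{4}{n^2}\right)^{n}$
(b) $\left(1 + \dfrac{3}{n}\right)^{2n+5}$
(c) $\dfrac{3^n}{n! + n^2}$.
(d) $\dfrac{5n^2}{3n^3} + \dfrac{5n^2 - 1}{3n^3 + 2} + \dfrac{5n^2 - 2}{3n^3 + 4} + \dfrac{5n^2 - 3}{3n^3 + 6} + \ldots + \dfrac{5n^2 - n}{3n^3 + 2n}$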
49. What are the limits of the following sequences?
(a) $\sqrt[3n^2 + n - 2]{123.47}$
(b) $\left(((n!)!)^{-\frac{1}{n!}}\right)$, that is, $\dfrac{1}{\sqrt[n!]{(n!)!}}$
(c) $\dfrac{(3\pi)^n}{n!}$
(d) $\dfrac{5^{2n+1}}{(2n + 1)!}$
(e) $\left(1 + \dfrac{3}{n}\right)^{n}$
50. Determine the limits of the sequences whose typical terms are as presented
below.
(a) $\left(1 - \dfrac{0.5}{n}\right)^{n^2 + n + 10}$
(b) √(n + 7) − √(n + 2)
√
(c) 2n+3 10n + 5
51. Find the limit of the sequences whose nth terms are as given:

(a) $\left(1 + \dfrac{6}{5n}\right)^{3n+17}$,
(b) $\left(\dfrac{n}{n + 2}\right)^{3n-1}$,
(c) $\dfrac{n^2}{n^3 + 1} + \dfrac{n^2 - 2}{n^3 + 4} + \dfrac{n^2 - 4}{n^3 + 7} + \cdots + \dfrac{n^2 - 2n}{n^3 + 3n + 1}$.
52. Prove or disprove the following statements concerning a general sequence
(x_n)_{n∈N}:
• If |x_n| → √2 then either x_n → √2 or x_n → −√2;
• x_n³ → 64 ⇒ x_n → 4.
53. The following sequence (xn )n∈N :
0, 1, 1, 2, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, . . .
is defined by specifying x_n = 1 if n is not a power of 2 but x_n = k whenever
n = 2^k. Decide (with proof) whether the sequence is convergent or
divergent.
54. We are given two sequences of positive real numbers (an )n∈N , (bn )n∈N such
that (bn )n∈N is convergent and (a1 + a2 + a3 + . . . + an ) ≤ bn for every
positive integer n. Prove that (an )n∈N must converge, and determine its
limit.
55. The sequence (cn )n∈N is defined recursively by the two formulae
c_1 = 2, c_{n+1} = (2 + 2c_n)/(3 + c_n) (for each n ∈ N).
Show that
(a) 1 < cn ≤ 2 for all n ∈ N,
(b) (cn )n∈N is a decreasing sequence,

(c) (cn )n∈N converges,
and determine its limit.
√
56. Does the sequence whose nth term is 4n + 7 converge?
3n+1
57. We are given a bounded sequence (yn ) such that, for every constant K, the
sequence (sin(Kyn )) converges. Prove that (yn ) must also converge.
(Suggestion: try proof by contradiction.)
58. Given that ∑ 1/n² converges, use the comparison test to prove convergence
for
(a) ∑ (5n − 1)/(3n³ + 1),
(b) ∑ (5n + 1)/(3n³ − 1).
59. Given that the series ∑ n^(−3/2) converges, use the comparison test to show that
each of the following also converges:
(a) $\sum \dfrac{3n - 2}{\sqrt{n}\,(2n^2 + 1)}$,
(b) $\sum \dfrac{n^2 + n + 1}{(\sqrt{n})^7 + 13}$.
60. Use the limit comparison test to decide, for each of the following series,
whether it converges or diverges:
(a) $\sum n^{-2 + \frac{1}{n}}$
(b) $\sum \dfrac{100^{1/n} + (n!)^{-1/n}}{n^{1 + 2/n}}$.
61. Use the limit comparison test to decide, for each of the following series,
whether it converges or diverges:
(a) $\sum \dfrac{n^3\sqrt{n} + 5}{2n^4 - 7}$
(b) $\sum \dfrac{n^3 + 5}{2n^4\sqrt{n} - 7}$.
62. Does the following series converge or diverge? Give reasons for your answer.
$$\sum \left(1 - \frac{1}{2n + 1}\right)^{n^2}.$$
63. For which positive values of t does
$$\sum \frac{(n + 1)!\,(2n + 1)!\,t^n}{(3n - 1)!}$$
converge?
64. Use the nth -root test to decide whether

2
1 n
1+ n
2
4 n
1+
5n
is convergent or divergent.
65. Use the ratio test of d’Alembert to prove the convergence of
$$\sum \frac{n!\,(2n)!\,(2\pi)^n}{(3n)!}.$$
66. Determine the real number B such that the following series converges for
0 < t B:
4n2 − 1 n
tn .
9n2 − 1
(Note that it is quite difficult to decide whether or not this series converges
when t = B exactly, and you are not asked to investigate this.)
67. Does the series
$$\sum \frac{(n!)^2\,2^{2n}}{(2n)!}$$
converge?
68. For which t > 0 does this series converge?
$$\sum \frac{n!\,(2n)!\,(3n)!}{(6n)!}\,t^n$$
69. For which t > 0 does this series converge?
$$\sum \left(\frac{3n}{3n + 2}\right)^{n^2} t^n$$
70. For which positive values of t is the series $\sum \dfrac{((n + 2)!)^2\,t^n}{(2n + 1)!}$ convergent, and
for which is it divergent?
71. For which values of t is the series $\sum \left(\dfrac{n + 5}{n}\right)^{n^2 + n} |1 - t|^n$ convergent, and
for which is it divergent?
72. Find a positive integer N so large that
$$\sum_{n=65}^{N} \frac{1}{n} > 100.$$
(Suggestion: look at the proof that the harmonic series diverges.)


73. (a) If ∑ a_n is a convergent series and ∑ b_n is a divergent series, prove that
the series ∑ (a_n + b_n) must be divergent.
(b) Give an example of two divergent series ∑ a_n and ∑ b_n for which
∑ (a_n b_n) is convergent.
(c) Give an example of two convergent series ∑ c_n and ∑ d_n for which
∑ (c_n d_n) is divergent.
74. Show that the series
$$\sum (-1)^n\,\frac{3n + 5}{4n^2 + 3}$$
converges.
75. Determine whether the following series converges:

$$\sum (-1)^n \left(1 + \frac{1}{n}\right)^{-n}.$$
76. Decide whether the following series converge:

(a) $\sum (-1)^n\,\dfrac{2n}{n^2 + 4}$,
(b) $\sum (-1)^n\,\dfrac{2n^2}{n^2 + 4}$.

77. (a) Does the series $\sum (-1)^n \left(\dfrac{n - 5}{n}\right)^{n}$ converge or diverge?
(b) Does the series $\sum (-1)^n \left(\dfrac{e^2 + 1}{10}\right)^{n}$ converge or diverge?
78. We define a sequence (xn )n∈N by setting
4
xn = n+1 when n is odd but xn = n 2 when n is even.

Verify that xn → 0 (as n → ∞). Determine whether (−1)n+1 xn is
convergent or divergent.
79. By first factorising the bottom line and using partial fractions, find the sum
of the series
$$\sum_{k=1}^{\infty} \frac{1}{4k^2 + 12k + 5}.$$
80. Let w_n = 1/(n³ − n) for each integer n ≥ 2. Calculate, as a rational in its lowest
terms,
$$\sum_{n=2}^{59} w_n.$$
Show that the series
$$\sum_{n=2}^{\infty} w_n$$
converges, and determine the numerical value of its sum.

81. Given a sequence (x_n)_{n∈N} in the interval [−10, −2], show (using, for
example, the direct comparison test) that $\sum n^{x_n}$ is convergent.
82. There are two sequences of positive real numbers (a_n)_{n∈N}, (b_n)_{n∈N}.
A subsequence $(a_{n_k})_{k∈N}$ of (a_n)_{n∈N} satisfies the condition $a_{n_k} \ge b_k$ for every
k ≥ 1 and the series ∑ b_n is divergent. Prove that ∑ a_n is also divergent.
Suggestion: if not, then the partial sums of ∑ a_n would be bounded . . .
(−1)n
83. Putting tn = −1 + , show that ntn is divergent. Suggestion: use the
3n + 1
result of Exercise 82.
84. Identify the domains of the real functions defined by the following formulae:
(a) $\arcsin(5 + 3x)$,
(b) $\frac{x-1}{(x^2-49)(x^2+3x+2)}$,
(c) $\ln(6 + x - x^2)$,
(d) $\ln\left(\frac{10}{1+x^2}\right)$.
85. Determine the domains of the functions described by
(a) $\frac{x}{x^2+5x-50}$,
(b) $\arccos(e^x)$,
(c) $\ln\left(\frac{x(4-x)}{3}\right)$.
√
86. For the function f (x) = x , where x denotes the floor (or integer part)
of x, find two sequences $(x_n)$, $(y_n)$ such that $x_n \to 4$ and $f(x_n) \to f(4)$, but
$y_n \to 4$ and $f(y_n) \not\to f(4)$.
87. For the function defined by f (x) = x3 , find
(a) a sequence $(x_n)$ such that $x_n \to 0$ and $f(x_n) \to f(0)$,
(b) a sequence $(y_n)$ such that $y_n \to 0$ and $f(y_n) \not\to f(0)$.
88. Prove that p(x) = x4 − 2x2 + 17x − 12 defines a function p that is
continuous at x = 4.
89. Show directly from the definition that the polynomial
g(x) = 2x7 − 15x5 + 22x − 12
is continuous at x = −3.
90. For the function f described by
\[ f(x) = \begin{cases} 20 - x^2 & \text{if } x < 2, \\ 1 + 2x + 3x^2 & \text{if } x \ge 2, \end{cases} \]
find a sequence $(x_n)$ such that $x_n \to 2$ and $f(x_n) \not\to f(2)$.

91. Prove√that the function given by f (x) = x 2 + x2 is not continuous at

x = 3.
√
92. Show that the function specified by h(x) = x 2 is not continuous at 100.
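93. Prove that the ‘step function’
\[ s(x) = \begin{cases} 0 & \text{if } x < 0, \\ 1 & \text{if } x \ge 0 \end{cases} \]
is not continuous.
94. Verify that f is continuous at x = 2, where
\[ f(x) = \begin{cases} 1 + x & \text{if } x \text{ is rational}, \\ 5 - x & \text{if } x \text{ is irrational}. \end{cases} \]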
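95. Verify that g is continuous at x = 0, where
\[ g(x) = \begin{cases} x \sin\frac{1}{x} + x^2 \cos\frac{1}{x} & \text{if } x \ne 0, \\ 0 & \text{if } x = 0. \end{cases} \]
96. Show that the function g described by
\[ g(x) = \begin{cases} x \cos\frac{1}{\pi x} & \text{if } x \ne 0, \\ 0 & \text{if } x = 0 \end{cases} \]
is continuous at 0.
97. Show that the function h given by
\[ h(x) = \begin{cases} 2 + 5x & \text{if } x \in \mathbb{Q}, \\ 10 - 3x & \text{if } x \in \mathbb{R} \setminus \mathbb{Q} \end{cases} \]
is continuous at 1, but not continuous at −1.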

98. Prove that the function
\[ \frac{x^3 - x^2 + 5x - 12}{2x^2 + 3} \]
is continuous on R.
99. Prove that
\[ f(x) = \frac{1 - 2x + 3x^2 - x^5}{2x^2 + 8x + 9} \]
is continuous everywhere on the real line.
100. Suppose that B is a non-empty set of real numbers and that λ = inf(B).
Show that there is a sequence (xn ) of elements of B such that xn → λ.
101. Show that x5 + 15x − 20 has a root in [1, 2].
102. Show that the polynomial 6x4 − 8x3 + 1 has at least two roots in the interval
(0, 2).
103. Prove that x6 − 5x4 + 2x + 1 = 0 has at least four real solutions.
104. Prove that the equation $4x^5 - 8x^3 + 4x - 1 = 0$ has at least three positive
solutions. It may be useful to evaluate the polynomial at $x = \frac{1}{2}$.
105. Show that the graph of the function 7 sin x − 10 cos x − 4x crosses the x-axis
at least twice between 0 and π .
106. Show that the equation x4 + x3 − 8x2 + 1 = 0 has four distinct real
solutions.
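Exercises 101 to 106 all rest on the same step: exhibit points where a continuous function changes sign, then invoke the intermediate value theorem on each subinterval. A small sketch of the bookkeeping follows; the sample points are choices made for this illustration (they are not given in the text), and a sign change only suggests where to apply the theorem, it does not replace the proof.

```python
from fractions import Fraction

def sign_changes(f, points):
    """Count consecutive sample points between which f changes strict sign."""
    values = [f(x) for x in points]
    return sum(1 for u, v in zip(values, values[1:]) if u * v < 0)

half = Fraction(1, 2)  # exact arithmetic at x = 1/2

# Exercise 101: one sign change on [1, 2].
print(sign_changes(lambda x: x**5 + 15*x - 20, [1, 2]))
# Exercise 102: sample points 0, 1, 2.
print(sign_changes(lambda x: 6*x**4 - 8*x**3 + 1, [0, 1, 2]))
# Exercise 104: sample points 0, 1/2, 1, 2.
print(sign_changes(lambda x: 4*x**5 - 8*x**3 + 4*x - 1, [0, half, 1, 2]))
# Exercise 106: sample points -4, -1, 0, 1, 3.
print(sign_changes(lambda x: x**4 + x**3 - 8*x**2 + 1, [-4, -1, 0, 1, 3]))
```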
107. Given that f : [0, π/2] → [0, 1] is continuous, prove (by considering the
function g(x) = f (x) − sin x) that there is a number c in [0, π/2] such that
f (c) = sin c.
108. Given that f : [0, 1] → [0, 1] is continuous, prove that there exists a number
c ∈ [0, 1] such that (f (c))2 + 2f (c) − 4c2 = 0.
109. (a) Suppose that f : [a, b] → R is continuous and never takes the value
zero. Prove that there is δ > 0 such that no value of f (x) lies in the
interval [−δ, δ].
(b) Show by example that the statement in part (a) ceases to be true if we
replace [a, b] by (a, b).
110. Given continuous $f : [a, b] \to \mathbb{R}$, show that there is a positive constant K
such that the function $\sqrt{K + f(x)}$ is defined everywhere on [a, b].
111. Given that f : [a, b] → R is continuous, show that
(a) there is a positive constant K such that the function ln(f (x) + K) is
defined everywhere on [a, b],
(b) there is a positive constant A such that the function arcsin(Af (x)) is
defined everywhere on [a, b].
112. Show by means of examples (preferably, simple ones) that
(a) A continuous function on a bounded open interval can fail to be
bounded,
(b) A continuous function on a bounded open interval, even if it is
bounded, can fail to have a maximum value and can fail to have a
minimum value,
(c) A continuous function on an unbounded closed interval can fail to be
bounded,
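(d) A continuous function on an unbounded closed interval, even if it is
bounded, can fail to have a maximum value and can fail to have a
minimum value,
(e) A discontinuous function on a bounded closed interval can fail to be
bounded,
(f) A discontinuous function on a bounded closed interval, even if it is
bounded, can fail to have a maximum value and can fail to have a
minimum value.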
113. If f : D → C is increasing and g : D → C is decreasing, verify that the
function 2f − 3g is increasing.
114. If f : D → C and g : D → C are both increasing, and k is a positive
constant, show that f + g and kf are both increasing. Also show (by
presenting a suitable counterexample) that the product function fg could fail
to be increasing.
115. If h : D → C and j : C → R are both increasing or both decreasing, show
that the composite function j ◦ h is increasing.
116. Determine all the limit points of $X = \{\frac{1}{n} : n \in \mathbb{N}\}$ and all the limit points of
$Y = \mathbb{Q} \cap (2, 3)$.
117. Determine all the limit points of each of the following sets:
(a) (a, b),
(b) [0, 2) ∪ (2, 3],
(c) Q.

118. Let $g(x) = x^2(x - 1)$ and $f(x) = \sqrt{g(x)}$. What is the domain of f(x)?
Investigate, if possible, the limit of f(x) as x → 1.
Investigate, if possible, the limit of f(x) as x → 0.
119. Sketch the graph of the function $f(x) = x^2(x + 1)(x - 2)$. Now let
$g(x) = \sqrt{f(x)}$:
(a) What is the domain of g?
(b) What is the numerical value of g(2)? Investigate the limit of g(x) as
x → 2.
(c) What is the numerical value of g(0)? Why can we not investigate the
limit of g(x) as x → 0?
120. Use sequences to evaluate
\[ \lim_{x \to 10} \frac{x^3 - 1000}{x^4 - 10000}. \]
121. Use sequences to evaluate
\[ \lim_{x \to 3} j(x) \]
where
\[ j(x) = \begin{cases} \dfrac{x^4 - 81}{x^3 - 27} & \text{while } x \ne 3, \\ -1 & \text{if } x = 3. \end{cases} \]
122. Evaluate (with proof using sequences)
(a) $\lim_{x \to -4} \frac{x^2 - 16}{x + 4}$,
(b) $\lim_{x \to 4} \frac{x^3 - 64}{x^2 - 16}$.
123. For the function f defined by
\[ f(x) = \begin{cases} \dfrac{x^2 + 3x - 10}{x^3 + x^2 - 4x - 4} & \text{if } x \ne 2, \\ \dfrac{1}{2} & \text{if } x = 2, \end{cases} \]
use sequences to evaluate lim f(x) as x → 2.

124. For the function f defined by
\[ f(x) = \begin{cases} \dfrac{\sin(\cos(\sin x))}{2 + \cos(\sin(\cos x))} & \text{if } x \le 1, \\ x^4 - x^3 & \text{if } x > 1, \end{cases} \]
use sequences to evaluate lim f(x) as $x \to \sqrt[3]{2}$.
125. For the function described by
\[ f(x) = \begin{cases} \sin(\sin(\sin(x))) & \text{if } x \in \mathbb{Q} \cap (-\infty, 10), \\ \cos(\cos(\cos(x))) & \text{if } x \in (\mathbb{R} \setminus \mathbb{Q}) \cap (-\infty, 10), \\ 3x - x^3 & \text{otherwise} \end{cases} \]
determine the limit of f(x) as x → 11.

126. The function f is defined by:
\[ f(x) = \begin{cases} x & \text{if } x \text{ is rational with even denominator when in lowest terms}, \\ 2x - 2 & \text{if } x \text{ is rational with odd denominator when in lowest terms}, \\ 3x - 4 & \text{if } x \text{ is irrational}. \end{cases} \]
Show that, in all cases, $|f(x) - 2| \le 3|x - 2|$.

Deduce, via the squeeze, that f(x) → 2 as x → 2.
127. Construct a proof of the result, that if f is a function with domain D, and p is
a limit point of D, and $f(x) \to \ell$ as x → p, then $|f(x)| \to |\ell|$ as x → p.
128. Use sequences to prove that if f : D → C and p is a limit point of D and
$f(x) \to \ell$ as x → p, then $(f(x))^2 + (f(x))^3 \to \ell^2 + \ell^3$ as x → p.
129. Construct a proof of the squeeze rule for limits of functions.
130. A function f is defined as follows:

(a) if x is rational then f (x) = 2x + 7,
(b) if x is irrational but x2 is rational then f (x) = 4x + 5,
(c) in all other cases, f (x) = 11x − 2.
Check that, in all three cases, |f (x) − 9| ≤ 11|x − 1| for all x. Deduce, using
the squeeze, that limx→1 f (x) = 9.
131. Use the epsilon-delta description of limits to prove the following version of
the squeeze: suppose that f, g, h have the same domain D, that p is a limit
point of D, that
\[ f(x) \le g(x) \le h(x) \]
for each x ∈ D, that $f(x) \to \ell$ as x → p and that $h(x) \to \ell$ as x → p; then
also $g(x) \to \ell$ as x → p.
132. Use the epsilon-delta definition to prove the following: if f, g have domain
D, p is a limit point of D, f(x) ≤ g(x) for every x ∈ D and both f and g have
limits at p, then $\lim_{x \to p} f(x) \le \lim_{x \to p} g(x)$. [Hint: if not, then
the number
\[ \frac{\lim_{x \to p} f(x) - \lim_{x \to p} g(x)}{3} \]
is positive. Call it ε. Use the fact that the two limits exist to get two positive
numbers $\delta_1$ and $\delta_2$ and put δ = the smaller of $\delta_1$ and $\delta_2$. Pick a point x ≠ p
of D whose distance from p is less than δ, and look for
a contradiction.]
133. Let $p(x) = 4 - 5x + 7x^2 - 2x^3$. Given a positive number ε, find a positive
number δ such that |x − 2| < δ will guarantee that |p(x) − p(2)| < ε.
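A candidate δ for Exercise 133 can be sanity-checked numerically before writing out the proof. The sketch below assumes one particular choice, $\delta = \min(1, \varepsilon/28)$, obtained from the factorisation $p(x) - p(2) = (x-2)(-2x^2+3x+1)$ and the crude bound $|-2x^2+3x+1| < 28$ for $|x-2| < 1$; other choices of δ are equally valid, and sampling is evidence, not proof.

```python
def p(x):
    return 4 - 5*x + 7*x**2 - 2*x**3

def delta_for(eps):
    """Candidate delta, using p(x) - p(2) = (x - 2)(-2x^2 + 3x + 1)
    and the crude bound |-2x^2 + 3x + 1| < 28 when |x - 2| < 1."""
    return min(1.0, eps / 28)

def holds_on_samples(eps, samples=1000):
    """Test |p(x) - p(2)| < eps at sample points strictly inside (2 - d, 2 + d)."""
    d = delta_for(eps)
    xs = [2 - d + 2 * d * (i + 0.5) / samples for i in range(samples)]
    return all(abs(p(x) - p(2)) < eps for x in xs)

print(all(holds_on_samples(eps) for eps in (50.0, 5.0, 1.0, 0.01)))
```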
134. Let p be the polynomial described by $p(x) = 12 - 5x + x^2 + 4x^3$ and ε > 0
be given. Determine a positive number δ such that the condition |x + 2| < δ
will guarantee that |p(x) − p(−2)| < ε.
135. For the polynomial function $p(x) = 3x^2 - 7x + 5$ and a given positive
number ε, obtain a formula for a positive number δ such that
0 < |x − 2| < δ will guarantee that |p(x) − 3| < ε. Why does it follow from
this that p is continuous at 2?
136. The function f is defined by
\[ f(x) = \begin{cases} \dfrac{3x^2 - 243}{x - 9} & \text{if } x \ne 9, \\ 27 & \text{if } x = 9. \end{cases} \]
Given ε > 0, get a formula for a positive number δ such that
0 < |x − 9| < δ implies that |f(x) − 54| < ε. Is f continuous at x = 9?
137. Consider the function g defined by
\[ g(x) = \begin{cases} \dfrac{5x^2 - 180}{x - 6} & \text{if } x \ne 6, \\ 48 & \text{if } x = 6. \end{cases} \]
For a given positive number ε, obtain a formula for a positive number δ
such that 0 < |x − 6| < δ will guarantee that |g(x) − 60| < ε. Why does it
follow from this that g is not continuous at 6?
138. Show via the epsilon-delta definition of continuity that the function
defined by
\[ f(x) = \begin{cases} (x - 1)^2 \left\lfloor \dfrac{1}{(x - 1)^2} \right\rfloor & \text{if } x \ne 1, \\ 1 & \text{if } x = 1 \end{cases} \]
is continuous at 1.
139. The straight line joining (1, 1) to a nearby point $(x, \sqrt{x})$ on the graph of
$y = \sqrt{x}$ (assuming always x > 0) has gradient $\frac{\sqrt{x} - 1}{x - 1}$. Verify that this
simplifies to $\frac{1}{\sqrt{x} + 1}$. Now show that
\[ \left| \frac{1}{\sqrt{x} + 1} - \frac{1}{2} \right| < \frac{|1 - \sqrt{x}|}{2} < \frac{|1 - x|}{2}. \]
Then use the epsilon-delta definition of limit to determine $\lim_{x \to 1} \frac{\sqrt{x} - 1}{x - 1}$,
that is, the gradient of the curve at (1, 1).
140. Assume throughout this question that x > 0. Verify that
\[ \left| \frac{1}{\sqrt{x} + 3} - \frac{1}{6} \right| < \frac{|3 - \sqrt{x}|}{18} < \frac{|9 - x|}{54}. \]
Now show that (if also x ≠ 9) $\frac{\sqrt{x} - 3}{x - 9} = \frac{1}{\sqrt{x} + 3}$.
The gradient of the straight line joining the point (9, 3) to a nearby point
$(x, \sqrt{x})$ on the curve $y = \sqrt{x}$ is $\frac{\sqrt{x} - 3}{x - 9}$. Use the above roughwork and the
epsilon-delta definition to evaluate the limit of this expression as x → 9
(that is, the gradient of the curve itself at x = 9).
141. Suppose that f, g have domain D, that p is a limit point of D ∩ (p, ∞) and
that (as $x \to p^+$) $f(x) \to \ell$ and $g(x) \to m$.
(a) Use the epsilon-delta description of one-sided limits to prove that
$f(x) + g(x) \to \ell + m$ (as $x \to p^+$),
(b) Use the sequence description of one-sided limits to prove that
$f(x) - g(x) \to \ell - m$ (as $x \to p^+$).
142. Show that left-hand limits preserve inequalities, in the following sense: if
f(x) ≤ g(x) for all x ∈ (a, b) and both f and g have left-hand limits at b,
then
\[ \lim_{x \to b^-} f(x) \le \lim_{x \to b^-} g(x). \]
(Suggestion: proof by contradiction using the epsilon-delta description of
limits.)
143. State and (using sequences) prove a squeeze rule for function limits as
x → p− .
144. Given that
\[ f(x) = \begin{cases} (1 + x - x^2)^2 & \text{if } x < 3, \\ 12 & \text{if } x = 3, \\ x^3 - x + 1 & \text{if } x > 3 \end{cases} \]
show that f has a limit at 3 but is not continuous there.

145. Verify that the function specified by
\[ f(x) = \begin{cases} 1 + x + x^2 & \text{if } x < 4, \\ x^3 - 2x^2 - 3x + 1 & \text{if } x > 4, \\ 0 & \text{if } x = 4 \end{cases} \]
has a limit at x = 4 but is not continuous there.
146. We define f : R → R by the formula
\[ f(x) = \begin{cases} a + bx + x^2 & \text{if } x \ne 3, \\ a + 3b & \text{if } x = 3. \end{cases} \]
Prove that f has a limit as x → 3. Find the relation (in as simple a form as
possible, without modulus signs!) between a and b that is equivalent to the
modulus |f | being continuous at 3.
147. The function f specified by
\[ f(x) = \begin{cases} \dfrac{x^2 - 4x + 3}{x - 2} & \text{if } x < 1, \\ ax^2 + bx + 5 & \text{if } 1 \le x \le 2, \\ \dfrac{x^2 + 4x - 3}{x - 1} & \text{if } 2 < x \end{cases} \]
is continuous at 1 and at 2. Find the values of the constants a and b.

148. The formula
\[ f(x) = \begin{cases} 5 - x^2 & \text{if } x < -1, \\ ax^3 + b & \text{if } -1 \le x \le 1, \\ 1 + x + x^2 & \text{if } x > 1 \end{cases} \]
is known to define a continuous function on R. What are the numerical
values of a and b?
149. The function
\[ f(x) = \begin{cases} \dfrac{2 + x^3}{2 + x} & \text{if } -2 < x < -1, \\ a + bx + x^2 & \text{if } -1 \le x \le 1, \\ bx^2 - 13 & \text{if } 1 < x \end{cases} \]
is continuous at x = −1 and at x = 1. What are the numerical values of the
constants a and b?
150. Given that the shared domain of f and g is not bounded above, and that the
two functions tend to $\ell$, m (respectively) as x → ∞, use the sequence
description of limits to prove that $f(x)g(x) \to \ell m$ as x → ∞.
151. Given that f and g have the same domain and possess limits $\ell$ and m as
x → ∞ and that m ≠ 0, use sequences to show that
\[ \frac{f(x)}{g(x)} \to \frac{\ell}{m} \quad \text{as } x \to \infty. \]
152. Show that $\frac{x}{(1 + x)^2}$ tends to −∞ as x → −1. (You can use either the formal
definition or the sequence description.)
153. Verify via the definition that $\lim_{x \to 2} \frac{3x + 5}{(x - 2)^2} = \infty$.
154. Let f, g have the same domain D of which p is a limit point, and suppose that
(as x → p) $f(x) \to \infty$ and $g(x) \to \ell \in \mathbb{R}$. Prove that $f(x) + g(x) \to \infty$ (as
x → p).
155. Show that $\frac{\sin(\pi x)}{(2x - 1)^2}$ tends to ∞ as $x \to \frac{1}{2}$. (Use the formal definition: given
K > 0, find a positive number δ such that $|x - \frac{1}{2}| < \delta$ guarantees that
$\frac{\sin(\pi x)}{(2x - 1)^2} > K$.)
156. Show that
\[ \lim_{x \to -1} \frac{5x^2 + 3x}{(x + 1)^2} = \infty. \]
157. Use either the sequence description or the epsilon-delta description to
explore the limit of $f(x) = \frac{2x^2 - 3}{3x^2 - 1}$ as x → −1.
158. For the function f defined by $f(x) = \frac{1 + x + x^2}{2 + 3x + 4x^2}$, show that the limit of
f(x) as x → ∞ is $\frac{1}{4}$ by two different methods:
• by the epsilon-style definition,
• by using the sequence description of such limits.
159. Suppose that f : (0, ∞) → R and that another function g : (0, ∞) → R is
then defined by the formula $g(t) = f\left(\frac{1}{t}\right)$. Prove that the following
statements are equivalent (suggestion: use an epsilon-style argument):
(a) $f(x) \to \ell$ as x → ∞,
(b) $g(t) \to \ell$ as $t \to 0^+$.
160. Differentiate (within the appropriate domain) each of the following:
(a) $\frac{1 + x - x^3}{2 - x + x^2}$,
(b) $\sin(e^x)\, e^{\sin x}$,
(c) $\sqrt{x}\, \ln x \cos x$,
(d) $\sin(\ln(\cos x))$.
161. Differentiate the following functions (within their domains, and assuming
that $e^x$, sin x, cos x and ln x have their well-known derivatives). (Do not
spend a lot of time simplifying your answers.)
(a) $f(x) = e^x \ln x \cos(2x)$,
(b) $f(x) = \frac{\sin x + x^3}{\cos x - x^2}$,
(c) $f(x) = (1 + x + e^x)^{13}$,
(d) $f(x) = e^{\cos(\ln x)}$,
(e) $f(x) = \frac{x \ln(1 + e^x)}{\sqrt{4 + x^2}}$.
162. Differentiate (with respect to x) the expressions
(a) $\frac{x^2 + \sin x}{x^3 - \cos x}$,
(b) $(x^4 - 12) \sin^9 x$,
(c) $\ln(e^{2x} - e^{-2x})$.
163. Differentiate, with respect to x, each of the expressions
(a) $\sin\left(\frac{1 + e^x}{1 - e^x}\right)$,
(b) $(1 + x^2) \ln(1 + x^2)$,
(c) $x e^{e^x}$.
164. If f is differentiable at x = c, show that $\frac{f(c + h) - f(c - h)}{2h}$ possesses a limit
as h → 0.
See if you can devise a function g that is not differentiable at x = 0 and yet
$\frac{g(0 + h) - g(0 - h)}{2h}$ possesses a limit as h → 0.
165. Suppose that f is differentiable at c. Evaluate the following:
(a) $\lim_{h \to 0} \frac{f(c + 2h) - f(c - 3h)}{h}$,
(b) $\lim_{h \to 0} \frac{(f(c + h))^2 - (f(c - h))^2}{h}$.
(Notice that the top line in (b) is the difference of two squares.)
166. If
\[ f(x) = \begin{cases} 1 + x + ax^2 & \text{while } x < 4, \\ 3 + bx + x^2 & \text{while } x \ge 4 \end{cases} \]
is to define a function that is differentiable on R, find what numerical values
a and b must have.
167. Given that the function
\[ f(x) = \begin{cases} ax^3 + bx & \text{if } x < 2, \\ ax^2 + 5 & \text{if } x \ge 2 \end{cases} \]
is differentiable at x = 2, determine the numerical values of the constants a
and b.
168. The formula
\[ f(x) = \begin{cases} 3x^2 + ax + b & \text{if } x < 1, \\ 2x^2 + 2bx - a & \text{if } x \ge 1 \end{cases} \]
defines a function f that is differentiable at x = 1. Evaluate the constants a
and b.
169. Determine, if possible, the maximum and minimum values of the functions
$f(x) = \cos(2x) - \cos^2 x$ on [0, 3π/4] and $g(x) = \frac{x}{x^2 + 4x + 9}$ on R.
170. Find the maximum value and the minimum value of the expressions on the
intervals indicated:
(a) $x^2 \ln x$ on $[\frac{1}{e}, 1]$,
(b) $e^x - 2e^{2x} + e^{3x}$ on [−2, 2].
171. Suppose that f and g are continuous on [a, b] and differentiable on (a, b),
and that $f'(x) \ne g'(x)$ everywhere in (a, b). Show that the graphs of f and of
g cannot intersect at two different points. (Hint: if they did, consider the
behaviour of the function h(x) = f(x) − g(x).)
172. Use Rolle’s theorem on the function f given by $f(x) = x^3(1 + \sin x)$ on the
interval $\left[0, \frac{3\pi}{2}\right]$, to see what it tells us about the equation
\[ \frac{\cos x}{1 + \sin x} = -\frac{3}{x}. \]
173. Use Rolle’s theorem to show that the equation $\tan x = \frac{2}{x}$ has a solution in
the interval (0, π/2). (Suggestion: consider $x^2 \cos x$.)
Show further that there is a sequence $(c_n)_{n\in\mathbb{N}}$ of numbers in the interval
(0, π/2) such that, for each positive integer n,
\[ \tan c_n = \frac{n}{c_n}. \]
174. Use differentiation to show that the function $f(x) = x^2 e^x$ is decreasing on
the interval [−2, 0].
175. Use differentiation to show that the function
\[ f(x) = (1 + x - x^2)e^{3x} \]
is decreasing on (−∞, −1], increasing on $[-1, \frac{4}{3}]$ and decreasing on $[\frac{4}{3}, \infty)$.

176. Show that
(a) $f(x) = \frac{4x^2 + 3x}{1 + x^2}$ is increasing on $\left[-\frac{1}{3}, 3\right]$,
(b) $g(x) = 2x \cos x - 2 \sin x - x^2$ is decreasing on [0, ∞).

177. Given real constants a and b such that a ≠ 0, show that the equation
$x^5 + ax^3 + 3a^2 x + b = 0$ cannot have two distinct solutions.
178. Given a constant a > 1, show that the equation $x^4 + ax^3 - a = 0$ has
exactly one positive solution.
179. Show that if f is decreasing on (a, b) and bounded above, then the one-sided
limit limx→a+ f (x) exists.
180. (a) Suppose we are given two sequences (yn )n≥1 and (an )n≥1 of real
numbers such that an → 0 and |ym − yn | ≤ an whenever the integers
n, m satisfy 0 < n < m. Prove that (yn )n≥1 is a Cauchy sequence.
(b) (Conversely,) given that (yn )n≥1 is a Cauchy sequence, show that there
is a sequence (an )n≥1 such that an → 0 and |ym − yn | ≤ an whenever
0 < n < m.
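181. Using the rather crude estimation of counting the terms and noting which
one is the smallest, show that
\[ \frac{1}{\sqrt{17}} + \frac{1}{\sqrt{18}} + \frac{1}{\sqrt{19}} + \cdots + \frac{1}{\sqrt{64}} > 6 \]
and that, for each positive integer n,
\[ \frac{1}{\sqrt{4^{n-1} + 1}} + \frac{1}{\sqrt{4^{n-1} + 2}} + \frac{1}{\sqrt{4^{n-1} + 3}} + \cdots + \frac{1}{\sqrt{4^n}} > 3(2)^{n-2}. \]
Use this to show that the series $\sum_{n=1}^{\infty} \frac{1}{\sqrt{n}}$ is divergent.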
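The block estimate in Exercise 181 (the terms from index $4^{n-1}+1$ to $4^n$ number $3 \cdot 4^{n-1}$, and each is at least $1/2^n$) can be spot-checked in floating point. This is only a sketch to build confidence in the inequality; the function name `block` is an illustrative choice.

```python
from math import sqrt

def block(n):
    """Sum of 1/sqrt(k) for k = 4**(n-1) + 1, ..., 4**n."""
    return sum(1 / sqrt(k) for k in range(4**(n - 1) + 1, 4**n + 1))

# 3 * 4**(n-1) terms, each at least 1/sqrt(4**n) = 1/2**n,
# so the block exceeds 3 * 4**(n-1) / 2**n = 3 * 2**(n-2).
for n in range(1, 8):
    assert block(n) > 3 * 2**(n - 2)

# block(3) is the sum from 1/sqrt(17) to 1/sqrt(64) in the first display.
print(block(3) > 6)
```

Since the lower bounds $3 \cdot 2^{n-2}$ grow without limit, the partial sums of the series are unbounded.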
182. Suppose that (xn ) and (yn ) are Cauchy sequences. Show, purely from the
definition of Cauchy (and, in the case of part (c), also using the fact that
Cauchy sequences are bounded) that:
(a) (xn + yn ) is Cauchy,

(b) for any constant M, (Mxn ) is Cauchy,
(c) (xn yn ) is Cauchy.
Also show by example that $\left(\frac{1}{x_n}\right)$ can fail to be Cauchy even in the case
where $x_n > 0$ for every n.
183. Of the following two statements, just one is true in general. Give a proof for
the one that is true, and find a counterexample that disproves the
false one.
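(a) If $(x_n)_{n\in\mathbb{N}}$ and $(y_n)_{n\in\mathbb{N}}$ are both Cauchy sequences, and there is a
(strictly) positive number δ > 0 such that $|y_n| \ge \delta$ for all n ≥ 1, then
the ‘term-by-term quotient’ sequence $\left(\frac{x_n}{y_n}\right)_{n\in\mathbb{N}}$ is also Cauchy.
(b) If $(x_n)_{n\in\mathbb{N}}$ and $(y_n)_{n\in\mathbb{N}}$ are both Cauchy sequences, and $|y_n| > 0$ for all
n ≥ 1, then the ‘term-by-term quotient’ sequence $\left(\frac{x_n}{y_n}\right)_{n\in\mathbb{N}}$ is also Cauchy.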
184. Given a sequence $(x_n)$, suppose we succeed in finding a positive constant K
and a real number a ∈ (0, 1) such that, for every positive integer n,
\[ |x_n - x_{n+1}| < K a^n. \]
Prove that $(x_n)$ is Cauchy, and therefore converges.

185. Given a real constant t ∈ (0, 1) and a sequence $(x_n)_{n\ge1}$ that satisfies (for
each positive integer n) the inequality $|x_{n+1} - x_n| < 1000\, t^n$, prove that
$(x_n)_{n\ge1}$ converges to some limit.
186. Given that (for each n ≥ 1) |xn − xn+2 | < 10(0.6)n and that
|xn − xn+5 | < 20(0.7)n , show that (xn ) is Cauchy.
(Suggestion: xn − xn+5 + xn+5 − xn+3 + xn+3 − xn+1 = xn − xn+1 .)
187. For a sequence (xn ), the following information is known for each n ∈ N:
|xn − xn+6 | < 7(0.6)n , |xn − xn+10 | < 4(0.7)n , |xn − xn+15 | < 9(0.8)n .
By estimating |xn − xn+1 |, show that (xn ) must be convergent.

188. (a) If, for a given sequence $(x_n)$, $|x_n - x_{n+2}| < 2^{-n}$ and $|x_n - x_{n+3}| < 3^{-n}$
for each n ∈ N, does it necessarily follow that $(x_n)$ itself is Cauchy?
(b) If, for a given sequence (xn ), the subsequence (x2n ) and the subsequence
(x3n ) are both Cauchy, does it necessarily follow that (xn ) itself
is Cauchy?
189. Confirm that the series
\[ \sum_{k=1}^{\infty} \frac{7 \sin(k^2 + 3k) - 4 \cos(2k^2 - 5)}{(1.1)^k} \]
converges, by showing that the sequence of its partial sums is Cauchy.

190. Consider the sequence $(x_n)_{n\ge1}$ defined by the formula
\[ x_n = \frac{\sin(\cos(1 + \pi))}{e} + \frac{\sin(\cos((1 + \pi)^2))}{e^2} + \frac{\sin(\cos((1 + \pi)^3))}{e^3} + \cdots + \frac{\sin(\cos((1 + \pi)^n))}{e^n}. \]
Verify that $|x_n - x_{n+1}| \le e^{-(n+1)}$. Prove that $(x_n)_{n\ge1}$ converges (by
verifying that it is Cauchy).
191. Of the following three statements, at least one is true in general and at least
one is false. Give a proof for each that is true, and find a counterexample to
disprove each false one.
(a) If f : (a, ∞) → R is a continuous function on an unbounded open
interval (a, ∞), and (xn )n∈N is any Cauchy sequence of elements of
(a, ∞), then (f (xn ))n∈N must also be Cauchy.
(b) If f : [a, ∞) → R is a continuous function on an unbounded closed
interval [a, ∞), and (xn )n∈N is any Cauchy sequence of elements of
[a, ∞), then (f (xn ))n∈N must also be Cauchy.
(c) If f : R → R is a continuous function on the whole real line R, and
(xn )n∈N is any Cauchy sequence, then (f (xn ))n∈N must also be Cauchy.
192. A given series consists of non-negative terms. By bracketing these terms
together in a particular way, we can create a new series that converges. Prove
that the original series converges also (and to the same sum).
193. We consider the rearrangement of the alternating harmonic series in which
the positive terms are taken in pairs followed by one negative term, thus:
\[ \frac{1}{1} + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} - \frac{1}{4} + \frac{1}{9} + \frac{1}{11} - \frac{1}{6} + \frac{1}{13} + \frac{1}{15} - \frac{1}{8} + \cdots \qquad (*) \]
Notice that if we bracket these terms together in threes, the nth bracket is
\[ \frac{1}{4n - 3} + \frac{1}{4n - 1} - \frac{1}{2n}. \]
Use this to show that (i) the bracketed series converges, and that (ii) series
(*) also converges.
194. We consider the rearrangement of the alternating harmonic series in which
each positive term is followed by three negative terms, thus:
\[ \frac{1}{1} - \frac{1}{2} - \frac{1}{4} - \frac{1}{6} + \frac{1}{3} - \frac{1}{8} - \frac{1}{10} - \frac{1}{12} + \frac{1}{5} - \frac{1}{14} - \frac{1}{16} - \frac{1}{18} + \cdots \qquad (**) \]
Notice that if we bracket these terms together in fours, the nth bracket is
\[ \frac{1}{2n - 1} - \frac{1}{6n - 4} - \frac{1}{6n - 2} - \frac{1}{6n}. \]
Use this to show that (i) the bracketed series converges, and that (ii) series
(**) itself converges.
195. Use the ratio test of d’Alembert to find all values of x ∈ R for which the
following series converges:
\[ \sum \frac{(2 - x)^k}{5^k (3k + 4)}. \]
196. Find the range of values of x for which the series
\[ \sum \frac{((n + 1)!)^2}{(2n + 1)!}\, 3^{2n-1} x^{2n} \]
converges.
197. If x ∈ R and (for each k ∈ N)
\[ a_k = \frac{k!\,(2k + 1)!}{(3k - 1)!}\, (1 + 2x)^k, \]
determine precisely the range of values of x for which the series $\sum a_k$
converges.
198. Determine the set of values of the real parameter t for which the following
series is convergent:
\[ \sum \frac{(n + 2)^n t^n}{(3n + 1)^n}. \]
199. Use the nth root test to find all values of x ∈ R for which the following series
converges:
\[ \sum 2^{2k} \left(\frac{k}{k + 1}\right)^{k^2} x^k. \]

200. Find all values of x ∈ R for which the series $\sum y_n$ converges, where:
\[ y_n = n x^n \left(1 + \frac{1}{n}\right)^{-2n^2}. \]

201. Show that if a series $\sum a_n$ is conditionally convergent (that is, convergent
but not absolutely convergent) then both $\sum a_n^+$ and $\sum a_n^-$ diverge to ∞.
202. Devise a rearrangement of the alternating harmonic series that diverges
to ∞.

203. Given a completely arbitrary series $\sum x_k$ that is conditionally convergent,
and a completely arbitrary real number $\ell$, think how you could devise a
rearrangement of $\sum x_k$ that converges to $\ell$.
Suggestion: the key ingredient in finding one is that both $\sum x_k^+$ and $\sum x_k^-$
diverge to infinity. You might begin by taking just enough of the
non-negative terms of the series to make the running total greater than $\ell$.
204. Let s, t be distinct numbers in the interval (−1, 1). Recall that the geometric
series $\sum_{0}^{\infty} s^n$ converges to $\frac{1}{1 - s}$ (and likewise for t in place of s).
Write down the Cauchy product $\sum_{0}^{\infty} c_n$ of the two series $\sum_{0}^{\infty} s^n$ and
$\sum_{0}^{\infty} t^n$ and simplify the expression you obtain for its typical term $c_n$. From
this, deduce that
\[ \sum_{0}^{\infty} \frac{s^{n+1} - t^{n+1}}{s - t} \quad \text{converges to} \quad \frac{1}{(1 - s)(1 - t)}. \]

205. Let −1 < s < 1. Write down the Cauchy product $\sum_{0}^{\infty} c_n$ of the series
$\sum_{0}^{\infty} s^n$ by itself, and simplify the expression you obtain for its typical term
$c_n$. Hence evaluate the sum of the series
\[ \sum_{0}^{\infty} (n + 1)s^n. \]
206. Recall that (for all x ∈ (−1, 1)) the series $1 + x + x^2 + x^3 + x^4 + \cdots$
converges to $\frac{1}{1 - x}$ and the series $1 - x + x^2 - x^3 + x^4 - \cdots$ converges to
$\frac{1}{1 + x}$. Calculate (and simplify as necessary) the Cauchy product of these
two series and confirm that (as predicted by the theorem on the Cauchy
product of two series) it converges to the product of the two functions.
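For Exercises 204 to 206 it can help to compute the first few Cauchy product coefficients mechanically and compare them with a hand calculation. The sketch below works with coefficient lists of power series, using the standard formula $c_n = \sum_{k=0}^{n} a_k b_{n-k}$; the function name `cauchy_product` and the truncation at six terms are choices made for this illustration.

```python
def cauchy_product(a, b):
    """First min(len(a), len(b)) coefficients c_n = sum_{k=0}^{n} a_k * b_{n-k}."""
    m = min(len(a), len(b))
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(m)]

ones = [1] * 6                                # 1 + x + x^2 + ...
alternating = [(-1) ** n for n in range(6)]   # 1 - x + x^2 - ...

print(cauchy_product(ones, ones))         # coefficient pattern for Exercise 205
print(cauchy_product(ones, alternating))  # coefficient pattern for Exercise 206
```

Only finitely many coefficients are computed, so this checks the algebra of the typical term, not the convergence statement.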
207. Assuming the correctness of
\[ e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = \sum_{0}^{\infty} \frac{x^n}{n!}, \]
calculate (and simplify as necessary) the Cauchy product of the power series
representations of $e^x$ and of $e^y$, and confirm that it converges to the product
of the two functions.
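208. Determine the radius of convergence of each of the following:
(a) $(e^x =)\ 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = \sum_{0}^{\infty} \frac{x^n}{n!}$,
(b) $(\sin x =)\ x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots = \sum_{0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n + 1)!}$,
(c) $(\cos x =)\ 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots = \sum_{0}^{\infty} (-1)^n \frac{x^{2n}}{(2n)!}$,
(d) $\sum (30^n + n^{30}) x^n$,
(e) $\sum (n!)\, x^n$,
(f) $\sum \frac{n!\,(n + 1)!\,(n + 2)!}{(3n + 1)!}\, x^n$,
(g) $\sum \frac{x^n}{\left(1 + \frac{3}{n}\right)^{2n^2}}$.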
209. Given a, b ∈ R, show that
\[ f : \mathbb{R} \to \mathbb{R}, \quad x \mapsto ax + b \]
is uniformly continuous on R.
210. Show that the function given by $f(x) = x^3$ is not uniformly continuous on
any interval of the form [a, ∞).
211. Decide (with proof) whether the following functions are or are not
uniformly continuous on the indicated intervals.
(i) On (0, 1), $f(x) = \sin\left(\frac{1}{x}\right)$.
(ii) On (0, 1), $f(x) = x \sin\left(\frac{1}{x}\right)$.
(iii) On [0, ∞), $f(x) = x\sqrt{x}$.
212. Let a ∈ R be given. Find:
(i) a continuous function on [a, ∞) that is not uniformly continuous,
(ii) a continuous function on (a, ∞) that is not uniformly continuous,
(iii) a continuous function on (a, b) that is not uniformly continuous.
213. An interior point b of an interval I divides it into a left portion L and a right
portion R that intersect only at b. (So L takes one of the forms (−∞, b],
(a, b], [a, b] and R takes one of the forms [b, ∞), [b, c), [b, c].)
Given a real function f : I → R that is uniformly continuous on L and
uniformly continuous on R, show that f is uniformly continuous on
I = L ∪ R also. (This is sometimes referred to as a ‘gluing lemma’, as we

imagine gluing the two portions of the domain back together.)
214. (i) If x and y are both greater than or equal to 1, show that
$|\sqrt[3]{x} - \sqrt[3]{y}| \le \frac{1}{3} |x - y|$.
(ii) From (i), show that the function given by $f(x) = \sqrt[3]{x}$ is uniformly
continuous on [1, ∞).
(iii) Now use Exercise 213 to show that $f(x) = \sqrt[3]{x}$ is uniformly continuous
on [0, ∞).
215. Determine whether the following real functions are uniformly continuous
on the intervals indicated.
(a) On [0, 2π] we define f by the formula $f(x) = e^{\sin(\cos(x^2))}$.
(b) On (−1, 1) we define f by the formula $f(x) = \frac{1}{1 - x^2}$.
(c) On (0, π/2] we define f by the formula $f(x) = \frac{1 - \cos x}{x^2}$.
(d) On (0, ∞) we define f(x) = ln x.
216. Show that the function f : [0, 1] → R described by $f(x) = \sqrt{1 - x^2}$ is
uniformly continuous, but is not Lipschitz. (Suggestion: if there were a
constant K such that |f(x) − f(y)| ≤ K|x − y| for all relevant x and y, put
x = 1 and $y = 1 - \frac{1}{n}$ for each positive integer n and see what that tells you
about K.)
217. Show that the function f described by the formula $f(x) = x - x^{-1}$ is
uniformly continuous on [1, ∞), but not on (0, ∞).
218. (a) Suppose we know that a function (defined at least upon an interval of
the form [a, ∞) or (a, ∞)) possesses a limit as x → ∞. Show that there
is some interval of the form [b, ∞) upon which this function is
bounded.
(b) Show that $f(x) = 3x + \sin(\sqrt{x})$ defines a uniformly continuous
function on the interval (0, ∞). (Hint: use the result of part (a) upon the
derivative of f.)
219. Show that the function f defined on R by the formula $f(x) = \frac{\sin x}{1 + x^2}$ is
uniformly continuous on the real line.
220. Prove that the function specified by $f(x) = \sqrt[3]{x^2}$, x ∈ R, is uniformly
continuous on the real line.
221. We are given that f : (a, ∞) → R is differentiable on its domain and that
$f'(x) \to \infty$ as x → ∞. Prove that f is not uniformly continuous on (a, ∞).
222. Show that the function f : (0, ∞) → R defined by
\[ f(x) = \frac{\sin x}{e^x - e^{-x}} \]
is uniformly continuous (on its domain).

223. Use the three methods indicated to show that the equation
\[ \frac{\pi}{2} \cos\frac{\pi x}{2} = (x + 1)e^{x-1} \]
has at least one solution in the interval (0, 1):
(a) by applying the intermediate value theorem to
$\frac{\pi}{2} \cos\frac{\pi x}{2} - (x + 1)e^{x-1}$,
(b) by applying the Cauchy mean value theorem to $\sin\frac{\pi x}{2}$ and $x e^{x-1}$,
(c) by applying Rolle’s theorem to $\sin\frac{\pi x}{2} - x e^{x-1}$.

224. Let n be a given positive integer. Use the Cauchy mean value theorem on the
two functions f, g given by $f(x) = \sin^n x$, $g(x) = x^n$ over the interval
[0, π/2], to see what it tells us about the equation
\[ \sec x = \left(\frac{\pi}{2}\right)^n \left(\frac{\sin x}{x}\right)^{n-1}. \]
225. Given that 0 < a < b, that f is continuous on [a, b] and that f is
differentiable on (a, b), show using Cauchy’s mean value theorem that there
is a number c in the open interval (a, b) such that
\[ \frac{b f(a) - a f(b)}{b - a} = f(c) - c f'(c). \]
(Hint: consider the functions described by $\frac{f(x)}{x}$ and $\frac{1}{x}$.)
226. Assuming that cos t < 1 for every t ∈ (0, 2π ), use Cauchy’s mean value
theorem to show that
(a) sin x < x for every x ∈ (0, 2π ),
(b) 1 − cos x < x2 /2 for every x ∈ (0, 2π ),
(c) sin x > x − x3 /6 for every x ∈ (0, 2π ).
(Suggestions: for (a), try the functions described by f(t) = sin t and g(t) = t
on the interval [0, x]; for (b), try the functions described by f(t) = 1 − cos t
and $g(t) = t^2/2$ on the interval [0, x].)
227. The following alleged proof of CMVT is incorrect. Find out precisely why.
‘Since f satisfies the conditions of FMVT over the interval [a, b], we know
that
there exists c ∈ (a, b) such that $f'(c) = \frac{f(b) - f(a)}{b - a}$.
‘By exactly the same argument on g:
there exists c ∈ (a, b) such that $g'(c) = \frac{g(b) - g(a)}{b - a}$.
‘Dividing one by the other (and remembering that g(b) − g(a) cannot be
zero, else Rolle’s theorem would give $g' = 0$ somewhere, contradiction)
we get
\[ \frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)} \]
as desired.’
(You might wish to try running the argument of the alleged proof on a
couple of simple functions such as $f(x) = x^2$ and $g(x) = x^3$ over [0, 1], and
observe its failure.)
228. Use l’Hôpital’s rule to evaluate $\lim_{x \to 1} \frac{x^7 - 3x^5 + 2}{x^5 + 2x^3 - 3}$ and $\lim_{x \to 0} \frac{\sin x - x}{x^3}$.
229. Use l’Hôpital’s rule to evaluate the following:
(a) $\lim_{x \to \pi} \frac{1 - e^{\sin x}}{x - \pi}$,
(b) $\lim_{x \to 0} \frac{1 - \cos x + x \ln(1 + x)}{\sin^2 x}$,
(c) $\lim_{x \to 0} \frac{e^x - 1 - x - \frac{1}{2}x^2}{x - \sin x}$.
230. Evaluate $\lim_{x \to -1} \frac{e^x - (x + 2)e^{-1}}{(x + 1)^2}$.
231. Evaluate $\lim_{x \to 0^+} (x \ln x)$.
(HINT: seek the limit of f(x)/g(x) where f(x) = ln x and $g(x) = \frac{1}{x}$. You can
assume that l’Hôpital’s rule works in the ‘plus or minus infinity over infinity’
case just as it does in the ‘zero over zero’ case – see also Exercise 234.)

232. Evaluate $\lim_{x \to 0} \frac{x - \arctan x}{x^3}$.
233. Determine the following
√ limits:
x− x
(a) limx→4 ,
4−x
(b) $\lim_{x \to 3} \frac{e^x + (2 - x)e^3}{(x - 3)^2}$,
(c) $\lim_{x \to \infty} x\left(\frac{\pi}{2} - \arctan x\right)$.
234. (a) Given that 0 < ε < 1, |h − L| < ε and |m − 1| < ε, verify that
$|hm - L| \le (2 + |L|)\varepsilon$.
(b) Prove the following version of l’Hôpital’s rule: if f and g are both
differentiable (with $g' \ne 0$) on (a, b) and, as $x \to b^-$:
• $f(x) \to \infty$,
• $g(x) \to \infty$ and
• $\frac{f'(x)}{g'(x)} \to \ell$,
then also
• $\frac{f(x)}{g(x)} \to \ell$.
Hints for part (b):
i. (Given ε > 0) show that there is $x_0$ in (a, b) such that, for every x in
$(x_0, b)$, $\left| \frac{f'(x)}{g'(x)} - \ell \right| < \varepsilon$.
ii. Now show that there is $x_1$ in $(x_0, b)$ such that, for every x in $(x_1, b)$,
\[ \left| \frac{1 - \dfrac{g(x_0)}{g(x)}}{1 - \dfrac{f(x_0)}{f(x)}} - 1 \right| < \varepsilon. \]
iii. For each x in $(x_1, b)$, apply the Cauchy mean value theorem to f, g
over the interval $[x_0, x]$.
235. Verify that the eighth Taylor (Maclaurin) polynomial approximating the
function cos x at a = 0 is given by
\[ p_8(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \frac{x^8}{8!}. \]
Use Taylor’s theorem, including estimation of the error or remainder, to
evaluate cos(0.4) correct to six decimal places.
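The arithmetic in Exercise 235 can be checked with a few lines of code. The sketch below assumes the Lagrange form of the remainder, under which the error after the degree 8 polynomial is at most $|x|^9/9!$ because every derivative of cos is bounded by 1; the text's own version of Taylor's theorem may state the remainder differently.

```python
import math

def p8(x):
    """Eighth Maclaurin polynomial of cos: terms up to x**8 / 8!."""
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k) for k in range(5))

x = 0.4
# Lagrange remainder after degree 8: |cos^(9)(c)| * |x|**9 / 9! <= |x|**9 / 9!
remainder_bound = abs(x) ** 9 / math.factorial(9)

print(round(p8(x), 6), remainder_bound < 0.5e-6)
```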
236. Determine a positive integer n such that there is a polynomial of degree n
whose values differ from those of ex by less than 0.001 at all points of the
interval [−1, 1]. (You may use the Taylor/Maclaurin polynomials as
determined in paragraph 16.3.3 of the text.)
237. If $|x| < \frac{1}{2}$, it is easy to check (and you may take this for granted) that
$\frac{|x|}{1 - |x|} < 1$. Taking $f(x) = \ln(1 + x)$ and $a = 0$ in our version of Taylor’s
theorem, show that the $k$th Taylor polynomial (evaluated at $x$) converges to
$\ln(1 + x)$ at every point of the interval $(-\frac{1}{2}, \frac{1}{2})$ by estimating the remainder
at stage $k$.¹
¹ The result is actually true on the larger interval $(-1, 1)$, but confirming this needs a slightly
different method of proof.
238. Use the Taylor expansion of $\ln(1 + x)$ (for small values of $x$) to determine
whether or not the series $\sum y_n$ given by
$$y_n = \left( \frac{n+5}{n} \right)^{n^2+n} e^{-5n}$$
is convergent.
239. By considering the logarithm of the typical term and appealing to Taylor’s
theorem, determine whether or not the series $\sum x_n$ specified by
$$x_n = \left( \frac{3n}{3n+2} \right)^{n^2} e^{2n/3}$$
converges.
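The logarithm technique can be tried out numerically before it is carried through by hand. The sketch below uses a hypothetical term $z_n = (1 + 1/n)^{n^2} e^{-n}$, chosen for illustration and not taken from the exercises. The expansion $\ln(1 + x) = x - x^2/2 + \cdots$ gives $\ln z_n \to -1/2$, so $z_n \to e^{-1/2} \ne 0$ and the series $\sum z_n$ diverges.

```python
import math

def log_term(n):
    """Logarithm of the illustrative term z_n = (1 + 1/n)**(n*n) * exp(-n).

    Working with the logarithm avoids overflow in the huge power, and
    log1p keeps accuracy when 1/n is small.
    """
    return n * n * math.log1p(1.0 / n) - n

for n in (10, 100, 1000, 10000):
    print(n, log_term(n))
```

The printed values settle near $-0.5$, matching the Taylor estimate; the same pattern of computation applies to any term of the form (ratio close to 1) raised to a large power, times an exponential.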
240. Provided that $-1 < x < 1$, what is the sum of the series
$$1 + 2x + 3x^2 + 4x^3 + \cdots + (n + 1)x^n + \cdots$$
and what is the sum of the series
$$1 + 3x + 6x^2 + 10x^3 + \cdots + \tfrac{1}{2}(n + 1)(n + 2)x^n + \cdots\ ?$$
241. Assuming that it is possible to express $\arctan x$ (for $-1 < x < 1$) as the sum
of a power series $a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_n x^n + \cdots$, use the result
on differentiation of power series to determine the coefficients $a_n$.
242. Consider the real function $f : [0, 2] \to \mathbb{R}$ defined by
$f(x) = 2$ if $x \in [0, 1)$, $f(1) = 0$, $f(x) = 1$ if $x \in (1, 2]$.
By directly calculating the upper and lower Riemann sums for $f$ using the
partition
$\{0, 1/n, 2/n, 3/n, \cdots, (n - 1)/n, (n + 1)/n, (n + 2)/n, \cdots, 2\}$
where $n$ is an arbitrary positive integer, show that $f$ is Riemann integrable
and evaluate its Riemann integral.
243. Consider the real function $f : [0, 1] \to \mathbb{R}$ defined by
$f(x) = x$ if $x \in [0, 1)$, $f(1) = 0$.
By directly calculating the upper and lower Riemann sums for $f$ using the
partition
$\{0, 1/n, 2/n, 3/n, \cdots, (n - 1)/n, 1\}$
where $n$ is an arbitrary positive integer, show that $f$ is Riemann integrable
and evaluate its Riemann integral.
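For the function of Exercise 243, the upper and lower sums can be tabulated exactly with rational arithmetic. This is a sketch only: the infimum and supremum on each subinterval are supplied by hand reasoning (stated in the docstring), which is the part the exercise asks you to justify, and the function name is illustrative.

```python
from fractions import Fraction

def riemann_sums(n):
    """Lower and upper Riemann sums for f(x) = x on [0, 1), f(1) = 0,
    over the partition {0, 1/n, ..., (n-1)/n, 1}.

    The infimum and supremum on each subinterval are worked out by hand:
    on [(k-1)/n, k/n] with k < n they are (k-1)/n and k/n; on the last
    subinterval the infimum is f(1) = 0 and the supremum is 1 (not attained).
    """
    width = Fraction(1, n)
    lower = sum(Fraction(k - 1, n) * width for k in range(1, n))  # last inf is 0
    upper = sum(Fraction(k, n) * width for k in range(1, n + 1))
    return lower, upper

for n in (10, 100, 1000):
    lower, upper = riemann_sums(n)
    print(n, float(lower), float(upper), float(upper - lower))
```

The gap between the two sums shrinks as $n$ grows, which is the behaviour an integrability argument based on these partitions relies on.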
244. (a) Suppose we know that a function $f$ (that is defined on an interval of the
form $[a, b]$) takes the value 0 at every point of $[a, b]$ with one exception:
there is a unique point $c \in (a, b)$ such that $f(c) \ne 0$. Show that the Riemann
integral of $f$ over $[a, b]$ exists and is zero. Suggestion: given $\varepsilon > 0$, choose
positive $h < \frac{\varepsilon}{2|f(c)|}$ so that $a < c - h < c + h < b$ and calculate the upper
and lower Riemann sums for the four-element partition $\{a, c - h, c + h, b\}$.
(b) How would you modify this proof (to get the same result) if $c$ were equal
to $a$ or to $b$?

245. Use the integral test to show that the series $\sum_{n=1}^{\infty} n^t$ converges if $t < -1$
and diverges if $-1 \le t < 0$.
246. Decide whether the series $\sum_{n=3}^{\infty} \frac{1}{n \ln(n) \ln(\ln(n))}$ converges or diverges.
247. Use the proof of the integral test to get upper and lower estimates for
$\sum_{n=10}^{\infty} \frac{1}{n^2}$.
248. Consider a real function $f : [0, 1] \to \mathbb{R}$ of which we know only that it is
bounded and that, for every positive integer $n \ge 2$, $f$ is Riemann integrable
over the interval $[\frac{1}{n}, 1]$. Prove that $f$ must be integrable over $[0, 1]$ and that
$$\int_0^1 f = \lim_{n \to \infty} \int_{1/n}^1 f.$$
249. Evaluate
$$\int_0^1 x^2 e^x \, dx \quad\text{and}\quad \int_0^1 x^2 e^{x^3} \, dx.$$
250. The real function $f : [0, 2] \to \mathbb{R}$ is defined by the formulae $f(\frac{1}{n}) = 0$ for
each positive integer $n$ but $f(x) = 5x$ for each $x \in [0, 2]$ that is not of the
form $\frac{1}{n}$ for positive integer $n$. Show that $f$ is R-integrable over $[0, 2]$ and
evaluate its R-integral. (Hint: consider 17.4.9 and 17.4.10.)
251. Give an example of an integrable function on a bounded closed interval I
that is not monotonic and is discontinuous at an infinite number of points
in I.
252. Show by example that the following assertion is false: if $f$, $g$ are integrable
over a closed bounded interval $I$ and $g(x) > 0$ at every point $x$ of $I$, then
$\frac{f(x)}{g(x)}$ must be integrable over $I$ also.
Think how you might slightly modify this false statement to make (and then
prove) a true one about integrability of the quotient of two integrable
functions.
253. Calculate the integrals of each of the following expressions over the interval
indicated (assuming, where appropriate, basic properties of trig, logarithmic
and exponential functions).
(a) $f(x) = x \ln x$ over $[1, e]$,
(b) $f(x) = (0.5 + x e^{x^2})(x + e^{x^2})^6$ over the interval $[0, 1]$,
(c) $f(x) = \sin^2 x \cos^2 x$ over $[0, \frac{\pi}{2}]$,
(d) $f(x) = e^x \sin x$ over $[0, \pi]$.
254. (a) Extend the argument of the integral test to show that, for each (fixed)
positive integer $n_0$, if $\int_{n_0}^{n} f \to \ell$ as $n \to \infty$, then
$$\ell \le \sum_{k=n_0}^{\infty} f(k) \le \ell + f(n_0).$$
(b) Estimate the sum of the (convergent) series $\sum_{1}^{\infty} \frac{1}{k^5}$ with an error less
than 0.001.
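One possible way to organise the estimate in part (b) is sketched below. It assumes the sandwich from part (a) with $f(x) = x^{-5}$, for which the limit of the integrals from $n_0$ is $1/(4 n_0^4)$, and it takes the midpoint of the sandwich as the tail estimate so that the error is at most $f(n_0)/2$. The midpoint choice and the function name are illustrative assumptions, not the text’s prescribed method.

```python
def estimate_sum_inverse_fifth_powers(tolerance):
    """Estimate sum_{k>=1} 1/k**5 to within `tolerance`.

    Uses the integral-test sandwich  l <= sum_{k>=n0} f(k) <= l + f(n0)
    with f(x) = x**-5 and l = integral of f over [n0, infinity) = 1/(4*n0**4).
    Taking the midpoint of the sandwich leaves an error of at most f(n0)/2.
    """
    n0 = 1
    while 0.5 / n0 ** 5 >= tolerance:  # need f(n0)/2 < tolerance
        n0 += 1
    head = sum(1.0 / k ** 5 for k in range(1, n0))  # terms before n0, added directly
    tail_lower = 1.0 / (4 * n0 ** 4)
    tail_midpoint = tail_lower + 0.5 / n0 ** 5
    return n0, head + tail_midpoint

n0, estimate = estimate_sum_inverse_fifth_powers(0.001)
print(n0, estimate)
```

Using the lower or upper end of the sandwich instead of the midpoint also works, at the cost of a larger starting index $n_0$ for the same tolerance.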
255. (a) For a given real number $a$, use the one-sided version of l’Hôpital’s Rule
to determine
$$\lim_{x \to 0^+} \frac{\ln(1 + ax)}{x}.$$
(b) Use the sequence-based description of one-sided limits to deduce that
$$\lim_{n \to \infty} \frac{\ln\left(1 + \frac{a}{n}\right)}{\frac{1}{n}} = a.$$
(c) Now use continuity of the exponential function to deduce that
$$\left(1 + \frac{a}{n}\right)^n \to e^a \quad\text{as } n \to \infty.$$
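The convergence asserted in part (c) is easy to watch numerically. The sketch below is only an illustration for one arbitrarily chosen value of $a$; the function name `compound` is an assumption made here, not notation from the text.

```python
import math

def compound(a, n):
    """Return (1 + a/n)**n, the n-th term of the sequence in part (c)."""
    return (1.0 + a / n) ** n

a = 2.0  # arbitrary illustrative value of a
for n in (10, 1000, 100000, 1000000):
    print(n, compound(a, n), math.exp(a))
```

The gap to $e^a$ shrinks roughly in proportion to $1/n$, so this sequence is a slow way to compute the exponential even though it converges.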
256. (a) Verify that the function tan, defined (initially) on the interval $(-\frac{\pi}{2}, \frac{\pi}{2})$
by the formula
$$\tan x = \frac{\sin x}{\cos x},$$
is continuous and differentiable, with (positive) derivative $(\cos x)^{-2}$,
and has range $\mathbb{R}$; also that its inverse arctan or $\tan^{-1} : \mathbb{R} \to (-\frac{\pi}{2}, \frac{\pi}{2})$ is
continuous and differentiable, and that its derivative is given by
$$(\tan^{-1})'(x) = \frac{1}{1 + x^2}.$$
(b) Check that the radius of convergence of the power series
$$1 - \frac{t}{3} + \frac{t^2}{5} - \frac{t^3}{7} + \frac{t^4}{9} - \cdots$$
is 1: so that, in particular, the series
$$1 - \frac{x^2}{3} + \frac{x^4}{5} - \frac{x^6}{7} + \frac{x^8}{9} - \cdots$$
and
$$x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \frac{x^9}{9} - \cdots$$
are absolutely convergent, the second one to some function $f(x)$, on (at
least) the interval $(-1, 1)$.
(c) Appeal to the theorem on differentiation of power series to show that
$$f'(x) = 1 - x^2 + x^4 - x^6 + x^8 - \cdots = \frac{1}{1 + x^2} \quad (-1 < x < 1).$$
(d) Deduce that (for $-1 < x < 1$):
$$x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \frac{x^9}{9} - \cdots = \tan^{-1}(x).$$
(e) Use this to justify what we claimed in paragraph 2.6. Also use the fact
that $\tan(\pi/6) = 1/\sqrt{3}$ to obtain a (numerically simple) series whose
sum is $\frac{\pi}{\sqrt{12}}$.
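A candidate series for part (e) can be tested numerically before it is written up. The sketch below assumes the series obtained by putting $x = 1/\sqrt{3}$ into the arctan expansion and multiplying through by $\sqrt{3}$, namely $\sum_{k \ge 0} (-1)^k / ((2k+1)\,3^k)$; the function name is illustrative.

```python
import math

def pi_over_root12_partial_sum(terms):
    """Partial sum of sum_{k>=0} (-1)**k / ((2k + 1) * 3**k).

    This comes from putting x = 1/sqrt(3) in the arctan series
    and multiplying both sides by sqrt(3).
    """
    return sum((-1) ** k / ((2 * k + 1) * 3 ** k) for k in range(terms))

target = math.pi / math.sqrt(12)
for terms in (1, 5, 10, 20):
    s = pi_over_root12_partial_sum(terms)
    print(terms, s, abs(s - target))
```

Because the terms shrink by a factor of about 3 each time, a few dozen of them already agree with $\pi/\sqrt{12}$ to machine precision.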
Suggestions for further reading
Alcock, L. How to Think About Analysis. Oxford University Press (2014).

Applebaum, D. Limits, Limits Everywhere: the Tools of Mathematical Analysis. Oxford
University Press (2012).
Bryant, V. Yet Another Introduction to Analysis. Cambridge University Press (1990).
Burn, R.P. Numbers and Functions: Steps into Analysis (2nd edn). Cambridge University
Press (2000).
Howie, J.M. Real Analysis. Springer (2001).
Spivak, M. Calculus (corrected 3rd edn). Cambridge University Press (2006).
