Aisling McCluskey, Brian McMaster - Undergraduate Analysis - A Working Textbook-Oxford University Press (2018)
Undergraduate
Analysis
A Working Textbook
Aisling McCluskey
Senior Lecturer in Mathematics
National University of Ireland, Galway
Brian McMaster
Honorary Senior Lecturer
Queen’s University Belfast
Great Clarendon Street, Oxford, OX2 6DP,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries
© Aisling McCluskey and Brian McMaster 2018
The moral rights of the authors have been asserted
First Edition published in 2018
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above
You must not circulate this work in any other form
and you must impose this same condition on any acquirer
Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
Library of Congress Control Number: 2017963197
ISBN 978–0–19–881756–7 (hbk.)
ISBN 978–0–19–881757–4 (pbk.)
Printed and bound by
CPI Group (UK) Ltd, Croydon, CR0 4YY
We dedicate this book to all those practitioners of the craft of analysis whose
apprentices we have been in times long past, and to the colleagues who in more
recent years have shared with us their insights and their enthusiasm.
In particular, we salute with gratitude and affection:
Samuel Verblunsky
Derek Burgess
Ralph Cooper
James McGrotty
David Armitage
Tony Wickstead
Ariel Blanco
Ray Ryan
John McDermott
AMcC, BMcM, October 2017
Preface
Mathematical analysis underpins calculus: it is the reason why calculus works, and
it provides a toolkit for handling situations in which algorithmic calculus doesn’t
work. Since calculus in its turn underpins virtually the whole of the mathematical
sciences, analytic ideas lie right at the heart of scientific endeavour, so that a
confident understanding of the results and techniques that they inform is valuable
for a wide range of disciplines, both within mathematics itself and beyond its
traditional boundaries.
This has a challenging consequence for those who participate in third-level
mathematics education: large numbers of students, many of whom do not regard
themselves primarily as mathematicians, need to study analysis to some extent; and
in many cases their programmes do not allow them enough time and exposure to
grow confident in its ideas and techniques. This programme-time poverty is one
of the circumstances that have given analysis the unfortunate reputation of being
strikingly more difficult than other cognate disciplines.
Aspects of this perception of difficulty include the lack of introductory gradual-
ness generally observed in the literature, and the without loss of generality factor:
experienced analysts are continually simplifying their arguments by summoning
up a battery of shortcuts, estimations and reductions-to-special-cases that are
part of the discipline’s folklore, but which there is seldom class time to teach in
any formal sense: instead, students are expected to pick up these ideas through
experience of working on examples. Yet the study time allocated to analysis in
early undergraduate programmes is often insufficient for this kind of learning
by osmosis. The ironic consequence is that basic analytic exercises are not only
substantially harder for the beginner than for the professional, but substantially
harder than they need to be.
This text, through its careful design, emphasis and pacing, sets out to develop
understanding and confidence in analysis for first-year and second-year under-
graduates embarked upon mathematics and mathematically related programmes.
Keenly aware of contemporary students’ diversity of motivation, background
knowledge and time pressures, it consistently strives to blend beneficial aspects
of the workbook, the formal teaching text and the informal and intuitive tutorial
discussion. In particular:
1. It devotes ample space and time for development of insight and confidence in
handling the fundamental ideas that – if imperfectly grasped – can make
analysis seem more difficult than it actually is.
2. It focuses on learning through doing, presenting a comprehensive integrated
range of examples and exercises, some worked through in full detail, some
supported by sketch solutions and hints, some left open to the reader’s
initiative (and some with online solutions accessible through the publishers).
1 Preliminaries
1.1 Real numbers
1.2 The basic rules of inequalities — a checklist of things you probably know already
1.3 Modulus
1.4 Floor
2 Limit of a sequence — an idea, a definition, a tool
2.1 Introduction
2.2 Sequences, and how to write them
2.3 Approximation
2.4 Infinite decimals
2.5 Approximating an area
2.6 A small slice of π
2.7 Testing limits by the definition
2.8 Combining sequences; the algebra of limits
2.9 POSTSCRIPT: to infinity
2.10 Important note on ‘elementary functions’
3 Interlude: different kinds of numbers
3.1 Sets
3.2 Intervals, max and min, sup and inf
3.3 Denseness
4 Up and down — increasing and decreasing sequences
4.1 Monotonic bounded sequences must converge
4.2 Induction: infinite returns for finite effort
4.3 Recursively defined sequences
4.4 POSTSCRIPT: The epsilontics game — the ‘fifth factor of difficulty’
5 Sampling a sequence — subsequences
5.1 Introduction
5.2 Subsequences
5.3 Bolzano-Weierstrass: the overcrowded interval
6 Special (or specially awkward) examples
6.1 Introduction
6.2 Important examples of convergence
The first twelve chapters present the ideas of analysis to which virtually everyone
enrolled upon a degree pathway within mathematical sciences will require expo-
sure. Those whose degree is explicitly in mathematics are likely to need most of the
rest. Of course, how this material is divided across the years or across the semesters
will vary from one institution to another.
Most of the exercises set out within the text are provided with specimen
solutions either complete, outlined or hinted at, but in the final chapter we have
also included a suite of over two hundred problems which are intended to assist you
in creating assessments for your student groups. Specimen solutions to these are
available to you, but not directly to your students, by application to the publishers:
please see the webpage www.oup.co.uk/companion/McCluskey&McMaster for
how to access them.
Prior knowledge that the reader should have before undertaking study of this
material includes a familiarity with elementary calculus and basic manipulative
algebra including the binomial theorem, a good intuitive understanding of the
real number system including rational and irrational numbers, basic proof tech-
niques including proof by contradiction and by contraposition, very basic set
(and function) theory, and the use of simple inequalities including modulus.
Substantial revision notes on several of these topics are provided within the text
where appropriate.
A Note to the Student Reader
If, as a student of the material that this book sets forth, you are enrolled on a
course of study at a third-level institution, your instructors will guide and pace you
through it. Careful consideration of the feedback they give you on the work you
submit will be very profitable to you as you develop competence and confidence.
If you are an independent reader, not engaged with such an institution’s pro-
grammes, we intend that you also will find that the text supports your endeav-
ours through its design: in particular, through the expansive (almost leisurely)
treatment of the initial ideas that really need to be thoroughly grasped before you
proceed, through the informal and intuitive background discussions that seek to
develop a feel for concepts that will work in parallel with their precise mathematical
formulations, and through the explicit inclusion of roughwork paragraphs that
allow you to look over the shoulder of the more experienced practitioner of the
craft and under the bonnet of the problem being tackled.
In both cases, our strongest advice to you is to work through every exercise
as you encounter it, and either check your answer against a specimen answer
where available, see if it convinces a colleague or fellow student, or submit it for
assessment or feedback as appropriate. Nobody learns analysis merely by reading
it, any more than you can learn swimming or cycling just by reading a how-to book,
however well-intentioned or knowledgeably written it may be. No one can teach you
analysis without your commitment; but you can choose to learn it and, if you do,
this working textbook is designed to help you towards success.
.........................................................................
1 Preliminaries
.........................................................................
[diagram: the real line, with the integer points −3, −2, −1, 0, 1, 2, 3 marked]
This is not, of course, a proper definition of what real numbers are. We are taking
what is sometimes called a naïve view of the system of real numbers: not having
sufficient time to construct it – to dig deeply enough into the logical foundations of
mathematics to come up with a guarantee of its existence – we are instead seeking
to highlight the common consensus on how real numbers behave, combine and
compare. This consensus will already be enough to let us start explaining some
basic ideas in analysis (and we shall say more about the finer structure of the real
numbers in Chapter 3).
Nothing in Section 1.2 is likely to strike the student reader as being much more
than common sense, and nor should it at this stage of study. Nevertheless, it is all
too easy to make mistakes in comparisons between numbers – inequalities – and it
is consequently important to keep these apparently obvious rules in mind and to
build up a good measure of confidence in their use, especially because so many
arguments in analysis depend upon using inequalities. Sections 1.3 and 1.4 present
a couple of useful operations on real numbers that are strongly connected with
inequalities.
1 that is, non-rational numbers involving roots, such as √2, ∛5/(1 + √2), √(10 − 3√2).
Undergraduate Analysis: A Working Textbook, Aisling McCluskey and Brian McMaster 2018.
© Aisling McCluskey and Brian McMaster 2018. Published 2018 by Oxford University Press
• ‘There are large integers:’ that is, for any given real number x we can find an
integer n so that n > x.
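This ‘large integers’ property is easy to experiment with. The following Python sketch (purely illustrative; the helper name large_integer is ours) produces, for any given real x, an integer n with n > x:

```python
import math

def large_integer(x):
    # For any real x, floor(x) is the largest integer <= x,
    # so floor(x) + 1 is an integer strictly greater than x.
    return math.floor(x) + 1

for x in (2.5, -7.3, 10**6 + 0.25, 0.0):
    n = large_integer(x)
    assert isinstance(n, int) and n > x
print("found an integer beyond x in every case")
```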
1.3 Modulus
1.3.1 Definition If x is a real number, we define its modulus (also called its
absolute value) as |x| = the greater of x and −x. That is:
• If x ≥ 0 then |x| = x;
• If x < 0 then |x| = −x.
Since the effect of modulus is to ‘throw away the minus from negative numbers’,
the following should be obvious:
1.3.3 The triangle inequality For any real numbers x and y, we have
|x + y| ≤ |x| + |y|.
Proof
Since x ≤ |x| and y ≤ |y|, adding gives us x + y ≤ |x| + |y|.
Exactly the same reasoning gives us −x + (−y) = −(x + y) ≤ |x| + |y|.
Now |x + y| is either x + y or −(x + y). So whichever one it is, it is ≤ |x| + |y|.
Note
It is easy to extend this by induction to deal with any finite list of numbers, thus:
|x1 + x2 + · · · + xn| ≤ |x1| + |x2| + · · · + |xn|.
1.3.4 The reverse triangle inequality For any real numbers x and y, we have
| |x| − |y| | ≤ |x − y|.
Proof
Use the triangle inequality on x = (x − y) + y and we get |x| ≤ |x − y| + |y|, from
which |x| − |y| ≤ |x − y|.
Interchange x and y, and we also get |y| − |x| ≤ |y − x| = |x − y|.
Now | |x| − |y| | is either |x| − |y| or |y| − |x|. So whichever one it is, it is ≤ |x − y|.
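Both the triangle inequality and its reverse form are easy to spot-check numerically. Here is a short Python sketch (illustrative only; the function names are ours) that tests them on a batch of random pairs:

```python
import random

def triangle_holds(x, y):
    # |x + y| <= |x| + |y|
    return abs(x + y) <= abs(x) + abs(y)

def reverse_triangle_holds(x, y):
    # | |x| - |y| | <= |x - y|
    return abs(abs(x) - abs(y)) <= abs(x - y)

random.seed(0)                        # reproducible trials
for _ in range(10_000):
    x = random.uniform(-1e6, 1e6)
    y = random.uniform(-1e6, 1e6)
    assert triangle_holds(x, y)
    assert reverse_triangle_holds(x, y)
print("both inequalities held in 10,000 random trials")
```

Of course, ten thousand successful trials prove nothing; only the short arguments above do. But such experiments are a useful sanity check when you first meet an inequality.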
1.4 Floor
1.4.1 Definition When x is a real number, we define the floor of x (also called
the integer part of x or, informally, x rounded down to the nearest integer) to be the
largest integer that is ≤ x. The usual notation for the floor of x is ⌊x⌋, although some
books write it as [x]. For instance, ⌊5.6⌋ = 5, ⌊π⌋ = 3, ⌊7⌋ = 7, ⌊−8½⌋ = −9.
If you choose to imagine the real numbers as being set out along the real line,
with the integers – marked here by heavier dots – embedded into it at regular
intervals, then the following diagram should help you to picture the relationship
between x and ⌊x⌋.
[diagram: x lying between the consecutive integers ⌊x⌋ and ⌊x⌋ + 1 on the real line]
⌊x⌋ ≤ x < ⌊x⌋ + 1
or, equivalently
x − 1 < ⌊x⌋ ≤ x.
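Python’s standard math.floor implements exactly this notion, so both the worked examples and the two characterisations can be checked directly (an illustrative sketch):

```python
import math

# The examples from the text, via math.floor
assert math.floor(5.6) == 5
assert math.floor(math.pi) == 3
assert math.floor(7) == 7
assert math.floor(-8.5) == -9        # floor rounds DOWN, not towards zero

# The two equivalent characterisations of floor(x)
for x in (5.6, math.pi, 7.0, -8.5, -0.3):
    n = math.floor(x)
    assert n <= x < n + 1            # floor(x) <= x < floor(x) + 1
    assert x - 1 < n <= x            # x - 1 < floor(x) <= x
print("both characterisations hold for every sample value")
```

Note the last worked example: for negative non-integers, rounding down and discarding the decimal part give different answers.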
.........................................................................
2 Limit of a sequence
— an idea, a definition,
a tool
.........................................................................
2.1 Introduction
Mathematical analysis has acquired a reputation – not entirely justified – for
seeming more difficult than other first-year undergraduate study areas. We shall
begin our exploration of it by seeking to identify the factors that have contributed
to this image, and what we can do to explain or address them.
Firstly, the study of mathematics is cumulative to a greater degree than that
of most disciplines. Each new block of mathematics that a student encounters is
built directly on other, underpinning, blocks, and it is practically impossible to
achieve confidence in the new without having previously identified and grasped the
older supporting material. No matter how well you can implement differentiation
algorithms, your chance of successfully finding the second derivative of x⁴ is very
limited until you’ve learned your three-times table.
Secondly, mathematics is hard. By that we do not mean that it is intrinsically
difficult: in this sense, ‘hard’ is the opposite of ‘soft’, not the opposite of ‘easy’.
Learning a piece of mathematics requires a precise understanding of the terms
that it involves, of the arguments that it employs and of the questions that it seeks
to answer. A broad appreciation, a solid general overview of the topic, will on its
own be utterly insufficient for actual application. Precision of concept and of logical
discourse, as well as the previously mentioned cumulativeness, are the hallmarks
of a discipline that is ‘hard’ in this sense.
Yet these two factors are common to the whole of mathematics. Why does
analysis in particular have such a daunting public image?
It seems to us that, thirdly, a lack of introductory gradualness comes into play
here. Most topics, in mathematics and elsewhere, can be adequately explained to
the beginner by working initially on simple special cases. So the usual arena for
first steps in linear algebra is something like the coordinate plane, rather than
an infinite-dimensional Banach space; French language lessons do not kick off by
handing out a table of the complete tenses of common irregular verbs. In analysis,
however, the very first concept that a beginner has to make sense of is one of the
most demanding: until you have a crisp understanding of the notion of the limit
of a sequence (or, a matter of similar difficulty, of the supremum of a set of real
numbers) you can neither read nor carry out any significant analytic activity. On
the credit side, this means that we can honestly promise the beginner that the
material gets easier once we are through most of Chapter 2 – an interesting contrast
with many topics, both mathematical and otherwise – provided always that this
first concept is fully and thoroughly understood before we go any further.
Fourthly – and this is another point that applies to the whole of the discipline,
but is particularly relevant just here – mathematics as a subject and mathematicians
as a breed are inclined to prefer conciseness over verbosity when they present final
versions of their work, and to feel more at home with terse, lean, point-by-point
arguments rather than expansive, wordy, descriptive accounts. There are, however,
some key moments in analysis where expansive rather than compressed accounts
actually help in delivering understanding, and the definition of sequence limits,
right at the start of our study, is one of them. It is perfectly possible to write down
that definition in one line: but if we do, most readers will not see the point of it,
will not grasp the kind of problem that it is set up to address and will not be able
to make effective use of it even in quite simple examples. So – with apologies to all
those who don’t like reading essays – we see no alternative to spending a fair bit of
time and several hundred words filling in the background and ‘thinking out loud’
about how to use this idea in applications. We reiterate that the concept itself
is not intrinsically difficult; it is merely different from mathematical notions that
you have already mastered, and needs a particular form of argument presentation
in order to get the best out of it. We also commit to getting back to concise, un-
wordy arguments as soon as and wherever possible.
With all this in mind, we shall devote most of Chapter 2 to a thorough and
leisurely exploration of this one single idea that opens the path to analytic argu-
ments in mathematics: limits of sequences – its intuitive meaning, some of the
contexts in which it arises, how to define it in terms sufficiently precise to do
serious mathematics with it, and how to handle that rigorous definition in a range
of illustrative examples. Please keep in mind that, once the opening chapter is safely
assimilated, most of the rest of the first-year analysis syllabus is easier. (By the way,
there is a fifth factor contributing to the widespread perception of the difficulty of
introductory analysis, but it concerns its logical structure rather than its narrowly
mathematical content, so we shall set it aside until some familiarity with the basic
idea has been gained – see Section 4.4.)
(a1, a2, a3, a4, · · · , an, · · · )
(a1, a2, a3, a4, · · · )
(an)n∈N
(an)n≥1
(an)
– and in many cases we complete the description by setting down a formula for
how to calculate each individual number an in the list (the so-called nth term). For
instance, if we wish to talk about the sequence of all perfect squares, that is, all
the squares of positive integers in their natural order, then all of the following are
acceptable symbols:
(1, 4, 9, 16, · · · , n², · · · )
(1, 4, 9, 16, · · · )
(n²)n∈N
(n²)n≥1
(n²)
(a1, a2, a3, a4, · · · , an, · · · ) where an = n² for each positive integer n
(a1, a2, a3, a4, · · · ) where an = n², each positive integer n
(an)n∈N in which an = n² for each n
(an)n≥1 with an = n² for each n
(an), an = n² for each positive integer n
It may seem a little irritating that so many different styles of symbol are allowed,
but this is mostly to enable us to tailor the notation we use to the particular problem
that we are working on without writing more than is necessary. For instance, if the
formula for an is as simple as an = n², then we really have no need for a separate
symbol for the nth term, and we might just as well write it as n² all the time; on the
other hand, if the nth term is something as complicated as
then we shall certainly not want to write that out more often than is needful, and in
such cases, having a brief symbol such as an to stand in for it will be a considerable
benefit and relief.
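In computational terms, a sequence is nothing more than a rule n ↦ an, a point the following Python sketch illustrates for an = n² (the names a and squares are ours, purely illustrative):

```python
from itertools import count, islice

# A sequence is just a rule assigning the term a_n to each positive integer n.
def a(n):
    return n ** 2                     # the nth perfect square

# The notation (a_1, a_2, a_3, a_4, ...) corresponds to listing values in order:
assert [a(n) for n in range(1, 6)] == [1, 4, 9, 16, 25]

# An "endless" view of (n^2)_{n>=1}: a generator yielding 1, 4, 9, 16, ...
squares = (n ** 2 for n in count(1))
assert list(islice(squares, 5)) == [1, 4, 9, 16, 25]
```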
Although the idea of denoting a sequence by a list of its first few terms or a
formula for its general term, wrapped up in brackets, is little more than common
sense, it will be important to use this notation consistently and correctly. So we
now flag up a few dos and don’ts concerning how best to employ it:
• Whenever you use a notation like (a1 , a2 , a3 , a4 , · · · , an , · · · ) or
(a1 , a2 , a3 , a4 , · · · ), be careful not to leave out the final row of dots: because a
symbol such as (a1 , a2 , a3 , a4 , · · · , an ) or (a1 , a2 , a3 , a4 ) is a standard way to
write a finite list of numbers consisting of only n or, indeed, only four items,
and you will confuse the person reading your work if you use it when you
actually intend an infinite sequence.
(n!)n≥0 .
Here are a few illustrative examples of sequences, some presented in more than
one style of symbol. You may find it useful to ‘translate them into English’ in your
head; for instance, the first is ‘the sequence of odd positive integers’, the fourth is
‘the sequence of primes’, the sixth is ‘the sequence of reciprocals of the positive
integers but with the sign alternating’ and so on.
2.2.1 Example
3.
(5, 1/2, 5, 1/4, 5, 1/8, 5, 1/16, 5, 1/32, · · · )
= (xn) where xn = 5 if n is odd but xn = 2^(−n/2) if n is even.
4. (2, 3, 5, 7, 11, · · · ) = (yn )n≥1 where yn is the nth prime number. Notice how
potentially misleading the first symbol was here: it could have meant several
different sequences including, for example, ‘two, and then the odd integers
excluding the perfect squares’. The second symbol was free from any such
ambiguity.
5.
(1, −8, 27, −64, 125, −216, · · · ) = (1, −8, 27, −64, · · · , (−1)^(n−1) n³, · · · )
= ((−1)^(n−1) n³)n∈N
Once again the first symbol might have been misunderstood, but the second
and third left no room for confusion.
6.
(1/1, −1/2, 1/3, −1/4, 1/5, −1/6, 1/7, · · · ) = ((−1)^(n−1)/n)n≥1
7. (1, √2, 3^(1/3), 4^(1/4), 5^(1/5), · · · )
8.
((1 + 1), (1 + 1/2)², (1 + 1/3)³, (1 + 1/4)⁴, · · · ) = ((1 + 1/n)ⁿ)n∈N
9.
(1, 1 + 1/2, 1 + 1/2 + 1/3, 1 + 1/2 + 1/3 + 1/4, · · · )
10.
(1, 1 + 1/2, 1 + 1/2 + 1/4, 1 + 1/2 + 1/4 + 1/8, · · · )
You should notice that some sequences, but by no means all of them, seem to
be settling towards an ‘equilibrium value’, a ‘steady state’ as we scan further and
further along the list. For instance, (2) above appears to be settling towards 1, and
(6) towards 0; in contrast, (1) and (4) are so far showing no sign of settling, but are
‘exploding towards infinity’ (and of course we shall need to make that phrase a lot
more precise before we do anything serious with it) while (5) is doing some kind
of cosmic splits by exploding towards infinity and minus infinity at the same time
(same comment). In the case of the last four sequences (7) to (10), it is much less
clear – to unaided common sense – what is going to happen in the long run.
This feature of settling towards some limiting steady state is the most impor-
tant property that a sequence can possess. Our major upcoming task is to seek
ways of deciding whether a given sequence ultimately settles or not, and if so, to
what steady state it does ‘gravitate’. As a first step in tackling that task, we need
to find a way to describe such a settling process that is crisp and precise enough
that we can do proper mathematics with it. In this description, we shall need to
avoid all vague and undefined phrases like ‘gravitate’ and ‘gets extremely close
to’ and ‘is as nearly equal to as makes no difference’ without, of course, throwing
away the valuable intuition that these phrases try to capture.
2.3 Approximation
Across the full expanse of science, engineering and mathematics, we find instances
where some interesting constant is known not precisely but ‘only’ by estimation, by
approximation. In most such scenarios, we expect to see not just one approxima-
tion, but several obtained at various times and by different procedures (hopefully
with increasing accuracy over time) and, using if necessary a little imagination,
we can conceive of an endless process of refinement (new experiments, wider data
collection, more powerful computation, more sophisticated digital image enhance-
ment …) capable of generating better and better estimates for ever. Of course, in
the best of all possible worlds it would be ideal if, at some point in the process,
we should meet and recognise the exact value of the elusive constant…but this is
unrealistic for several reasons (including the fact that no measuring device ever
invented can operate to infinite precision) and, even within mathematics itself,
one must normally be content with an endlessly refining approximation procedure.
‘Your estimate is only as good as your assessment of its error’ as the maxim
puts it, so each approximation process has to focus on how bad the error term
is…or, more precisely, on how bad the error term could be: because we are actually
never going to know the exact size of the error since that presupposes knowing
the exact value of the constant that we are struggling to estimate. The final piece
in this jigsaw is: how good do we need the approximation to be? – for most
estimations are carried out with a view to application, and different applications
depend for their success or validity on different levels of accuracy. (If you are in
the business of manufacturing ball bearings for use in cheap, disposable water
pumps, then a radius accuracy of 0.1 mm may well be good enough since this
also helps to hold the price down; but if your next customer is installing similar
devices in a submarine lab environment where failure means transporting the
device up through a thousand fathoms for replacement, you had better increase
that accuracy by an order of magnitude or two; and if you want to seek a contract
with a commercial aircraft manufacturer for whom pump failure places lives in
jeopardy, another order of magnitude again…)
13/27 = 0.48148148148 . . .
For each positive integer n, let pn stand for the decimal expansion, up to the nth
decimal digit, of this number. So…
p1 = 0.4
p2 = 0.48
p3 = 0.481
p4 = 0.4814
p5 = 0.48148
. . .
…and so on. None of these numbers equals 13/27 exactly: if you take any one
of them and multiply it by 27, you don’t get 13; indeed, just from the way in
which multiplication is carried out, you don’t get a whole number. They are,
however, approximations to 13/27 and – broadly speaking – they provide better and
better approximations as you work along the list: indeed, you could get ‘as close to
13/27 as you needed to be’ just by going far enough.
2 that is, the difference between the nth approximation and the ideal value
It is that last, slightly vague, comment that we need to make precise and, in order
to pin down its exact meaning, we shall look at the errors in the approximations,
the differences 13/27 − pn:
13/27 − p1 = 0.08148148 . . . < 0.1 = 10⁻¹
13/27 − p2 = 0.00148148 . . . < 0.01 = 10⁻²
13/27 − p3 = 0.00048148 . . . < 0.001 = 10⁻³
. . .
13/27 − pn < 0.00 . . . 01 = 10⁻ⁿ
. . .
Notice that this display doesn’t tell us explicitly the exact value of these errors³,
but that this is not going to matter: because, instead, we have an overestimate of
the size of the typical nth-stage error that is simple enough to work with. Look:
• If we are allowed a certain ‘tolerance of error’, that is, we’ve been asked to get
some approximations whose actual errors are less than that tolerance, we can
now easily see how to do it. Just find some positive integer N such that (1/10)^N
is smaller than the permitted tolerance, and then pN will be a good enough
approximation because 0 < 13/27 − pN < (1/10)^N, so 13/27 − pN is also smaller
than the tolerance.
• Continuing…not only is that pN good enough, but all the later ones pN+1 ,
pN+2 , pN+3 , pN+4 …will also be good enough in the same sense of ‘good’: they
all have actual errors that are less than our allowed tolerance.
Incidentally, if we had approximated from above instead of from below, by opting
for the list of numbers 0.5, 0.49, 0.482, 0.4815, 0.48149…, then the differences
13/27 − pn would have been negative. That would not have bothered us too much
because the size of the error is usually more important than whether it is positive
or negative. So we would have taken pn − 13/27 as the error measurement in this case
instead of 13/27 − pn, and the rest of the calculations would have worked out almost
exactly the same.
The way to avoid worrying about whether our approximations are overestimates
or underestimates is simply to define the error to mean |13/27 − pn|, so that error
measurements are always counted as positive. We shall do this in future.
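The whole discussion can be replayed exactly in Python using rational arithmetic (an illustrative sketch; the helper p is our name, mirroring the pn of the text):

```python
from fractions import Fraction

target = Fraction(13, 27)                 # 0.481481481...

def p(n):
    # decimal expansion of 13/27, truncated after the nth digit
    return Fraction(10**n * 13 // 27, 10**n)

for n in range(1, 8):
    error = target - p(n)                 # truncation is from below, so error >= 0
    assert Fraction(0) <= error < Fraction(1, 10**n)   # the overestimate 10^(-n)
    print(n, float(p(n)), float(error))
```

Using Fraction rather than floating point keeps every comparison exact, which is in the spirit of the chapter: the point is the guaranteed bound 10⁻ⁿ, not a machine approximation of it.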
The last small step we take in order to compress our account of this string of
improving approximations into a compact phrase is to agree on a standard symbol
for what we called the tolerance of error. For historical reasons, the Greek letter ε
3 for one thing, we are only working to so-many decimal places at this point
(pronounced ‘EP-silon’) is used. Thus, our precise and concise reason for declaring
the list of numbers (pn) to be a ‘perfect’ approximation process for the fraction
13/27 is:
for each ε > 0, we can find a positive integer N such that
for every n ≥ N, we get |pn − 13/27| < ε.
You should probably read that last couple of lines several times in order to feel
how it captures all the aspects of our lengthy discussion. Notice particularly that
the N that we find depends on the particular ε that we are challenged with: if they
change ε, we are free to change⁴ the replying N that we find (and we shall usually
need to do so). Sometimes it is denoted by N(ε) or Nε instead of plain N in order
to make exactly this point, and other commonly employed symbols for it are nε ,
n0 , n1 and n2 .
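The dependence of N on ε can be made concrete in a few lines of Python (illustrative only; find_N is our name). For a tolerance of 0.001 it returns N = 3, and for 0.000001 it returns N = 6, in line with the 13/27 example:

```python
from fractions import Fraction

target = Fraction(13, 27)

def p(n):
    # 13/27 truncated after n decimal digits, as in the text
    return Fraction(10**n * 13 // 27, 10**n)

def find_N(eps):
    # Smallest N with (1/10)^N <= eps.  Because the stage-n truncation error
    # is strictly below 10^(-n), every n >= N then gives |p_n - 13/27| < eps.
    N = 1
    while Fraction(1, 10**N) > eps:
        N += 1
    return N

assert find_N(Fraction(1, 1000)) == 3         # tolerance 0.001    -> N = 3
assert find_N(Fraction(1, 10**6)) == 6        # tolerance 0.000001 -> N = 6

for eps in (Fraction(1, 1000), Fraction(1, 10**6)):
    N = find_N(eps)
    for n in range(N, N + 5):                 # spot-check a few n >= N
        assert abs(p(n) - target) < eps
```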
2.5 Approximating an area
[diagram: the curve y = x² for 0 ≤ x ≤ 2, with the region beneath it whose area A is to be estimated]
As a first attempt, we could divide this area into vertical strips by adding in extra
vertical lines at x = 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6 and 1.8. Since (for positive
numbers) p < q implies p2 < q2 , within each of these strips, the lowest point of
the curve is at its left-hand edge and the highest point is at its right-hand edge. If we
therefore imagine, inside each of the strips, the tallest rectangle that fits underneath
4 For instance, in our ‘13/27’ example, when ε is set at 0.001, N = 3 is a good enough choice;
if the application requires ε to be reset to 0.000001, then N will have to alter to 6 at least. The
relationship between ε and N is not always as simple as this, however.
14 2 LIMIT OF A SEQUENCE — AN IDEA , A DEFINITION, A TOOL
the curve, it is easy to write down the area (length times breadth) of each of these
rectangles. By adding these together, we get an estimate of the area under the curve.
U10 = 0.2{0² + 0.2² + 0.4² + 0.6² + 0.8² + 1² + 1.2² + 1.4² + 1.6² + 1.8²},
which calculates out as 2.28. In just the same way, we can find an overestimate
(let us denote it by O10 ) of the desired curved area by considering the shortest
rectangle within each vertical strip that fits above the curve, as indicated in the
next diagram:
[diagram: the shortest rectangle above the curve y = x² within each vertical strip]
O10 = 0.2{0.2² + 0.4² + 0.6² + 0.8² + 1² + 1.2² + 1.4² + 1.6² + 1.8² + 2²} = 3.08.
It would, of course, be wrong to claim that U10 and O10 are accurate estimates of
the area A that we set out to find: for one thing, our diagrams suggest that they are
not; for another, the relatively large difference (0.8) between them makes it clear
that they certainly cannot both have a high degree of accuracy. However, there are
two very encouraging aspects of the discussion:
1. If we re-run the argument with more and narrower vertical strips, there are
good prospects that the accuracy will improve.
2. We have control of the error: since the desired area A lies between U10 and
O10 , the error we make in proposing either of these as an approximation to A
cannot be more than the difference O10 − U10 . Therefore if, as we hope, the
difference between the overestimate and the underestimate becomes smaller as
we increase the number of strips, we have an improving sequence of
approximations to A, just as in the previous illustration the decimals of
increasing length provided an improving sequence of approximations to
13/27.
Therefore, instead of using just ten vertical strips to slice up the area A, imagine
that we choose a positive integer n and divide A into n strips (meeting at 2/n, 4/n, 6/n
and so on up to (2n − 2)/n). There are only very minor changes in the argument: we get
the underestimate

Un = (2/n){(2/n)² + (4/n)² + (6/n)² + · · · + ((2n − 2)/n)²}
   = (8/n³)(1² + 2² + 3² + · · · + (n − 1)²).
Now we can call in an algebraic identity for the sum of consecutive squares that
you may have come across before:
1² + 2² + 3² + 4² + · · · + k² = k(k + 1)(2k + 1)/6
(if this is not familiar to you, you will find a proof of it in the next chapter but one,
as paragraph 4.2.2). Using this, the underestimate formula simplifies to

Un = (8/n³) · (n − 1)n(2n − 1)/6 = 4(n − 1)(2n − 1)/(3n²),

and the matching overestimate works out as On = 4(n + 1)(2n + 1)/(3n²), so that On − Un = 8/n.
At this point we are ready to obtain estimates for A that are as accurate as we
choose to make them. For instance, if we need an approximation whose error is
0.01 or smaller, choosing n = 800 will be good enough since, at that point, 8/n is
0.01 and the error we make in claiming

‘A = O800 = 4(801)(1601)/(3(800)²) = 2.67167 approximately’
is less than that. If, instead, we need the error to be smaller than 0.0001, then
choosing n = 80,000 (or, indeed, anything larger than 80,000 – for instance, it
might make the arithmetic simpler if we opted for n = 100,000 instead5) will
achieve it. Indeed, no matter how small the error is required to be, we now have
a simple rule of thumb for choosing a positive integer N so that any value of n
that exceeds N will give us an approximation On (or Un or, indeed, anything in
between) whose actual error is smaller: we have, in the language of Section 2.4, set
up a ‘perfect’ approximation procedure for A.
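The whole procedure is easy to mechanise. Here is a short numerical sketch of our own (an illustration, not part of the book's text) that recomputes the under- and over-estimates for the area under y = x² over [0, 2] and confirms the pattern behind the O800 value computed above:

```python
# Under- and over-estimates for the area under y = x^2 over [0, 2],
# using n vertical strips of width 2/n (an illustration of the text's method).
def riemann_estimates(n):
    width = 2.0 / n
    # tallest rectangles under the curve use the left-hand edge of each strip
    under = width * sum((k * width) ** 2 for k in range(n))
    # shortest rectangles above the curve use the right-hand edge
    over = width * sum((k * width) ** 2 for k in range(1, n + 1))
    return under, over

u10, o10 = riemann_estimates(10)
assert abs(u10 - 2.28) < 1e-12 and abs(o10 - 3.08) < 1e-12  # as in the text

# closed forms U_n = 4(n-1)(2n-1)/(3n^2) and O_n = 4(n+1)(2n+1)/(3n^2),
# together with the error bound O_n - U_n = 8/n
for n in (10, 800, 80000):
    u, o = riemann_estimates(n)
    assert abs(u - 4 * (n - 1) * (2 * n - 1) / (3 * n ** 2)) < 1e-9
    assert abs(o - 4 * (n + 1) * (2 * n + 1) / (3 * n ** 2)) < 1e-9
    assert abs((o - u) - 8 / n) < 1e-8
```

Running it with larger and larger n shows the two estimates squeezing together around 8/3, exactly as the argument predicts.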
then (firstly) the total is an estimate for π/4 and (secondly) the error in that
estimate is smaller than the modulus of the next number in the list – the first one
that you decided not to take. Exactly why this is true is not at all obvious, but we
shall investigate it later in the text (see 18.3.17).
If, for instance, we add the first five numbers, then the running total so far is
263/315 which is really not all that good as an estimate, but does at least come
with an assessed error: the error must be smaller than 1/11, the modulus of the
sixth term. Likewise, the total of the first twenty fractions in the list (which would
5 Working to five decimal places, O100,000 calculates out at 2.66671 and U100,000 at 2.66663
2.7 TESTING LIMITS BY THE DEFINITION 17
It will probably help you to keep on thinking of xn as the nth item in a succession
of approximations to ℓ, and of ε as the tolerance for some intended application.
In that sense the open interval (ℓ − ε, ℓ + ε), which consists of exactly the
numbers whose distances from ℓ are smaller than ε, is where to find the ‘good’
approximations – those whose errors are smaller than the current tolerance. Keep
in mind that the physical distance between numbers x and y on the real line is
|x − y|, so that the phrase |xn − ℓ| < ε simply says ‘the distance between xn and
ℓ is smaller than ε’.
[Diagram: the real line, showing the interval from ℓ − ε to ℓ + ε centred on ℓ]
Draft solution
Our task is – given any positive tolerance ε – to find a value of n beyond which
all terms of the sequence (1/n)n≥1 lie within that tolerance of 0. Such a task usually needs a
piece of roughwork first. We want |1/n − 0| < ε, that is, 1/n < ε, that is, n > 1/ε. This
shows how big n needs to be. However, 1/ε is probably not an integer so, in order to
line up with the definition, we had better round it up to the next whole number (or
one greater still, if you prefer) and call that nε. Now we are ready:
Solution
Given ε > 0, let nε be any integer larger than 1/ε. Then for every integer n ≥ nε we
have n > 1/ε, therefore 1/n < ε, that is, |1/n − 0| < ε. By the definition, limn→∞ 1/n = 0.
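The rule in this solution is completely mechanical, and it is easy to check numerically. The following sketch is our own (not the book's); it picks nε one above ⌈1/ε⌉ and confirms that every later term lies within ε of 0:

```python
import math

# For each tolerance eps, the threshold ceil(1/eps) + 1 ("the next whole
# number, or one greater") makes every later term 1/n lie within eps of 0.
def n_eps(eps):
    return math.ceil(1 / eps) + 1

for eps in (0.5, 0.1, 0.001, 1e-6):
    N = n_eps(eps)
    # spot-check the next thousand terms beyond the threshold
    assert all(abs(1 / n - 0) < eps for n in range(N, N + 1000))
```

Of course no finite check can replace the proof; the point is only to see the ε–nε relationship in action.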
2.7.3 Example Put xn = 4 − 3n⁻² for each positive integer n. We show that (xn)
is a convergent sequence, and that its limit is 4.
Draft solution
We need to arrange |xn − 4| < ε for each given positive tolerance ε – or, more
precisely, to decide how big n needs to be in order to force this to happen. Now
|xn − 4| simplifies to 3/n² and this will be less than ε just when n²/3 > 1/ε, that is, when
n² > 3/ε, that is, when n > √(3/ε). We can locate a suitable nε for the definition by
rounding up that last expression to a whole number.
Solution
Given ε > 0, let nε be any integer larger than √(3/ε). Then for every integer n ≥ nε
we have n > √(3/ε) and therefore n² > 3/ε and therefore 1/n² < ε/3 and therefore 3/n² < ε,
that is,

|xn − 4| = |4 − 3n⁻² − 4| = 3/n² < ε.
By definition, the sequence (xn ) converges to 4.
2.7.4 Example To show that

limn→∞ n(3n − 1)/(n² + 1) = 3.
6 While we begin to build up experience and confidence in using this definition, we shall often
practise on sequences (such as this one) for which it is possible to guess fairly easily the exact
numerical value of the limit.
Draft solution
Let xn stand for the typical term in this sequence and let ε denote a given positive
tolerance. We want to arrange that |xn − 3| < ε for sufficiently big values of n,
that is,
|xn − 3| = |n(3n − 1)/(n² + 1) − 3| = |(3n² − n − 3n² − 3)/(n² + 1)| = |(−n − 3)/(n² + 1)| = (n + 3)/(n² + 1) < ε.
This time it is not straightforward to determine exactly how big n must be to force
this, but we do not need to do so exactly: we can look for a WCS7 overestimate that
is easier to work with, and use that instead to decide where nε can be safely placed.
Look carefully at the following overestimation:8
(n + 3)/(n² + 1) < (n + 3)/n² ≤ (n + 3n)/n² = 4n/n² = 4/n.
Now it is easy to make 4/n less than ε: just ensure that n exceeds 4/ε or, rather, an
integer larger than that.
Solution
Given ε > 0, let nε be any integer larger than 4/ε. Then for every integer n ≥ nε we
have n > 4/ε and therefore 1/n < ε/4 and therefore 4/n < ε. But

|xn − 3| = |n(3n − 1)/(n² + 1) − 3| = (n + 3)/(n² + 1) < (n + 3)/n² ≤ (n + 3n)/n² = 4n/n² = 4/n
so also |xn − 3| < ε. By definition, xn → 3.
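Numerically, the point of the WCS step is that the simple bound 4/n genuinely sits above the true error (n + 3)/(n² + 1). A quick sketch of our own (not from the book):

```python
import math

# The true error |x_n - 3| equals (n+3)/(n^2+1); the WCS chain bounds it by 4/n.
def true_error(n):
    return abs(n * (3 * n - 1) / (n ** 2 + 1) - 3)

# the chain of overestimates used in the solution above
for n in range(1, 5000):
    assert true_error(n) < (n + 3) / n ** 2 <= 4 / n

# ... and the threshold "any integer larger than 4/eps" really works
eps = 0.001
N = math.ceil(4 / eps) + 1
assert all(true_error(n) < eps for n in range(N, N + 1000))
```

The bound 4/n is wasteful for large n, but that is exactly the WCS philosophy: wasteful is fine, as long as it is simple and safe.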
2.7.5 EXERCISE Show that the sequence ((n + 5)/(2n + 13))n≥1 converges to 1/2.
that this happens, or else (preferably) use some (WCS) overestimation to make
your task easier; for instance:
3/(2(2n + 13)) < 4/(2(2n + 13)) = 2/(2n + 13) < 2/(2n) = 1/n.
2.7.7 EXERCISE Prove the convergence (and evaluate the limit) of the sequence
(an ) described by
an = (15n² + n + 1)/(5n² − n − 2).
Partial draft solution
You should again be able to guess the limit pretty certainly just by trying huge
values of n (but better methods are coming). The tricky point this time comes in the
(WCS) overestimating of the error term. You ought to find that the error simplifies
to (4n + 7)/(5n² − n − 2), and it is then tempting to argue as follows:

(4n + 7)/(5n² − n − 2) ≤ (4n + 7n)/(5n² − n − 2) = 11n/(5n² − n − 2) < 11n/(5n²) . . .
but the last step is wrong: by changing 5n² − n − 2 into 5n² we have
actually increased the denominator and therefore decreased the fraction,
which is the exact opposite of what we intended and needed. Instead, try this:
5n² − n − 2 ≥ 5n² − n² − 2n², so

11n/(5n² − n − 2) ≤ 11n/(5n² − n² − 2n²) = 11n/(2n²) < 12n/(2n²) = 6/n
and so on.
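Both halves of that warning can be checked numerically; this little sketch (ours, not the book's) confirms that the tempting step really does go the wrong way, while the corrected bound 6/n holds:

```python
# err is the quantity 11n/(5n^2 - n - 2) from the draft solution above.
for n in range(1, 5000):
    err = 11 * n / (5 * n ** 2 - n - 2)
    # the "tempting" step produces something SMALLER than err, so it is
    # useless as an overestimate ...
    assert err > 11 * n / (5 * n ** 2)
    # ... whereas shrinking the denominator to 2n^2 gives a genuine bound
    assert err <= 11 * n / (2 * n ** 2) < 6 / n
```

The moral carries over unchanged to hand calculation: to overestimate a fraction, enlarge the numerator or shrink the denominator, never the reverse.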
2.7.8 Remark How a sequence converges (or not) is not influenced in any way by
the first few terms – nor, indeed, by the first trillion terms. For imagine that we take
a convergent sequence (an) with limit ℓ, and alter the first trillion (= 10¹²) terms
in some fashion. Given positive ε, we can find nε so that n ≥ nε makes the error
terms |an − ℓ| in the original sequence less than ε. If it happens that nε is more than
a trillion, this remains true for the modified sequence (since the modifications only
affected the early terms). Yet if nε is a trillion or less, we see that n ≥ 10¹² forces the
nth stage errors in both the modified and the unmodified sequences to be smaller
than ε once again. In both cases, the limit has not been affected.
This allows us, when exploring the limit of a sequence, to ignore the first few (or
the first many – but never infinitely many) terms if it simplifies our argument. Here
is an illustration:
2.7.9 Example To show that 3n/(7n² − 6n − 12) → 0.
2.7.10 Theorem A sequence cannot converge to two different limits ℓ1 and ℓ2.
Proof
If not, then one of the two is larger. Without loss of generality we’ll assume ℓ1 < ℓ2
(otherwise, just change the labels of the two alleged limits). Put ε = ½(ℓ2 − ℓ1) > 0.
[Diagram: the real line showing ℓ1 and ℓ2 a distance 2ε apart, with the intervals (ℓ1 − ε, ℓ1 + ε) and (ℓ2 − ε, ℓ2 + ε) meeting only where ℓ1 + ε = ℓ2 − ε]
From the definition of limit there must be a positive integer n1 such that

ℓ1 − ε < an < ℓ1 + ε

for every n ≥ n1. Then again, there must be another positive integer n2 such that

ℓ2 − ε < an < ℓ2 + ε

for every n ≥ n2. Choose any integer n that is bigger than both n1 and n2, and
we have all of these inequalities working for us simultaneously. In particular, and
holding in mind that the way we chose ε made ℓ1 + ε and ℓ2 − ε be the same
number:

an < ℓ1 + ε = ℓ2 − ε < an

which produces the contradiction an < an.
2.7.11 Example To show that each constant sequence converges (and that its limit
is that constant).
Solution
Consider a sequence (xn ) in which every xn is the same number: that is, a sequence
of the form (c, c, c, c, c, · · · ). We show that the limit of (xn ) is also c. In other
notation, limn→∞ c = c.
Given ε > 0, let us choose nε = 1…yes, with a constant sequence, we can get
away with a constant choice of nε also. Then for every n ≥ nε , we get |xn − c| =
|c − c| = 0 < ε and the demonstration is complete.
2.7.12 EXERCISE
• Show by example that the following statement is not true: if the sequence (an²)
converges to ℓ², then the sequence (an) must converge to ℓ or to −ℓ.
• Prove that if the sequence (an²) converges to 0, then (an) converges to 0.
Roughwork
• For questions like the first part, the thing to keep in mind is that if you only
know about the value of x² (x being a real number) then, generally, you don’t
know whether x itself is positive or negative.
• The second part of this exercise highlights a small trick that frequently turns
out to be extremely useful. Suppose we know that a particular sequence (xn)
converges to a limit ℓ, and we wish to use this information to show that a
different but related sequence (yn) converges to a limit m. Our task is to
demonstrate that, for any given ε > 0, we can force |yn − m| < ε (for
sufficiently large values of n). The available information is that |xn − ℓ| actually
can be made less than any given positive number, such as ε…but not only ε:
absolutely any positive quantity can be used instead if it helps us solve the
problem – for the basic given information is that |xn − ℓ| can be forced to be
less than any positive tolerance whatever.
Here, we need to show that (given ε > 0) we can make |an − 0| < ε, and the
given information is that |an² − 0| can be made less than any tolerance. We ask
ourselves: what should that tolerance be, in order to be able to show that
|an − 0| < ε? Another reading of the sentence (if needed) should show you that
we can choose the ‘missing’ tolerance as ε²: because |an² − 0| < ε² certainly
implies that |an − 0| < ε. So a formal solution to the second part can begin
as follows:
Partial solution
Given that (an²) converges to 0, and given ε > 0, notice that ε² is also greater than
0, so there is a positive integer n0 such that |an² − 0| < ε² for all n ≥ n0 …
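To watch the trick working on a concrete sequence (our own choice, not the book's): take an = (−1)ⁿ/n, whose squares 1/n² certainly converge to 0.

```python
import math

eps = 0.01
# choose n0 to make the SQUARES smaller than the tolerance eps**2 ...
n0 = math.ceil(1 / eps) + 1  # since (1/n)**2 < eps**2 as soon as n > 1/eps
assert all(((-1) ** n / n) ** 2 < eps ** 2 for n in range(n0, n0 + 1000))
# ... and |a_n| < eps then follows with no further work, by taking square roots
assert all(abs((-1) ** n / n) < eps for n in range(n0, n0 + 1000))
```

The sequence itself keeps changing sign, which is exactly why the first part of the exercise fails in general: knowing the squares does not tell you the signs.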
2.7.13 EXERCISE Let (xn )n≥1 be any sequence. Verify that xn → 0 if and only if
|xn | → 0.
Roughwork
The cautious approach to an ‘if and only if ’ claim is to break the argument into two
parts: the ‘if ’ and the ‘only if ’. That is, we set out to show the following:
1. if xn → 0 then |xn | → 0, and
2. if |xn | → 0 then xn → 0.
To set up part (1), assume that xn → 0. Then (for a given value of ε > 0) what
we know is that |xn − 0| < ε for all n ≥ some n0 . What we need to know is that
||xn | − 0| < ε for all sufficiently large values of n. Now compare what we know
with what we want to know.
Part (2) should work in a very similar way.
2.7.14 EXERCISE Let (an)n≥1 be any sequence. Show (directly from the definition
of the limit) that an → ℓ if and only if an − ℓ → 0 if and only if |an − ℓ| → 0.
Roughwork
The second if-and-only-if is something that we know already: just put xn = an − ℓ in
the previous exercise (2.7.13). So we need only focus on the first one: that an → ℓ
if and only if an − ℓ → 0. For any given ε > 0, write down the definitions of what
an → ℓ and an − ℓ → 0 require, and compare them.
2.8.1 Theorem Suppose that (an) and (bn) are convergent sequences, with limits
ℓ and m respectively. Then:
1. an + bn → ℓ + m,
2. an − bn → ℓ − m,
3. an bn → ℓm,
4. For each constant k, kan → kℓ,
5. |an| → |ℓ|,
6. Provided that m ≠ 0 and that no bn is zero, also an/bn → ℓ/m.
REMARK: There are several other ways of expressing this collection of results,
including the following version:
1. lim(an + bn ) = lim an + lim bn ,
2. lim(an − bn ) = lim an − lim bn ,
3. lim(an bn ) = lim an lim bn ,
4. For each constant k, lim(kan ) = k lim an ,
5. lim |an | = | lim an |,
6. Provided that no division by zero occurs, lim(an/bn) = lim an / lim bn.
(The entire result can even be turned into English, thus: taking limits of conver-
gent sequences is compatible with addition, with subtraction, with multiplication,
2.8 COMBINING SEQUENCES; THE ALGEBRA OF LIMITS 25
with ‘scaling’ (that is, multiplying by constants), with taking modulus, and with
division provided always that no illegal division by zero is attempted.)
We shall eventually provide proofs of all six parts of this highly useful result, but
not all at once since we are keener just now to show how to use it. Here is a start
on that project:
Proof of (1): given ε > 0, the fact that an → ℓ supplies a positive integer n1 such that
n ≥ n1 forces |an − ℓ| < ε/2. For similar reasons, there is another positive integer
(call this one n2) such that

n ≥ n2 forces |bn − m| < ε/2.
We cannot know which of n1 , n2 is the greater, but one of them is.11 Write n0 =
max{n1 , n2 } and notice that whenever n ≥ n0 , we get both the displayed lines
working for us at the same time. Therefore, for every n ≥ n0 :
|an + bn − (ℓ + m)| = |(an − ℓ) + (bn − m)| ≤ |an − ℓ| + |bn − m| < ε/2 + ε/2 = ε.
Thus the proof of (1) is complete. (Notice the use of the triangle inequality 1.3.3 in
the last line here.)
10 Why did we make that choice? Well, to show that an + bn converges to ℓ + m, we need to
arrange that |an + bn − (ℓ + m)| shall be less than ε. Rearrange that desired punch-line into
|(an − ℓ) + (bn − m)| < ε. Each of |an − ℓ| and |bn − m| can be made as small as we please…and if
we make each of them less than one half of ε, then their combined total will be smaller than two
halves of ε, which is exactly what we needed. Now, rejoin the main text.
11 Unless, of course, they happen to be equal, in which case it really doesn’t matter which one
you choose.
Then also

n ≥ n0 forces |kan − kℓ| = |k(an − ℓ)| = |k||an − ℓ| < |k| · (ε/|k|) = ε

which is exactly what the definition asked in order to show that kan → kℓ.
2.8.2 EXERCISE Construct a proof of part (2) of the theorem. (You should find
that an argument very like that given for part (1) will be convincing; after all,
addition and subtraction are quite similar operations.)
Moving towards the use of this theorem now, notice for a start that (according
to part (3), and remembering from Example 2.7.2 that n1 tends to zero)
lim 1/n² = lim(1/n · 1/n) = (lim 1/n)(lim 1/n) = 0 × 0 = 0

and so on. Consequently 1/nᵏ → 0 for each positive integer k. (It is easy to prove this
formally by induction,12 if you wish to try it.) This observation, combined with
pieces of our main theorem, allows us to deal with a large number of limit problems
quickly and painlessly. Consider, for instance:
an = (15n² + n + 1)/(5n² − n − 2).
Solution
Begin by dividing the numerator and denominator of the fraction formula by the
largest power of n appearing, n² (which dividing, of course, doesn’t change the
fraction in the least):

an = (15 + n⁻¹ + n⁻²)/(5 − n⁻¹ − 2n⁻²) → (15 + 0 + 0)/(5 − 0 − 0) = 3.
We are finished already! Notice, however, how many aspects of the key theo-
rem were used in that last line: part (6) to let us deal with the numerator and
12 If this technique is not familiar to you, wait until Section 4.2 where we shall discuss it in
detail.
denominator separately, parts (1) and (2) to add/subtract the separate parts of each,
and part (4) to see that lim 2n⁻² = 2 lim n⁻². It is not considered necessary to write
out all of these moves, but be aware of them.
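The prediction of the theorem is easy to test numerically. A quick sketch of our own (not from the book):

```python
# The sequence from Exercise 2.7.7, revisited: the algebra of limits predicts
# the limit (15 + 0 + 0)/(5 - 0 - 0) = 3 after dividing top and bottom by n^2.
def a(n):
    return (15 * n ** 2 + n + 1) / (5 * n ** 2 - n - 2)

for n in (10, 1000, 10 ** 6):
    # the gap |a_n - 3| = (4n+7)/(5n^2 - n - 2) obeys the WCS bound 6/n
    assert abs(a(n) - 3) <= 6 / n

assert abs(a(10 ** 6) - 3) < 1e-5
```

Note how much shorter the algebra-of-limits route is than the ε–nε calculation of Exercise 2.7.7, even though both establish exactly the same fact.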
Solution
(Despite its forbidding appearance, this expression has been built up layer by
layer from simple pieces, and we only need to ‘shadow’ its construction to get the
answer.) To begin with, dividing top and bottom by n:
(2 − 3n)/(4 + 5n) = (2n⁻¹ − 3)/(4n⁻¹ + 5) → (2(0) − 3)/(4(0) + 5) = −3/5

and consequently

(n − (2 − 3n)/(4 + 5n)) / (3n + π) → 1/3.
Moving on (but still digesting fractional formulas by dividing top and bottom
by the largest power involved, currently n³):
2.8.5 EXERCISE Use the algebra of limits to find the limit of each of the sequences
whose nth terms are given by the following formulae:
1. (1 − 2n + 3n² − 4n³)/(−5 − 6n + 7n² + 8n³)
2. ((11 − 6n)/(5n + 2))⁵
3. (9 − 4n)/(1 + n + 7n²) − (9 − 4n + n²)/(1 − n + 7n²)
4. π²n²/(πn² + π³) + |(3 − n⁴)/(3 + n⁴)| − 3(1 − πn)/(2 + πn²)
Remark
In the first of these, be sure to check that the denominator of the given fraction
is never zero (because otherwise the relevant part of the key theorem could not
be used). In the fourth of these exercises, keep in mind the traditional ‘order of
precedence’ of the arithmetical operations: for example, that multiplications and
divisions are always done before additions and subtractions, except where brackets
or other ‘enclosing’ operations dictate otherwise (since material inside brackets and
the like needs to be evaluated first). In this context, pairs of modulus signs behave
like brackets.
2.8.6 EXERCISE Prove part (5) of the algebra of limits theorem. That is, given
that an → , show that |an | → ||.
Draft solution
For a slick solution, almost all you will need is the reverse triangle inequality 1.3.4
| |x| − |y| | ≤ |x − y|
2.8.7 A look forward What we have done so far about detecting and proving
limits of sequences works fine provided that either we can sensibly guess the limit
and then come up with an overestimate of the error term that is simple enough
to work with, or else we can break the typical term down into simpler pieces
whose separate limits we already know. Unfortunately, many important and useful
sequences don’t fall into either of those categories. For instance (look back to a
previous example 2.2.1 (7)), although it is not very hard to guess the limit of n^(1/n),
it is then far from obvious how we could estimate the error-gap between that guess
and the typical term. Again, in the cases of ((1 + 1/n)ⁿ)n≥1 and of
2.9 POSTSCRIPT: TO INFINITY 29
(1 + 1/2 + 1/3 + 1/4 + · · · + 1/n)n≥1 ,
although a few minutes’ roughwork with a calculator will strongly suggest that the
nth term of each is getting steadily bigger as n increases, it is hardly self-evident
whether or not they are getting steadily closer to some limiting ‘ceiling’ or, if so,
what number that ceiling might be. In short, we need more techniques, more
analytic technology, to tackle such questions. The aim of the next chapter is to
develop and deploy some of that technology.
2.9.2 Definition A sequence (xn )n∈N is said to tend to minus infinity or diverge to
minus infinity if:
for each K < 0 there is some positive integer nK such that
xn < K for every n ≥ nK .
Roughwork
Given K > 0, we need to guarantee that, for all values of n that are big enough,
n3 −5n2 −2n−17 > K. Now (to simplify the algebra, just as we did for convergent
sequences)14
1. 5n² ≤ (1/10)n³ provided that n ≥ 50,
2. 2n ≤ (1/10)n³ provided that 20 ≤ n², and therefore surely if n ≥ 5,
3. 17 ≤ (1/10)n³ provided that 170 ≤ n³, and therefore surely if n ≥ 6.
Therefore, once n ≥ 50, all three hold, and

n³ − 5n² − 2n − 17 ≥ n³ − (1/10)n³ − (1/10)n³ − (1/10)n³ = (7/10)n³.
Then, to force (7/10)n³ > K, it is good enough to take n > ∛(10K/7) …
Solution
Given K > 0, choose an integer nK > max{50, ∛(10K/7)}. Then (as shown in the
roughwork) n ≥ nK will guarantee that

n³ − 5n² − 2n − 17 ≥ (7/10)n³ > K.
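A numerical sketch of our own (not the book's) showing this threshold rule in action:

```python
import math

# f(n) = n^3 - 5n^2 - 2n - 17; the roughwork shows f(n) >= (7/10)n^3
# once n >= 50, so any n beyond max(50, cube root of 10K/7) forces f(n) > K.
def f(n):
    return n ** 3 - 5 * n ** 2 - 2 * n - 17

for K in (10.0, 10 ** 4, 10 ** 7):
    n_K = max(50, math.ceil((10 * K / 7) ** (1 / 3)) + 1)
    for n in range(n_K, n_K + 500):
        assert f(n) >= 0.7 * n ** 3 > K  # the roughwork's chain of bounds
```

Notice that for small K the "50" clause dominates: it is there to make the crude bound (7/10)n³ valid, not to beat K.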
(3n⁴ − 7n²)/(2n² + n − 1) → ∞ as n → ∞.
14 Sometimes, even the roughwork needs roughwork. Here, as n becomes bigger, the n³ will
become much bigger and more important than the other pieces. To bring this out, we ought to
force each of 5n², 2n and 17 to be less than a small fraction of the n³…for instance, less than one
tenth of it. How do we make 5n² less than a tenth of n³? By ensuring that 50n² < n³…and any
value of n over 50 will do that. Then the same kind of discussion will handle 2n and 17. At this
point you can rejoin the text and see why we restrict n as we do.
2.9.5 EXERCISE Write out a formal proof of the following (almost immediate)
consequence of the definitions: xn → −∞ if and only if −xn → ∞.
(100 − n⁵)/(200 + n² + n⁴) → −∞ as n → ∞.
(n⁵ − 100)/(200 + n² + n⁴) → ∞ as n → ∞
(which will be slightly easier since there are fewer minuses involved). Given K > 0,
we therefore need to arrange (just by taking large enough values of n) that
(n⁵ − 100)/(200 + n² + n⁴) > K.
It will be easy to ensure that this final fraction is greater than K: we need only make
n exceed 10K/3. A formal solution can now be compiled as follows:
2.9.8 Solution Given K > 0, we choose nK to be any positive integer that is greater
than both 4 and 10K/3. Then (as seen in the roughwork) for any n ≥ nK, we shall
have

(n⁵ − 100)/(200 + n² + n⁴) > K.
This shows
(n⁵ − 100)/(200 + n² + n⁴) → ∞
which, by the preceding exercise, is equivalent to the question posed.
2.9.10 Theorem
15 that is, (yn )n∈N is bounded: see paragraph 4.1.4 for more on this.
Sample proofs
All of the above are proved in a routine16 way from the definitions, so we shall only
demonstrate a few.
Roughwork towards 1.
Given K > 0, we need to show that xn + yn > K for all n ≥ some threshold
value. Now we are told that xn > K for all n ≥ some n0 , and also that yn > K
for all n ≥ some possibly different n1 . What if we combine these inequalities while
n ≥ max{n0 , n1 }? Remember that K is positive, so 2K > K.17
Proof of 2.
Begin by choosing A as in 2., and let K be a given positive constant.
(Roughwork: we need to arrange that xn + yn > K (for large values of n) and,
knowing that yn ≥ −A in any case, it will be enough to get xn > K + A; but this
we can do, because xn → ∞.)
Since xn → ∞, we can find a positive integer n0 such that, for each n ≥ n0 ,
xn > K + A. Since we also know that yn ≥ −A, adding the two inequalities gives
xn + yn > K. Therefore xn + yn → ∞.
Proof of 4.
Suppose that xn → ∞ and that A is a negative constant. Given K < 0, we need
to think how to ensure that Axn < K and, because negative multipliers can cause
confusion in inequalities, it is safer to write that as −Axn > −K, that is, as |A|xn > |K|,
or as xn > |K|/|A|.
Since xn → ∞ and |K|/|A| is positive, there exists18 n0 ∈ N such that xn > |K|/|A| for all
n ≥ n0: that is, |A|xn > |K| or −Axn > −K or Axn < K. Therefore Axn → −∞.
16 Incidentally, routine is not the same as easy! By calling an argument routine, all we mean is
that it is built up by putting together the definitions and the ‘obvious’ results in a rather predictable
fashion, without depending on surprise insights. That may or may not be brief or easy – in some
cases, it is neither.
17 Incidentally, a slightly more elegant solution could begin with xn > K/2 for all n ≥ some
n0 and yn > K/2 for all n ≥ some n1 .
18 This is similar to the trick we discussed in the roughwork to 2.7.12; once we are told that
xn → ∞, then xn can be made larger not only than a particular positive constant such as |K| that
we are challenged with, but also than any positive expression built from that constant that suits
our purpose, such as |K|/|A|.
Proof of 7.
Supposing that xn > 0 for all n ≥ some n0 and that xn → 0; let K be any given
positive constant. Then 1/K is positive19 so there exists n1 ∈ N such that |xn − 0| < 1/K
for all n ≥ n1: indeed, if n ≥ n0 as well, then 0 < xn < 1/K for all such n. Therefore

n ≥ max{n0, n1} ⇒ 0 < xn < 1/K ⇒ 1/xn > K
2.9.11 EXERCISE Choose any other two parts of this theorem and write out
proofs for them.
2.9.12 Example To show that

(100 − n⁵)/(200 + n² + n⁴) → −∞ as n → ∞.
2.9.13 Solution Once we divide the top and bottom lines by n⁵, the biggest power
of n appearing, it is immediate from the algebra of limits that
(200 + n² + n⁴)/(100 − n⁵) → 0 as n → ∞
and we also notice that the expression is negative for all n ≥ 3. The conclusion
follows from part 8 above.
2.9.14 EXERCISE Decide (with proof) whether each of the following is true or
false:
• If xn → 0 and no xn is exactly zero, then either 1/xn → ∞ or 1/xn → −∞.
• If xn → 0 and no xn is exactly zero, then 1/|xn | → ∞.
2.9.15 Remark Once we had checked that 1/n → 0, we learned from the algebra
of limits that its various positive-integer powers
1/n², 1/n³, 1/n⁴, 1/n⁵,
and so on, also converged to zero. This conclusion is not, however, restricted to
integer powers:
19 Where did that come from? Once again, it comes from thinking about what we need to show.
In order to prove that 1/xn → ∞, we need to get 1/xn > K for big values of n and, provided that xn is
positive, that is the same as asking for xn < 1/K. Can we make that happen? Yes, because xn → 0.
2.10 IMPORTANT NOTE ON ‘ELEMENTARY FUNCTIONS’ 35
2.9.16 Example To use part (5) of the preceding theorem to show that, for any
positive real number a,
1/nᵃ → 0.
Solution
By the referenced part (5), it is good enough to show that nᵃ → ∞.
(Roughwork: for each given positive K, we need to arrange that nᵃ > K, that is,
that n > ᵃ√K . . .)
Given K > 0, we note that ᵃ√K is also merely a real number, so we can find a
positive integer n0 that is larger: n0 > ᵃ√K. Then for any n ≥ n0 we have n > ᵃ√K
and therefore20 nᵃ > K. In other words, the sequence (nᵃ) diverges to ∞, as
desired.
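A numerical sketch of our own, taking a = 1/2 as a concrete case of the argument above:

```python
import math

# With a = 1/2: any integer n0 above K**(1/a) (here, K squared) forces
# n**a > K from n0 onwards, and hence 1/n**a below the tolerance 1/K.
a = 0.5
K = 1000.0
n0 = math.ceil(K ** (1 / a)) + 1
assert all(n ** a > K for n in range(n0, n0 + 200))
assert all(1 / n ** a < 1 / K for n in range(n0, n0 + 200))
```

The striking feature is how slowly 1/√n dies away: the threshold here is already beyond a million, even for the modest tolerance 1/1000.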
20 Here we are taking it for granted that larger positive numbers have larger ath powers which,
although true, will not be amenable to formal demonstration until we have properly defined xa
for all real a and positive real x.
21 Note that there are a number of notations in use for the natural logarithm function in
particular; we have chosen to use ‘ln’, but ‘log’ and ‘loge ’ are also widespread.
22 We should stress that expressions such as sin x and ex are functions rather than sequences,
since it is to be understood that their control variable x is a real number rather than just a
positive integer.
However, we can use such functions to build a wide variety of sequences, such
as (sin n), (n sin(1/n)), (e^(1+sin(πn/4))), (ln(1 + 1/n)) and so on, n being a positive integer in each
case.
Pending our future encounter with those definitions and proofs, it seems
right that we allow ourselves to mobilise certain basic information concerning
sin x, cos x, ln x and ex provided that we take care not to use it in the development
of the theory, but only in examples, and provided that we do eventually get around
to showing that this information is reliable. The following summary lists explicitly
the details, concerning the four functions, that we temporarily accept for use in
examples (and that we promise, in the long run, to establish).
1. • sin : R → [−1, 1] is an odd, periodic function in the sense that
3 Interlude: different
kinds of numbers
.........................................................................
3.1 Sets
It is convenient for us to use a little of the language and symbolism of set theory,
although the theory itself lies beyond the scope of this text. By the term set we
mean a well-defined collection of distinct objects that are called its elements. By
well-defined we mean that each object either definitely is an element of the set in
question, or definitely is not: there must be no borderline cases. (So, for instance, we
have to avoid ideas such as ‘the set of all very large integers’ or ‘the set of numbers
that are extremely close to 3’: for it would be a matter of opinion and context which
objects were to belong to such ill-defined collections.) By distinct we mean in
practice that repetitions among the elements of a set are not allowed, so that, for
instance, the set of prime factors of 360 (= 2³ × 3² × 5) has only the three elements
2, 3 and 5, although the list (2, 2, 2, 3, 3, 5) of its prime factors comprises six items.
The sets we particularly work with are sets of real numbers – either real numbers
of a particular type, or a selection of real numbers lifted out for some particular
discussion. Some of these have turned out to be useful so often in the past that
there are now standard symbols for them that should be known and recognised:
1 More precisely, the powers of two in which the index is a positive whole number.
Undergraduate Analysis: A Working Textbook, Aisling McCluskey and Brian McMaster 2018.
© Aisling McCluskey and Brian McMaster 2018. Published 2018 by Oxford University Press
1. N is the set of positive integers,2 the set of whole numbers that are greater than
zero. In this case, the curly bracket notation is clear enough:
N = {1, 2, 3, 4, 5, · · · }.
2. Z is the set of all integers – positive, negative and zero whole numbers. The
symbol comes from the German word Zahl meaning number. Sometimes we
stretch the list-in-curly-brackets notation and display it as
Z = {· · · , −3, −2, −1, 0, 1, 2, 3, 4, · · · }.
3. Q is the set of rational numbers, those that can be expressed exactly as a
fraction in the usual sense of the word, that is, as an integer divided by a
non-zero integer. For instance, 2/3 ∈ Q, 1.4 ∈ Q because 1.4 = 7/5 exactly,
−3 ∈ Q because −3 = −3/1 (or, if you feel it is somehow cheating to have 1 as
the unnecessary denominator of a fraction, then −3 = −6/2 is an alternative
reason).
4. R is the set of all real numbers. (So, for instance, π/e ∈ R but √−1 ∉ R.)
It is not obvious to common sense alone that there are any real numbers that
are not rational. You may have seen proofs that surds such as √2, √5, ∛12 and
so on are not rational but, if not, here is a sample argument that is worth reading
carefully as an illustration of proof by contradiction. We shall take it as given that
every positive integer n (except 1) can be expressed as a product of powers of prime
numbers, and that (apart from the order in which these are written) such a prime
decomposition is unique for each n.
3.1.3 Proposition The real number √35 is not a rational number.
Proof
Suppose it were. Then it must be possible to find two integers p and q such that
√35 = p/q. Then 35 = p²/q² or, more simply,

p² = 35q².
(Now pick on one of the primes that appear to be involved in that last equation–
say, the 5. The 7 would do equally well.)
Let 5a be the power of 5 that appears in the prime decomposition of p, and 5b be
the power of 5 that appears in the prime decomposition of q. (We are not ruling out
2 We have avoided using the phrase ‘natural numbers’. Some writers use the term natural
numbers as a synonym for the positive integers, and others take it to mean the set comprising
the positive integers and 0 itself. In addition, some writers use the symbol N to mean the set of
natural numbers rather than the set of positive integers. So 0 ∈ N in some books and 0 ∉ N in
others! Be aware of this possibly confusing point if you are reading a range of textbooks.
that 5 may not be a prime factor at all of p or of q, for a or b might be zero.) Then 52a
is the power appearing in (the decomposition of) p2 , and 52b the power appearing
in q2 . Also the power appearing in 35q2 = 5 × 7 × q2 will be 52b+1 . However, p2
and 35q2 are the same number so, from the uniqueness of prime decomposition,
52a and 52b+1 must be the same thing – which tells us that 2a = 2b + 1 and that
1
a−b= .
2
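The parity argument at the heart of this proof is easy to check mechanically. The following Python sketch (ours, not the book's) confirms that the exponent of 5 in p² is always even while its exponent in 35q² is always odd, so the equation p² = 35q² can have no solutions in positive integers:

```python
def exponent_of(prime, n):
    """Exponent of `prime` in the factorisation of n (n >= 1)."""
    e = 0
    while n % prime == 0:
        n //= prime
        e += 1
    return e

for p in range(1, 100):
    for q in range(1, 100):
        assert exponent_of(5, p * p) % 2 == 0        # even power of 5 in p^2
        assert exponent_of(5, 35 * q * q) % 2 == 1   # odd power of 5 in 35q^2
        assert p * p != 35 * q * q                   # so sqrt(35) is irrational
```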
3.1.4 Definition If A and B are sets, and if every element of A is also an element
of B, then A is called a subset of B, and we write A ⊆ B. For example, N, Z and
Q are subsets of R, while N and Z are subsets of Q, and N is a subset of Z. Notice
that the wording of the definition makes every set a subset of itself: A ⊆ A merely
because every element of A is (of course) an element of A.
There are ways of combining two (or more) sets that will also help us to discuss
some matters in analysis:
3.1.6 Selection Whenever A is a set and P(x) is a statement that makes sense4 for
each element x of A, the notation
{x ∈ A : P(x)} or {x ∈ A | P(x)}
means the subset of A comprising just those elements of A for which the statement
in question is true. This is a much more versatile style of notation than the
list-in-curly-brackets that we previously presented. The whole symbol is usually
pronounced as ‘the set of all x in A such that P(x) is true’ (and the words ‘is true’
can be left out). Whether you use a colon (:) or a vertical bar (|) half-way through
the symbol is a matter of taste and readability; for instance, if P(x) begins with
something like |x| or already involves a colon, then using the other divider will
help the eye to take in quickly what is written. Of course, this selection notation only
applies to sets that are subsets of some pre-existing set, but that is not a problem for
us since virtually all the sets we need to work with are subsets of the real number
system R. Here are a few illustrations:
• {x ∈ R | x = p/q for some integers p, q where q ≠ 0}
– is the definition of the set Q of rationals.
• {x ∈ R : x ∈ / Q}
– is the set R \ Q of irrationals.
• {x ∈ R | x = 2^n, some n ∈ N}
– is the set we previously (and a little clumsily) wrote as {2, 4, 8, 16, 32, 64, · · · } .
• {x ∈ R : |x| < 3}
– is the ‘solid block’ of real numbers lying between −3 and 3. We shall next turn
our attention towards such unbroken ranges of real numbers.
4 That is to say, for each individual x ∈ A, P(x) is either true or false – once again, there must
be no borderline cases.
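For readers who program, Python's set comprehensions mirror this selection notation almost symbol for symbol; a small illustration (the sets are our own choices):

```python
A = set(range(-10, 11))             # a pre-existing set A
B = {x for x in A if abs(x) < 3}    # the selection {x in A : |x| < 3}
assert B == {-2, -1, 0, 1, 2}
```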
3.2 Intervals, max and min, sup and inf
There are several different kinds of interval, depending on whether the subset
extends limitlessly far up or down the real line, and on whether it includes or
excludes points ‘at the edge’ (properly called endpoints of the interval), and we list
all of them here, together with an exact set-theoretic description of each type (in
each case, a and b denote arbitrary real numbers, with a < b if appropriate):
1. (a, b) = {x ∈ R : a < x < b}
2. [a, b] = {x ∈ R : a ≤ x ≤ b}
3. (a, b] = {x ∈ R : a < x ≤ b}
4. [a, b) = {x ∈ R : a ≤ x < b}
5. [a, a] = {x ∈ R : a ≤ x ≤ a} = {a}
6. (a, ∞) = {x ∈ R : a < x}
7. [a, ∞) = {x ∈ R : a ≤ x}
8. (−∞, b) = {x ∈ R : x < b}
9. (−∞, b] = {x ∈ R : x ≤ b}
10. (−∞, ∞) = R
(Incidentally, some texts do not class case (5) as an interval at all, and some
others call it a degenerate interval.)
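The bracket conventions translate directly into membership tests; here is an illustrative sketch (helper names are ours):

```python
def in_open(a, b, x):       # (a, b): both endpoints excluded
    return a < x < b

def in_closed(a, b, x):     # [a, b]: both endpoints included
    return a <= x <= b

def in_left_ray(b, x):      # (-infinity, b]
    return x <= b

assert in_open(0, 1, 0.5) and not in_open(0, 1, 0)
assert in_closed(0, 1, 0) and in_left_ray(2, -100)
```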
The numbers a and b appearing in these descriptions are referred to as the
endpoints of the relevant interval, and all the other points of an interval are called
its interior points. It is important to bear in mind that an endpoint of an interval
may or may not itself belong to that interval. Also notice that the symbols ∞ and
−∞ are not counted as endpoints, mainly because (whatever they are) they are
not real numbers: their purpose in these notations is just to draw attention to the
absence of a right-hand or a left-hand endpoint.
The first five cases in our list are called bounded intervals. Cases (6) and (7) are
called bounded below (but not bounded above) while cases (8) and (9) are called
bounded above (but not bounded below). These ideas can be extended to apply to
any subsets of the real line:
3.2.3 Definition A lower bound of a set A of real numbers means a number l such
that l ≤ a for every a ∈ A. The set A is called bounded below if it has a lower bound.
Of course, if l is a lower bound of A then any number smaller than l is also a lower
bound of A.
3.2.4 Definition The set A is called bounded if it is both bounded above and
bounded below: that is, if it has an upper bound and a lower bound.
42 3 INTERLUDE: DIFFERENT KINDS OF NUMBERS
You should check back to our list of types of interval and see that the way we used
the terms ‘bounded’, ‘bounded above’ and ‘bounded below’ there is consistent with
the definitions that we have just given.
Specimen solution
Consider the last of these four assertions. If there does exist a constant K as
described, then −K ≤ a ≤ K for every a ∈ A, so −K and K are respectively
lower and upper bounds for A, and A is therefore bounded. Conversely, if A is
indeed bounded, and we then choose a lower bound l and an upper bound u for it,
then for each a ∈ A we have
l ≤ a ≤ u ≤ |u| and −a ≤ −l ≤ |−l| = |l|,
so both a and −a are less than or equal to max{|u|, |l|}. Put K = max{|u|, |l|} and
we have (for each a ∈ A) |a| ≤ K.
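The equivalence just proved – bounded if and only if |a| ≤ K for a single constant K – can be watched in action on a small finite set (our own example):

```python
A = [-3.5, 0.25, 2.0, 7.0]
l, u = min(A), max(A)                  # a lower and an upper bound for A
K = max(abs(u), abs(l))                # the constant K from the proof
assert all(abs(a) <= K for a in A)     # |a| <= K for every a in A
```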
When I is an interval that is bounded above, that is, one of the form (a, b) or
[a, b] or [a, b) or (a, b] or (−∞, b) or (−∞, b], it is obvious where the ‘right-hand
edge’ of I is (namely, the endpoint b) and it is obvious from the shape of the closing
bracket whether that edge point belongs to the interval or not. These are things also
worth asking about sets that are more complicated than intervals, but we need to
be more careful about defining ‘right-hand edge’ for a set that is not just an interval.
(in other words, if m is both an element of A and an upper bound of A). Notice
immediately that many sets do not possess a maximum element: for instance,
(a, b), (0, 1) ∪ (2, 4), N and {1/2, 3/4, 7/8, 15/16, 31/32, · · · } do not. On the other hand, (a, b],
(0, 1) ∪ (2, 4], {−n : n ∈ N} and {1/2^n : n ∈ N} do have (fairly obvious) maximum
elements. Informally, we often use terms such as biggest, largest, greatest or top
element instead of maximum element.
3.2.8 EXERCISE
Specimen solution
Consider, for example, the fourth of these assertions. It is clear that every element
of the set in question is negative, so 0 is an upper bound of the set. If ε is any given
positive number, then we can choose (after some roughwork6) an integer p that is
larger than 8/ε. Now choose n, m both greater than p and we see that
5 We are using the word ‘least’ in its common-sense meaning here; if in any doubt, please refer
forward to paragraph 3.2.9.
6 We need to find an element of A that is greater than −ε. Looking at what a typical element
of A is, and getting rid of the complicating minuses, that says we want 3/(n + 1) + 5/(m + 4) < ε. We shall
achieve that if both 1/(n + 1) and 1/(m + 4) are less than ε/8, so those bottom-line integers will need to be
greater than 8/ε …
n + 1 > n > p > 8/ε, so 1/(n + 1) < ε/8 and −3/(n + 1) > −3ε/8;
m + 4 > m > p > 8/ε, so 1/(m + 4) < ε/8 and −5/(m + 4) > −5ε/8;
hence
−3/(n + 1) + (−5/(m + 4)) > (−3 − 5)ε/8 = 0 − ε.
3.2.9 Definition If A is a set of real numbers and l is a particular real number then
we say that l is the minimum element of A if
• l ∈ A and
• l ≤ x for every x ∈ A
(in other words, if l is both an element of A and a lower bound of A).
Notice that many sets do not possess a minimum element: for instance,
(a, b), (0, 1) ∪ (2, 4), {−n : n ∈ N} and {0.1, 0.01, 0.001, 0.0001, · · · } do not. On
the other hand, [a, b), [0, 1) ∪ (2, 4), N and {−1, 1/2, −1/3, 1/4, −1/5, 1/6, −1/7, · · · } do
have (fairly obvious) minimum elements. Informally, we often use terms such as
smallest, least or bottom element instead of minimum element.
We shall again unpack that definition a little. To say that a number t is the
infimum of a set A says two things: firstly, that t is one of A’s lower bounds and,
secondly, that no greater number can be. That is, t ≤ x for every x ∈ A but, for any
positive number ε, t + ε fails to be less than or equal to all of the elements of A, that
is, it is strictly bigger than at least one element of A. In summary, t = inf A says:
• t ≤ x for every x ∈ A, and
• for each ε > 0 there exists xε ∈ A such that t + ε > xε .
3.2.11 EXERCISE
• Show that if a set A possesses a minimum element, then this minimum element
is the infimum of A.
Partial solution
For instance, let us consider the first of these assertions.
Let z denote the minimum element of the set A. That is, z belongs to A as an
element, and z ≤ x for every x in A. The second of those observations tells us that
z is one of the lower bounds of A. On the other hand, for any ε > 0 we can indeed
find an element xε of A that is less than z + ε, namely xε = z. So z is the infimum
of A.
We have made the point several times that many sets do not have maximum
or minimum elements. The vital point about sups and infs7 is that, in contrast,
these virtually always exist within the real numbers – provided only that the set in
question does not ‘stretch off towards infinity or minus infinity’ and is not merely
the empty set. This is, in many ways, the most critical property of R:8
Every non-empty set of real numbers that is bounded above has a supremum.
Every non-empty set of real numbers that is bounded below has an infimum.
It is possible to construct the real number system within a framework of set
theory and to establish this key completeness property, but such a construction
lies outside the scope of this text so we must ask you to take it on trust at present.
When you have seen how powerful it is, you will have better reasons for going
deeper into set theory with a view to understanding such a construction.
Note once again that the sup and the inf of a set A might or might not belong
to A as elements. Note also, as a point of interest, that each of the two sentences in
our statement of the completeness principle logically implies the other,9 so we did
not really need to state both of them.
7 The official Latin plurals are suprema and infima, but it is common practice to speak of sups
and infs, and also of sup and inf in less formal discussions.
8 But not a property of Q! We shall return to this issue in 3.3.9.
9 Supposing that each non-empty bounded-above subset of R has a supremum, and that A ⊆ R
is non-empty and bounded below, put B = the set of lower bounds of A. Then B is non-empty and
bounded above (by any element of A that you choose to consider) so it possesses a supremum s.
Now it is routine to confirm that s is the infimum of A.
We’ll finish off this section with a few results that combine or compare sup and
inf of different sets.
3.2.13 Lemma Suppose that A and B are two non-empty subsets of R, each
bounded above. Let A + B mean the set {a + b : a ∈ A, b ∈ B}. Then A + B
is also bounded above, and sup(A + B) = sup A + sup B.
The next proof records what happens when every element of a bounded-above set is
multiplied by a positive constant k: writing kA for the set {ka : a ∈ A}, we have
sup(kA) = k sup A whenever A is non-empty and bounded above and k > 0.
Proof
Let s be a temporary symbol for sup A. We know that (for each a ∈ A) a ≤ s and
therefore ka ≤ ks, so at least ks is one of the upper bounds of the set kA. Given
ε > 0, we see that ε/k is also positive, therefore s − ε/k < a for some a ∈ A. Hence
ks − ε < ka where ka is an element of kA. This establishes ks as the supremum
of kA.
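Lemma 3.2.13 is easy to watch numerically; a sketch with sets of our own choosing (sup A = 1 and sup B = 3, neither attained):

```python
from itertools import product

A = [1 - 1/n for n in range(1, 300)]      # sup A = 1
B = [3 - 2/n for n in range(1, 300)]      # sup B = 3
AB = [a + b for a, b in product(A, B)]    # the set A + B
assert max(AB) < 4                        # sup A + sup B is an upper bound
assert max(AB) > 4 - 0.05                 # and no smaller number is
```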
3.2.16 EXERCISE
1. Prove 3.2.13.
2. State and prove modifications of 3.2.13 and 3.2.14 for infima instead of
suprema.
3. Prove 3.2.15.
4. State and prove a modification of 3.2.15 for a set kA where A is bounded below
(and k is negative).
5. Use these lemmata10 to determine the supremum of the set
C = { 3/(n + 1) − 4/(m + 2) : m ∈ N, n ∈ N }.
10 The official plural of the Greek word lemma is lemmata, but it is perfectly ok to use the
anglicised plural ‘lemmas’ instead.
Partial solutions
In part (1), it is purely routine to check that sup A + sup B is an upper bound of
the set A + B.
Now if ε > 0 is given, note that ε/2 is also positive, so there are elements
a ∈ A, b ∈ B greater than sup A − ε/2 and sup B − ε/2 respectively. Combine
these observations.
For part (5), the notation set up in the lemmata lets us express the set C as
3A + (−4)B where
A = { 1/(n + 1) : n ∈ N },  B = { 1/(m + 2) : m ∈ N }.
It is easy to see that the biggest element (and therefore the supremum) of A is 1/2
and that the infimum of B is 0. Using the machinery set up by the lemmata, we
therefore find sup C = 3 sup A + (−4) inf B = 3 × (1/2) − 4 × 0 = 3/2.
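A brute-force check of that value of sup C (our own sketch):

```python
C = [3/(n + 1) - 4/(m + 2) for n in range(1, 400) for m in range(1, 400)]
assert max(C) < 1.5          # 3/2 is an upper bound of C (never attained)
assert max(C) > 1.5 - 0.02   # and is approached arbitrarily closely
```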
3.3 Denseness
Our main objective in this section is to establish (and to use) the fact that between
each two distinct real numbers, there is a rational number. We should begin,
however, by looking a little more closely at (what appears to be) the simplest
number system of all,11 that of the positive integers N.
Suppose that a is a particular positive integer and that b1 is another that is less
than a. Since the differences between integers have to be integers, it follows that b1
is at most a − 1. Likewise, if a > b1 > b2 (all three being positive integers) then
b2 is at most b1 − 1 and, consequently, at most a − 2. Continuing this argument,
a > b1 > b2 > b3 will guarantee that b3 ≤ a − 3 provided that all these numbers
are positive integers.
Repeating this argument a times, we find that a > b1 > b2 > b3 > · · · > ba will
guarantee that ba ≤ a − a = 0 if all the numbers involved are positive integers: but
this is impossible, since the positive integer ba cannot be ≤ 0. The contradiction
shows that no strictly decreasing succession of positive integers, starting with a,
can contain more than a terms.
This insight can be presented as a statement about sets of positive integers, as
follows:
11 Actually, N is not as simple as it appears to be. In particular, a complete logical account of the
positive integer system would need to justify the idea of carrying out some procedure an arbitrary
positive-integer-number of times, as we describe in this discussion and in the next proof. But this
is not a textbook on mathematical logic, and we shall accept some intuitive input into our view
of N, just as we did – and continue to do – concerning the real number system R.
3.3.1 Theorem Every non-empty set of positive integers possesses a least element.
Proof
Given non-empty A ⊆ N, suppose that A does not possess a least element – that
is, for each element of A that we look at, there will always be a smaller element of
A. Since A is not empty, we can choose an element a somewhere in A.
Since a is not the least element of A, we can find some a1 in A that is smaller.
Since a1 is not the least element of A, we can find some a2 in A smaller than a1 .
Since a2 is not the least element of A, we can find a3 in A smaller than a2 ,
and so on.
Run that argument a times, and we shall have created a strictly decreasing
succession of a + 1 positive integers beginning with a. This contradicts what we
observed above.
3.3.2 Theorem: Q is dense in R If c < d are any two distinct real numbers, then
there is a rational number q such that c < q < d.
Roughwork
The informal idea is this. We choose a positive integer n so big that n1 is smaller
than the gap d − c between c and d, and we think about all the rational fractions
whose denominator is n. These are evenly spaced out across the entire real line at
intervals of n1 , and some of them lie to the left of c, and some of them lie to the
right of d. If we imagine switching our attention from one that lies > d, step by
step toward the left with strides of length n1 until we eventually reach one that lies
< c then, because each step that we took was shorter than the gap between c and
d, one of them must have fallen into that gap. The first one that does this is the
rational q that we were looking for.
[Figure: a number line through c and d, with the rationals 1/n, …, m/n, …, k/n marked at spacing 1/n – Hunting for rationals between c and d]
Proof
Case 1: assume that d > 1.
Choose a positive integer n that is greater than 1/(d − c). Then 1/n < d − c.
Choose next a positive integer k that is greater than nd. This ensures that k/n > d,
and therefore that the set
M = { k ∈ N : k/n ≥ d }
is non-empty, so (by the result proved above) it possesses a least element m. Since
d > 1 and m/n ≥ d, we get m ≥ 2, so m − 1 is a positive integer; and m − 1 ∉ M by
the minimality of m, so (m − 1)/n < d. On the other hand,
(m − 1)/n = m/n − 1/n > m/n − (d − c) ≥ d − (d − c) = c,
that is, the rational (m − 1)/n lies strictly between c and d.
Case 2: if d ≤ 1, choose a positive integer j large enough that d + j > 1. By Case 1
there is a rational q with c + j < q < d + j, and then the rational q − j lies strictly
between c and d.
3.3.3 Note It is now easy to see that, between any two distinct real numbers c
and d, there are actually infinitely many rationals: because if not, then the finite set
Q ∩ (c, d) would have a smallest element q′, and then there would be no further
rationals between c and q′, contradicting denseness. (Alternatively, once we have
one rational q between c and d, then denseness says there is another q1 between c
and q, and another q2 between c and q1 , and another q3 between c and q2 , and so
on endlessly.)
3.3.4 EXERCISE
Partial solution
1. Use proof by contradiction: assume that a + bx equals a rational number,
rearrange to obtain a formula for x and conclude that x is actually rational (in
contradiction to what was given).
2. Use denseness of Q twice to find rationals a and b such that c < a < b < d,
and then consider the number a + (1/2)(b − a)√2.
3.3.5 Note Now it follows (by just the same style of argument as in the previous
Note) that between any two distinct real numbers there are actually infinitely
many irrational numbers, as well as infinitely many rational numbers: both Q
and R \ Q have this denseness property. The informal mental picture we should
now be building is that, no matter how small a segment of R we look at and no
matter how high a magnification we use, we shall always see an interleaved mix
of rationals and irrationals – indeed, infinitely many of each of them. This has
important consequences for limits of sequences and for sups and infs:
3.3.6 Proposition Every real number is the limit of some sequence of rational numbers, and also the limit of some sequence of irrational numbers.
Proof
For any x ∈ R and each n ∈ N in turn, we can use denseness to find a rational
number between x − n1 and x + n1 : call this rational number qn since it may well
depend on n. So
x − 1/n < qn < x + 1/n,
that is, |x − qn | < 1/n.12 Given ε > 0, if we choose an integer n0 > 1/ε, it follows that
n ≥ n0 will guarantee that |x − qn | < ε, so qn → x as n → ∞. The proof of the
second part is almost identical.
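The proof's construction can be imitated directly; a sketch (the choice of x and of each qn is ours – any rational within 1/n of x would do):

```python
import math
from fractions import Fraction

x = math.sqrt(2)
# q_n: a rational within 1/n of x, e.g. round(x * n) / n
q = {n: Fraction(round(x * n), n) for n in range(1, 500)}
assert all(abs(x - q[n]) <= 1 / n for n in q)   # |x - q_n| <= 1/n, so q_n -> x
```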
3.3.8 Proposition
1. Every real number x is the supremum of the set of rationals that are less than x.
2. Every real number x is the infimum of the set of rationals that are greater
than x.
3. Every real number x is the supremum of the set of irrationals that are less
than x.
4. Every real number x is the infimum of the set of irrationals that are greater
than x.
12 The next chapter will provide us with a slick and tidy way to finish the argument from that
point: see 4.1.18.
Proof
Let A = Q ∩ (−∞, x) comprise all the rationals less than x. Certainly x is an upper
bound of that set. Also, for any ε > 0, denseness says that there is a rational q
between x − ε and x. This q belongs to A, and q > x − ε. Hence x is the supremum
of A. The other three parts are proved in just the same way.
3.3.9 Note The following statement is untrue: ‘every non-empty subset of Q that
is bounded above in Q has a least upper bound in Q’. For instance, consider the set
A of rationals whose squares are less than 2. It is clearly non-empty, and bounded
above in Q by, for example, 3/2. Now suppose it did have a least upper bound λ
in Q.
• If λ < √2, then we can (due to denseness) find a rational q such that
λ < q < √2, which gives q² < 2. This shows that q belongs to A and yet is greater
than λ, contradicting the fact that λ is an upper bound of A.
• If λ > √2, then denseness supplies a rational q with √2 < q < λ. Since every
a ∈ A has a < √2 < q, this q is an upper bound of A in Q that is smaller than λ,
contradicting leastness.
• And λ = √2 is impossible, because √2 is not rational. So no such λ can exist.
3.3.10 Example To verify that the infimum of the set B = {q : q is rational and
q² ≤ 2} is −√2.
Solution
(a) −√2 is not rational so it cannot belong to B.
Any rational q < −√2 has |q| > √2 and q² = |q|² > 2, so it cannot belong
to B either.
Hence every element of B must be greater than −√2, that is, −√2 is a lower
bound for B.
13 It is possible to revamp this argument in a way that avoids all mention of √2, and that
therefore takes place entirely within the family of rational numbers.
(b) If ε > 0 then ε′ = min{ε, √2} is also14 greater than 0 so, by denseness, there
is a rational r between −√2 and −√2 + ε′.
Our choice of ε′ ensures that this r will be negative, and
−√2 < r < 0 ⇒ |r| < √2 ⇒ r² = |r|² < 2,
that is, r ∈ B.
Hence −√2 is inf B.
3.3.11 EXERCISE
• Verify that the supremum of {q : q is rational and q² ≤ 2} is √2.
14 The step that most often puzzles the reader is the replacement of ε by ε′ at this point. Why
is something like this necessary at all? Because if ε were too big, the next step might go wrong. If,
for instance, ε were 3 then, when we chose a rational r between −√2 and −√2 + ε, our otherwise
random choice of rational lies in the interval (−1.414, +1.586) (to three decimal places) and, for
all we know, r could be +1.5. This number has a square greater than 2 and therefore fails to lie
within B, destroying the punch-line of our demonstration. Making sure that the ‘new epsilon’ is
no bigger than √2 guarantees that this will not happen.
.........................................................................
4 Up and down —
increasing and
decreasing sequences
.........................................................................
So these are the sequences that ‘move steadily in one direction – up or down’
as you scan along the list of terms. There are some very obvious examples, such as
the decreasing sequences (1/n), (1/n²), (1/n³), (−n) and (−√n), and the increasing
sequences (n), (n²) and (1 − 1/n). For less transparent examples, it is often useful
to calculate the difference2 xn − xn+1 to see whether it is always positive (in
which case the sequence is decreasing) or always negative (in which case it is
increasing) or sometimes positive and sometimes negative (in which case it cannot
be monotonic).
1 Notice that, since the inequalities are non-strict (that is, they allow equality), a constant
sequence is both increasing and decreasing.
2 or sometimes – provided that the terms are positive – the ratio xn+1 /xn if that might simplify
through a lot of cancelling: the ratio will be always greater than or equal to 1 if the sequence is
increasing, but less than or equal to 1 if it is decreasing.
Undergraduate Analysis: A Working Textbook, Aisling McCluskey and Brian McMaster 2018.
© Aisling McCluskey and Brian McMaster 2018. Published 2018 by Oxford University Press
Solution
In this case,
xn − xn+1 = (6n − 1)/(4n + 3) − (6(n + 1) − 1)/(4(n + 1) + 3)
= (6n − 1)/(4n + 3) − (6n + 5)/(4n + 7)
= −22/((4n + 3)(4n + 7)).
This is obviously negative for all n, so the sequence is increasing (and thus
monotonic).
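A quick numerical confirmation of monotonicity, with the formula for xn read off from the calculation above:

```python
x = [(6*n - 1) / (4*n + 3) for n in range(1, 1000)]
assert all(a < b for a, b in zip(x, x[1:]))   # x_n < x_{n+1}: strictly increasing
```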
Solution
In the present case,
xn − xn+1 = (3 + 4/n − 2/n²) − (3 + 4/(n + 1) − 2/(n + 1)²)
= (4n² − 2)/(n²(n + 1)²).
This is positive for every n, so the sequence is decreasing (and thus monotonic).
sometimes helpful to look at the two together. For instance, this gives us an easy
way to talk about the boundedness of a sequence:
4.1.4 Definition A sequence is said to be bounded if the set of all its terms is
bounded.
Likewise, a sequence is said to be bounded above if the set of all its terms is bounded
above, and bounded below if the set of all its terms is bounded below.
4.1.5 Lemma A sequence (xn )n∈N is bounded if and only if one of the following
(equivalent) conditions holds:
1. Some bounded closed interval [a, b] includes xn for every n,
2. Some bounded interval of the form [−K, K] includes xn for every n,
3. There is some constant K > 0 such that |xn | < K for every n,
4. All terms of the sequence lie within some fixed distance from 0,
5. All terms of the sequence lie within some fixed distance from a number c ∈ R.
Proof
Suppose that (xn )n∈N converges and that its limit is ℓ. By the definition (and
choosing ε = 1 for convenience), there is a positive integer n0 such that all the
terms of the sequence after the n0th lie between ℓ − 1 and ℓ + 1, that is, less than
1 unit distant from ℓ. The earlier terms x1 , x2 , x3 , · · · , xn0 −1 may well be further
away from ℓ, but there are only a finite number of them: so we can find the biggest
distance from one of them to ℓ … call it M′. If we now let M = max{M′, 1}, then
every xn lies within the distance M from ℓ, and so (xn )n∈N is bounded.
4.1.7 Alert The converse of this lemma is certainly not true! For instance, the
‘alternating sequence’ ((−1)n )n∈N is bounded (it lies entirely inside [−1, 1], for
example) but not convergent. However, we’ll now demonstrate a correct partial
converse, namely the result embedded in the name of this section:
4.1.8 Theorem
1. A sequence that is increasing and bounded above must converge. (The limit is
the supremum of the set of all its terms.)
2. A sequence that is decreasing and bounded below must converge. (The limit is
the infimum of the set of all its terms.)
Proof
(1) Suppose that (xn )n∈N is increasing and bounded above. The set X = {xn : n ∈ N}
is non-empty and bounded above so, by the completeness principle, its supremum
(let’s denote it by ℓ) does exist. The definition of supremum then tells us that, if ε
is any positive number:
• xn ≤ ℓ for every n ≥ 1, and
• ℓ − ε < xm for at least one positive integer m.
Now use the fact that the sequence is increasing, and we get ℓ − ε < xm ≤ xn ≤ ℓ
for every n ≥ m; in other words, every term of the sequence from number m onwards
lies between ℓ − ε and ℓ, whence |xn − ℓ| < ε for every n ≥ m. Thus, (xn )n∈N converges to ℓ.
[Figure: the increasing terms x1, x2, x3, x4, x5, … climbing towards ℓ = supremum; from xm onwards all terms lie between ℓ − ε and ℓ + ε]
Exercise
Prove part (2) of this theorem: the proof will closely resemble what we have just
set out, but a lot of the inequalities will be the other way around.
4.1.9 Example To use the ‘bounded + monotonic’ theorem to show that the
following sequence (xn ) is convergent: where, for each positive integer n, xn is
defined to be the product of fractions
xn = 3/4 × 8/9 × 15/16 × · · · × (n² − 1)/n².
4.1 MONOTONIC BOUNDED SEQUENCES MUST CONVERGE 57
Solution
Since all the fractions involved here are positive, it is clear that xn > 0 for all n,
and so the sequence (xn ) is bounded below. Also, comparing the formulae for xn
and for xn+1 , we see that
xn+1 = xn × ((n + 1)² − 1)/(n + 1)² = xn × (1 − 1/(n + 1)²).
Since the extra multiplier 1 − 1/(n + 1)² is positive and less than 1, it follows
that xn+1 < xn , that is, that the sequence (xn ) is decreasing. According to the last
theorem (part (2)) it must converge.
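Numerically the products settle down quickly; in fact the product telescopes to (n + 1)/(2n), so the limit (which the theorem guarantees without naming) is 1/2. A sketch:

```python
from functools import reduce

def x(n):
    # x_n = product of (k^2 - 1)/k^2 for k = 2..n
    return reduce(lambda p, k: p * (k*k - 1) / (k*k), range(2, n + 1), 1.0)

xs = [x(n) for n in range(2, 300)]
assert all(b < a for a, b in zip(xs, xs[1:]))   # decreasing
assert all(v > 0 for v in xs)                    # bounded below by 0
assert abs(x(500) - 0.5) < 0.01                  # telescoping: x_n = (n+1)/(2n)
```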
Solution
As pointed out already, the sequence whose nth term is the typical number in this
set is decreasing and tends to 3. By the small print of the ‘bounded + monotonic’
theorem, 3 has to be the infimum of the set.
4.1.11 EXERCISE If
2n3 − 5n2 − 4n − 2
an =
2n3 + n2
find the supremum of the set {an : n ∈ N}.
Hint
Most of the work consists in checking that the sequence (an ) is increasing. Verify
first that
1 − 2/n² − 6/(2n + 1) = an .
n 2n + 1
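Granted the hint, both subtracted terms shrink as n grows, so (an) increases towards 1 and the supremum ought to be 1; a quick numerical check (ours):

```python
a = [(2*n**3 - 5*n**2 - 4*n - 2) / (2*n**3 + n**2) for n in range(1, 3000)]
assert all(x < y for x, y in zip(a, a[1:]))   # increasing
assert max(a) < 1 < max(a) + 0.01             # supremum 1, never attained
```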
4.1.12 EXERCISE Show that the sequence (n − √n) is increasing but not
bounded.
4.1.13 Example Let t ∈ (0, 1) be a constant. We show that the sequence (t n )n∈N
converges to 0.
Solution
Since 0 < t < 1, all the powers of t are positive, and t^n × t < t^n × 1 = t^n , that is,
t^(n+1) < t^n for every n. So the given sequence is decreasing and bounded below by zero, and
therefore must converge to some limit ℓ. Now we need to identify ℓ.
The sequence (t^(n+1) )n∈N = (t², t³, t⁴, t⁵, · · · ) is the original sequence with its
first term removed so (see earlier comment) it also converges to (the same) ℓ. Yet
also (by part (4) of the algebra of limits) t^(n+1) = t × t^n has to converge to tℓ. That
gives ℓ = tℓ (since limits are unique when they exist) which, since t ≠ 1, forces
ℓ = 0 as predicted.
4.1.14 EXERCISE Let t ∈ (1, ∞) be a constant. Show that the sequence (t n )n∈N
diverges.
Partial solution
Use proof by contradiction. If this sequence did not diverge, it would have to
converge to a limit ℓ which (as in the last example) would have to satisfy ℓ = tℓ.
Check that the sequence is increasing, and consider the consequences for the
number ℓ.
We have delayed proving part (3) of the algebra of limits theorem until now,
because we wanted the convergent sequences are bounded theorem to help in the
demonstration. This proof is harder than most we have done so far, so we shall
first roughwork our way through what needs to be shown and how we might show
it, and then crystallise out a proper proof from that discussion.
Proof
Given ε > 0, use first the fact that an → ℓ to find n0 such that3 |an − ℓ| < ε/(2|m| + 1)
whenever n ≥ n0 . Next, use the fact that (an ), being convergent, must also be
3 The +1 on the bottom line has been put in purely to make sure that we are at no risk of
dividing by zero, and to avoid having to treat m = 0 as a special case.
bounded, so we can find a constant K > 0 such that |an | < K for every value
of n. Now use the convergence of bn to m to find another integer n1 such that
|bn − m| < ε/(2K) whenever n ≥ n1 .
For each n ≥ max{n0 , n1 } we now have
|an bn − ℓm| = |an (bn − m) + m(an − ℓ)| ≤ |an ||bn − m| + |m||an − ℓ|
< K × ε/(2K) + |m| × ε/(2|m| + 1) < ε/2 + ε/2 = ε.
The proof is finished. (Notice that we were more careful with the modulus signs in
the final proof than we had been in the opening roughwork.)
4.1.16 EXERCISE Fill in the details in the following outline proof of part (6) of
the algebra of limits theorem. If bn → m, and neither m nor any of the bn is zero,
then
1/bn → 1/m.
Outline proof
Notice first that
|1/bn − 1/m| = |bn − m| / (|m||bn |).
That last expression could get into deep trouble4 if bn were to become close to zero,
so we need to prevent that.5 Find n1 so that |m|/2 < |bn | < 3|m|/2 whenever n ≥ n1 .
For such values of n, check that
|bn − m| / (|m||bn |) < 2|bn − m| / m².
Next, find n2 for which |bn − m| < m²ε/2 whenever n ≥ n2 .
Put all the pieces together.
Lastly, it is now quite easy to deduce the rule for quotients from part (3) and the
above: begin by writing an /bn as an × (1/bn ) and using part (3) on that product.
4 In order to make a fraction small (in modulus), we need to make its top line small but prevent
its bottom line from becoming too small.
5 Since |bn | → |m|, we can keep |bn | close enough to |m| – say, between one half of |m| and
three halves of |m| – to keep it well away from zero.
We’ll conclude this section with two more results that connect limits with
inequalities where, this time, the inequalities are between the terms of two or more
sequences rather than, as above, between the terms of a single sequence.
4.1.17 Theorem: limits across an inequality If (an ) and (bn ) are two convergent
sequences such that an ≤ bn for every n ∈ N, then lim an ≤ lim bn .
Proof
For brevity, put ℓ1 = lim an , ℓ2 = lim bn . If ℓ1 were not ≤ ℓ2 then the number
ε = (ℓ1 − ℓ2 )/2 would be strictly greater than zero. Convergence tells us that, for
sufficiently large n, both |an − ℓ1 | and |bn − ℓ2 | will be smaller than ε, that is,
ℓ1 − ε < an < ℓ1 + ε and ℓ2 − ε < bn < ℓ2 + ε.
Our choice of ε, however, arranges that ℓ1 − ε and ℓ2 + ε are the same number
– indeed, that is precisely why we chose it so. Therefore (for large values of n like
this) bn < ℓ2 + ε = ℓ1 − ε < an , and this contradiction establishes the result.
Remarks
1. Be careful not to use this result on sequences about whose convergence you are
unsure. For instance, (−1)n is certainly less than 2 for every value of n . . . but
does this tell us that lim(−1)n ≤ 2? No, because lim(−1)n does not exist.
2. Also be aware that the strict inequality < is not preserved under limits in this
way: that is, an < bn for all n does not guarantee that lim an < lim bn . A simple
illustration of this is that − n1 is certainly strictly less than + n1 for every n, and
each tends to 0, but it would be foolish to claim that 0 < 0 as a consequence.
4.1.18 Theorem: the ‘sandwich’, or the ‘squeeze’ Of three sequences (an ), (bn ),
(cn ) suppose we know that an ≤ bn ≤ cn for all n, and also that (an ), (cn ) converge
to the same limit ℓ. Then also bn → ℓ.
Proof
Given any ε > 0 we can first use the given convergence to find positive integers
na , nc such that |an − ℓ| < ε for n ≥ na and |cn − ℓ| < ε for n ≥ nc . Then for each
n ≥ max{na , nc } we shall have both of these inequalities true at once, and so
ℓ − ε < an ≤ bn ≤ cn < ℓ + ε, that is, |bn − ℓ| < ε. Hence bn → ℓ.
4.1.19 Examples
Solution
The awkward-looking trigonometric term must lie between −5 and +5, so the
nth term here lies between (6n − 5)/(2n + 3) and (6n + 5)/(2n + 3). Since each of
these converges to 3 via the algebra of limits, so must the given sequence.
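The squeeze is easy to watch numerically; a sketch with a specimen trigonometric term of our own (any expression trapped between −5 and +5 would do):

```python
import math

n_vals = range(1, 5000)
a = [(6*n - 5) / (2*n + 3) for n in n_vals]                  # lower sequence
c = [(6*n + 5) / (2*n + 3) for n in n_vals]                  # upper sequence
b = [(6*n + 5*math.sin(n)) / (2*n + 3) for n in n_vals]      # squeezed between
assert all(x <= y <= z for x, y, z in zip(a, b, c))
assert abs(b[-1] - 3) < 0.01                                 # heading to 3
```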
2. To find the limit of the sequence whose nth term is
n² / (n² + 3n cos(n³ + n + 1) − 2 sin(π ln 5n − 16)).
Solution
Take care with the inequalities when estimating bottom lines of fractions. We
know −1 ≤ cos θ ≤ +1 and −1 ≤ sin θ ≤ +1 no matter what (real) number
θ may be, so (for n ≥ 4, where all three denominators are positive)
1/(n^2 − 3n − 2) ≥ 1/(n^2 + 3n cos(n^3 + n + 1) − 2 sin(π ln 5n − 16)) ≥ 1/(n^2 + 3n + 2)
and hence
n^2/(n^2 − 3n − 2) ≥ n^2/(n^2 + 3n cos(n^3 + n + 1) − 2 sin(π ln 5n − 16)) ≥ n^2/(n^2 + 3n + 2).
Since (via the algebra of limits) the first and third of these expressions
converge to 1, so must the given sequence that is squeezed between them. The
circumstance that we ignored the first three terms has, of course, no effect on
limiting behaviour.
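For readers who like to see numbers, here is a quick Python check of the squeeze in Example 2. It is our illustration, not part of the text, and we read ‘ln 5n’ as ln(5n), which is an assumption about the intended formula:

```python
import math

# nth term of Example 2 and its two squeezing bounds (valid for n >= 4,
# where the lower denominator n^2 - 3n - 2 is positive)
def term(n):
    d = n**2 + 3*n*math.cos(n**3 + n + 1) - 2*math.sin(math.pi*math.log(5*n) - 16)
    return n**2 / d

def lower(n):
    return n**2 / (n**2 + 3*n + 2)   # smallest conceivable value of the term

def upper(n):
    return n**2 / (n**2 - 3*n - 2)   # largest conceivable value of the term

for n in (10, 100, 10000):
    assert lower(n) <= term(n) <= upper(n)
    print(n, round(lower(n), 6), round(term(n), 6), round(upper(n), 6))
```

Both bounds visibly close in on 1, dragging the awkward middle expression with them.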
Partial solution
As we scan along the list of n + 1 separate fractions whose sum defines tn , the
numerators decrease and the denominators increase; consequently the largest of
these fractions is the first and the smallest is the last. Therefore
(n + 1) · (4n^2 − n)/(7n^3 + 3n) < tn < (n + 1) · 4n^2/(7n^3).
cases, depending on the value(s) of the variable(s). For example, the proof we gave
of the result
lim(kxn ) = k lim xn
divided into the two cases k = 0 and k ≠ 0 and, later, the discussion of whether
(x^n ) converges or not usually splits into cases such as 0 < x < 1, −1 < x < 0, x = 0,
x = 1, x = −1, |x| > 1 because either the result or the argument (or both) will run
differently depending on the variable’s value.
In this area, the ‘worst conceivable situation’ is that in which we appear to be
forced to consider an infinite number of cases. At the time of writing, the notorious
‘3n + 1’ problem seems to be stuck in this nightmare zone. The ‘3n + 1’ problem is
this: given a positive integer n, either divide it by 2 (if it is even) or multiply it by 3
and add 1 (if it is odd); now repeat that process on your answer, and on the answer
to that, and so on. Question: do you always get to 1 in a finite number of moves?
Here is an illustration, starting on 58:
58 → 29 → 88 → 44 → 22 → 11 → 34 → 17 → 52 → 26 → 13 → 40
→ 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
Well, we reached 1 that time. Does that always happen, no matter what n
we start with? Nobody knows, perhaps in part because although an enormous
number of individual initial n’s have been checked out, and many special cases
have been successfully handled, nobody has yet devised a finite list of special-
case arguments that comprehensively covers all positive integers. Warning: do not
invest a disproportionate amount of your time into exploring this problem; you
have plenty of other things to do.
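The process itself is trivial to mechanise. A short Python sketch (ours, not the book's) reproduces the illustration starting at 58 and gathers a little more evidence:

```python
def collatz_chain(n):
    """Run the 3n+1 process from n until it first reaches 1, returning the chain."""
    chain = [n]
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        chain.append(n)
    return chain

print(collatz_chain(58))   # the book's 20-number chain, ending ... 4, 2, 1

# every starting value up to 10000 does reach 1 -- evidence, not a proof!
assert all(collatz_chain(n)[-1] == 1 for n in range(1, 10001))
```

That the loop terminates at all for every n is, of course, exactly the open question.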
(Mathematical) induction is a pattern of proof that is highly successful in estab-
lishing universal statements that are controlled by a positive integer. Its strength lies
in the fact that a (usually quite routine) demonstration along generally predictable
lines will cover all positive integer cases at once: so that when it works (which is not
always, but very frequently), it proves the truth of an infinite number of statements
all at once. Here, in outline, is the pattern of proof by induction:
• Step 0: express the result that you are trying to prove as a sequence (S(n))n∈N of
statements, where S(n) involves the typical positive integer n.
• Step 1: check that the first statement S(1) is actually true.
• Step 2: assume the truth of a particular (but unspecified) S(k).
• Step 3: deduce from this that the next statement S(k + 1) is also true.
That’s all you need to do. At that point, induction says that all of the statements
S(n) are true statements.
4.2.1 Example To show using induction that 7^(2n−1) + 5^(2n+1) + 12 is exactly divisible
by 24, for every positive integer n.
64 4 INCREASING AND DECREASING SEQUENCES
Solution
We’ll follow slavishly the pattern of proof set out above; once you are familiar with
induction, you can take some shortcuts.
• Step 0: For each n ∈ N let S(n) be the statement: 7^(2n−1) + 5^(2n+1) + 12 is exactly
divisible by 24.
• Step 1: S(1) says that 7 + 125 + 12 is divisible by 24. Since the total is 144, this
is indeed true.
• Step 2: Assume the truth of a particular S(k); that is, that 7^(2k−1) + 5^(2k+1) + 12
really is divisible by 24.
• Step 3: Now 7^(2(k+1)−1) + 5^(2(k+1)+1) + 12 − (7^(2k−1) + 5^(2k+1) + 12) simplifies to
7^(2k−1)(49 − 1) + 5^(2k+1)(25 − 1) which certainly is a multiple of 24 (write it as
24m, say) because 49 − 1 and 25 − 1 are. Therefore
7^(2(k+1)−1) + 5^(2(k+1)+1) + 12 = (7^(2k−1) + 5^(2k+1) + 12) + 24m
which, using Step 2, is the total of two multiples of 24, and therefore itself a
multiple of 24. In other words, S(k + 1) is also true.
By induction, all of the statements are true: that is, 7^(2n−1) + 5^(2n+1) + 12 is exactly
divisible by 24 for every positive integer n.
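A quick machine check of small cases is reassuring, though it is evidence, not proof — the induction above is the proof. A Python sketch (ours):

```python
# spot-check of 4.2.1: 7^(2n-1) + 5^(2n+1) + 12 is divisible by 24
for n in range(1, 200):
    assert (7**(2*n - 1) + 5**(2*n + 1) + 12) % 24 == 0

print("divisible by 24 for n = 1, ..., 199")
```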
4.2.2 Example: the sum of the first n perfect squares We show that, for every
positive integer n,
1^2 + 2^2 + 3^2 + 4^2 + · · · + n^2 = n(n + 1)(2n + 1)/6.
Solution
• Step 0: For each n in turn let S(n) be the statement:
1^2 + 2^2 + 3^2 + 4^2 + · · · + n^2 = n(n + 1)(2n + 1)/6.
• Step 1: S(1) says that 1^2 = (1)(1 + 1)(2 + 1)/6 which is certainly true.
• Step 2: Assume the truth of a particular S(k); that is, that
1^2 + 2^2 + 3^2 + 4^2 + · · · + k^2 = k(k + 1)(2k + 1)/6.
• Step 3: Adding the next perfect square to each side will not damage the
equation, so
1^2 + 2^2 + 3^2 + 4^2 + · · · + k^2 + (k + 1)^2 = k(k + 1)(2k + 1)/6 + (k + 1)^2
= ((k + 1)/6)(k(2k + 1) + 6(k + 1)) = ((k + 1)/6)(2k^2 + k + 6k + 6)
= ((k + 1)/6)(2k^2 + 7k + 6) = ((k + 1)/6)((k + 2)(2k + 3))
= (k + 1)(k + 1 + 1)(2(k + 1) + 1)/6.
In other words, S(k + 1) is also true and, by induction, all of the statements are true.
4.2 INDUCTION: INFINITE RETURNS FOR FINITE EFFORT 65
4.2.3 Example: Bernoulli’s inequality To show that (1 + x)^n ≥ 1 + nx whenever
n is a positive integer and x ≥ −1.
Solution
• Step 0: For each n in turn let S(n) be the statement: (1 + x)^n ≥ 1 + nx.
• Step 1: S(1) says that (1 + x)^1 ≥ 1 + x which is not very interesting but
certainly true.
• Step 2: Assume the truth of a particular S(k); that is, that (1 + x)^k ≥ 1 + kx.
• Step 3: Now because x ≥ −1, we know that (1 + x) is positive or zero, so it is
safe to multiply a non-strict inequality by it. Thus:
(1 + x)^(k+1) = (1 + x)^k (1 + x) ≥ (1 + kx)(1 + x) = 1 + (k + 1)x + kx^2 ≥ 1 + (k + 1)x.
In other words, S(k + 1) is also true, and by induction all of the statements are true.
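A small Python spot-check of this inequality (ours, for illustration — note that x = −1 and x = 0 are included deliberately, as the edge cases):

```python
# spot-check of Bernoulli's inequality (1 + x)^n >= 1 + n*x for x >= -1
for x in (-1.0, -0.5, 0.0, 0.1, 2.0):
    for n in range(1, 30):
        assert (1 + x)**n >= 1 + n*x

print("Bernoulli's inequality holds on the sample")
```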
Solution
• Step 0: For each value of n in turn let S(n) be the statement: n + 1 distinct
straight lines can’t cross at more than n(n + 1)/2 points.
• Step 1: S(1) says that 2 distinct straight lines cannot cross at more than
1(2)/2 = 1 point, which is a simple geometric truth.
• Step 2: Assume the truth of a particular S(k); that is, that k + 1 such lines have
at most k(k + 1)/2 crossing points.
• Step 3: If we are now given (k + 1) + 1 = k + 2 such lines, imagine looking at
the first k + 1 of them. By step 2, those lines have at most k(k + 1)/2 crossing
points. Imagine we now draw in the last (the (k + 2)th ) line: it hits each of the
previous k + 1 lines at most once, so the total number of crossing points now is,
at the most, k(k + 1)/2 + (k + 1). This rearranges easily as (k + 1)(k + 2)/2,
which is the same as (k + 1)((k + 1) + 1)/2. In other words, S(k + 1) is also true.
By induction, all of the statements are true.
4.2.5 EXERCISES
1. Use induction to verify that, whenever n is a positive integer:
1^3 + 2^3 + 3^3 + · · · + n^3 = n^2 (n + 1)^2 / 4.
2. Use induction to verify that, whenever n is a positive integer:
4.2.6 Comments
• It’s a very common experience, when you first meet induction, to feel that it is
cheating in some sense! It can seem that, instead of proving the wished-for
result properly, you are just assuming (at step 2) that it is true already. This is
not, however, what is going on. The desired result is something that claims to
work for all positive integers and, in that phrase, the most important word is
all. At step 2, what we are assuming is definitely not that the relevant statement
is true for all positive integers but merely for one particular positive integer. This
is perfectly reasonable: in fact, at step 1 we already confirmed that it actually is
true for n = 1, so there is nothing outrageous or illogical about supposing that
it might be true for some (other) values. This is all that we are doing at step 2.
• Step 3 is the only part of the argument at which you usually have to pause and
think a bit. The question to be pondered is: how am I going to turn statement
number k into statement number k + 1? – and there is no all-purpose answer: it
will depend on what these statements are trying to say. Looking back at our
four little case studies above, in the first one the two lumps of algebra were very
similar in appearance, and it was a reasonable guess that if we subtracted them
we might see in a convenient form what the difference was. For the second,
adding in the next perfect square was the natural way to trade up from the sum
of k squares to the sum of k + 1 of them. In the third, powers of (1 + x) were
the essential ingredient, and we had to think how to turn (1 + x)k into
(1 + x)k+1 . . . to which the simple answer, once the question is posed, is:
multiply by another (1 + x), and first ask yourself whether it is actually safe to
do that to an inequality. In the fourth example, how can we predict the
behaviour of k + 2 straight lines when we already know only how k + 1 lines
behave? How else, apart from keeping one line aside for the moment, letting the
remaining k + 1 lines do what we know they can, and then bringing in the last
line to see how it might interact with the others? In many cases, then, there is a
kind of inevitability about how you trade up from statement k to statement
k + 1, but you may need to look quite carefully at those statements before you
see what it is.
• As to why induction is a valid method of proof, it may help if you imagine the
various component statements S(1), S(2), S(3), . . . stacked one above the
previous one, like the rungs of an (endless) ladder ‘heading off to infinity’. By
checking the truth of S(1) you are, almost literally, getting your foot on the
bottom rung of the ladder – testing that it is strong enough to take the weight
of careful inspection. The main part of the induction process, then, the
demonstration that S(k + 1) follows as a logical deduction from S(k), says that
you can always climb from any ‘sound’ rung to the one above it. So, start
climbing: the first rung is strong/valid/true, therefore the one above (that is, the
second rung) is also. From that observation, it follows that the one above that
(the third rung) is equally sound. From that, so is the fourth. From that, so is
the fifth . . . when is this process going to stop? Never! The way in which the
positive integers are naturally ordered is that any particular one of them can be
reached in a finite number of steps starting at 1 and increasing by 1 each time.
For that reason, any particular S(n) is accessible by the process set out in the
induction template, and must therefore be a true statement.
• If you really want to understand why induction works, think back to paragraph
3.3.1 (every non-empty set of positive integers possesses a least element). Once
we know that S(1) is true and that S(k) implies S(k + 1) for each k ≥ 1, then
suppose that some of the S(n)’s are not true: that is, that the set
W = {n ∈ N : S(n) is false}
is non-empty. By 3.3.1, W possesses a least element w. Now w cannot be 1, for
we checked that S(1) is true; so w − 1 is also a positive integer and, being smaller
than the least element of W, it does not lie in W: that is, S(w − 1) is true. But
S(w − 1) implies S(w), so S(w) is true after all – contradicting w ∈ W. Hence no
such W can exist, and every S(n) is true.
We shall finish the section by doing two further examples: one that illustrates
how to carry out these minor changes when ‘n = 1’ is not the right starting
point, and one that observes that what we called Step 0 can on occasions be a
little tricky to get right.
Solution
• Step 0: For each n = 9, 10, 11, 12, · · · let S(n) be the statement: n! > 4^n .
• Step 1: S(9) says that 9! > 4^9 . A little calculation shows that the left-hand side is
362880 and the right-hand side is 262144, so this statement is correct.
• Step 2: Assume the truth of a particular S(k): that is, that k! > 4^k .
• Step 3: (In order to turn k! into the expected left-hand side (k + 1)! of statement
k + 1, we need to multiply by k + 1, which is, of course, positive . . . indeed, it is
at least 10 since k ≥ 9.) So
(k + 1)! = (k + 1) · k! > (k + 1) · 4^k > 4 · 4^k = 4^(k+1),
which is exactly what S(k + 1) asserts. By induction, all the statements S(n) for
n ≥ 9 are true.
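A short Python check (ours) confirms both the basis arithmetic and the fact that 9 really is the right starting point:

```python
import math

# the basis case exactly as computed in Step 1:
assert math.factorial(9) == 362880 and 4**9 == 262144

# the statement holds from 9 onwards ...
for n in range(9, 100):
    assert math.factorial(n) > 4**n

# ... and the choice of starting point matters: the inequality fails at n = 8
assert math.factorial(8) < 4**8

print("n! > 4^n verified for n = 9, ..., 99")
```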
Attempted solution
• Step 0: For each integer n ≥ 2 let S(n) be the statement ‘n can be expressed as a
product of primes’. (That’s the obvious way to break the claimed result into
layers, isn’t it?)
• Step 1: S(2) says that 2 is a product of primes; but 2 is itself a prime, so this
statement is vacuously true: 2 = 2 gives the prime factorisation of 2.
• Step 2: Assume the truth of a particular S(k); that is, k can be expressed as a
product of primes.
• Step 3: Suddenly we hit a snag. There is no evident way to get from the prime
factors of k to the prime factors of k + 1. Indeed, no prime factor of k can
possibly divide into k + 1 since they differ by 1.
However, we can re-word Step 0 and try again:
Reattempted solution
• Step 0: For each integer n ≥ 2 let S(n) be the statement ‘each integer from 2 up
to n can be expressed as a product of primes’. (If we can prove all of those true,
we shall have what we want.)
• Step 1: S(2) says just that 2 is a product of primes; but, as before, this statement
is vacuously true: 2 = 2 gives the prime factorisation of 2.
• Step 2: Assume the truth of a particular S(k); that is, that each integer from 2 up
to k can be expressed as a product of primes.
• Step 3: Now with a view to S(k + 1), we know from Step 2 that each integer
from 2 to k can be prime-factorised, and we only still need to look at k + 1. If
k + 1 happens to be prime, there is no need to do anything: k + 1 = k + 1 is a
trivial prime factorisation. Otherwise, k + 1 is not prime and (by definition of
prime) can be written as the product of two smaller numbers, say, k + 1 = a.b
where a, b are at least 2 but less than k + 1. Yet then Step 2 tells us that each of
a and b can be written as the product of a list of primes and, putting the two lists
together, we have a prime factorisation of a.b = k + 1. So S(k + 1) is confirmed.
By induction, all the statements S(n) are true, and so all integers from 2 upwards
can be prime-factorised.
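The re-worded (strong) induction translates directly into a recursive procedure. In this Python sketch (ours), if n has no divisor then it is prime and we are done; otherwise we split it and recurse on the strictly smaller pieces, exactly as Step 3 did:

```python
def prime_factors(n):
    """Factorise n >= 2, mirroring the re-worded induction: if n is prime,
    n = n is its (trivial) factorisation; otherwise n = a*b with both parts
    smaller, and the inductive hypothesis handles each part."""
    for a in range(2, int(n**0.5) + 1):
        if n % a == 0:
            return prime_factors(a) + prime_factors(n // a)
    return [n]  # no divisor up to sqrt(n): n is itself prime

print(prime_factors(360))   # e.g. 360 = 2*2*2*3*3*5
print(prime_factors(97))    # a prime stands alone
```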
4.2.9 EXERCISES
4.2.10 Note: binomial coefficients It may be useful to round off this section by
revising the binomial theorem and the coefficients that appear in connection with
it. Whenever n is a positive integer and k is an integer such that 0 ≤ k ≤ n, the
symbol
C(n, k) = n! / (k!(n − k)!)
is called a binomial coefficient (we write it here in the linear form C(n, k)). It is,
amongst other interpretations, the number of different possible selections of k
objects that can be chosen from n distinct objects. Straightforward calculations
confirm that (for all relevant n and k)
C(n, 0) = C(n, n) = 1, C(n, 1) = n, C(n, n − k) = C(n, k), C(n, k − 1) + C(n, k) = C(n + 1, k)
and, most importantly:
(1 + x)^n = Σ_{k=0}^{n} C(n, k) x^k .
Proof
• Step 0: For each n ∈ N in turn, let S(n) denote the statement:
(1 + x)^n = Σ_{k=0}^{n} C(n, k) x^k .
• Step 1: S(1) says that (1 + x)^1 = C(1, 0) + C(1, 1)x which is trivially correct.
• Step 2: Assume the truth of a particular S(j); that is, that (1 + x)^j = Σ_{k=0}^{j} C(j, k) x^k .
(Notice that we are, as usual, taking care not to use the same symbol with more
than one meaning.)
• Step 3: In order to turn (1 + x)^j into the expected left-hand side (1 + x)^(j+1) of
statement number j + 1, we need to multiply by another (1 + x) and carefully
gather up each power of x that appears:
(1 + x)^(j+1) = (1 + x) Σ_{k=0}^{j} C(j, k) x^k
= (1 + x)(1 + C(j, 1)x + C(j, 2)x^2 + C(j, 3)x^3 + · · · + C(j, j − 1)x^(j−1) + x^j )
= 1 + (C(j, 1) + C(j, 0))x + (C(j, 2) + C(j, 1))x^2 + (C(j, 3) + C(j, 2))x^3 + · · ·
+ (C(j, j) + C(j, j − 1))x^j + x^(j+1)
= 1 + C(j + 1, 1)x + C(j + 1, 2)x^2 + C(j + 1, 3)x^3 + · · · + C(j + 1, j)x^j + x^(j+1)
= Σ_{k=0}^{j+1} C(j + 1, k) x^k
– using the identity from 4.2.10 at the last-but-one line. In other words,
S(j + 1) is also true.
By induction, all of the statements are true.
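Both the identity from 4.2.10 and the theorem itself can be spot-checked mechanically. In this Python sketch (ours), (1 + x)^n is multiplied out one factor at a time, just as in Step 3:

```python
from math import comb   # comb(n, k) is the binomial coefficient C(n, k)

# Pascal's rule, the identity invoked at the last-but-one line of the proof:
for n in range(1, 20):
    for k in range(1, n + 1):
        assert comb(n, k - 1) + comb(n, k) == comb(n + 1, k)

def expand_coeffs(n):
    """Coefficients of (1 + x)^n found by literally multiplying out,
    one factor of (1 + x) at a time (exactly the Step 3 manoeuvre)."""
    coeffs = [1]
    for _ in range(n):
        # multiplying by (1 + x) shifts-and-adds the coefficient list
        coeffs = [a + b for a, b in zip([0] + coeffs, coeffs + [0])]
    return coeffs

assert expand_coeffs(6) == [comb(6, k) for k in range(7)]
print(expand_coeffs(6))   # [1, 6, 15, 20, 15, 6, 1]
```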
4.3 RECURSIVELY DEFINED SEQUENCES 71
[Figure: the curve y = f (x), its tangent at x1 , and the improved estimate x2 lying
between the desired solution and x1 .]
In general, x2 will be significantly closer to the true solution than x1 was, and we
can now repeat the improvement process:
x3 = x2 − f (x2 )/f ′(x2 ),
x4 = x3 − f (x3 )/f ′(x3 ),
x5 = x4 − f (x4 )/f ′(x4 )
and so on.
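In code, the improvement process is only a few lines. The following Python sketch (ours, with an illustrative choice of f that is not the book's) iterates it:

```python
def newton(f, fprime, x1, steps=8):
    """Iterate x_{n+1} = x_n - f(x_n)/f'(x_n), the improvement process above."""
    x = x1
    for _ in range(steps):
        x = x - f(x) / fprime(x)
    return x

# illustrative choice: a root of f(x) = x^2 - 2, starting from x1 = 1
root = newton(lambda x: x*x - 2, lambda x: 2*x, 1.0)
print(root)   # close to the positive square root of 2
```

Notice how few repetitions are needed: each step roughly doubles the number of correct digits.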
4.3.1 Example To identify, if it exists, the limit of the sequence (xn )n≥1 defined
recursively by:
x1 = 12; xn+1 = √(3xn + 28) (n ≥ 1).
Draft solution
With so little given information, we need to start by calculating the first few terms.
They work out (to four decimal places) as follows:
It must be stressed that this is very little evidence as to what happens as n goes to
infinity! Nevertheless, it is enough to let us make a clutch of informed guesses: we
guess that all the terms lie between 12 and 7, that the sequence is decreasing, and
that the limit is 7. Now we have a definite proposal to try to establish:
Solution
• Step 0: For each n ≥ 1 let S(n) be the statement: 7 < xn ≤ 12.
• Step 1: S(1) says that 7 < 12 ≤ 12, which is true.
• Step 2: Assume the truth of a particular S(k); that is, that 7 < xk ≤ 12.
• Step 3: (How are we to get from xk to the next term? As the recursive definition
told us, we multiply by 3, add 28, and take the square root. So:)
21 < 3xk ≤ 36, 49 < 3xk + 28 ≤ 64, 7 < √(3xk + 28) = xk+1 ≤ 8 ≤ 12.
Therefore S(k + 1) is also true and, by induction, every term satisfies 7 < xn ≤ 12:
the sequence is bounded.
Next, to confirm the guess that the sequence is decreasing, notice that
xn^2 − xn+1^2 = xn^2 − 3xn − 28 = (xn − 7)(xn + 4)
and because we already know that all terms lie between 7 and 12, that is a product of
two positive numbers, therefore positive itself. So xn^2 > xn+1^2 and, taking (positive)
square roots, we get xn > xn+1 for all values of n, that is, the sequence is decreasing.
We now know that (xn ) is both bounded and decreasing, and must therefore
converge to some limit ℓ. Also (xn+1 ), being merely the sequence (xn ) without its
first term, converges to the same ℓ, and (xn^2 ) converges to ℓ^2 by algebra of limits.
Take limits across the equation
xn+1^2 − 3xn − 28 = 0 for all n
to get ℓ^2 − 3ℓ − 28 = (ℓ − 7)(ℓ + 4) = 0, so ℓ can only be 7 or −4. Since every
term exceeds 7, ℓ = −4 is impossible; hence the limit is 7.
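The whole investigation can be previewed numerically. This Python sketch (ours) computes the first few terms and checks the three guesses that the argument then establishes:

```python
import math

x = 12.0
terms = [x]
for _ in range(10):
    x = math.sqrt(3*x + 28)   # the recursive rule x_{n+1} = sqrt(3 x_n + 28)
    terms.append(x)

# the three guesses: terms trapped in (7, 12], strictly decreasing, limit 7
assert all(7 < t <= 12 for t in terms)
assert all(a > b for a, b in zip(terms, terms[1:]))
assert abs(terms[-1] - 7) < 1e-4

print(terms[:4], "...", terms[-1])   # 12, 8, 7.2111..., heading for 7
```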
4.3.2 Example For the sequence (an ) specified by the formulae a1 = 0,
an+1 = √(4an + 77):
1. Show that 0 ≤ an < 11 for every n ≥ 1,
2. Show that the sequence is increasing,
3. Explain why it must possess a limit, and evaluate that limit.
Solution
• Step 0: For each n let S(n) be the statement: 0 ≤ an < 11.
• Step 1: S(1) says 0 ≤ 0 < 11 which is true.
• Step 2: Assume that 0 ≤ ak < 11 for some particular k.
• Step 3: Then 4(0) + 77 ≤ 4ak + 77 < 4(11) + 77, and so
0 ≤ √77 ≤ √(4ak + 77) = ak+1 < √121 = 11. Therefore S(k + 1) is also true.
By induction, all the statements S(n) are true. This proves (1).
Using (1), an^2 − an+1^2 = an^2 − 4an − 77 = (an + 7)(an − 11) is the product of a
positive and a negative, therefore negative. That is, an^2 < an+1^2 and so an < an+1 for
each n. This proves (2).
Since (an ) is now bounded and increasing, it must converge. Let ℓ be its limit.
Also an+1 → ℓ. Taking limits across an+1^2 − 4an − 77 = 0 (for all n) gives
ℓ^2 − 4ℓ − 77 = (ℓ + 7)(ℓ − 11) = 0, so ℓ can only be −7 or 11. Yet ℓ = −7
is impossible because every term is at least 0. Therefore an → 11.
4.3.3 EXERCISE For the sequence (an ) specified by the formulae a1 = 16,
an+1 = √(20 + an ):
1. Show that 5 < an ≤ 16 for every n ≥ 1,
2. Show that the sequence is decreasing,
3. Explain why it must possess a limit, and evaluate that limit.
4.3.4 EXERCISE Find (if it exists) the limit of the indicated sequence:
√5, √(5 + √5), √(5 + √(5 + √5)), √(5 + √(5 + √(5 + √5))), · · · .
Draft solution
We first need a clearer idea of what this sequence is. The pattern is that, for each
term in turn, the next one is created by adding 5 and taking the square root. In other
words, an+1 = √(5 + an ). Also a1 = √5 to get the process started. This flags up the
recursive nature of the problem. Calculate the first few terms and they seem to be
increasing towards a limit of approximately 2.791. What number could that be?
If there is indeed a limit ℓ then (an+1 ) will also converge to ℓ so, from
an+1^2 = 5 + an we get ℓ^2 = 5 + ℓ. This quadratic has solutions (1 ± √21)/2 =
2.7913, −1.7913 approximately, so now we can see exactly what number that limit
is going to be.
Break the problem into the same three sections as before:
1. Show that √5 ≤ an < (1 + √21)/2 for every n ≥ 1,
2. Show that the sequence is increasing,
3. Explain why it must possess a limit, and evaluate that limit.
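Before attempting the three parts, a numerical preview in Python (ours) is encouraging:

```python
import math

a = math.sqrt(5)           # a1 = sqrt(5), to get the process started
for _ in range(30):
    a = math.sqrt(5 + a)   # the recursive rule a_{n+1} = sqrt(5 + a_n)

limit = (1 + math.sqrt(21)) / 2   # the positive root of l^2 = 5 + l
print(a, limit)
assert abs(a - limit) < 1e-12     # the iterates have settled on that root
```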
4, √(3(4) − 2), √(3√(3(4) − 2) − 2), √(3√(3√(3(4) − 2) − 2) − 2),
√(3√(3√(3√(3(4) − 2) − 2) − 2) − 2), · · · .
4.4 POSTSCRIPT: THE EPSILONTICS GAME 75
4.3.7 EXERCISE
5 Sampling a sequence
— subsequences
.........................................................................
5.1 Introduction
If we sample an (infinite) sequence by picking out a finite number of its terms,
it would be unreasonable to expect such a sample to be at all ‘representative’ in
the sense of telling us anything useful about the sequence as a whole. Certainly it
will not tell us anything about its possible limit: for we have pointed out several
times that changing or deleting any finite number of terms does not affect limiting
behaviour in any way. What if, instead, we sample an infinite selection of terms?
Then at least our selection (in the original order) will constitute a sequence in its
own right, and we might reasonably expect that its behaviour will tell us something,
but not everything, about the original sequence; on the other hand, knowledge
about the whole sequence is likely to give us all the information we might need
about the newly selected one. This short chapter gives a more precise description
of the idea that we are sketching here, and develops and applies a few rather
predictable results plus one that is less obvious and much more powerful (the
Bolzano-Weierstrass theorem) and which will play a key role later in the text.
5.2 Subsequences
Informally, a subsequence of a sequence is an endless list of its terms in their
original order. So if (xn )n≥1 is any sequence, and we imagine its terms strung out
in an unending list
x1 , x2 , x3 , x4 , x5 , x6 , x7 , x8 , x9 , x10 , · · ·
then a subsequence will be created by scanning along that list and lifting out an
unending selection of what we find; for instance,
or
x4 , x8 , x13 , x400 , x401 , x605 , x677 , x759 , · · · .
Undergraduate Analysis: A Working Textbook, Aisling McCluskey and Brian McMaster 2018.
© Aisling McCluskey and Brian McMaster 2018. Published 2018 by Oxford University Press
How should we best write a general subsequence when we don’t know in advance
which terms are to be picked out? We could relabel the first item chosen as y1
and the second chosen as y2 and the third as y3 and so on, and the resulting
symbol (yn )n∈N is certainly a perfectly good notation for a sequence, but it has lost
any visible connection with the original sequence that we started with. A better
method is to give labels to the places in the original sequence where we find the
subsequence’s items: if our first choice occurred at the n1 th place in the original list,
and our second at the n2 th place, and our third at the n3 th place and so on, then the
chosen numbers that build up the subsequence are
xn1 , xn2 , xn3 , xn4 , · · ·
and the entire subsequence can now be written as (xnk )k≥1 . This is a slightly
cluttered symbol, but it succeeds in capturing two of the important aspects of a
subsequence: that it actually is an infinite sequence, and that its terms are some of
the terms from the original (xn ). The third important aspect – that the order has
to be the same as was in the original (no back-tracking allowed) – is captured by
insisting that n1 < n2 < n3 < n4 < · · · , in other words, that the sequence of labels
(n1 , n2 , n3 , n4 , · · · ) has to be strictly increasing.
With that discussion behind us, we can now write down concisely what a
subsequence is (and how to denote it):
Notice how we used different symbols, in the sequence and in the subsequence,
for ‘the thing that is going to infinity’. Indeed, we must be careful not to use a symbol
with two meanings at once anywhere in mathematics, and especially not when
working with subsequences: if we had denoted our subsequence by such a notation
as (xnn )n≥1 then confusion would be practically guaranteed. Of course there is no
need to use the particular letter k that we chose here: (xni )i≥1 or (xnp )p≥1 would
have done equally well.
5.2.2 Examples
1. For any sequence (xn )n≥1 , the following are a few of its subsequences: the
sequence of even-numbered terms (x2k )k≥1 , the sequence of odd-numbered
terms (x2k−1 )k≥1 , the sequence (xk^2 )k≥1 = (x1 , x4 , x9 , x16 , x25 , · · · ), the
sequence (xk! )k≥1 = (x1 , x2 , x6 , x24 , x120 , · · · ), the sequence
(x2^k )k≥1 = (x2 , x4 , x8 , x16 , x32 , · · · ).
2. In particular, (5k − 1) is a subsequence of (n), (k^−2 ) is a subsequence of (n^−1 ),
(1/k!) is a subsequence of (1/n), (sin(7k^2 + 3k + 6)) is a subsequence of (sin n),
and ((3k)^(1/(3k)) ) is a subsequence of (n^(1/n) ).
3. The sequence
((−1)^n (2 + 1/n))n∈N = (−2 − 1/1, 2 + 1/2, −2 − 1/3, 2 + 1/4, · · · )
does not converge, but two of its subsequences (those consisting of the odd terms and
the even terms) are
(−2 − 1/1, −2 − 1/3, −2 − 1/5, −2 − 1/7, · · · )
= ((−1)^(2k−1) (2 + 1/(2k − 1)))k∈N = (−(2 + 1/(2k − 1)))k∈N
and
(2 + 1/2, 2 + 1/4, 2 + 1/6, 2 + 1/8, · · · )
= ((−1)^(2k) (2 + 1/(2k)))k∈N = (2 + 1/(2k))k∈N .
5.2.3 Theorem Every subsequence of a convergent sequence is itself convergent,
and to the same limit.
Proof
Suppose that xn → ℓ and that (xnk )k≥1 is a subsequence of (xn )n≥1 . We need to
show that xnk → ℓ (as k → ∞).
First notice that, because the labels nk strictly increase, we have
n1 ≥ 1, n2 ≥ 2, n3 ≥ 3, · · ·
and, in general,
nk ≥ k.
If ε > 0 is given, xn → ℓ tells us that there is a positive integer n0 such that n ≥ n0
forces |xn − ℓ| to be < ε. Now notice that k ≥ n0 forces nk ≥ k ≥ n0 , that is, nk ≥ n0 ,
from which we get |xnk − ℓ| < ε. (Less formally, big enough values of k guarantee
that the error |xnk − ℓ| will be smaller than ε.) Hence xnk → ℓ, as required.
5.2.4 Example To show that the sequence ((−1)^n (2 + 1/n))n∈N is divergent.
Solution
In 5.2.2 (3) we noticed that this sequence has a subsequence whose limit is −2 and
another whose limit is 2. By the theorem, this could not happen if the full sequence
converged. Hence the result.
Here is a kind of weak converse to Theorem 5.2.3:
5.2.5 Example Suppose we are given a sequence (xn )n≥1 and a number ℓ such
that the subsequence of odd-numbered terms (x2k−1 )k≥1 and the subsequence of
even-numbered terms (x2k )k≥1 both converge to ℓ. To show that (xn )n≥1 itself
converges to ℓ.
Solution
Given ε > 0, the convergence of the two subsequences tells us that there are positive
integers ko and ke such that |x2k−1 − ℓ| < ε whenever k ≥ ko , and |x2k − ℓ| < ε
whenever k ≥ ke .
Partial proof
If (an )n∈N is increasing and (ank )k∈N is one of its subsequences then, for each k, we
have nk < nk+1 . Fill in the integers that lie between them (namely nk , nk + 1,
nk + 2, . . . , nk+1 ) and, since (an ) is increasing, each of these single steps can only
move us upwards through the sequence; so ank ≤ ank+1 .
Expressing the last exercise briefly as ‘sequence is monotonic implies subse-
quence is monotonic’, it would be foolish to expect anything like a full converse
(along the lines of ‘subsequence is monotonic implies sequence is monotonic’)
since a single subsequence simply does not contain enough information about the
parent sequence for such a conclusion to be at all plausible. All the same, it would
not be unreasonable of us to expect some kind of partial converse, in which the
monotonicity of a subsequence told us something about the ordering of the terms
of the entire sequence. It is therefore a little surprising to see, from the next result,
that possession of a monotonic subsequence tells us absolutely nothing about the
parent sequence: every sequence whatsoever, it turns out, possesses a monotonic
subsequence.
Proof
Let (xn )n∈N be any sequence. To help us (almost literally) see through this curious
proof, let us call a positive integer m farsighted if xm is greater than all the later
terms in the sequence, that is, if xm > xq for every q > m. (Imagine that you are
standing on xm and trying to see off to infinity over the heads of all the later xq ’s in
the sequence; if you can do that, then the m that you are using is farsighted.) Now
either there are infinitely many farsighted integers, or else there are only finitely
many (perhaps even none).
In the first case, there is an (endless) succession m1 < m2 < m3 < · · · of far-
sighted integers, and by definition of farsighted we get
xm1 > xm2 > xm3 > · · · ,
that is, a (strictly) decreasing subsequence. In the second case, there is some
integer r greater than all of the farsighted integers (if there are any at all); that
is, for every integer s ≥ r we encounter, there will be a greater integer s′ for which
xs ≤ xs′ . (Every xs has its view of infinity obstructed by some later xs′ , so to speak.)
This yields firstly xr ≤ xr1 for some r1 > r, and then in turn xr1 ≤ xr2 for some
r2 > r1 , xr2 ≤ xr3 for some r3 > r2 and so on without end. Look: we are forming
an increasing subsequence
xr ≤ xr1 ≤ xr2 ≤ xr3 ≤ · · · .
Draft solution
In each case, it will be enough to recognise the given sequence as a subse-
quence of some (simpler) sequence whose limit you already know or can easily
calculate.
Fragment of solution
In the second of these, you may save yourself a good deal of trigonometry by
noticing the value of the nth term when n is an odd number times 77, and again
when n is an even number times 77. (The idea of focusing on 77 is driven simply
by a wish to avoid fractions if we can reasonably do so without losing too much
information.)
5.3 BOLZANO-WEIERSTRASS: THE OVERCROWDED INTERVAL 83
Proof
Let (xn )n∈N be any bounded sequence. By its boundedness, we can find a closed
interval I0 = [−M, M] (for some positive M) that includes every term xn .
Now I0 is the union of its left half and its right half: indeed, we can write
I0 = [−M, 0] ∪ [0, M]. So we can pick one of the two halves that includes xn for
infinitely many values of n. Call whichever half we pick I1 , and select also n1 for
which xn1 ∈ I1 .
Next, repeat this argument upon I1 : for I1 is the union of its left half and its right
half, and we can pick one of the two halves that includes xn for infinitely many
values of n. Call whichever half we pick this time I2 , and select also n2 greater than
n1 for which xn2 ∈ I2 . The phrase greater than n1 is legitimate because we have an
infinite number of possible n2 s to pick from, and can therefore surely arrange to
make that choice larger than our previous n1 , whatever it was.
Next, repeat this argument upon I2 : I2 is the union of its left half and its right
half, and we can pick one of the two halves that includes xn for infinitely many
values of n. Call whichever half we pick this time I3 , and select also n3 greater than
n2 for which xn3 ∈ I3 .
1 Unlike real pigeonholes, the sets A and B do not have to be disjoint; unlike real pigeons, the
terms of the sequence do not have to be all distinct from one another.
And so on without end (and you can make an appeal to induction if you feel that
it’s necessary).
This generates a sequence (I1 , I2 , I3 , · · · , Ik , · · · ) of closed intervals, each con-
tained in the previous one and exactly half of its length, and also a subsequence
(xn1 , xn2 , xn3 , · · · , xnk , · · · ) of the original sequence, such that (for all k) xnk belongs
to Ik . All we still need to do is to use the ‘shrinking’ nature of the intervals to show
that the subsequence converges.
Write the typical Ik as [ak , bk ] and notice that, since the length of I1 was M and
the length of I2 was M/2 and the length of I3 was M/4 and so on, bk is actually
ak + M/2^(k−1) . Since Ik+1 is either the left or the right half of Ik , we also have
ak ≤ ak+1 ≤ bk+1 ≤ bk
for every k; in particular, (ak ) is increasing and bounded above (by b1 ), so it
converges to some limit ℓ. Now
ak ≤ xnk ≤ bk = ak + M/2^(k−1)
lets us appeal to the squeeze because, since 2^(−k) → 0 as k → ∞ (see 4.1.13), ak and
ak + M/2^(k−1) both converge to (the same) ℓ. Thus we have (xnk ) converging to ℓ.
An alternative approach to proving Bolzano-Weierstrass is outlined in para-
graph 5.3.7.
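For the curious, the halving argument can be imitated in code on a finite prefix of a bounded sequence. This Python sketch (ours) is only an illustration of the selection process, since a finite list cannot witness ‘infinitely many’:

```python
import math

def bw_indices(x, M, depth=12):
    """Select indices of a subsequence of the bounded finite prefix x
    (all terms in [-M, M]) by repeated halving, as in the proof: keep the
    half currently holding the most remaining terms, pick one index from
    it beyond the previous pick, then halve again."""
    lo, hi = -M, M
    indices, last = [], -1
    for _ in range(depth):
        mid = (lo + hi) / 2
        # the two halves need not be disjoint -- terms equal to mid sit in both
        left = [i for i in range(last + 1, len(x)) if lo <= x[i] <= mid]
        right = [i for i in range(last + 1, len(x)) if mid <= x[i] <= hi]
        half, (lo, hi) = (left, (lo, mid)) if len(left) >= len(right) else (right, (mid, hi))
        if not half:
            break          # the finite prefix has run out of suitable terms
        last = half[0]
        indices.append(last)
    return indices

xs = [math.sin(n) for n in range(10000)]   # bounded but not convergent
idx = bw_indices(xs, 1.0)
print([round(xs[i], 4) for i in idx])      # the selected values crowd together
```

Each chosen term lies in an interval half the length of the previous one, which is exactly what forces convergence in the proof.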
For the purposes of this text, the really important applications of Bolzano-
Weierstrass happen only after we have defined continuous functions. Until we reach
that point, 5.3.2 and 5.3.3 will provide a little insight into how it may be used. In
any case, please take note of 5.3.5 and 5.3.6 which have a direct bearing on our
ongoing study of sequences.
5.3.2 Example Let us (temporarily) call a sequence (xn )n∈N channelled if
|x1 − x2 | < 1/2, |x2 − x3 | < 1/3, |x3 − x4 | < 1/4
and, in general, |xn − xn+1 | < 1/(n + 1) for every positive integer n. To prove that
every bounded sequence has a channelled subsequence.
Solution
Thanks to what Bolzano-Weierstrass tells us, it will be enough just to show that
every convergent sequence has a channelled subsequence; so let us tackle that.
Suppose (yn )n∈N converges to ℓ. Then there is n1 ∈ N such that |yn − ℓ| < 1/4
whenever n ≥ n1 . There is also n2 ∈ N such that |yn − ℓ| < 1/6 whenever n ≥ n2 ,
and we can make sure that n2 > n1 (just by increasing n2 if necessary). Then there
is also n3 ∈ N such that |yn − ℓ| < 1/8 whenever n ≥ n3 , and we can make sure that
n3 > n2 (just by increasing n3 if necessary).
We are now into an induction process, and it will generate more and more
positive integers n3 < n4 < n5 < n6 < · · · such that (in the sequence (yn )):
• each term from number n_4 onwards will be less than 1/10 away from ℓ,
• each term from number n_5 onwards will be less than 1/12 away from ℓ,
• each term from number n_6 onwards will be less than 1/14 away from ℓ,
– and so on. Using these distance estimates (and the triangle inequality), we
see that
• |y_{n_1} − y_{n_2}| ≤ |y_{n_1} − ℓ| + |ℓ − y_{n_2}| < 1/4 + 1/4 = 1/2,
• |y_{n_2} − y_{n_3}| ≤ |y_{n_2} − ℓ| + |ℓ − y_{n_3}| < 1/6 + 1/6 = 1/3,
• |y_{n_3} − y_{n_4}| ≤ |y_{n_3} − ℓ| + |ℓ − y_{n_4}| < 1/8 + 1/8 = 1/4,
• |y_{n_4} − y_{n_5}| ≤ |y_{n_4} − ℓ| + |ℓ − y_{n_5}| < 1/10 + 1/10 = 1/5
— and so on. In other words, the subsequence (yn1 , yn2 , yn3 , yn4 · · · ) is channelled.
5.3.3 EXERCISE Let us (equally temporarily) call a sequence (x_n)_{n∈N} superchannelled if
|x_1 − x_2| < 10^{−1},  |x_2 − x_3| < 10^{−2},  |x_3 − x_4| < 10^{−3}
and, in general, |x_n − x_{n+1}| < 10^{−n} for every positive integer n. Prove that every bounded sequence has a superchannelled subsequence.
Draft solution
Think of the coordinates (x_n, y_n) of the typical point P_n in the sequence. For each positive integer k (thinking ε = 1/k) we get a positive integer n_k such that 0 < y_{n_k} < 1/k. Can we use Bolzano-Weierstrass on the sequence (x_{n_k}) of all the x-coordinates of the associated points? What does it tell us if we do?
5.3.5 Example Let (an )n∈N be a bounded sequence that is not convergent. We
show that it must have two convergent subsequences possessing different limits.
Solution
Find M > 0 such that [−M, M] contains the entire sequence. By Bolzano-Weierstrass, there is a subsequence (a_{n_k})_{k∈N} that converges to a limit ℓ. Yet (a_n)_{n∈N} itself does not converge to ℓ: which means that there is some positive number ε so small that a_n never settles permanently between ℓ − ε and ℓ + ε. This in turn means that, whichever n_0 in N we think of, there is some greater n > n_0 for which a_n is outside those borderlines . . . put more tidily, there is a whole subsequence of (a_n)_{n∈N} lying in [−M, ℓ − ε] ∪ [ℓ + ε, M]. By pigeonholing, one of the two intervals here contains a subsequence. Use Bolzano-Weierstrass on this and it gives us another sub-subsequence (still a subsequence of the original, of course) converging to a limit which, since it has to lie in [−M, ℓ − ε] ∪ [ℓ + ε, M] (by the theorem on taking limits across an inequality – see 4.1.17), cannot be the same number as ℓ.
That is actually a more useful result than it initially looks. Establishing that
a sequence converges by the original definition is, of course, seldom easy . . .but
using that same definition to show that a sequence diverges can be very awkward
indeed. What the theorems on ‘convergent implies bounded’ and ‘convergence of
subsequences’ plus the last example, put together, tell us is that it is never necessary.
Of course a sequence that is unbounded or that possesses two subsequences with
different limits cannot be convergent (by earlier results), but now we see that the
converse is also valid:
5.3.6 Theorem A sequence is divergent if and only if it is unbounded or possesses two subsequences with different limits.
Proof
A sequence that is unbounded, or possesses subsequences with differing limits,
must be divergent by Lemma 4.1.6 and Theorem 5.2.3. The converse – that a
divergent sequence must have one of these characteristics – is what Example 5.3.5
demonstrated.
5.3.7 EXERCISE Use the 'every sequence has a monotonic subsequence' theorem (5.2.8) to give an alternative proof of the Bolzano-Weierstrass result.
(This is quite a simple exercise to carry out, and most people regard this
alternative proof as both shorter and easier to follow than the one we presented
earlier, and yet somehow less informative, more like ‘rabbit out of the hat’ show-
off maths. Of course, you are free to use whichever works better for you.)
5.3.8 A look forward By this point, we have built up a good range of techniques
for establishing convergence and evaluating limits that are capable of working
across a wide variety of sequences. A wide variety, but by no means all – for
there are many important and useful sequences that need some additional (and
sometimes quite individual) attention. The business of our next chapter is to gather
together and explore many of these ‘routine-procedure-resistant’ examples.
6 Special (or Specially Awkward) Examples
6.1 Introduction
We apologise in advance for the rather fragmentary character of this chapter but,
as we pointed out at the end of Chapter 5, there are many convergent sequences
that do not readily give up their secrets under routine uses of the techniques we
have so far developed, and you will need to become acquainted with their limits at
some point. It might as well be now. Keep in mind the squeeze, which will often
turn out to play a role here.
6.2 Important examples of convergence
6.2.1 Powers of a constant For a constant x, how does the sequence (x^n)_{n∈N} behave?
Solution
We have already seen (see 4.1.13 and 4.1.14) that for 0 < x < 1 we get x^n → 0 and that for x > 1 we get (x^n)_{n∈N} divergent, and an appeal to 2.7.13 shows that for −1 < x < 0 we get x^n → 0 again. Now we only need to address the few missing cases.
If x = 1 then it is obvious that x^n → 1, and if x = 0 then it is obvious that x^n → 0.
Undergraduate Analysis: A Working Textbook, Aisling McCluskey and Brian McMaster 2018.
© Aisling McCluskey and Brian McMaster 2018. Published 2018 by Oxford University Press
6.2.2 Negative powers of n If t is a positive constant, then n^{−t} → 0.
Solution
Since n^{−t} is certainly positive, we need only ensure that (given ε > 0) we can find n_0 ∈ N so large that n^{−t} < ε will always happen once n ≥ n_0. Straightforward roughwork will show how big n_0 needs to be: n^{−t} < ε exactly when n^t > 1/ε, that is, when n > (1/ε)^{1/t} = e^{ln(1/ε)/t}.
A formal proof is now easy to construct, beginning with: 'Given ε > 0, choose an integer n_0 greater than e^{ln(1/ε)/t}. Then for any n ≥ n_0, we have . . . '.
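The roughwork can be sanity-checked by machine. In this sketch (an illustration only; the values of t and ε are arbitrary choices of ours) we compute the suggested n_0 and confirm that it works along a stretch of subsequent values of n:

```python
# Testing the n_0 recipe of the formal proof (illustration; t and eps are
# arbitrary choices): any integer n_0 greater than e**(ln(1/eps)/t), i.e.
# greater than (1/eps)**(1/t), makes n**(-t) < eps for every n >= n_0.
import math

t, eps = 2.0, 1e-3
threshold = math.exp(math.log(1 / eps) / t)   # here 1000**0.5, about 31.62
n0 = math.floor(threshold) + 1
for n in range(n0, n0 + 1000):                # spot-check a stretch of n
    assert n ** (-t) < eps
print(n0)
```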
6.2.3 nth roots of a constant If a > 0, how does the sequence (a^{1/n})_{n∈N} behave?
Solution
In the special case a = 1 it is clear that this sequence converges to 1.
√
Now if a > 1 we shall have n a > 1 for every n (for the nth power of a positive
number less than 1 will still be less than 1), so let us write
√
n
a = 1 + hn
where hn is positive, and its subscript n just serves to remind us that its actual value
may vary with n. Raise both sides to the power of n and think what the binomial
theorem tells us:
a = (a^{1/n})^n = (1 + h_n)^n = 1 + n·h_n + [n(n − 1)/2!](h_n)^2 + · · ·
where there are several more terms to come, but all we need to know about them
is that they are all positive. Each term on the right-hand side is therefore smaller than a, and we focus on the second one:
n·h_n < a.
Thus 0 < h_n < a/n and, since a/n → 0, the squeeze gives h_n → 0; therefore a^{1/n} = 1 + h_n → 1. (The remaining case 0 < a < 1 then follows by applying this to 1/a > 1 and using the algebra of limits.)
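A quick numerical illustration (ours, not the book's) of the bound n·h_n < a in action, with a = 123 chosen to match the next example:

```python
# Numerical check of the binomial bound: writing a**(1/n) = 1 + h_n with
# a > 1, the inequality n*h_n < a forces h_n -> 0, so a**(1/n) -> 1.
a = 123.0
for n in (10, 100, 1000, 10_000):
    h_n = a ** (1.0 / n) - 1.0
    assert 0 < h_n and n * h_n < a       # exactly the inequality in the text
print(a ** (1.0 / 10_000))               # already very close to 1
```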
6.2.4 Example To find the limit of the sequence whose nth term is
x_n = 123^{1/(3n^2+n−2)}.
Solution
This is a subsequence of the sequence (a^{1/n}) (for a = 123) whose limit we know to be 1. Therefore x_n → 1 also.
6.2.5 EXERCISE Determine the limit of (a_n)_{n∈N} where a_n = (3^n + 7^n + 9^n)^{1/n}.
Draft solution
Begin with the obvious remark that 9^n ≤ 3^n + 7^n + 9^n ≤ 3(9^n) for every n.
6.2.6 The nth root of n How does the sequence (n^{1/n})_{n∈N} behave? It converges to 1.
Solution
Ignoring the case n = 1 (which, as usual, cannot affect the limiting behaviour) we see that n^{1/n} > 1 for every other n, so let us write
n^{1/n} = 1 + h_n
where hn is positive and will vary with n. Raise both sides to the power of n and
use the binomial theorem:
n = (n^{1/n})^n = (1 + h_n)^n = 1 + n·h_n + [n(n − 1)/2!](h_n)^2 + · · ·
where there are several more terms to come and they are all positive. Each term on the right-hand side is therefore smaller than n, but this time we focus on the third one:
[n(n − 1)/2!](h_n)^2 < n.
Dividing both sides by n and remembering that n is at least 2, this rearranges to give
(h_n)^2 < 2/(n − 1)
and so
0 < h_n < √(2/(n − 1)).
Now it is fairly obvious (or see paragraph 2.7.12) that 2/(n − 1) → 0, so √(2/(n − 1)) → 0 also; the squeeze gives us h_n → 0, and n^{1/n} = 1 + h_n → 1.
6.2.7 Example To determine the limit of (a_n) where a_n = (7n + 10)^{1/(5n+2)}.
Solution
The nth term broadly resembles n^{1/n} but, of course, we have to do better than broad resemblance. One approach (assuming n > 2, so that 3.5n + 5 < 5n + 2) is:
(5n + 2)^{1/(5n+2)} < (7n + 10)^{1/(5n+2)} < (7n + 10)^{1/(3.5n+5)} = ((7n + 10)^{1/(7n+10)})^2
(think carefully about the use of the index laws^1 in that calculation) at which point we see that the first and last items involve subsequences of (n^{1/n}) and therefore converge to 1, after which the squeeze tells us that a_n → 1 also.
6.2.8 EXERCISE
1. Determine the limit of (20n − 15)^{1/n}. (It may help to begin by observing that 5n ≤ 20n − 15 < 20n for each n.)
2. Determine the limit of (a_n)_{n∈N} where
a_n = n^{−1}(1^n + 2^n + 3^n + · · · + n^n)^{1/n}.
6.2.9 Factorials grow faster than powers For any constant t, the sequence
(t^n/n!)_{n∈N}
converges to zero. This may seem contrary to common sense at first sight; for
example, try t = 10, calculate the first few terms
^1 In particular, you should take note that if x > 1 and a < b (where a and b are positive) then x^{1/a} > x^{1/b}. This is another of those plausible-looking arithmetical results whose full understanding will have to wait until we have properly defined the elementary functions, including general powers, in Chapter 18.
– and there is certainly no sign yet that they are drifting closer to zero. But we need
to look at what happens for seriously big values of n, and the early, small values
may be misleading.
Solution
We can assume that t is positive. (Because, firstly, in the case where t = 0 the result
is immediate; and, secondly, once we have proved it in the case t > 0, if we are
then challenged with a sequence
un
(an )n∈N =
n! n∈N
|u|n
→ 0.
n!
That is, |an | → 0. Now an earlier exercise (2.7.13) tells us that an → 0 also.)
Comparing an+1 with an , we find that
a_{n+1} = (t/(n + 1))·a_n,
and that fraction multiplier t/(n + 1) will be small once n becomes substantially
bigger than t. More precisely, once n exceeds 2t, the fraction multiplier will be less
than a half and, therefore, each term will be more than 50% smaller than the one
before it. This lets us set up a clear proof that the terms tend to zero:
As soon as the integer n exceeds 2t, we have
t^{n+1}/(n + 1)! = (t/(n + 1))·(t^n/n!) < (t/2t)·(t^n/n!) = (1/2)·(t^n/n!)
– that is, a_{n+1} < 0.5·a_n. So, picking an integer n_0 > 2t, we have
a_{n_0+1} < 0.5·a_{n_0},  a_{n_0+2} < (0.5)^2·a_{n_0},  a_{n_0+3} < (0.5)^3·a_{n_0}, · · ·
· · · a_{n_0+k} < (0.5)^k·a_{n_0}  (k ≥ 1).
Since (0.5)^k → 0 as k → ∞ and all the terms are positive, the sequence (a_n)
converges to zero (by the squeeze) as claimed. (Notice that we entirely ignored
the first n0 terms but, as usual, this does not affect limiting behaviour.)
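Taking t = 10, as suggested above, a short computation shows both the deceptive early growth and the eventual halving (this is an illustration of ours, not part of the proof):

```python
# Illustration of 6.2.9 with t = 10: the terms t**n/n! grow at first, but
# once n exceeds 2t = 20 each term is less than half the one before, so
# they rush down to zero.
import math

t = 10.0
terms = [t ** n / math.factorial(n) for n in range(1, 61)]
for i in range(20, len(terms)):          # terms[i] is the (i+1)th term
    assert terms[i] < 0.5 * terms[i - 1]
print(max(terms), terms[-1])             # a large peak, then almost nothing
```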
6.2.10 How does the nth root of n! behave? It gets seriously big. More precisely, the sequence (x_n)_{n≥1} given by
x_n = 1/(n!)^{1/n}
converges to 0.
Solution
For any positive constant ε put t = 1/ε > 0 and call in the previous result, that
t^n/n! → 0.
In particular there is n_0 such that, for every n ≥ n_0,
t^n/n! < 1,
that is, t^n < n!, that is, t < (n!)^{1/n}, and hence 0 < x_n = 1/(n!)^{1/n} < 1/t = ε for every n ≥ n_0.
6.2.11 EXERCISE Investigate the limiting behaviour of the sequence whose nth term, for n > 10, is
1/((10)(11)(12)(13) · · · (n − 2)(n − 1)n)^{1/n}.
Remark
Watch out for subsequences, and for the squeeze.
6.2.12 Example To find the limit of the sequence whose nth term is √(n + a) − √(n + b), where a and b are constants.
Solution
Arrange the labelling so that a > b (for if a and b are equal, the sequence is constantly zero). Notice first (difference of two squares) that
(√(n + a) − √(n + b))(√(n + a) + √(n + b)) = (√(n + a))^2 − (√(n + b))^2 = (n + a) − (n + b) = a − b
6.2.14 EXERCISE
(where the answers have been reported to six decimal places). It appears that this is
an increasing sequence so far, but the numerical value of the limit (if it even exists)
is not yet obvious. Still, we now have a possible strategy: to try to show that the
sequence is, indeed, always increasing, and perhaps bounded as well?
At a first or second reading you might choose to ignore the proofs of the next
three paragraphs, and you will be none the worse for it. Do not, however, ignore
the result that emerges.
Proof
Take x = −1/(n + 1)^2 in the preceding Recall, and simplify the resulting algebra:
(1 − 1/(n + 1)^2)^n ≥ 1 − n/(n + 1)^2
= 1 − n/(n^2 + 2n + 1) > 1 − n/(n^2 + 2n) = 1 − 1/(n + 2)
= ((n + 2) − 1)/(n + 2) = (n + 1)/(n + 2).
Carefully note the disturbance of inequality caused by ditching the +1 from the
bottom line. This shrinks the bottom line, and therefore increases the fraction; but
the fraction has an overall minus attached, so the nett result is to decrease the total.
Now it turns out that this ugly duckling of a lemma is precisely what is needed
to prove the result that we want:
6.2.18 Proposition The sequence ((1 + 1/n)^n)_{n≥1} is increasing.
Proof
Denoting the typical term by x_n, we carefully simplify the ratio^2 x_{n+1}/x_n and seek evidence that the answer is greater than 1.
x_{n+1}/x_n = (1 + 1/(n + 1))^{n+1} / (1 + 1/n)^n
= ((n + 2)/(n + 1))^{n+1} / ((n + 1)/n)^n
= ((n + 2)/(n + 1)) × ((n + 2)/(n + 1))^n × (n/(n + 1))^n
= ((n + 2)/(n + 1)) · ((n^2 + 2n)/(n + 1)^2)^n
= ((n + 2)/(n + 1)) · (((n^2 + 2n + 1) − 1)/(n + 1)^2)^n
= ((n + 2)/(n + 1)) · (1 − 1/(n + 1)^2)^n
≥ ((n + 2)/(n + 1)) · ((n + 1)/(n + 2)) = 1,
the last step using the preceding lemma.
^2 This is a case where subtracting x_n from x_{n+1} would not readily have helped us.
6.2.19 Proposition The sequence ((1 + 1/n)^n)_{n≥1} is bounded above by 3, and is therefore convergent.
Proof
Since we know that it is increasing and that the sixth term already exceeds 2.5, we
need only confirm that it is bounded above by 3. That is disarmingly simple, using
the binomial theorem again:
(1 + 1/n)^n = 1 + n(1/n) + [n(n − 1)/2!](1/n)^2 + [n(n − 1)(n − 2)/3!](1/n)^3 + · · · + (1/n)^n
< 1 + n(1/n) + [n·n/2!](1/n)^2 + [n·n·n/3!](1/n)^3 + · · · + 1/n!
= 1 + 1 + 1/2! + 1/3! + 1/4! + · · · + 1/n!
< 1 + 1 + 1/2 + 1/(2·2) + 1/(2·2·2) + · · · + 1/2^{n−1}
= 1 + (1 − (1/2)^n)/(1 − 1/2) < 1 + 2 = 3
and the proof is complete. Notice that we assumed n to be big enough that all the
terms of the binomial expansion that we listed actually came into play: but that
is harmless since an upper bound for later terms in this (increasing) sequence is
certainly an upper bound also for the earlier, smaller ones.
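Both conclusions — increasing, and bounded above by 3 — are easy to confirm numerically across a healthy range of n (an illustration of ours only):

```python
# Numerical confirmation of 6.2.18/6.2.19: (1 + 1/n)**n increases with n
# yet never reaches 3 (its limit is e = 2.71828...).
xs = [(1 + 1.0 / n) ** n for n in range(1, 10_001)]
assert all(xs[i] < xs[i + 1] for i in range(len(xs) - 1))
assert all(v < 3 for v in xs)
print(xs[0], xs[5], xs[-1])      # 2.0; the sixth term already exceeds 2.5
```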
1. Taking more care in our estimations will let us find much more accurate values
for the limit in question. On the lower side, the limit of this increasing
sequence has to exceed each term that we choose to calculate exactly, such as
2. The limit is actually the irrational number written as e, as in loge (that is, ln)
and ex and exponential growth. Its numerical value is approximately 2.71828.
3. Summary:
(1 + 1/n)^n → e.
4. This turns out to be a special case (the case x = 1) of a highly important limit
that you need to know, but whose detailed proof will have to wait until much
later (Chapter 18, in fact; paragraph 18.2.16) in this account:
(1 + x/n)^n → e^x.
6.2.21 Examples To find the limits of the sequences whose nth terms are as
follows:
(1 + 2n^{−1})^n,  (1 − π/n)^n,  (1 + 1/(n! + 1))^{n!+1},  (1 + 1/n^2)^{n^2+n},  ((n + 2)/(n + 3))^n.
Solution
The first three are immediate from the above theorem and its subsequent notes: they are e^2, e^{−π} and e (since the third is a subsequence of ((1 + 1/n)^n)_{n≥1}).
The fourth one – let us denote it by x_n – needs a little more attention. We can express x_n as
(1 + 1/n^2)^{n^2} · (1 + 1/n^2)^n
– and the first of these factors is easy enough to deal with: it converges to e because it represents a subsequence of ((1 + 1/n)^n) (which, of course, tends to e). The remaining problem is to estimate the second factor which (check the index laws) is the nth root of the first factor. Now the (convergent) first factor is bounded: it lies always between two (evidently positive) constants a and b:
a ≤ (1 + 1/n^2)^{n^2} ≤ b
so that a^{1/n} ≤ (1 + 1/n^2)^n ≤ b^{1/n}; and since a^{1/n} → 1 and b^{1/n} → 1 (see 6.2.3), the squeeze shows that the second factor converges to 1. Therefore
x_n = (1 + 1/n^2)^{n^2} · (1 + 1/n^2)^n → e × 1 = e.
6.2.23 EXERCISE Use induction to show that, for each positive integer n:
n! ≥ 3(n/3)^n.
(You may find that the result established in the proof of 6.2.19 is useful here.)
The final case study in this set – unlike the one we have just done – is included
merely for interest and, if you are short of time, you may safely leave it out. It
concerns the Fibonacci sequence (fn )n∈N that we mentioned briefly in our work
on recursively defined sequences, the sequence
(1, 1, 2, 3, 5, 8, 13, 21, 34, 55, · · · )
that is created by choosing the first and second terms to be 1 and, from then on,
letting fn+2 = fn+1 + fn (n ≥ 1), that is, creating each term by adding the
two that are immediately before it. It seems highly unlikely at first sight that the
sequence of numbers (fn ) is settling towards a limit (indeed, it would be fairly easy
to show that it is unbounded, and therefore divergent) but its rate of growth is quite
a different matter.
Even a casual look at the sequence (fn ) will show that the terms are growing at
quite a steady rate, increasing by about 60% each time once the pattern is securely
established. This is rather curious …why should a sequence formed by adding turn
out to be one that is propagated by multiplying by about 1.6, and what is the limiting
value of this multiplier if, indeed, it has a limiting value? That is, can we find the limit of the ratio, the 'growth rate' f_{n+1}/f_n? Note that our usual trick of trying to show that the sequence (of ratios) is monotonic and bounded will not work this time: a glance at the first few ratios
1/1 = 1,  2/1 = 2,  3/2 = 1.5,  5/3 ≈ 1.667,  8/5 = 1.6,  13/8 = 1.625
shows that they rise and fall alternately. Suppose, though, for the moment that the growth rate does converge, to ℓ say. Rewrite the recurrence:
f_{n+2} = f_{n+1} + f_n,
f_{n+2}/f_{n+1} = 1 + f_n/f_{n+1},
and observe that the left-hand side also converges to ℓ (for it is just the ratio sequence with its first term left out). Taking limits on both sides, we deduce that ℓ = 1 + 1/ℓ or, more readably:
ℓ^2 − ℓ − 1 = 0.
This quadratic will not factorise, so we solve it instead by the quadratic formula to obtain (1 + √5)/2 and (1 − √5)/2, that is, approximately 1.618034 and −0.618034. At this
point we can be pretty confident that we know what number the limit is bound
to be, but bear in mind that we have not yet proved that any limit exists for the
growth-rate sequence.
Let’s try a different approach. A sequence whose growth-rate is constant (say,
permanently equal to x), as opposed to merely settling towards a constant as limit,
can only take the form
(a, ax, ax^2, ax^3, · · · , ax^{n−1}, · · · ).
Is it at all possible for such a sequence to satisfy the Fibonacci recurrence relation? Let's see:
f_{n+2} = f_{n+1} + f_n now says ax^{n+1} = ax^n + ax^{n−1}
and (dividing across by ax^{n−1}) this holds precisely when x^2 = x + 1, that is, when x is α = (1 + √5)/2 or β = (1 − √5)/2. Neither of the sequences (α^{n−1}) and (β^{n−1}) begins with the two 1s that the Fibonacci sequence demands, but a combination g_n = pα^{n−1} + qβ^{n−1} will also satisfy the recurrence, and matching the first two terms (g_1 = g_2 = 1) requires
p + q = 1,  pα + qβ = 1.
Yet these are very easy equations to solve for p and q! We get
p = (1 − β)/(α − β),  q = (α − 1)/(α − β)
which, when you substitute in the values we have for α and β, simplify to
p = (5 + √5)/10,  q = (5 − √5)/10.
6.2.24 Theorem For every positive integer n, the nth Fibonacci number is f_n = pα^{n−1} + qβ^{n−1}, where α, β, p and q are the numbers just determined.
Proof
The (originally described) Fibonacci sequence is fully specified by the conditions
f1 = 1, f2 = 1, fn+2 = fn+1 + fn (n ≥ 1). That is, once we have agreed that
these conditions are to hold, there is no ambiguity as to what every term in that
sequence has to be. (You may take that as obvious, on the grounds that we could
calculate from these conditions the value of any particular term that we wanted.
If you are not convinced by that, an induction argument upon the statement ‘all
the terms up to and including the nth term are completely specified by the given
conditions’ can easily be constructed.)
Yet the (possibly new?) sequence g_n = pα^{n−1} + qβ^{n−1} (n ≥ 1) does satisfy
g1 = 1 and g2 = 1 since we picked the numbers p and q expressly so as to make
that happen, and also
g_{n+2} = g_{n+1} + g_n for every n ≥ 1,
since α and β both satisfy x^2 = x + 1 (so that α^{n+1} = α^n + α^{n−1}, and similarly for β).
That is, (gn ) satisfies all of the conditions that completely specified the Fibonacci
sequence. This can only mean that (gn ) actually is the Fibonacci sequence, and our
proof is complete.
6.2.25 Note For ease of use, we can slightly simplify the formula just obtained for
fn as follows:
f_n = pα^{n−1} + qβ^{n−1}
= ((5 + √5)/10)·((1 + √5)/2)^{n−1} + ((5 − √5)/10)·((1 − √5)/2)^{n−1}
= (√5(1 + √5)/(5 × 2))·((1 + √5)/2)^{n−1} − (√5(1 − √5)/(5 × 2))·((1 − √5)/2)^{n−1}
= 5^{−1/2}·((1 + √5)/2)^n − 5^{−1/2}·((1 − √5)/2)^n
= 5^{−1/2}α^n − 5^{−1/2}β^n.
6.2.26 Example The limit of the growth-rate in the Fibonacci sequence is α = (1 + √5)/2.
Solution
Using the formula for the nth Fibonacci number f_n obtained in the Note, we have:
f_{n+1}/f_n = (5^{−1/2}α^{n+1} − 5^{−1/2}β^{n+1})/(5^{−1/2}α^n − 5^{−1/2}β^n) = (α − β(β/α)^n)/(1 − (β/α)^n)
which, because the number β/α is numerically less than 1 (in fact it is approximately −0.382), converges to
(α − 0)/(1 − 0) = α = (1 + √5)/2
as we claimed.
6.2.27 Postscript Of the two components 5^{−1/2}α^n and 5^{−1/2}β^n of our formula for the nth Fibonacci number, the first is by far the more important. For example, if we put n = 15, the formula yields (to five decimal places) f_15 = 609.99967 + 0.00033. The reason is that β has modulus smaller than 1, and therefore its powers tend to zero rather rapidly. More exactly, when we regard 5^{−1/2}α^n as an approximation to f_n, the error term is
|f_n − 5^{−1/2}α^n| = 5^{−1/2}|β|^n
which decreases to the limit zero. Notice also that even for n = 1, the error term 5^{−1/2}|β|^1 is only about 0.2764, so all of the error terms are much less than 1. It follows that f_n is always the integer closest to 5^{−1/2}α^n. An additional detail – taking note of the sign of 5^{−1/2}β^n – is that
• for odd values of n, f_n is the integer just greater than 5^{−1/2}α^n, whereas
• for even values of n, f_n is the integer just less than 5^{−1/2}α^n.
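All of this is easy to confirm by machine. The sketch below (our own check, not the book's) builds the Fibonacci numbers exactly from the recurrence and tests them against the closed formula and the nearest-integer observation:

```python
# Checking the formula f_n = (alpha**n - beta**n)/sqrt(5) and the
# nearest-integer observation for the first forty Fibonacci numbers.
import math

alpha = (1 + math.sqrt(5)) / 2
beta = (1 - math.sqrt(5)) / 2

fib = [1, 1]
while len(fib) < 40:
    fib.append(fib[-1] + fib[-2])        # the defining recurrence, exactly

for n, f in enumerate(fib, start=1):
    assert f == round((alpha ** n - beta ** n) / math.sqrt(5))
    assert f == round(alpha ** n / math.sqrt(5))   # dominant term alone
print(fib[14], alpha ** 15 / math.sqrt(5))         # f_15 = 610 vs 609.99967...
```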
6.2.28 EXERCISES
1. With fn continuing to denote the nth Fibonacci number, what is the limiting
behaviour of
f_{n+2}/f_n ?
2. If we were to alter the defining conditions of the Fibonacci sequence by
changing only the first two terms, say, to
g1 = 7, g2 = 4, gn+2 = gn+1 + gn (n ≥ 1)
or to
g1 = a, g2 = b, gn+2 = gn+1 + gn (n ≥ 1)
for any constants a and b, what effect would that have on the limiting
behaviour of the growth rate?
3. Investigate the sequence (hn )n≥1 defined by
7 Endless Sums — A First Look at Series
7.1 Introduction
Every ten-year-old school child knows that zero point endlessly many threes means
one third. This is not in doubt. What does need some critical analysis, however, is
whether zero point endlessly many threes is a legitimate symbol at all.
We are so over-familiar with the decimal system of representing numbers that
it is all too easy to forget what a superb invention it was. Its beauty and power
reside in the ability it gives us to write down any whole number whatsoever, and
a great many non-integers too, using only twelve symbols: the digits 0, 1, 2, 3, 4,
5, 6, 7, 8 and 9, the decimal point (or, if you prefer, the decimal comma) and the
minus sign. The power derives from the fact that it is a positional system: each
symbol carries information not only from its shape but also from where it occurs
in relation to the other symbols (especially the decimal point). Thus, for instance,
12825 actually means 1(10)^4 + 2(10)^3 + 8(10)^2 + 2(10)^1 + 5(10)^0 and the two 2s
have different meanings, different significances, because of where they sit. In the
same way, positive numbers less than 1 can be denoted by placing digits of lower
and lower significance to the right of the decimal point: 0.4703 means 4(10)^{−1} + 7(10)^{−2} + 0(10)^{−3} + 3(10)^{−4}.
Seeking to extend that notation to non-terminating decimals1 raises an issue that
virtually no ten-year-old is in a position to handle with full rigour. The phrase zero
point endlessly many threes suggests that we write down, or at least imagine writing
down,
0.33333333 . . . and so on for ever,
and that this ought to mean
3/10 + 3/100 + 3/1000 + 3/10000 + 3/100000 + · · · and so on for ever.
The first (practical) problem here is that no-one has ever lived long enough to write
down an infinite list of threes, nor indeed will the universe last long enough for
this to occur; but the deeper (conceptual) difficulty is that this is not how addition
works. Adding is essentially a finite procedure: we know what adding two numbers
1 which, at the least, obliges us to admit one more symbol, namely the row of dots · · ·
means, and from that we can add three, or four, or a million, just by grouping them
together in pairs or by implementing some kind of induction argument; but adding
an infinite list of numbers does not make sense.
However, we have already faced and overcome this difficulty while talking about
where the idea of sequence limits comes from. Instead of grappling with a virtual
symbol such as that invoked by the slightly mystical phrase zero point endlessly
many threes, look instead at the sequence of perfectly ordinary numbers
0.3, 0.33, 0.333, 0.3333, 0.33333, · · · .
If this has a limit, then that limit will be the natural way to interpret zero point
endlessly many threes. It does, and – to the undoubted satisfaction of many former
ten-year-olds – the limit is one third.
More importantly, though, this discussion provides a fruitful suggestion as to
how we might seek to make sense of the sum of an arbitrary endless list of numbers,
namely:
• don’t attempt to add all of them,
• just add the first n
• and then look for a limit of that partial total as n → ∞.
What, then, should we mean by an endless sum
a_1 + a_2 + a_3 + · · · + a_n + · · · ?
Following that suggestion, we examine the sequence of partial totals
(a_1, a_1 + a_2, a_1 + a_2 + a_3, a_1 + a_2 + a_3 + a_4, · · · )
and ask whether it converges.
7.2.1 Definition A series is a pair of sequences (a_k)_{k∈N} and (s_n)_{n∈N} linked together by the conditions s_n = Σ_{k=1}^{n} a_k for each n. The first is called the sequence of terms and the second (as we already said) is called the sequence of partial sums. The series itself is denoted by the symbol Σ_1^∞ a_k or Σ_1^∞ a_n.
It is the second of the two sequences whose limiting behaviour we have to focus
on when we examine a series – so much so that, for nearly all practical purposes,
it is legitimate to shorten the above official definition to:
Shortened definition: a series Σ_1^∞ a_k is the sequence (s_n)_{n∈N} of partial sums (where s_n means Σ_{k=1}^{n} a_k = a_1 + a_2 + a_3 + · · · + a_n).
An advantage of the shortened definition is that it makes the rest of this
definition paragraph so obvious as to be scarcely worth saying:
For instance, the two statements
'the series Σ_{k=1}^{∞} (1/2)^k converges, and its sum is 1'
and
'Σ_{k=1}^{∞} (1/2)^k = 1'
should be viewed as saying exactly the same thing, although in the first the sigma symbol means the series itself, while in the second the sigma symbol means the sum (to infinity) of that series.
One more ambiguity alert: when working on a series, be careful not to use too
many pronouns! You are dealing with two sequences at once, so try to avoid phrases
like ‘it tends to zero’ or ‘it is bounded’ or ‘it converges’ unless the context really
does make it clear which of the two you mean.
7.2.3 Example: geometric series For any x ∈ (−1, 1), the geometric series
Σ_{k=0}^{∞} x^k
converges, and its sum is 1/(1 − x).
Solution
If we multiply out the product
(1 − x)(1 + x + x^2 + x^3 + · · · + x^{n−1})
we see that all of the terms cancel in pairs except for 1 and −x^n. Dividing across by (non-zero) (1 − x) shows that the nth partial sum of this series is
s_n = 1 + x + x^2 + x^3 + · · · + x^{n−1} = (1 − x^n)/(1 − x).
Taking limits (as n → ∞) we find that s_n → 1/(1 − x) as predicted.
1−x
(Note also the effect of starting such a geometric series at a point other than
k = 0: for an integer m ≥ 0 the series
x^m + x^{m+1} + x^{m+2} + x^{m+3} + · · · = Σ_{k=m}^{∞} x^k
can be thought of as
x^m Σ_{r=0}^{∞} x^r = x^m · (1/(1 − x)) = x^m/(1 − x).
This is legitimate because each partial sum can be factorised in just such a fashion,
after which we can let the number of terms tend to infinity and obtain the claimed
conclusion in the limit.)
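The partial-sum formula can be watched converging (an illustration; the chosen values of x, and the helper name geometric_partial_sum, are ours, not the book's):

```python
# Partial sums of the geometric series versus the predicted sum 1/(1 - x).
def geometric_partial_sum(x, n):
    """s_n = 1 + x + x**2 + ... + x**(n-1), accumulated term by term."""
    total, power = 0.0, 1.0
    for _ in range(n):
        total += power
        power *= x
    return total

for x in (0.5, -0.9, 0.99):                  # arbitrary points of (-1, 1)
    assert abs(geometric_partial_sum(x, 2000) - 1 / (1 - x)) < 1e-6
print(geometric_partial_sum(0.5, 50))        # essentially 2 = 1/(1 - 0.5)
```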
7.2.4 Example We confirm that the recurring decimal indicated by the phrase
zero point endlessly many nines represents the number 1.
Solution
The phrase actually means the limit – if it has a limit – of the sequence
s_n = Σ_{k=0}^{n−1} 0.9(1/10)^k = 0.9 Σ_{k=0}^{n−1} (1/10)^k.
By the previous example, as n → ∞ this converges to
0.9 · 1/(1 − 1/10) = 0.9/0.9 = 1
as predicted.
7.2.5 EXERCISE
1. Find the rational number represented by the recurring decimal
−3.2283737373737 · · · .
2. Think about how you would prove that every recurring decimal represents a
rational number.
If x ≥ 1 or x ≤ −1 then the geometric series Σ_{k=0}^{∞} x^k does not converge, but diverges. You can check this out by examining the nth partial sum and using the definition, but it is easier and quicker to engage the following little theorem instead:
7.2.6 Theorem If a series Σ_{k=1}^{∞} x_k converges, then x_k → 0.
Proof
That the series converges tells us that the nth partial sum s_n converges to some limit ℓ. Also s_{n+1} converges to the same limit ℓ, since loss of its first term has no bearing on the limit. Then x_{n+1} = s_{n+1} − s_n → ℓ − ℓ = 0. Hence the result. (We lost x_1 in
that demonstration but, once again, early terms don’t have any impact on the limit
of a sequence.)
7.2.7 Alert: The converse of this result is not true. That is, x_k → 0 is not enough to guarantee that Σ_{k=1}^{∞} x_k converges. We'll do an example next to demonstrate this.
7.2.8 Example The harmonic series Σ_{k=1}^{∞} 1/k diverges.
Solution
Scan along the list of fractions and you will realise that, since they are steadily
decreasing, the biggest in any block is the first in that block and the smallest is the
last. So, for instance,
1/3 + 1/4 > 1/4 + 1/4 = 2(1/4) = 1/2,
1/5 + 1/6 + 1/7 + 1/8 > 4(1/8) = 1/2,
1/9 + 1/10 + · · · + 1/15 + 1/16 > 8(1/16) = 1/2
and so on. The pattern emerging here (and writing sn for the nth partial sum as
usual) is
s_4 > 1 + 1/2 + 1/2 = 1 + 2(1/2),  s_8 > 1 + 1/2 + 1/2 + 1/2 = 1 + 3(1/2),
s_16 > 1 + 1/2 + 1/2 + 1/2 + 1/2 = 1 + 4(1/2)
and, in general,
s_{2^n} > 1 + n(1/2)
for each n ≥ 2.
We see from this that the subsequence (s_{2^n}) of (s_n) is unbounded,^2 so the partial-sum sequence (s_n) itself is also unbounded and therefore cannot converge.
7.2.9 EXERCISE
1. Find a positive integer N for which the N th partial sum of the harmonic series
is greater than 2018.
2. Let (a_k)_{k∈N} be any given sequence of positive numbers converging to a limit ℓ > 0. Show that the following series diverges:
Σ_{k=1}^{∞} a_k/k
^2 If this subsequence were bounded, we could find a constant M such that, for every positive integer n, s_{2^n} < M. The previous display now gives us 1 + n/2 < M, that is, n < 2M − 2 for every positive integer n, which is absurd.
7.2.10 The alternating series test Suppose that (ak )k≥1 is a decreasing sequence
of positive numbers that converges to zero. Then the ‘alternating series’
Σ_{k=1}^{∞} (−1)^{k−1} a_k = a_1 − a_2 + a_3 − a_4 + a_5 − · · ·
converges.
Proof
Since the terms a_k are positive and getting steadily smaller, the typical 'even' partial sum
s_{2n} = (a_1 − a_2) + (a_3 − a_4) + · · · + (a_{2n−1} − a_{2n})
is increasing with n (each bracket being non-negative) and, regrouped as a_1 − (a_2 − a_3) − · · · − (a_{2n−2} − a_{2n−1}) − a_{2n}, is bounded above by a_1. So (s_{2n}) converges to some limit ℓ; and since s_{2n+1} = s_{2n} + a_{2n+1} and a_{2n+1} → 0, the 'odd' partial sums converge to the same ℓ. Hence s_n → ℓ, and the series converges.
7.2.11 Example
Σ_{k=1}^{∞} (−1)^{k−1} (1/k) = 1 − 1/2 + 1/3 − 1/4 + 1/5 − 1/6 + · · ·
converges.
7.2.12 Example For any constant a > 0, the series
Σ_{k=1}^{∞} (−1)^{k−1} (1/k^a) = 1 − 1/2^a + 1/3^a − 1/4^a + 1/5^a − 1/6^a + · · ·
converges.
Solution
Both of these are immediate from the alternating series test.
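For the alternating harmonic series of 7.2.11, a computation illustrates exactly what the alternating series test exploits: the even partial sums increase, the odd ones decrease, and the two are squeezed together. (That the common limit happens to be ln 2 is a standard fact, not proved in this chapter; the sketch is our own illustration.)

```python
# Even partial sums of the alternating harmonic series increase, odd ones
# decrease, and both close in on the same limit (which happens to be ln 2).
import math

s, partials = 0.0, []
for k in range(1, 2001):
    s += (-1) ** (k - 1) / k
    partials.append(s)

evens = partials[1::2]                   # s_2, s_4, s_6, ...
odds = partials[0::2]                    # s_1, s_3, s_5, ...
assert all(evens[i] < evens[i + 1] for i in range(len(evens) - 1))
assert all(odds[i] > odds[i + 1] for i in range(len(odds) - 1))
print(partials[-1], math.log(2))         # about 0.69290 vs 0.69315
```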
7.2.13 EXERCISE Let (b_k)_{k∈N} be the sequence given by
b_{2k−1} = 1/k;  b_{2k} = 1/(2k + 2),
noting that the odd- and even-numbered terms have different descriptions.
• Is it legitimate to apply the alternating series test to Σ_1^∞ (−1)^{k−1} b_k?
• Does Σ_1^∞ (−1)^{k−1} b_k converge or diverge?
7.2.14 Note Some parts of the algebra of limits transfer immediately from sequences to series. If two series Σa_k and Σb_k converge, with sums s_a and s_b, say, then the series Σ(a_k + b_k) converges to the sum s_a + s_b, simply because its nth partial sum Σ_{k=1}^{n}(a_k + b_k) can be rearranged as Σ_{k=1}^{n} a_k + Σ_{k=1}^{n} b_k and therefore does converge to s_a + s_b. We write this briefly as
Σ(a_k + b_k) = Σa_k + Σb_k
(provided, of course, that the two sums on the right-hand side do exist). In the same way, and subject to similar provisos:
Σ(a_k − b_k) = Σa_k − Σb_k,  Σ(Ca_k) = C·Σa_k
where C is any constant. On the other hand, no such result is available for multiplication: we do not obtain a partial sum for Σa_k b_k by multiplying partial sums for Σa_k and Σb_k!
7.3 Big series, small series: comparison tests
7.3.1 Theorem A series of non-negative terms converges if and only if its sequence of partial sums is bounded.
Proof
If a series Σ_{1}^{∞} a_k has a_k ≥ 0 for all values of k then, considering its partial sums s_n:
s_{n+1} = s_n + a_{n+1} ≥ s_n
for all n, that is, (sn ) is an increasing sequence. Therefore (sn ) will converge if it is
bounded, and vice versa.
In less formal language, this result converts the relatively difficult idea of con-
vergence (for this class of series only) into the relatively easy one of boundedness:
you will get convergence precisely when the terms are so small that, no matter
how many of them you add together, there is some absolute upper ceiling to
how big a total you accumulate. ‘Small series converge, big series diverge’. This
insight, in turn, we can sharpen up into a group of results called in general series
comparison tests:
7.3.2 The direct comparison test Suppose (a_k)_{k∈N} and (b_k)_{k∈N} are two sequences of non-negative terms and a_k ≤ b_k for every k ∈ N. Then if Σ_1^∞ b_k converges, Σ_1^∞ a_k must also converge.
(Equivalently, if Σ_1^∞ a_k diverges, Σ_1^∞ b_k must also diverge.)
Proof
If Σ_1^∞ b_k converges, the previous theorem tells us that there is some upper bound (call it M) for all of its partial sums: that is, for every n ∈ N we have
Σ_{k=1}^{n} b_k ≤ M.
Yet the fact that a_k ≤ b_k for every k ∈ N assures us that Σ_{k=1}^{n} a_k ≤ Σ_{k=1}^{n} b_k by simple addition, so
Σ_{k=1}^{n} a_k ≤ M
also. Now the same theorem gives us the convergence of Σ_1^∞ a_k.
7.3.3 EXERCISE: the direct comparison test with scaling Suppose $(a_k)_{k\in\mathbb{N}}$ and $(b_k)_{k\in\mathbb{N}}$ are two sequences of non-negative terms, and we can find a positive
112 7 ENDLESS SUMS — A FIRST LOOK AT SERIES
constant $C$ such that $a_k \le C b_k$ for every $k \in \mathbb{N}$. Then if $\sum_{1}^{\infty} b_k$ converges, $\sum_{1}^{\infty} a_k$ must also converge.
Comment
You will find that the proof is virtually identical to the previous one: at the line before the last you'll get $\sum_{1}^{n} a_k \le CM$, but $CM$ is also just a constant.
Well, as long as we are sure that only the first and third terms have been altered, the answer ought to be obvious: the altered series converges to a sum of $10 + 300 + \frac{10}{9}$ simply because every partial sum from the third one onwards has been increased
by 310, and that will feed through to the limit. So we see that – unlike the similar
scenario in sequences – changing a finite number of terms in a series does affect
its convergence behaviour, but only in quite a predictable fashion: the total of the
changes that you added to individual terms gets added onto the sum-to-infinity. In
particular, if the original series did converge, then so must the altered one . . . and
vice versa, because additions can be cancelled out by adding their negatives. We
conclude that:
If a series converges, then after alteration or omission of a finite number of terms,
the new series also converges, and vice versa.
More simply, it is always safe to alter or delete a finite number of terms from a
series provided that we only want to know whether or not it converges (and do not
care what particular sum it converges to). This allows us to modify the previous
two results as follows:
7.3.5 The direct comparison test with alterations/omissions Suppose $(a_k)_{k\in\mathbb{N}}$ and $(b_k)_{k\in\mathbb{N}}$ are two sequences of non-negative terms, and that we can find a positive integer $k_0$ such that $a_k \le b_k$ for every $k \ge k_0$. Then if $\sum_{1}^{\infty} b_k$ converges, $\sum_{1}^{\infty} a_k$ must also converge.
7.3.6 The direct comparison test with scaling and alterations/omissions Suppose $(a_k)_{k\in\mathbb{N}}$ and $(b_k)_{k\in\mathbb{N}}$ are two sequences of non-negative terms and we can find a positive constant $C$ and a positive integer $k_0$ such that $a_k \le C b_k$ for every $k \ge k_0$. Then if $\sum_{1}^{\infty} b_k$ converges, $\sum_{1}^{\infty} a_k$ must also converge.
Solution
1. The top line of the fraction that defines $a_k$ always lies between 2 and 8 so, firstly, all the terms are positive (and therefore we can use the theory developed so far) and, secondly, $a_k \le 8\left(\frac{1}{2}\right)^k$.
Since (the geometric series) $\sum \left(\frac{1}{2}\right)^k$ converges, so does $\sum a_k$ (by the direct comparison test with scaling).
2. There is a risk that several terms here may be negative because of the minus on the top line. However, if $k \ge 31$ then $k^2 > 30k \ge 30k\cos(\text{anything})$ so, from that point on, the top line and $a_k$ itself are definitely positive: so we shall just ignore the first 30 terms. Furthermore, if $k \ge 60$ then
$$\tfrac{1}{2}k^2 = k \times \tfrac{1}{2}k \ge 30k \ge 30k\cos(\text{anything})$$
and therefore $k^2 - 30k\cos(k + 2\sqrt{k})$ is at least $\tfrac{1}{2}k^2$ which, in turn, gives us
$$a_k \ge \frac{k^2}{2(4k^3 + 7)} \ge \frac{k^2}{2(4k^3 + 7k^3)} = \frac{1}{22}\cdot\frac{1}{k} \quad \text{provided } k \ge 60.$$
Since the harmonic series $\sum \frac{1}{k}$ diverges, so must the 'bigger' series $\sum a_k$ by 'comparison', along with scaling by $\frac{1}{22}$ and omission of terms.
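The lower bound just derived can be spot-checked numerically. In this Python sketch the term formula $a_k = \bigl(k^2 - 30k\cos(k + 2\sqrt{k})\bigr)/(4k^3 + 7)$ is an assumption read off from the displayed estimates rather than quoted from the text.

```python
import math

# Numerical check of the worked estimate: with
#   a_k = (k^2 - 30*k*cos(k + 2*sqrt(k))) / (4*k^3 + 7)
# (the term formula assumed from the displayed bounds), the derived
# inequality a_k >= 1/(22*k) should hold for every k >= 60.
def a(k):
    return (k ** 2 - 30 * k * math.cos(k + 2 * math.sqrt(k))) / (4 * k ** 3 + 7)

assert all(a(k) >= 1 / (22 * k) for k in range(60, 5000))
print("lower bound a_k >= 1/(22k) holds on the sampled range")
```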
7.3.8 EXERCISE Show that the series
$$\sum \frac{6k - 5}{k(k^2 + 17)}$$
and
$$\sum \frac{6k + 2}{k(8k^2 - 5)}$$
also converge.
There is another variety of comparison test that, in some cases at least, saves us
the bother of ignoring initial terms and guessing about scaling constants:
7.3.9 The limit comparison test (sometimes denoted by LCT) Suppose that $(a_k)_{k\in\mathbb{N}}$ and $(b_k)_{k\in\mathbb{N}}$ are two sequences of positive terms and that $\frac{a_k}{b_k}$ tends to a non-zero limit. Then either the two series both converge, or they both diverge.
Proof
Let $\ell$ denote the (non-zero) limit of the ratio of $a_k$ to $b_k$. Using $\varepsilon = \ell/2$ in the definition of limit, there is a positive integer $k_0$ such that, for $k \ge k_0$, we have
$$\frac{\ell}{2} < \frac{a_k}{b_k} < \frac{3\ell}{2}.$$
The right-hand portion of that inequality rearranges to give $a_k < (3\ell/2)b_k$ for large values of $k$, so the direct comparison test with scaling $3\ell/2$ (and ignoring terms up to the $k_0$th) tells us that if $\sum b_k$ converges then so must $\sum a_k$.
Now the left-hand portion of the displayed line rearranges to produce $b_k < (2/\ell)a_k$ for large values of $k$, so the same argument establishes that if $\sum a_k$ converges then so must $\sum b_k$.
Solution
1. (Roughly speaking, the biggest power of $k$ will dominate each line of the fraction, that is, the top line will be dominated by $k^4$ and the bottom line dominated by $2k^5$. So $a_k$ resembles $\frac{k^4}{2k^5} = \frac{1}{2k}$. Therefore . . . )
Let us consider $b_k = \frac{1}{k}$. We see that $\frac{a_k}{b_k}$ has limit 0.5, which is not zero. By the LCT, both series must converge or else both series must diverge. Yet the harmonic series $\sum b_k$ diverges, and therefore so does the given series $\sum a_k$.
2. In the second example, try $b_k = \left(\frac{2}{3}\right)^k$. Then
$$\frac{a_k}{b_k} = \left(\frac{k - 1/2}{k}\right)^k = \left(1 + \frac{-0.5}{k}\right)^k$$
whose limit is $e^{-0.5}$ which is not zero. Since the geometric series $\sum b_k$ converges, the LCT tells us that $\sum a_k$ must do so also.
7.3.11 EXERCISE Revisit Exercise 7.3.8 and use the limit comparison test to get
quicker, easier solutions of each of the problems that it posed.
Solution
A quick answer depends on noticing³ that
$$\frac{1}{k} - \frac{1}{k+1} = \frac{k+1-k}{k(k+1)} = \frac{1}{k(k+1)},$$
so that the $n$th partial sum
$$\frac{1}{1(1+1)} + \frac{1}{2(2+1)} + \frac{1}{3(3+1)} + \frac{1}{4(4+1)} + \cdots + \frac{1}{n(n+1)}$$
$$= \frac{1}{1} - \frac{1}{2} + \frac{1}{2} - \frac{1}{3} + \frac{1}{3} - \frac{1}{4} + \frac{1}{4} - \frac{1}{5} + \cdots + \frac{1}{n} - \frac{1}{n+1} = 1 - \frac{1}{n+1}.$$
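The telescoping collapse above can be confirmed in exact arithmetic. A minimal Python sketch (the sample values of $n$ are illustrative):

```python
from fractions import Fraction

# Exact-arithmetic check of the telescoping identity: the partial sums of
# 1/(k(k+1)) collapse to 1 - 1/(n+1), and so tend to 1.
def partial_sum(n):
    return sum(Fraction(1, k * (k + 1)) for k in range(1, n + 1))

for n in (1, 2, 10, 100):
    assert partial_sum(n) == 1 - Fraction(1, n + 1)
print(partial_sum(100))
```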
7.3.14 Solution Use the preceding example and the limit comparison test: if we put
$$a_k = \frac{1}{k^2}, \qquad b_k = \frac{1}{k(k+1)}$$
then it is immediate that $\frac{a_k}{b_k} \to 1 \ne 0$, so the convergence of $\sum b_k$ proves the convergence of $\sum a_k$.
3 The so-called theory of partial fractions, which you may have come across, helps one to notice
such things (especially in more complicated examples).
Find constants $a$ and $b$ such that
$$\frac{1}{k(k+2)} = \frac{a}{k} + \frac{b}{k+2}$$
and hence show that the series $\sum_{1}^{\infty} \frac{1}{k(k+2)}$ converges (because its partial sums collapse 'telescopically').
7.3.16 HARDER EXERCISE Let $t > 1$ be a constant. Prove that the series
$$\sum_{k=1}^{\infty} k^{-t} = 1 + \frac{1}{2^t} + \frac{1}{3^t} + \frac{1}{4^t} + \frac{1}{5^t} + \frac{1}{6^t} + \frac{1}{7^t} + \cdots$$
converges.
Partial solution
We can use the same kind of estimation of partial sums that we employed for the
harmonic series (in paragraph 7.2.8) but, instead of grouping the terms in blocks
that end with a negative power of 2, this time we make them start with a negative
power of 2, thus:
$$\frac{1}{2^t} + \frac{1}{3^t} < \frac{1}{2^t} + \frac{1}{2^t} = 2\cdot\frac{1}{2^t} = \frac{1}{2^{t-1}},$$
$$\frac{1}{4^t} + \frac{1}{5^t} + \frac{1}{6^t} + \frac{1}{7^t} < 4\cdot\frac{1}{4^t} = \frac{1}{4^{t-1}},$$
$$\frac{1}{8^t} + \frac{1}{9^t} + \cdots + \frac{1}{14^t} + \frac{1}{15^t} < 8\cdot\frac{1}{8^t} = \frac{1}{8^{t-1}}$$
and so on. The pattern emerging this time concerning the partial sums is:
$$s_{2^n - 1} < 1 + \frac{1}{2^{t-1}} + \frac{1}{4^{t-1}} + \cdots + \frac{1}{(2^{n-1})^{t-1}}$$
for each n ≥ 2. Now, recognise the right-hand side of the last display as being
a partial sum of a convergent geometric series, and therefore bounded above by
some constant M (the sum-to-infinity of that geometric series). Lastly, argue that
each partial sum of the original series is less than s(2n −1) for a suitably chosen value
of n, and is therefore less than M. An appeal to Theorem 7.3.1 (convergent equals
bounded for series of positive terms) will now complete the argument.
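The blocking estimate can be watched happening numerically. In this Python sketch the exponent $t = 2$ is an illustrative choice; $M = 1/(1 - 2^{1-t})$ is the sum-to-infinity of the bounding geometric series.

```python
# Numerical sketch of the blocking argument, with the illustrative choice
# t = 2: each partial sum s_(2^n - 1) of sum k^(-t) stays below the n-term
# geometric bound 1 + 1/2^(t-1) + 1/4^(t-1) + ..., itself below
# M = 1/(1 - 2^(1-t)).
t = 2
M = 1 / (1 - 2 ** (1 - t))
for n in range(2, 12):
    s = sum(k ** (-t) for k in range(1, 2 ** n))            # s_(2^n - 1)
    geometric = sum((2 ** (1 - t)) ** j for j in range(n))  # n-term partial sum
    assert s < geometric < M
print("partial sums bounded by the geometric series, as claimed")
```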
The series $\sum_{k=n+1}^{\infty} x_k = x_{n+1} + x_{n+2} + x_{n+3} + \cdots$ is called a tail of $\sum_{1}^{\infty} x_k$ – more precisely, its $n$th tail. Since this is formed by omitting the first $n$ terms, part at least of the following example should be quite obvious.
7.3.18 Example Given a series $\sum_{1}^{\infty} x_k$, to show that the following three statements are equivalent:
1. $\sum_{1}^{\infty} x_k$ converges,
2. every tail of $\sum_{1}^{\infty} x_k$ converges,
3. at least one tail of $\sum_{1}^{\infty} x_k$ converges.
Solution
• Suppose that statement 1 is true. For each n ≥ 1, the nth tail is formed by
omitting the first n terms of the series and therefore, by Remark 7.3.4, it is itself
a convergent series, that is, 2 is true.
• Suppose that statement 2 is true. Then the truth of 3 is immediate.
• Suppose that statement 3 is true. So we can find a positive integer $n$ such that $\sum_{k=n+1}^{\infty} x_k$ converges to some limit $s$. For $N > n$, the $N$th partial sum of the whole series is
$$\sum_{1}^{N} x_k = (x_1 + x_2 + x_3 + \cdots + x_n) + \sum_{k=n+1}^{N} x_k$$
so
$$\sum_{1}^{N} x_k \to (x_1 + x_2 + x_3 + \cdots + x_n) + s \quad (\text{as } N \to \infty).$$
Thus, $\sum_{1}^{\infty} x_k$ is a convergent series.
7.3.19 Note It may be helpful to draw attention to the structure of that last
demonstration. To claim that a number of statements are equivalent is to say that if
any one of them is true then so are all the others. So when we said that statements
1, 2 and 3 were equivalent, we were asserting that 1 implies 2, 2 implies 1, 2 implies
3, 3 implies 2, 1 implies 3 and 3 implies 1. Fortunately there is no need to give a
separate demonstration of each of these six implications: for instance, once one has
established both 1 implies 2 and 2 implies 3, then 1 implies 3 follows immediately.
The usual way to set up an efficient proof of equivalence for three statements is to
confirm either that
1⇒2⇒3⇒1
or that
1⇒3⇒2⇒1
that is, to set out a cyclical proof. This is what we did in 7.3.18 above. As another
illustration, if we wished to write out in full detail why the five ‘equivalent condi-
tions’ in 4.1.5 actually are equivalent, a cyclical proof along the lines of 1 ⇒ 2 ⇒
3 ⇒ 4 ⇒ 5 ⇒ 1 or 1 ⇒ 4 ⇒ 3 ⇒ 5 ⇒ 2 ⇒ 1 would be an efficient strategy.
7.3.20 EXERCISE Given a convergent series $\sum_{1}^{\infty} x_k$, let $s_n$ denote the sum of the $n$th tail (which, by 7.3.18, is necessarily convergent). Show that $\lim_{n\to\infty} s_n = 0$.
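What the exercise predicts can be sketched numerically. The convergent geometric series $\sum (1/2)^k$ used in this Python sketch is an illustrative choice, not taken from the text.

```python
# Numerical sketch of 7.3.20 for the convergent geometric series sum (1/2)^k
# (an illustrative choice): the tail sums s_n shrink to 0 as n grows.
def tail_sum(n, terms=2000):
    """Approximate s_n = sum over k > n of (1/2)^k by a long partial sum."""
    return sum((1 / 2) ** k for k in range(n + 1, n + 1 + terms))

tails = [tail_sum(n) for n in range(1, 40)]
assert all(later < earlier for earlier, later in zip(tails, tails[1:]))
assert tails[-1] < 1e-11      # s_39 is (1/2)^39, essentially zero
print("tail sums tend to zero, as 7.3.20 predicts")
```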
7.4.1 The root test Suppose that $(a_n)$ is a sequence of non-negative terms and that $\sqrt[n]{a_n} \to \ell$.
1. If $\ell < 1$, then $\sum a_n$ converges.
2. If $\ell > 1$, then $\sum a_n$ diverges.
Proof
1. Assuming $\ell < 1$, we consider the number half-way between $\ell$ and 1; we can write this either as $\ell + \varepsilon$ or as $1 - \varepsilon$ if we choose $\varepsilon$ to be half the length of the gap, that is, $\varepsilon = \frac{1 - \ell}{2}$.
(Diagram: a number line showing the interval from $\ell - \varepsilon$ to $\ell + \varepsilon = 1 - \varepsilon$, of width $2\varepsilon$, lying wholly to the left of 1; caption: 'Limit is less than 1'.)
Because $\sqrt[n]{a_n} \to \ell$ we can find $n_0$ such that, for every $n \ge n_0$:
$$\sqrt[n]{a_n} < \ell + \varepsilon = 1 - \varepsilon, \quad \text{and therefore} \quad a_n < (1 - \varepsilon)^n.$$
7.4 THE ROOT TEST AND THE RATIO TEST 119
Since the geometric series $\sum (1 - \varepsilon)^n$ converges, so does the 'smaller' series $\sum a_n$ according to the comparison test (with alteration/omissions, since we 'lost' the first $n_0$ terms here).
2. Now assuming $\ell > 1$ (and thinking $\varepsilon = \ell - 1 > 0$) we can find $n_1$ such that, for every $n \ge n_1$:
$$\sqrt[n]{a_n} > \ell - \varepsilon = 1, \quad \text{and therefore} \quad a_n > 1.$$
(Diagram: a number line showing $1 = \ell - \varepsilon$ lying to the left of $\ell$ and $\ell + \varepsilon$; caption: 'Limit is greater than 1'.)
Since the terms exceed 1, they cannot tend to zero, and the series must diverge.
7.4.2 Remark You may well have noticed that we have gone back to using n
instead of k for the label on the typical term. This is safe because, at the moment, we
are not looking both at terms and at partial sums in the same paragraph, so there
are not two different labels to keep separate in our minds. Of course, it is perfectly
acceptable to use k or another letter instead of n.
7.4.3 Warning If the limit of the $n$th root of the $n$th term is exactly 1, this test tells us nothing at all: for all it knows, the series could be convergent or divergent. So we will need to look for a different test when such a borderline case arises. As illustration of this point, notice that $\sum n^{-1}$ is divergent and $\sum n^{-2}$ is convergent, but in both cases you get a limit of 1 for the $n$th root of the $n$th term. (Note that $\sqrt[n]{n^2} = n^{2/n} = \left(\sqrt[n]{n}\right)^2$, and recall the result of 6.2.6.)
7.4.4 Example Does the series
$$\sum \frac{\left(1 + \frac{3}{n}\right)^{n^2}}{\left(1 + \frac{\pi}{n}\right)^{n^2}}$$
converge or diverge?
Solution
The typical term is positive so we can try the root test. The $n$th root of the $n$th term is
$$\frac{\left(1 + \frac{3}{n}\right)^{n}}{\left(1 + \frac{\pi}{n}\right)^{n}}$$
which converges to $\frac{e^3}{e^{\pi}} = e^{3-\pi}$ which is less than 1, so the given series converges.
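A quick numerical sanity check of that limit (a Python sketch, illustrative only):

```python
import math

# Numerical sketch of the example: the nth root of the nth term is
# ((1 + 3/n) / (1 + pi/n))**n, which should approach e**(3 - pi),
# a number less than 1.
def nth_root_of_term(n):
    return ((1 + 3 / n) / (1 + math.pi / n)) ** n

limit = math.exp(3 - math.pi)
assert limit < 1
assert abs(nth_root_of_term(10 ** 6) - limit) < 1e-4
print("limit e**(3 - pi) =", round(limit, 6))
```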
7.4.5 Example For precisely which positive values of $t$ does the series
$$\sum t^n \left(\frac{3n^2 - 1}{2n^2 - 1}\right)^n$$
converge?
Solution
All terms are positive, and the $n$th root of the $n$th term is
$$t\,\frac{3n^2 - 1}{2n^2 - 1}$$
which has a limit of $\frac{3t}{2}$ so, by the root test,
1. for 0 < t < 2/3 the limit is < 1 and so the series converges,
2. for t > 2/3 the limit is > 1 and the nth term does not tend to zero and the
series must diverge.
It remains to ponder what happens when t is exactly 2/3. Luckily, in that borderline
case the nth term itself is
$$\left(\frac{3n^2 - 1}{2n^2 - 1}\right)^n \left(\frac{2}{3}\right)^n = \left(\frac{6n^2 - 2}{6n^2 - 3}\right)^n$$
which is (just) greater than 1 (in the final fraction, the top line exceeds the bottom
line by 1). That shows once again that the nth term cannot tend to zero and the
series diverges.
We conclude that the given series converges precisely when 0 < t < 2/3.
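The two halves of that solution can be spot-checked numerically. A Python sketch (illustrative only):

```python
# Numerical sketch of Example 7.4.5: the nth root of the nth term is
# t*(3n^2 - 1)/(2n^2 - 1), approaching 3t/2, and at the borderline t = 2/3
# the nth term itself stays above 1, so it cannot tend to zero.
def nth_root_of_term(t, n):
    return t * (3 * n ** 2 - 1) / (2 * n ** 2 - 1)

n = 10 ** 6
assert abs(nth_root_of_term(0.5, n) - 0.75) < 1e-6     # 3t/2 for t = 0.5
borderline_term = ((6 * n ** 2 - 2) / (6 * n ** 2 - 3)) ** n
assert borderline_term > 1                             # no convergence to 0
print("root-test limits behave as computed in 7.4.5")
```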
7.4.6 EXERCISE
1. Show that the series
$$\sum \left(1 - \frac{1}{3n - 1}\right)^{n^2 + 2}$$
converges.
2. For which positive values of $t$ does the series
$$\sum t^n\,\frac{(3n + 1)^n}{n^{n-1}}$$
converge?
7.4.7 The ratio test Suppose that $(a_n)$ is a sequence of positive terms whose growth rate $\frac{a_{n+1}}{a_n} \to \ell$.
1. If $\ell < 1$, then $\sum a_n$ converges.
2. If $\ell > 1$, then $\sum a_n$ diverges.
Proof
1. Assuming $\ell < 1$, we again consider the number $\ell + \varepsilon = 1 - \varepsilon$ half-way between $\ell$ and 1, where $\varepsilon = \frac{1 - \ell}{2}$.
(Diagram: a number line showing the interval from $\ell - \varepsilon$ to $\ell + \varepsilon = 1 - \varepsilon$, of width $2\varepsilon$, lying wholly to the left of 1; caption: 'Limit is less than 1'.)
Because $\frac{a_{n+1}}{a_n} \to \ell$ we can find $m$ such that, for every $n \ge m$:
$$\frac{a_{n+1}}{a_n} < \ell + \varepsilon = 1 - \varepsilon, \quad \text{and therefore} \quad a_{n+1} < (1 - \varepsilon)a_n.$$
Therefore $a_{m+k} < (1 - \varepsilon)^k a_m$ for every $k \ge 1$ and so, since the geometric series $\sum_{k=1}^{\infty} (1 - \varepsilon)^k$ converges, so does $\sum_{k=1}^{\infty} a_{m+k}$ according to the comparison test with scaling (by $a_m$). We 'lost' the first $m$ terms this time, but that doesn't prevent the entire series $\sum a_n$ from converging also.
2. Now assuming $\ell > 1$ (and thinking $\varepsilon = \ell - 1 > 0$) we can find $p$ such that, for every $n \ge p$:
$$\frac{a_{n+1}}{a_n} > \ell - \varepsilon = 1, \quad \text{and therefore} \quad a_{n+1} > a_n.$$
(Diagram: a number line showing $1 = \ell - \varepsilon$ lying to the left of $\ell$ and $\ell + \varepsilon$; caption: 'Limit is greater than 1'.)
The terms are therefore increasing from the $p$th onwards, so they cannot tend to zero, and the series must diverge.
7.4.8 Warning If the limit of the growth rate is exactly 1, this test too tells us nothing at all, and we must seek a different test to analyse such a borderline case. For instance, the divergent series $\sum n^{-1}$ and the convergent series $\sum n^{-2}$ both have limits of 1 for their growth rates.
7.4.9 Example Determine whether or not the series $\sum v_n$ converges, where
$$v_n = \frac{(n!)^3\,2^{5n}}{(3n)!}.$$
Solution
Carefully cancel all you can in the ratio $\frac{v_{n+1}}{v_n}$ and you should find that the growth rate is
$$\frac{(n+1)^3\,2^5}{(3n+1)(3n+2)(3n+3)}.$$
Using the algebra of limits, we see that this fraction converges to 32/27 which is
greater than 1, so vn diverges.
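The cancellation claimed above can be verified in exact arithmetic. A Python sketch (illustrative only):

```python
from fractions import Fraction
from math import factorial

# Exact check of the cancelled growth rate: with v_n = (n!)^3 * 2^(5n) / (3n)!,
# the ratio v_(n+1)/v_n should equal (n+1)^3 * 2^5 / ((3n+1)(3n+2)(3n+3)),
# which heads toward 32/27 > 1.
def v(n):
    return Fraction(factorial(n) ** 3 * 2 ** (5 * n), factorial(3 * n))

for n in (1, 4, 20):
    expected = Fraction(32 * (n + 1) ** 3,
                        (3 * n + 1) * (3 * n + 2) * (3 * n + 3))
    assert v(n + 1) / v(n) == expected
print("growth rate cancels exactly as displayed")
```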
7.4.10 Example For exactly which positive values of $x$ does the following series converge?
$$\sum w_n \quad \text{where} \quad w_n = \frac{(n+1)!\,(2n+2)!\;x^n}{(3n+3)!}$$
Solution
The growth rate $\frac{w_{n+1}}{w_n}$ cancels to
$$\frac{(n+2)(2n+3)(2n+4)\,x}{(3n+4)(3n+5)(3n+6)}$$
which has limit $\frac{4x}{27}$. By the ratio test, the series therefore converges when $0 < x < 27/4$ and diverges when $x > 27/4$. In the borderline case $x = 27/4$ the growth rate is
$$\frac{27(n+2)(2n+3)(2n+4)}{4(3n+4)(3n+5)(3n+6)}$$
which is greater than 1 (multiply out: the top line exceeds the bottom line for every $n$). Thus $w_{n+1} > w_n$, the terms are increasing, the terms do not converge to zero and the series again diverges.
We conclude that this series converges only when $0 < x < 27/4$.
7.4.11 EXERCISE Determine whether or not the series $\sum a_n$ converges, where:
$$a_n = \frac{n^n \times n!}{(2n - 1)!}.$$
7.4.12 EXERCISE Determine the range of values of the real number $x$ for which the series
$$\sum \frac{(n!)^4\,x^{2n}}{(4n)!}$$
converges. (Note that the wording of the question allows x itself to be negative.)
7.4.13 Remark Notice that the ratio test is particularly suitable for series whose
terms involve several factorials, simply because massive cancellation occurs. For
instance,
$$\frac{(n+1)!}{n!} = \frac{1 \times 2 \times 3 \times 4 \times \cdots \times n \times (n+1)}{1 \times 2 \times 3 \times 4 \times \cdots \times n} = n + 1,$$
$$\frac{(3n)!}{(3n+3)!} = \frac{1 \times 2 \times 3 \times 4 \times \cdots \times 3n}{1 \times 2 \times 3 \times 4 \times \cdots \times (3n) \times (3n+1) \times (3n+2) \times (3n+3)} = \frac{1}{(3n+1)(3n+2)(3n+3)}$$
and so on.
Take care that, when writing down the formula for term number n + 1, you do
so by replacing n by n + 1 at each of its appearances in term number n so that, for
example, 2n + 1 turns into 2n + 3 (that is, 2(n + 1) + 1). Don’t just ‘add 1 to each
bracket’.
Series in which the $n$th term's formula is dominated by $n$th powers are likely candidates for simplification by the $n$th root test, of course. Where the formula
contains both factorials and nth powers, the decision is less clear-cut – but unless
and until you can access some useful information about the nth root of factorial n,
the ratio test is still probably the better bet.
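The cancellation identities of 7.4.13 are easy to sanity-check directly. A minimal Python sketch:

```python
from math import factorial

# Sanity check of the cancellation identities in 7.4.13, for a few sample n.
for n in (1, 2, 5, 10):
    assert factorial(n + 1) // factorial(n) == n + 1
    assert (factorial(3 * n + 3) // factorial(3 * n)
            == (3 * n + 1) * (3 * n + 2) * (3 * n + 3))
print("factorial cancellations check out")
```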
.........................................................................
8 Continuous functions
— the domain thinks
that the graph
is unbroken
.........................................................................
8.1 Introduction
In order to study continuity, we need to make sure that we understand the ideas of
function (mapping, map, transformation), domain, codomain (target), range, one-
to-one (injective, 1 – 1), onto (surjective), composite (composition) and inverse
function. Let us begin by revising the most basic points now, and promising to
review other concepts through this chapter as we come to need them.
There are two styles of definition of the term function that you need to be aware
of. We’ll begin with the informal one, which is the one that we shall actually use
almost all of the time. A function from a set D to a set C is any rule, however it may
be expressed, that for each element of D allows us to determine a single associated
element of the set C. If the letter f stands for the function – the rule – then for
each x ∈ D the associated element of C is written as f (x). To describe a particular
function, you need to identify both sets D and C and, most importantly, to specify
the rule in enough detail to allow the reader to work out what f (x) is for each
possible x in D.
If you find that definition rather unsatisfactory, then your unease is justified.
Apart from being somewhat vague (which is bad enough already), it suffers from
the more serious flaw that it does not spell out the meaning of the words rule or
associated, and these are at least as needful of definition as the word function was.1
The way around that is to ground the ideas entirely in set theory, as in the following
formal definition.
A function, also called a mapping, map or transformation, consists of a list of
three sets called (respectively) the domain, the codomain (or target) and the graph,
which satisfy a particular condition that we now describe. If the sets are denoted
1 This is more than a little like defining a circle to mean a perfectly round figure, and then
going on to define ‘round’ to mean ‘shaped like a circle’. We end up merely noting the equivalence
of two ideas – which, of course, is better than nothing – without actually succeeding in defining
either of them.
Undergraduate Analysis: A Working Textbook, Aisling McCluskey and Brian McMaster 2018.
© Aisling McCluskey and Brian McMaster 2018. Published 2018 by Oxford University Press
126 8 CONTINUOUS FUNCTIONS
The function is usually represented by a single symbol such as f . Then the entire phrase 'f is a function whose domain is D and whose codomain is C' is abbreviated to
f :D→C
and usually spoken as
f maps D to C.
For each x in D, the object represented above by y is conventionally written as f (x)
and called the value of f at x or the image of x under f .
Whether, in a particular question, you choose to think of the informal or the
formal definition, let us be clear that a function involves three objects: the domain,
the codomain, and the process of converting each element of the domain into an
element of the codomain. If you change any one of these, then you are looking at
a different function.
To convince yourself that the formal and informal definitions are essentially
saying the same thing, consider this: if we had drawn the graph of some (formally
defined) function f with perfect accuracy then, for each individual x in its domain,
we could trace vertically up or down the graph paper until we found a point on the
graph whose first coordinate was x, because this was guaranteed by the condition.
Furthermore, the same condition also guaranteed that only one such point could
exist. Then the second coordinate of that point is the (single) value of f (x), and we
have uncovered ‘the rule’ which the informal definition spoke of. Yet, conversely,
if we begin with exact knowledge of the rule, and then precisely mark the point
(x, f (x)) for every single value of x in the domain, we shall have drawn the graph
with absolute precision. Thus, perfect knowledge of the rule and perfect knowledge
of the graph determine one another. In the ideal world of infinite precision (which
is where definitions live), a function is its graph.
A real function is a function whose domain and codomain are both subsets of
the real line R. Since all of the functions discussed in this text are real functions,
we shall drop the qualifier real and simply refer to them as functions. In most
examples,2 the domain will be either an interval or a union of intervals, and the
individual function itself will be spelled out by presenting some sort of formula
or algorithm for calculating f (x) for each permissible input value x. When this is
so, by way of default the domain will comprise all the real numbers x for which
the formula for f (x) makes sense, and the default codomain will be R itself. For
example, the phrase
f is the function given by $f(x) = \sqrt{4 - x^2}$
will mean the function $f : [-2, 2] \to \mathbb{R}$ defined by that formula, while the phrase
g is the function given by $g(x) = (x - 1)^{-2}(x + 3)^{-1}$
will mean the function $g : (-\infty, -3) \cup (-3, 1) \cup (1, \infty) \to \mathbb{R}$ defined by the formula $g(x) = (x - 1)^{-2}(x + 3)^{-1}$. Note, in each case, how the domain has been defaulted to be as extensive as possible, subject to the formula always returning a real number.
You will already be familiar with the graphs of common and important functions
such as sin x, cos x and ex , of quadratics with formulas of the form ax2 + bx + c and
of ‘straight line’ functions of the form mx + c. Graphs, even rather rough sketch
graphs, are a useful way of storing and presenting information about functions and,
although they are not in themselves proofs, they often enable us to make sensible
guesses about how particular functions behave. Indeed, a decent sketch graph
frequently helps us to build up a sound, logical proof by supporting and guiding
our intuition in a visual way. Accordingly, we shall begin our investigation of
continuous functions by a short series of sketch graphs aimed at visually explaining
what it is that we are trying to define.
Graph of f
Graph of g
$h : (-3, 0) \to \mathbb{R}$, $h(x) = \frac{x^5 + 1}{x^4 - 1}$ if $x \ne -1$, $h(-1) = -1$;
$k : (-3, 0) \to \mathbb{R}$, $k(x) = \frac{x^5 + 1}{x^4 - 1}$ if $x \ne -1$, $k(-1) = -5/4$;
Graph of h
8.2 AN INFORMAL VIEW OF CONTINUITY 129
Graph of k
$m : (-3, -1) \cup (-1, 0) \to \mathbb{R}$, $m(x) = \frac{x^5 + 1}{x^4 - 1}$.
Graph of m
$p : (-7, 0) \cup (0, 7) \to \mathbb{R}$, $p(x) = \frac{\sin x}{x}$;
$q : (-7, 7) \to \mathbb{R}$, $q(x) = \frac{\sin x}{x}$ if $x \ne 0$, $q(0) = 2$;
$r : (-7, 7) \to \mathbb{R}$, $r(x) = \frac{\sin x}{x}$ if $x \ne 0$, $r(0) = 1$.
Graph of p
Graph of q
Graph of r
Graph of s
(Diagram: the graph of a function g with a break at x = 2, where g(2) = 8. A probing sequence x₁, x₂, x₃, … with xₙ → 2 has g(xₙ) ↛ g(2), while a second probing sequence y₁, y₂, y₃, … with yₙ → 2 has g(yₙ) → g(2).)
On the other hand, look again at the first function whose graph we sketched
above, the one described by f : (−1, 5) → R, f (x) = x(x − 1)(x − 2)(x − 4)
and which we thought ‘looked continuous’ at, for instance, the point where x = 3.
If we send in absolutely any probing sequence (zn ) that converges to 3, we see
from the algebra of sequence limits that f (zn ) = zn (zn − 1)(zn − 2)(zn − 4) →
3(3 − 1)(3 − 2)(3 − 4) which is precisely the value f (3) of f at 3: so no sequence
probe finds any evidence of a break in the graph at x = 3.
These examples – and you might usefully try a few more like them just to
reinforce the point – allow us to see a way to identify graphs that are continuous
at a point x = p by using sequences, and without needing to depend on imprecise
and time-consuming sketches: if at least one sequence probe (xn ) (in the domain
of f ) that converges to p finds that f (xn ) does not converge to f (p), then there is
some kind of continuity-fracturing break at p; on the other hand, if all sequences
(xn ) that converge to p find that f (xn ) → f (p), then there is no such break, and we
have continuity.
8.3 CONTINUITY AT A POINT 133
This is, incidentally, not the only way to define and identify continuity for real
functions, but in the present context it is the quickest to understand and the easiest
to use, so we shall run with it:
Solution
Let (xn )n≥1 be any sequence in the domain (R) of p that converges to 4 as limit.
Then, using the algebra of limits:
8.3.3 EXERCISE Show that the (rational) function defined by the formula
$$r(x) = \frac{x^4 + 3x^3 + 8x + 24}{x^4 + x^2 - 20}$$
is continuous at x = −1. Why can your argument not be modified to show that it
is also continuous at x = −2?
8.3.4 Example To show that the 'floor' function $j(x) = \lfloor x \rfloor$ is not continuous at $x = 2$, but is continuous at $x = 2.2$.
Solution
To take the second part of the question first . . . if $(x_n)$ is any sequence whose limit is 2.2 then, for all $n \ge$ some $n_0$, we shall have $|x_n - 2.2| < 0.2$, that is, $2.0 < x_n < 2.4$ which implies that $j(x_n) = \lfloor x_n \rfloor = 2$. Ignoring the first $n_0 - 1$ terms of the sequence (which, as usual, has no effect on its limiting behaviour) we get $j(x_n) \to 2 = j(2.2)$. Therefore $j$ is continuous at 2.2.
In contrast, if we take the particular sequence $(y_n)$ described by $y_n = 2 - (n+1)^{-1}$ then certainly $y_n \to 2$. However, since each term of this sequence lies between 1.5 and 2 and therefore has floor 1, we find that $j(y_n) \to 1 \ne j(2)$. The discovery of even one sequence convergent to 2 whose convergence is not preserved by $j$ shows that $j$ is not continuous at 2.
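The sequence probes of Example 8.3.4 translate directly into code. A Python sketch, using `math.floor` for $j$; the particular probing sequences are illustrative:

```python
import math

# Sequence-probe sketch of 8.3.4, with j(x) = floor(x). A left-handed probe
# y_n = 2 - 1/(n+1) converging to 2 has j(y_n) = 1 for every n, so
# j(y_n) -> 1 != j(2) = 2; a probe converging to 2.2 eventually sits inside
# (2.0, 2.4) and so has floor 2.
left_probe = [2 - 1 / (n + 1) for n in range(1, 200)]
assert all(math.floor(y) == 1 for y in left_probe)   # j(y_n) -> 1
assert math.floor(2) == 2                            # but j(2) = 2

probe_22 = [2.2 + (-1) ** n / (n + 10) for n in range(1, 200)]
assert all(math.floor(x) == 2 for x in probe_22)     # j(x_n) -> 2 = j(2.2)
print("probes agree with Example 8.3.4")
```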
8.3.5 EXERCISE Verify that the function $s(x) = x^2 - \lfloor x^2 \rfloor$ is not continuous at $x = \sqrt{3}$.
Solution
Firstly, the domain of f is the whole of R so 0 is a point of its domain. Also note
that, since 0 is rational, f (0) = |02 + 3(0)| = 0.
For any $x$ we have $-|x^2 + 3x| \le f(x) \le |x^2 + 3x|$ so, if $(x_n)$ is any sequence in $\mathbb{R}$ that converges to zero, we get
$$-|x_n^2 + 3x_n| \le f(x_n) \le |x_n^2 + 3x_n|.$$
Now (as $n \to \infty$) $|x_n^2 + 3x_n| \to 0$ and $-|x_n^2 + 3x_n| \to 0$. The squeeze then tells us that $f(x_n) \to 0$ also. By our definition, $f$ is continuous at 0.
Secondly, since 1 is rational, $f(1) = |1^2 + 3(1)| = 4$.
We can easily devise a sequence of irrationals that converges to 1: for instance, $y_n = 1 + \frac{\sqrt{2}}{n}$ will do fine. Then $f(y_n) = -|y_n^2 + 3y_n| \to -|1^2 + 3(1)| = -4$. Since 4 and −4 are different, the definition says that $f$ is not continuous at 1.
(It is quite routine to modify that argument to show that f is not continuous
anywhere except at x = 0 and at x = −3.)
is continuous at x = 0.
By far the most useful, best behaved functions are those that are continuous not
just at individual points, but at every point in their domains. This is where we shall
now concentrate our attention.
Recall, at this point, that the restriction of $f$ to $S$, denoted $f|_S$, is 'the same function as $f$ was' except that it is only defined at the points of $S$. That is, it is the function $f|_S : S \to C$ described by $f|_S(x) = f(x)$ for every $x \in S$. By way of example, the function $f(x) = \lfloor x \rfloor$ is – as we saw earlier – not continuous at $x = 2$, but its restriction $f|_{[2,3)}$ is continuous on $[2, 3)$ because it is actually constant there (and, as we shall soon easily verify, constant functions are always continuous). The difference³ arises because, when we investigate $f$ itself near $x = 2$, we must examine what $f$ does to all sequences that converge to 2 and, since the domain of $f$ is $\mathbb{R}$, that includes sequences that approach 2 from the left as well as sequences that approach 2 from the right; in contrast, when we investigate $f|_{[2,3)}$, we only examine how it behaves on that set, and not therefore what might be happening to the left of 2.
For convenience, we combine definitions 8.3.1 and 8.4.1 into one:
Solution
For each p ∈ D and each sequence (xn ) in D that converges to p, we have
limn→∞ f (xn ) = limn→∞ k = k = f (p).
Solution
For each p ∈ D and each sequence (xn ) in D that converges to p, we have
limn→∞ f (xn ) = limn→∞ xn = p = f (p).
³ If it helps you to understand this point, feel free to think in terms of the informal description of continuous functions as those whose domains believe that their graphs are unbroken. The domain of $f$ includes 2 and numbers to the right and to the left of 2, so it is able to 'see' the abrupt change in height of the graph, from 1 immediately to the left of 2, to 2 at and immediately to the right of 2; on the other hand, the domain of $f|_{[2,3)}$ is only $[2, 3)$ and, consequently, all it 'sees' of the graph is an unbroken horizontal line.
At this point, if we knew that it was safe to add and multiply continuous
functions and be certain that the results were continuous, we could begin to
build up a useful catalogue of basic continuous functions such as kx, x2 , kx2 ,
kx2 + mx, kx3 + mx2 + qx + r and many others . . .
8.4.5 Theorem Suppose that the functions f and g are continuous on a set D; then
so are
1. f + g,
2. f − g,
3. their product fg,
4. kf for any constant k,
5. f /g provided that g(x) ≠ 0 for each x ∈ D, and
6. |f |.
Proof
All six parts are proved in the same rather predictable fashion, so we shall only
demonstrate the (very slightly trickier) part 5.
Let (xn ) be any sequence in D that converges to an element p of D. Because f and
g are continuous on D, we know that f (xn ) → f (p) and that g(xn ) → g(p). We
additionally know that none of the g(xn ) can be zero. By the algebra of limits for
sequences yet again, $f(x_n)/g(x_n) \to f(p)/g(p)$, that is,
$$\frac{f}{g}(x_n) \to \frac{f}{g}(p).$$
Thus $f/g$ is continuous.
8.4.6 Corollary Every polynomial is continuous on the whole of $\mathbb{R}$; more generally, every rational function is continuous at each point of its domain.
Proof
The typical polynomial
p(x) = a0 + a1 x + a2 x2 + a3 x3 + · · · + an xn
can be built up in a finite number of moves from the basic constituents of x (that
is, the identity function on R) and constants by adding, multiplying and scaling.
Parts 1, 3 and 4 of the theorem assure us that continuity will not be lost in that
process.
8.4 CONTINUITY ON A SET 137
A rational function is (by definition) a quotient of two polynomials,
$$r(x) = \frac{p_1(x)}{p_2(x)},$$
and its domain comprises all real numbers except those at which the denominator
p2 (x) takes the value zero. Once we avoid those points (which, of course, are not
in the domain of r anyway), the first part of the corollary says that r(x) is one
continuous function divided by another that is nonzero, and part 5 of the theorem
tells us that r(x) is continuous.
There is a way of combining functions that simply does not apply to sequences
and, therefore, was not mentioned in the earlier chapters of this text. If f : D → C
and g : C → B are two functions such that the codomain of f equals4 the domain
of g then, for each x ∈ D, we see that f (x) lies in the domain of g and therefore
g(f (x)) makes sense and is an element of B. Thus the correspondence of x to g(f (x))
has created a function from D to B. It is called the composite or the composition of
f and g, and denoted by g ◦ f in most textbooks. In summary:
If f : D → C and g : C → B then their composite is the function
g◦f :D→B
specified by
(g ◦ f )(x) = g(f (x)) for each x ∈ D.
8.4.7 Warning Beware that in some books the composite is written as gf and, in
this case, you must take great care not to confuse it with the product g times f .
Just to illustrate this: if $f(x) = \sqrt[3]{x}$ and $g(x) = x^2 + 5$ then the ordinary product $gf$ is given by $(gf)(x) = g(x)f(x) = (x^2 + 5)\sqrt[3]{x}$, but the composite $g \circ f$ by $(g \circ f)(x) = g(\sqrt[3]{x}) = (\sqrt[3]{x})^2 + 5$. Notice also that the composite the other way round, $f \circ g$, is given by $(f \circ g)(x) = f(x^2 + 5) = \sqrt[3]{x^2 + 5}$ and is a completely different function again.
8.4.8 Theorem If $f : D \to C$ and $g : C \to B$ are both continuous, then so is their composite
g ◦ f : D → B.
Proof
Let (xn ) be any sequence in D that converges to an element p of D. Because f is
continuous, we know that f (xn ) → f (p). Yet this convergence takes place inside
4 This definition works equally well if the codomain of f is merely a subset of the domain of g.
the domain of the continuous function g, and therefore g(f (xn )) → g(f (p)). In
other words, (g ◦ f )(xn ) → (g ◦ f )(p). Therefore g ◦ f is continuous.
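The distinction drawn in the Warning between the product $gf$ and the two composites can be spot-checked numerically. A Python sketch (illustrative only; restricted to $x \ge 0$ so that the real cube root is simply `x ** (1/3)`):

```python
# Numerical sketch of the Warning's example: with f(x) = cube root of x and
# g(x) = x^2 + 5, the product gf, the composite g o f and the composite f o g
# are three genuinely different functions.
def f(x):
    return x ** (1 / 3)     # real cube root, valid here since x >= 0

def g(x):
    return x ** 2 + 5

x = 8.0
product = g(x) * f(x)       # (x^2 + 5) * x^(1/3)
g_after_f = g(f(x))         # (x^(1/3))^2 + 5
f_after_g = f(g(x))         # (x^2 + 5)^(1/3)
assert abs(product - 138) < 1e-9      # 69 * 2
assert abs(g_after_f - 9) < 1e-9      # 4 + 5
assert abs(f_after_g ** 3 - 69) < 1e-9
print("gf, g o f and f o g all differ at x = 8")
```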
8.4.9 A look forward Once we get around to checking that $\sqrt[3]{x}$ is continuous, this theorem will assure us that both of the composites in the above Warning paragraph are continuous.
More generally, once we have confirmed continuity for functions such as
sin, cos, ex and so on, Theorem 8.4.8 will help to generate a huge and diverse
array of functions whose continuity we shall know in advance: expressions such as
8.5.1 Lemma If a non-empty set $A$ of real numbers has supremum $s$, then there is a sequence $(a_n)$ of elements of $A$ such that $a_n \to s$.
Proof
For each positive integer $n$, the definition of supremum tells us that there is an element $a_n$ of $A$ such that $s - \frac{1}{n} < a_n \le s$. Now the squeeze shows that $a_n \to s$.
8.5.2 The intermediate value theorem (‘IVT’) Let f be continuous on (at least)
a closed bounded interval [a, b] and suppose that either f (a) < λ < f (b) or
f (a) > λ > f (b). Then there is a number c ∈ (a, b) such that f (c) = λ.
(In other words, any number that lies intermediate between two values of a
continuous function on an interval actually is a value of that function.)
5 And provided, of course, that you never try to apply a function to a number that does not lie
in its domain: taking square roots or logarithms of a negative number, for instance, will destroy
not only the continuity of an expression, but also its very meaning.
8.5 KEY THEOREMS ON CONTINUITY
Proof
Take the first case6 f (a) < λ < f (b). We put
X = {x ∈ [a, b] : f (x) < λ}
and we see that X is not empty (for a at least belongs to it) and is bounded above
by b. Put s = the supremum of X.
By the lemma, there is a sequence (an ) in X such that an → s. Continuity gives
us f (an ) → f (s) and, since each f (an ) < λ, therefore (using 4.1.17)
f (s) = lim f (an ) ≤ λ...........(1).
n→∞
In particular, since f (s) ≤ λ < f (b), s must be strictly less than b, and the whole of
(s, b] lies outside X: that is, f (y) ≥ λ at every point y of (s, b].
Now choose any sequence7 (yn ) in (s, b] that converges to s and, just as before,
we get
f (s) = lim f (yn ) ≥ λ...........(2).
n→∞
From (1) and (2) we conclude that f (s) = λ.
Notice that the variant of the IVT that uses non-strict inequalities is also perfectly
valid (and doesn’t count as a different theorem):
8.5.3 The intermediate value theorem (again) Let f be continuous on (at least)
a closed bounded interval [a, b] and suppose that either f (a) ≤ λ ≤ f (b) or
f (a) ≥ λ ≥ f (b). Then there is a number c ∈ [a, b] such that f (c) = λ.
Proof
If f (a) < λ < f (b) or f (a) > λ > f (b) then there is nothing new to prove, because
the original IVT applies. Yet if λ equals f (a) or f (b), the result is trivial anyway:
c = a or b will satisfy.
8.5.4 Example To show that the equation p(x) = x2 (x2 − 4)(x − 3) − 1 = 0 has
at least one (real) solution.
Proof
Since p is continuous (being just a polynomial) and p(0) = −1 is negative, all
we need to do is to find a value of x such that p(x) is positive: for then zero will lie
between two values of p, and the IVT will guarantee that zero actually is a value of p
6 For the second case, apply the conclusion of the first case to the continuous function (−f ).
7 For instance, yn = s + b−s
n will do.
– as required. A little experimentation will readily find, for example, that p(1) = 5
which is positive. Therefore by 8.5.2 there is some c ∈ (0, 1) such that p(c) = 0.
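The "little experimentation" can be done mechanically. The following fragment is our own illustration, not part of the text (the name p and the sample points are our choices): it evaluates p at a few integers until a positive value appears.

```python
# p(x) = x^2 (x^2 - 4)(x - 3) - 1, the polynomial of Example 8.5.4
def p(x):
    return x**2 * (x**2 - 4) * (x - 3) - 1

print(p(0))              # prints -1, so p is negative at 0
for x in [1, 2, 3, 4]:
    if p(x) > 0:         # first sample point where p is positive
        print(x, p(x))   # prints: 1 5
        break
```

Since p(0) < 0 < p(1) and p is continuous, the IVT then supplies a root in (0, 1), exactly as in the example.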
8.5.5 EXERCISE With p as in the previous example, show that
p(n)/n5 → 1 as n → ∞.
8.5.6 Example To show that the equation p(x) = x2 (x2 − 4)(x − 3) − 1 = 0 has
five (real) solutions.
Roughwork
This needs rather more trial-and-error . . . more precisely, it requires enough
number-crunching to find not just one but five intervals over which p(x) changes
sign. We eventually discovered this:
Solution
Since p is a continuous polynomial and p(−2) = −1 < 0 < 11 = p(−1), there
exists c1 in (−2, −1) such that p(c1 ) = 0.
Since p is a continuous polynomial and p(−1) = 11 > 0 > −1 = p(0), there
exists c2 in (−1, 0) such that p(c2 ) = 0.
Since p is a continuous polynomial and p(0) = −1 < 0 < 5 = p(1), there exists
c3 in (0, 1) such that p(c3 ) = 0.
Since p is a continuous polynomial and p(1) = 5 > 0 > −1 = p(2), there exists
c4 in (1, 2) such that p(c4 ) = 0.
Since p is a continuous polynomial and p(2) = −1 < 0 < 191 = p(4), there
exists c5 in (2, 4) such that p(c5 ) = 0.
Because of the intervals in which they lie, c1 , c2 , c3 , c4 and c5 must all be different
solutions of the equation. (You probably also know that a polynomial equation
of degree 5, such as this, can never have more than five solutions: indeed, that a
polynomial equation of degree n can never have more than n distinct solutions.)
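The trial-and-error hunt for sign changes in 8.5.6 can likewise be automated. In this sketch (ours, not the book's; the scanning range is an arbitrary choice wide enough to catch all five roots) we record every pair of consecutive integers between which p changes sign:

```python
def p(x):
    return x**2 * (x**2 - 4) * (x - 3) - 1

# Record each pair of consecutive integers between which p changes sign;
# by the IVT, every such interval contains at least one root of p.
intervals = [(x, x + 1) for x in range(-3, 6) if p(x) * p(x + 1) < 0]
print(intervals)   # prints: [(-2, -1), (-1, 0), (0, 1), (1, 2), (3, 4)]
```

(The scan reports (3, 4) rather than the wider interval (2, 4) used in the solution, because p(3) = −1 is still negative.)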
8.5.7 Example To show that a function that is continuous on an interval and all
of whose values are rational must actually be constant on that interval.
Solution
Let f : I → C be continuous on the interval I. If it were not constant, we could
find x1 , x2 in I such that f (x1 ) ≠ f (x2 ). Suppose, to make the picture more definite,
that x1 < x2 and that f (x1 ) > f (x2 ) (the other cases will work out in a very similar
manner). We know that f (x1 ), f (x2 ) are rational, but we can choose an irrational
number j that lies between them.8 By the IVT, j must be a value of f at a point
somewhere in the interval I: but this contradicts what we were told about its values
all being rational.
8.5.8 Example Given a continuous function f on the interval [0, 3] such that
f (0) = −f (3), to show that the equation f (x) + f (x + 2) = f (x + 1) has a solution
in [0, 1].
Solution
We create a new function (suggested by the equation we are trying to solve)
g(x) = f (x) − f (x + 1) + f (x + 2). This is defined on [0, 1] and is continuous
there (because it has been built from continuous components f , x + 1 and x + 2).
Notice that
g(0) = f (0) − f (1) + f (2)
and
g(1) = f (1) − f (2) + f (3) = f (1) − f (2) − f (0) = −(f (0) − f (1) + f (2))
have opposite signs.9 Thus 0 lies intermediately between two values g(0), g(1) of
continuous g and therefore is a value of it: 0 = g(t) for some t ∈ [0, 1], that is,
0 = f (t) − f (t + 1) + f (t + 2) or f (t) + f (t + 2) = f (t + 1).
8.5.9 EXERCISE Given a continuous function f : [0, 1] → [0, 1], show that the
equation
(f (x))2 = x5
must have at least one solution in [0, 1].
8 For instance, f (x2 ) + (f (x1 ) − f (x2 ))/√2 would do.
9 Unless, of course, both are equal to zero: but then the result is immediate.
Roughwork
Does g(x) = (f (x))2 − x5 define a continuous function? Also pay attention to the
codomain of f this time.
8.5.10 Optional extra – another proof of IVT This proof ought to remind you of
how we proved Bolzano-Weierstrass . . . which is very timely: for very shortly we
are going to be making serious use of Bolzano-Weierstrass at last.
8.5.11 Lemma Suppose that f : [a, b] → R is continuous on the interval [a, b],
and f (a) < 0, and f (b) ≥ 0. Then there is c ∈ (a, b] such that f (c) = 0.
Proof
As a piece of temporary jargon, let us call [a, b] a signchange interval for f to mean
that f (a) < 0 and f (b) ≥ 0. We are going to look for smaller signchange intervals
for this function.
Consider the midpoint m = (a + b)/2 of [a, b]. If f has a negative value here,
then [m, b] is a signchange interval for f ; if not, then [a, m] is a signchange interval
for f ; in each case we have found one half of [a, b] – label this half [a1 , b1 ] – that is
a signchange interval for f .
Repeat the process: we shall find one half of [a1 , b1 ] – label this half [a2 , b2 ] –
that is a signchange interval for f .
Repeat the process: we shall find one half of [a2 , b2 ] – label this half [a3 , b3 ] –
that is a signchange interval for f .
Continue indefinitely.
We are producing two sequences (an ) and (bn ) in [a, b] and, because each
interval contains the next one, they satisfy
a1 ≤ a2 ≤ a3 ≤ a4 ≤ · · · < b; b1 ≥ b2 ≥ b3 ≥ b4 ≥ · · · > a.
So these two sequences are monotonic and bounded, and therefore converge:
an → c and bn → c′
for some c, c′ in [a, b]. But also, since each interval has just half the length of the
previous one:
bn − an = (b − a)/2n .
Taking limits there, we see that c′ − c = 0: in other words, c = c′.
Now f is negative at the left endpoint of each signchange interval so f (an ) < 0
for all n, whence (using continuity at last) f (c) = limn→∞ f (an ) ≤ 0.
Equally, f (bn ) ≥ 0 for all n, whence (via continuity) f (c′) = limn→∞ f (bn ) ≥ 0.
Since c = c′, when we combine these we get f (c) = 0 as desired.
(Also, since f (a) < 0 and f (c) = 0, a and c cannot be equal; so c ∈ (a, b].)
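The halving construction in this proof is precisely the bisection method of numerical analysis; turned into code, it not only shows that a root exists but locates one. A minimal sketch under our own naming (not the book's):

```python
def bisect(f, a, b, steps=60):
    """Shrink a 'signchange interval' [a, b] (f(a) < 0 <= f(b)) by
    repeated halving, exactly as in the proof of Lemma 8.5.11."""
    assert f(a) < 0 <= f(b)
    for _ in range(steps):
        m = (a + b) / 2
        if f(m) < 0:
            a = m            # [m, b] is still a signchange interval
        else:
            b = m            # [a, m] is still a signchange interval
    return (a + b) / 2       # the a_n and b_n squeeze onto c with f(c) = 0

print(bisect(lambda x: x**2 - 2, 0.0, 2.0))   # approximates √2 = 1.41421...
```

Each pass halves the interval, mirroring the proof's bn − an = (b − a)/2^n.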
8.5.12 Theorem – the IVT yet again Let f be continuous on a closed bounded
interval [a, b] and suppose that either f (a) ≤ λ ≤ f (b) or f (a) ≥ λ ≥ f (b). Then
there is a number c ∈ [a, b] such that f (c) = λ.
Proof
If λ = f (a) or f (b), the result is immediate. Otherwise, apply the last lemma to the
function f (x) − λ (or to the function −f (x) + λ for the case f (a) > λ > f (b)).
Theorem (boundedness) If f : [a, b] → R is continuous on the closed bounded interval [a, b], then f is bounded above and below there.
Proof
Suppose firstly that it were not bounded above. Then, for each positive integer n,
n cannot be an upper bound for the range f ([a, b]), so there is a point xn ∈ [a, b]
such that f (xn ) > n.
According to Bolzano-Weierstrass, the bounded sequence (xn ) thus created has
a convergent subsequence: xnk → p for some p ∈ [a, b]. Now continuity tells
us that f (xnk ) → f (p) so, in particular, (f (xnk )) is a convergent sequence, and
therefore bounded. Yet it is not: for we arranged that, for each positive integer k,
f (xnk ) > nk ≥ k. The contradiction shows that f must have been bounded above.
Secondly, much the same argument will yield a contradiction from supposing
that f were not bounded below. (Alternatively, since we now know that continuous
functions on [a, b] are always bounded above, apply that fact to the continuous
function (−f ): for ‘(−f ) bounded above’ and ‘f bounded below’ say exactly the
same thing.)
This theorem entitles us to speak of the supremum and the infimum of any
continuous function on a closed bounded interval, and yet there is better news
than that – the sup and the inf are actually values of the function: so we can safely
speak of the function’s biggest and smallest values instead.
8.5.15 Theorem – sup and inf are attained If f : [a, b] → R is continuous, then
it possesses a maximum and a minimum value on [a, b].
Proof
Knowing from the previous theorem that f ([a, b]) is bounded and therefore has
a supremum fsup and an infimum finf , we need to find x0 , x1 in [a, b] such that
f (x0 ) = finf and f (x1 ) = fsup .
For each n ∈ N, the definition of supremum tells us that there is a point yn in
[a, b] such that
fsup − 1/n < f (yn ) ≤ fsup .
Once more we have a bounded sequence (yn ), and once more Bolzano-Weierstrass
promises us a convergent subsequence: say
ynk → p as k → ∞
where also p ∈ [a, b]. Appealing to continuity, we find that f (p) = limk→∞ f (ynk ).
Yet, when we take limits (as k → ∞) across the inequality
fsup − 1/nk < f (ynk ) ≤ fsup
the squeeze shows that f (ynk ) → fsup , and therefore f (p) = fsup : the supremum is attained at x1 = p. A very similar argument, based this time on the definition of infimum, produces a point x0 at which f (x0 ) = finf .
8.5.16 Optional extra – an alternative proof that suprema are attained Again
let f : [a, b] → R be continuous, and suppose that it does not attain a maximum
value. In that case, fsup is always strictly greater than the values of f , so the function
g(x) = 1/(fsup − f (x))
is defined and continuous everywhere on [a, b]. By the boundedness theorem above, g(x) is bounded: there
is a positive constant K such that, for every x ∈ [a, b]:
g(x) ≤ K, that is, 1/(fsup − f (x)) ≤ K, that is, fsup − f (x) ≥ 1/K,
that is, f (x) ≤ fsup − 1/K.
This, however, contradicts fsup being the supremum (the least possible upper
bound) of the values of f .
Example Suppose that f : [0, ∞) → [0, ∞) is continuous and that f (x) → 0 as x → ∞, in the sense that:
for each ε > 0 there is K > 0 such that f (x) < ε whenever x > K.
Show that f attains a maximum value on [0, ∞).
Solution
(Note that we cannot immediately use the ‘supremum is attained’ theorem since
the domain of f here is an unbounded interval. But it would be good if we could
somehow force the action to take place on a closed bounded interval and, since f (x)
is nearly 0 for very big values of x, that perhaps could be arranged . . . )
In the special case where f is constant at 0, the result is trivial.
If not, then we can find a ∈ [0, ∞) such that f (a) > 0.
Then the given condition on f tells us that we can find a positive number K
so that:
for every x > K, we get 0 ≤ f (x) < f (a)/2.
It should be obvious that K ≥ a. Now on the closed bounded interval [0, K], f
must have a biggest value (f (b), say), and this is at least as big as f (a). It is therefore
bigger than every value that f can take on (K, ∞) since they are all smaller than
f (a). In other words, f (b) is the maximum value that f takes anywhere in [0, ∞).
Show that there is a positive constant K such that |g(x)| < K for every x ∈ R. (You
can assume standard facts about the trig functions.)
1
entirely different from the reciprocal function 3 with domain
x +1
(−∞, −1) ∪ (−1, ∞).
• The key result from the basic theory of sets and mappings is that the invertible
mappings are precisely the bijective mappings: f has an inverse if and only if it is
both one-to-one and onto.
That last result is important even in very elementary algebra. For instance, what
is the inverse of f (x) = x2 ? The short answer is that it doesn’t have one . . . if, by
that brief formula, we mean the function
f : R → R given by f (x) = x2 :
because it is obvious that this function (now that we have described it fully) is
neither one-to-one nor onto.12 If we modify the definition to
f : R → [0, ∞) given by f (x) = x2
then at least it becomes onto, since every element of [0, ∞) is the square of some
real number, but it is still not one-to-one. However, if we modify the definition
again to read
f : [0, ∞) → [0, ∞) given by f (x) = x2
then what we are looking at now is both one-to-one and onto, so it does possess an
inverse.
The inverse, naturally, is the square root function – the function
g : [0, ∞) → [0, ∞) given by g(x) = √x
– because it is clear that √(x2 ) = x and (√y)2 = y for all non-negative x and y.
This pattern of seeking an inverse for
some important function that initially did not have one, by restricting the domain
and/or codomain of its defining formula until it becomes one-to-one and onto, is
common and valuable – we shall meet it again in Chapter 18.
Incidentally, it is important to keep in mind that the last three display lines
defined three different functions, even though we (rather incorrectly) used the
same letter f to stand for all of them.
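The role of the restriction can be checked numerically. In the sketch below (our own illustration; the names f_restricted and g are not from the text) the restricted squaring function and the square root really do undo one another:

```python
import math

def f_restricted(x):
    """The squaring function with domain and codomain [0, ∞)."""
    assert x >= 0
    return x * x

def g(y):
    """Its inverse, the square root function on [0, ∞)."""
    assert y >= 0
    return math.sqrt(y)

# g ∘ f and f ∘ g are (numerically) the identity on [0, ∞).
for t in [0.0, 0.25, 1.0, 1.7, 9.0]:
    assert abs(g(f_restricted(t)) - t) < 1e-12
    assert abs(f_restricted(g(t)) - t) < 1e-12
print("g inverts the restricted f")
```

The assertions would fail if negative inputs were allowed: with domain all of R, g(f_restricted(−1.7)) would return 1.7, not −1.7, which is exactly why the restriction is needed.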
Just as, amongst sequences, the monotonic ones were often easier to work with
than the rest, functions whose values steadily increase or steadily decrease have
some desirable and useful properties, and it will pay us to give clear definitions to
these classes of function now:
12 For example, because f (1) = f (−1), and because −3 is in the codomain but is not in the
range of f .
8.6.2 Lemma Suppose that f : D → f (D) is strictly monotonic (either strictly increasing or strictly decreasing). Then f is invertible, and f −1 : f (D) → D is strictly monotonic in the same sense.
Proof
We’ll consider only the case where f is strictly increasing – the other is very similar.
If x ≠ y in D then either x < y or y < x. Accordingly either f (x) < f (y) or
f (y) < f (x): and in both cases, f (x) ≠ f (y), as required for injectivity. The choice
of codomain has ensured that f is also onto, so it is invertible.
Given p < q in f (D) choose x, y in D such that p = f (x) and q = f (y), that is,
x = f −1 (p) and y = f −1 (q). If it were true that x ≥ y then the increasing nature of
f would yield the contradiction p = f (x) ≥ f (y) = q. Hence we must have x < y,
that is, f −1 (p) < f −1 (q). Therefore f −1 is a strictly increasing function.
Lemma If f is continuous on an interval I, then the range f (I) is also an interval.
Proof
Any number y that lies between two elements f (x1 ), f (x2 ) of the range is, according
to the IVT, a value of f , that is, another element of the range: but this is the defining
characteristic of an interval.
Lemma If f is continuous and strictly monotonic on an open interval I, then f (I) is an open interval.
Proof
For any element f (x) of the range, x ∈ I cannot be an endpoint of the (open)
interval I, so we can find x′, x″ in I such that x′ < x < x″. Depending on whether
f is (strictly) increasing or decreasing, it follows that either f (x′) < f (x) < f (x″) or
f (x′) > f (x) > f (x″). In both cases, f (x) fails to be an endpoint for the interval f (I),
which therefore cannot include any of its endpoints: hence the result.
8.6 CONTINUITY OF THE INVERSE
8.6.6 Remark – optional extra Once we think in more detail about what a con-
tinuous strictly monotonic function can do to intervals of various types, a picture
emerges that is rather more complicated than the last lemma suggests. As a way of
building intuition about this, you could check out the details summarised in the
following table:
Table 8.1. Possible forms of f (I) for interval I and continuous strictly monotonic f
8.6.7 EXERCISE – optional extra Confirm that the table above is complete and
correct. That is, for each row, verify that when the interval I has the indicated form
and the function f : I → R is continuous and strictly increasing, f (I) must take
one of the listed forms, and confirm by examples that each listed form actually can
occur; then repeat the exercise for f continuous and strictly decreasing.
Partial solution
Let us consider just the fifth row, the one in which I is of the form [a, ∞).
Proposition Suppose that f : I → f (I) is continuous and strictly increasing on an interval I, and that p ∈ I. Then f −1 is continuous at f (p).
Proof
The inverse exists and is strictly increasing by 8.6.2.
Firstly, we shall look in detail at the case where p is not an endpoint of I
(and, consequently, f (p) is not an endpoint of the interval f (I)). Let (yn ) be any
sequence in f (I) that converges to f (p); we need to show that (f −1 (yn )) converges
to f −1 (f (p)) = p.
For each n ∈ N, yn = f (xn ) for some (unique) xn ∈ I, namely xn = f −1 (yn ). Let
ε > 0 be given. There is no loss of generality in assuming that ε is small enough
to ensure that the interval [p − ε, p + ε] lies inside I: for were it not so, we would
replace ε by a smaller number that does ensure this. Consequently, we can talk
about f (p − ε) and f (p + ε), and know that the first is smaller than f (p) and the
second is greater than f (p). Since yn → f (p), we see that13 there will be a positive
integer n0 such that
n ≥ n0 ⇒ yn ∈ (f (p − ε), f (p + ε))
and therefore, because f is increasing, that
xn = f −1 (yn ) ∈ (p − ε, p + ε)
for all n ≥ n0 . That is, xn → p, which is what we needed.
13 If this step is not sufficiently clear to you, try putting δ equal to the smaller of the two
numbers f (p) − f (p − ε) and f (p + ε) − f (p), and notice that (for sufficiently large values of
n) we shall have |yn − f (p)| < δ.
[Figure: the graph of the increasing function f near p, showing that whenever yn lies in (f (p − ε), f (p + ε)), the point xn = f −1 (yn ) lies in (p − ε, p + ε).]
Almost exactly the same argument will show that this also works for decreasing
in place of increasing:
(Alternatively, you may be able to see how to prove the second proposition for
free, just by applying the first proposition to the increasing function −g.)
Combining the two propositions (invoked at each point of the domain) and
Lemma 8.6.2, we obtain the continuous inverse theorem in the form that is usually
most useful:
The continuous inverse theorem If f is strictly monotonic and continuous on an interval I, then f −1 : f (I) → I is also strictly monotonic and continuous.
.........................................................................
9 Limit of a function
.........................................................................
9.1 Introduction
As in the previous chapter, it may be useful to begin by outlining informally and
visually the next topic that we are going to define and investigate. Let’s start by
reviewing the sketch graphs we drew in our first attempt to explain continuity.
Graph of f
Graph of g
Graph of h
Undergraduate Analysis: A Working Textbook, Aisling McCluskey and Brian McMaster 2018.
© Aisling McCluskey and Brian McMaster 2018. Published 2018 by Oxford University Press
Graph of k
Graph of m
Graph of p
Graph of q
Graph of r
Graph of s
for ordinary continuity but consistently avoiding x = a. So, starting with Chapter
8’s definition of continuity at the point a:
for every sequence (xn ) in D such that xn → a, we find that f (xn ) → f (a)
we should firstly replace the (perhaps undefined) f (a) by a symbol for the limiting
number to which all the sequences (f (xn )) need to converge, and secondly prevent
the sequences (xn ) from including a as one or more of their terms. This suggests
that the defining characteristic of functions of the second type is this: there is a real
number ℓ such that
for every sequence (xn ) in D\{a} such that xn → a, we find that f (xn ) → ℓ.
When this is the case, the number ℓ which the values of f are approximating
better and better as we approach a (but without actually reaching a, of course) is
called the limit of f (x) as x tends to a or the limit of f at a. In this language, looking
back at our sketch graphs, we intend to say that both h and m have limits of −5/4
at −1, that both p and q have limits of 1 as we approach 0, but that g does not
have a limit as we approach 2 and that s does not have a limit at 0. (Of course, we
still need to show that these are true statements – but at least we now possess a
logically sound definition against which to test their truthfulness.) We also gain
from the discussion an alternative definition of continuity, namely: a function f
is continuous at a point a of its domain if the limit of f as we approach a is
precisely f (a).
[Sketch: a sequence x1 , x2 , x3 , . . . converging to a along the horizontal axis, with the corresponding values f (x1 ), f (x2 ), f (x3 ), . . . approaching the limit up the vertical axis.]
Next, a technical warning: since the definition that we are setting up (of limit
of f (x) as x approaches a) is entirely dependent on what happens to sequences in
D \ {a} that converge to a, we must take care never to use it if, in fact, there are no
such sequences! For instance, any attempt to find a limit of ln x as x approaches −1,
or of √x at x = −0.3, or of arcsin x as x tends to 2, is doomed to fail since these
functions are undefined close to the point that we are claiming to approach. For
a subtler example, think about the factorial function f (x) = x! . Now, we don’t
often consider ‘factorial’ as a real function at all since it only handles non-negative
integers but, nevertheless, it does satisfy the requirements of our definition of a real
function (with domain N ∪ {0}). Let us ask, then: what is the limit of this function
x! as x tends to, say, 2? The domain D of this function is {0, 1, 2, 3, 4, 5, · · · }, so
D \ {2} is {0, 1, 3, 4, 5, · · · }, and the definition needs us to look at a typical sequence
in {0, 1, 3, 4, 5, · · · } that converges to 2 . . . but no such sequences exist: a sequence in
that set never gets within the distance 1 of 2, so it cannot converge to 2. We conclude
that the limit of x! as we approach 2 (or, indeed, as we approach any other number)
cannot be defined.
The final matter that we ought to stress before concluding this introductory
section is that, although functions such as m and h appear somewhat artificial at
first sight – contrived examples, designed to deliver a teaching point rather than
practical, useful algebra or calculus – questions concerning the limiting values
of non-continuous functions do turn up in a very large number of important
application-oriented problems, and we should outline a couple of these before
starting our serious study of function limits. The first is one that we have touched
on already (see the functions p, q, r again), and the second introduces an idea that
is fundamental to differential calculus, which we shall work on in Chapter 12.
9.1.1 Example When x is interpreted as an angle in radians, the ratio (sin x)/x
calculates out very close to 1 when x is small. This is the basis of many arguments
in Science as well as in Mathematics proper, in which sin x is replaced by (the much
easier to handle) x as a high-quality approximation. Let us ask, then: when is
(sin x)/x exactly 1?
Reply
Never. There is no value of x for which (sin x)/x = 1. Of course, as is widely known,
the ratio is very close to 1 provided that x is sufficiently close to 0…but a careful
examination shows that the only solution of the equation sin x = x is x = 0, and we
cannot replace x by 0 in the ratio since 0/0 is meaningless. What this example is
informally expressing is that the limit of the function (sin x)/x, as x approaches 0, is 1.
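A quick numerical check (ours, not the book's; the sample values of x are arbitrary) shows the ratio creeping up towards 1, without ever reaching it, as x shrinks:

```python
import math

# (sin x)/x for a sequence of x-values approaching 0 from the right.
for x in [0.5, 0.1, 0.01, 0.001]:
    r = math.sin(x) / x
    print(x, r)
    assert r < 1        # sin x < x for every x > 0, so the ratio stays below 1
```

For x = 0.001 the ratio already agrees with 1 to about six decimal places, which is why the approximation sin x ≈ x is so serviceable in practice.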
9.1.2 Example The point P = (3, 9) lies on the graph of the quadratic function
f (x) = x2 . In an attempt to evaluate the slope of the graph exactly at the point
P, we take a nearby point on the graph – say, the point Q = (3 + h, (3 + h)2 ) –
and work out the gradient of the straight line PQ. Provided that the number h is
really small, that straight line should hug the curve closely enough to ensure that
the gradient of PQ will be a good approximation to the gradient of the curve itself
and – thinking imprecisely for a moment – when h = 0, the approximation ought
to become perfect. What, then, actually does happen to the gradient of PQ when
we allow h to become zero?
Reply
It ceases to have any meaning whatsoever. Since the gradient of the straight line
PQ (change in y-coordinate divided by change in x-coordinate) is actually
((3 + h)2 − 32 )/((3 + h) − 3) = (6h + h2 )/h,
replacing h by 0 gives, once again, the meaningless symbol 0/0. Notice that we cannot
simply cancel an h top and bottom in the previous display unless we write h ≠ 0
into the contract, because cancelling just means dividing top and bottom lines by h,
and this is illegal precisely in the special case h = 0 that we really wanted to get
at. What the example is trying to work towards is not the value of
((3 + h)2 − 32 )/((3 + h) − 3)
at h = 0, but its limit at that point: for this is what will give us the gradient of the
curve itself at the point P.
9.2.1 Definition Let S be a set of real numbers and p a real number. We call p a
limit point1 of S if there is at least one sequence of elements of S \ {p} that converges
to p. (Note that p may or may not be an element of S.)
9.2.2 Notes
1. The only case that will occupy our attention is that in which S is the domain of
some function. Then p being a limit point of S is exactly what is needed in
order that we can sensibly try to find a limit of that function at p.
2. If S happens to be an interval then it is easy to see that the limit points of S are
precisely the points of S and the endpoints of S (one or both of which, of
course, might be elements of S already).
3. In nearly all the examples in this text, the domain of a function will be either a
non-degenerate interval or the union of a finite list of non-degenerate
intervals. For such a domain D it is again easy to see that the limit points are
simply the points of D together with all the endpoints of those intervals. (The
recent exception was x!, whose domain N ∪ {0} had no limit points
whatsoever.)
4. In the event that you might need to deal with some function whose domain is
more complicated, the following lemma may be useful in identifying limit
points:
Lemma Let S be a set of real numbers and p a real number. Then p is a limit point of S if and only if:
for every ε > 0, the interval (p − ε, p + ε) contains at least one element of S that is distinct from p.
Proof
If p is a limit point of S, choose a sequence (xn ) in S\{p} such that xn → p. Then for
any choice of ε > 0 there are, in fact, infinitely many xn in the interval (p−ε, p+ε),
and they are all different from p.
Conversely, suppose that the displayed condition holds. Then, choosing ε = 1/n
for each positive integer n in turn, we can find a point of S (call it xn ) in (p−ε, p+ε)
but distinct from p. The sequence (xn ) of elements of S \ {p} thus created satisfies
p − 1/n < xn < p + 1/n so, using the squeeze, it converges to p. Hence p is indeed
a limit point of S.
EXERCISE Find all the limit points of the set
{n + (m + 1)−1 : n ∈ N, m ∈ N}.
9.2.5 Definition Let f : D → R, let p be a limit point of D, and let ℓ be a real number. We say that f (x) → ℓ as x → p if:
for every sequence (xn ) in D\{p} such that xn → p, we find that f (xn ) → ℓ.
It is also common practice to call ℓ the limit of f at p, and to write all this as
limx→p f (x) = ℓ.
9.2.6 Example To show that the function f defined by
f (x) = (x2 − 9)/(x − 3)
converges to a limit of 6 as x → 3.
Solution
The domain of f is D = (−∞, 3) ∪ (3, ∞) so it has 3 as a limit point.
Let (xn ) be any sequence in D \ {3} such that xn → 3. Then
f (xn ) = (x2n − 9)/(xn − 3) = (xn − 3)(xn + 3)/(xn − 3) = xn + 3 → 6.
(There is an important point near the end of the line which it is all too easy to miss.
When we cancel xn − 3, what we are doing is dividing top and bottom by xn − 3. Of
course we dare not divide by zero . . . but since xn belongs to D \ {3}, we know that
xn − 3 is definitely non-zero; it is precisely this detail that allows us to cancel, and
thus saves us from hitting a nonsensical conclusion, such as: that the limit is 0/0.)
Hence f (x) → 6 as x → 3.
Alternative solution
To see more easily what is happening when x is close to 3, it often helps to put
x = 3 + h and then consider h → 0 instead. Thus, given any sequence (xn ) in
D \ {3} such that xn → 3, if we set xn = 3 + hn for each n, then we see that:
f (xn ) = f (3 + hn ) = ((3 + hn )2 − 9)/((3 + hn ) − 3) = (6hn + h2n )/hn = 6 + hn → 6.
(Please note again that the cancellation is legal precisely because we are not
cancelling or dividing zeros: hn can never be exactly zero because xn is never
exactly 3.)
We conclude that limx→3 f (x) = 6.
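The sequential definition can be watched in action. In this sketch (ours, not the book's; the particular sequence xn = 3 + 1/n is just one choice from D \ {3}) the values f (xn ) visibly approach 6:

```python
def f(x):
    assert x != 3                    # 3 lies outside the domain of f
    return (x**2 - 9) / (x - 3)

xs = [3 + 1 / n for n in range(1, 6)]   # a sequence in D \ {3} tending to 3
ys = [f(x) for x in xs]
print(ys)   # each value is (numerically) 6 + 1/n, so the list tends to 6
```

Note the assert: the code, like the definition, must never evaluate f at 3 itself.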
For a fairly straightforward question such as the previous one, there was very
little to choose between the two different solutions we demonstrated. However, if
the algebra is more complicated and the possibility of cancelling less obvious, then
the trick of putting x = a + h to explore close to x = a can save you quite a bit of
time and effort.
9.2.7 Example To investigate the limit, as x → −1, of the function
f (x) = (x5 + 1)/(x4 − 1).
Solution
Since the only real numbers x for which x4 = 1 are 1 and −1, the bottom line is
zero only at 1 and −1, the domain D of this function is (−∞, −1)∪(−1, 1)∪(1, ∞),
and −1 is a limit point of D.
Given any sequence (xn ) in D \ {−1} whose limit is −1, put xn = −1 + hn for
each n, and notice that
f (xn ) = ((−1 + hn )5 + 1)/((−1 + hn )4 − 1) = (h4n − 5h3n + 10h2n − 10hn + 5)/((hn − 2)(h2n − 2hn + 2))
(noting that hn ≠ 0 is what allows us that cancellation, and that to keep the new
bottom line non-zero we also need to prevent xn = 1, that is, avoid letting hn = 2).
Now since hn → 0 we see that, by ignoring the first few terms if necessary, we
can be sure that hn ≠ 2. Then the algebra of limits gives
f (xn ) → 5/((0 − 2)(0 − 0 + 2)) = −5/4,
and we conclude that limx→−1 f (x) = −5/4.
9.2.8 EXERCISE The point P = (2, 2) lies on the curve described by y = f (x) =
x3 − 3x2 + 6. Determine the gradient of this curve at the point P. (Hint: set the
problem up as in 9.1.2, and deal with the limit as in 9.2.6.)
9.2.9 Example To investigate the limit, as x → 2, of the function f defined by:
f (x) = 0 when x = 2 or x = 5/2; otherwise f (x) = 6x2 /(2x − 5).
Solution
The domain is R since, although the fraction formula fails to make sense at x = 5/2,
a separate definition of f (5/2) has been provided. (Why a separate definition of
f (2) has also been provided is not obvious, but the definition of f as a whole is
unambiguous.)
We consider any sequence (xn ) in R\{2} whose limit is 2, and we put xn = 2+hn
for each n, where hn → 0. By ignoring (if necessary) the first few terms, we can
arrange that |xn −2| < 1/2, that is, 3/2 < xn < 5/2: therefore the separate definition
of f at 5/2 is no longer relevant, and
f (xn ) = 6x2n /(2xn − 5)
– whose limit (using again the algebra of limits for sequences) is 24/(−1) = −24. Therefore limx→2 f (x) = −24.
9.2.10 Remark Two general points that emerge from the last example are worth
stressing. Firstly, although the value ascribed to f (2) was a distinctly odd choice,
this had absolutely no effect upon the limit as x approached 2 because, when
exploring limiting behaviour of f (x) as x approaches a point p, we don’t care what
(if anything) f (x) does when x exactly equals p – only how it behaves when x is very
close to but distinct from p.
Secondly, the separate definition given to f (5/2) was also essentially irrelevant
to the problem since any investigative sequence (xn ) converging to 2 will eventually
be closer to 2 than 5/2 is, so that the limiting behaviour of (f (xn )) cannot be
influenced by how f behaves at 5/2 or, indeed, at any significant distance from
2. This insight can even be rephrased into an occasionally useful theorem:
9.2.11 Theorem Suppose that f : D → R, that p is a limit point of D, and that for some η > 0 the function g agrees with f at every point of D ∩ (p − η, p + η).2 If g(x) → ℓ as x → p, then also f (x) → ℓ as x → p.
Proof
Suppose that g(x) → ℓ as x → p. For any sequence (xn ) in D \ {p} that converges
to p, since η > 0 we can find n0 such that n ≥ n0 implies that p − η < xn < p + η,
which in turn shows that
f (xn ) = g(xn ) → ℓ as n → ∞
Comment
What that result says is that, if you wish to determine the limit of f as x approaches
p, it is good enough to narrow your attention to what f does in any open interval
centred on p. By way of illustration, if you were asked to find the limit at x = 3.3
of the following function
h(x) = x0.37 √(5x4 + 29) sin(3x2 + π/17) if x < 3,
h(x) = 2x if x ≥ 3
2 Such a function g is more properly called a restriction of f : in this case, the restriction of f to
D ∩ (p − η, p + η).
then you could choose to work with h(x) as if it were only defined on, say, (3.1, 3.5).
Yet on that interval, h(x) = 2x is just twice an identity function, so its limit is almost
immediately seen to be 6.6.
The usefulness of the insight, that the limit as x → 3.3 is only influenced by
what happens locally at 3.3, for example, on the interval (3.1, 3.5), is that it allows
us to ignore completely the more complicated behaviour of the function outside
that locality.
g(x) = 0 when x is any integer; otherwise g(x) = x3 /(x + 2).
9.2.14 Example To investigate the limit, as x → 0, of the function
f (x) = (√(x4 − x2 ))2 .
Solution
(Aside: yes, of course we want to ‘cancel’ the squaring and the square root, but this
will be legal only when the square root exists as a real number.)
Now x4 − x2 factorises easily into x2 (x + 1)(x − 1), from which we see that it is
non-negative when x ≤ −1 and when x = 0 and when x ≥ 1, but negative when
x lies in the open interval (−1, 0), and negative again when x lies in (0, 1). Under
our convention about the domain of a formula-defined function comprising all the
real numbers for which the formula delivers a real answer, f (x) exists and is x4 −x2
on (−∞, −1] ∪ {0} ∪ [1, ∞) but is undefined on (−1, 0) ∪ (0, 1).
Since 0 is the only member of the domain of f that lies in, for instance, the
interval (−0.5, 0.5), 0 is not a limit point of the domain, so the limit of f (as we
approach 0) is not defined.
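A short script corroborates that 0 is an isolated point of the domain (a numerical sketch under the domain convention just stated, not a proof):

```python
# in_domain tests whether the formula (sqrt(x**4 - x**2))**2 yields a real number
def in_domain(x):
    return x**4 - x**2 >= 0

# no point of (-0.5, 0.5) other than 0 itself lies in the domain...
samples = [k / 1000 for k in range(-499, 500) if k != 0]
assert in_domain(0) and not any(in_domain(x) for x in samples)
# ...while the domain resumes outside (-1, 1)
assert in_domain(-1) and in_domain(1.5)
print("0 is an isolated point of the domain")
```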
164 9 LIMIT OF A FUNCTION
[Sketch graphs of x⁴ − x², √(x⁴ − x²) and (√(x⁴ − x²))²]
At this stage we can start developing basic theorems about function limits that
will allow us to handle them more efficiently than by the definition alone. Some
of these will strike you as very predictable, given what we have already seen about
sequences and continuous functions. We start with an observation that justifies
our use of the word ‘the’ whenever we talk about the limit of a function at a point:
Proof
With a view to a contradiction, suppose that f : D → C, that p is a limit point of D,
that f(x) → ℓ1 and f(x) → ℓ2 as x → p, and that ℓ1 ≠ ℓ2. Pick any sequence (xn)
in D \ {p} that converges to p as limit. Then f(xn) has to converge both to ℓ1 and
to ℓ2: which is impossible by 2.7.10.
Proof
These can all be proved in the same (and hopefully obvious) way. For example, let’s
do numbers (3) and (5):
Limit of a product In the above notation, let (xn)n∈N be an arbitrary sequence in
D ∩ B \ {p} that converges to p. Then (via the algebra of sequence limits)
f(xn)g(xn) → ℓm and, since (xn) was an arbitrary such sequence, f(x)g(x) → ℓm as x → p.

Limit of a quotient Arguing in the same way with m ≠ 0, f(xn)/g(xn) → ℓ/m for
every such sequence, and therefore

(f/g)(x) = f(x)/g(x) → ℓ/m  (as x → p)

as required.
9.2.18 EXERCISE Choose two other parts of this theorem and write out proofs
for them.
9.2.19 Theorem: a squeeze or sandwich rule for function limits Suppose that
f : D → R, g : D″ → R, h : D′ → R are three functions, that D″ ⊆ D ∩ D′ (that
is, wherever g is defined, so are f and h), that p is a limit point of D″, that
f(x) ≤ g(x) ≤ h(x) for each x ∈ D″ and that

lim_{x→p} f(x) = lim_{x→p} h(x) = ℓ.

Then also

lim_{x→p} g(x) = ℓ.
Solution
First, suppose that (1) holds. With a view to establishing (2), let (xn ) be any
sequence in D \ {p} that converges to p. Then merely because (xn ) is a sequence in
D that converges to p, continuity gives f(xn) → f(p) as n → ∞, as we wanted.
The converse is a little less straightforward. Suppose this time that (2) holds, and
let (xn ) be any sequence in D that converges to p.
• If there are only finitely many values of n for which xn = p then we can ignore
them without implication for limiting processes, and thus regard (xn ) as a
sequence in D \ {p} that converges to p. By supposition, f (xn ) → f (p) as
n → ∞.
• At the other extreme, if there are only finitely many values of n for which
xn ≠ p, then we can equally well ignore them, regard (xn) as a constant
sequence (p), and immediately have f (xn ) = f (p) → f (p) as n → ∞.
• In the remaining case, (xn ) divides up into two (infinite) subsequences, one of
which (call it (yn )) lies in D \ {p} and converges to p, while the other (call it
(zn)) is constant at p. Since both (f(yn)) and (f(zn)) converge to f(p) – the first
via condition (2) and the second because it is constant – an exercise in Chapter 5
(see 5.2.5, 5.2.6) tells us that f(xn) → f(p) once again.
In each of the three possible scenarios, we have what we needed in order to
conclude that f is continuous at p.
Once more we find that a benefit of possessing a battery of basic theorems is that
we can tackle examples without having to fall back on the definitions:
Solution
Notice that |f(x)| ≤ 7x in all cases. That is:

−7x ≤ f(x) ≤ 7x.
Now the functions 7x and −7x are both continuous (on R) so, using 9.2.21,
limx→0 7x = 0 and limx→0 (−7x) = 0. Using the ‘new squeeze’ 9.2.19 on the
previous display gives limx→0 f (x) = 0 as required.
Alternative solution
Since what happens at −1 is irrelevant to this limit, we can assume that x = −1.
Then (with a little effort of factorisation):
lim f(x) = lim (x⁵ + 1)/(x⁴ − 1) = lim [(x + 1)(x⁴ − x³ + x² − x + 1)] / [(x + 1)(x − 1)(x² + 1)]

= lim (x⁴ − x³ + x² − x + 1) / [(x − 1)(x² + 1)]

= lim(x⁴ − x³ + x² − x + 1) / lim((x − 1)(x² + 1)) = 5 / ((−2)(2)) = −5/4
3 This is an example in which any attempt at sketching the graph is likely to waste quite a lot
of time!
(noting that all the limits are as x → −1, the cancellation was legitimate since
x + 1 was non-zero, the algebra of limits allowed us to operate on the top and
bottom lines separately, and the fact that top and bottom lines were continuous
gave us the numerical values of those two limits immediately.)
Solution
We can safely assume x ≠ 4 since behaviour at 4 has no consequences for a limit
while approaching 4. With that proviso:

f(x) = (√x − 2)/(x − 4) = (√x − 2)/[(√x − 2)(√x + 2)] = 1/(√x + 2) → 1/(2 + 2) = 1/4
because the function √x is continuous (see 8.6.10) and so its limit as we approach
4 is merely its value √4 = 2 at 4 (and because we can use the algebra of limits theorem).
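Sampling near 4 corroborates the value 1/4 (of course, a numerical table can only suggest a limit, never prove one):

```python
import math

def f(x):
    return (math.sqrt(x) - 2) / (x - 4)  # undefined at x = 4 itself

for h in (0.1, 0.01, 0.001, 1e-6):
    print(f(4 - h), f(4 + h))  # both columns settle towards 0.25
```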
lim_{x→3} (x⁴ − 81)/(x⁴ − 8x² − 9)  and  lim_{x→3} (x³ − 3x² − 9x + 27)/(x⁵ − 243).
9.2.26 HARDER EXERCISE See if you can determine the limit of the function
discussed in 9.2.22 as x → 2/3, and its limit as x → π. Don’t worry if this turns
out to be difficult – we shall encounter, in the next chapter, an alternative method
that works more easily for certain questions, including ones like this.
.........................................................................
10 Epsilontics and
functions
.........................................................................
for each ε > 0 there exists δ > 0 such that 0 < |x − p| < δ implies that |f(x) − ℓ| < ε
Undergraduate Analysis: A Working Textbook, Aisling McCluskey and Brian McMaster 2018.
© Aisling McCluskey and Brian McMaster 2018. Published 2018 by Oxford University Press
Now it is perfectly possible to develop the entire theory of function limits and,
in turn, the entire theory of continuous functions, starting from this definition.
We shall do a little of this just to illustrate that it can be done, but it is a common
student experience that 10.1.1 is a harder definition to use than 9.2.5 (and also a
common lecturer experience that it is a harder definition to teach) which is why
we opted to make 9.2.5 our primary definition here.
A task that we cannot shirk is to show that 9.2.5 and 10.1.1 really are logically
equivalent. This is quite a sophisticated proof, and you might want to omit it on
a first reading, but it is important and urgent to take on board what the result
is saying: that despite their apparent differences, 10.1.1 and 9.2.5 are completely
interchangeable: if either one of them is satisfied, then so must the other be. You
are therefore free to use whichever of the two definitions you please (other things
being equal) in any given argument.
Proof
(I): (1) implies (2).
• Suppose that condition (1) is satisfied.
• Let (xn )n∈N be an arbitrary sequence of elements of D \ {p} that converges
to p.
• For a given positive value of ε, use condition (1) to obtain a number δ > 0
such that whenever x ∈ D and 0 < |x − p| < δ, we have |f(x) − ℓ| < ε.
• Because xn → p, there is a positive integer n0 such that n ≥ n0 will
guarantee that |xn − p| < δ.
• Also 0 < |xn − p| because xn is not equal to p.
10.1 THE EPSILONTIC VIEW OF FUNCTION LIMITS 171
• Therefore

n ≥ n0 ⇒ 0 < |xn − p| < δ ⇒ |f(xn) − ℓ| < ε

so that f(xn) → ℓ; and since (xn) was an arbitrary such sequence, condition (2) is established.
TIMEOUT
Although the mathematics in the upcoming proof of the converse is reasonably
straightforward, the logical content is easily the most sophisticated that we have
dealt with so far, so let us call ‘timeout’ and pick our way through it one small step
at a time. We’ll also continue, for the moment, to bullet-point out those steps, to
try for a little added clarity.
• Instead of trying to show directly that (2) implies (1), we shall call in the logical
device of contraposition and show instead (but equivalently) that NOT-(1)
implies NOT-(2).
• Condition (1) says that, for every ε that we are challenged with, we can find a
positive δ that ‘works’.
• If this is not true then there is some special and awkward value of ε for which
no value of δ that we choose to try will ‘work’.
• In particular, if we pick a positive integer n and try δ = 1/n, it will not ‘work’.
• In other words, the implication ‘x ∈ D and 0 < |x − p| < 1/n ⇒ |f(x) − ℓ| < ε’
must fail for at least one point, which we may label xn: so 0 < |xn − p| < 1/n
and yet |f(xn) − ℓ| ≥ ε. The sequence (xn) built in this way lies in D \ {p} and
converges to p, and yet (f(xn)) does not converge to ℓ.
• That is to say, condition (2) – which claims validity for every appropriate
sequence that converges to p – is NOT TRUE.
• The proof, that NOT-(1) implies NOT-(2), is therefore now complete.
Once you have managed to follow the details of that expanded argument, you
should be able to grasp the sort of condensed version that typically appears in
textbooks:
(II): (2) implies (1).
• Suppose that condition (1) is not satisfied.
• That is, there exists a value of ε > 0 for which no suitable δ > 0 can be
found.
• In particular, for each n ∈ N, δ = 1/n is not suitable…
• …and so there is xn ∈ D such that 0 < |xn − p| < 1/n and yet
|f(xn) − ℓ| ≥ ε.
• Therefore (xn)n∈N converges to p (with each xn ∈ D \ {p}), and yet
(f(xn))n∈N does not converge to ℓ.
• That is, condition (2) is not satisfied.
• By contraposition, (2) implies (1).
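The contrapositive construction can be watched in action on a function with no limit at 0, say f(x) = sin(1/x) with ε = 1 (an example of our own choosing): for each n we exhibit xₙ with 0 < xₙ < 1/n and |f(xₙ)| ≥ ε.

```python
import math

def f(x):
    return math.sin(1 / x)

eps = 1.0
for n in range(1, 50):
    # x_n = 1/(pi/2 + 2*pi*n) satisfies 0 < x_n < 1/n and sin(1/x_n) = 1
    xn = 1 / (math.pi / 2 + 2 * math.pi * n)
    assert 0 < xn < 1 / n and abs(f(xn)) >= eps - 1e-12
print("built (x_n) -> 0 with f(x_n) = 1 for every n")
```

Together with a second sequence on which sin(1/x) = −1, this shows f has no limit at 0.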
f(x) = (x³ − 1000)/(x² − 100) → 15 as x → 10
Solution
We can assume that x ≠ 10 since behaviour at 10 is irrelevant to the limit, and so

f(x) − 15 = (x² + 10x + 100)/(x + 10) − 15 = (x − 10)(x + 5)/(x + 10).

Let ε > 0 be given. We need to decide: how close to 10 must we take x in order
that this error term shall be less than ε in modulus?
• If we make |x − 10| < 1, that is, 9 < x < 11, then |x + 5| < 16 and
|x + 10| > 19, so |f (x) − 15| < (16/19)|x − 10|.
• If we also make |x − 10| < 19ε/16 then (16/19)|x − 10| < ε.
So, choosing δ = min{1, 19ε/16}, we have: 0 < |x − 10| < δ ⇒ |f(x) − 15| < (16/19)|x − 10| < ε, as required.
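The δ recipe just found can be spot-checked numerically (sampling only corroborates, it does not prove):

```python
def f(x):
    return (x**3 - 1000) / (x**2 - 100)

for eps in (1.0, 0.1, 0.001):
    delta = min(1, 19 * eps / 16)  # the recipe from the worked example
    xs = [10 + t * delta for t in (-0.999, -0.5, -1e-3, 1e-3, 0.5, 0.999)]
    assert all(abs(f(x) - 15) < eps for x in xs)
print("delta = min(1, 19*eps/16) verified on sample points")
```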
10.1.4 Example To use the epsilon-delta definition of limit to show that a func-
tion cannot converge to two or more different limits at a limit point of its domain.
Proof
With a view to a contradiction, suppose that f : D → C, that p is a limit point of D,
that f(x) → ℓ1 and f(x) → ℓ2 as x → p, and that ℓ1 ≠ ℓ2. Arrange the labelling
so that ℓ1 < ℓ2 for convenience, and put ε = (ℓ2 − ℓ1)/2 > 0. Using 10.1.1, there
exist two positive numbers δ1 and δ2 such that

x ∈ D, 0 < |x − p| < δ1 ⇒ |f(x) − ℓ1| < ε  and  x ∈ D, 0 < |x − p| < δ2 ⇒ |f(x) − ℓ2| < ε.

Now since δ1 and δ2 are each positive, so is the lesser of the two (whichever one
it is). Call that lesser one δ3. Because p is a limit point of D there must actually be¹ an
element of D – let us call it x′, for instance – that satisfies 0 < |x′ − p| < δ3. So
both of the displayed lines apply to x′, and we therefore know that

2ε = ℓ2 − ℓ1 ≤ |ℓ2 − f(x′)| + |f(x′) − ℓ1| < ε + ε = 2ε,

that is, 2ε < 2ε, which is as crisp a contradiction as one can ask for.
¹ See 9.2.3
10.1.7 Example The function f : (0, ∞) → R is defined thus: for each irrational
x, put f(x) = 0; for each rational x = p/q where the fraction has been expressed in
its lowest terms, put f(x) = 5/q if q is prime but f(x) = −7/q if q is not prime. To
show that f(x) → 0 as x → π.
Solution
Let ε > 0 be given. Consider a positive integer N. The interval (π − 1, π + 1)
includes π of course, and it includes only a finite² number of rationals whose
(lowest-terms) denominators lie in the range 2 to N so, putting δ = the shortest
distance from π to one of these, we get δ > 0 and every rational r in (π − δ, π + δ)
has a denominator greater than N. It follows that |f(r)| < 7/N and, since f is
exactly zero at each irrational, we get

0 < |x − π| < δ ⇒ x ∈ (π − δ, π + δ) ⇒ |f(x) − 0| < 7/N.
Therefore if we choose N so large that 7/N < ε, we have what is required in order
to show that the limit of f at π is 0.
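For a given N, the δ of this solution can be found by brute force: scan the finitely many candidate fractions. The helper name `nearest_gap` is ours, and the search is a sketch of the counting argument, not part of the proof.

```python
import math

def nearest_gap(N):
    """Shortest distance from pi to a non-integer rational p/q in
    (pi - 1, pi + 1) whose lowest-terms denominator lies in 2..N."""
    best = 1.0
    for q in range(2, N + 1):
        for p in range(math.floor((math.pi - 1) * q), math.ceil((math.pi + 1) * q) + 1):
            if p % q != 0:  # skip integers; the reduced denominator is then in 2..q
                best = min(best, abs(math.pi - p / q))
    return best

delta = nearest_gap(50)
print("delta for N = 50:", delta)  # every rational this close to pi has denominator > 50
```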
If you carefully compare this with 10.1.1, it will strike you that we have left out
any reference to p being a limit point of D. This is not a casual oversight, for it
² Since the length of this open interval is 2, it cannot include more than 4 rationals of
denominator 2, nor more than 6 rationals of denominator 3, nor more than 8 rationals of
denominator 4, and so on.
10.2 THE EPSILONTIC VIEW OF CONTINUITY 175
turns out that 10.2.1 is entirely equivalent to our original definition of continuity
whether or not p is a limit point of D. The following proof of this assertion is so
like that of 10.1.2 that, were it a bit shorter or a bit less complicated, we would have
asked you to check it out as an exercise (and you might still decide to try that if you
have already properly grasped the argument of 10.1.2).
Proof
(I): (1) implies (2).
• Suppose that condition (1) is satisfied.
• Let (xn )n∈N be an arbitrary sequence of elements of D that converges to p.
• For a given positive value of ε, use condition (1) to obtain a number δ > 0
such that whenever x ∈ D and |x − p| < δ, we have |f (x) − f (p)| < ε.
• Because xn → p, there is a positive integer n0 such that n ≥ n0 will
guarantee that |xn − p| < δ.
• Therefore

n ≥ n0 ⇒ |xn − p| < δ ⇒ |f(xn) − f(p)| < ε

so that f(xn) → f(p), which is condition (2).
We shall again present a couple of samples of how to use the alternative definition
in arguments:
Solution
Suppose that f : D → C and g : C → B are both continuous. We need to verify
that their composite g ◦ f : D → B is continuous. So let p ∈ D and ε > 0 be given.
Since g is continuous at the point f (p) of its domain, there is δ1 > 0 such that
y ∈ C, |y − f (p)| < δ1 together imply |g(y) − g(f (p))| < ε.
Since f is continuous at p (and δ1 > 0), there is δ > 0 such that x ∈ D, |x−p| < δ
together imply |f (x) − f (p)| < δ1 .
Thus, for x ∈ D and |x − p| < δ, we get |f (x) − f (p)| < δ1 and, consequently,
|g(f (x)) − g(f (p))| < ε, that is, |(g ◦ f )(x) − (g ◦ f )(p)| < ε. So g ◦ f is (by the
epsilon-delta definition) continuous at each element of its domain.
Solution
We start from the fact that (for all real t) t ≥ ⌊t⌋ > t − 1 simply by the definition
of floor or ‘integer part’. So, for x > 0,

1 = x(1/x) ≥ x⌊1/x⌋ = f(x) > x((1/x) − 1) = 1 − x.
10.2.6 EXERCISE Verify, using the epsilon-delta definition of continuity, that the
function discussed in paragraph 10.1.7 is continuous at every positive irrational
number, but discontinuous at every positive rational number.
10.2.7 EXERCISE Show, without using sequences, that the following function
g(x) = { x² + 4x − 1  if x < 3,
         x³ − x² + 3  if x ≥ 3
[Sketches: f(x) → ℓ1 as x → a and g(x) → ℓ2 as x → a]
There were a few examples in the preceding section (⌊x⌋ for x close to an integer
value, and 10.2.7, are cases in point) where the reader can be forgiven for thinking
informally along the following lines: ‘this function f (x) does not have a limit as
x →p . . . and yet, if we were allowed to look only at values of x close to but just
less than p, it does appear to be settling to a limiting value . . . and likewise if we
look only at values of x close to but just greater than p.’ This is a perfectly legitimate
idea, and to explore and develop it we need first to formulate a clear definition of
such one-sided limits. By this point in the text you are probably able to guess even
the definitions with high accuracy.
10.3.1 Definition Suppose that f : D → C and that p is a limit point of D ∩ (p, ∞).
Then we call a number ℓ the right-hand limit of f(x) at p, or the limit on the right of
f(x) at p, and say that f(x) converges to ℓ as x → p from the right (or from above)
if, for each ε > 0, there is δ > 0 such that

x ∈ D, p < x < p + δ ⇒ |f(x) − ℓ| < ε.
[Sketches: for the right-hand limit, f(x) lies between ℓ − ε and ℓ + ε on (p, p + δ); for the left-hand limit, on (p − δ, p)]
10.3.3 Remark You should keep in mind that one-sided limits, just like other
types of limit, can fail to exist. For instance, the function
f(x) = { 1/x       if x < 0,
         sin(1/x)  if x > 0
[Sketch graph of f]
does not have a left-hand limit as x → 0 because the function is unbounded close
to 0 on the left, and also does not have a right-hand limit here (informally speaking,
because sin(1/x) oscillates wildly as x → 0+ rather than settling towards a stable
value). To prove that second point properly, you can use an argument involving
two sequences of positive numbers both homing in on x = 0 but at which the sine-
one-over-x function gives two streams of values with different limits. The overall
cautionary comment is that we must never assume that a function has limits of
any kind unless there is enough information given to guarantee that it has. In this
connection, also note Exercise 10.3.5.
10.3.4 Example In reviewing the work we did earlier on the floor or integer part
⌊x⌋ of x we notice that, in the new notation, our observations amounted to (for integer p):

lim_{x→p+} ⌊x⌋ = p,   lim_{x→p−} ⌊x⌋ = p − 1
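These two one-sided limits are easy to check numerically at a sample integer (here p = 5):

```python
import math

p = 5
right = [math.floor(p + h) for h in (0.1, 0.01, 1e-9)]  # approach from above
left = [math.floor(p - h) for h in (0.1, 0.01, 1e-9)]   # approach from below
assert all(v == p for v in right)
assert all(v == p - 1 for v in left)
print("floor -> p from the right, p - 1 from the left, at p =", p)
```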
Show that if f is both increasing and bounded on (a, b) then limx→b− f (x) and
limx→a+ f (x) must both exist.
Partial solution
Think about the supremum and the infimum of the values of f (x) on the interval,
and argue as in the proof that a bounded monotonic sequence has to converge.
Proof
The proof amounts to little more than a re-run of that of 10.1.2 but ensuring that
every xn shall be greater than p.
1. f(x) + g(x) → ℓ + m,
2. f(x) − g(x) → ℓ − m,
3. f(x)g(x) → ℓm,
4. for constant k, kf(x) → kℓ,
5. provided that m ≠ 0, f(x)/g(x) → ℓ/m,
6. |f(x)| → |ℓ|.
Partial proof
As a sample, let us set up a proof of part (3). For any sequence (xn) in D ∩ B ∩ (p, ∞)
such that xn → p we know (from 10.3.6) that f(xn) → ℓ and that g(xn) → m. The
algebra of limits for sequences tells us that f(xn)g(xn) → ℓm. Thus we see (from
10.3.6 again) that f(x)g(x) → ℓm as x → p+.
Then also

lim_{x→p+} g(x) = ℓ.
EXERCISE
Write out a proof of 10.3.8.
All of this material can be tweaked routinely to apply also to left-hand limits,
and the proofs are routine variations of what we have already done.
The last result we set out in this chapter concerns how the two one-sided limits
(of a function, at a point), if they both exist, can either agree to create a limit in the
full sense, or disagree to prevent a limit (in the full sense) existing.
Proof
Suppose that f(x) → ℓ as x → p. Given positive ε,³ it is therefore possible to find
δ > 0 such that

x ∈ D, 0 < |x − p| < δ ⇒ |f(x) − ℓ| < ε.

In particular,

x ∈ D, p − δ < x < p ⇒ |f(x) − ℓ| < ε  and  x ∈ D, p < x < p + δ ⇒ |f(x) − ℓ| < ε.
Solution
For any sequence (xn) of numbers less than −2 that converges to −2, we have

g(xn) = 3xn² + 5xn − 1 → 3(−2)² + 5(−2) − 1 = 1
³ Somewhat unusually, the ε-style definition is more convenient in this demonstration than
the sequence-style alternative.
10.3 ONE-SIDED LIMITS 183
by the algebra of limits for sequences, and therefore limx→−2− g(x) exists and
equals 1 via the natural sequence description of left-hand limits. Likewise, for any
sequence (xn ) of numbers greater than −2 that converges to −2, we have
By the preceding Theorem 10.3.9, the ‘two-sided limit’ limx→−2 g(x) exists (and
equals 1).
Alternative solution
For x < −2, the function g and the (continuous) quadratic 3x² + 5x − 1 are
identical, so

lim_{x→−2−} g(x) = lim_{x→−2} (3x² + 5x − 1) = 3(−2)² + 5(−2) − 1 = 1

(where we tacitly used 10.3.9 to switch from lim_{x→−2−} to lim_{x→−2} for the
quadratic).
Similarly
f(x) = { ax + b                if 2 ≤ x ≤ 5,
         x²/(x² − 21x + 110)   if 5 < x < 10
To speak of infinity in connection with limits can seem almost contrary to the basic
meaning of the word, since its usual import is the complete absence of any limit.
Nevertheless there are many natural and simple functions f in which the value
f (x) settles towards some kind of stable or equilibrium state not as x approaches
a particular (finite) number p, but as x becomes enormously big and positive (or
enormously big and negative). In many ways this is actually closer to the limiting
behaviour of sequences that we first studied, where the focus of attention was on
how the typical term xn behaved as n tended to infinity and, indeed, it is scarcely
possible to draw sketch graphs of functions such as x⁻¹ or x⁻² or arctan x or
(x² − 1)/(x² + 1) without some phrase about their behaviour as x tends to infinity
coming to mind.
Our first objective in this chapter is to formulate clear definitions of these ideas
and to develop some basic theory concerning them. This will offer us very little
difficulty provided we resist the temptation to regard ∞ and −∞ as numbers, or
to use senseless symbols such as |x − ∞| to assess how ‘close’ x is to infinity.
[Sketch: f(x) stays between ℓ − ε and ℓ + ε for all x > K; f(x) tends to limit ℓ as x → ∞]
186 11 INFINITY AND FUNCTION LIMITS
11.1.1 Definition Suppose that f : D → C and that its domain D is not bounded
above.¹ Then we say that f(x) converges to the limit ℓ (or tends to ℓ) as x → ∞ if,
for each ε > 0, there is K ∈ R such that

x ∈ D, x > K ⇒ |f(x) − ℓ| < ε.
It is always safe to assume that K is positive in the above definition: for if K were
negative or zero, then x > |K| + 1 ⇒ x > K ⇒ |f(x) − ℓ| < ε while x ∈ D, and
|K| + 1 certainly is positive.
[Sketch: f(x) stays between ℓ − ε and ℓ + ε for all x < K; f(x) tends to limit ℓ as x → −∞]
11.1.2 Definition Suppose that f : D → C and that its domain D is not bounded
below.² Then we say that f(x) converges to the limit ℓ (or tends to ℓ) as x → −∞
if, for each ε > 0, there is K ∈ R such that

x ∈ D, x < K ⇒ |f(x) − ℓ| < ε.
¹ This is just to ensure that there are arbitrarily big (positive) values of x for which f(x) makes
sense.
² This is just to ensure that there are arbitrarily big negative values of x for which f(x) makes
sense.
11.1 LIMIT OF A FUNCTION AS X TENDS TO INFINITY OR MINUS INFINITY 187
It is always safe to assume that K is negative in this second definition: for if K were
positive or zero, then x < −|K| − 1 ⇒ x < K ⇒ |f(x) − ℓ| < ε while x ∈ D, and
−|K| − 1 certainly is negative.
f(x) = x³/(1 − x³)
converges to −1 as x → ∞, and also converges to −1 as x → −∞.
Solution
The domain of f is (−∞, 1)∪(1, ∞) which is neither bounded above nor bounded
below, so both questions are legitimate.
Provided that x > 1, we have

|f(x) − (−1)| = 1/(x³ − 1) < ε ⇔ x³ − 1 > ε⁻¹ ⇔ x > ∛(1 + ε⁻¹)

so, for a given ε > 0, if we choose K = max{1, ∛(1 + ε⁻¹)},³ we get |f(x) − (−1)| < ε
whenever x > K, as required to show f(x) → −1 as x → ∞.
Provided that x < 1, we next have

|f(x) − (−1)| = 1/(1 − x³) < ε ⇔ 1 − x³ > ε⁻¹ ⇔ x < ∛(1 − ε⁻¹)

so, with ε > 0 given, if we choose K = min{1, ∛(1 − ε⁻¹)},⁴ we find
|f(x) − (−1)| < ε whenever x < K, as needed to demonstrate that
f(x) → −1 as x → −∞.
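Both K recipes can be spot-checked; `cbrt` is a small helper of ours, since Python's `** (1/3)` misbehaves on negative numbers:

```python
import math

def f(x):
    return x**3 / (1 - x**3)

def cbrt(t):  # real cube root, valid for negative t as well
    return math.copysign(abs(t) ** (1 / 3), t)

for eps in (0.5, 0.1, 0.01):
    K = max(1.0, cbrt(1 + 1 / eps))   # threshold for x -> +infinity
    assert all(abs(f(x) + 1) < eps for x in (K + 1e-3, K + 1, K + 100))
    K2 = min(1.0, cbrt(1 - 1 / eps))  # threshold for x -> -infinity
    assert all(abs(f(x) + 1) < eps for x in (K2 - 1e-3, K2 - 1, K2 - 100))
print("both thresholds verified on sample points")
```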
11.1.4 EXERCISE Confirm that the function tanh x defined by the formula

tanh x = (eˣ − e⁻ˣ)/(eˣ + e⁻ˣ)
³ Actually, this piece of notation is heavier than it need be, since ∛(1 + ε⁻¹) is greater than 1.
⁴ Again, this notation is unnecessarily heavy-handed, since ∛(1 − ε⁻¹) is clearly less than 1.
Proof
(I): (1) implies (2).
• Suppose that condition (1) is satisfied.
• Let (xn )n∈N be an arbitrary sequence of elements of D such that xn → ∞.
• For a given positive value of ε, use condition (1) to obtain a number K
such that whenever x ∈ D and x > K, we have |f(x) − ℓ| < ε.
• Because xn → ∞, there is a positive integer n0 such that n ≥ n0 will
guarantee that xn > K.
• Therefore
n ≥ n0 ⇒ xn > K ⇒ |f(xn) − ℓ| < ε.
• That is, f(xn) → ℓ as required.
• Since (xn ) was any sequence in D that happened to tend to ∞, condition
(2) is now proved.
(II): (2) implies (1).
• Suppose that condition (1) is not satisfied.
• That is, there exists a value of ε > 0 for which no suitable K can be found.
• In particular, for each n ∈ N, K = n is not suitable…
• …and so there is xn ∈ D such that xn > n and yet |f(xn) − ℓ| ≥ ε.
• Therefore (xn)n∈N tends to ∞ (with each xn ∈ D), and yet (f(xn))n∈N
does not converge to ℓ.
• That is, condition (2) is not satisfied.
• By contraposition, (2) implies (1).
Use of this theorem is often a convenient way to deal with a problem on function
limits as x → ∞, and also to prove the (predictable) theorems about those limits,
such as:
3. f(x)g(x) → ℓm,
4. for constant k, kf(x) → kℓ,
5. provided that m ≠ 0, f(x)/g(x) → ℓ/m,
6. |f(x)| → |ℓ|.
11.1.7 EXERCISE Select any part of this theorem and give a proof of it using
sequences.
11.1.8 Remark It is, of course, perfectly possible for the limit of a function f (x)
as x → ∞ or as x → −∞ not to exist at all. For instance (and assuming the basic
behaviour of trigonometric functions) both of the sequences
Then also

lim_{x→∞} g(x) = ℓ.
The analogous results for function limits as x → −∞ are almost too obvious
even to state but, for the sake of completeness, here is the basic set. If you wish, feel
free to prove any of them as an additional exercise. (The first theorem is the one
that would be most worthwhile to try proving, since the others are very routine.)
2. for every sequence (xn)n∈N of elements of D that tends to −∞, we have
f(xn) → ℓ.
Then also

lim_{x→−∞} g(x) = ℓ.
q(x) = (x² + sin x)/(x² + cos x),  x ∈ (1, ∞).
Solution
By part (I) of 11.1.14, the statement x⁻² → 0 as x → ∞ is equivalent to the
statement (1/x)⁻² → 0 as x → 0+, that is, to x² → 0 as x → 0+. Yet the truth of
the latter is immediate from the continuity of x².
Next, for the usual trigonometric reasons,⁵ we have (for x > 1)

(x² − 1)/(x² + 1) ≤ q(x) ≤ (x² + 1)/(x² − 1).

Also

(x² − 1)/(x² + 1) = (1 − x⁻²)/(1 + x⁻²) → (1 − 0)/(1 + 0) = 1 as x → ∞,

(x² + 1)/(x² − 1) = (1 + x⁻²)/(1 − x⁻²) → (1 + 0)/(1 − 0) = 1 as x → ∞.
Lastly, the 11.1.9 version of the squeeze gives us the desired conclusion q(x) → 1
as x → ∞.
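The sandwich, and the conclusion q(x) → 1, are both easy to spot-check numerically:

```python
import math

def q(x):
    return (x**2 + math.sin(x)) / (x**2 + math.cos(x))

for x in (10.0, 100.0, 10000.0):
    lo = (x**2 - 1) / (x**2 + 1)
    hi = (x**2 + 1) / (x**2 - 1)
    assert lo <= q(x) <= hi  # the sandwich of the solution
print(q(10000.0))  # already extremely close to 1
```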
(x³ + 5 + (3x² − 7x − 2) sin x) / (x³ + 5 + (3x² − 7x − 2) cos x).
|f(x) − ℓ| < ε

0 < |x − p| < δ.
Now that we have divided out the two aspects of any limiting process, we can
look separately at the conditions that can be imposed upon them to express various
convergence/divergence behaviours of functions, including those that we have
already studied.
For the effects:
• f(x) converges to ℓ means f(x) gets very close to ℓ: |f(x) − ℓ| < ε for any given
positive ε;
• f (x) diverges to ∞ means f (x) gets extremely big and positive: f (x) > K for
any given real K;
11.2 FUNCTIONS TENDING TO INFINITY OR MINUS INFINITY 193
• f (x) diverges to −∞ means f (x) gets extremely big but negative: f (x) < K for
any given real K.
For the causes:
• x tends to p means x gets very close to but distinct from p: 0 < |x − p| < δ for a
suitably chosen positive δ;
• x tends to p− means x gets very close to p from the left but remains distinct
from p: p − δ < x < p for a suitably chosen positive δ;
• x tends to p+ means x gets very close to p from the right but remains distinct
from p: p < x < p + δ for a suitably chosen positive δ;
• x tends to infinity means x becomes very big and positive: x > K′ for a suitably
chosen real number K′ (which we can assume to be positive);
• x tends to minus infinity means x becomes very big but negative: x < K′ for a
suitably chosen real number K′ (which we can assume to be negative).
(Be careful not to use the same symbol K in cause and effect if you are combining
an infinity-type cause with an infinity-type effect. This is why we used K′ instead
of K on the last few lines.)
Now let’s assemble one of the new definitions: say, that of f(x) diverging to minus
infinity as x approaches p one-sidedly from the right. The desired effect is f(x) < K
for any given K. The appropriate cause is p < x < p + δ for some suitable δ chosen
in response to the challenge K. Combining: f(x) → −∞ as x → p+ means that, for
each K ∈ R, there is δ > 0 such that x ∈ D, p < x < p + δ ⇒ f(x) < K.
[Sketch: f(x) tends to −∞ as x → p+]
[Sketch: f(x) tends to ∞ as x → −∞; for each challenge K there is a response K′]
Roughwork
Since x/(1 − x) = −1 + 1/(1 − x), we ask, given K and thinking of x as being slightly
greater than 1, how do we contrive that −1 + 1/(1 − x) < K? That is,
1/(1 − x) < K + 1? Make sure for a start that K is strictly less than −1 to keep the
signs unambiguous, and this is then the same as 1 − x > (1 + K)⁻¹, that is,
x < 1 − 1/(1 + K) = 1 + 1/(|K| − 1).
Solution
Given K, we first choose K* = min{−2, K} (just to guarantee that the ‘new’ K shall
be strictly less than −1). Then, as the roughwork shows, the choice δ = 1/(|K*| − 1)
will ensure that

1 < x < 1 + δ ⇒ x/(1 − x) < K* ≤ K
in line with the requirements.
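The δ recipe can again be spot-checked for a few challenges K:

```python
def f(x):
    return x / (1 - x)

for K in (-10.0, 0.0, 5.0):
    Kstar = min(-2.0, K)          # the 'new' K of the solution
    delta = 1 / (abs(Kstar) - 1)  # the delta from the roughwork
    for x in (1 + delta * t for t in (0.001, 0.5, 0.999)):
        assert f(x) < Kstar <= K
print("delta recipe verified: f(x) -> -infinity as x -> 1+")
```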
Solution
Given K ∈ R (and arranging if necessary that K > 0 so that the next step makes
sense), we choose K′ = −√K (< 0). Then

x < K′ ⇒ x² > K

as required.
lim_{x→2} (x + 1)/(x − 2)² = ∞.
11.2.16 EXERCISE (Assuming basic facts about the sine function, including that
it is continuous) show that the function
cosec x = 1/sin x
Proof
(I): (1) implies (2).
• Suppose that condition (1) is satisfied.
• Let (xn )n∈N be an arbitrary sequence of elements of D such that xn → ∞.
• For a given K, use condition (1) to obtain a number K′ such that whenever
x ∈ D and x > K′, we have f(x) > K.
• Because xn → ∞, there is a positive integer n0 such that n ≥ n0 will
guarantee that xn > K′.
• Therefore

n ≥ n0 ⇒ xn > K′ ⇒ f(xn) > K.
• That is, f (xn ) → ∞ as required.
• Since (xn ) was any sequence in D that happened to tend to ∞, condition
(2) is now proved.
2. for every sequence (xn )n∈N of elements of D ∩ (p, ∞) that tends to p, we have
f (xn ) → −∞.
Proof
(I): (1) implies (2).
• Suppose that condition (1) is satisfied.
• Let (xn )n∈N be an arbitrary sequence of elements of D ∩ (p, ∞) such that
xn → p.
• For a given value of K, use condition (1) to obtain a number δ > 0 such
that whenever x ∈ D and p < x < p + δ, we have f (x) < K.
• Because xn → p, there is a positive integer n0 such that n ≥ n0 will
guarantee that xn < p + δ.
• Therefore
n ≥ n0 ⇒ p < xn < p + δ ⇒ f (xn ) < K.
• That is, f (xn ) → −∞ as required.
• Since (xn ) was any appropriate sequence, condition (2) is now proved.
(II): (2) implies (1).
• Suppose that condition (1) is not satisfied.
• That is, there exists a value of K for which no suitable δ > 0 can be found.
• In particular, for each n ∈ N, δ = 1/n is not suitable…
• …and so there is xn ∈ D such that p < xn < p + 1/n and yet f (xn ) ≥ K.
• Therefore (xn )n∈N tends to p (with each xn ∈ D), and yet ( f (xn ))n∈N does
not diverge to −∞.
11.2.19 EXERCISE Select two more of the various definitions for f (x) diverging
to infinity or minus infinity under constraints upon x, and formulate for each a
theorem (like the last two) characterising this divergence in terms of the behaviour
of sequences. Give a detailed proof of one of them. Do not expect to enjoy it
disproportionately.
11.2.20 EXERCISE Give sequence-based proofs of the last two worked examples
11.2.3 and 11.2.4.
Solution
Suppose firstly that f (x) → ∞ as x → ∞. Given K ∈ R, we can therefore find
K ∈ R such that x > K ⇒ f (x) > K. Without loss of generality, K > 0. Then
put δ = 1/K > 0. We have
therefore g(x) → ∞ as x → 0+ .
Suppose secondly that g(x) → ∞ as x → 0+ . Given K ∈ R, we can find δ > 0
such that 0 < x < δ ⇒ g(x) > K. Put K = δ −1 . Then
11.2.22 Example To establish this variant of 11.1.13: given that f (x) ≥ Cg(x) for
all x ∈ (a, b) where C is a positive constant, and that g(x) → ∞ as x → a+ , to show
that f (x) → ∞ as x → a+ .
Solution
Given K ∈ R, we use the fact that g(x) → ∞ to find δ > 0 such that
a < x < a + δ ⇒ g(x) > K/C. It follows that a < x < a + δ ⇒ f (x) ≥
Cg(x) > C(K/C) = K. Thus f (x) → ∞ as x → a+ as required.
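Here is a concrete instance of this comparison argument with a = 0, choosing (purely for illustration; these functions are not from the text) g(x) = 1/x, C = 2 and f(x) = 2/x + 5, so that f(x) ≥ C·g(x) on (0, 1):

```python
def g(x):
    return 1 / x      # g(x) -> infinity as x -> 0+

def f(x):
    return 2 / x + 5  # satisfies f(x) >= 2*g(x) on (0, 1)

C = 2.0
for K in (10.0, 1000.0):
    delta = min(1.0, C / K)  # makes g(x) > K/C throughout (0, delta)
    for x in (delta * t for t in (0.001, 0.5, 0.999)):
        assert g(x) > K / C and f(x) >= C * g(x) > K
print("f(x) -> infinity as x -> 0+, by comparison with C*g(x)")
```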
11.2.23 EXERCISE Use the preceding material 11.2.21 and 11.2.22 to show that

lim_{x→0+} (1/x² − 1/x) = ∞.
12 Differentiation — the
slope of the graph
.........................................................................
12.1 Introduction
In everyday English, we call a line straight if its direction of travel is the same
wherever we choose to inspect it, and curved if the apparent direction of travel
varies as we shift the focus of our attention from one part of it to another. As usual,
those ideas need to be sharpened up before we can do significant work with them
but, at least in the case of straight lines on a plane surface, this is very elementary:
impose a grid of (cartesian) coordinates on the surface, identify two distinct points
(x1 , y1 ) and (x2 , y2 ) on the line, define the gradient or slope between them to be
(y2 − y1)/(x2 − x1),
and the informal idea of straightness corresponds tidily to the fact that the
numerical value of this ratio is the same no matter which two points you have
chosen.
In the case of a curved line in the coordinate plane – in particular, of the graph
y = f (x) of some function – we can still define the gradient between two points on
the graph in exactly the same way, but the notion of gradient at a typical point
is harder to make precise, partly because it is expected to vary with the point.
One approach is to draw, or to imagine drawing, a ‘tangent’ straight line that just
skims the curve at a chosen point, and whose gradient then gives a reasonable
interpretation of ‘gradient of the curve’ at that point . . . but such a procedure will
always be subject to error, to our limited drawing skill, even to the precision of our
instruments and the sharpness of our pencil. Besides, it is very time-consuming to
implement, even at a dozen or so points.
What would be really useful here is some routine procedure that could be applied
to the formula f (x) – if the curve indeed has such a formula – and that could
derive from it another formula for the gradient at any point. Now it is highly
probable that you have done enough calculus to be aware of such a procedure,
called differentiation, that works excellently upon a wide range of formulas and has
several different routines for dealing with the internal structure of formulas that are
modestly complicated, and also of important applications such as the identification
of regions where a function is increasing or decreasing, and of stationary points,
Undergraduate Analysis: A Working Textbook, Aisling McCluskey and Brian McMaster 2018.
© Aisling McCluskey and Brian McMaster 2018. Published 2018 by Oxford University Press
and maximizing or minimizing variable quantities. Our task in the present chapter
is to revisit this idea and look under the bonnet: Why does it work? What functions
does it work on? How do we proceed if we cannot access a suitable formula? Are
there functions where not only the familiar procedures fail, but the very idea of
gradient loses its meaning? What further applications can be anticipated? We make
no pretence to give complete answers to such questions, for calculus is a very large
field, but we shall make a start (and later chapters will continue aspects of the
study).
To determine the slope of the graph at a particular point (say, at the point
P = (a, f (a))) without having to depend on the uncontrolled approximation of
our limited draftsmanship, we consider a nearby point on the same graph (say,
Q = (a + h, f (a + h))) and the exact gradient of the straight line PQ. This is
(f(a + h) − f(a))/((a + h) − a) = (f(a + h) − f(a))/h
and, if the curve is reasonably smooth (an idea which we also need to clarify) we
should expect this ratio to be an approximation to the gradient of the curve (that
is, of the ideal tangent line to the curve) at P, and that it should become a better
and better approximation as the horizontal width h of the segment PQ becomes
smaller and smaller. This leads us to define the gradient of the curve at P to be the
limit of this ratio as h → 0, and to replace the intuitive term ‘smooth’ by the simple
mathematical demand that this limit shall exist. (You may also find it useful to re-
read an earlier example – see 9.1.2 and 9.2.6 – in which we actually did all this with
the curve y = f (x) = x2 at the point (3, 9).) This is precisely what we do with a
general function f :
[Figures: first, the graph y = f(x) with the chord PQ joining P = (a, f(a)) to Q = (a + h, f(a + h)) and the ‘tangent’ at P; second, the same graph showing only the ‘tangent’ at P.]
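Readers who like to experiment can watch this chord-to-tangent process happen numerically. The short Python sketch below (the helper name `chord_slope` is ours, not the book's) revisits the worked case y = x^2 at the point (3, 9):

```python
def chord_slope(f, a, h):
    """Gradient of the chord PQ joining (a, f(a)) and (a + h, f(a + h))."""
    return (f(a + h) - f(a)) / h

# For f(x) = x^2 at a = 3, the chord slope is exactly 6 + h,
# so it approaches the tangent slope 6 as the width h shrinks.
f = lambda x: x * x
for h in [1.0, 0.1, 0.01, 0.001]:
    print(h, chord_slope(f, 3, h))
```

The printed slopes 7, 6.1, 6.01, 6.001 make the limiting value 6 hard to miss.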
Proof
When f(x) = k for every x in some open interval centred on p, k being a constant, the limit
lim_{h→0} (f(p + h) − f(p))/h = lim_{h→0} (k − k)/h = lim_{h→0} 0/h = 0
as predicted. (Notice that we had to assume that |h| was less than the half-width of the open interval, but that this is not a problem since h is tending to zero.)
Proof
When f(x) = x for every x in some open interval centred on p, we get
lim_{h→0} (f(p + h) − f(p))/h = lim_{h→0} ((p + h) − p)/h = lim_{h→0} h/h = 1
as predicted. (Again, there is the tacit assumption that |h| is small enough to put p + h inside the interval where we know f is the identity.)
12.2.4 Proposition The function f(x) = x^2 is differentiable, and its derived function is f′(x) = 2x.
Proof
Calculation of the relevant limit at a typical point p gives
lim_{h→0} (f(p + h) − f(p))/h = lim_{h→0} ((p + h)^2 − p^2)/h = lim_{h→0} (2p + h) = 2p.
Re-writing that to express the discovery that the derivative at any point is twice the x-value there, we normally write f′(x) = 2x (although f′(p) = 2p for each p ∈ R is equally correct).
12.2.5 Proposition For any positive integer n, the function f(x) = x^n is differentiable, and its derived function is f′(x) = nx^(n−1).
12.2 THE DERIVATIVE 205
Proof
It's really the same argument as the others, except that this time we need the binomial theorem to unscramble the algebra:
(f(p + h) − f(p))/h = ((p + h)^n − p^n)/h = np^(n−1) + C(n, 2)p^(n−2)h + C(n, 3)p^(n−3)h^2 + · · · + h^(n−1)
whose limit, as h → 0, is
np^(n−1) + 0 + 0 + · · · + 0 = np^(n−1).
So the derived function, the derivative, is f′(p) = np^(n−1) for each p ∈ R or, in slightly more familiar function notation, f′(x) = nx^(n−1).
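As a numerical sanity check on this proposition (not a substitute for the proof), one can compare a small-h difference quotient of x^n against nx^(n−1); in the Python sketch below, the values n = 5 and p = 1.5 are arbitrary choices of ours:

```python
def diff_quotient(f, p, h=1e-6):
    """One-sided difference quotient (f(p+h) - f(p))/h for small h."""
    return (f(p + h) - f(p)) / h

n, p = 5, 1.5
approx = diff_quotient(lambda x: x**n, p)
exact = n * p**(n - 1)   # the claimed derivative n x^(n-1)
print(approx, exact)     # the two numbers agree to several decimal places
```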
We could continue this catalogue of special cases, but once again it will be more
profitable and efficient to develop rules for processing ‘built-up’ functions based on
the derivatives of the simple components out of which they have been assembled.
Most of these rules should already be familiar to you, but possibly not the reasons
why they are valid, so we shall discuss them in fair detail.
Proof
For 1, notice that (for any sufficiently small h ≠ 0)
To prepare for the next two rules, we need to connect with material from
Chapter 8 (and to keep in mind that a function f is continuous at a suitable point
p precisely when the limit of f (x) as x → p coincides with the value f (p) of f at p
– see 9.2.21 for this important insight):
Proof
Given that the limit
lim_{h→0} (f(p + h) − f(p))/h = f′(p)
exists, all we need do is to notice that
((f(p + h) − f(p))/h) × h + f(p) = f(p + h),
and now take limits (as h → 0) across this equality. We discover that
f(p + h) → f′(p) × 0 + f(p) = f(p)
which, letting x stand for p + h, is the same as saying that f(x) → f(p) as x → p.
12.2.8 EXERCISE Show that the converse of this result is not true, by checking
that the function m(x) = |x| is continuous at x = 0 but not differentiable at
x = 0. You will almost certainly use one-sided limits (both of m(x) itself, and of
(m(0 + h) − m(0))/h) since the behaviour of m (that is, m(x) = x if x ≥ 0, but
m(x) = −x if x < 0) is different on the two sides of x = 0.
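The following lines (a quick Python illustration of the exercise; the two sample lists of h-values are our own choice) make the one-sided behaviour of m(x) = |x| at 0 visible:

```python
m = abs  # m(x) = |x|

# One-sided difference quotients (m(0 + h) - m(0))/h:
right = [(m(h) - m(0)) / h for h in [0.1, 0.01, 0.001]]     # every value is 1
left  = [(m(h) - m(0)) / h for h in [-0.1, -0.01, -0.001]]  # every value is -1
print(right, left)
```

The right-hand quotients sit at +1 and the left-hand ones at −1, so the two-sided limit at 0 cannot exist, even though m is plainly continuous there.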
Proof
We need to evaluate the limit of
(f(p + h)g(p + h) − f(p)g(p))/h
and our initial problem is that the two halves of the top line have nothing in common. Remembering what happened in a similar impasse many pages ago, we bring in an extra term that does have a common factor with each half, thus:
(f(p + h)g(p + h) − f(p)g(p))/h = f(p + h) × (g(p + h) − g(p))/h + g(p) × (f(p + h) − f(p))/h.
Here is the moment when we need the ‘differentiable implies continuous’ theorem. As h → 0, f(p + h) → f(p), and the other components of the last display have their more obvious limits; so we get
(f(p + h)g(p + h) − f(p)g(p))/h → f(p)g′(p) + g(p)f′(p)
as expected.
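A numerical spot-check of the product rule is easy to run (the choices f = sin, g(x) = x^3 and the point p = 0.7 in this Python sketch are our own, purely for illustration):

```python
import math

def diff_quotient(F, p, h=1e-6):
    """Small-h difference quotient (F(p+h) - F(p))/h."""
    return (F(p + h) - F(p)) / h

f, fd = math.sin, math.cos                       # f and its derivative f'
g, gd = (lambda x: x**3), (lambda x: 3 * x**2)   # g and its derivative g'
p = 0.7

product = lambda x: f(x) * g(x)
lhs = diff_quotient(product, p)           # derivative of fg, estimated directly
rhs = f(p) * gd(p) + g(p) * fd(p)         # f(p)g'(p) + g(p)f'(p)
print(lhs, rhs)
```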
Partial proof
We need to evaluate the limit of
(f(p + h)/g(p + h) − f(p)/g(p))/h = (f(p + h)g(p) − f(p)g(p + h))/(h g(p + h)g(p))
and, firstly, we must make sure that no division by zero can happen. It is built into the definition of differentiability that there are already two open intervals centred on p in which f and g are defined. Also, using again the ‘differentiable implies continuous’ theorem, g is continuous at p and g(p) ≠ 0, so there is2 a third open interval centred on p throughout which g does not take the value zero. Take now the smallest of the three intervals: here, f and g are defined and g is non-zero so, once h is small enough to put p + h into that interval, no risk remains of our piece of algebra failing to make sense.
Next, we again introduce an extra term that has something in common with each half of the top line:
f(p + h)g(p) − f(p)g(p + h) = g(p)(f(p + h) − f(p)) − f(p)(g(p + h) − g(p)).
2 If you do not find this to be sufficiently convincing, here is a fuller argument. Since |g(p)| > 0 and g(p + h) → g(p) as h → 0, there is δ > 0 such that whenever |h| < δ we get |g(p + h) − g(p)| < |g(p)|. That last inequality forces g(p + h) ≠ 0, and it holds whenever p + h lies in the interval (p − δ, p + δ).
The rest of the proof proceeds on the same lines as did that of the product rule.
12.2.11 EXERCISE Complete this proof. (You can expect to use the ‘differentiable
implies continuous’ theorem yet again.)
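Before moving on, the quotient rule too can be spot-checked numerically (our own illustrative choices below: f = sin and g = exp, the latter convenient because it is never zero, so no division trouble can arise; the point p = 0.4 is arbitrary):

```python
import math

def diff_quotient(F, p, h=1e-6):
    """Small-h difference quotient (F(p+h) - F(p))/h."""
    return (F(p + h) - F(p)) / h

f, fd = math.sin, math.cos    # f and f'
g, gd = math.exp, math.exp    # g and g'; g is never zero
p = 0.4

quotient = lambda x: f(x) / g(x)
lhs = diff_quotient(quotient, p)                 # derivative of f/g, estimated
rhs = (g(p) * fd(p) - f(p) * gd(p)) / g(p)**2    # (g f' - f g') / g^2
print(lhs, rhs)
```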
Next, corresponding to the fact that the composite of two continuous functions
is continuous, we have that the composite of two differentiable functions is differ-
entiable.
Partial proof
As h → 0, the quantity k = f(p + h) − f(p) also converges to 0 since the (differentiable) function f is continuous. Consider the equation
(g(f(p + h)) − g(f(p)))/h = ((g(f(p) + k) − g(f(p)))/k) × ((f(p + h) − f(p))/h),
which is legitimate only so long as k ≠ 0. Provided that f′(p) ≠ 0, this causes no difficulty for small h, because
(f(p + h) − f(p))/h
will be converging to a non-zero limit, and must therefore itself be non-zero for sufficiently small h. In the general case where f′(p) might be zero, the result is still true, but we shall need to find a different way of proving it.3 (See the upcoming 12.5.2 for a ‘tidy’ proof that works in all cases.)
12.2.13 Note The chain rule, as we have presented it so far, applies only to the composite of two differentiable functions, but it extends readily to three or more. This is a place where the older style of notation df/dx, rather than f′(p), makes it easier to explain what is going on. Suppose we are given three functions f, g, h with domains such that the composite h ◦ g ◦ f makes sense, and each of the three is differentiable at appropriate points (starting with p for f). Temporarily write y = f(x), z = g(y), u = h(z). Then
dy/dx means f′,  dz/dy means g′,  du/dz means h′.
3 Meanwhile, here is an ad hoc proof for the awkward case f′(p) = 0. The proof is really intricate, and it will probably be wise to skip it on a first (and, indeed, on a second and a third) reading.
Suppose f′(p) = 0, q = f(p), g′(q) = M. For small values of h, put k = f(p + h) − f(p), so that f(p + h) = f(p) + k, that is, f(p + h) = q + k. Let ε > 0 be given. Then:
there is δ1 > 0 such that 0 < |k| < δ1 ⇒ (g(q + k) − g(q))/k ∈ [M − 1, M + 1]
⇒ |g(q + k) − g(q)| ≤ (1 + |M|) × |k|.
This also holds when k = 0. Continuing:
there is δ2 > 0 such that 0 < |h| < δ2 ⇒ |(f(p + h) − f(p))/h| < ε/(1 + |M|)
⇒ |f(p + h) − f(p)| < ε|h|/(1 + |M|)
⇒ |k| < ε|h|/(1 + |M|).
Now f is continuous at p, so
there is δ3 > 0 such that |h| < δ3 ⇒ |f(p + h) − f(p)| < δ1, that is, |k| < δ1.
Finally, if 0 < |h| < min{δ2, δ3}, we get:
|k| < δ1, therefore |g(q + k) − g(q)| < (1 + |M|) × ε|h|/(1 + |M|) = ε|h|
which implies
|(g(f(p + h)) − g(f(p)))/h| < ε.
Hence, since ε was arbitrary,
lim_{h→0} (g(f(p + h)) − g(f(p)))/h = 0 = g′(f(p))f′(p).
(The weakness of this notation is that it does not explicitly name the individual
points at which these derivatives are evaluated – these have to be judged by
context.)
Then, using the chain rule,
z = g(f(x)), so dz/dx is (g ◦ f)′(p) = g′(f(p))f′(p)
and, continuing,
u = h((g ◦ f)(x)), so du/dx is h′(g(f(p)))(g ◦ f)′(p) = h′(g(f(p)))g′(f(p))f′(p).
Thus the derivative of the three-way composite exists, and the two alternative notations that we have for it are
(h ◦ g ◦ f)′(p) = h′(g(f(p)))g′(f(p))f′(p)  and  du/dx = (du/dz)(dz/dy)(dy/dx).
Despite its reluctance to name points, the second strikes many people as much
more readable. It is also useful in helping us to remember what the chain rule
says, because it looks as if the two dz and the two dy cancel out – of course,
that is emphatically not what is really happening, since du/dx and its cousins
are not fractions. As an aide memoire, however, the fact that they multiply as
if they were fractions makes it easy to keep track of what the chain rule is
telling us.
12.2.14 Example (Assuming for the moment that the derivatives of sin x and e^x are cos x and e^x respectively) we differentiate the function sin((x^3 + e^x)^7).
Write j = h ◦ g ◦ f, where f(x) = x^3 + e^x, g(x) = x^7 and h(x) = sin x. Then
j′(x) = h′(g(f(x)))g′(f(x))f′(x) = cos(g(f(x))) × 7(f(x))^6 × (3x^2 + e^x)
= cos((x^3 + e^x)^7) × 7(x^3 + e^x)^6 × (3x^2 + e^x).
Alternatively, in the older notation, put
y = sin u, u = v^7, v = x^3 + e^x.
Then
dy/du = cos u,  du/dv = 7v^6,  dv/dx = 3x^2 + e^x,  so
dy/dx = (dy/du)(du/dv)(dv/dx) = cos u × 7v^6 × (3x^2 + e^x)
= cos((x^3 + e^x)^7) × 7(x^3 + e^x)^6 × (3x^2 + e^x).
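Either version of the computation can be spot-checked numerically; in this Python sketch the test point x = 0.3 and the step size are arbitrary choices of ours:

```python
import math

def central_diff(F, p, h=1e-6):
    """Symmetric difference quotient (F(p+h) - F(p-h))/(2h)."""
    return (F(p + h) - F(p - h)) / (2 * h)

j = lambda x: math.sin((x**3 + math.exp(x))**7)
# The chain-rule answer obtained above:
jd = lambda x: (math.cos((x**3 + math.exp(x))**7)
                * 7 * (x**3 + math.exp(x))**6
                * (3 * x**2 + math.exp(x)))

p = 0.3
approx, exact = central_diff(j, p), jd(p)
print(approx, exact)   # agreement to several decimal places
```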
12.2.15 Note For the sake of (relative) completeness we should also discuss here
the rule for differentiating an inverse function (see Chapter 8 for more detailed
comments on inverse functions, including their continuity) and we shall first do
so rather informally.
If f : (a, b) → R is strictly increasing or strictly decreasing and f′(p) ≠ 0 for a point p ∈ (a, b) then, merely because f is one-to-one from (a, b) onto its range f((a, b)), the inverse mapping f⁻¹ : f((a, b)) → (a, b) exists. Better than that, though, f⁻¹ is differentiable at f(p) and its derivative is
(f⁻¹)′(f(p)) = 1/f′(p).
In what we called ‘heritage’ notation in the last paragraph, this can be expressed by saying that if y = f(x) is strictly monotonic on an open interval and dy/dx is non-zero at a point, then the inverse x = g(y) is also differentiable at the corresponding point and
dx/dy = 1/(dy/dx)
(provided we carefully keep track of the points at which the derivatives are
calculated). Once again, although these symbols are not fractions, we see that they
can be manipulated as if they were, and the observation helps us to hold the result
in mind.
As a small illustration, we determine the derivative of ∛x at any point. Starting with the function f(x) = x^3, which is strictly increasing on R and has derivative 3p^2 at each point p, we see that the inverse is given by f⁻¹(x) = ∛x, and its derivative at f(p) = p^3 is
1/f′(p) = 1/(3p^2) = p/(3p^3)
provided we avoid division by zero, of course. Putting x = p^3, that is, p = ∛x, this says (more readably) that the derivative of ∛x at any x except 0 is
p/(3p^3) = ∛x/(3x) = (1/3)x^(−2/3).
Switching to the alternative view using heritage notation: if y = ∛x then x = y^3 and so dx/dy = 3y^2. Provided that this is non-zero, we therefore get
dy/dx = 1/(dx/dy) = 1/(3y^2) = (1/3)y^(−2) = (1/3)(∛x)^(−2) = (1/3)x^(−2/3)  (x ≠ 0)
(if we take care to track the points at which the derivatives are calculated).
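The same answer survives a numerical check; in this Python sketch the test point x = 8 (where the formula predicts exactly 1/12) is an arbitrary choice of ours:

```python
def central_diff(F, p, h=1e-6):
    """Symmetric difference quotient (F(p+h) - F(p-h))/(2h)."""
    return (F(p + h) - F(p - h)) / (2 * h)

cbrt = lambda x: x ** (1 / 3)       # fine for x > 0
p = 8.0
approx = central_diff(cbrt, p)
exact = (1 / 3) * p ** (-2 / 3)     # the formula (1/3) x^(-2/3)
print(approx, exact)                # both are close to 1/12
```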
Here is a proof of the result that we have outlined over the last page:
(f⁻¹)′(f(p)) = 1/f′(p).
Proof
Differentiability at p tells us that f is defined (and continuous) on a small open interval centred on p, so there is a small open interval centred on f(p) contained in the range (see Lemma 8.6.4 if this is not clear). Thus, for all sufficiently small non-zero k, f(p) + k is in that range and we can find h ≠ 0 such that
f(p) + k = f(p + h).
(To be fussy, h depends on k so we should really write it as h(k), but that would make the algebra harder to read.)
Now, again using g to stand for the inverse of f, to investigate g′ at f(p) we must look for a limit (as k → 0) of:
(g(f(p) + k) − g(f(p)))/k = ((p + h) − p)/k = h/(f(p + h) − f(p)).
12.3.1 Lemma Let f be differentiable at a point p of an interval on which it is defined.
1. If f is increasing on the interval, then f′(p) ≥ 0.
2. If f is decreasing on the interval, then f′(p) ≤ 0.
Proof
By assumption,
(f(x) − f(p))/(x − p)
converges to the limit f′(p) as x → p. Choose a sequence (xn) of numbers greater than p that converges to p and we get both f(xn) − f(p) ≥ 0 and xn − p > 0, and therefore also
(f(xn) − f(p))/(xn − p) ≥ 0
for all n. Therefore f′(p) is the limit of a sequence of non-negative numbers, and so is non-negative itself (via ‘taking limits across an inequality’, Theorem 4.1.17).
The proof of part 2 is very similar.
12.3.2 Notes
is differentiable at every point of the real line, and f′(0) = +0.5, and yet in every interval of the form (−δ, δ) there are points at which f′ is strictly less than 0, and sub-intervals on which f is strictly decreasing.
3. We shall find converses along the lines of: if f′(x) ≥ 0 at every point of an interval, then f is increasing on that interval (and so on).
• We say that f has a local maximum at p ∈ I if, for some δ > 0, f(x) ≤ f(p) for every x ∈ I ∩ (p − δ, p + δ);
we also call the point (p, f(p)) a local maximum point on the graph of f.
• We say that f has a local minimum at p ∈ I if, for some δ > 0, f(x) ≥ f(p) for every x ∈ I ∩ (p − δ, p + δ);
we also call the point (p, f(p)) a local minimum point on the graph of f.
Bear in mind that a local maximum may very well not show us an overall
maximum value that the function might reach in the interval: for one thing, there
may be several local maxima; for another, it is possible (depending partly on
the type of interval) that the function is unbounded above, or never attains the
supremum of its values. Consider, for instance, the functions whose graphs are
sketched here:
In the same way, the value at a local minimum might not be an overall, global
minimum value on the interval.
Also be aware that, at an endpoint of the interval I (in the case where it has an
endpoint that belongs to I), the local-maximum and local-minimum criteria only
pay attention to what happens on one side of the point (since the function is not
defined on the other side). For instance, f (x) = x2 on the interval [−2, 3] has a
local maximum at x = −2 because, for instance, f (−2) ≥ f (x) for every x within
[−2, 3] ∩ (−2 − 0.5, −2 + 0.5) even though that intersection, namely [−2, −1.5),
only contains points at and on the right of −2.
Proof
Remember that differentiability at p includes the fact that f is defined throughout some open interval (p − δ, p + δ) centred on p and, if f also has a local maximum at p, we can make that number δ small enough to ensure that f(p) ≥ f(x) for every x in (p − δ, p + δ). Choose a sequence (xn) of numbers greater than p that converges to p; then, for each n,
f(xn) − f(p) ≤ 0,  xn − p > 0  and so  (f(xn) − f(p))/(xn − p) ≤ 0.
Since f′(p) is the limit of that last fraction, we now get f′(p) ≤ 0 also.
Repeat the argument with a sequence (yn) of numbers less than p that converges to p, and we find that f′(p) ≥ 0. To reconcile the two findings, we must have f′(p) = 0.
To establish the result for a local minimum, apply what we have just discovered
to the function (−f ).
Recall that if a function f is continuous on a bounded closed interval [a, b], then it must reach a greatest value (and a smallest value) somewhere in that interval. From the previous lemma there are three possibilities about a point where it does this: this point could be a point where f′ takes the value zero, or a point where f is not differentiable, or an endpoint of the interval. Each of the three possibilities can actually occur, as even a few rough sketch graphs will readily indicate:
[Figures: three sketch graphs, showing the maximum attained at a stationary point, at a point of non-differentiability, and at an endpoint.]
12.3.6 Theorem Let f be continuous on [a, b] and differentiable on (a, b). Then
f takes its greatest value (and its smallest value) either at a stationary point of f , or
at an endpoint of the interval.
12.3.7 Note Stationary points do not have to be local (or global) maxima or minima: for instance, x = 0 is a stationary point for f(x) = x^3 on the interval [−1, 1], but it is not a maximum/minimum of any kind. For a more complicated example,
12.3 UP AND DOWN, MAXIMUM AND MINIMUM 217
12.3.8 Example (Assuming for the moment that the derivative of e^x is e^x) find the greatest and least values of f(x) = xe^(−x) on the interval [0.5, 2].
Solution
Using both the product rule and the chain rule, we find that f′(x) = 1·e^(−x) + x·e^(−x)·(−1) = (1 − x)e^(−x), which is zero only at x = 1. The greatest and least values of f can therefore only occur at 1 or at an endpoint of the interval. Since the values of f(x) at x = 0.5, 1, 2 (respectively) are 0.5e^(−0.5), e^(−1), 2e^(−2) and evaluate approximately to 0.303, 0.368, 0.271, it is clear that the maximum value on the interval is e^(−1) and that the minimum value is 2e^(−2).
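The endpoint-and-stationary-point bookkeeping in this solution is easy to mirror in code (a Python sketch of this particular example; the dictionary layout is our own):

```python
import math

f = lambda x: x * math.exp(-x)
candidates = [0.5, 1.0, 2.0]        # the two endpoints plus the stationary point x = 1
values = {x: f(x) for x in candidates}
print(values)                       # approx. 0.303, 0.368, 0.271
print("max at", max(values, key=values.get), "min at", min(values, key=values.get))
```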
12.3.9 EXERCISE Let f(x) = x/(x^2 + 1) for each real number x. Find the largest and smallest values that f(x) can attain while x ranges over:
1. the interval [−10, 0],
2. the interval [−1/2, 2],
3. the interval [2, 6],
4. the whole real line (if indeed such largest and smallest values exist: note that
12.3.6 does not directly apply on this unbounded interval).
Now we meet the two theorems that will have significant roles to play later in the
text, as well as being useful in getting the converse implications that we mentioned
earlier. The first is actually a special case of the second, but is the version that we
are able to prove almost immediately from our earlier work.
Proof
In the special case where f is constant on the whole of [a, b], this result is trivial
and immediate: any point c ∈ (a, b) will do equally well.
If not, then either f takes somewhere in (a, b) values that are strictly greater than f(a), or f takes somewhere in (a, b) values that are strictly smaller than f(a) (or both, of course). If the former, then the greatest value of f on [a, b] is not attained at a or at b, but at a point c in (a, b): and an earlier result (12.3.4) tells us that f′(c) = 0. If the latter, apply the same argument to the smallest value of f on [a, b].
[Figures: Rolle’s theorem, cases 1–3 — in each sketch f(a) = f(b), with a point c in (a, b) at which f′(c) = 0 marked.]
12.3.11 Example To show (assuming basic results about trig functions) that the
equation
(1 + 3x2 ) sin x + (x + x3 ) cos x = 0
has at least one solution in the interval (0, π ).
Solution
The function (x + x3 ) sin x is continuous on [0, π ] and differentiable on (0, π ), and
takes equal values (zero) at 0 and at π , so RT applies and tells us that its derivative,
namely (1 + 3x2 ) sin x + (x + x3 ) cos x, is zero at some point between 0 and π , as
the question required.
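Rolle’s theorem only asserts that such a point exists; a bisection search will actually locate one (a plain Python sketch; the bracketing interval [0.1, 3.0] is our own choice, made by inspecting signs):

```python
import math

# g is the derivative of (x + x^3) sin x
g = lambda x: (1 + 3 * x**2) * math.sin(x) + (x + x**3) * math.cos(x)

lo, hi = 0.1, 3.0
assert g(lo) > 0 > g(hi)        # sign change, so a root lies between
for _ in range(60):             # halve the bracket 60 times
    mid = (lo + hi) / 2
    if g(mid) > 0:
        lo = mid
    else:
        hi = mid
print(lo, g(lo))                # a root of g inside (0, pi)
```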
12.3.12 Example To show using Rolle’s theorem (and assuming basic results about trig functions) that the equation tan x = 2/x has at least one solution in the interval (0, π/2).
Solution
(The difficulty here is to decide what function to apply the theorem to. Re-
writing the equation as x tan x = 2, and then as x2 tan x = 2x, and then as
x2 sin x/ cos x = 2x, and finally as x2 sin x − 2x cos x = 0 reveals x2 cos x as
the key formula.)
The function x2 cos x is continuous on [0, π/2] and differentiable on (0, π/2),
and takes equal values (zero) at 0 and at π/2, so RT applies and tells us that its
derivative, namely −x2 sin x + 2x cos x, is zero at some point strictly between 0
and π/2, whence the result follows (noting that, on the open interval (0, π/2),
neither x nor cos x is zero, so dividing by them as we unpick the roughwork is
legitimate).
12.3.13 EXERCISE Using Rolle’s theorem (and proof by contradiction) show that,
whatever constants a and b are selected, the graph of
12.3.14 EXERCISE Show that there is a sequence (cn) in the interval (0, π/2) such that, for each n ∈ N:
tan(cn) = n/cn.
(Hint: for each positive integer n, consider the function described by the formula x^n cos x.)
12.3.15 The first mean value theorem (‘FMVT’) Let f be continuous on [a, b] and differentiable on (a, b). Then there is at least one point c in (a, b) such that
f′(c) = (f(b) − f(a))/(b − a).
Proof
(The idea is to modify the given function, by subtracting a multiple of x, to make it satisfy all three conditions of Rolle’s theorem instead of just the first two.)
We seek a constant λ for which the function g(x) = f(x) − λx (which will at least be continuous on [a, b] and differentiable on (a, b) because f was) also obeys g(a) = g(b). Easy algebra gives the answer that
λ = (f(b) − f(a))/(b − a)
and now Rolle applied to g gives us c ∈ (a, b) such that 0 = g′(c) = f′(c) − λ, as was required.
12.3.16 Remark Since (f(b) − f(a))/(b − a) is precisely the gradient of the straight line
that joins the first and last points P = (a, f (a)) and Q = (b, f (b)) on the graph of f
over [a, b], this result has easy geometrical interpretations: there is a point on the
graph (not at its first or last points) where the tangent to the curve runs parallel to
the straight line PQ; since PQ gives a kind of ‘overall slope’ for this section of the
graph, that says there is a point where the ‘instantaneous slope’ (of the curve itself)
equals the average slope, the mean value of the gradient over the whole interval.
There may, of course, be more than one such point.
[Figure: the first mean value theorem — the tangent at c runs parallel to the chord PQ.]
12.3.17 Example To show (assuming standard facts about the trig functions) that the equation
π(1 + x^3) cos(πx) + 3x^2 sin(πx) = 9/4
has at least one solution in the interval (0, 1/2).
Solution
We notice that the left-hand side of the equation is the derivative of f(x) = (1 + x^3) sin(πx) (using the product and chain rules). Now f(x) is continuous on [0, 1/2] and differentiable on (0, 1/2), and f(0) = 0 and f(1/2) = (9/8) sin(π/2) = 9/8, so
the FMVT guarantees at least one solution in (0, 1/2) of the equation
f′(x) = (9/8 − 0)/(1/2 − 0) = 9/4,
as required.
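Once again the FMVT only promises that a solution exists; a bisection (a Python sketch, with the bracket endpoints our own choice) will exhibit one:

```python
import math

# The left-hand side of the equation in Example 12.3.17
lhs = lambda x: (math.pi * (1 + x**3) * math.cos(math.pi * x)
                 + 3 * x**2 * math.sin(math.pi * x))

a, b = 1e-9, 0.5
target = 9 / 4
assert lhs(a) > target > lhs(b)     # lhs starts near pi and ends near 0.75
for _ in range(60):                 # bisect on lhs(x) - 9/4
    mid = (a + b) / 2
    if lhs(mid) > target:
        a = mid
    else:
        b = mid
print(a, lhs(a))                    # a solution in (0, 1/2)
```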
12.3.18 EXERCISE Let n be a positive integer. Use the two different methods indicated to verify that the equation
nx^(n−1) = √(1 − x^n)
Proof
All four proofs are really the same argument.5 For instance, if f′(x) > 0 for each x ∈ (a, b) then, for any choice of p < q in [a, b] we can apply FMVT to f on the interval [p, q] to obtain
(f(q) − f(p))/(q − p) = f′(c),  some c ∈ (p, q) ⊆ (a, b)
and so f(q) − f(p) = (q − p)f′(c) is strictly positive; hence f(q) > f(p) and we conclude that f is strictly increasing on [a, b].
Here is a rather classy application of the FMVT that uses the limit of a derivative to prove existence of a derivative at a single point at which differentiability was in doubt:
Proof
For h positive and smaller than δ, the function f satisfies the FMVT conditions on the interval [a, a + h], so there is a point ch in (a, a + h) such that
(f(a + h) − f(a))/h = f′(ch).
5 With a little care, the results can be extended to apply to unbounded intervals also: see
Example 12.3.21.
As h → 0+, we also have ch → a+, so f′(ch) → ℓ, and therefore
lim_{h→0+} (f(a + h) − f(a))/h = ℓ.
In the same way, working with small negative h, we get
lim_{h→0−} (f(a + h) − f(a))/h = ℓ,
and the equality of the one-sided limits gives (see Theorem 10.3.9) what we wanted.
12.3.21 Example To show that a function f that is continuous on [a, ∞) and has
positive derivative at every point of (a, ∞) must be strictly increasing on [a, ∞).
Solution
For any p, q in [a, ∞) such that p < q, we observe that f is continuous on [p, q] and
differentiable (with positive derivative) on (p, q). By 12.3.19, f is strictly increasing
on [p, q]. In particular, f (p) < f (q): hence the result.
Proof
(This result and the next exercise are simply extensions of paragraph 10.3.5.) Put M = sup{f(x) : x ∈ (a, b)} (which must exist since the set of values here is non-empty and bounded above). Given ε > 0, by the definition of supremum we can find x′ ∈ (a, b) for which f(x′) > M − ε. Now since f is increasing:
Proof
Since x^(−1) is always positive on (0, ∞), ln x is strictly increasing there. By the product rule, f′(x) = x(x^(−1)) + (ln x) × 1 = 1 + ln x, which is positive on (1/e, ∞) since ln x > −1 = ln(e^(−1)) there; therefore f is strictly increasing on [1/e, ∞). Likewise, f′(x) is negative on (0, 1/e) since ln x < −1 = ln(e^(−1)) there; therefore f is strictly decreasing on (0, 1/e). That already tells us that the smallest value that f can take is f(1/e) = −1/e.
(The fact that f′(1/e) = 0, flagging up 1/e as the only stationary point of f, is also informative; on its own, though, it doesn’t indicate what sort of a stationary point occurs here.)
Secondly, the observation that f is negative on (0, 1/e), that is, bounded above
by 0 (as well as decreasing) allows us to invoke the previous Exercise 12.3.23 to
conclude that limx→0+ f (x) exists.
In the present work, we shall explicitly refer to higher derivatives before using
any such cluster of symbols. The higher derivatives are especially important in
Taylor’s theorem (see Chapter 16), but they also have a neat and well-known
application to the classification of stationary points which we shall deal with now.
12.4.2 Theorem: the second derivative test for local extrema Suppose that a function f is twice differentiable on an open interval including the number p, that f′(p) = 0 and that f″ is continuous at p. Then
• if f″(p) > 0 then f has a local minimum at p,
• if f″(p) < 0 then f has a local maximum at p.
(This is the result that has embedded itself in the minds of generations of
mnemonic-laden school pupils through the slogan ‘POS MIN, NEG MAX’.)
Proof
We shall deal only with the first scenario since the second is so similar.
Since f″ is continuous at p and f″(p) > 0, we can choose δ > 0 such that f″ is positive at every point of the interval (p − δ, p + δ).
Now we work backwards from knowledge of f″ to knowledge of f′: for each x ∈ (p − δ, p) apply FMVT to f′ on [x, p] and we find that
(f′(p) − f′(x))/(p − x) = f″(y) for some y ∈ (x, p)
that is, 0 − f′(x) = f″(y)(p − x) is positive, and so f′ is negative at every point of (p − δ, p). This implies (see 12.3.19) that f is (strictly) decreasing on [x, p] so, in particular, f(x) > f(p).
In the same fashion we show that for every x ∈ (p, p + δ) we again get
f (x) > f (p). Hence the claimed result.
The reader will almost certainly be familiar with standard exercises such as the
following:
12.4.3 Example Find and classify the stationary points on the graph of the function f(x) = x^4 − 8x^3 − 8x^2 + 96x + 144.
Solution
It is easy to differentiate f two (or more) times:
f′(x) = 4x^3 − 24x^2 − 16x + 96 = 4(x + 2)(x − 2)(x − 6),  f″(x) = 12x^2 − 48x − 16
and at this stage, we know that 2, −2 and 6 are the x-coordinates of the three stationary points. We find their y-coordinates by substituting these numbers into the f(x) formula, and identify
A = (−2, 0),  B = (2, 256),  C = (6, 0).
The second derivative test informs us that A = (−2, 0) and C = (6, 0) are local minima, while B = (2, 256) is a local maximum.
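The whole computation can be checked mechanically; this Python sketch simply encodes the function and the two derivative formulas obtained above:

```python
f  = lambda x: x**4 - 8 * x**3 - 8 * x**2 + 96 * x + 144
fd = lambda x: 4 * x**3 - 24 * x**2 - 16 * x + 96   # f'
f2 = lambda x: 12 * x**2 - 48 * x - 16              # f''

for x in (-2, 2, 6):
    kind = "local min" if f2(x) > 0 else "local max"
    print(x, f(x), fd(x), kind)   # fd(x) is 0 at each stationary point
```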
The quantity
ε(h) = (f(p + h) − f(p))/h − f′(p)
would be a junk function of h, except that it fails to be defined at h = 0. We can easily fix this: the improved description
ε(h) = (f(p + h) − f(p))/h − f′(p) if h ≠ 0;  ε(0) = 0
remedies this flaw, and makes ε into a junk function of h. This, in turn, gives rise to an alternative description of differentiability:
Lemma A function f is differentiable at p if and only if there are a number ℓ and a junk function ε such that, for all sufficiently small h, f(p + h) = f(p) + h(ℓ + ε(h)); and in that case ℓ = f′(p).
Proof
If f is differentiable, then the above discussion shows how to define a suitable junk function (taking ℓ = f′(p)). Conversely, if such an ε does exist, then (for sufficiently small non-zero h)
(f(p + h) − f(p))/h = ℓ + ε(h),
Alternative proof
Let ℓ and m stand for the numbers f′(p) and g′(q). By the lemma, there are two junk functions ε and η such that (for sufficiently small h and k)
f(p + h) = f(p) + h(ℓ + ε(h))
and
g(q + k) = g(q) + k(m + η(k)).
We put k = f(p + h) − f(p), noting that k is itself a junk function of h because the differentiable function f is necessarily continuous. Now
g(f(p + h)) = g(q + k) = g(q) + k(m + η(k)) = g(f(p)) + h(ℓ + ε(h))(m + η(k)),
that is,
(g ◦ f)(p + h) = (g ◦ f)(p) + h(ℓm + junk)
where junk = mε(h) + ℓη(k) + ε(h)η(k) is a junk function by the comments made above. Using the lemma again, this completes the demonstration that g ◦ f is differentiable and that its derivative is ℓm = f′(p)g′(q) = g′(f(p))f′(p).
.........................................................................
13.1.2 Notes
which is safe provided you understand that both m and n are restricted
to be ≥ nε .
• Carefully compare this definition with the definition of convergence, and you
will see that there is only one significant difference. To be specific: if you replace
230 13 THE CAUCHY CONDITION — SEQUENCES WHOSE TERMS PACK TIGHTLY
Proof
If xn → ℓ then, given ε > 0, we can find n0 such that |xn − ℓ| < ε/2 whenever n ≥ n0. Then, if both m and n are ≥ n0, we see that
|xm − xn| = |xm − ℓ + ℓ − xn|
≤ |xm − ℓ| + |ℓ − xn|  (why?1)
< ε/2 + ε/2 = ε
Proof
Suppose that (xn)n∈N is Cauchy. By the definition (and choosing ε = 1 for convenience) there is a positive integer n0 such that all the terms of the sequence from the n0th one onwards are separated by less than 1 unit so, in particular, they are less than 1 unit distant from xn0. The earlier terms x1, x2, x3, · · · , xn0−1 may well be further away from xn0, but there are only a finite number of them: so we can find the biggest distance from one of them to xn0 . . . call it M. If we now put M′ = max{M, 1} then every xn lies within the distance M′ from xn0, so (xn)n∈N is bounded.
13.1.5 Lemma A Cauchy sequence cannot have two subsequences with different limits.
Proof
Suppose it did: that is, suppose (xn)n∈N is Cauchy, that subsequences (xnk)k∈N and (xmj)j∈N converge (respectively) to ℓ and m, and that ℓ < m. Take ε = (m − ℓ)/3 > 0.
[Diagram: the disjoint intervals (ℓ − ε, ℓ + ε) and (m − ε, m + ε) on the number line.]
Cauchyness tells us that we can find nε such that |xm − xn| < ε whenever m, n ≥ nε. Yet all but finitely many of the xnk belong to the interval (ℓ − ε, ℓ + ε) and all but finitely many of the xmj belong to the interval (m − ε, m + ε). So we can find such values of nk and mj both bigger than nε, and then the gap between xnk and xmj must exceed ε: contradiction.
Proof
From a result in Chapter 5 (Proposition 5.3.6) that we have had little need to use
until now, in order to be divergent a sequence must either be unbounded or must
possess two subsequences that have different limits. By the second and third of the
above lemmata, a Cauchy sequence cannot do either of these, so it must converge.
The converse (convergent implies Cauchy) was the first lemma above.
We offer an alternative method of proof for this theorem, partly because it is such
a central result, and partly because it uses the following lemma which is useful in
its own right.
13.1.7 Lemma If a Cauchy sequence has even one subsequence that converges,
then the entire sequence converges also (and to the same limit).
Proof
Suppose that (xn)n∈N is a Cauchy sequence and that one of its subsequences (xnk)k∈N converges to ℓ. Given ε > 0 we can use these two facts to find positive integers n0 and kε such that

|xm − xn| < ε/2 whenever m, n ≥ n0, and |xnk − ℓ| < ε/2 whenever k ≥ kε.

Now for any n ≥ n0, choose a value k of k that is greater than both n0 and kε. Then nk ≥ k ≥ n0 and we find

|xn − ℓ| = |xn − xnk + xnk − ℓ| ≤ |xn − xnk| + |xnk − ℓ| < ε/2 + ε/2 = ε.
Proof – alternative
Suppose that (xn )n∈N is Cauchy (and therefore bounded, by Lemma 13.1.4).
According to Bolzano-Weierstrass, it has a convergent subsequence. According to
Lemma 13.1.7, it is itself convergent. The converse is handled as previously.
13.1.9 Example Given a sequence (xn)n∈N with the property that, for each
n ∈ N, |xn − xn+1| < 0.9^n, we show that (xn) must be convergent.
Solution
We clearly do not have enough information to determine or guess the limit, nor
to use monotonicity, so the Cauchy criterion is the only trick available to us.
Whenever n < m, we see that

|xm − xn| ≤ |xn − xn+1| + |xn+1 − xn+2| + · · · + |xm−1 − xm| < 0.9^n + 0.9^{n+1} + · · · + 0.9^{m−1} < 0.9^n/(1 − 0.9) = 10(0.9)^n.

Now 0.9^n → 0 so, given ε > 0, we can locate n0 ∈ N such that, whenever n ≥ n0: 10(0.9)^n < ε. It follows that, provided that n ≥ n0 (and therefore also m ≥ n0), we shall have |xm − xn| < ε. Therefore (xn)n∈N, being Cauchy, must also be convergent.
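The estimate above is easy to check numerically. Here is a small sketch (our illustration, not the book's) using the concrete sequence xn = sum of (−0.9)^k for k up to n, whose steps satisfy |xn − xn+1| = 0.9^{n+1} < 0.9^n:

```python
# Numerical sketch (ours): a sequence whose consecutive steps are bounded
# by 0.9^n satisfies the Cauchy-type bound |x_m - x_n| < 10 * 0.9^n.
def x(n):
    # partial sums of sum (-0.9)^k: here |x_n - x_{n+1}| = 0.9^(n+1) < 0.9^n
    return sum((-0.9) ** k for k in range(1, n + 1))

n = 20
worst_gap = max(abs(x(m) - x(n)) for m in range(n + 1, n + 200))
print(worst_gap < 10 * 0.9 ** n)  # the bound from the example holds
```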
13.1.10 Note Fairly slight changes to the above argument will show that if the step
|xn − xn+1 | from one term of a sequence to the immediately next one is less than a
constant times a power of t for some constant t ∈ (0, 1), then the sequence (xn ) is
Cauchy. Of course, in that circumstance, |xn − xn+1 | tends to zero. It is important,
however, to realise that the condition |xn − xn+1 | → 0 on its own is not enough to
guarantee that (xn ) shall be Cauchy. One straightforward way to illustrate this is to
let xn be the nth partial sum of the harmonic series

xn = 1 + 1/2 + 1/3 + 1/4 + 1/5 + · · · + 1/n.

Then we know that (xn) is not convergent (since the harmonic series diverges) and is therefore not Cauchy, and yet |xn − xn+1| = 1/(n + 1) which certainly converges to zero.
Informally, we sometimes say that the step in the latter case is tending to zero but
not rapidly enough, whereas in instances like Example 13.1.9 the step is decaying
geometrically or exponentially and that this is fast enough to force Cauchyness. You
can explore this area a little further in some of the additional Exercises (numbers
184 to 190).
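A quick numerical sketch (our own illustration) of why |xn − xn+1| → 0 on its own is not enough: for the harmonic partial sums the steps shrink, yet S2n − Sn never drops below 1/2.

```python
# Our illustration: harmonic partial sums have steps 1/(n+1) -> 0,
# yet S_{2n} - S_n stays above 1/2, so the sequence is not Cauchy.
def S(n):
    return sum(1.0 / k for k in range(1, n + 1))

for n in (10, 100, 1000):
    step = S(n + 1) - S(n)   # tends to zero
    gap = S(2 * n) - S(n)    # never falls below 0.5
    print(round(step, 6), round(gap, 6))
```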
13.1.11 EXERCISE Given a sequence (xn )n∈N with the properties that, for each
n ∈ N:
|xn − xn+2| < 10(0.6)^n and |xn − xn+5| < 20(0.7)^n,
show that (xn ) converges.
[Suggestion: xn − xn+5 + xn+5 − xn+3 + xn+3 − xn+1 = xn − xn+1 .]
13.1.12 EXERCISE Show that the series

∑_{k=1}^{∞} [sin(k³ + 5k² − 3) − 4 cos(k² + 3k − 7)] / (1.1)^k

is convergent, by establishing that the sequence of its partial sums is Cauchy (and
assuming basic trigonometric facts).
‘Showing that the sequence of partial sums is Cauchy’ will turn out, in Chap-
ter 14, to be pretty much the fundamental tool for establishing convergence of
a series.
13.1.13 Example We use the Cauchy condition to revisit the proof that the
harmonic series ∑ n^{−1} diverges.
Solution
For any given positive integer n0, first find a positive integer m such that 2^m is bigger than n0. Then, with the usual notation for partial sums

Sn = ∑_{k=1}^{n} 1/k

we see that

S_{2^{m+1}} − S_{2^m} = ∑_{k=2^m+1}^{2^{m+1}} 1/k

is the total of 2^m fractions of which the smallest one is 1/2^{m+1}. This total therefore exceeds 2^m × 1/2^{m+1} = 1/2. In other words, no matter how large we choose n0, there will be partial sums later than the n0-th that differ by more than 0.5; so the sequence of partial sums here is not Cauchy, and cannot converge.
13.1.14 Example Given that (xn )n∈N and (yn )n∈N are both Cauchy sequences, to
show that their ‘term-by-term product’ (xn yn )n∈N is also Cauchy.
Solution
(This can become quite messy if we try to argue from the definition of Cauchy, so
we won’t.)
Since (xn) and (yn) are both Cauchy, they must each be convergent to some
limit: say, xn → ℓ and yn → m (as n → ∞). Algebra of limits now tells us that
xn yn → ℓm, so the product sequence converges, and is therefore Cauchy too.
13.1.15 EXERCISE Of the following two statements, just one is true in general.
Give a proof for the one that is true, and find a counterexample that disproves the
false one.
1. If (xn)n∈N and (yn)n∈N are both Cauchy sequences, and there is a (strictly) positive number δ > 0 such that |yn| ≥ δ for all n ≥ 1, then the 'term-by-term quotient' sequence (xn/yn)n∈N must also be Cauchy.
2. If (xn)n∈N and (yn)n∈N are both Cauchy sequences, and |yn| > 0 for all n ≥ 1, then the 'term-by-term quotient' sequence (xn/yn)n∈N must also be Cauchy.
13.1.16 Example Given that f : [a, b] → R is continuous and that (xn)n∈N is a Cauchy sequence satisfying a ≤ xn ≤ b for every n ∈ N, to show that (f(xn))n∈N is also Cauchy.
Solution
Because (xn)n∈N is Cauchy, it must converge: xn → ℓ for some ℓ. Also, since a ≤ xn ≤ b for all n, we have a ≤ lim xn = ℓ ≤ b also, that is, ℓ ∈ [a, b]. Because continuous functions preserve convergence, it follows that f(xn) → f(ℓ). Thus (f(xn)) is a convergent sequence, and consequently Cauchy.
14.1.1 Definition A series ∑ xk is called absolutely convergent if the series ∑ |xk|
is convergent.
14.1.2 Note
• Let us be clear immediately that convergence and absolute convergence are not
the same thing. For instance, we know from the alternating series test that the
‘alternating harmonic series’
∑ (−1)^k (1/k)

is convergent; and yet

∑ |(−1)^k (1/k)| = ∑ 1/k
is the (notoriously divergent) harmonic series. In the present terminology, the
alternating harmonic series is convergent, but it is not absolutely convergent.
• A series that is convergent but not absolutely convergent is called conditionally
convergent.
14 MORE ABOUT SERIES
• On the other hand, you will never come across a series that is absolutely
convergent but is not convergent. No such series can exist: and the way to see
this important truth is to use the idea of Cauchy sequences that we encountered
just a few pages back. Recall that Cauchy and convergent are equivalent for
sequences, and so a (general) series converges if and only if its sequence of
partial sums is Cauchy.
• We also remind you again about the triangle inequality: that |a + b| ≤ |a| + |b| for arbitrary numbers a and b. This basic form of the inequality extends immediately to the three-term version

|a + b + c| ≤ |a| + |b| + |c|

which we have occasionally used, and which features again in the next demonstration:
14.1.3 Theorem Every absolutely convergent series is convergent.
Proof
Let ∑ xk be absolutely convergent, that is, let ∑ |xk| be convergent. Our task is to show that ∑ xk is convergent and, since that means studying two different partial-sum sequences (one for each series), we must take care to have different notations for the two of them. For instance, let us put

Sn = x1 + x2 + x3 + · · · + xn = ∑_{k=1}^{n} xk, and
S′n = |x1| + |x2| + |x3| + · · · + |xn| = ∑_{k=1}^{n} |xk|.

Convergence of ∑ |xk| tells us that (S′n) is a (convergent, and therefore) Cauchy sequence so, given ε > 0, we can find n0 such that

|S′m − S′n| < ε whenever m, n ≥ n0.

Since the last line is empty of information when m equals n, we may as well assume m ≠ n here. Also, there is no harm in assuming that m is the larger and n is the smaller, since if we swop them over, the modulus signs will ensure that |S′m − S′n| will remain unaltered. Thus we can write the previous display in slightly more convenient (but equivalent) forms:

|S′m − S′n| < ε whenever m > n ≥ n0, that is,
∑_{k=1}^{m} |xk| − ∑_{k=1}^{n} |xk| < ε whenever m > n ≥ n0, that is,
∑_{k=n+1}^{m} |xk| < ε whenever m > n ≥ n0.

By the (many-term) triangle inequality it follows that, whenever m > n ≥ n0,

|Sm − Sn| = |∑_{k=n+1}^{m} xk| ≤ ∑_{k=n+1}^{m} |xk| < ε,

so (Sn) is also Cauchy and therefore convergent: that is, ∑ xk converges.
14.1.4 Notes
1. What we now have is the basis of a strategy for deciding upon the convergence
or divergence of a general series (as opposed to a series of non-negatives). If
we are given such a series ∑ xk, we look instead at the modulussed series
∑ |xk| and examine it by the techniques we acquired in Chapter 7. If they
show that ∑ |xk| is convergent, in other words, that ∑ xk is absolutely
convergent, then the theorem tells us that the original ∑ xk is convergent also,
and the task is completed. So far, so good.
2. However, what if our Chapter 7 skills tell us that ∑ |xk| is NOT convergent? Then there is more and different work to do because the discovery, that ∑ xk is not absolutely convergent, does not tell us whether it is convergent or not. (Look again at the alternating harmonic series: it is not absolutely convergent, but it is convergent; in contrast, a series such as ∑_k (−1)^k √k is not absolutely convergent and is not convergent either.)
By way of illustration, we’ll now re-work a couple of Chapter 7’s examples but
without the assumption that the parameter t or x is positive.
14.1.5 Example For precisely which real values of t does the series

∑ ((3n² − 1)/(2n² − 1))^n t^n

converge?
Solution
Put xn = the nth term here, and consider instead |xn|. All its terms are non-negative, and the nth root of |xn| is

((3n² − 1)/(2n² − 1)) |t|

which has a limit of 3|t|/2 so, by the root test:
1. for |t| < 2/3 the limit is < 1 and so the series ∑ |xn| converges, that is, ∑ xn is absolutely convergent, and therefore also convergent;
2. for |t| > 2/3 the limit is > 1 so |xn| cannot tend to zero, and neither can xn, so the original series ∑ xn must diverge.
It remains to ponder what happens when t is exactly ±2/3. Luckily, in that borderline case |xn| itself is

((3n² − 1)/(2n² − 1))^n (2/3)^n = ((6n² − 2)/(6n² − 3))^n

which is (just) greater than 1 (in the final fraction, the top line exceeds the bottom line). This shows once again that neither |xn| nor xn can tend to zero, and so the series ∑ xn diverges.
We conclude that the given series ∑ xn is (absolutely) convergent when
−2/3 < t < 2/3, and divergent for every other value of t.
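A numerical sanity check (our sketch) of the root-test limit just used: the nth root of |xn| is (3n² − 1)/(2n² − 1) · |t|, which should approach 3|t|/2.

```python
# Our illustration of Example 14.1.5: the n-th root of |x_n| tends to (3/2)|t|.
def nth_root_of_term(n, t):
    return (3 * n * n - 1) / (2 * n * n - 1) * abs(t)

print(nth_root_of_term(10**6, 0.5))  # close to 0.75 < 1: absolute convergence
print(nth_root_of_term(10**6, 1.0))  # close to 1.5  > 1: divergence
```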
14.1.6 Example For exactly which real values of x does the following series
converge?
∑ wn where wn = (n + 1)!(2n + 2)! x^n / (3n + 3)!
Solution
If x = 0 then, although a ratio test would not be legal, the series definitely converges;1 so from now on assume x ≠ 0 and consider ∑ |wn|. Its growth rate |wn+1|/|wn| cancels to

((n + 2)(2n + 3)(2n + 4) / ((3n + 4)(3n + 5)(3n + 6))) |x|

which has a limit of 4|x|/27 (as n → ∞) so, by the ratio test:
1. for |x| < 27/4 the limit is < 1 and so ∑ |wn| converges, that is, ∑ wn is absolutely convergent, and therefore also convergent;
2. for |x| > 27/4 the limit is > 1 so |wn| cannot tend to zero, and ∑ wn must diverge.
In the borderline case |x| = 27/4, the growth rate is

(27/4)(n + 2)(2n + 3)(2n + 4) / ((3n + 4)(3n + 5)(3n + 6)) = (6n + 9)(6n + 12) / ((6n + 8)(6n + 10))

which is greater than 1 (look at the individual factors in the top and bottom lines). Thus |wn+1| > |wn|, the terms of ∑ |wn| are increasing and cannot converge to zero, and thus wn cannot tend to zero either and ∑ wn again diverges.
We conclude that ∑ wn converges (absolutely) when −27/4 < x < 27/4 but
diverges in all other cases.
1 Every term is zero, every partial sum is zero, and the limit of the partial sums is zero (and certainly exists).
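The cancellation used above can be double-checked numerically. In this sketch (our code; the simplified ratio is our reconstruction of the cancelled form) we compare the factorial definition of |wn+1|/|wn| with the simplified expression and watch it approach 4|x|/27:

```python
from math import factorial

# Our illustration of Example 14.1.6: the growth rate |w_{n+1}| / |w_n|.
def ratio_from_factorials(n, x):
    w = lambda m: factorial(m + 1) * factorial(2 * m + 2) * abs(x) ** m / factorial(3 * m + 3)
    return w(n + 1) / w(n)

def ratio_simplified(n, x):
    # the cancelled form (our reconstruction): tends to 4|x|/27 as n grows
    return (n + 2) * (2 * n + 3) * (2 * n + 4) * abs(x) / ((3 * n + 4) * (3 * n + 5) * (3 * n + 6))

print(ratio_simplified(5, 1.0), ratio_from_factorials(5, 1.0))  # the two agree
print(ratio_simplified(10**6, 1.0))  # close to 4/27
```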
14.1.7 EXERCISE Determine the range of values of the real number t for which the series

∑ ((3n + 1)^n / n^{n−1}) t^n

converges.
14.1.8 EXERCISE Determine the range of values of the real number x for which the series

∑ (n!)^6 x^{3n} / (6n)!

converges.
Many textbooks present the ratio and root tests as tests upon general series,
rather than as tests upon series of non-negative terms. Our preference is to proceed
as above, that is, consciously to switch from xk to |xk |, use the appropriate test
there, and then switch back to see what we have learned about the original series.
(For one thing, this forces awareness of the important fact that we are dealing with
two distinct series, not one.) For the sake of completeness, however, here are the
two tests as applicable to general series:
14.1.9 The nth root test for general series Suppose that ∑_{n=1}^{∞} an is a series of real terms and that ⁿ√|an| converges to a limit ℓ (as n → ∞). Then:
1. if ℓ < 1 then the series converges absolutely,
2. if ℓ > 1 then an and |an| do not tend to zero, and therefore the series diverges.
14.1.10 D'Alembert's ratio test for general series Suppose that ∑_{n=1}^{∞} an is a series of non-zero terms and that the growth rate |an+1|/|an| converges to a limit ℓ (as n → ∞). Then:
1. if ℓ < 1 then the series converges absolutely,
2. if ℓ > 1 then an and |an| do not tend to zero, and therefore the series diverges.
Suppose that we take a series x1 + x2 + x3 + x4 + · · · and insert brackets into it – for instance

(x1 + x2) + (x3) + (x4 + x5 + x6) + (x7 + x8) + · · · .

To explore the question of its convergence, we must examine its partial sums which, in the present example, begin with

x1 + x2, x1 + x2 + x3, x1 + x2 + x3 + x4 + x5 + x6, x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8, · · · .
Notice that these constitute a subsequence of the partial-sum sequence for the
original series – indeed, this was inevitable, since the nth partial sum of the
bracketed series is simply the m(n)th original partial sum where m(n) is the label on
the last term of the nth bracket (regarding each unbracketed term as sitting inside
an invisible pair of brackets on its own). That is the only insight needed to establish:
14.2.1 Theorem If a convergent series has brackets inserted into it (without changing the order of its terms), then the new series converges to the same sum.
Proof
The partial-sum sequence for the bracketed series is a subsequence of the partial-
sum sequence for the original series, and therefore converges to the same limit.
14.2.2 Example On the other hand, removal of existing brackets can completely
change the convergence status of a series. For a simple example, consider:
(1 − 1) + (2 − 2) + (3 − 3) + (4 − 4) + (5 − 5) + · · · .
Clearly this converges, since every single term (every single bracket) is zero, and ∑ 0 converges to 0. However, if we remove all2 the brackets, it becomes

1 − 1 + 2 − 2 + 3 − 3 + 4 − 4 + 5 − 5 + · · ·

whose partial sums don't merely fail to converge to zero, they fail to converge at all since they are unbounded (the (2n − 1)th partial sum is n, for each n ∈ N).
Next, suppose that we start with a convergent series of non-negative terms

a1 + a2 + a3 + a4 + a5 + · · ·
let us imagine a typical rearranged series consisting of exactly the same terms but
in a different order, such as
a9 + a3 + a41 + a2 + a17 + · · · .
Look at the first few partial sums of the rearranged series (as indeed we must, if we seek its sum to infinity):

a9, a9 + a3, a9 + a3 + a41, a9 + a3 + a41 + a2, · · · .

The bad news is that these are, of course, not partial sums of the original series. Give them a temporary name, say, random handfuls.
The better news is the observation that this is an increasing sequence, just as was
the partial-sum sequence of the original series . . . and for increasing sequences,
limit and supremum are the same thing. Furthermore, each of the random
handfuls is part of a partial sum, and therefore less than or equal to a partial
sum since all the terms are non-negative.3 Each random handful is therefore ≤
the supremum of all the partial sums, that is, the limit of the original series, so
the supremum of the random handfuls (= the limit of the rearranged series) must
be ≤ the limit of the original series. Presumably we now only have to reverse the
argument to obtain the inequality the other way round?
3 For instance, a9 + a3 is less than ∑_{k=1}^{9} ak, a9 + a3 + a41 is less than ∑_{k=1}^{41} ak and so on.
14.2 THE 'ROBUSTNESS' OF ABSOLUTELY CONVERGENT SERIES
14.2.3 Theorem If a series of non-negative terms converges, then every rearrangement of it also converges, and to the same sum.
Proof
Given that ∑ bn is a rearrangement of a series ∑ an that converges to ℓ, and in which an ≥ 0 for all n, recall that ℓ is the supremum of the partial sums of ∑ an.
Each partial sum of ∑ bn is the sum of finitely many an s scattered in some unpredictable pattern within the series ∑ an, so these an s must all occur before some particular am in the original sequence and their total is therefore ≤ the mth partial sum of ∑ an, and therefore also ≤ ℓ.
Since the partial-sum sequence for ∑ bn is also increasing, it converges to some ℓ′ where ℓ′ ≤ ℓ.
Now ∑ an is equally a rearrangement of ∑ bn so, by the identical argument, ℓ ≤ ℓ′.
Hence ℓ = ℓ′.
Given any series ∑ an, define

an⁺ = an if an ≥ 0, an⁺ = 0 if an < 0;
an⁻ = −an if an < 0, an⁻ = 0 if an ≥ 0.

Then

an = an⁺ − an⁻, |an| = an⁺ + an⁻

in all cases (just check it out for non-negative an and for negative an: it works in both cases) but, importantly, every an⁺ and every an⁻ is non-negative (and therefore we can use the preceding theorem on them separately).
Think what the remark an = an⁺ − an⁻ does to a typical partial sum of ∑ an and you will understand what we meant by 'segregating out the positives from the negatives'; for instance:
(3 − 5 − 2 + 1 + 6 − 4 + 2 − 7 − 3)
= (3 + 0 + 0 + 1 + 6 + 0 + 2 + 0 + 0) − (0 + 5 + 2 + 0 + 0 + 4 + 0 + 7 + 3).
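The segregation identities are easy to verify mechanically; a small sketch (our code) using the numerical example above:

```python
# Our illustration: segregating positives from negatives, as in the display above.
def plus_part(a):
    return a if a >= 0 else 0

def minus_part(a):
    return -a if a < 0 else 0

terms = [3, -5, -2, 1, 6, -4, 2, -7, -3]
pos = sum(plus_part(a) for a in terms)    # 3 + 1 + 6 + 2 = 12
neg = sum(minus_part(a) for a in terms)   # 5 + 2 + 4 + 7 + 3 = 21
print(sum(terms) == pos - neg)                     # a_n = a_n+ - a_n-, summed
print(sum(abs(a) for a in terms) == pos + neg)     # |a_n| = a_n+ + a_n-, summed
```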
∑_{k=1}^{n} ak = ∑_{k=1}^{n} ak⁺ − ∑_{k=1}^{n} ak⁻

and, likewise,

∑_{k=1}^{n} |ak| = ∑_{k=1}^{n} ak⁺ + ∑_{k=1}^{n} ak⁻.

The last display line shows that if ∑ ak⁺ and ∑ ak⁻ both converge, then so must ∑ |ak|; yet the converse is also true: if ∑ |ak| converges then, because (for all n) 0 ≤ an⁺ ≤ |an| and 0 ≤ an⁻ ≤ |an|, the comparison test tells us that both ∑ ak⁺ and ∑ ak⁻ converge. Furthermore, their sums-to-infinity add in the obvious manner. We have proved:
14.2.7 Lemma The series ∑ an is absolutely convergent if and only if both ∑ an⁺ and ∑ an⁻ converge. Furthermore, their sums then satisfy

∑ |an| = ∑ an⁺ + ∑ an⁻ and ∑ an = ∑ an⁺ − ∑ an⁻.
14.2.8 Theorem Rearranging an absolutely convergent series does not alter its sum.
Proof
Rearranging a given absolutely convergent ∑ an into a new order ∑ bn will rearrange both ∑ an⁺ and ∑ an⁻ in exactly the same pattern. By the above, the two latter series are convergent, and by the previous theorem, this does not alter their sums, that is:

∑ an = ∑ an⁺ − ∑ an⁻ = ∑ bn⁺ − ∑ bn⁻ = ∑ bn.
14.2.9 IMPORTANT EXERCISE Show that if a series ∑ an is conditionally convergent, that is, convergent but NOT absolutely convergent, then both ∑ an⁺ and ∑ an⁻ diverge to ∞.
For most learners, the surprise is not that absolute convergence is robust under
rearrangement, but that non-absolute convergence isn’t. It turns out that this
may be demonstrated upon any convergent-but-not-absolutely-convergent series,
but we shall demonstrate using the most obvious such object – the alternating
harmonic series.
14.2.10 Example The series ∑_{k=1}^{∞} (−1)^{k−1} k^{−1} is well known to converge but not absolutely. Furthermore, since in the expression

(1 − 1/2) + (1/3 − 1/4) + (1/5 − 1/6) + (1/7 − 1/8) + (1/9 − 1/10) + · · ·

every bracket is positive, the sum-to-infinity (let us denote it by S), whatever it is, is more than 0.5. In particular, S ≠ 0.
Suppose it were correct that every rearrangement of this series is also convergent
to S (and now we shall seek a contradiction). In particular, the rearrangement
1 − 1/2 − 1/4 + 1/3 − 1/6 − 1/8 + 1/5 − 1/10 − 1/12 + 1/7 − 1/14 − 1/16 + · · ·
is a rearrangement since each original term appears once and once only (in the
pattern of one positive term and two negative terms alternating) so it also converges
to S.
From what we saw in 14.2.1 about imposing brackets, the modified series

(1 − 1/2) − 1/4 + (1/3 − 1/6) − 1/8 + (1/5 − 1/10) − 1/12 + (1/7 − 1/14) − 1/16 + · · ·

that is,

1/2 − 1/4 + 1/6 − 1/8 + 1/10 − 1/12 + 1/14 − 1/16 + · · ·

must also converge to S. Yet the last display is precisely one half of the original series so it converges to S/2. We deduce that S = S/2 which, since S is non-zero, is absurd.
absurd.
Conclusion: rearrangement of at least some convergent (non-absolutely conver-
gent) series can alter their convergence!
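The collapse to S/2 can be watched numerically. In this sketch (our code), S is the alternating harmonic sum ln 2, and the one-positive-two-negatives rearrangement settles near ln 2 / 2 instead:

```python
import math

# Our illustration: rearrange the alternating harmonic series as one positive
# term (odd reciprocals) followed by two negative terms (even reciprocals).
def rearranged_partial_sum(blocks):
    total, odd, even = 0.0, 1, 2
    for _ in range(blocks):
        total += 1.0 / odd
        odd += 2
        total -= 1.0 / even
        even += 2
        total -= 1.0 / even
        even += 2
    return total

print(rearranged_partial_sum(10000))  # near ln(2)/2, i.e. half the usual sum
print(math.log(2) / 2)
```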
14.2.13 Note Our last task in this section is to verify that absolutely convergent
series are robust under multiplication and, prior to that, we must define what we
mean by multiplying two series together. Tempting though it may at first appear to multiply them 'term by term' as we did successfully for sequences (that is, to define ∑ xk times ∑ yk to mean ∑ xk yk), this definition completely fails to match how
series are actually used in practice, so we had better begin by a forward glance at
one of their key applications: power series representations of functions.
It is well known (and yes, we shall be checking this out in detail) that many
important functions can be represented, or even optimally defined, by power
series. For instance, you have probably encountered the following:
e^x = 1 + x + x²/2! + x³/3! + · · · = ∑_{n=0}^{∞} x^n/n!,

sin x = x − x³/3! + x⁵/5! − x⁷/7! + · · · = ∑_{n=0}^{∞} (−1)^n x^{2n+1}/(2n + 1)!,

cos x = 1 − x²/2! + x⁴/4! − x⁶/6! + · · · = ∑_{n=0}^{∞} (−1)^n x^{2n}/(2n)!.
For subtraction, for instance, we define

∑ ak x^k − ∑ bk x^k to be ∑ (ak − bk) x^k,
and everything turns out to run so smoothly that there was really no need to make
a fuss about it.
In the case of multiplication, it is rather less obvious what to do. We need the product function fg to be represented by (∑ ak x^k) × (∑ bk x^k), but how then ought we to define the product of two series? To see what is being forced upon us, look first at the case of polynomials (which are, strictly speaking, power series in which all but finitely many of the coefficients are zero but, pragmatically, we are not going to waste paper and patience by writing out endless strings of zero terms).
Take the case of two cubics, say,

f(x) = a0 + a1x + a2x² + a3x³, g(x) = b0 + b1x + b2x² + b3x³.

Multiplying out and collecting powers of x, we find that

f(x)g(x) = a0b0 + (a0b1 + a1b0)x + (a0b2 + a1b1 + a2b0)x² + (a0b3 + a1b2 + a2b1 + a3b0)x³ + (a1b3 + a2b2 + a3b1)x⁴

(and two more terms). Suddenly we are left with no freedom of action about how to multiply the series: the coefficient c0 has to be a0b0, c1 has to be a0b1 + a1b0, c2 has to be a0b2 + a1b1 + a2b0 and so on. Any other decision we might consider making would create a definition that didn't even work correctly for polynomials, let alone for general (properly infinite) power series.
This is why the following definition,4 complicated though it looks, is the right
one for our purposes:
14.2.14 Definition The Cauchy product of two series5 ∑_{k=0}^{∞} ak and ∑_{k=0}^{∞} bk is the series ∑_{k=0}^{∞} ck defined by

c0 = a0b0, c1 = a0b1 + a1b0, c2 = a0b2 + a1b1 + a2b0,

and, in general,

cn = a0bn + a1bn−1 + a2bn−2 + · · · + anb0 = ∑_{k=0}^{k=n} ak bn−k.
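The definition translates directly into code. A sketch (ours) computing the first few cn from coefficient lists:

```python
# Our illustration of Definition 14.2.14: c_n = sum over k of a_k * b_{n-k}.
def cauchy_product(a, b):
    # a, b: coefficient lists a_0..a_N and b_0..b_N of equal length
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(len(a))]

# e.g. both series all-ones (the geometric series for 1/(1-x), twice):
print(cauchy_product([1, 1, 1, 1, 1], [1, 1, 1, 1, 1]))  # [1, 2, 3, 4, 5]
```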
14.2.15 Theorem If the two series ∑_{k=0}^{∞} ak and ∑_{k=0}^{∞} bk are absolutely convergent, with sums A and B, say, then their Cauchy product is also absolutely convergent, and its sum is AB.
Roughwork
When we multiply a partial sum (call it An = ∑_{k=0}^{n} ak) of the first series by a partial sum Bn of the second, the various fragments ai bj do not naturally line up in a sequence but, rather, in a two-dimensional grid such as
a0 b0 a1 b0 a2 b0 a3 b0 · · ·
a0 b1 a1 b1 a2 b1 a3 b1 · · ·
a0 b2 a1 b2 a2 b2 a3 b2 · · ·
a0 b3 a1 b3 a2 b3 a3 b3 · · ·
: : : :
and so on. There are several different ways to string that array out into a sequence
so that we can consider adding the terms up as a series. For one, we can create an
expanding list of ‘square shells’ starting at the top left hand corner (follow these on
the grid to see what we mean by that somewhat cryptic phrase):
4 We have formulated it for arbitrary series, not just for power series, so all the x^n s have disappeared; but the main application we intend is still that of power series.
5 We are starting the labelling at k = 0 instead of at k = 1, again mainly because of the focus on power series, which do naturally begin with a0 x^0 in order to accommodate a constant term.
a0 b0
+a0 b1 + a1 b1 + a1 b0
+a0 b2 + a1 b2 + a2 b2 + a2 b1 + a2 b0
+a0 b3 + a1 b3 + a2 b3 + a3 b3 + and so on · · ·
Notice that, at the end of each line, the running totals are A0 B0 , A1 B1 , A2 B2 , A3 B3
and so on – a sequence whose limit is easy to grasp.
On the other hand, if we sort out the array into a sequence/series by following
‘diagonal sweeps’ (again, please follow these on the grid to see what we mean), we
get instead:
a0 b0
+a0 b1 + a1 b0
+a0 b2 + a1 b1 + a2 b0
+a0 b3 + a1 b2 + a2 b1 + a3 b0
+a0 b4 + a1 b3 + and so on · · ·
and look: each line is now one of the Cauchy product coefficients – we are now
building up c0 + c1 + c2 + c3 + · · · as, indeed, we must do if we want to address
what this theorem claims.
What we now need is a guarantee that these quite different sorting processes will
give ultimately the same sum to infinity, that is, we need to be able to rearrange and
know that the sum is robust. For this, absolute convergence must be established
first; and for that, we have to begin with the same array but with modulus signs on
every term.
Proof
Let A′n, B′n stand for the nth partial sums of ∑_{k=0}^{∞} |ak| and ∑_{k=0}^{∞} |bk| respectively (which we know to be convergent series) and, since the partial-sum sequences are bounded, find two positive constants P, Q such that (for all n ∈ N)

A′n ≤ P and B′n ≤ Q.

The various numbers |ai bj| that turn up when we multiply A′n and B′n together present themselves naturally in a two-dimensional (infinite) grid:

|a0b0| |a1b0| |a2b0| |a3b0| · · ·
|a0b1| |a1b1| |a2b1| |a3b1| · · ·
|a0b2| |a1b2| |a2b2| |a3b2| · · ·
|a0b3| |a1b3| |a2b3| |a3b3| · · ·
: : : :

To be precise, the items in the first (n + 1) places of the first (n + 1) rows add up to A′n B′n.
Any finite selection of terms from the grid will lie within 'the first (n + 1) places of the first (n + 1) rows' if we choose n big enough, so the sum total of any finite selection is less than or equal to A′n B′n for that n, and therefore cannot exceed PQ.
That is, no matter how we string these items |ai bj | together into a sequence, the
resulting series (of non-negatives) has its partial sums bounded above (by PQ) and
must therefore converge. In other words, if we strip out the modulus signs from the
grid and organise its entries into a sequence in any fashion whatsoever, the resulting
series is absolutely convergent to some sum S. Best of all: it is the same number S no
matter how we chose to organise them: for rearranging an absolutely convergent
series does not alter its sum.
So now consider the ‘un-modulussed’ grid:
a0 b0 a1 b0 a2 b0 a3 b0 · · ·
a0 b1 a1 b1 a2 b1 a3 b1 · · ·
a0 b2 a1 b2 a2 b2 a3 b2 · · ·
a0 b3 a1 b3 a2 b3 a3 b3 · · ·
: : : :
If, firstly, we choose to sort it into a sequence and then a series as follows:
a0 b0
+a0 b1 + a1 b1 + a1 b0
+a0 b2 + a1 b2 + a2 b2 + a2 b1 + a2 b0
+a0 b3 + a1 b3 + a2 b3 + a3 b3 + and so on · · ·
then its partial sums converge to S and, moreover, the subsequence comprising partial sums number 1, 4, 9, 16, 25, · · · also converges to S. Yet partial sum number (n + 1)² is exactly

(a0 + a1 + a2 + · · · + an)(b0 + b1 + b2 + · · · + bn)

which converges to AB as n → ∞. Therefore S = AB.
If, secondly, we sort the grid into a series by following the 'diagonal sweeps':
a0 b0
+a0 b1 + a1 b0
+a0 b2 + a1 b1 + a2 b0
+a0 b3 + a1 b2 + a2 b1 + a3 b0
+a0 b4 + a1 b3 + and so on · · ·
This series also has to converge to S = AB, and so will the subsequence comprising
items 1, 3, 6, 10, 15, · · · of its partial sums (as indicated by the line-breaks here). Yet
these are precisely the Cauchy product numbers
c0 , c0 + c1 , c0 + c1 + c2 , c0 + c1 + c2 + c3
and so on.
We are (at last) able to conclude that the Cauchy product series converges to AB.
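A numerical check (our sketch) of the theorem with two absolutely convergent geometric series: ak = (1/2)^k with A = 2 and bk = (1/3)^k with B = 3/2, so the Cauchy product should sum to AB = 3:

```python
# Our illustration of Theorem 14.2.15 with a_k = (1/2)^k and b_k = (1/3)^k.
N = 60
a = [0.5 ** k for k in range(N)]
b = [(1.0 / 3.0) ** k for k in range(N)]
c = [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(N)]
print(sum(c))  # close to A * B = 2 * 1.5 = 3
```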
14.2.16 EXERCISE We know that (for all x ∈ (−1, 1)) the series 1 + x + x² + x³ + x⁴ + · · · converges to 1/(1 − x) and the series 1 − x + x² − x³ + x⁴ − · · · converges to 1/(1 + x). Calculate (and simplify as necessary) the Cauchy product of these two series and confirm that, as predicted by the theorem, it converges to the product of the two functions.
14.2.17 EXERCISE Given that (for every real number x)

e^x = 1 + x + x²/2! + x³/3! + · · · = ∑_{n=0}^{∞} x^n/n!,

calculate (and simplify as necessary) the Cauchy product of the power series representations of e^x and of e^y, and confirm that it converges to the product of the two functions.
'centre') and the 'coefficients' an are also constants. Most of the time, we change the variable by substituting, say, y = x − c so that the appearance of the series simplifies to ∑_{n=0}^{∞} an y^n (and the centre becomes 0). Since this can always be done, most of the theory assumes that it has already taken place. That is, in practice, a power series is a series of the form ∑_{n=0}^{∞} an x^n.
We need to be aware of which values of x make ∑_{n=0}^{∞} an x^n converge and which
make it diverge, and for a series of this type there are just three possible scenarios:
either it converges absolutely for all x, or only for x = 0, or (the most typical case)
there is a number D such that the series converges absolutely whenever |x| < D and
diverges whenever |x| > D. The number D is known as the radius of convergence,
mainly because all this theory can equally well be developed for the case in which x
is a complex number, and then |x| < D describes the inside of a circle centred at the
origin and of radius D. Since we are discussing only real functions, we shall have to
put up with the slightly odd use of the word ‘radius’ to describe half the length of
an interval (−D, D) (which is where we know that the real series converges). It is
generally more difficult to determine whether it converges at x = D or at x = −D.
14.3 POWER SERIES
The two extreme cases (when the series converges for all x, or only for x = 0)
are conventionally represented by saying that the radius of convergence is infinite,
or zero.
14.3.1 Theorem Every power series ∑ an x^n possesses a radius of convergence: either 0, or ∞, or a positive real number D with the properties described above.
Proof
If ∑_{n=0}^{∞} an x^n converges only at x = 0, then D = 0. If not, pick any non-zero t such that ∑_{n=0}^{∞} an t^n does converge. Then certainly an t^n → 0 and is therefore bounded: there is M > 0 such that |an t^n| < M for all n.
For any number u in the interval (−|t|, +|t|), we have

|an u^n| = |an t^n| |u/t|^n ≤ M |u/t|^n

and the last item belongs to a convergent geometric series (since |u/t| < 1) so, by the comparison test, ∑ |an u^n| is also convergent and ∑ an u^n is absolutely convergent. In other words, whenever t is a point in the 'convergence zone' of the given power series, then every point in (−|t|, +|t|) (every point that lies closer to zero than t does) is also in the convergence zone.
The only subsets of R that possess this property are R itself and the intervals that are centred on 0 (length 2D, say). Hence either ∞ or D acts as radius of convergence for the series.
In many cases, the radius of convergence can be calculated quite easily:
14.3.2 Theorem For a given power series ∑ an x^n:
1. If ⁿ√|an| converges to a limit ℓ then
• ℓ > 0 implies that the radius of convergence is 1/ℓ,
• ℓ = 0 implies that the radius of convergence is ∞;
2. If |an+1|/|an| converges6 to a limit ℓ then
• ℓ > 0 implies that the radius of convergence is 1/ℓ,
• ℓ = 0 implies that the radius of convergence is ∞.
Proof
Given that ⁿ√|an| converges to ℓ, we see that ⁿ√|an x^n| = ⁿ√|an| |x| → ℓ|x|. The root test (applied to ∑ |an x^n|) tells us that if ℓ|x| < 1 then ∑ |an x^n| converges (so ∑ an x^n converges absolutely) whereas if ℓ|x| > 1 then |an x^n| does not tend to zero, an x^n also does not tend to zero, and ∑ an x^n diverges. Separating out the cases ℓ > 0 and ℓ = 0, that proves the first part. The second emerges from the d'Alembert test in the same way.
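A small sketch (ours) estimating radii via the growth rate in part 2: for an = 2^n the ratio is exactly 2 (radius 1/2), while for an = 1/n! the ratio tends to 0 (infinite radius):

```python
import math

# Our illustration of Theorem 14.3.2(2): estimate ell = lim |a_{n+1}| / |a_n|.
def growth_rate(a, n):
    return abs(a(n + 1)) / abs(a(n))

print(growth_rate(lambda n: 2.0 ** n, 50))                 # 2.0, so radius 1/2
print(growth_rate(lambda n: 1.0 / math.factorial(n), 50))  # small, radius infinite
```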
Theorem If a power series ∑ an x^n has radius of convergence D > 0, and f(x) denotes its sum at each x ∈ (−D, D), then f is continuous on (−D, D).
Roughwork/preparation
Let us put Sm(x) = ∑_{n=0}^{m} an x^n, the mth partial sum. Since lim_{m→∞} Sm(x) = f(x), Sm(x) will be a good approximation to f(x) (provided m is taken big enough) for each individual x: but that is not strong enough to guarantee the approximation to be equally good for all values of x at once – indeed, the endpoints ±D of the interval could present serious problems since we do not even know whether the series converges at all there. Common sense therefore suggests that we stay away from ±D and work on a slightly smaller interval of the form [−ρ, ρ] for some suitably chosen positive ρ < D.
Let ε > 0 be given. Since ∑ an ρ^n converges (absolutely) to f(ρ), the modulussed series ∑ |an| ρ^n converges to some limit ℓ = ∑_{n=0}^{∞} |an| ρ^n, and so we can find a positive integer m such that

ℓ − ∑_{n=0}^{m} |an| ρ^n < ε/3

and therefore, for every x in [−ρ, ρ],

|f(x) − Sm(x)| = |∑_{n=m+1}^{∞} an x^n| ≤ ∑_{n=m+1}^{∞} |an| ρ^n < ε/3;

that is, the mth partial sum Sm is an ε/3-good approximation to f at every point of [−ρ, ρ] simultaneously.
Proof
Given any point x0 in (−D, D) and any ε > 0, choose ρ so that −ρ < x0 < + ρ < D,
and choose m ∈ N as in the roughwork/preparation. Since Sm (x) is continuous
(being just a polynomial) we can find δ > 0 so that
|Sm(x) − Sm(x0)| < ε/3 whenever x lies in the interval (x0 − δ, x0 + δ).
(We also take care that δ is small enough to fit (x0 − δ, x0 + δ) inside [−ρ, ρ].)
Now for any x ∈ (x0 − δ, x0 + δ):
|f (x) − f (x0 )| = |(f (x) − Sm (x)) + (Sm (x) − Sm (x0 )) + (Sm (x0 ) − f (x0 ))|
≤ |f (x) − Sm (x)| + |Sm (x) − Sm (x0 )| + |Sm (x0 ) − f (x0 )|
< ε/3 + ε/3 + ε/3 = ε.
That is, f is continuous at x0 . Since x0 was an arbitrary element of (−D, D), the
proof is complete.
14.3.5 Notes
1. For an absolutely convergent series ∑ ak with ∑_{k=1}^{∞} |ak| = ℓ, every partial sum satisfies

−ℓ ≤ ∑_{k=1}^{m} ak ≤ ℓ.

However, absolute convergence implies convergence, so ∑_{k=1}^{m} ak also converges to its limit, the number conventionally written as ∑_{k=1}^{∞} ak. Taking limits across the previous display, we therefore obtain −ℓ ≤ ∑_{k=1}^{∞} ak ≤ ℓ, that is,

|∑_{k=1}^{∞} ak| ≤ ∑_{k=1}^{∞} |ak|,

as we desired.
2. Lines such as

f(x) = ∑_{n=0}^{∞} an x^n therefore f(x) − ∑_{n=0}^{m} an x^n = ∑_{n=m+1}^{∞} an x^n,

plausible though they appear, should also create a pause for thought since two limiting processes are involved. Remember that, once the integer m has been fixed, ∑_{n=0}^{m} an x^n is simply a real number, a constant.
If we take a convergent series ∑_{k=1}^{∞} ak, and add a constant K as a zeroeth term, every partial sum will increase by K and therefore so will the sum-to-infinity. In other notation,

K + a1 + a2 + a3 + · · · = K + ∑_{k=1}^{∞} ak.

In the case where K = −∑_{k=1}^{m} ak, and restricting the discussion to n > m which will not disturb limiting behaviour, this says

∑_{k=m+1}^{∞} ak = −∑_{k=1}^{m} ak + ∑_{k=1}^{∞} ak.
The final result in the set, concerning how to differentiate the sum of a power
series, looks like little more than common sense at first sight. The proof, however,
is demanding, so we shall divide its burden across this chapter and Chapter 16.
Given that
f(x) = a0 + a1 x + a2 x^2 + a3 x^3 + · · · + an x^n + · · ·
converges on an open interval (−D, D), the only reasonable guess that comes to
mind is that the so-called derived series
a1 + 2a2 x + 3a3 x^2 + · · · + n an x^(n−1) + · · ·
really ought to converge on the same interval, and its sum really ought to be the
derivative of f(x). This is, in fact, true. For the moment, we shall content ourselves
with proving only the first 'really ought'.
14.3.6 Theorem If Σ an x^n has radius of convergence D > 0, then the derived
series Σ n an x^(n−1) also converges on (−D, D).
Proof
For arbitrary x0 in (−D, D), again start by choosing positive ρ so that
−ρ < x0 < +ρ < D. Since Σ an ρ^n converges, its terms tend to zero and are
therefore bounded: there is a positive number M such that
|an ρ^n| < M, that is, |an| < M/ρ^n for all n ∈ N.
Therefore
|n an x0^(n−1)| ≤ (M/ρ^n) n |x0|^(n−1) = (M/ρ) × n (|x0|/ρ)^(n−1).
Now M/ρ is merely a constant, and the power series Σ n x^(n−1) has radius of convergence 1, so (since |x0|/ρ < 1) the final term in the display belongs to a convergent series. The comparison test applies, and shows that Σ n an x0^(n−1) is also absolutely convergent.
Since x0 was any element of (−D, D), the proof is complete.
14.3.7 Remark Hence also the second derived series Σ n(n−1) an x^(n−2), the third
derived series Σ n(n−1)(n−2) an x^(n−3), and so on, all converge absolutely on the
interval (−D, D), where D is the radius of convergence of the original power series.
It can also be shown that the radius of convergence of the derived series (and of
the second, and of the third . . . ) is exactly D.
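A quick numerical illustration of 14.3.6 (the sample coefficients an = 1 are our own choice): the geometric series Σ x^n has D = 1, and its derived series Σ n x^(n−1) still converges for |x| < 1, this time to the closed form 1/(1 − x)².

```python
# Derived series of the geometric series: sum_{n>=1} n x^(n-1) = 1/(1-x)^2
# for |x| < 1 -- the same open interval of convergence as sum x^n.
def derived_sum(x, terms=20_000):
    return sum(n * x ** (n - 1) for n in range(1, terms))

for x in (0.5, -0.9, 0.99):
    exact = 1.0 / (1.0 - x) ** 2
    assert abs(derived_sum(x) / exact - 1.0) < 1e-9
```

Even at x = 0.99, close to the edge of the interval, the truncated sum agrees with the closed form; at |x| ≥ 1, by contrast, the terms n x^(n−1) do not even tend to zero.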
.........................................................................
15 Uniform continuity —
continuity’s global
cousin
.........................................................................
15.1 Introduction
Continuity is a local property of a function, not a global one.
It is all too easy to forget this, because most of the functions we meet in practice
are continuous at every point in their domain; this tends to create a misleading
impression that continuity is global in character. However, look at the definition of
continuity on a set: a function f : D → R is continuous on D if, for every individual
point p in D and every sequence (xn ) in D that converges to p, we find that the limit
of f (xn ) equals f (p). The phrase in italics reminds us that continuity upon a set
needs to be assessed at each individual element of that set: that makes it essentially
a ‘local’ property (because it is judged locally, one point at a time).
We can emphasise this further by reminding ourselves of the ε − δ, challenge-
response, input-output ‘game’ that we can play in order to confirm continuity,
namely:
Challenge: an arbitrary ε > 0 is presented. Response: we must find δ > 0 such that every x in D with |x − p| < δ has |f(x) − f(p)| < ε.
(Of course, we don't usually write most of the English words in that display – we
are just reminding ourselves about the nature, the dynamics of the game.)
It goes without saying that the response depends on the challenge – that δ
depends upon (is a function of) ε. We could have been hyper-fussy and written
not δ but δε or δ(ε) to make this point visible, but we don’t: as was just remarked,
it goes without saying.
What also goes without saying, and is about to become important, is that δ is
also allowed to depend upon and vary with p. Because continuity is assessed by
the ε − δ game one point at a time, there is no requirement that (for a particular ε)
the δ you find at one point should equal the δ you find at another. It might happen,
but for ordinary continuity it doesn’t have to.
Undergraduate Analysis: A Working Textbook, Aisling McCluskey and Brian McMaster 2018.
© Aisling McCluskey and Brian McMaster 2018. Published 2018 by Oxford University Press
We can throw some more light onto this issue by looking at a few simple
illustrative examples. The straight-line function s : R → R given by s(x) = Kx
(where K is some positive constant) is just about the simplest of all functions upon
which to play the ε − δ game because, for a given ε > 0, the choice δ = ε/K is
optimal: it works, and it is the biggest possible choice of δ that does work.
Consider now a slightly more complicated function, say, the function
s1 : [0, 4] → R whose graph consists of the four pieces of straight line joining
the points (0, 0) to (1, 1), (1, 1) to (2, 3), (2, 3) to (3, 6) and (3, 6) to (4, 10),
with gradients 1, 2, 3 and 4 respectively.
Since their gradients are 1, 2, 3 and 4, playing the ε − δ game (for a small
value of ε) at 0.5 has an optimal choice of δ = ε/1; but playing instead at 1.5, the
optimal choice changes to δ = ε/2, and playing instead at 2.5 and at 3.5 changes it
to δ = ε/3 and to δ = ε/4. Of course, a general policy of choosing δ = ε/4 would
have dealt with all four of these points, since a choice of δ that is smaller than the
optimal choice is still a perfectly valid choice – a winning move in the game, so
to speak. Devoting a little extra care to the corners (because at the three junction
points (1, 1), (2, 3) and (3, 6), the graph actually doesn’t have a gradient), you can
readily check that the strategy δ = ε/4 will win the ε − δ game at every point
in [0, 4].
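That claim – the single choice δ = ε/4 wins at every point of [0, 4] – is easy to probe by machine; the sampling grid, and the name s1 for the four-piece function, are our own.

```python
# The four-piece function through (0,0), (1,1), (2,3), (3,6), (4,10).
pts = [(0, 0), (1, 1), (2, 3), (3, 6), (4, 10)]

def s1(x):
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

eps = 0.01
delta = eps / 4                        # the steepest gradient is 4
for k in range(2001):                  # sample points across [0, 4]
    x = 4 * k / 2000
    for y in (x - 0.999 * delta, x + 0.999 * delta):
        if 0 <= y <= 4:                # stay inside the domain
            assert abs(s1(x) - s1(y)) < eps
```

Since the gradient never exceeds 4, points within δ = ε/4 of one another have values within ε – including across the corner points.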
Summary so far: for the function s1, the natural choice of δ (for a given positive value of ε) varies from one point of its domain to another, but there is a uniform way to select δ that actually works irrespective of which point you focus on.
If we step up the last example by adding straight line fragments of gradient
5, 6, 7, · · · , n, then the discussion barely changes: the ‘best’ choice of δ at a point on
the part of the graph that has gradient j is δ = ε/j, and that varies from one point
to another; but a systematic choice of δ = ε/n will work smoothly at every single
point of the domain [0, n].
Now take the final step in the direction in which we are travelling, by continuing
to add endlessly many pieces of straight line of ever steeper gradient to the graph
of s1; call the resulting function (whose domain is now [0, ∞)) s∞.
(If you wish, you can even obtain a formula for the function s∞ whose
graph this is:
s∞(x) = nx + (n − n^2)/2 while n − 1 ≤ x < n (n ∈ N)
but, pragmatically, it is the shape of the graph that will help you to see what is
happening, more than the formula does.)
For this function s∞ it remains true that the optimal choice of δ at a point on
the part of the graph that has gradient j is δ = ε/j, but now there is no one-size-fits-all choice of δ that will work everywhere: if someone were to claim that (for
a particular ε) a magical choice of δ = δ0 would work at all points of the domain
[0, ∞), then we could disprove that claim merely by finding an integer q greater
than ε/δ0, shifting our attention to a point on the graph at which the gradient
is q, and observing that, at that point, the alleged all-purpose δ0 is bigger than the
optimal, the greatest acceptable value of δ (namely ε/q). Thus, for the function s∞
(which certainly is continuous), not only does δ vary naturally with the point at
which you play the game, but IT HAS TO VARY: there is no way to pick a δ that will
work at all of the points in the domain of s∞.
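That disproof can itself be mechanised; a sketch (the name s_inf, and the particular ε and δ, are our own choices):

```python
import math

# s_inf has gradient n on [n-1, n); the formula is the one displayed above.
def s_inf(x):
    n = math.floor(x) + 1
    return n * x + (n - n * n) / 2

# Given eps and any alleged all-purpose delta, a piece of gradient
# q > 2*eps/delta supplies two points within delta of one another whose
# values are at least eps apart.
eps, delta = 0.1, 0.01
q = math.ceil(2 * eps / delta) + 1
x = q - 0.5                      # midpoint of the piece [q-1, q)
y = x + delta / 2                # still on the same straight piece
assert abs(x - y) < delta
assert abs(s_inf(x) - s_inf(y)) >= eps
```

Whatever δ is offered, a steep enough piece of the graph defeats it: exactly the 'IT HAS TO VARY' point.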
You may have noticed the word uniform sneaking into the discussion half a page
back, and this is precisely what it means when it refers to continuity: a function is
uniformly continuous on a set if not only can the ε − δ game be played and won
at each point of the set, but also there is (for each ε > 0) a way to choose δ > 0
that will work equally well at each and every point of the set; in other words, δ can
be chosen in a way that is independent of the point at which the ε − δ game is to
be played. So s and s1 (as detailed above) were not just continuous but uniformly
continuous, whereas s∞ was continuous but not uniformly continuous. As George
Orwell didn't quite get around to saying in Animal Farm, all (these) functions are
continuous but some are more continuous than others.
(We can usually assume that A is the whole of D since, if it were not, we could
replace f by its restriction f |A to A and work with that function instead.)
15.2.2 Note Most people cannot, at first sight, see how this differs from ‘ordinary’
continuity on A but, hopefully, the introduction to this chapter will have clarified
the distinction for you. In ‘ordinary’ continuity, you start with a y and an ε, and
you go looking for a δ such that
x ∈ A, |x − y| < δ ⇒ |f(x) − f(y)| < ε. ······ (1)
In uniform continuity, on the other hand, you start only with an ε, and you go
looking for a δ such that
x ∈ A, y ∈ A, |x − y| < δ ⇒ |f(x) − f(y)| < ε. ······ (2)
In the first case, then, the δ needs to work only for a particular y and ε; in the
second case, the δ has to work for a particular ε, but for all x and y that are δ-
close no matter where in A they lie. This is asking considerably more: despite the
apparent identity between (1) and (2), in (1) the y is fixed and only the x varies,
whereas in (2) the x and the y are both free to vary.
15.2.3 Example To show that the function f : [1, ∞) → R given by f(x) = √x is
uniformly continuous (on [1, ∞)).
Solution
We need to compare |x − y| with |f(x) − f(y)| = |√x − √y| and, fortunately, there
is a simple algebraic connection between them: since
(√x − √y)(√x + √y) = x − y
and √x + √y ≥ 2 whenever x and y lie in [1, ∞), we get
|√x − √y| = |x − y| / (√x + √y) ≤ |x − y| / 2.
So, if ε > 0 is given, we may choose δ = 2ε > 0 and see from the last display that
|x −y| < δ will ensure that |f (x)−f (y)| < ε no matter where x and y are in [1, ∞):
this is exactly what uniform continuity requires.
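The inequality |√x − √y| ≤ |x − y|/2 on [1, ∞) – the whole engine of this example – can be spot-checked over a wide random sample (an illustration only, of course, not a proof):

```python
import math, random

# On [1, inf) we have sqrt(x) + sqrt(y) >= 2, so |sqrt x - sqrt y| <= |x - y|/2,
# and the choice delta = 2*eps is independent of where x and y sit.
random.seed(0)
for _ in range(10_000):
    x = 1.0 + 1e6 * random.random()
    y = 1.0 + 1e6 * random.random()
    assert abs(math.sqrt(x) - math.sqrt(y)) <= abs(x - y) / 2 + 1e-9
```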
15.2.4 EXERCISE Use a similar argument to prove that the function g : [1, ∞) → R
given by g(x) = ∛x is uniformly continuous (on [1, ∞)). You may find it helpful
to use the factorisation p^3 − q^3 = (p − q)(p^2 + pq + q^2).
Here is a small result that you would probably guess to be true, given what you
know about 'ordinarily continuous' functions:
15.2.5 EXERCISE Show that the composite of two uniformly continuous functions (wherever it is defined) is uniformly continuous.
Solution
To set up enough notation to discuss the posed question, let f : A → B and
g : B′ → C both be uniformly continuous, where B ⊆ B′. Then the composite
map g ◦ f makes sense, and it is a function from A to C.
Let ε > 0 be given.
Since g is uniformly continuous, there is δ1 > 0 such that
p ∈ B′, q ∈ B′, |p − q| < δ1 ⇒ |g(p) − g(q)| < ε.
Since f is uniformly continuous, there is δ2 > 0 such that
x ∈ A, y ∈ A, |x − y| < δ2 ⇒ |f(x) − f(y)| < δ1.
Now for any x and y in A such that |x − y| < δ2, the second display tells us that
|f(x) − f(y)| is less than δ1 so, putting p = f(x) and q = f(y) in the first display,
we see that |p − q| is less than δ1, and therefore |g(f(x)) − g(f(y))| < ε. In other
words, |(g ◦ f)(x) − (g ◦ f)(y)| is less than the given ε.
Hence the result.
Recalling that the convergence of sequences gave us a particularly efficient
way to describe continuity, it might have been expected that something similar
would arise in uniform continuity. The following lemma provides just such a
characterization, and it is often helpful in sorting out the subtle but important
distinction between the two concepts: for a function f : A → R, it asserts that the
following two statements are equivalent:
1. f is uniformly continuous on A,
2. for every two sequences (xn ) and (yn ) in A such that |xn − yn | → 0, we have
that |f (xn ) − f (yn )| → 0 also.
SUGGESTION
Since it is clear that this lemma is a close relative of the corresponding (‘one-
sequence’) result that connects the sequence-style definition and the epsilontics-
style characterisation for ordinary continuity, you can reasonably expect its
demonstration to follow the pattern of that result’s proof. Try constructing such a
demonstration before you read the account that we present next.
Proof
(I): (1) implies (2).
• Suppose that condition (1) is satisfied.
• Let (xn )n∈N and (yn )n∈N be any two sequences of elements of A for which
|xn − yn | → 0.
• Given ε > 0, use condition (1) to obtain δ > 0 such that whenever x ∈ A
and y ∈ A and |x − y| < δ, we have |f (x) − f (y)| < ε.
• Since |xn − yn | → 0, there is n0 ∈ N such that n ≥ n0 guarantees
|xn − yn | < δ.
• Therefore n ≥ n0 ⇒ |xn − yn | < δ ⇒ |f (xn ) − f (yn )| < ε.
• That is, |f(xn) − f(yn)| → 0. This proves (2).
(II): (2) implies (1).
• Suppose that condition (1) is not satisfied.
• That is, there exists a value of ε > 0 for which no suitable δ > 0 can be
found.
• In particular, for each n ∈ N, δ = 1/n is not suitable…
• . . . and so there are points xn , yn ∈ A such that |xn − yn | < 1/n and yet
|f (xn ) − f (yn )| ≥ ε.
• Therefore |xn − yn | → 0, and yet |f (xn ) − f (yn )| does not converge to 0.
• In other words, condition (2) is not satisfied.
• By contraposition, (2) implies (1).
15.2.7 Proposition Every uniformly continuous function f : A → R is continuous on A.
Proof
Suppose f : A → R to be uniformly continuous. Given a convergent sequence
xn → ℓ in A, let (yn) be the constant sequence (ℓ, ℓ, ℓ, ℓ, · · · ). Obviously
|xn − yn| → 0, so the two-sequence lemma gives |f(xn) − f(yn)| → 0, that is,
|f(xn) − f(ℓ)| → 0. This is the same as saying f(xn) → f(ℓ). Since f therefore
preserves limits of sequences, f is continuous at each point of A, as required.
Alternative proof
In the definition of uniform continuity, take the special case where y is held
constant, and you immediately get f continuous at y (for each y).
15.2.8 Example To show that the squaring function f(x) = x^2 is not uniformly continuous on [0, ∞).
Solution
The two-sequence lemma suggests we look for a pair of sequences that get close to
one another but whose squares do not, and the Introduction suggests we try to get
out into the high-gradient parts of the graph. So take xn = n, yn = n + 1/n for each
n ∈ N. That decision creates two sequences in [0, ∞) and, since |xn − yn| = 1/n, we
certainly have |xn − yn| → 0. However,
|xn^2 − yn^2| = |n^2 − (n + 1/n)^2| = |n^2 − n^2 − 2 − 1/n^2| = 2 + 1/n^2 ≥ 2,
which certainly does not converge to 0. By the two-sequence lemma, f is not
uniformly continuous on [0, ∞).
15.2.9 Remark Almost exactly the same proof will show that x2 is not uniformly
continuous on any interval of the form [a, ∞), nor on any interval of the form
(a, ∞).
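The witness pair from the example is easy to tabulate (a sketch, not a proof – the range of n sampled is our own choice):

```python
# x_n = n and y_n = n + 1/n close up, yet their squares stay >= 2 apart:
# exactly the two-sequence lemma's criterion for failure.
for n in range(1, 1000):
    xn, yn = float(n), n + 1.0 / n
    assert yn - xn <= 1.0 / n + 1e-12       # |x_n - y_n| -> 0 ...
    assert yn**2 - xn**2 >= 2.0             # ... but the squares do not
```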
15.2.10 EXERCISE
15.2.11 Proposition If f : A → R is uniformly continuous and (xn)n∈N is a Cauchy sequence in A, then (f(xn))n∈N is also a Cauchy sequence.
Proof
Given ε > 0, we use uniform continuity to find δ > 0 so that |x − y| < δ, x ∈ A,
y ∈ A together imply |f(x) − f(y)| < ε. Since (xn)n∈N is Cauchy, now choose
n0 ∈ N so that m, n ≥ n0 will force |xm − xn| < δ. Then m, n ≥ n0 also forces
|f(xm) − f(xn)| < ε and we have what we wanted.
The interesting question now is: is the converse true? And the irritating answer
is: sometimes. . .
15.2.12 Proposition Suppose that A is a bounded set and that f : A → R maps every Cauchy sequence in A to a Cauchy sequence; then f is uniformly continuous on A.
Proof
If not, then there is a positive number ε such that, no matter how we choose δ > 0,
there will be points of A within δ of one another whose f-values are at least ε apart.
In particular, for each positive integer n (and choosing δ = 1/n), there exist xn, yn in
A for which |xn − yn| < 1/n and yet |f(xn) − f(yn)| ≥ ε.
Since A is bounded, Bolzano–Weierstrass tells us that (xn)n∈N has a convergent
subsequence (xnk)k∈N converging to a limit ℓ. Since, for all k ≥ 1:
|ynk − ℓ| ≤ |ynk − xnk| + |xnk − ℓ| < 1/nk + |xnk − ℓ| → 0,
we see that (ynk)k∈N also converges to the same limit ℓ. If we now 'interleave' these
two sequences, we get
(xn1, yn1, xn2, yn2, xn3, yn3, · · · ),
again converging to ℓ (see Example 5.2.5), and therefore Cauchy, and yet
(f(xn1), f(yn1), f(xn2), f(yn2), f(xn3), f(yn3), · · · )
fails to be Cauchy since endlessly many pairs of its terms are at least ε apart: we
have thus achieved a contradiction.
15.2.13 Note If we were to drop the word ‘bounded’, this result would cease to
be true: for instance, we know that the x2 function on (the unbounded set) R
is not uniformly continuous, and yet it is easy to check (do so) that it preserves
Cauchyness.
The next theorem is generally viewed as the most important basic result about
uniform continuity. We'll offer two different proofs of it.
15.2.14 Theorem (the 'key theorem') A function that is continuous on a closed bounded interval [a, b] is uniformly continuous on [a, b].
Proof
If not, then there is a positive number ε for which no choice of δ > 0 succeeds; in
particular, for each n ∈ N the choice δ = 1/n fails, so there exist points
xn ∈ [a, b], yn ∈ [a, b] with
|xn − yn| < 1/n and yet |f(xn) − f(yn)| ≥ ε. ······ (1)
By Bolzano–Weierstrass, (xn)n∈N has a convergent subsequence, say
xnk → x ∈ [a, b] as k → ∞. ······ (2)
Now
|x − ynk| ≤ |x − xnk| + |xnk − ynk| → 0 as k → ∞
that is, ynk → x also (as k → ∞). By ordinary continuity, f(ynk) → f(x) as
k → ∞, and also (using (2)) f(xnk) → f(x) as k → ∞.
Subtract, and we get f(xnk) − f(ynk) → 0 as k → ∞.
This contradicts (1), and completes the proof.
and this will be less than M|f (x)−f (y)| provided that we can find a constant M that
is always bigger than |f (x) + f (y)|. Luckily, we can find some such constant since
the continuous function f will be bounded on the closed, bounded interval I. Then
forcing |f(x) − f(y)| to be smaller than ε/M will guarantee that |f^2(x) − f^2(y)| is
smaller than ε. Now let's write that out properly. . .)
Since f is uniformly continuous, it is certainly continuous on the closed,
bounded interval I, and therefore f itself is bounded: there exists K > 0 so that
|f(x)| ≤ K for all x ∈ I. Notice that (for any p and q)
|p^2 − q^2| = |p − q| |p + q| ≤ |p − q|(|p| + |q|).
Solution to 2:
With g : R → R defined by g(x) = x, it is really trivial that g is uniformly
continuous. Yet g^2 is now the x^2 function that we have shown not to be uniformly
continuous. So the boundedness of the interval I cannot be thrown away
in part 1.
Of course the key theorem is not able to prove a function to be uniformly
continuous on an unbounded interval; nevertheless, it can sometimes be employed
to carry out a significant part of that task; look first at the following:
15.2.16 EXERCISE Let I and J be two intervals that share an endpoint from
opposite sides: that is, I is either (−∞, b] or (a, b] or [a, b], while J is either [b, ∞)
or [b, c) or [b, c]. Let f : I ∪ J → R be uniformly continuous on I, and also
uniformly continuous on J. Show that f is uniformly continuous on I ∪ J.
Remarks
• This turns out to be valuable more often than you might expect, because the
most obvious reason why some function is uniformly continuous can vary
from one part of its domain to another. The above result allows us to ‘glue
together’ uniform continuity that has been ‘separately evidenced’ in different
parts of its domain. (Consider, as an illustration, the next example.)
• The only non-routine part of the proof is to show that nearby points x ∈ I
and y ∈ J have f -values that are suitably close together. To help with this, notice
that both x and b, and also b and y, will be nearby, and that
|f (x) − f (y)| ≤ |f (x) − f (b)| + |f (b) − f (y)|.
15.2.17 Example To prove that the formula f(x) = √x, x ∈ [0, ∞) defines a
uniformly continuous function on [0, ∞).
Solution
Since f is continuous on the closed, bounded interval [0, 1], the key theorem
provides evidence of its uniform continuity there. Also, an earlier example (15.2.3)
showed its uniform continuity on [1, ∞). Now we can appeal to 15.2.16 to deduce
that it is uniformly continuous on the union [0, 1] ∪ [1, ∞) = [0, ∞) as was
required.
15.2.18 EXERCISE Prove that the function f given by f(x) = ∛x, x ∈ [0, ∞) is
uniformly continuous on [0, ∞).
15.2.19 Examples To determine whether the following real functions are uni-
formly continuous on the intervals indicated.
1. On (0, ∞) we define f by the formula f(x) = 1/x.
2. On [1, ∞) we define f by the formula f(x) = 1/x.
3. On [0, 10] we define f by the formula f(x) = sin(cos^2(√(1 + x^3) + e^(x+x^4))).
4. On [0, 2] we define f(x) = ⌊x⌋, the floor of x.
5. On (0, 1] we define f(x) = (sin x)/x (you should assume that the limit of f(x) as
x → 0 is 1).
6. On [0, ∞) we define f(x) = e^x (you should assume basic facts about the
function ln, including that it is continuous on positive numbers).
Solution
1. The sequence (1/n)n≥1 in (0, 1) is Cauchy because it converges (to 0). Yet
(f(1/n))n≥1 is the sequence (n) and that is not Cauchy: indeed, it is not even
bounded. So f, since it does not preserve Cauchyness, cannot be uniformly
continuous.
then F will be continuous not only on (0, 1] but at 0 as well, that is, it is
continuous on closed, bounded [0, 1]. By the key theorem, F is uniformly
continuous on [0, 1] and, in particular, on its subset (0, 1]. Yet on this interval,
F and f are the same function – so f is uniformly continuous on its given
domain.
6. (After some trial-and-error along the lines of the roughwork thinking we did
towards showing that x^2 was not uniformly continuous), for each positive
integer n we try xn = ln(n) and yn = ln(n + 1). Then
|xn − yn| = ln(n + 1) − ln(n) = ln(1 + 1/n)
and, as n → ∞, this expression → ln(1) = 0 since ln is continuous. On the other hand,
|f(xn) − f(yn)| = |n − (n + 1)| = 1
which does not converge to 0. By the two-sequence lemma, f is not uniformly continuous.
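The same witness pair tabulates easily (a sketch only; the sampled range of n is our own choice):

```python
import math

# x_n = ln(n), y_n = ln(n+1): the gaps are ln(1 + 1/n) <= 1/n -> 0, yet
# the exponential holds the image points exactly one unit apart.
for n in range(1, 10_000):
    xn, yn = math.log(n), math.log(n + 1)
    assert yn - xn <= 1.0 / n + 1e-12
    assert abs(math.exp(yn) - math.exp(xn)) > 0.999
```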
Solution
1. Being uniformly continuous, f is certainly continuous. Since dividing among
continuous functions (but scrupulously avoiding division by 0) always gives
continuous functions, 1/f is continuous (on I). Since the interval I is bounded
and closed, 1/f is also uniformly continuous there by the key theorem.
2. For instance, on the closed unbounded interval I = [1, ∞), f : I → R
described by f(x) = 1/x^2 is non-zero and uniformly continuous (as a proof very
like that of Example 15.2.19, part 2, will show). But here, 1/f is the x^2 function
which we know how to prove to be not uniformly continuous.
3. For instance, on the bounded, non-closed interval (0, 1], it is very easy to
check that the function f (x) = x is uniformly continuous (and never exactly
zero). Yet, very much as we saw in an earlier example, its reciprocal, the x1
function, is not.
15.3.3 Proposition Every Lipschitz function is uniformly continuous.
Proof
Given ε > 0, define δ = ε/K where K is the constant in the Lipschitz definition.
Then
x, y ∈ I, |x − y| < δ ⇒ |f(x) − f(y)| ≤ K|x − y| < Kδ = ε.
15.3.4 Theorem: the bounded derivative test for uniform continuity Suppose
that the real function f is continuous on an interval I and differentiable at each
interior point of I, and that its derivative f'(x) is bounded. Then f is uniformly
continuous on I.
Proof
For any a < b in I the conditions of the first mean value theorem are met on the
subinterval [a, b], so there is a point c ∈ (a, b) such that
(f(b) − f(a)) / (b − a) = f'(c).
There is also a positive constant K such that |f'| < K at every point of the interval
(a, b), so
|f(b) − f(a)| ≤ K|b − a|.
Hence f is a Lipschitz function, and is therefore uniformly continuous.
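As a concrete instance of the test (the function sin is our own choice of example): |sin'(x)| = |cos x| ≤ 1 everywhere, so the test predicts that sin is 1-Lipschitz on R, and the uniform choice δ = ε should win the game at every point.

```python
import math, random

# |sin'(x)| = |cos x| <= 1, so the bounded derivative test predicts that
# sin is 1-Lipschitz, hence uniformly continuous with delta = eps.
random.seed(1)
for _ in range(10_000):
    x = random.uniform(-50.0, 50.0)
    y = random.uniform(-50.0, 50.0)
    assert abs(math.sin(x) - math.sin(y)) <= abs(x - y) + 1e-12
```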
15.3.5 Notes
15.3.9 Theorem A real function f on a bounded interval (a, b] can be extended continuously over [a, b] if and only if it is uniformly continuous on (a, b].
Proof
(I) Suppose firstly that f can be so extended: that is, there is a continuous function
F : [a, b] → R whose restriction to (a, b] is exactly f . By the key theorem, F is
uniformly continuous on [a, b] and therefore, in particular, uniformly continuous
on (a, b]. Yet F and f are identical on (a, b], so f is uniformly continuous.
(II) Conversely, suppose that f is uniformly continuous (on (a, b]). Choose a
sequence² (yn) in (a, b] whose limit is a. Since (convergent) (yn) is Cauchy, we
know from 15.2.11 that (f(yn)) is also Cauchy, and consequently converges to some
limit – let us call it ℓ.
Now consider any sequence (xn) in (a, b] that converges to a. If (as we did before)
we interleave the two sequences thus: (x1, y1, x2, y2, x3, y3, · · · ), we create a new
sequence converging to a, and therefore Cauchy, and we see from 15.2.11 that
(f(x1), f(y1), f(x2), f(y2), f(x3), f(y3), · · · )
is Cauchy and therefore has to tend to some limit (let us be cautious and call it ℓ′
for the moment). However, since the subsequence (f(y1), f(y2), f(y3) · · · ) must
also converge to ℓ′ but actually does converge to ℓ, the two numbers ℓ and ℓ′ are,
in fact, identical. Hence the 'complementary' subsequence (f(x1), f(x2), f(x3) · · · )
converges to ℓ also.
² for example, yn = a + (b − a)/(n + 1) would give one suitable choice.
What the last paragraph shows us is that the function F : [a, b] → R defined by
the formula
F(x) = f(x) for x ∈ (a, b]; F(a) = ℓ
possesses a limit as x → a, and that this limit is ℓ which equals F(a). Therefore F is
continuous at a, as well as (trivially) continuous everywhere else in [a, b]. Thus we
have managed to find a continuous extension of f over [a, b].³
15.3.10 EXERCISE Think how much⁴ you would need to modify that argument
in order to show that a real function defined on a bounded open interval (a, b) or,
indeed, on a finite union of bounded open intervals (a1 , b1 )∪(a2 , b2 )∪· · ·∪(an , bn )
can be continuously extended over the corresponding closed interval(s) if and only
if it is uniformly continuous.
³ Incidentally, the same 'last paragraph' also shows that F is unambiguously defined in the sense
that the number ℓ we selected to be the value of F(a) does not depend on how we chose the
sequence (yn): any different choice of (yn) would have resulted in exactly the same number ℓ.
⁴ The short answer is: not very much!
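The criterion meshes with Example 15.2.19 part 5: sin(x)/x is uniformly continuous on (0, 1] precisely because it extends continuously over [0, 1]. A numerical sketch of that extension (the dyadic sample points are our own choice):

```python
import math

# F extends f(x) = sin(x)/x from (0, 1] over [0, 1] by setting F(0) = 1.
def F(x):
    return 1.0 if x == 0 else math.sin(x) / x

# Continuity at 0: for small x > 0 we have 0 <= 1 - sin(x)/x <= x^2/6,
# so F(x) -> F(0) along any sequence x_n -> 0.
for n in range(1, 50):
    x = 2.0 ** (-n)
    assert abs(F(x) - F(0.0)) <= x * x / 6 + 1e-15
```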
.........................................................................
16 Differentiation — mean
value theorems, power
series
.........................................................................
16.1 Introduction
Recall Rolle’s theorem: a function that is continuous on a bounded closed interval,
differentiable on the corresponding open interval, and of equal value at the
endpoints must have zero derivative at one point (at least) in between.
Recall the first mean value theorem: a function that is continuous on a bounded
closed interval and differentiable on the corresponding open interval must, some-
where between the endpoints, have derivative equal to the average (the mean)
gradient of its graph across the entire interval.
Given the use of the word first in the title, it will hardly surprise anyone to learn
that there are other ‘mean value’ theorems. This chapter is going to look at some
of the others: what they say, why they are true and what use can be made of them.
This study will lead us into questions of how to represent ‘highly differentiable’
functions by power series and thus, inevitably, back to the foreshadowed theorem
on the differentiation of power series themselves.
16.2.1 Cauchy's Mean Value Theorem ('CMVT') Let f and g both be continuous
on [a, b] and differentiable on (a, b), with g'(x) non-zero at every point of (a, b).
Then there is (at least) one point c ∈ (a, b) such that
f'(c)/g'(c) = (f(b) − f(a))/(g(b) − g(a)).
Proof
We define a new function h by the formula h(x) = f(x) − λg(x) where the constant
λ will be chosen in such a way that RT can be applied to h.
First, how to choose λ? Certainly h will be continuous and differentiable where
f and g were, no matter how we decide to pick λ, so all we need to arrange is that
h(a) = h(b), that is
f(a) − λg(a) = f(b) − λg(b), that is, λ = (f(b) − f(a))/(g(b) − g(a)),
provided that the bottom line is non-zero. Fortunately, g(b) − g(a) cannot be zero
because, if it were, RT applied to g would tell us that g' was zero somewhere, which
is explicitly not the case.
Now that λ has been thus chosen, RT applied to h gives us the existence of
c ∈ (a, b) for which 0 = h'(c) = f'(c) − λg'(c) and so, again because g' cannot
be zero,
f'(c)/g'(c) = λ = (f(b) − f(a))/(g(b) − g(a)),
which is exactly the assertion of the theorem.
16.2.2 Remarks
1. In the special case where g(x) = x for all relevant x (and therefore g'(x) = 1,
which is certainly non-zero) we get, from CMVT, a point c ∈ (a, b) at which
f'(c) = (f(b) − f(a))/(b − a): that is, the first mean value theorem.
Proof
Case 1: if g(b) − g(a) ≠ 0 then essentially the proof we gave before still runs.
Case 2: if g(b) − g(a) = 0 then RT says there is a point c such that g'(c) = 0,
and then the result is immediate.
16.2.3 Example Let f be any function that is continuous on [0, 1] and differentiable on (0, 1), and n be any positive integer. We show that there must be a number
c in (0, 1) such that f'(c) = nc^(n−1)(f(1) − f(0)).
Roughwork
Only one function f is visible here, but the group of symbols nc^(n−1) should make
us suspect that 'the other function' is x^n. Certainly g(x) = x^n is continuous and
differentiable wherever we need it to be, and its derivative g'(x) = nx^(n−1) goes to
zero only at x = 0, that is, not anywhere in (0, 1) . . .
Solution
The CMVT does apply to the two functions f and g where g(x) = x^n, and it says
there is c ∈ (0, 1) such that
f'(c)/g'(c) = (f(1) − f(0))/(g(1) − g(0)) = (f(1) − f(0))/(1 − 0),
that is, f'(c) = nc^(n−1)(f(1) − f(0)), as required.
16.2.4 EXERCISE The following alleged proof of CMVT is incorrect. Find out
precisely why.
(If necessary, try running the argument of the purported proof on a couple of
simple functions such as f(x) = x^2 and g(x) = x^3 over [0, 1].)
'Since f satisfies the conditions of FMVT over the interval [a, b], we know that
there exists c ∈ (a, b) such that f'(c) = (f(b) − f(a))/(b − a).
'Likewise, since g satisfies the conditions of FMVT over the interval [a, b], we know that
there exists c ∈ (a, b) such that g'(c) = (g(b) − g(a))/(b − a).
'Dividing one by the other (and remembering that g(b) − g(a) cannot be zero, else
RT would give g' = 0 somewhere, contradiction) we get
f'(c)/g'(c) = (f(b) − f(a))/(g(b) − g(a))
as desired.'
One of the most immediate and useful applications of CMVT is a result that you
may very well have used already, called l’Hôpital’s Rule, for determining function
limits in the most awkward case where unaided common sense hits the nonsense
barrier of zero divided by zero.
16.2.5 L'Hôpital's Rule Suppose that f and g are both differentiable on an open
interval (p − h, p + h) centred on a real number p, that both f(p) and g(p) are zero,
that g' is non-zero here except possibly at p, and that f'(x)/g'(x) tends to a limit ℓ
as x → p. Then also
f(x)/g(x) → ℓ as x → p.
Proof
(I) For k positive and less than h, the two functions f, g satisfy the conditions
of CMVT on the interval [p, p + k]. Therefore there is a number c such that
p < c < p + k and
f'(c)/g'(c) = (f(p + k) − f(p))/(g(p + k) − g(p)) = f(p + k)/g(p + k).
As k → 0 (but through positive values) the fact that p < c < p + k gives us c → p
also, so (by hypothesis) f'(c)/g'(c) → ℓ. Therefore f(p + k)/g(p + k) → ℓ also.
Let x stand for p + k here. In the language of one-sided limits (which is what we
are presently speaking, since we have so far only considered points just to the right
of p), what this establishes is that
lim_(x→p+) f(x)/g(x) = ℓ.
(II) For k negative and lying between −h and 0, a virtually identical argument
upon the interval [p + k, p] shows that
lim_(x→p−) f(x)/g(x) = ℓ.
Combining the two one-sided limits, we conclude that
lim_(x→p) f(x)/g(x) = ℓ.
There are many different versions of this Rule, of which the two most obvious
are what we actually proved above for one-sided limits:
16.2.6 L'Hôpital's Rule: right-hand limits Suppose that f and g are both differentiable on an open interval (p, p + h) and continuous on [p, p + h), that f(p) and
g(p) are zero, that g' is non-zero on (p, p + h), and that f'(x)/g'(x) tends to a limit ℓ
as x → p+. Then also
f(x)/g(x) → ℓ as x → p+.
Now the left-handed variety of this resembles it so closely that it is barely worth
stating:
16.2.7 L'Hôpital's Rule: left-hand limits Suppose that f and g are both differentiable on an open interval (p − h, p) and continuous on (p − h, p], that f(p) and
g(p) are zero, that g' is non-zero on (p − h, p), and that f'(x)/g'(x) tends to a limit ℓ
as x → p−. Then also
f(x)/g(x) → ℓ as x → p−.
Here is another in which the control variable tends to infinity:
16.2.8 L'Hôpital's Rule: limits at infinity Suppose that f and g are both differentiable on an interval (a, ∞), that f(x) → 0 and g(x) → 0 as x → ∞, that g' is
non-zero on (a, ∞), and that f'(x)/g'(x) tends to a limit ℓ as x → ∞. Then also
f(x)/g(x) → ℓ as x → ∞.
Proof
We shall use 11.1.14 to switch between limits at infinity and limits at zero (and
there is no loss of generality in assuming a > 0).
Define two 'new' functions F, G on the interval (0, 1/a) by setting F(x) = f(1/x),
G(x) = g(1/x). Then F, G are differentiable and, if we additionally define
F(0) = 0, G(0) = 0, they also become continuous on [0, 1/a) (because their limiting
values, as x → 0+, are 0 and thus coincide with the values that we attributed to
them at 0). Furthermore, G'(x) = −g'(1/x) x^(−2) is non-zero, and
lim_(x→0+) F'(x)/G'(x) = lim_(x→0+) (−f'(1/x) x^(−2))/(−g'(1/x) x^(−2)) = lim_(x→0+) f'(1/x)/g'(1/x)
= lim_(t→∞) f'(t)/g'(t) = ℓ.
By a one-sided version of the Rule (16.2.6) that we have already established, that
gives
lim_(x→0+) F(x)/G(x) = ℓ,
which is equivalent to
lim_(x→∞) f(x)/g(x) = ℓ.
16.2.9 Notes
1. Resist the temptation to think (or write) that the essence of the Rule is that (in
the zero-over-zero case)
lim f(x)/g(x) = lim f'(x)/g'(x).
While it is true that l'Hôpital does say this, the main point is that if the second
limit exists, then so must the first. Their numerical equality is secondary to that.
2. One more procedural detail before we settle down to a batch of examples. The Rule is written as if you begin with knowledge of the limit of f′/g′ and proceed from there to knowledge of the limit of f/g, but that is not exactly what happens in practice. We actually begin with curiosity about the limit of f/g, turn it into curiosity about f′/g′, solve that question if we can, and feed it back into an answer for the limit of f/g.
3. The additional point is that if our first attack on the limit of f′/g′ also hits the nonsense wall of zero divided by zero, we ought not to give up the struggle: we should, instead, drive the process further into curiosity about the limit of f″/g″. If that is answerable, then the answer we get pans back to one for f′/g′ and, in turn, for f/g. If not, consider f‴/g‴, and so on. (Of course, if the derivatives are becoming unmanageable, this process should not be continued past the point of reasonable hopes.) Take care to check that the conditions of the Rule are fully satisfied each time you invoke it.
16.2.10 Example To determine, if it exists, the limit of (x⁴ − 16)/(x⁶ − 64) as x → 2.
Solution
An initial, common-sense attempt of replacing x by 2 gives the meaningless (but
encouraging) response of zero divided by zero, so an application of the Rule is
indicated.
Putting f(x) = x⁴ − 16 and g(x) = x⁶ − 64 we see that g′(x) = 6x⁵ is zero only at x = 0 so, if we operate over (say) the interval (1, 3) then that derivative is non-zero and

f′(x)/g′(x) = 4x³/6x⁵ = 2/(3x²)
16.2 CAUCHY AND L’HÔPITAL 283
whose limit is obviously 1/6. Therefore the limit also of f(x)/g(x) exists, and equals 1/6.
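As a quick numerical sanity check (our addition, not part of the text), we can sample the quotient at points approaching 2 and watch it settle on the value 1/6 that the Rule predicts; a short Python sketch:

```python
# Sample (x^4 - 16)/(x^6 - 64) near x = 2; the Rule predicts the limit 1/6.
# This is an illustration of the worked example, not a proof.
def q(x):
    return (x**4 - 16) / (x**6 - 64)

samples = [q(2 + 10.0**(-n)) for n in range(1, 7)]
assert all(abs(s - 1 / 6) < 0.02 for s in samples)   # all close to 1/6
assert abs(samples[-1] - 1 / 6) < 1e-5               # and closer as x -> 2
```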
16.2.11 Example (Assuming knowledge of how to differentiate basic exponential
and trigonometric functions), to determine whether or not the limit exists of
x(e^x − 1)/sin²x
as x → 0.
Roughwork
Putting x = 0 gives us zero divided by zero, which is not an answer, but suggests
we should try l’Hôpital.
Let

f(x) = x(e^x − 1), g(x) = sin²x.

Then (using product rule and chain rule)

f′(x) = xe^x + e^x − 1, g′(x) = sin 2x, and f″(x) = (2 + x)e^x, g″(x) = 2 cos 2x.

This time the limits (as x → 0) are 2 and 2, so we have ‘broken through the nonsense wall’.
Solution
Putting f(x) = x(e^x − 1), g(x) = sin²x we see that f′(x) = xe^x + e^x − 1, g′(x) = sin 2x and f″(x) = (2 + x)e^x, g″(x) = 2 cos 2x. All are visibly differentiable (and continuous).
Now on (−π/4, π/4) we have g″ non-zero, f′ and g′ are zero at 0, and f″(x)/g″(x) → 2/2 = 1 as x → 0. By the Rule, f′(x)/g′(x) → 1 also.
Secondly, on (−π/2, π/2) we have g′ non-zero except at 0, f and g are zero at 0, and f′(x)/g′(x) → 1 as x → 0. By the Rule, f(x)/g(x) → 1 also.
The desired limit does exist (and equals 1).
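Again as an unofficial numerical check (ours, not the book’s), the quotient can be sampled near 0:

```python
import math

# x(e^x - 1)/sin^2 x sampled near 0 should approach the limit 1 found above.
def q(x):
    return x * (math.exp(x) - 1) / math.sin(x)**2

for x in (0.1, 0.01, 0.001, -0.001):
    assert abs(q(x) - 1) < 0.06   # ever closer to 1 as x shrinks
```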
lim_{x→3} (e^x + (2 − x)e³)/(x − 3)².
16.2.14 EXERCISE
2. Re-work the problem by putting t = 1/x and so converting it into

lim_{t→0+} (π/2 − arctan(1/t))/t.
. . . and so on. It is routine to check that the first of these, p_1, has the same value and the same derivative as f had at a; the second, p_2, has the same value, the same first derivative and the same second derivative as f had at a; the third has additionally the same third derivative as f at a; and so on.
It would be nice to believe that these so-called Taylor polynomials were
approximating f better and better not only at a but on the interval as a whole.
16.3 TAYLOR SERIES 285
Unfortunately, this is not always true. However, it is true in many important cases.
This is what Taylor’s theorem is about: it sets out to examine how well the list of
polynomials that we just described approximates f over an interval of values of x
around x = a. There are many slightly different versions of it, but we’ll focus on
just one of them:
16.3.1 Taylor’s theorem Suppose that f can be differentiated k + 1 times on an interval containing the points a and x, and put

p_k(x) = f(a) + f′(a)(x − a) + (f″(a)/2!)(x − a)² + · · · + (f^(k)(a)/k!)(x − a)^k.

Then

f(x) = p_k(x) + (f^(k+1)(ξ)/(k + 1)!)(x − a)^(k+1)

for some ξ strictly between a and x.
Proof
Consider the function

F(t) = f(t) + f′(t)(x − t) + (f″(t)/2!)(x − t)² + · · · + (f^(k)(t)/k!)(x − t)^k

where t varies over the interval in question. The first nice thing about this function (please check this out) is just how its derivative simplifies:

F′(t) = (f^(k+1)(t)/k!)(x − t)^k

and the second nice thing (check this one also) is that F(x) − F(a) simplifies to f(x) − p_k(x).
Now with G(t) = (x − t)^(k+1), use the Cauchy mean value theorem on the functions F, G on the interval joining a and x. It tells us that, for some ξ strictly between a and x,

(F(x) − F(a))/(G(x) − G(a)) = F′(ξ)/G′(ξ)

that is,

(f(x) − p_k(x))/(−(x − a)^(k+1)) = ((f^(k+1)(ξ)/k!)(x − ξ)^k)/(−(k + 1)(x − ξ)^k) = −f^(k+1)(ξ)/(k + 1)!

and the result follows on multiplying across by −(x − a)^(k+1).
16.3.2 Notes
1. The series (of functions) whose partial sums are these Taylor polynomials is
known as the Taylor series of the function f at the point a.
2. For historical reasons, the special case a = 0 is named after Maclaurin as well
as after Taylor: Maclaurin’s theorem, Maclaurin polynomials, and so on.
3. Think about Taylor’s theorem as saying ‘original function = approximating
polynomial + error term’, where all three are functions of x, of course, and both
the polynomial and the error depend on k (on how far we have gone with the
approximation process). So the main point of the theorem above is to give us a
usable formula for the kth-stage error. Typical questions then are: does the
error always tend to zero, at least for each value of x across a range? Can we
make the error as small as we please over a range of x values simultaneously?
Determine a value of f to so-many decimal places (etc.)
4. The kth-stage error term is often called the remainder after k terms and denoted by R_k(x):
f (x) = pk (x) + Rk (x).
16.3.3 Example Assuming that ex is its own derivative, we use the theorem to
show that (for every real number x) ex is the limit (as k → ∞) of
1 + x + x²/2! + x³/3! + x⁴/4! + · · · + x^k/k!.
Solution
Since the exponential function equals all of its own derivatives, and since they all
take the value 1 at 0, it is easy to see that the formula displayed here is just p_k(x).
Our task, then, is only to show that the remainder tends to zero. Also, via Taylor’s
theorem,
R_k(x) = (e^ξ/(k + 1)!) x^(k+1)

(for some ξ between 0 and x) whose modulus is at most e^|x| |x|^(k+1)/(k + 1)!, which does indeed tend to zero (see paragraph 6.2.9).
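The convergence asserted here is easy to watch numerically; the following sketch (our addition, not the book’s) compares the partial sums with math.exp:

```python
import math

# p_k(x) = sum of x^j/j! for j = 0..k should approach e^x as k grows.
def p(k, x):
    return sum(x**j / math.factorial(j) for j in range(k + 1))

x = 3.0
errs = [abs(math.exp(x) - p(k, x)) for k in (5, 10, 20)]
assert errs[0] > errs[1] > errs[2]   # errors shrink as k grows
assert errs[2] < 1e-9                # and become very small indeed
```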
16.3.4 Example We use Taylor’s theorem to show that, for the interval
J = [−1000, 1000], we can find a polynomial that differs from sin x at each point
of J by less than 0.000001. (Assume standard facts about the trig functions.)
Solution
In the theorem, take f(x) = sin x, a = 0. All the derivatives of f are either ± sin x or ± cos x so they never exceed 1 in modulus. The remainder term

|R_k(x)| = |(± sin or ± cos)(ξ)| · |x|^(k+1)/(k + 1)!
16.4 DIFFERENTIATING A POWER SERIES 287
cannot exceed 1000^(k+1)/(k + 1)!, which tends to zero as before. Choosing a value k_0 of k that makes the latter expression less than 0.000001, we then see that |sin x − p_k0(x)| = |R_k0(x)| is less than 0.000001 at every point of J, that is, the Taylor polynomial p_k0(x) is as good an approximation to sin x as the question wanted.
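Finding a concrete k_0 is a job for a machine. Since 1000^(k+1) and (k + 1)! overflow floating point long before the bound gets small, the sketch below (our addition) compares logarithms, using math.lgamma for ln((k + 1)!):

```python
import math

# ln of the bound 1000^(k+1)/(k+1)!; note lgamma(n + 1) = ln(n!).
def log_bound(k):
    return (k + 1) * math.log(1000.0) - math.lgamma(k + 2)

# Smallest k whose bound beats 0.000001 -- this stage works on all of J.
k0 = next(k for k in range(1, 100000) if log_bound(k) < math.log(1e-6))
assert log_bound(k0) < math.log(1e-6)
assert log_bound(k0 - 1) >= math.log(1e-6)   # k0 really is the first such k
```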
Solution
Take f(x) = ln x, a = 1. The derivatives fall into an obvious pattern: f′(x) = x⁻¹, f″(x) = −x⁻², f‴(x) = +2x⁻³, f^(iv)(x) = −3! x⁻⁴ and, in general, we see that f^(k)(x) = (−1)^(k−1)(k − 1)! x^(−k). Setting a = 1 in these and appealing to Taylor’s theorem, we see that the Taylor polynomials take the form

0 + 1(x − 1) − (1/2)(x − 1)² + (1/3)(x − 1)³ − (1/4)(x − 1)⁴ + · · ·

and that

R_k(x) = (f^(k+1)(ξ)/(k + 1)!)(x − 1)^(k+1) = ± (ξ^(−k−1)/(k + 1))(x − 1)^(k+1)
where we are about to replace x by 1.12 and know, therefore, that ξ lies between 1 and 1.12. Thus the modulus of the remainder after k terms cannot be more than (1/(k + 1))(0.12)^(k+1). Experimenting now with a calculator, we soon find that k = 4 gives (1/(k + 1))(0.12)^(k+1) = 0.000005 approximately, which ought to be good enough for four decimal places:

p_4(1.12) = 0 + 1(1.12 − 1) − (1/2)(1.12 − 1)² + (1/3)(1.12 − 1)³ − (1/4)(1.12 − 1)⁴
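The arithmetic is easy to confirm in code (our check, not the book’s): p_4(1.12) should agree with ln 1.12 to within the remainder bound (0.12)⁵/5 ≈ 0.000005.

```python
import math

x = 1.12
p4 = (x - 1) - (x - 1)**2 / 2 + (x - 1)**3 / 3 - (x - 1)**4 / 4
bound = 0.12**5 / 5                   # the remainder estimate from the text
assert abs(p4 - math.log(x)) < bound  # the true error is under the bound
```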
2 that is, every function that can be repeatedly differentiated as often as you wish
What we now seek is a converse, to the effect that every convergent power series
gives, as its limit, an infinitely differentiable function. (The only proviso needed
this time is that the radius of convergence shall be greater than zero, and this is
obviously necessary since a series whose radius of convergence was exactly zero
would define a sum function only at x = 0, so the issue of differentiability would
simply not arise.) We started this investigation back in Chapter 14, from which the
following revision material is taken:
Given that the power series
a_0 + a_1x + a_2x² + a_3x³ + · · · + a_nxⁿ + · · ·
(and more such series, if the need arises) by exactly the same mechanical process.
We showed in Chapter 14 that all of these series also converge (absolutely) at every
point of (−D, D), and we expressed the aspiration that the sum function of the
derived series ‘ought’ to be the derivative f′(x) of f(x). It is time to show that this
is not mere wishful thinking.
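Before the proof, the aspiration can at least be sampled numerically. Taking the geometric series Σ x^k = 1/(1 − x) as an illustrative choice (ours, not the text’s), the derived series Σ k x^(k−1) should sum to the known derivative 1/(1 − x)²:

```python
# Inside the interval of convergence (|x| < 1) the term-by-term derivative
# of the geometric series should reproduce d/dx 1/(1-x) = 1/(1-x)^2.
x = 0.3
derived = sum(k * x**(k - 1) for k in range(1, 200))
assert abs(derived - 1 / (1 - x)**2) < 1e-12
```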
Proof
This is, by a good margin, the biggest and most demanding proof in the entire
text, so
• we shall split it down into (hopefully) more comprehensible chunks,
• do not worry unduly if you don’t fully understand it,
• do pay careful attention to what the theorem says, which is both useful and
natural, even if you decide to shelve the proof till later,
• do read through it carefully at some stage, because understanding why a result
works gives a deeper understanding of what it can do.
E(h) = (f(x_0 + h) − f(x_0))/h − g(x_0)
is zero.
Let ε > 0 be given. We need to show that, for all sufficiently small (non-zero)
values of h, |E(h)| < ε.
STEP 2: BREAK THE TASK UP INTO MANAGEABLE PIECES
For each positive integer N we need to think about the Nth partial sum of what is happening here, so let

S_N(x) = Σ_{k=0}^{N} a_k x^k,   T_N(x) = Σ_{k=N+1}^{∞} a_k x^k
and use this notation to decompose E(h) into three fragments (with the intention of handling each separately): E(h) = E_1(h) + E_2 + E_3(h), where

E_1(h) = (S_N(x_0 + h) − S_N(x_0))/h − S_N′(x_0),   E_2 = S_N′(x_0) − g(x_0), and
E_3(h) = (T_N(x_0 + h) − T_N(x_0))/h.
Fortunately, two of these three pieces are pretty easy to deal with. After all, S_N is just a polynomial, and polynomials surely are differentiable, so the limit (as h → 0) of E_1 is zero; continuing, S_N′(x_0) is the Nth partial sum of the series that defines g(x_0), so the limit of E_2 is also zero. The sole abiding difficulty is E_3.
so, if we can arrange that both x_0 and x_0 + h have modulus less than r:

|(x_0 + h)^k − x_0^k| = |h| · |(x_0 + h)^(k−1) + (x_0 + h)^(k−2)x_0 + · · · + x_0^(k−1)| ≤ |h| · k r^(k−1)

(using the triangle inequality yet again). Now we are fully prepared to look more closely at E_3:
E_3(h) = (1/h)( Σ_{k=N+1}^{∞} a_k(x_0 + h)^k − Σ_{k=N+1}^{∞} a_k x_0^k ) = (1/h) Σ_{k=N+1}^{∞} a_k{(x_0 + h)^k − x_0^k},
therefore

|E_3(h)| = |(1/h) Σ_{k=N+1}^{∞} a_k{(x_0 + h)^k − x_0^k}| ≤ Σ_{k=N+1}^{∞} |a_k| · k r^(k−1),
so the last summation on the line above is a tail of a convergent series, and therefore
must tend to zero (see 7.3.20) as N → ∞.
STEP 4: PUT THE PIECES TOGETHER (IN THE RIGHT ORDER)
Remember that ε > 0 was given some time ago. How small must we make
h in order that all the pieces of this jigsaw-puzzle of a proof shall come
together?
1. Because S_N′(x_0) is the Nth partial sum of the series whose limit (by definition) is g(x_0), there is a positive integer N_1 such that, for every N ≥ N_1, we will get

|E_2| = |S_N′(x_0) − g(x_0)| < ε/3.
2. Because Σ_{k=0}^{∞} k a_k r^(k−1) converges absolutely, there is a positive integer N_2 such that, for every N ≥ N_2, the ‘remainder’ of the (modulussed) series, Σ_{k=N+1}^{∞} |a_k| · k r^(k−1), will be less than ε/3.
3. Choose now and fix a value of N that is bigger than each of N_1, N_2.
4. Because S_N is merely a differentiable polynomial, the limit of E_1 is zero. Hence there is δ_1 > 0 such that 0 < |h| < δ_1 will guarantee that

|E_1(h)| < ε/3.
5. Because |x_0| < r < D, if we pick δ_2 = r − |x_0| then 0 < |h| < δ_2 will guarantee that (not only |x_0| but also) |x_0 + h| will be less than r: in consequence of which, the first round of estimates in STEP 3 will work.
6. So now the second round of estimates in STEP 3 gives

|E_3(h)| ≤ Σ_{k=N+1}^{∞} |a_k| · k r^(k−1) < ε/3,

and, for 0 < |h| < min{δ_1, δ_2}, the three fragments together give |E(h)| ≤ |E_1(h)| + |E_2| + |E_3(h)| < ε, as required.
16.4.2 Comments Why did we select a number r for which |x0 | < r < D?
• The superficial reason is that bad things can happen to power series at ±D, that
is, at the endpoints of the interval whose ‘radius’ is the radius of convergence of
the series. By selecting such an r and then only working inside [−r, r], we were
keeping these dangers at a small but safe distance away from our calculations.
• More precisely, the estimates taking place in STEP 3 were all rounded up, so to
speak, to the behaviour of the derived power series at the single point r. If the
power series had not converged at r, these estimates would have collapsed.
There was no guarantee that the derived series would converge at D itself: we
needed a point less than D upon which to hang these estimates.
• Furthermore, that same rounding-up process by which we estimated our
various power series by the absolutely convergent derived series at r shows that
they are all absolutely convergent also: and this we needed in order to be able to
rearrange them, which we did extensively in breaking up E(h) into
individually-handled fragments. Such a radical reorganisation of the terms of
those infinite series would have been illegal without a guarantee of their
absolute convergence.
.........................................................................
17 Riemann integration —
area under a graph
.........................................................................
17.1 Introduction
The words integrate, integral, integration occur very often in later school
mathematics with what seem to be, at first sight, two entirely different meanings.
Let us begin by reviewing the sorts of pre-university questions in this area that you
have certainly encountered many times before.
Interpretation
This means: find, by any means whatsoever (not excluding trial and error) another function g(x) whose derivative g′(x) is exactly the given f(x). (This is sometimes described informally as un-differentiation of f(x).)
Solution
You probably know tricks such as ‘integration by parts’ for tackling this sort of
question (and if so, just use them), but don’t undervalue trial and error either.
The presence of cos suggests that sin ought to be part of the answer, so x sin x is a
reasonable first guess. Differentiate (via the product rule) and see if we are close:
d/dx {x sin x} = x(sin x)′ + (sin x)(x)′ = x cos x + sin x.
Not bad, but we need to get rid of that last sin x. Of course, the derivative of cos x
is −sin x which would cancel it out, so . . . a second guess:
d/dx {x sin x + cos x} = x(sin x)′ + (sin x)(x)′ − sin x = x cos x + sin x − sin x = x cos x.
Success. An answer is g(x) = x sin x + cos x. Indeed, for any constant C that you
care to select, another answer is g(x) = x sin x + cos x + C since added-in constants
disappear under differentiation.
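A quick way to gain confidence in an un-differentiation is to differentiate it numerically; this sketch (ours, not the book’s) compares a central-difference derivative of g with f at a few points:

```python
import math

def g(x):
    return x * math.sin(x) + math.cos(x)   # the answer found above

def f(x):
    return x * math.cos(x)                  # the function we integrated

h = 1e-6
for x in (0.0, 0.7, 1.3, -2.0):
    slope = (g(x + h) - g(x - h)) / (2 * h)  # numerical derivative of g
    assert abs(slope - f(x)) < 1e-6          # matches f up to rounding error
```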
Undergraduate Analysis: A Working Textbook, Aisling McCluskey and Brian McMaster 2018.
© Aisling McCluskey and Brian McMaster 2018. Published 2018 by Oxford University Press
294 17 RIEMANN INTEGRATION — AREA UNDER A GRAPH
Interpretation
This means: find the area of the region of the coordinate plane lying under the
graph of f , above the horizontal axis, and between the vertical lines x = 1 and x = 6.
Solution
Even a very rough sketch graph will clarify what this region is: it consists of a right-
angled triangle with vertices at (1, 0), (3, 0), (3, 2) and a rectangle with vertices at
(3, 0), (6, 0), (6, 2), (3, 2). Primary school mathematics is perfectly good enough to
identify its area as 2 + 6 = 8.
(Incidentally, the terms definite and indefinite are often left out, and so may be
the phrase with respect to x provided that there are no other variables in play.)
Now for an exercise that brings the two ideas together:
17.1.3 Example C To determine the area of the region lying under the graph of
f(x) = x cos x, above the x-axis and between x = 0 and x = π/4.
Procedure
• First, find an indefinite integral g of f . We’ll lift our earlier answer from
Example A: g(x) = x sin x + cos x.
• Second, calculate the change in value of g from x = 0 to x = π/4; this is often denoted by [g(x)]_{x=0}^{x=π/4}:

[g(x)]_{x=0}^{x=π/4} = g(π/4) − g(0) = (π/4) sin(π/4) + cos(π/4) − 0 sin(0) − cos(0)
  = (π√2 + 4√2 − 8)/8.
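The closed-form answer can be checked against a crude numerical integration (our addition): a midpoint Riemann sum for x cos x over [0, π/4] should land on the same value, roughly 0.2625.

```python
import math

closed_form = (math.pi * math.sqrt(2) + 4 * math.sqrt(2) - 8) / 8

# Midpoint Riemann sum for f(x) = x cos x over [0, pi/4] with n slices.
n = 10000
w = (math.pi / 4) / n
mid_sum = sum(((k + 0.5) * w) * math.cos((k + 0.5) * w) * w for k in range(n))
assert abs(mid_sum - closed_form) < 1e-7
```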
8
(Figure: the graph of f(x) over the interval [a, b].)
1. A partition Δ of [a, b] is a finite set of elements in [a, b] that includes both the endpoints a and b. Since Δ is finite, we can always label its elements as x_0, x_1, x_2, · · · , x_n in such a way that

a = x_0 < x_1 < x_2 < · · · < x_n = b

and by default we shall always assume that this has been done. The effect of the partition is to divide up [a, b] into a list of subintervals

[x_0, x_1], [x_1, x_2], · · · , [x_{n−1}, x_n].
1. Given a partition Δ = {a = x_0 < x_1 < x_2 < · · · < x_{n−1} < x_n = b} of [a, b], since f is bounded on the whole of [a, b], it is certainly bounded on each of the subintervals [x_0, x_1], [x_1, x_2], [x_2, x_3], · · · , [x_k, x_{k+1}], · · · , [x_{n−1}, x_n] and possesses an infimum and a supremum on each of them. Put

m_k = inf{ f(x) : x_k ≤ x ≤ x_{k+1} },   M_k = sup{ f(x) : x_k ≤ x ≤ x_{k+1} }

for each k = 0, 1, 2, · · · , n − 1.
2. Notice that if f happens to have a smallest and/or a biggest value over
[xk , xk+1 ] then mk and/or Mk will be these values. In particular, this will
happen in cases where f is continuous and in cases where f is either increasing
or decreasing, and when it happens it simplifies the rest of the argument.
However, it fails to happen with many of the functions that we need to
consider.
(Figure: m_k and M_k as the infimum and supremum of f over the subinterval [x_k, x_{k+1}].)
3. The lower Riemann sum and the upper Riemann sum (for this function, this
interval and this partition) are
17.2 RIEMANN INTEGRABILITY 297
L(f, [a, b], Δ) = Σ_{k=0}^{n−1} m_k (x_{k+1} − x_k);
U(f, [a, b], Δ) = Σ_{k=0}^{n−1} M_k (x_{k+1} − x_k).

Notice that L(f, [a, b], Δ) ≤ U(f, [a, b], Δ) just because m_k ≤ M_k for all relevant k.
(Figure: a lower Riemann sum for f.)

(Figure: an upper Riemann sum for f.)
These sums have natural geometric interpretations (at least, in the case where
f (x) > 0 always) as the area of the largest ‘histogram’ figure (placed upon the
subintervals of the partition) that definitely fits under the graph, and the area
of the smallest ‘histogram’ figure (placed upon the subintervals of the
partition) that definitely fits over the graph. At this point, common sense
suggests that, whatever we eventually define the area under the graph to mean,
it should lie somewhere between these under- and overestimates.
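For a function that is increasing on [a, b] the infimum and supremum on each subinterval are simply the endpoint values, which makes the two sums easy to compute; a sketch under that assumption (general f would need genuine infima and suprema, which sampling cannot deliver):

```python
# Lower and upper Riemann sums for an INCREASING f, where m_k = f(x_k)
# and M_k = f(x_{k+1}) on each subinterval [x_k, x_{k+1}].
def lower_sum(f, xs):
    return sum(f(xs[k]) * (xs[k + 1] - xs[k]) for k in range(len(xs) - 1))

def upper_sum(f, xs):
    return sum(f(xs[k + 1]) * (xs[k + 1] - xs[k]) for k in range(len(xs) - 1))

xs = [0.0, 0.5, 1.2, 2.0, 3.0]    # a partition of [0, 3]
square = lambda x: x * x          # increasing on [0, 3]
L, U = lower_sum(square, xs), upper_sum(square, xs)
assert L <= 9.0 <= U              # the eventual integral of x^2 over [0, 3] is 9
```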
4. In any extended argument in which f or [a, b] doesn’t change, it is normal practice to stop labelling the lower and upper sums with them: so that L(Δ) and U(Δ) serve as abbreviations for L(f, [a, b], Δ) and U(f, [a, b], Δ).
17.2.3 Lemma If we add one extra point to a partition, it does not decrease the lower Riemann sum nor increase the upper Riemann sum. That is, if Δ+ is Δ together with one extra point y where, say, x_k < y < x_{k+1}, then

L(f, [a, b], Δ) ≤ L(f, [a, b], Δ+) and U(f, [a, b], Δ+) ≤ U(f, [a, b], Δ).
Proof
The change from L(f, [a, b], Δ) to L(f, [a, b], Δ+) only alters the individual summand m_k(x_{k+1} − x_k) to m′(y − x_k) + m″(x_{k+1} − y), where m′, m″ are the infima of f on the sub-subintervals [x_k, y] and [y, x_{k+1}]. Since, clearly, m′ ≥ m_k and m″ ≥ m_k, this is a nett increase or, more precisely, cannot be a nett decrease.
A similar analysis deals with the upper sum.
Proof
We can evolve from Δ to its refinement Δ′ in stages, adding one new point at a time. At each stage, the first lemma tells us that the lower sum increases (or stays still) and that the upper sum decreases (or stays still). Hence the final result.
Proof
What this says is that if Δ1 and Δ2 are any partitions at all of [a, b], then L(Δ1) ≤ U(Δ2). The proof consists of recalling that Δ3 = Δ1 ∪ Δ2 is a refinement of each of the given partitions, and then applying the Improved Lemma:

L(Δ1) ≤ L(Δ3) ≤ U(Δ3) ≤ U(Δ2).
17.2.6 Definition
1. Let us put A = {L(f, [a, b], Δ) : all partitions Δ of [a, b]}. From the proposition, the non-empty set A of real numbers is bounded above by any U(f, [a, b], Δ) and therefore has a supremum, a least upper bound (write it temporarily as J⁻) such that J⁻ ≤ U(f, [a, b], Δ) for every partition Δ.
In turn, the set B = {U(f, [a, b], Δ) : all partitions Δ of [a, b]} is a non-empty set of real numbers bounded below by J⁻, so it has an infimum, a greatest lower bound J⁺ such that J⁻ ≤ J⁺.
2. The numbers J⁻ and J⁺ are called the lower Riemann integral and the upper Riemann integral of f over [a, b], and are more usually denoted by ∫̲f and ∫̄f.
4. We call f Riemann integrable over [a, b] if ∫̲f = ∫̄f, and in that case, their common value is called the Riemann integral of f over [a, b] and written as ∫f. Sometimes, a more elaborated symbol such as ∫_a^b f or R∫f or ∫_a^b f(x) dx will be employed if we feel a need to stress which interval we are operating over, or which procedure (for there are others beyond Riemann’s) we are using, or which variable is associated with the horizontal axis.
5. Since Riemann’s is the only integration procedure (apart from naïve un-differentiation) being discussed in this chapter, we shall feel free to abbreviate Riemann integrable to integrable, Riemann integral to integral, and upper or lower Riemann sum to upper or lower sum when it makes the text easier to read.
17.2.7 Example To show that a constant function is Riemann integrable over any
closed bounded interval, and to evaluate the integral.
Solution
We consider f : [a, b] → R given by f(x) = C, x ∈ [a, b], where C is a constant. In this case, for any partition Δ,

L(f, [a, b], Δ) = Σ_{k=0}^{n−1} C(x_{k+1} − x_k) = C Σ_{k=0}^{n−1} (x_{k+1} − x_k) = C(b − a)

and, likewise, U(f, [a, b], Δ) = C(b − a). The sets we denoted by A and B in the definition paragraph above each consist of the single number C(b − a), so it is entirely trivial to determine inf and sup for them: ∫̲f = C(b − a) and ∫̄f = C(b − a). Therefore f is integrable, and its integral over [a, b] is C(b − a).
17.2.8 Example Given b > 0, and f defined by f(x) = x on the interval [0, b], to show that R∫f exists and equals b²/2.
Solution
For any partition Δ = {0 = x_0 < x_1 < · · · < x_n = b} of [0, b], the fact that f is increasing tells us that, on the typical subinterval [x_k, x_{k+1}], f(x_k) = x_k is the least value of f and f(x_{k+1}) = x_{k+1} is the greatest value of f: that is, m_k = x_k and M_k = x_{k+1}. Thus, typical lower and upper sums take the form

Σ_{k=0}^{n−1} x_k (x_{k+1} − x_k),   Σ_{k=0}^{n−1} x_{k+1} (x_{k+1} − x_k).
Consider now the special-case partition Δn all of whose subintervals have the same length b/n (where n is a particular positive integer). In this case, x_k is simply kb/n for each k, so the lower and upper sums become much more accessible to calculation:

L(f, [0, b], Δn) = Σ_{k=0}^{n−1} x_k (x_{k+1} − x_k) = Σ_{k=0}^{n−1} (kb/n)((k + 1)b/n − kb/n)
  = Σ_{k=0}^{n−1} (kb/n)(b/n) = (b²/n²) Σ_{k=0}^{n−1} k = (b²/n²)(n(n − 1))/2
  = b²(n − 1)/(2n) = (b²/2)(1 − 1/n)

and this is an increasing sequence with limit (and therefore supremum) b²/2. Of course, the Δn’s are only some of the possible partitions, so it is imaginable that the supremum of all their lower sums might be different from the supremum of this sample. Yet it certainly cannot be smaller:¹ so we learn that ∫̲f ≥ b²/2.
2
U(f, [0, b], Δn) = Σ_{k=0}^{n−1} x_{k+1} (x_{k+1} − x_k) = Σ_{k=0}^{n−1} ((k + 1)b/n)((k + 1)b/n − kb/n)
  = Σ_{k=0}^{n−1} ((k + 1)b/n)(b/n) = (b²/n²) Σ_{k=0}^{n−1} (k + 1) = (b²/n²)(n(n + 1))/2
  = (b²/2)(n + 1)/n = (b²/2)(1 + 1/n)

which is a decreasing sequence with limit (and therefore infimum) b²/2. The infimum of all the upper sums might conceivably differ from the infimum of this special sample, but it cannot be greater: therefore ∫̄f ≤ b²/2.
Bearing in mind that ∫̲f ≤ ∫̄f in all cases, we now deduce that ∫̲f = ∫̄f = b²/2, as expected.
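The equal-width computation above is easy to replicate (our sketch, not the book’s): for each n the lower and upper sums should come out as (b²/2)(1 − 1/n) and (b²/2)(1 + 1/n), squeezing b²/2 between them.

```python
# Lower and upper sums for f(x) = x over [0, b] using the equal-width
# partition with n subintervals, compared with the formulas in the text.
b = 4.0
for n in (10, 100, 1000):
    xs = [k * b / n for k in range(n + 1)]
    L = sum(xs[k] * (xs[k + 1] - xs[k]) for k in range(n))
    U = sum(xs[k + 1] * (xs[k + 1] - xs[k]) for k in range(n))
    assert abs(L - (b * b / 2) * (1 - 1 / n)) < 1e-9
    assert abs(U - (b * b / 2) * (1 + 1 / n)) < 1e-9
    assert L < b * b / 2 < U
```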
17.2.9 Remark It took us a full typed page of calculations to establish the integral
of the absurdly simple function f (x) = x. Your rational, entirely legitimate response
to that observation should be one of bitter disappointment, combined with a
determination to get access to better methods as soon as possible. The first step
in that direction is the following lemma, strongly reminiscent of the Cauchy
condition’s ability to detect the existence of a limit (for a sequence) without any
need to know what number that limit might turn out to be.
17.2.10 The Darboux criterion f is Riemann integrable over [a, b] if and only if, for each ε > 0, there exists a partition Δ of [a, b] such that U(f, [a, b], Δ) − L(f, [a, b], Δ) < ε.

Proof
If, for each ε > 0, such a partition exists, then our inequality

L(f, [a, b], Δ) ≤ ∫̲f ≤ ∫̄f ≤ U(f, [a, b], Δ) for all partitions Δ

says that ∫̲f and ∫̄f differ, if at all, by less than ε. Since the lower and upper integrals do not depend on ε, that can only happen if they are equal. Hence f is integrable.
Conversely, suppose that f is integrable. Then J = ∫̲f = ∫̄f is both the supremum of the lower sums and the infimum of the upper sums. Given ε > 0, it is therefore possible to find a partition Δ1 with L(Δ1) > J − ε/2, and a partition Δ2 with U(Δ2) < J + ε/2. Put Δ3 = Δ1 ∪ Δ2:

J − ε/2 < L(Δ1) ≤ L(Δ3) ≤ U(Δ3) ≤ U(Δ2) < J + ε/2.
Therefore the gap between L(3 ) and U(3 ) is smaller than ε, and the Darboux
criterion holds.
(Figure: Darboux — U(f, Δ) − L(f, Δ) is shaded.)
Although the criterion seems to concern only the existence of the integral of f ,
its numerical value quite often emerges from the same calculations. Note that:
17.2.11 Corollary If f is Riemann integrable over [a, b], then its integral is the unique number J such that

L(f, [a, b], Δ) ≤ J ≤ U(f, [a, b], Δ) for every partition Δ of [a, b].
Proof
The integral certainly does lie between every lower sum and every upper sum, by its definition. If it were not unique in this respect, there would have to be two distinct numbers J1 and J2 (say with J1 < J2) such that

L(Δ) ≤ J1 < J2 ≤ U(Δ) for every partition Δ.

Yet this implies U(Δ) − L(Δ) ≥ ε = J2 − J1 > 0 for every partition Δ, in contradiction to the Darboux criterion.
As a first indication of the usefulness of the Darboux test, here is a worked
example of a function that is not Riemann integrable. Given how labour intensive
Solution
Let Δ be absolutely any partition of [0, 5]. Since, in every interval, there are both rationals and irrationals, f will take both the value 2 and the value 3 somewhere in each of the subintervals [x_k, x_{k+1}] into which Δ carves up [0, 5]. So (for each k) m_k = 2 and M_k = 3. Hence²

L(f, Δ) = Σ 2(x_{k+1} − x_k) = 2 Σ (x_{k+1} − x_k) = 2(5) = 10.

Likewise, U(f, Δ) = 15. So the supremum of ‘all’ the lower sums is just the sup of the one single number 10, namely 10. Likewise, the infimum of ‘all’ the upper sums is 15.
We conclude that no partition Δ can force U(f, Δ) − L(f, Δ) to be smaller than 15 − 10 = 5, so Darboux alerts us that this function is not integrable. (Alternatively: we have just shown that ∫̲f = 10 ≠ 15 = ∫̄f, so ∫f does not exist via the definition.)
Clearly, the choice of the numbers 0, 5, 2 and 3 has no real bearing on the
outcome: a function that takes a constant value on the rationals and a different
constant value on the irrationals is not integrable over any non-degenerate interval.
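The point that refinement cannot help can be dramatised in code (our addition): since m_k = 2 and M_k = 3 on every subinterval, the Darboux gap is (3 − 2) × (total length) = 5 for every partition of [0, 5], however fine.

```python
# For the rationals-vs-irrationals function, m_k = 2 and M_k = 3 on every
# subinterval, so U - L = sum of (3 - 2)(x_{k+1} - x_k) = length of [0, 5].
def darboux_gap(xs):
    return sum((3 - 2) * (xs[k + 1] - xs[k]) for k in range(len(xs) - 1))

for n in (2, 10, 1000):                       # ever finer partitions
    xs = [5.0 * k / n for k in range(n + 1)]
    assert abs(darboux_gap(xs) - 5.0) < 1e-9  # the gap never shrinks
```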
Here is another worked example illustrating how we can use Darboux plus its
corollary (17.2.11) both to guarantee the existence of a Riemann integral and to
determine its numerical value.
Solution
For any positive h < 1 let us consider the partition Δh = {1, 2 − h, 2 + h, 4} (whose intention is to isolate the somewhat anomalous value x = 2 of the domain). We see that L(f, Δh) = 0 and that U(f, Δh) = 14h.
Firstly, given ε > 0, if we choose h to be, say, min{1/2, ε/15}, then L(f, Δh) and U(f, Δh) differ by less than ε, so Darboux guarantees that the Riemann integral ∫_1^4 f exists.
² Note that Σ (x_{k+1} − x_k) is always the total length of all the subintervals, that is, the length of the whole interval in question.
17.3 THE INTEGRAL THEOREMS WE OUGHT TO EXPECT 305
Secondly, now that the integral is known to exist, Corollary 17.2.11 observes that it is the unique number that lies between L(f, Δ) and U(f, Δ) for every partition Δ. Since 0 is the only number lying between L(f, Δh) = 0 and U(f, Δh) = 14h (for every h between 0 and 1) even for the particular partitions Δh that we have examined, ∫_1^4 f can only be zero.
Proof
Given ε > 0, use Darboux to find a partition Δ1 of [a, b] such that

U([a, b], Δ1) − L([a, b], Δ1) < ε/2

and a partition Δ2 of [b, c] such that

U([b, c], Δ2) − L([b, c], Δ2) < ε/2.

Then Δ3 = Δ1 ∪ Δ2 is a partition of [a, c], and

L([a, c], Δ3) = L([a, b], Δ1) + L([b, c], Δ2)

and

U([a, c], Δ3) = U([a, b], Δ1) + U([b, c], Δ2).
It follows that

L([a, c], Δ3) ≤ ∫_a^b f + ∫_b^c f ≤ U([a, c], Δ3).

Also, of course,

L([a, c], Δ3) ≤ ∫_a^c f ≤ U([a, c], Δ3).

Comparing these two displays, we see that ∫_a^b f + ∫_b^c f and ∫_a^c f differ, if at all, by no more than L([a, c], Δ3) and U([a, c], Δ3) differ: that is, by less than ε. Since they are independent of ε, which is arbitrary – and could therefore be made arbitrarily small – that can be true only if they are exactly equal.
It is very convenient to be able to jettison the requirement a < b < c from this
result, and a simple notational convention allows this to happen:
The convention is that, whenever ∫_a^b f exists, we set ∫_b^a f = −∫_a^b f (and ∫_a^a f = 0). The effect of this (rather artificial-seeming) convention is that the equality

∫_a^c f = ∫_a^b f + ∫_b^c f
now becomes true no matter how the three numbers a, b, c are arranged on the real line (always provided, of course, that the two integrals on the right-hand side do exist). For instance, if a = c, the equality merely says that 0 = ∫_a^b f + ∫_b^a f, that is, that ∫_a^b f = −∫_b^a f, which is correct by the convention. Again, if a < c < b then the equality ∫_a^c f = ∫_a^b f + ∫_b^c f decodes under the convention as ∫_a^c f = ∫_a^b f − ∫_c^b f, that is, ∫_a^b f = ∫_a^c f + ∫_c^b f, which is the originally established version of the theorem when the limits of integration occur in that order.
There is also a valid converse to 17.3.1: if a < b < c and f is integrable over [a, c],
then it is necessarily integrable also over [a, b] and over [b, c]. In fact, with really
no additional work we can prove something slightly more general:
Proof
Given ε > 0, first use Darboux to find a partition Δ of [a, b] for which U(f, [a, b], Δ) − L(f, [a, b], Δ) < ε. Of course, it is perfectly possible that Δ will not include the points c and d . . . but if we refine Δ by adding them in, the lower sum will increase or stay still, and the upper sum will decrease or stay still: so, after the refinement (if necessary) the gap between U(. . .) and L(. . .) will still be less than ε. For that reason, we may as well assume that this has been done already, and that Δ does include both c and d.
With that understanding, Δ′ = Δ ∩ [c, d] is now a partition of [c, d], and U(f, [c, d], Δ′) − L(f, [c, d], Δ′) is just the sum of those expressions

(M_k − m_k)(x_{k+1} − x_k)

for which the subinterval [x_k, x_{k+1}] happens to lie within [c, d].
(Figure: Darboux — over the subinterval [c, d] and over the whole interval [a, b].)
It is therefore ≤ the total of all such expressions across the whole of [a, b]; that is:

U(f, [c, d], Δ′) − L(f, [c, d], Δ′) ≤ U(f, [a, b], Δ) − L(f, [a, b], Δ) < ε,

and Darboux (applied on the interval [c, d]) shows that f is integrable over [c, d].
Proof
Given ε > 0 we use Darboux (twice) to find partitions Δ1, Δ2 of [a, b] such that

U(f, Δ1) − L(f, Δ1) < ε/2,   U(g, Δ2) − L(g, Δ2) < ε/2.

Putting Δ3 = Δ1 ∪ Δ2, refinement can only narrow these gaps, so also

U(f, Δ3) − L(f, Δ3) < ε/2,   U(g, Δ3) − L(g, Δ3) < ε/2.
Thinking back to the definitions of m_k and M_k, and necessarily now enhancing that notation to refer explicitly to the function it concerns, we have

m_k(f + g) ≥ m_k(f) + m_k(g) and M_k(f + g) ≤ M_k(f) + M_k(g).
Therefore

L(f + g, Δ3) = Σ_{k=0}^{n−1} m_k(f + g)(x_{k+1} − x_k)
  ≥ Σ_{k=0}^{n−1} (m_k(f) + m_k(g))(x_{k+1} − x_k)
  = Σ_{k=0}^{n−1} m_k(f)(x_{k+1} − x_k) + Σ_{k=0}^{n−1} m_k(g)(x_{k+1} − x_k)
  = L(f, Δ3) + L(g, Δ3),
and

U(f + g, Δ3) = Σ_{k=0}^{n−1} M_k(f + g)(x_{k+1} − x_k)
  ≤ Σ_{k=0}^{n−1} (M_k(f) + M_k(g))(x_{k+1} − x_k)
  = Σ_{k=0}^{n−1} M_k(f)(x_{k+1} − x_k) + Σ_{k=0}^{n−1} M_k(g)(x_{k+1} − x_k)
  = U(f, Δ3) + U(g, Δ3).
Consequently

U(f + g, Δ3) − L(f + g, Δ3) ≤ U(f, Δ3) + U(g, Δ3) − L(f, Δ3) − L(g, Δ3)
  = (U(f, Δ3) − L(f, Δ3)) + (U(g, Δ3) − L(g, Δ3)) < ε/2 + ε/2 = ε,

and it follows from Darboux that ∫(f + g) exists.
Also L(f, Δ3) ≤ ∫f ≤ U(f, Δ3) and L(g, Δ3) ≤ ∫g ≤ U(g, Δ3) so, adding,

L(f, Δ3) + L(g, Δ3) ≤ ∫f + ∫g ≤ U(f, Δ3) + U(g, Δ3) ........ (1)

while also

L(f, Δ3) + L(g, Δ3) ≤ L(f + g, Δ3) ≤ ∫(f + g) ≤ U(f + g, Δ3) ≤ U(f, Δ3) + U(g, Δ3) ........ (2)

Comparing (1) and (2): ∫f + ∫g and ∫(f + g) both lie in an interval of length less than ε, so they differ by less than the arbitrary ε and must therefore be equal.
Proof
• Case 1: C > 0. Given ε > 0, apply Darboux to f with the tolerance ε/C: there is a partition Δ of [a, b] such that U(f, Δ) − L(f, Δ) < ε/C. Now (for this same partition) m_k(Cf) = C·m_k(f) and M_k(Cf) = C·M_k(f), so

L(Cf, Δ) = Σ m_k(Cf)(x_{k+1} − x_k) = C Σ m_k(f)(x_{k+1} − x_k) = C·L(f, Δ)

and likewise U(Cf, Δ) = C·U(f, Δ). Therefore

U(Cf, Δ) − L(Cf, Δ) = C(U(f, Δ) − L(f, Δ)) < C · ε/C = ε,

showing via Darboux that Cf is integrable.
Secondly, now that we know ∫Cf exists,

L(Cf, Δ) ≤ ∫Cf ≤ U(Cf, Δ); ............ (3)

but also L(f, Δ) ≤ ∫f ≤ U(f, Δ) so, multiplying across by C, we find that C·L(f, Δ) ≤ C∫f ≤ C·U(f, Δ), that is,

L(Cf, Δ) ≤ C∫f ≤ U(Cf, Δ). ........ (4)

Comparing (3) and (4), we again find that the difference between C∫f and ∫Cf is less than arbitrary ε, so the two expressions must coincide.
• Case 2: C = 0. Here, Cf is a constantly zero function, whose integral is zero, so
the result is entirely trivial.
• EXERCISE: check out the details of Case 3: C < 0. Be aware that scaling by a negative number will swop over sups and infs: this time we have to anticipate m_k(Cf) = C·M_k(f), L(Cf, P) = C·U(f, P), and so on.
Here is perhaps the easiest of this set of theorems to believe and to prove:
Proof
For any partition P, any resulting subinterval [x_k, x_{k+1}] and any x in that subinterval, we know f(x) ≤ g(x). That feeds through the sups and infs to give us m_k(f) ≤ m_k(g) and M_k(f) ≤ M_k(g), feeds through the formation of sums to give us L(f, [a, b], P) ≤ L(g, [a, b], P) and U(f, [a, b], P) ≤ U(g, [a, b], P), and feeds through more sups and infs to provide the matching inequalities for the lower integrals and for the upper integrals. Since the functions are integrable, that is equivalent to ∫f ≤ ∫g, which is what we had to prove.
17.3.7 Corollary If K ≤ f(x) ≤ L for all x ∈ [a, b], where K and L are constants, then

K(b − a) ≤ ∫_a^b f ≤ L(b − a).
Proof
Immediate upon applying the theorem to f together with each of the constant
functions K and L (whose integrals we determined some time ago).
Proof
All we have to show is that |f| can be integrated: because then
(for all x) f(x) ≤ |f(x)| ⇒ ∫f ≤ ∫|f|, and
(for all x) −f(x) ≤ |f(x)| ⇒ −∫f = ∫(−f) ≤ ∫|f|
by the previous theorem. Then |∫f| is either ∫f or −∫f and, whichever of the two it is, it is ≤ ∫|f|.
Using Darboux again (and given ε > 0), the integrability of f says there is a partition P for which (in the usual notation)

U(f, P) − L(f, P) = Σ (M_k(f) − m_k(f))(x_{k+1} − x_k) < ε.
‘biggest’ values may fail to happen, and we shall need to settle for the second-best
option, the supremum. What that indicates (and you can prove it formally without
much difficulty) is that Mk ( f )−mk ( f ) is the supremum of the differences-in-value
|f (t) − f (u)| as t and u vary across the subinterval [xk , xk+1 ].
Although that is a slightly awkward way to think about Mk ( f ) − mk ( f ) most of
the time, it is the optimal approach to take to it in this particular proof, because
the inverse triangle inequality:

||p| − |q|| ≤ |p − q|,   p, q ∈ ℝ

gives us a neat way to connect these values for f and for |f|; look:

||f(t)| − |f(u)|| ≤ |f(t) − f(u)|,   all t, u in the subinterval.

Taking sups across that last line gives M_k(|f|) − m_k(|f|) ≤ M_k(f) − m_k(f), and therefore

Σ (M_k(|f|) − m_k(|f|))(x_{k+1} − x_k) ≤ Σ (M_k(f) − m_k(f))(x_{k+1} − x_k),

that is,

U(|f|, P) − L(|f|, P) ≤ U(f, P) − L(f, P) < ε.
Therefore, via Darboux once more, |f | is indeed integrable.
The last instalment in this catalogue of expected theorems is the one that says
that the product of two integrable functions is integrable. It is quite intricate to
prove this directly, so we shall instead sneak up on it from behind using the
following lemma as cover:
Proof
Think again about the quantity we denote by Mk −mk as being the supremum of all
the differences |f (t) − f (u)| as t and u vary over the kth subinterval of the partition.
We need to compare this quantity as calculated for f with the same quantity as
calculated for f 2 , and this – keeping in mind that f is bounded, so |f (x)| < K for
some constant K and for all x – turns out to be rather easy:
|f²(t) − f²(u)| = |(f(t) + f(u))(f(t) − f(u))| = |f(t) + f(u)| · |f(t) − f(u)|
 ≤ (|f(t)| + |f(u)|) · |f(t) − f(u)| ≤ 2K|f(t) − f(u)|.
Taking sups across that line, we find that (for each relevant k):
Mk ( f 2 ) − mk ( f 2 ) ≤ 2K(Mk ( f ) − mk ( f ))
17.4 THE FUNDAMENTAL THEOREM OF CALCULUS 313
and, in consequence,
U( f 2 , ) − L( f 2 , ) ≤ 2K(U( f , ) − L( f , )).
Now, given ε > 0, Darboux says that (when f is integrable) there is a partition P for which U(f, P) − L(f, P) < ε/(2K). Feed that into the previous line, and we get

U(f², P) − L(f², P) ≤ 2K · ε/(2K) = ε.
17.3.10 Theorem: multiplying two integrable functions Suppose that f and g are
both integrable over [a, b]. Then so is their product fg.
Proof
Forget Darboux for once: this is just basic school algebra. For any x ∈ [a, b], and
abbreviating f (x) to f and g(x) to g in order to minimise the clutter:
fg = ((f + g)² − (f − g)²)/4.

Now f + g and f − g are both integrable, hence (by the lemma) so are (f + g)² and (f − g)² and, finally, so is

((f + g)² − (f − g)²)/4,

whence the result.
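The "school algebra" identity doing all the work here is easy to spot-check; the following sketch (ours, and the choice of sin and exp is arbitrary) confirms it pointwise, which is all the proof needs in order to reduce products to sums, differences and squares:

```python
import math

f, g = math.sin, math.exp   # arbitrary illustrative functions

# fg = ((f+g)^2 - (f-g)^2)/4 at each point: products reduce to
# sums, differences and squares, all already known to be integrable.
for i in range(-50, 51):
    x = i / 10.0
    lhs = f(x) * g(x)
    rhs = ((f(x) + g(x))**2 - (f(x) - g(x))**2) / 4
    assert abs(lhs - rhs) < 1e-9
print("identity confirmed at 101 sample points")
```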
3 Very.
314 17 RIEMANN INTEGRATION — AREA UNDER A GRAPH
However, the theorems you have seen developing from it over the last several pages
now allow us to construct a much more efficient and easier calculation method, not
for all integrable functions, but for a huge range of commonly occurring ones. More
specifically, we now set out to investigate whether the method shown in Example
C in the introduction to this chapter is valid for the Riemann integral. The label
‘the fundamental theorem of calculus’ is used to refer to both of the theorems in
this section (and, on occasions, to other similar results).
Then:
1. F is continuous on [a, b].
2. If f is continuous at a point p of (a, b), then F is differentiable at p, and F′(p) = f(p).
3. If f is continuous on [a, b], then F′ = f everywhere in (a, b).
Proof
Since f is integrable, it must be bounded. Choose therefore a positive constant K
such that |f (x)| ≤ K always.
1. For any x ∈ [a, b) let positive h be small enough to ensure that y = x + h also belongs to [a, b]. Then

|F(x + h) − F(x)| = |∫_a^{x+h} f − ∫_a^x f| = |∫_x^{x+h} f|
 ≤ ∫_x^{x+h} |f|
 ≤ ∫_x^{x+h} K = K((x + h) − x) = Kh.

Since the final item tends to zero as h → 0+, we get the one-sided limit lim_{y→x+} F(y) = F(x).
Similarly, for any x ∈ (a, b], lim_{y→x−} F(y) = F(x).
The agreement of the two one-sided limits and of the value of F tells us that
F is continuous at each point of (a, b) (and we also got the correct one-sided
limits at a and at b, where it is only one-sided limits that are relevant). This
proves (1).
For small positive h,

F(p + h) − F(p) − h f(p) = ∫_p^{p+h} f(x) dx − ∫_p^{p+h} f(p) dx = ∫_p^{p+h} {f(x) − f(p)} dx.

(We’re writing in the (x) here just to stress that f varies across the interval of
integration, whereas f (p) does not.)
Next, take the modulus:

|(F(p + h) − F(p))/h − f(p)| = |∫_p^{p+h} {f(x) − f(p)} dx| / h
 ≤ (∫_p^{p+h} |f(x) − f(p)| dx) / h

(Consider now the supremum of all the values of |f(x) − f(p)| as x varies over [p, p + h]:)

 ≤ (∫_p^{p+h} sup |f(x) − f(p)| dx) / h = (h · sup |f(x) − f(p)|) / h
 = sup_{p≤x≤p+h} |f(x) − f(p)|.

Continuity of f at p makes this supremum tend to zero as h → 0+, which is equivalent to

lim_{h→0+} (F(p + h) − F(p))/h = f(p).

A similar argument with small negative h gives

lim_{h→0−} (F(p + h) − F(p))/h = f(p)

and completes the proof of part (2) since the two one-sided limits agree upon the number f(p).
3. This is immediate from part (2).
17.4.2 Comments
• We actually proved a bit more than we claimed in part 2 above. Provided that you interpret the phrase ‘F is differentiable on [a, b]’ to mean differentiability on (a, b) plus existence of both one-sided derivatives at the endpoints a and b, then our proof established that F was differentiable on the closed interval [a, b] rather than just on the open interval (a, b).
• Note that we just now showed that every continuous function is the derivative
of something. However, that is no guarantee that we can come up with a simple,
explicit formula for that something. Furthermore, the converse is false: there are
plenty of discontinuous functions that are derivatives of something.5
5 For instance, differentiate f(x) = x² sin(x⁻¹) for x ≠ 0, f(0) = 0, paying careful attention to exactly what happens at the difficult point x = 0, and you should find that the derivative exists at every point (including 0) but doesn’t even have a limit as x → 0.
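A quick numerical look at the footnote’s example may be persuasive (this sketch is ours, not the book’s; the formula for f′ away from 0 comes from the usual product and chain rules):

```python
import math

def fprime(x):
    # f(x) = x^2 sin(1/x): product and chain rules, valid for x != 0
    return 2*x*math.sin(1/x) - math.cos(1/x)

def quotient_at_zero(h):
    # (f(h) - f(0))/h = h sin(1/h)
    return h * math.sin(1/h)

# the difference quotients at 0 are squeezed to 0, so f'(0) = 0 ...
for n in range(1, 8):
    assert abs(quotient_at_zero(10.0**(-n))) <= 10.0**(-n)

# ... yet near 0 the derivative keeps swinging between values close to
# -1 and +1 (at x = 1/(k*pi) the cos term equals ±1), so it has no limit
vals = [fprime(1/(k*math.pi)) for k in range(100, 110)]
assert max(vals) > 0.9 and min(vals) < -0.9
print("f'(0) = 0 exists, but f'(x) has no limit as x -> 0")
```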
Proof
With F as defined in the previous theorem, notice that F − G is continuous on [a, b] and has zero derivative everywhere in (a, b). Therefore⁶ it is constant. In particular, F(b) − G(b) = F(a) − G(a), which we rearrange into

∫_a^b f = F(b) − F(a) = G(b) − G(a).
17.4.4 Comment If T S Eliot is right when he says that . . . the end of all our
exploring will be to arrive where we started and know the place for the first time,
then that is just about the point we have reached in our brief exploration of the
integral as defined by the Riemann process: for the previous result now justifies
the way in which you have been calculating integrals up to now. In particular, the
phrase ‘suppose that we can find’ is, in a way, liberating: when you are looking
for an expression whose derivative is the given function f , you are free to use any
sixth-form tricks, any trial-and-error or sheer guesswork process, even mindlessly
looking up a cookbook table of standard derivatives and integrals, so long as you
check that the derivative of the thing you ‘found’ actually is f – and this check
is almost always a routine process since differentiating, unlike un-differentiating,
is normally a pretty algorithmic business. After that, the actual calculation of the
integral is just arithmetic.
17.4.5 Examples Assuming that the following expressions are integrable over the
indicated intervals, calculate their integrals. (You should assume, where appropri-
ate, basic properties of trig, logarithmic and exponential functions.)
1. f(x) = xⁿ on [a, b] assuming that n ≠ −1;
2. f(x) = x²(5 + 2x³)^{3/4} over the interval [0, 1];
3. f(x) = sin²x cos³x over [0, π/6];
4. f(x) = x²e^{−x} on [0, ln 2].
Solution
1. Since the derivative of x^{n+1}/(n + 1) is exactly xⁿ, the answer is

[x^{n+1}/(n + 1)]_a^b = (b^{n+1} − a^{n+1})/(n + 1).
2. Trial and error (or a change of variable) leads to G(x) = (2/21)(5 + 2x³)^{7/4}, whose derivative is easily checked to be f(x). Thus the answer is

[G(x)]_0^1 = G(1) − G(0) = (2/21)(5 + 2)^{7/4} − (2/21)(5)^{7/4} = (2/21)(7^{7/4} − 5^{7/4}).
3. Change of variable and the black arts of trigonometry will provide one way to stumble into the function G(x) = (1/3) sin³x − (1/5) sin⁵x, whose derivative is easily seen to be equal to f(x). Thus the answer is
[G(x)]_0^{π/6} = G(π/6) − G(0)
 = (1/3) sin³(π/6) − (1/5) sin⁵(π/6) − 0 = (1/3)(1/8) − (1/5)(1/32) = 17/480.
4. Integration by parts (twice), or trial and error, produces G(x) = −e^{−x}(x² + 2x + 2), whose derivative is easily checked to be x²e^{−x}. Thus the answer is

[G(x)]_0^{ln 2} = G(ln 2) − G(0)
 = −e^{−ln 2}((ln 2)² + 2(ln 2) + 2) − {−e^{−0}(0² + 2(0) + 2)}
 = −(1/2)((ln 2)² + 2(ln 2) + 2) + 2
 = 1 − (1/2)((ln 2)² + 2 ln 2).
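All four answers can be cross-checked against a crude midpoint Riemann sum; here is such a check (ours, not part of the text — in example 1 we pick the particular exponent n = 3 on [1, 2]):

```python
import math

def midpoint_sum(f, a, b, n=20000):
    """Midpoint Riemann sum of f over [a, b] with n equal subintervals."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

cases = [
    # (integrand, a, b, closed-form answer from above)
    (lambda x: x**3, 1.0, 2.0, (2**4 - 1**4) / 4),                       # example 1, n = 3
    (lambda x: x**2 * (5 + 2*x**3)**0.75, 0.0, 1.0,
     (2/21) * (7**1.75 - 5**1.75)),                                      # example 2
    (lambda x: math.sin(x)**2 * math.cos(x)**3, 0.0, math.pi/6, 17/480), # example 3
    (lambda x: x**2 * math.exp(-x), 0.0, math.log(2),
     1 - 0.5 * (math.log(2)**2 + 2*math.log(2))),                        # example 4
]
for f, a, b, exact in cases:
    assert abs(midpoint_sum(f, a, b) - exact) < 1e-6
print("all four closed forms agree with the Riemann sums")
```

The agreement to many decimal places is the numerical echo of the fundamental theorem of calculus.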
It is easy to see that g is differentiable at each point of (0, 1] – just use the various rules of differentiation to find g′(x) there. With a bit more caution, you can also verify that g′(0) exists (and equals 0, if you’re curious about it). So the function g′ upon the interval [0, 1] can certainly be un-differentiated. Yet, look at the formula you get for g′ and you will see that it is unbounded, and therefore cannot be Riemann integrated.
In fact, much stranger things than that can happen. It is possible to define a
function that is differentiable everywhere in R, and whose derivative is bounded
everywhere in R, and yet that derivative is not Riemann integrable over a closed
interval. The construction used in the definition is seriously sophisticated: search
for Volterra’s function if you have a lot of time and patience to spare.
For our purposes, the main point here is that un-differentiability is not a good
enough reason to assert that a function can be integrated. Our final major task
for this chapter is to identify a good range of functions (but by no means all) that
definitely can.
Proof
If f : [a, b] → R is continuous then, by the key theorem on uniform continuity, it
is also uniformly continuous there. Let ε > 0 be given.
By definition, we can find δ > 0 such that any two points of [a, b] that are less
than δ apart have f -values that are less than ε apart. Choose any partition of
[a, b] whose subintervals are each shorter than δ. On each subinterval [xk , xk+1 ] the
continuous function f has biggest and smallest values (which will be the numbers
Mk and mk used in the Riemann integral’s construction) and they will necessarily
differ by less than ε, that is:
(for each k) M_k − m_k < ε, so

U(f, P) − L(f, P) = Σ (M_k − m_k)(x_{k+1} − x_k) < ε Σ (x_{k+1} − x_k) = ε(b − a).

Go back over the last paragraph and re-run it with ε replaced by ε/(b − a) (this kind of in-flight course correction in ‘epsilontics’ should be almost second nature to you by now) and the revised conclusion U(f, P) − L(f, P) < ε shows via Darboux that f is integrable.
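To watch the mechanism of this proof numerically, here is a sketch of ours for the continuous function sin on [0, 4], chosen because the sup and inf on each subinterval can be written down exactly:

```python
import math

def sup_inf_sin(l, r):
    # exact sup and inf of sin over [l, r]: attained at an endpoint,
    # unless an interior critical point (pi/2 or 3*pi/2) intervenes
    hi = max(math.sin(l), math.sin(r))
    lo = min(math.sin(l), math.sin(r))
    if l <= math.pi/2 <= r:
        hi = 1.0
    if l <= 3*math.pi/2 <= r:
        lo = -1.0
    return hi, lo

def gap(n, a=0.0, b=4.0):
    """U(sin, P) - L(sin, P) for the uniform partition with n subintervals."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        hi, lo = sup_inf_sin(a + k*h, a + (k+1)*h)
        total += (hi - lo) * h
    return total

assert gap(10) > gap(100) > gap(1000)   # finer mesh, smaller gap
assert gap(1000) < 0.02                 # Darboux's criterion is met
print(gap(10), gap(100), gap(1000))
```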
Now we know that it is OK to dump the phrase ‘assuming that the following
expressions are integrable’ out of the last batch of examples, because all the
expressions there presented actually were (fairly obviously) continuous.
There are also many functions that are not continuous but are nevertheless
Riemann integrable. Here is one source:
Proof
Suppose that f : [a, b] → R is monotonically increasing (the decreasing case is
very similar or, in that scenario, you could choose to look at −f which is then
increasing . . . ). If f is actually constant then we already know it is integrable, so
suppose not: that is, suppose f (a) < f (b). Also let ε > 0 be given, so that we are
ready to try Darboux again.
On each subinterval [xk , xk+1 ] created by an arbitrary partition , the increasing
function f has biggest value f (xk+1 ) and smallest value f (xk ) (which will be the
numbers Mk and mk ). So
U(f, P) − L(f, P) = Σ (M_k − m_k)(x_{k+1} − x_k) = Σ (f(x_{k+1}) − f(x_k))(x_{k+1} − x_k).

Pick a partition whose subintervals all have the same length (say, h) and we now get

U(f, P) − L(f, P) = Σ (f(x_{k+1}) − f(x_k))(x_{k+1} − x_k) = h Σ (f(x_{k+1}) − f(x_k)) = h(f(b) − f(a)).

Now reverse-engineer the last paragraph by choosing h = ε/(2(f(b) − f(a))), and we find that

U(f, P) − L(f, P) = h(f(b) − f(a)) = (ε/(2(f(b) − f(a)))) · (f(b) − f(a)) = ε/2 < ε.
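The telescoping at the heart of this proof can be watched numerically; in this sketch of ours the increasing function floor(3x) + x (an illustrative choice with jump discontinuities) is handled by uniform partitions exactly as the proof prescribes:

```python
import math

def f(x):
    return math.floor(3 * x) + x      # increasing, with jumps at 1/3 and 2/3

def gap(n):
    """U(f, P) - L(f, P) on [0, 1] with n equal subintervals."""
    # increasing function: inf at the left endpoint, sup at the right
    U = sum(f((k + 1) / n) for k in range(n)) / n
    L = sum(f(k / n) for k in range(n)) / n
    return U - L

# the telescoping in the proof: U - L = h*(f(1) - f(0)) exactly
for n in (10, 100, 1000):
    assert abs(gap(n) - (f(1.0) - f(0.0)) / n) < 1e-9
print([gap(n) for n in (10, 100, 1000)])   # shrinks like 4/n
```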
There are also plenty of integrable functions that are neither continuous nor
monotonic. Here is an exercise to help you find some of them for yourself.
f(x) = 0 if x ≠ c, f(c) = 1
where c is some particular point in [a, b]. Use Darboux to show that f is integrable
over [a, b] and check that its integral is zero.
(Paragraph 17.2.13 offers a useful approach. The cases where c equals either a or
b need a little extra attention.)
Extend this result (using whichever theorems help you) to show that a function
that is zero on a closed interval except at a finite number of particular points is
integrable.
Extend it further to verify that if two bounded functions f, g on [a, b] are equal
in value at all but a finite number of points, and one of them is integrable, then so
is the other one, and their integrals are equal.
17.4.10 EXERCISE
1. Given that f is bounded on [0, 2] and that, for each positive integer n, it is integrable over [0, 2 − 1/n], show that f is also integrable over [0, 2], and that

∫_0^2 f = lim_{n→∞} ∫_0^{2−1/n} f.
A final idea to examine in this chapter is a test that, in some sense, really
belongs in Chapter 14 since it concerns convergence of series, but which had to
wait until we had developed the idea of integration. There are, in fact, not very
many series problems for which it is useful; however, for those few, it is usually the
only reasonably obvious test that will work at all.
17.4.13 The integral test for series Suppose that f : [1, ∞) → R is continuous,
positive and decreasing.7 Then the following statements are equivalent:
1. the series Σ_{k=1}^∞ f(k) is convergent;
2. the sequence (∫_1^n f(x) dx)_{n∈ℕ} is convergent.
Proof
For the integral just mentioned, f(2) + f(3) + f(4) + · · · + f(n) is a lower Riemann sum, and f(1) + f(2) + f(3) + · · · + f(n − 1) is an upper Riemann sum (where n ≥ 1 is an integer). If, as usual, we denote by S_n the nth partial sum of the series Σ_{k=1}^∞ f(k), we therefore have:

S_n − f(1) ≤ ∫_1^n f(x) dx ≤ S_n − f(n).

This shows that if either of the sequences (∫_1^n f(x) dx), (S_n) is bounded above,
then so must the other be. Since both sequences are increasing, this is the same as
saying that if either of them is convergent, then so must the other be.
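The sandwich at the heart of the proof can be observed directly; in this sketch (ours, not part of the text) we take f(x) = 1/x², for which ∫_1^n f = 1 − 1/n exactly:

```python
def f(x):
    return 1.0 / (x * x)   # positive, continuous, decreasing on [1, oo)

for n in (2, 5, 10, 100, 1000):
    S_n = sum(f(k) for k in range(1, n + 1))
    integral = 1.0 - 1.0 / n    # exact value of the integral of x^-2 from 1 to n
    # lower Riemann sum <= integral <= upper Riemann sum, rearranged:
    assert S_n - f(1) <= integral <= S_n - f(n)
print("both sequences are bounded, so the series of 1/k^2 converges")
```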
17.4.14 Illustration An alternative proof that the harmonic series diverges: the
function f(x) = 1/x certainly is positive, continuous and decreasing on [1, ∞), and (as is well known) ∫_1^n f = [ln x]_1^n = ln n − ln 1 = ln n. Since the sequence (ln n)_{n≥1} is unbounded and therefore divergent, the integral test informs us that the series

Σ_{k=1}^∞ f(k) = Σ_{k=1}^∞ 1/k

is likewise divergent.
Does the series whose kth term is

a_k = 1/(k ln k)

converge or diverge?
Solution
In order to use the integral test, we need to consider the corresponding real
function f specified by
f(x) = 1/(x ln x).
Notice that this formula goes bad at x = 1 since ln 1 = 0, but that this does not
impede our progress since the series started at k = 2 (for the same necessitating
reason). We’ll therefore regard f as being defined on [2, ∞) (and we also need a
slightly modified version of the test, in which k = 1 and x = 1 are replaced by k = 2
and x = 2).
Now it needs a little bit of insight or experience (or luck) to notice that f (x) is
precisely the derivative of the function g(x) = ln(ln x), x ∈ [2, ∞). The rest of the
argument is routine:
∫_2^n f(x) dx = [g(x)]_2^n = ln(ln n) − ln(ln 2)
which is unbounded8 and therefore divergent. By the integral test, the given series
is also divergent.
Σ_{n=3}^∞ 1/(n ln n ln(ln n))
diverges.
17.4.17 EXERCISE
1. Show that

∫_{n₀}^∞ f ≤ Σ_{k=n₀}^∞ f(k) ≤ ∫_{n₀}^∞ f + f(n₀).
2. Estimate the sum of the (convergent) series Σ_{k=1}^∞ 1/k⁵ with an error less than 0.001.
17.4.18 Note It’s occasionally useful to notice that, in the context and notation
of the integral test, when the two (equivalent) conditions hold then f (x) has a
limit of 0 as x → ∞. For suppose that f : [0, ∞) → R is continuous,10 positive and
decreasing and that Σ_1^∞ f(n) converges. Then certainly f(n) → 0 as n → ∞. For any given ε > 0 we can therefore find n₀ ∈ ℕ such that f(n₀) < ε. Thinking now
of f (x) decreasing (and also positive) as the real variable x increases, we see that
x ∈ R, x ≥ n0 together imply that 0 < f (x) < ε. Hence f (x) → 0 as x → ∞, as
claimed.
8 (for instance) because, for any positive constant K, if we choose a value of n greater than e^{e^K}, then we shall get g(n) > K − ln(ln 2) > K.
9 Notice that the function (x ln x ln(ln x))⁻¹ is positive and decreasing on [3, ∞) but not on [2, ∞): indeed, it is not even defined at x = e.
10 actually, continuity does not play any role in this part of the argument.
.........................................................................
18 The elementary
functions revisited
.........................................................................
18.1 Introduction
One of the benefits of now having a logically watertight definition of integral is
that we can at last provide reliable definitions of the so-called elementary functions
such as ln x, ex and sin x, and prove that the basic properties of these entities that we
have been cheerfully using throughout – in order to enrich our library of examples
and exercises – do actually hold good in all circumstances. If it seems surprising
and even counter-intuitive that integration theory should be required for this task,
pause and think about the useful facts that the area under the graph of f(x) = 1/x between x = 1 and x = a is ln a, that the area under the graph of f(x) = eˣ between x = a and x = b is eᵇ − eᵃ (assuming that a < b), and that the number π that is
so critical to trigonometry is the area of a circle of unit radius. Area is evidently
quite central to how these functions operate, so perhaps it is more natural than
it initially seems that area (interpreted as integral) should provide a means for
defining them in a way that is consistent with intuition and common sense, but
not dependent on either.
For each t > 0 we define ln t = ∫_1^t f, where f(x) = 1/x for x > 0.
The fact that f is continuous on the interval from 1 to t guarantees that the integral exists, and the following details about the so-called natural logarithm function thus created are immediate:
Undergraduate Analysis: A Working Textbook, Aisling McCluskey and Brian McMaster 2018.
© Aisling McCluskey and Brian McMaster 2018. Published 2018 by Oxford University Press
326 18 THE ELEMENTARY FUNCTIONS REVISITED
1. ln 1 = 0;
2. if t > 1 then ln t > 0;
3. if t < 1 then ln t = ∫_1^t f = −∫_t^1 f < 0.
18.2.2 Lemma The function ln is differentiable at the point t (for each t > 0) and its derivative there is 1/t.
Proof
Immediate from the fundamental theorem of calculus.
Proof
It is easy to ‘see’ this from a sketch graph:
[Two sketch graphs of y = 1/x over [1, n]: one with inscribed rectangles of heights 1/2, 1/3, …, 1/n, illustrating ln n > 1/2 + 1/3 + ⋯ + 1/n; and one with circumscribed rectangles of heights 1, 1/2, …, 1/(n−1), illustrating ln n < 1 + 1/2 + 1/3 + ⋯ + 1/(n−1)]
but a more logically robust reason is the fact that the first and third items in that
display are exactly the lower and upper Riemann sums for f using the partition
1 < 2 < 3 < 4 < · · · < n of the interval [1, n].
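The Riemann-sum sandwich just described is easy to verify numerically (this sketch is ours, not the book’s):

```python
import math

# 1/2 + ... + 1/n  <  ln n  <  1 + 1/2 + ... + 1/(n-1),
# i.e. the lower and upper sums for 1/x on the partition 1 < 2 < ... < n
for n in (2, 10, 100, 1000):
    lower = sum(1.0/k for k in range(2, n + 1))   # inscribed rectangles
    upper = sum(1.0/k for k in range(1, n))       # circumscribed rectangles
    assert lower < math.log(n) < upper
print("sandwich holds; both sides grow without bound, so ln n -> infinity")
```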
18.2 LOGARITHMS AND EXPONENTIALS 327
18.2.4 Corollary ln x → ∞ as x → ∞.
Proof
From the Lemma, and the fact that the harmonic series diverges, we get¹ ln n → ∞ as n → ∞. Now for any real x we have x ≥ ⌊x⌋ so, since ln is increasing,² ln x ≥ ln⌊x⌋ and, letting x → ∞ (and consequently ⌊x⌋ → ∞ also) we get ln x → ∞.
Mildly reassuring though these details are, we are still missing the essential
point of what logarithms are for: their prime purpose, whether for calculation, for
algebraic simplification or for theoretical arguments, is to convert multiplication
into addition: ‘ln(xy) = ln x + ln y’. We next need to establish this fundamental
‘law’.
Proof
Consider any real constant a > 0, and define a real function g : (0, ∞) → R by
the formula
g(x) = ln(ax) − ln x.
Using the chain rule (and our known derivative of ln), it is easy to differentiate this
at any positive x:
g′(x) = a · (1/(ax)) − 1/x = 0.
It follows that g must be constant, and that its constant value is g(1) = ln a − ln 1 = ln a; that is, ln(ax) = ln a + ln x for every positive x which, a being arbitrary, is the required law.
Proof
(Using the theorem):
0 = ln 1 = ln(y · (1/y)) = ln y + ln(1/y).
1 See 2.9.9.
1 See 2.9.9.
2 If 0 < a < b then ln b − ln a = ∫_1^b (1/x) dx − ∫_1^a (1/x) dx = ∫_a^b (1/x) dx, which is positive.
18.2.7 Corollary 2 For any x > 0 and y > 0, ln(x/y) = ln x − ln y.
Proof
(Using the theorem and its first corollary):
ln(x/y) = ln(x × (1/y)) = ln x + ln(1/y) = ln x − ln y.
Proof
If x is a small positive number, then x⁻¹ is a large positive number. More precisely, as x → 0+, x⁻¹ → ∞ and (via 18.2.4) ln x⁻¹ → ∞. Consequently ln x = −ln x⁻¹ → −∞. (If the last step does not appear obvious, you can confirm
it from the definitions.)
Our intuitive picture of the graph of ln is now reasonably complete:
[Sketch: the graph of ln x — strictly increasing, through (1, 0), tending to −∞ as x → 0+ and to ∞ as x → ∞]
and an important detail emerges: the range of ln is the whole real line (−∞, ∞)
because, for any x ∈ R, we can use the behaviour of ln near to 0 and ‘near to ∞’ to
find a value ln a of ln that is less than x and a value ln b of ln that is greater than x.
Now since ln is continuous, the IVT tells us that x itself is a value of ln.
SUMMARY: ln is a (strictly increasing and therefore) one-to-one map from
(0, ∞) onto R = (−∞, ∞) and, of course, the vital thing about one-to-one onto
maps is that they possess inverses.
18.2.9 Definition We define a function, called the exponential function and (for
the moment) denoted by exp : R → (0, ∞), by declaring exp to be the inverse of ln.
We are reluctant to reach for the notation ln−1 since the risk of consequent
confusion is high. Instead, let’s concentrate on what inverse function means in
this case:
• For each t > 0 we have exp(ln t) = t.
• For each x ∈ R we have ln(exp x) = x.
At this point, look back to what we found out about inverses of continuous,
differentiable, strictly increasing functions in Chapters 8 and 12 (specifically,
paragraphs 8.6.8 and 12.2.16). From there, we know that exp is continuous and
differentiable, and we have a formula for its derivative at each point, namely:
at each p ∈ (0, ∞), exp′(ln p) = 1/ln′(p)

which simplifies, since ln′(p) = 1/p, to exp′(ln p) = p. Substituting x for ln p, that is, p = exp x, and noting that x ranges over the whole real line as p ranges over (0, ∞), that says

exp′(x) = exp x for every x ∈ ℝ
and we have recovered what is possibly the most important fact about exp: that it
is its own derivative. The other basic details follow easily enough:
18.2.10 Proposition
Proof
1. Its derivative ( = itself) is always positive. (Alternatively, appeal to 8.6.8 or
8.6.10.)
2. Firstly, if ε > 0 is given, then (using part 1) x < ln ε implies
0 < exp(x) < exp(ln ε) = ε. (Recall that exp(x) is always positive.) Secondly,
given K > 0, then x > ln K implies exp(x) > K for the same reasons.
3. Let p = exp x, q = exp y, that is, x = ln p, y = ln q. Then
x + y = ln p + ln q = ln(pq) and therefore exp(x + y) = pq = exp x × exp y.
4. Because ln 1 = 0.
The proposition guarantees that the overall appearance of the graph of exp is, as
is widely known:
[Sketch: the graph of eˣ — strictly increasing, through (0, 1), tending to 0 as x → −∞ and to ∞ as x → ∞]
18.2.11 EXERCISE
1. Use induction to check that exp operates through all finite sums, in the sense that exp(x₁ + x₂ + ⋯ + xₙ) = exp(x₁) exp(x₂) exp(x₃) · · · exp(xₙ) for all finite lists of real numbers x₁, x₂, x₃, · · · , xₙ.
2. Show that exp(n) = en for every positive integer n.
3. Show that exp(1/m) = e1/m for every positive integer m.
4. Show that exp(r) = er for every positive rational r.
5. Show that exp(q) = eq for every rational q.
The essence of that little investigation is that exp(x) and eˣ agree at all the values of x for which common sense tells you the meaning of eˣ; once x stops being rational, eˣ no longer possesses a ‘natural’ meaning, and this is nothing to do with e, for expressions such as 2^{√2} are also beyond the grasp of basic algebra – they require proper definition before we can study them with any degree of confidence.
18.2.12 Definition (Now that we have proper definitions of exp and ln,) for x ∈ R
and a > 0 we define
ax = exp(x ln a).
Notice first that in the special case where a = e, we have actually defined
ex to mean exp(x) because ln e = 1 (in turn, because exp(1) = e by definition).
As regards reconciling formal and informal definitions of powers of numbers
other than e, you may find the following ‘spiked’ version of the previous exercise
useful:
This time, the message is that wherever ax can be defined by common sense
and simple algebra (that is, whenever x is merely a rational number), then that
common-sense definition gives the same answer as the formal all-purpose defini-
tion set out above.
Of course, we now need to confirm that these general powers obey the familiar
index laws: but this is reassuringly straightforward:
Proof of 1.
aˣ aʸ = exp(x ln a) exp(y ln a)
 = exp(x ln a + y ln a)
 = exp((x + y) ln a)
 = a^{x+y}.
3 The point is that we don’t need ln and exp in order to define an when n is a positive integer,
it just means write a down n times and multiply the lot.
4 Again, we don’t need ln and exp in order to define an mth root for a when m is a positive
integer, it just means the (positive) number whose mth power is a.
5 Similar remarks.
6 Similar remarks.
18.2.16 EXERCISE
• For a given real number a, use the one-sided version of l’Hôpital’s Rule (see
paragraph 16.2.6) to determine
lim_{x→0+} ln(1 + ax)/x.
• Now use the sequence-based description of one-sided limits (see 10.3.6) to
deduce that
lim_{n→∞} ln(1 + a/n)/(1/n) = a.
• Lastly, use continuity of the exponential function to deduce that
(1 + a/n)ⁿ → eᵃ as n → ∞.
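The limit in the last bullet point can be watched converging (our sketch, not the book’s; the value a = 1.5 is an arbitrary choice):

```python
import math

a = 1.5
# (1 + a/n)^n creeps up on e^a as n grows
errs = [abs((1 + a/n)**n - math.exp(a)) for n in (10, 100, 1000, 10000)]
assert errs[0] > errs[1] > errs[2] > errs[3]   # steadily closer
assert errs[-1] < 1e-3
print(errs)
```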
18.2.17 EXERCISE Compute the Taylor series of ln x about x = 1, and confirm
that it converges (at least) everywhere in the interval (1/2, 3/2).
18.2.18 EXERCISE Compute the Taylor series of exp x about x = 0, and confirm
that it converges everywhere on the real line.
18.3.1 Definition
π = 2 ∫_{−1}^1 f₁ = 2 ∫_{−1}^1 √(1 − x²) dx.
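Taken literally, this definition already computes π; here is a midpoint Riemann-sum sketch of ours (not part of the text):

```python
def pi_estimate(n):
    """Midpoint Riemann sum for 2 * integral of sqrt(1 - x^2) over [-1, 1]."""
    h = 2.0 / n
    total = 0.0
    for k in range(n):
        x = -1.0 + (k + 0.5) * h
        total += (1.0 - x*x) ** 0.5 * h
    return 2.0 * total

est = pi_estimate(100000)
assert abs(est - 3.14159265358979) < 1e-5
print(est)
```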
18.3 TRIGONOMETRIC FUNCTIONS 333
[Diagrams: the upper unit semicircle y = f₁(x) and the lower unit semicircle y = f₂(x) over [−1, 1]; and the unit circle cut by a vertical chord PP′ meeting the x-axis at Q, with the sector POP′ shaded and the angle POQ marked θ]
18.3.2 Comments To see the roughwork that lies behind the next definition, take
a look at the second diagram above, where the unit circle (centre O, radius 1) is cut by the vertical chord PP′ that crosses the horizontal axis at Q (so all three points P, Q, P′ have the same first coordinate x), the sector POP′ (shaded) has area A and the angle POQ is designated θ. Allowing ourselves to think, for just one more paragraph, that we actually did understand basic trigonometry years ago, what is the relationship between x and A? Well, θ is the angle (within the acceptable range) whose cosine is x and, by the ‘½r²θ’ formula for sector area, A is ½ · 1² · (2θ) = θ. Therefore

A = cos⁻¹ x.
x√(1 − x²) + 2 ∫_x^1 √(1 − u²) du.
(We have had to use a letter other than x for the variable of integration, since x has
been assigned already to denote the first coordinate of P.)
Secondly, although the above diagram (and our thinking that it supported)
tacitly assumed that θ was less than a right angle, the displayed formula for area
A remains valid if θ lies between π/2 and π: for now, the sector area is twice the area under the graph of √(1 − x²) minus the triangular area POP′, and that minus is picked up by the fact that the first x in the displayed formula is now negative (please refer to the following diagram).
[Diagram: the obtuse case — Q now lies to the left of O (so x < 0), with the angle θ between π/2 and π and the chord PP′ cutting off the shaded sector]
7 Of course, it is common practice to use the notation arccos x instead of cos−1 x, but here we
have opted for the latter in order to stress the importance of invertible functions in this approach.
A(x) = x√(1 − x²) + 2 ∫_x^1 √(1 − u²) du,   −1 ≤ x ≤ 1.
18.3.4 EXERCISE
1. Verify that, on the open interval (−1, 1), A is differentiable and its derivative is

A′(x) = −1/√(1 − x²).

(You will need to use only the product rule and the chain rule to differentiate x√(1 − x²). Then express ∫_x^1 √(1 − u²) du as −∫_1^x √(1 − u²) du, and appeal to the fundamental theorem of calculus.)
2. Check that A(−1) = π and that A(1) = 0, and show that
18.3.5 Note The essence of the above Exercise is that A is the sort of function
to which we can apply the first mean value theorem: it is continuous on [−1, 1],
differentiable on (−1, 1) and its derivative is always less than zero here, so it is
strictly decreasing and therefore one-to-one, and its range is precisely the interval
[0, π]. Viewing it therefore as a map

A : [−1, 1] → [0, π],

we see that it possesses an inverse.
18.3.6 Definition The real function cos (cosine) is defined as follows (using the
notation from above):
1. For 0 ≤ x ≤ π , cos x = A−1 (x).
2. Then for −π ≤ x ≤ 0, cos x = cos(−x).
3. Then for each integer n, cos(x + 2nπ ) = cos x.
18.3.7 Remarks
1. Although it is not strictly part of the definition, you can follow the ‘evolution’
of cosine through points 1, 2 and 3 above by looking at the three phases of the
diagram supplied:
[Sketch graphs: cos x = A⁻¹(x) on [0, π]; its even extension to [−π, π]; and its 2π-periodic extension to the whole of ℝ]
2. The inequality −1 ≤ cos x ≤ 1 (for all real x) is built into the definition, and so
is cos(x + 2nπ ) = cos x (for all real x and integer n).
3. We also record that cos, when restricted to the interval [0, π ], has an inverse
cos⁻¹ : [−1, 1] → [0, π] (namely, the function A) whose derivative is

(cos⁻¹)′(x) = −1/√(1 − x²)

on the open interval (−1, 1).
Having formally defined one of the trig functions, we can now create the others
from it quite routinely:
18.3.9 Remarks
1. Again, it may be helpful to follow the evolution of sine through points 1, 2 and
3 above by looking at the three phases of the sketch graphs provided:
[Sketch graphs: sin x on [0, π]; its extension to [−π, π]; and its 2π-periodic extension to the whole of ℝ]
2. The inequality −1 ≤ sin x ≤ 1 (for all real x) is built into the definition, as are
the equalities sin(x + 2nπ ) = sin x (for all real x and integer n) and
(sin x)² + (cos x)² = 1 (for all real x).
3. It is, as you will almost certainly be aware, conventional to write sin x and cos x
rather than sin(x) and cos(x) (and similarly for the other trig functions, and
also for ln) provided that the argument x is a single letter, but beware of the
dangers of extending this custom to cases where the argument is
typographically complex. The symbol sin π x is capable of being interpreted
either as sin(π x) or as (sin π )x, so use bracketing to prevent the ambiguity. It
is also conventional (so long as n is a positive integer) to denote the nth power
of sin x and of cos x not as we have done above, but rather as sinn x and cosn x.
Clearly, one should carefully avoid creating confusion by doing this for
n = −1.
18.3.10 Theorem
Proof
1. For 0 < x < π , we need only invoke once more the theorem on differentiating
an inverse function to see that
cos′(x) = (A⁻¹)′(x) = 1/A′(cos x) = −√(1 − (cos x)²) = −sin x.
2. For −π < x < 0, the result follows from part 1 because we defined cosine as
an ‘even’ function and sine as an ‘odd’ function.
3. For all real x except multiples of π , the periodicity of both sine and cosine
extends the validity of parts 1 and 2.
4. When x is a multiple of π , we can ‘patch’ the desired equality by an appeal to
Theorem 12.3.20.
18.3.11 EXERCISE Starting with sin x = √(1 − cos² x) on the interval (0, π), show that sin′(x) = cos x for all real x. You may expect to have to extend the result from (0, π) to R in stages, as in the preceding theorem.
The addition formulas for sine and cosine (the identities for sin(x + y) and
cos(x + y)) are not very obvious in the present approach, but can be established
indirectly by one of the nicest instances of rabbit-out-of-hat mathematics (the sort
of argument that is clear afterwards, but completely invisible beforehand) that you
are ever likely to see:
Lemma 1 Suppose that f : R → R is twice differentiable and that
• f″(x) = −f(x) for every real x,
• f(0) = 0,
• f′(0) = 0.
Then f is the zero function.
Proof
Consider the function g(x) = f(x)² + f′(x)². Routine differentiation shows that g′(x) = 0 everywhere, so g is a constant function. Now the second and third bullet points tell us that its constant value is zero.
Lemma 2 Suppose that f : R → R is twice differentiable, that f″(x) = −f(x) for every real x, and that f(0) = a and f′(0) = b. Then f(x) = a cos x + b sin x for every real x.
Proof
Consider the function h(x) = f(x) − a cos x − b sin x. Apply Lemma 1 to h and we find that it is the zero function.
18.3 TRIGONOMETRIC FUNCTIONS 339
Theorem (the addition formulas) For all real numbers x and y:
1. sin(x + y) = sin x cos y + cos x sin y,
2. cos(x + y) = cos x cos y − sin x sin y.
Partial proof
Consider y as fixed for the moment and define f by f(x) = sin(x + y). Differentiating twice shows that f″(x) = −sin(x + y) = −f(x), and also notice that f(0) = sin y and f′(0) = cos y. By Lemma 2, f(x) = sin y cos x + cos y sin x for every real x. Since y was arbitrary, part 1 is established.
18.3.16 Note At this point, the tasks of defining and differentiating the functions
tan, sec, cot and cosec, and their inverses where appropriate, are pedestrian and
can safely be left unless and until there is a need for them.
18.3.17 EXERCISE
1. Verify that the function tan, defined (initially) on the interval (−π/2, π/2) by the formula

tan x = sin x / cos x,

is continuous and differentiable, with (positive) derivative (cos x)⁻², and has range R; also that its inverse arctan or tan⁻¹ : R → (−π/2, π/2) is continuous and differentiable, and that its derivative is given by

(tan⁻¹)′(x) = 1/(1 + x²).
1 − t/3 + t²/5 − t³/7 + t⁴/9 − ···,

1 − x²/3 + x⁴/5 − x⁶/7 + x⁸/9 − ···

and

x − x³/3 + x⁵/5 − x⁷/7 + x⁹/9 − ···

are absolutely convergent, the second one to some function f(x), on (at least) the interval (−1, 1).
340 18 THE ELEMENTARY FUNCTIONS REVISITED
f′(x) = 1 − x² + x⁴ − x⁶ + x⁸ − ··· = 1/(1 + x²)   (−1 < x < 1),

and hence

x − x³/3 + x⁵/5 − x⁷/7 + x⁹/9 − ··· = tan⁻¹(x).
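The displayed identity is easy to probe numerically. The sketch below (an illustration, not part of the text's argument) sums the first few hundred terms of x − x³/3 + x⁵/5 − ··· and compares the result with the library arctangent:

```python
import math

def arctan_series(x: float, terms: int = 200) -> float:
    """Partial sum of x - x^3/3 + x^5/5 - ... (valid for |x| < 1)."""
    total = 0.0
    for n in range(terms):
        total += (-1) ** n * x ** (2 * n + 1) / (2 * n + 1)
    return total

for x in (0.1, 0.5, -0.9):
    print(x, arctan_series(x), math.atan(x))  # the columns agree closely
```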
19 Exercises: for
additional practice
.........................................................................
These further exercises are presented broadly in line with the order in which their
associated material occurs in the main text, but you should be aware that analysis
is a profoundly interconnected subject, so that ideas from an earlier or a later
section than the one that seems to be central to a particular question may well
turn out to be valuable in crafting a good answer. Specimen solutions to these
problems are available to instructors via the publishers: please visit the webpage
www.oup.co.uk/companion/McCluskey&McMaster to find out how to seek access
to these.
1. How far along the list of numbers
1 − 1/3 − 1/9,   1 − 1/5 − 1/25,   1 − 1/7 − 1/49,   1 − 1/9 − 1/81, . . .
can we go and be certain that, from then on, all the numbers we find are
approximations to 1 whose errors are less than 10−6 ?
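As a numerical companion to this question (assuming, as the pattern suggests, that the nth listed number is 1 − 1/(2n + 1) − 1/(2n + 1)²), exact rational arithmetic locates the first index from which the error stays below 10⁻⁶:

```python
from fractions import Fraction

def error(n: int) -> Fraction:
    """|a_n - 1| for a_n = 1 - 1/(2n+1) - 1/(2n+1)^2 (assumed nth term)."""
    k = 2 * n + 1
    return Fraction(1, k) + Fraction(1, k * k)

tol = Fraction(1, 10 ** 6)
# the error decreases with n, so there is a single crossing point
print(error(499_999) < tol, error(500_000) < tol)  # → False True
```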
3. For the list of numbers
10 + 1/2,   10 + 2/5,   10 + 3/10,   10 + 4/17,   · · · ,   10 + n/(1 + n²),   · · ·
how far along should we go if we need to be sure that, from then on, each
number we encounter differs from 10 by less than 0.000 003?
4. Find a stage along the list of numbers
√(9 + 1⁻²),   √(9 + 2⁻²),   √(9 + 3⁻²),   √(9 + 4⁻²), . . .
Undergraduate Analysis: A Working Textbook, Aisling McCluskey and Brian McMaster 2018.
© Aisling McCluskey and Brian McMaster 2018. Published 2018 by Oxford University Press
after which we can be sure that each of these approximations to 3 will have
error less than 10−5 .
5. Prove, by the definition of limit of a sequence, that 2 − 2/n + 5/n⁴ → 2 (as n → ∞).
6. Use the definition of convergence of a sequence to a limit to prove that
(a) (7 − 5n³)/(2n³) → −5/2,
(b) 3 + 4/n − 5/n² → 3.
7. Use the definition of convergence to a limit to prove that
(a) 1/(5n − 1) → 0,
(b) 1/(n√n) → 0,
(c) 1/(n² − 30πn) → 0.
8. Let (xn)n∈N be a given convergent sequence of real numbers whose limit is ℓ. Prove, directly from the definition of convergence, that 6xn → 6ℓ.
9. Show via the definition of convergence that if xn ≥ 0 for every n ∈ N and √(4 + xn) → 2, then xn → 0.
10. (The arithmetic mean – geometric mean inequality, more briefly called the AM – GM inequality.)
(a) If x ≥ 0 and y ≥ 0, prove that √(xy) ≤ (x + y)/2.
(Hint: begin by noticing that (√x − √y)² ≥ 0.)
(b) Use part (a) to deduce that, for every four non-negative numbers w, x, y and z, we have ⁴√(wxyz) ≤ (w + x + y + z)/4.
(Hint: you can apply part (a) to w and x, and then to y and z. Can it then be applied to √(wx) and √(yz)?)
(c) Note that these are particular cases of a more general result: for any positive integer n and any list a1, a2, a3, · · · an of non-negative numbers, we have

ⁿ√(a1 a2 a3 · · · an) ≤ (a1 + a2 + a3 + · · · + an)/n.
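A quick numerical spot check of the general inequality in part (c) (illustration only; the sample lists are arbitrary):

```python
import math

def am(values):
    """Arithmetic mean."""
    return sum(values) / len(values)

def gm(values):
    """Geometric mean of non-negative numbers."""
    return math.prod(values) ** (1.0 / len(values))

for sample in [(1, 2, 3, 4), (5, 5, 5, 5), (0.1, 9.0, 2.5, 7.2)]:
    assert gm(sample) <= am(sample) + 1e-12  # AM-GM holds on each sample
    print(sample, gm(sample), am(sample))
```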
11. The harmonic mean of two positive numbers a and b is defined to be the
reciprocal of the arithmetic mean (that is, the average) of their reciprocals.
Investigate whether this is greater or smaller than their geometric mean √(ab).
12. Using various parts of the algebra of limits theorem, determine the limits (as n → ∞) of the sequences whose nth terms are as follows:
(a) (7n³ − 4n² + 5)/(2 + 2n − n³),
(b) 1/(n² + 2),
(c) (1 + 1/n − 2/n²)((3n + 1)/(2n² − 1))².
13. Use the algebra of limits to determine lim_{n→∞} an and lim_{n→∞} bn where
an = (3n + 4n² + 5n⁴)/(6 + 7n²)²,   bn = (23 − 7n + 2n²)((2 − n)/(5 − n²))².
14. Using various parts of the algebra of limits theorem, determine the limits (as n → ∞) of the sequences whose nth terms are as follows:
(a) 3 + 5/n,
(b) (6n + π²)/(5n),
(c) 2 + 2/n − 3/n⁴,
(d) (6n³ + 4n² − 1)/(17 − 7n + 2n³),
(e) (1 + n)/(1 + n + n²),
(f) ((3n + 2)/(4 − n))⁵.
15. Put xn = −1 − 3/n + 2/n² (for each positive integer n). By simplifying the difference xn+1 − xn, show that (xn) is an increasing sequence.
16. Show that the sequence (cn) described by cn = 2 + 1/n − 1/n² is decreasing provided that n ≥ 2.
17. Let us denote (for each positive integer n) by xn the number 4 + 5/n − 2/n². Show that every xn satisfies the inequality −12 ≤ xn ≤ +12. By simplifying the difference xn+1 − xn, show that (xn) is a decreasing sequence.
18. Notice first that 2 < 3, 4 < 5, 8 < 9, 16 < 17, . . . , 2n < 2n + 1.
Consequently, 1/2 > 1/3, 1/4 > 1/5, 1/8 > 1/9, 1/16 > 1/17, . . . ,
2−n > 1/(2n + 1). Now put
xn = 1/3 + 1/5 + 1/9 + 1/17 + . . . + 1/(2ⁿ + 1)
for each positive integer n. Show that the sequence (xn )n∈N is bounded. Also
check that it is increasing. Why must it converge?
19. Show that the sequence (n − √n)n∈N is increasing, and not bounded.
20. Explain (in terms of sequence limits) the meaning of the recurrent decimals
0.44444 . . . and 0.2136363636 . . . and express each of them as a rational
number (a fraction in the usual sense of that word).
21. Explain the meaning of (and evaluate as a fraction) the recurring decimal
1.281818181 . . .
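Both kinds of recurring decimal can be double-checked with exact rational arithmetic. In the sketch below (an illustration; the helper and its argument convention are our own), a decimal whose repeating block B has p digits and begins k digits after the decimal point equals its non-repeating part plus B/((10^p − 1)·10^k):

```python
from fractions import Fraction

def recurring(non_repeating: Fraction, block: int, period: int, offset: int) -> Fraction:
    """non_repeating + block / ((10**period - 1) * 10**offset)."""
    return non_repeating + Fraction(block, (10 ** period - 1) * 10 ** offset)

print(recurring(Fraction(0), 4, 1, 0))         # 0.4444...     → 4/9
print(recurring(Fraction(21, 100), 36, 2, 2))  # 0.21363636... → 47/220
print(recurring(Fraction(12, 10), 81, 2, 1))   # 1.2818181...  → 141/110
```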
22. Prove that a decreasing sequence that is bounded below must converge, and
that its limit is the infimum of the set of all its terms.
23. Use the squeeze to find the limits of the sequences whose typical terms are:
(a) xn = ⁿ√(4ⁿ + 6ⁿ),
(b) yn = (3n + 5 sin(n² + 2))/(1 − 6n).
24. Use the squeeze to show that the sequences whose nth terms are as follows are convergent:
(a) ((−1)ⁿ + 4n sin(n¹² − 2n⁷))/n² + 3n/(1 − 4n),
(b) ⁿ√(3ⁿ + 5ⁿ + 8ⁿ), assuming that ⁿ√a → 1 for each positive constant a. (We prove this result in paragraph 6.2.3.)
25. Find the limit as n → ∞ of the sequence (√(5n + 9) − √(5n + 4)).
26. Find the limit (as n → ∞) of √(3n + 2) − √(3n − 2).
27. We are given three sequences (an)n∈N, (bn)n∈N and (cn)n∈N and we are told only that an → ℓ, cn → 0 and |an − bn| ≤ |cn| for every n ∈ N. Prove that bn → ℓ.
28. Use the squeeze to show that the sequences whose nth terms are as follows are convergent:
(a) π/4 + (−1)ⁿ/√n,
(b) ⁿ√(1000 + 3ⁿ). You may assume that, for any positive constant k that you choose, we have ⁿ√k → 1.
29. (a) Let there be given a sequence (xn)n∈N for which the subsequence (x2n−1)n∈N of all odd-numbered terms and the subsequence (x2n)n∈N of all even-numbered terms both converge to the same limit ℓ. Prove that the entire sequence (xn)n∈N also converges to ℓ.
(b) For the sequence (an)n∈N described by
an = (3 + 7n − n²)/(2n² + n + 12) if n is odd,
an = ((0.7)ⁿ − 1)/((0.6)ⁿ + 2) if n is even,
use part (a) to prove that it converges and to determine its limit.
30. Consider the sequence (xn) described by: xn = 1 − 1/k if n is the kth prime number, and xn = 1 + 1/k if n is the kth non-prime positive integer. Write down the first twelve terms of the sequence (xn). How large a value of n will guarantee that |xn − 1| < 0.01?
31. Show that the sequences whose nth terms are as follows are unbounded:
(a) (−1)ⁿ√n,
(b) (n² + 1)/(n + 5).
32. Show that the sequences whose nth terms are as follows are unbounded:
(a) 3√n − 1000,
(b) (1 + n²)/(5 − 8n).
33. Show that the following sequences are unbounded:
(a) 2n + 5 + 8 sin(nπ/17),
(b) (1 − n²)/(1 + 2n).
34. Write down the total of the following list of numbers:
1 + 1/3 + 1/9 + 1/27 + . . . + 1/3ⁿ
for an arbitrary positive integer n. Using this (or otherwise) show that the
sequence (xn ) defined by the formula
xn = 1 + 2/3² + 8/3⁴ + 26/3⁶ + . . . + (3ⁿ − 1)/3²ⁿ
Show that 9 < xn ≤ 20 for all n and that (xn ) is a decreasing sequence. Then
show that it converges and determine its limit.
43. Does the following sequence converge? If so, what is its limit?
(√12,   √(12 + √12),   √(12 + √(12 + √12)),   √(12 + √(12 + √(12 + √12))), . . .)
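The list is generated by the recursion xₙ₊₁ = √(12 + xₙ) starting from √12; a fixed point must satisfy x² = 12 + x, whose positive root is 4. A few iterations (illustration only) make the limit plausible:

```python
import math

x = math.sqrt(12)
for _ in range(60):
    x = math.sqrt(12 + x)  # the recursion behind the displayed list
print(x)  # the iterates approach 4
```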
44. The sequence (dn )n∈N is defined recursively by the two formulae
d1 = 2/3,   dn+1 = (2 + 2dn)/(3 + dn)   (for each n ∈ N).
3 3 + dn
Show that
(a) 2/3 ≤ dn < 1 for all n ∈ N,
(b) (dn )n∈N is an increasing sequence,
(c) (dn )n∈N converges, and determine its limit.
45. The sequence (xn )n∈N is defined recursively by the two formulae
x1 = 2,   xn+1 = 2/(1 + xn)   (for each n ∈ N).
Obtain a formula for xn+2 in terms of xn . Think what this tells you about the
subsequence of odd-numbered terms, and about the subsequence of
even-numbered terms. Use Exercises 44 and 29 to determine the limit of the
sequence (xn )n∈N .
46. Give an example of two divergent series Σ an and Σ bn for which Σ (an + bn) is convergent.
47. Find (if it exists) the limit (as n → ∞) of
(a) ⁿ√(100 + sin(nπ/17)),
(b) 2ⁿ/(√(n!) + n!),
(c) ⁿ√(60.5ⁿ + 3ⁿ),
(d) ((n³ + n)/n³)^(2n²+n).
(Hint: the tricky part is to investigate (1 + 1/n²)ⁿ. Once you have shown that (1 + 1/n²)^(n²) converges, you know that it is bounded: there is a constant K such that (1 + 1/n²)^(n²) < K for all n; therefore (1 + 1/n²)ⁿ < ⁿ√K . . .)
48. Determine the limit as n → ∞ of the sequences whose nth terms are as given:
(a) (1 − 4/n²)ⁿ,
(b) (1 + 3/n)^(2n+5),
(c) 3ⁿ/(n! + n²),
(d) 5n²/(3n³) + (5n² − 1)/(3n³ + 2) + (5n² − 2)/(3n³ + 4) + (5n² − 3)/(3n³ + 6) + . . . + (5n² − n)/(3n³ + 2n).
49. What are the limits of the following sequences?
(a) ^(3n²+n−2)√123.47,
(b) ((n!)!)^(−1/n!), that is, the reciprocal of the (n!)th root of (n!)!,
(c) (3π)ⁿ/n!,
(d) 5^(2n+1)/(2n + 1)!,
(e) (1 + 3/n)ⁿ.
50. Determine the limits of the sequences whose typical terms are as presented below.
(a) (1 − 0.5/n)^(n²+n+10),
(b) √(n + 7) − √(n + 2),
(c) ^(2n+3)√(10ⁿ + 5).
51. Find the limit of the sequences whose nth terms are as given:
(a) (1 + 6/(5n))^(3n+17),
(b) (n/(n + 2))^(3n−1),
(c) n²/(n³ + 1) + (n² − 2)/(n³ + 4) + (n² − 4)/(n³ + 7) + · · · + (n² − 2n)/(n³ + 3n + 1).
52. Prove or disprove the following statements concerning a general sequence (xn)n∈N:
• If |xn| → √2 then either xn → √2 or xn → −√2;
• xn³ → 64 ⇒ xn → 4.
53. The following sequence (xn )n∈N :
0, 1, 1, 2, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, . . .
c1 = 2,   cn+1 = (2 + 2cn)/(3 + cn)   (for each n ∈ N).
Show that
(a) 1 < cn ≤ 2 for all n ∈ N,
57. We are given a bounded sequence (yn ) such that, for every constant K, the
sequence (sin(Kyn )) converges. Prove that (yn ) must also converge.
(Suggestion: try proof by contradiction.)
58. Given that Σ 1/n² converges, use the comparison test to prove convergence for
(a) Σ (5n − 1)/(3n³ + 1),
(b) Σ (5n + 1)/(3n³ − 1).
59. Given that the series Σ n^(−3/2) converges, use the comparison test to show that each of the following also converges:
(a) Σ (3n − 2)/(√n(2n² + 1)),
(b) Σ (n² + n + 1)/((√n)⁷ + 13).
60. Use the limit comparison test to decide, for each of the following series, whether it converges or diverges:
(a) Σ n^(−2 + 1/n),
(b) Σ (100^(1/n) + (n!)^(−1/n))/n^(1 + 2/n).
61. Use the limit comparison test to decide, for each of the following series, whether it converges or diverges:
(a) Σ (n³√n + 5)/(2n⁴ − 7),
(b) Σ (n³ + 5)/(2n⁴√n − 7).
62. Does the following series converge or diverge? Give reasons for your answer.
Σ (1 − 1/(2n + 1))^(n²).
Σ ((4n² − 1)/(9n² − 1))ⁿ tⁿ.
(Note that it is quite difficult to decide whether or not this series converges when t = 9/4 exactly, and you are not asked to investigate this.)
67. Does the series
Σ (n!)² 2²ⁿ/(2n)!
converge or diverge?
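For orientation (an illustration, not a solution): the typical term equals 4ⁿ/C(2n, n), and the ratio of consecutive terms is (2n + 2)/(2n + 1) > 1, so the terms increase and cannot tend to 0:

```python
import math

def term(n: int) -> float:
    """(n!)^2 * 2^(2n) / (2n)!, i.e. 4^n / C(2n, n)."""
    return 4 ** n / math.comb(2 * n, n)

values = [term(n) for n in range(1, 30)]
assert all(b > a for a, b in zip(values, values[1:]))  # strictly increasing
print(values[:5])
```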
68. For which t > 0 does this series converge?
Σ [n!(2n)!(3n)!/(6n)!] tⁿ
69. For which t > 0 does this series converge?
Σ [3ⁿ n²/(3ⁿ + 2)] tⁿ
70. For which positive values of t is the series Σ ((n + 2)!)² tⁿ/(2n + 1)! convergent, and for which is it divergent?
71. For which values of t is the series Σ ((n + 5)/n)^(n²+n) |1 − t|ⁿ convergent, and for which is it divergent?
72. Find a positive integer N so large that
Σ_{n=65}^{N} 1/n > 100.
Σ (−1)ⁿ (1 + 1/n)^(−n).
81. Given a sequence (xn)n∈N in the interval [−10, −2], show (using, for example, the direct comparison test) that Σ n^(xn) is convergent.
82. There are two sequences of positive real numbers (an)n∈N, (bn)n∈N. A subsequence (ank)k∈N of (an)n∈N satisfies the condition ank ≥ bk for every k ≥ 1 and the series Σ bn is divergent. Prove that Σ an is also divergent. Suggestion: if not, then the partial sums of Σ an would be bounded . . .
83. Putting tn = −1 + (−1)ⁿ/(3n + 1), show that Σ n^(tn) is divergent. Suggestion: use the result of Exercise 82.
84. Identify the domains of the real functions defined by the following formulae:
(a) arcsin(5 + 3x),
(b) (x − 1)/((x² − 49)(x² + 3x + 2)),
(c) ln(6 + x − x²),
(d) ln(10/(1 + x²)).
85. Determine the domains of the functions described by
(a) x/(x² + 5x − 50),
(b) arccos(eˣ),
(c) ln(x(4 − x)/3).
86. For the function f(x) = √⌊x⌋, where ⌊x⌋ denotes the floor (or integer part) of x, find two sequences (xn), (yn) such that xn → 4 and f(xn) → f(4), but yn → 4 and f(yn) ↛ f(4).
87. For the function defined by f(x) = ⌊x³⌋, find
(a) a sequence (xn) such that xn → 0 and f(xn) → f(0),
(b) a sequence (yn) such that yn → 0 and f(yn) ↛ f(0).
88. Prove that p(x) = x⁴ − 2x² + 17x − 12 defines a function p that is continuous at x = 4.
89. Show directly from the definition that the polynomial
is continuous at x = −3.
90. For the function f described by
f(x) = 20 − x² if x < 2,
f(x) = 1 + 2x + 3x² if x ≥ 2
is not continuous.
94. Verify that f is continuous at x = 2, where
f(x) = 1 + x if x is rational,
f(x) = 5 − x if x is irrational.
is continuous at 0.
97. Show that the function h given by
h(x) = 2 + 5x if x ∈ Q,
h(x) = 10 − 3x if x ∈ R \ Q
100. Suppose that B is a non-empty set of real numbers and that λ = inf(B).
Show that there is a sequence (xn ) of elements of B such that xn → λ.
101. Show that x⁵ + 15x − 20 has a root in [1, 2].
102. Show that the polynomial 6x⁴ − 8x³ + 1 has at least two roots in the interval (0, 2).
103. Prove that x⁶ − 5x⁴ + 2x + 1 = 0 has at least four real solutions.
104. Prove that the equation 4x⁵ − 8x³ + 4x − 1 = 0 has at least three positive solutions. It may be useful to evaluate the polynomial at x = 1/2.
105. Show that the graph of the function 7 sin x − 10 cos x − 4x crosses the x-axis
at least twice between 0 and π .
106. Show that the equation x⁴ + x³ − 8x² + 1 = 0 has four distinct real solutions.
107. Given that f : [0, π/2] → [0, 1] is continuous, prove (by considering the
function g(x) = f (x) − sin x) that there is a number c in [0, π/2] such that
f (c) = sin c.
108. Given that f : [0, 1] → [0, 1] is continuous, prove that there exists a number
c ∈ [0, 1] such that (f (c))2 + 2f (c) − 4c2 = 0.
109. (a) Suppose that f : [a, b] → R is continuous and never takes the value
zero. Prove that there is δ > 0 such that no value of f (x) lies in the
interval [−δ, δ].
(b) Show by example that the statement in part (a) ceases to be true if we
replace [a, b] by (a, b).
110. Given continuous f : [a, b] → R, show that there is a positive constant K such that the function √(K + f(x)) is defined everywhere on [a, b].
111. Given that f : [a, b] → R is continuous, show that
(a) there is a positive constant K such that the function ln(f (x) + K) is
defined everywhere on [a, b],
(b) there is a positive constant A such that the function arcsin(Af (x)) is
defined everywhere on [a, b].
112. Show by means of examples (preferably, simple ones) that
(a) A continuous function on a bounded open interval can fail to be
bounded,
(b) A continuous function on a bounded open interval, even if it is
bounded, can fail to have a maximum value and can fail to have a
minimum value,
(c) A continuous function on an unbounded closed interval can fail to be
bounded,
(d) A continuous function on an unbounded closed interval, even if it is
bounded, can fail to have a maximum value and can fail to have a
minimum value,
g(x) = f (x):
(a) What is the domain of g?
(b) What is the numerical value of g(2)? Investigate the limit of g(x) as
x → 2.
(c) What is the numerical value of g(0)? Why can we not investigate the
limit of g(x) as x → 0?
120. Use sequences to evaluate
lim_{x→10} (x³ − 1000)/(x⁴ − 10000).
where
j(x) = (x⁴ − 81)/(x³ − 27) while x ≠ 3,
j(x) = −1 if x = 3.
(b) lim_{x→4} (x³ − 64)/(x² − 16).
123. For the function f defined by
f(x) = (x² + 3x − 10)/(x³ + x² − 4x − 4) if x ≠ 2,
f(x) = 1/2 if x = 2
Then use the epsilon-delta definition of limit to determine lim_{x→1} (√x − 1)/(x − 1), that is, the gradient of the curve at (1, 1).
140. Assume throughout this question that x > 0. Verify that

|1/(√x + 3) − 1/6| < |3 − √x|/18 < |9 − x|/54.

Now show that (if also x ≠ 9)

(√x − 3)/(x − 9) = 1/(√x + 3).

The gradient of the straight line joining the point (9, 3) to a nearby point (x, √x) on the curve y = √x is (√x − 3)/(x − 9). Use the above roughwork and the epsilon-delta definition to evaluate the limit of this expression as x → 9 (that is, the gradient of the curve itself at x = 9).
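Numerically (illustration only), the difference quotient (√x − 3)/(x − 9) = 1/(√x + 3) does settle down to 1/6 as x → 9:

```python
import math

def gradient(x: float) -> float:
    """Difference quotient (sqrt(x) - 3) / (x - 9) for x != 9."""
    return (math.sqrt(x) - 3.0) / (x - 9.0)

for x in (9.1, 9.001, 8.999999):
    print(x, gradient(x))  # the values approach 1/6 = 0.1666...
```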
141. Suppose that f, g have domain D, that p is a limit point of D ∩ (p, ∞) and that (as x → p+) f(x) → ℓ and g(x) → m.
(a) Use the epsilon-delta description of one-sided limits to prove that f(x) + g(x) → ℓ + m (as x → p+),
(b) Use the sequence description of one-sided limits to prove that f(x) − g(x) → ℓ − m (as x → p+).
142. Show that left-hand limits preserve inequalities, in the following sense: if
f (x) ≤ g(x) for all x ∈ (a, b) and both f and g have left-hand limits at b,
then
lim_{x→b−} f(x) ≤ lim_{x→b−} g(x).
Prove that f has a limit as x → 3. Find the relation (in as simple a form as
possible, without modulus signs!) between a and b that is equivalent to the
modulus |f | being continuous at 3.
147. The function f specified by
f(x) = (x² − 4x + 3)/(x − 2) if x < 1,
f(x) = ax² + bx + 5 if 1 ≤ x ≤ 2,
f(x) = (x² + 4x − 3)/(x − 1) if 2 < x
f(x)/g(x) → ℓ/m as x → ∞.
152. Show that x/(1 + x)² tends to −∞ as x → −1. (You can use either the formal definition or the sequence description.)
153. Verify via the definition that lim_{x→2} (3x + 5)/(x − 2)² = ∞.
154. Let f, g have the same domain D of which p is a limit point, and suppose that (as x → p) f(x) → ∞ and g(x) → ℓ ∈ R. Prove that f(x) + g(x) → ∞ (as x → p).
155. Show that sin(πx)/(2x − 1)² tends to ∞ as x → 1/2. (Use the formal definition: given K > 0, find a positive number δ such that |x − 1/2| < δ guarantees that sin(πx)/(2x − 1)² > K.)
156. Show that
lim_{x→−1} (5x² + 3x)/(x + 1)² = ∞.
157. Use either the sequence description or the epsilon-delta description to explore the limit of f(x) = (2x² − 3)/(3x² − 1) as x → −1.
158. For the function f defined by f(x) = (1 + x + x²)/(2 + 3x + 4x²), show that the limit of f(x) as x → ∞ is 1/4 by two different methods:
• by the epsilon-style definition,
• by using the sequence description of such limits.
159. Suppose that f : (0, ∞) → R and that another function g : (0, ∞) → R is then defined by the formula g(t) = f(1/t). Prove that the following statements are equivalent (suggestion: use an epsilon-style argument):
(a) f(x) → ℓ as x → ∞,
(b) g(t) → ℓ as t → 0+.
160. Differentiate (within the appropriate domain) each of the following:
(a) (1 + x − x³)/(2 − x + x²),
(b) sin(eˣ)e^(sin x),
(c) √x ln x cos x,
(d) sin(ln(cos x)).
161. Differentiate the following functions (within their domains, and assuming
that ex , sin x, cos x and ln x have their well-known derivatives). (Do not
spend a lot of time simplifying your answers.)
(a) f(x) = eˣ ln x cos(2x),
(b) f(x) = (sin x + x³)/(cos x − x²),
(c) f(x) = (1 + x + eˣ)¹³,
(d) f(x) = e^(cos(ln x)),
(e) f(x) = x ln(1 + eˣ)/√(4 + x²).
162. Differentiate (with respect to x) the expressions
(a) (x² + sin x)/(x³ − cos x),
(b) (x⁴ − 12) sin⁹ x,
(c) ln(e²ˣ − e⁻²ˣ).
163. Differentiate, with respect to x, each of the expressions
(a) sin((1 + eˣ)/(1 − eˣ)),
(b) (1 + x²) ln(1 + x²),
(c) x e^(eˣ).
164. If f is differentiable at x = c, show that (f(c + h) − f(c − h))/(2h) possesses a limit as h → 0.
See if you can devise a function g that is not differentiable at x = 0 and yet (g(0 + h) − g(0 − h))/(2h) possesses a limit as h → 0.
165. Suppose that f is differentiable at c. Evaluate the following:
(a) lim_{h→0} (f(c + 2h) − f(c − 3h))/h,
(b) lim_{h→0} ((f(c + h))² − (f(c − h))²)/h.
(Notice that the top line in (b) is the difference of two squares.)
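For part (a), writing f(c + 2h) ≈ f(c) + 2h f′(c) and f(c − 3h) ≈ f(c) − 3h f′(c) suggests the limit 5f′(c); a small step confirms this numerically (f = sin and c = 1 are our own illustrative choices):

```python
import math

def quotient_a(f, c: float, h: float) -> float:
    """(f(c + 2h) - f(c - 3h)) / h, the expression in part (a)."""
    return (f(c + 2 * h) - f(c - 3 * h)) / h

c, h = 1.0, 1e-6
print(quotient_a(math.sin, c, h), 5 * math.cos(c))  # the two values are close
```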
166. If
f(x) = 1 + x + ax² while x < 4,
f(x) = 3 + bx + x² while x ≥ 4
is to define a function that is differentiable on R, find what numerical values a and b must have.
167. Given that the function
f(x) = ax³ + bx if x < 2,
f(x) = ax² + 5 if x ≥ 2
is differentiable at x = 2, determine the numerical values of the constants a and b.
168. The formula
f(x) = 3x² + ax + b if x < 1,
f(x) = 2x² + 2bx − a if x ≥ 1
defines a function f that is differentiable at x = 1. Evaluate the constants a and b.
169. Determine, if possible, the maximum and minimum values of the functions f(x) = cos(2x) − cos² x on [0, 3π/4] and g(x) = x/(x² + 4x + 9) on R.
170. Find the maximum value and the minimum value of the expressions on the
intervals indicated:
(a) x² ln x on [1/e, 1],
(b) eˣ − 2e²ˣ + e³ˣ on [−2, 2].
171. Suppose that f and g are continuous on [a, b] and differentiable on (a, b),
and that f (x) = g (x) everywhere in (a, b). Show that the graphs of f and of
g cannot intersect at two different points. (Hint: if they did, consider the
behaviour of the function h(x) = f (x) − g(x).)
172. Use Rolle’s theorem on the function f given by f(x) = x³(1 + sin x) on the interval [0, 3π/2], to see what it tells us about the equation

cos x/(1 + sin x) = −3/x.
173. Use Rolle’s theorem to show that the equation tan x = 2/x has a solution in the interval (0, π/2). (Suggestion: consider x² cos x.)
Show further that there is a sequence (cn)n∈N of numbers in the interval (0, π/2) such that, for each positive integer n,

tan cn = n/cn.
f(x) = (1 + x − x²)e³ˣ
|xn − xn+6 | < 7(0.6)n , |xn − xn+10 | < 4(0.7)n , |xn − xn+15 | < 9(0.8)n .
(b) If, for a given sequence (xn ), the subsequence (x2n ) and the subsequence
(x3n ) are both Cauchy, does it necessarily follow that (xn ) itself
is Cauchy?
189. Confirm that the series

Σ_{k=1}^{∞} (7 sin(k² + 3k) − 4 cos(2k² − 5))/(1.1)ᵏ
Verify that |xn − xn+1| ≤ e^(−(n+1)). Prove that (xn)n≥1 converges (by verifying that it is Cauchy).
191. Of the following three statements, at least one is true in general and at least
one is false. Give a proof for each that is true, and find a counterexample to
disprove each false one.
(a) If f : (a, ∞) → R is a continuous function on an unbounded open
interval (a, ∞), and (xn )n∈N is any Cauchy sequence of elements of
(a, ∞), then (f (xn ))n∈N must also be Cauchy.
(b) If f : [a, ∞) → R is a continuous function on an unbounded closed
interval [a, ∞), and (xn )n∈N is any Cauchy sequence of elements of
[a, ∞), then (f (xn ))n∈N must also be Cauchy.
(c) If f : R → R is a continuous function on the whole real line R, and
(xn )n∈N is any Cauchy sequence, then (f (xn ))n∈N must also be Cauchy.
192. A given series consists of non-negative terms. By bracketing these terms
together in a particular way, we can create a new series that converges. Prove
that the original series converges also (and to the same sum).
193. We consider the rearrangement of the alternating harmonic series in which
the positive terms are taken in pairs followed by one negative term, thus:
1/1 + 1/3 − 1/2 + 1/5 + 1/7 − 1/4 + 1/9 + 1/11 − 1/6 + 1/13 + 1/15 − 1/8 + · · ·   (∗)
Notice that if we bracket these terms together in threes, the nth bracket is
1/(4n − 3) + 1/(4n − 1) − 1/(2n).
Use this to show that (i) the bracketed series converges, and that (ii) series
(*) also converges.
194. We consider the rearrangement of the alternating harmonic series in which
each positive term is followed by three negative terms, thus:
1/1 − 1/2 − 1/4 − 1/6 + 1/3 − 1/8 − 1/10 − 1/12 + 1/5 − 1/14 − 1/16 − 1/18 + · · ·   (∗∗)
Notice that if we bracket these terms together in fours, the nth bracket is
1/(2n − 1) − 1/(6n − 4) − 1/(6n − 2) − 1/(6n).
Use this to show that (i) the bracketed series converges, and that (ii) series
(**) itself converges.
195. Use the ratio test of d’Alembert to find all values of x ∈ R for which the
following series converges:
Σ (2 − x)ᵏ/(5ᵏ(3k + 4)).
Σ ((n + 1)!)² 3^(2n−1) x^(2n)/(2n + 1)!
converges.
197. If x ∈ R and (for each k ∈ N)
ak = [k!(2k + 1)!/(3k − 1)!] (1 + 2x)ᵏ,
determine precisely the range of values of x for which the series Σ ak converges.
198. Determine the set of values of the real parameter t for which the following
series is convergent:
Σ (n + 2)ⁿ tⁿ/(3n + 1)ⁿ.
199. Use the nth root test to find all values of x ∈ R for which the following series
converges:
Σ (k/(k + 1))^(k²) 2^(2k) xᵏ.
200. Find all values of x ∈ R for which the series Σ yn converges, where:
yn = n xⁿ (1 + 1/n)^(−2n²).
201. Show that if a series Σ an is conditionally convergent (that is, convergent but not absolutely convergent) then both Σ an⁺ and Σ an⁻ diverge to ∞.
202. Devise a rearrangement of the alternating harmonic series that diverges
to ∞.
203. Given a completely arbitrary series Σ xk that is conditionally convergent, and a completely arbitrary real number ℓ, think how you could devise a rearrangement of Σ xk that converges to ℓ.
Suggestion: the key ingredient in finding one is that both Σ xk⁺ and Σ xk⁻ diverge to infinity. You might begin by taking just enough of the non-negative terms of the series to make the running total greater than ℓ.
204. Let s, t be distinct numbers in the interval (−1, 1). Recall that the geometric series Σ₀^∞ sⁿ converges to 1/(1 − s) (and likewise for t in place of s).
Write down the Cauchy product Σ₀^∞ cn of the two series Σ₀^∞ sⁿ and Σ₀^∞ tⁿ and simplify the expression you obtain for its typical term cn. From this, deduce that

Σ₀^∞ (s^(n+1) − t^(n+1))/(s − t)   converges to   1/((1 − s)(1 − t)).
205. Let −1 < s < 1. Write down the Cauchy product Σ₀^∞ cn of the series Σ₀^∞ sⁿ by itself, and simplify the expression you obtain for its typical term cn. Hence evaluate the sum of the series

Σ₀^∞ (n + 1)sⁿ.
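The sum the exercise is driving at, Σ (n + 1)sⁿ = 1/(1 − s)², can be checked numerically at sample values of s (an illustration, not a solution):

```python
def partial_sum(s: float, terms: int = 500) -> float:
    """Partial sum of (n + 1) * s**n for n = 0, 1, 2, ..."""
    return sum((n + 1) * s ** n for n in range(terms))

for s in (0.5, -0.3):
    print(s, partial_sum(s), 1.0 / (1.0 - s) ** 2)  # the columns agree closely
```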
eˣ = 1 + x + x²/2! + x³/3! + · · · = Σ₀^∞ xⁿ/n!,
calculate (and simplify as necessary) the Cauchy product of the power series
representations of ex and of ey , and confirm that it converges to the product
of the two functions.
208. Determine the radius of convergence of each of the following:
(a) (eˣ =) 1 + x + x²/2! + x³/3! + · · · = Σ₀^∞ xⁿ/n!,
(b) (sin x =) x − x³/3! + x⁵/5! − x⁷/7! + · · · = Σ₀^∞ (−1)ⁿ x^(2n+1)/(2n + 1)!,
(c) (cos x =) 1 − x²/2! + x⁴/4! − x⁶/6! + · · · = Σ₀^∞ (−1)ⁿ x^(2n)/(2n)!,
(d) Σ (30ⁿ + n³⁰)xⁿ,
(e) Σ (n!)xⁿ,
(f) Σ [n!(n + 1)!(n + 2)!/(3n + 1)!] xⁿ,
(g) Σ xⁿ/(1 + 3/n)^(2n²).
209. Given a, b ∈ R, show that
f : R → R, x → ax + b
is uniformly continuous on R.
210. Show that the function given by f (x) = x3 is not uniformly continuous on
any interval of the form [a, ∞).
211. Decide (with proof) whether the following functions are or are not
uniformly continuous on the indicated intervals.
(i) On (0, 1), f(x) = sin(1/x).
(ii) On (0, 1), f(x) = x sin(1/x).
(iii) On [0, ∞), f(x) = x√x.
212. Let a ∈ R be given. Find:
(i) a continuous function on [a, ∞) that is not uniformly continuous,
(ii) a continuous function on (a, ∞) that is not uniformly continuous,
(iii) a continuous function on (a, b) that is not uniformly continuous.
213. An interior point b of an interval I divides it into a left portion L and a right
portion R that intersect only at b. (So L takes one of the forms (−∞, b],
(a, b], [a, b] and R takes one of the forms [b, ∞), [b, c), [b, c].)
Given a real function f : I → R that is uniformly continuous on L and
uniformly continuous on R, show that f is uniformly continuous on the whole of I.
(b) On (−1, 1) we define f by the formula f(x) = 1/(1 − x²).
(c) On (0, π/2] we define f by the formula f(x) = (1 − cos x)/x².
(d) On (0, ∞) we define f(x) = ln x.
216. Show that the function f : [0, 1] → R described by f(x) = √(1 − x²) is uniformly continuous, but is not Lipschitz. (Suggestion: if there were a constant K such that |f(x) − f(y)| ≤ K|x − y| for all relevant x and y, put x = 1 and y = 1 − 1/n for each positive integer n and see what that tells you about K.)
217. Show that the function f described by the formula f (x) = x − x−1 is
uniformly continuous on [1, ∞), but not on (0, ∞).
218. (a) Suppose we know that a function (defined at least upon an interval of
the form [a, ∞) or (a, ∞)) possesses a limit as x → ∞. Show that there
is some interval of the form [b, ∞) upon which this function is
bounded.
(b) Show that f(x) = 3x + sin(√x) defines a uniformly continuous function on the interval (0, ∞). (Hint: use the result of part (a) upon the derivative of f.)
219. Show that the function f defined on R by the formula f(x) = (sin x)/(1 + x²) is uniformly continuous on the real line.
220. Prove that the function specified by f(x) = ³√(x²), x ∈ R, is uniformly continuous on the real line.
221. We are given that f : (a, ∞) → R is differentiable on its domain and that f′(x) → ∞ as x → ∞. Prove that f is not uniformly continuous on (a, ∞).
222. Show that the function f : (0, ∞) → R defined by
f(x) = (sin x)/(eˣ − e⁻ˣ)
223. Use the three methods indicated to show that the equation
(π/2) cos(πx/2) = (x + 1)e^(x−1)
‘Dividing one by the other (and remembering that g(b) − g(a) cannot be zero, else Rolle’s theorem would give g′ = 0 somewhere, contradiction) we get

f′(c)/g′(c) = (f(b) − f(a))/(g(b) − g(a))

as desired.’
(You might wish to try running the argument of the alleged proof on a couple of simple functions such as f(x) = x² and g(x) = x³ over [0, 1], and observe its failure.)
228. Use l’Hôpital’s rule to evaluate lim_{x→1} (x⁷ − 3x⁵ + 2)/(x⁵ + 2x³ − 3) and lim_{x→0} (sin x − x)/x³.
229. Use l’Hôpital’s rule to evaluate the following:
(a) lim_{x→π} (1 − e^(sin x))/(x − π),
(b) lim_{x→0} (1 − cos x + x ln(1 + x))/sin² x,
(c) lim_{x→0} (eˣ − 1 − x − (1/2)x²)/(x − sin x).
230. Evaluate lim_{x→−1} (eˣ − (x + 2)e⁻¹)/(x + 1)².
231. Evaluate lim_{x→0+} (x ln x).
(HINT: seek the limit of f(x)/g(x) where f(x) = ln x and g(x) = 1/x. You can assume that l’Hôpital’s rule works in the ‘plus or minus infinity over infinity’ case just as it does in the ‘zero over zero’ case – see also Exercise 234.)
232. Evaluate lim_{x→0} (x − arctan x)/x³.
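Numerical spot-checks for Exercises 231 and 232 (a Python sketch; the exact answers, via l'Hôpital's rule, are 0 and 1/3):

```python
import math

# Exercise 231: x * ln(x) -> 0 as x -> 0+ (slowly: ln x grows, but x wins).
# Exercise 232: (x - arctan x)/x^3 -> 1/3, matching arctan x = x - x^3/3 + ...

v231 = 1e-6 * math.log(1e-6)
v232 = (1e-3 - math.atan(1e-3)) / 1e-3 ** 3

print(v231, v232)   # small, and near 1/3
```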
233. Determine the following limits:
(a) lim_{x→4} (x − √x)/(4 − x),
(b) lim_{x→3} (e^x + (2 − x)e³)/(x − 3)²,
(c) lim_{x→∞} x((π/2) − arctan x).
234. (a) Given that 0 < ε < 1, |h − L| < ε and |m − 1| < ε, verify that
|hm − L| ≤ (2 + |L|)ε.
(b) Prove the following version of l’Hôpital’s rule: if f and g are both
differentiable (with g′ ≠ 0) on (a, b) and, as x → b⁻:
• f(x) → ∞,
372 19 EXERCISES: FOR ADDITIONAL PRACTICE
• g(x) → ∞ and
• f′(x)/g′(x) → ℓ,
then also
• f(x)/g(x) → ℓ.
Hints for part (b):
i. (Given ε > 0) show that there is x0 in (a, b) such that, for every x in
(x0, b), |f′(x)/g′(x) − ℓ| < ε.
ii. Now show that there is x1 in (x0, b) such that, for every x in (x1, b),
|(1 − g(x0)/g(x)) / (1 − f(x0)/f(x)) − 1| < ε.
iii. For each x in (x1, b), apply the Cauchy mean value theorem to f, g
over the interval [x0, x].
235. Verify that the eighth Taylor (Maclaurin) polynomial approximating the
function cos x at a = 0 is given by
p8(x) = 1 − x²/2! + x⁴/4! − x⁶/6! + x⁸/8!.
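By Taylor's theorem the error of this polynomial on [−1, 1] is at most |x|¹⁰/10! ≈ 2.8 × 10⁻⁷, so it is easy to corroborate numerically (a Python sketch; the helper name p8 mirrors the exercise's notation):

```python
import math

# The eighth Maclaurin polynomial of cos x from Exercise 235.

def p8(x):
    return (1 - x**2 / math.factorial(2) + x**4 / math.factorial(4)
              - x**6 / math.factorial(6) + x**8 / math.factorial(8))

for x in (0.1, 0.5, 1.0):
    print(x, abs(p8(x) - math.cos(x)))   # errors bounded by |x|**10 / 10!
```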
1 The result is actually true on the larger interval (−1, 1), but confirming this needs a slightly
different method of proof.
238. Use the Taylor expansion of ln(1 + x) (for small values of x) to determine
whether or not the series Σ yn given by
yn = ((n + 5)/n)^(n²+n) e^(−5n)
is convergent.
239. By considering the logarithm of the typical term and appealing to Taylor’s
theorem, determine whether or not the series Σ xn specified by
xn = ((3n)/(3n + 2))^(n²) e^(2n/3)
converges.
240. Provided that −1 < x < 1, what is the sum of the series
241. Assuming that it is possible to express arctan x (for −1 < x < 1) as the sum
of a power series a₀ + a₁x + a₂x² + a₃x³ + · · · + aₙxⁿ + · · · , use the result
on differentiation of power series to determine the coefficients aₙ.
242. Consider the real function f : [0, 2] → R defined by
By directly calculating the upper and lower Riemann sums for f using the
partition
{0, 1/n, 2/n, 3/n, · · · , (n − 1)/n, 1}
∫_{n0}^{∞} f ≤ Σ_{k=n0}^{∞} f(k) ≤ ∫_{n0}^{∞} f + f(n0).
(b) Estimate the sum of the (convergent) series Σ_{k=1}^{∞} 1/k⁵ with an error less
than 0.001.
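A worked instance of part (b), assuming the integral-test bracket displayed above with f(x) = x⁻⁵ (a Python sketch; the choice n0 = 5 and the variable names are illustrative). Since ∫_{n0}^∞ x⁻⁵ dx = 1/(4·n0⁴) and f(n0) = 1/n0⁵, taking n0 = 5 makes the guaranteed error 1/3125 < 0.001.

```python
# Integral-test estimate of S = sum_{k>=1} 1/k^5 with error below 0.001:
#   partial_sum(n0-1) + 1/(4*n0^4)  <=  S  <=  partial_sum(n0-1) + 1/(4*n0^4) + 1/n0^5,
# so the lower end of the bracket is within 1/n0^5 of S.

n0 = 5
partial = sum(1 / k**5 for k in range(1, n0))    # 1/1^5 + ... + 1/(n0-1)^5
estimate = partial + 1 / (4 * n0**4)             # lower end of the bracket

# Brute-force reference value: the tail beyond 10**5 is below 2.5e-21.
reference = sum(1 / k**5 for k in range(1, 10**5))

print(estimate, abs(estimate - reference))       # error comfortably below 0.001
```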
255. (a) For a given real number a, use the one-sided version of l’Hôpital’s Rule
to determine
lim_{x→0+} ln(1 + ax)/x.
(b) Use the sequence-based description of one-sided limits to deduce that
lim_{n→∞} ln(1 + a/n)/(1/n) = a,
and hence that
(1 + a/n)^n → e^a as n → ∞.
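The conclusion of this exercise is easy to watch numerically (a Python sketch; the value a = 2 and the sampled n are arbitrary choices). The gap behaves like e^a · a²/(2n), shrinking at rate 1/n:

```python
import math

# Exercise 255's conclusion: (1 + a/n)**n approaches e**a as n grows.

a = 2.0
for n in (10, 1000, 100000):
    print(n, (1 + a / n) ** n, math.exp(a))   # the two columns converge
```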
256. (a) Verify that the function tan, defined (initially) on the interval (−π/2, π/2)
by the formula
tan x = (sin x)/(cos x),
is continuous and differentiable, with (positive) derivative (cos x)⁻²,
and has range R; also that its inverse arctan or tan⁻¹ : R → (−π/2, π/2) is
continuous and differentiable, and that its derivative is given by
(tan⁻¹)′(x) = 1/(1 + x²).
1 − t/3 + t²/5 − t³/7 + t⁴/9 − · · ·
1 − x²/3 + x⁴/5 − x⁶/7 + x⁸/9 − · · ·
and
x − x³/3 + x⁵/5 − x⁷/7 + x⁹/9 − · · ·
are absolutely convergent, the second one to some function f (x), on (at
least) the interval (−1, 1).
(c) Appeal to the theorem on differentiation of power series to show that
f′(x) = 1 − x² + x⁴ − x⁶ + x⁸ − · · · = 1/(1 + x²)    (−1 < x < 1).
x − x³/3 + x⁵/5 − x⁷/7 + x⁹/9 − · · · = tan⁻¹(x).
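The identity can be observed numerically inside (−1, 1), where the alternating series converges quickly (a Python sketch; the helper name arctan_series is illustrative):

```python
import math

# Partial sums of x - x^3/3 + x^5/5 - ... versus arctan(x), checked at x = 0.5.
# The truncation error after 20 terms is below 0.5**41 / 41, i.e. ~1e-14.

def arctan_series(x, terms):
    return sum((-1) ** k * x ** (2 * k + 1) / (2 * k + 1) for k in range(terms))

print(arctan_series(0.5, 20), math.atan(0.5))
```

(At x = ±1 the same series still converges, but only slowly, by the alternating series test; inside (−1, 1) convergence is geometric.)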
A
Absolute convergence 237
algebra of limits
  for continuous functions 136
  for convergent sequences 24
  for divergent sequences 32
  for functions 165
  for functions as x → −∞ 190
  for functions as x → ∞ 188
  for series 110
alternating series test 109

B
Binomial
  coefficient 69
  theorem 70
Bolzano-Weierstrass 83
bound
  lower 41
  upper 41
bounded 41
  above 41
  below 41

C
Cauchy
  mean value theorem 277
  product 249
  sequence 229
CMVT 277
completeness principle 45
composite/composition 137
conditional convergence 237
continuous/continuity
  at a point 133
  on a set 134
  uniform 263
converge/convergent/convergence
  absolute (for series) 237
  conditional (for series) 237
  for sequences 17
critical point 216

D
D’Alembert’s test 120
Darboux integrability criterion 302
decreasing
  function 147
  sequence 53
dense 48
derivative 203
differentiation 203
  chain rule 208, 225
  inverse function 212
  product rule 206
  quotient rule 207
  term-by-term 287
direct comparison test 111
diverge/divergent/divergence
  for sequences 17
  for series 105

E
Element 37
elementary functions
  basic information 35
  defined and established 325
endpoint 41
exponential function 329

F
Fibonacci sequence 98
first mean value theorem 219
floor 4
FMVT 219
function
  composite/composition 137
  continuous/continuity at a point 133
  continuous/continuity on a set 134
  converging to a limit 159
  decreasing 147
  differentiable at a point 203
  differentiable on a set 203
  increasing 147
  Lipschitz 272
380 INDEX