Ordinary Differential Equations: Principles and Applications
Many interesting and important real life problems in the fields of mathematics, physics,
chemistry, biology, engineering, economics, sociology and psychology are modelled
using the tools and techniques of ordinary differential equations (ODEs). This book
offers a detailed treatment of fundamental concepts of ordinary differential equations.
Important topics including first and second order linear equations, initial value problems
and qualitative theory are presented in separate chapters. The concepts of physical models
and first order partial differential equations are discussed in detail. The text covers two-
point boundary value problems for second order linear and nonlinear equations. Using
two linearly independent solutions, a Green’s function is also constructed for given
boundary conditions.
The text emphasizes the use of calculus concepts in justification and analysis of
equations to get solutions in explicit form. While discussing first order linear systems,
tools from linear algebra are used and the importance of these tools is clearly explained
in the book. Real life applications are interspersed throughout the book. Methods
and techniques for solving numerous mathematical problems are provided, with
sufficient derivations and explanations.
The first few chapters can be used for an undergraduate course on ODE, and later
chapters can be used at the graduate level. Wherever possible, the authors present the
subject in a way that students at undergraduate level can easily follow advanced topics,
such as qualitative analysis of linear and nonlinear systems.
P. S. Datti superannuated from the Centre for Applicable Mathematics at the Tata
Institute of Fundamental Research, Bangalore after serving for over 35 years. His
research interests include nonlinear hyperbolic equations, hyperbolic conservation laws,
ordinary differential equations, evolution equations and boundary layer phenomenon.
Raju K. George is Senior Professor and Dean (R&D) at the Indian Institute of Space
Science and Technology (IIST), Thiruvananthapuram. His research areas include
functional analysis, mathematical control theory, soft computing, orbital mechanics and
industrial mathematics.
CAMBRIDGE–IISc SERIES
Cambridge–IISc Series aims to publish the best research and scholarly work on
different areas of science and technology with emphasis on cutting-edge research.
The books will be aimed at a wide audience including students, researchers,
academicians and professionals and will be published under three categories:
research monographs, centenary lectures and lecture notes.
The editorial board has been constituted with experts from a range of disciplines
in diverse fields of engineering, science and technology from the Indian Institute
of Science, Bangalore.
IISc Press Editorial Board:
G. K. Ananthasuresh, Professor, Department of Mechanical Engineering
K. Kesava Rao, Professor, Department of Chemical Engineering
Gadadhar Misra, Professor, Department of Mathematics
T. A. Abinandanan, Professor, Department of Materials Engineering
Diptiman Sen, Professor, Centre for High Energy Physics
A. K. Nandakumaran
P. S. Datti
Raju K. George
University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
4843/24, 2nd Floor, Ansari Road, Daryaganj, Delhi - 110002, India
79 Anson Road, #06–04/06, Singapore 079906
Cambridge University Press is part of the University of Cambridge.
It furthers the University’s mission by disseminating knowledge in the pursuit of
education, learning and research at the highest international levels of excellence.
www.cambridge.org
Information on this title: www.cambridge.org/9781108416412
© Cambridge University Press 2017
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2017
Printed in India
A catalogue record for this publication is available from the British Library
Many interesting and important real life problems are modelled using
ordinary differential equations (ODE). These arise in, but are not limited
to, physics, chemistry, biology, engineering, economics, sociology and
psychology. In mathematics, ODE have a deep connection with
geometry, among other branches. In many of these situations, we are
interested in understanding the future, given the present phenomenon. In
other words, we wish to understand the time evolution or the dynamics of
a given phenomenon. The field of ODE has developed, over the
years, to answer such questions adequately. Yet, there are many
important and intriguing situations where complete answers are still awaited.
The present book aims at giving a good foundation for a beginner,
starting at an undergraduate level, without compromising on the rigour.
We have had several occasions to teach students at the
undergraduate and graduate levels in various universities and institutions
across the country, including our own institutions, on many topics
covered in the book. From our experience and the interactions we have had
with students, we felt that many students lack a clear notion of ODE,
beginning with the simplest integral calculus problem. For other students, a
course on ODE meant learning a few tricks to solve equations. In India,
in particular, the books which are generally prescribed consist of a few
tricks to solve problems, making ODE one of the most uninteresting
subjects in the mathematical curriculum. We are of the opinion that many
students at the beginning level do not have clarity about the essence of
ODE, compared to other subjects in mathematics.
While we were still contemplating writing a book on ODE, to address
some of the issues discussed earlier, we got an opportunity to present
a video course on ODE under the auspices of the National Programme
on Technology Enhanced Learning (NPTEL).
First and second order equations are dealt with in Chapter 3. This
chapter also contains the usual methods of solutions, but with sufficient
mathematical explanation, so that students feel that there is indeed
rigorous mathematics behind these methods. The concept behind the
exact differential equation is also explained. Second order linear
equations, with or without constant coefficients, are given a detailed
treatment. This will make a student better equipped to study linear
systems, which are treated in Chapter 5.
Chapter 4 deals with the hard questions of existence, non-existence,
uniqueness, etc., for a single equation and also for a system of first order
equations. We have tried to motivate the reader to wonder why these
questions are important and how to deal with them. We have also
discussed other topics such as continuous dependence on initial data,
continuation of solutions and the maximal interval of existence of a
solution.
Linear systems are studied in great detail in Chapter 5. We have tried to
show the power of linear algebra in obtaining the phase portrait of 2 × 2
and general systems. We have also included a brief discussion on Floquet
theory, which deals with linear systems with periodic coefficients.
In the case of a second order linear equation with variable coefficients,
it is not possible, in general, to obtain a solution in explicit form. This has
been discussed at length in Chapter 3. Chapter 6 deals with a class of
second order linear equations, whose solutions may be written explicitly,
although in the form of an infinite series. This method is attributed to
Frobenius.
Chapter 7 deals with the regular Sturm–Liouville theory. This theory is
concerned with boundary value problems associated with linear second
order equations with smooth coefficients, in a compact interval on the
real line, involving a parameter. We, then, show the existence of a countable
number of values of the parameter and associated non-trivial solutions of
the differential equation satisfying the boundary conditions. There are
many similarities with the existence of eigenvalues and eigenvectors of a
matrix, though we are now in an infinite dimensional situation.
The qualitative theory of nonlinear systems is the subject of Chapter 8.
The contents may be suitable for a senior undergraduate course or a
beginning graduate course. This chapter does demand more
prerequisites, and these are described in Chapter 2. The main topics of the
chapter are equilibrium points or solutions of autonomous systems and
their stability analysis; existence of periodic orbits in a two-dimensional
mathematical analysis. We remark that the first existence theorem for first
order differential equations is due to Cauchy in 1820. A class of
differential equations known as linear differential equations, is much
easier to handle. We will analyse linear equations and linear systems in
more detail and see the extensive use of linear algebra; in particular, we
will see how the nature of eigenvalues of a given matrix influences the
stability of solutions.
After the invention of differential calculus, the question of the
existence of antiderivative led to the following question regarding
differential equation: Given a function f , does there exist a function g
such that ġ(t ) = f (t )? Here, ġ(t ) is the derivative of g with respect to t.
This was the beginning of integral calculus and we refer to this problem
as an integral calculus problem. In fact, Newton’s second law of motion
describing the motion of a particle having mass m states that the rate of
change of momentum equals the applied force. Mathematically, this is
written as (d/dt)(mv) = −F, where v is the velocity of the particle. If
x = x(t ) is the position of the particle at time t, then v(t ) = ẋ(t ). In
general, the applied force F is a function of t, x and v. If we assume F is
a function of t, x, we have a second order equation for x given by
mẍ = −F (t, x). If F is a function of x alone, we obtain a conservative
equation which we study in Chapter 8. If on the other hand, F is a
function of t alone, then the second law leads to two integral calculus
problems: namely, first solve for the momentum p = mv by ṗ = −F (t )
and then solve for the position using mẋ = p. This also suggests that one
of the best ways to look at a differential equation is to view it as a
dynamical system; namely, the motion of some physical object. Here t,
the independent variable is viewed as time and x is the unknown variable
which depends on the independent variable t, and is known as the
dependent variable.
A large number of physical and biological phenomena can be
modelled via differential equations. Applications arise in almost all
branches of science and engineering: radiation decay, aging, tumor
growth, population growth, electrical circuits, mechanical vibrations,
simple pendulum, motion of artificial satellites, to mention a few.
In summary, real life phenomena together with physical and other
relevant laws, observations and experiments lead to mathematical models
(which could be ODE). One would like to do mathematical analysis and
computations of solutions of these models to simulate the behaviour of
these physical phenomena for better understanding.
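As a small illustration of this last point (a sketch of ours, not from the book), a decay model dy/dt = −ky can be simulated numerically and compared against its exact solution; the rate k and initial value y0 below are purely illustrative.

```python
# Minimal sketch: simulate dy/dt = -k*y and compare with y0*exp(-k*t).
import numpy as np
from scipy.integrate import solve_ivp

k, y0 = 0.5, 100.0                       # illustrative decay rate and initial amount
sol = solve_ivp(lambda t, y: -k * y, (0.0, 10.0), [y0], dense_output=True)

t = np.linspace(0.0, 10.0, 5)
print(sol.sol(t)[0])                     # numerical values ...
print(y0 * np.exp(-k * t))               # ... agree with the exact solution
```

Such checks against known solutions are exactly the kind of validation one performs before trusting simulations of models that have no explicit solution.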
Definition 1.1.1
An ODE is an equation consisting of an independent variable t, an
unknown function (dependent variable) y = y(t ) and its derivatives up
to a certain order. Such a relation can be written as
f(t, y, dy/dt, · · · , d^n y/dt^n) = 0. (1.1.1)
Here, n is a positive integer, known as the order of the differential
equation.
For example, first and second order equations, respectively, can be written
as
f(t, y, dy/dt) = 0 and f(t, y, dy/dt, d²y/dt²) = 0. (1.1.2)
We will be discussing some special cases of these two classes of
equations. It is possible that there will be more than one unknown
function and in that case, we will have a system of differential equations.
A higher order differential equation in one unknown function may be
reduced into a system of first order differential equations. On the other
hand, if there are more than one independent variable, we end up with
partial differential equations (PDEs).
where r denotes the difference between birth rate and death rate. If y(t0 ) =
y0 is the population at time t0 , our problem is to find the population for all
t > t0 . This leads to the so-called initial value problem (IVP) which will
be discussed in Chapter 3. Assuming that r is a constant, the solution is
given by y(t) = y0 e^{r(t − t0)}; we ask the reader to work out
the details for this and the other examples in this chapter.
[Figure: the logistic curve approaching the limiting population a/b, with the half-way level a/(2b) marked]
population crosses the half way mark a/(2b). This indicates that if the initial
population is less than half the limiting population, then there is an
accelerated growth (dy/dt > 0, d²y/dt² > 0), but after reaching half the
limiting population, the population still grows (dy/dt > 0), but it has now a
decelerated growth (d²y/dt² < 0).
When we analyse the case where the initial population is bigger than
the limiting population, we observe that dy/dt < 0 and d²y/dt² > 0. Thus, the
population decreases, with the decline slowing down, to the limiting population.
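The behaviour just described can be checked numerically; the sketch below (ours, with illustrative parameters) uses the explicit solution that appears in Exercise 1(c) of this chapter, together with dy/dt = y(a − by) and d²y/dt² = (dy/dt)(a − 2by).

```python
# Sketch: accelerated growth below a/(2b), decelerated growth above it.
import numpy as np

a, b, y0, t0 = 1.0, 0.5, 0.1, 0.0        # illustrative values with y0 < a/(2b)
t = np.linspace(0.0, 25.0, 2001)
y = a * y0 / (b * y0 + (a - b * y0) * np.exp(-a * (t - t0)))  # explicit solution

dy = y * (a - b * y)                      # dy/dt from the equation
d2y = dy * (a - 2 * b * y)                # differentiate once more
print((d2y[y < a / (2 * b)] > 0).all())   # True: accelerated growth
print((d2y[y > a / (2 * b)] < 0).all())   # True: decelerated growth
```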
Remark 1.2.1
exerted by water (it is a kind of resistance), where V = dy/dt is the velocity
of the object and c > 0 is a constant of proportionality. Thus, we have the
differential equation
d²y/dt² = (1/m)F = (1/m)(W − B − cV) = (g/W)(W − B − cV), y(0) = 0. (1.2.6)
Equivalently,
dV/dt + (cg/W)V = (g/W)(W − B), V(0) = 0. (1.2.7)
Equation (1.2.7) can be solved to get
V(t) = ((W − B)/c)(1 − e^{−(cg/W)t}). (1.2.8)
Thus, V(t) is increasing and tends to (W − B)/c as t → ∞, and the value
of (W − B)/c is (practically) ≈ 700.
The limiting value 700 ft/sec of velocity is far above the permitted
critical value. Thus, it remains to ensure that V (t ) does not reach 40 ft/sec
by the time it reaches the sea bed. But it is not possible to compute t at
which time the drum hits the sea bed and one needs to do further analysis.
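The further analysis can at least be explored numerically. The sketch below integrates (1.2.6) with purely hypothetical values of W, B, c and an assumed depth D of the sea bed (none of these numbers are from the book), stopping the integration when the drum reaches the bottom and reporting the impact velocity.

```python
# Sketch: integrate y'' = (g/W)(W - B - c*V) until the drum reaches depth D.
import numpy as np
from scipy.integrate import solve_ivp

g, W, B, c, D = 32.2, 527.0, 470.0, 0.08, 300.0  # hypothetical ft-lb-sec data
rhs = lambda t, z: [z[1], (g / W) * (W - B - c * z[1])]  # z = (y, V)

hit = lambda t, z: z[0] - D              # event: the drum reaches the sea bed
hit.terminal = True

sol = solve_ivp(rhs, (0.0, 120.0), [0.0, 0.0], events=hit, max_step=0.01)
print("impact time:", sol.t[-1], "impact velocity:", sol.y[1, -1])
```

With numbers of this size, the impact velocity comes out near the 40 ft/sec threshold, which is why the cruder limiting-velocity argument alone cannot settle the question.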
m d²y/dt² = −ky − c dy/dt + F0(t). (1.2.11)
That is,
m d²y/dt² + c dy/dt + ky = F0(t), m, c, k ≥ 0. (1.2.12)
This is a second order non-homogeneous linear equation with constant
coefficients and we study such equations in detail in Chapter 3. Such a
system also arises in electrical circuits, which we discuss next.
[Figure: an electrical circuit with source V, switch S, inductance L and capacitance C]
[Figure: a satellite at P with polar coordinates (r, θ) and control components u1, u2; (x, y) are Cartesian coordinates]
d²θ/dt² = −(2/r(t))(dθ/dt)(dr/dt) + u2(t)/r(t). (1.2.30)
In applications, when a satellite is injected into an orbit, it usually drifts
from its prescribed orbit due to the influence of other cosmic forces. The
thrusters (controls) are activated to maintain the desired orbit of the
satellite.
dy/dt = −v0 sin θ + w = −v0 y/√(x² + y²) + w.
[Figure 1.4: the aeroplane at (x, y), at distance √(x² + y²) from the origin, flying with speed v0 toward the origin in a crosswind w; it starts at (a, 0)]
The path {(x(t ), y(t )),t ≥ 0} is called the orbit or trajectory of the aircraft
in the xy plane (Figure 1.4). These equations can be implicitly written as
dy/dx = (1/(v0 x))(v0 y − w√(x² + y²)).
Example 1.2.3
circuits by the Dutch engineer van der Pol when he was working for the
Philips company (in the Netherlands) around 1920. He also studied this
equation with forced periodic term A sin ωt and observed the
phenomenon, which in the current literature is termed as chaos. A
detailed mathematical analysis of this equation was done by Cartwright
and Littlewood [CL45] and by Levinson [Lev49]; their study revealed the
existence of the paradoxical combination of randomness and structure,
which is also called deterministic chaos in the current literature; see
Example 1.2.5, Lorenz equations.
The van der Pol equation is also used to model certain situations in
physical and biological sciences. For example, in seismology, it is used to
model the motion of two plates in a geological fault; in biology, it is used
to model the action potential of neurons.
Example 1.2.4
When x is small, we have sin x ≈ x and one obtains the linear pendulum
equation.
Example 1.2.5
ẋ = σ(y − x),
ẏ = Rx − y − xz, (1.2.36)
ż = −bz + xy,
where R, σ, b are fixed parameters.
Example 1.2.6
Example 1.2.7
Example 1.2.8
Example 1.2.9
xn+1 = axn (1 − xn ),
for n = 1, 2, · · · . This is a first order difference equation. The constant a ∈ [0, 4]. Thus,
if x1 ∈ [0, 1], then xn ∈ [0, 1] for all n > 1. The logistic map has been studied
extensively and it reveals many surprising properties of the sequence {xn }
for a certain range of values of a.
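To get a feel for this, one can iterate the map directly; the sketch below (with values of a chosen for illustration) prints the tail of the orbit for a convergent case, a 2-periodic case and an apparently chaotic case.

```python
# Sketch: tails of logistic-map orbits x_{n+1} = a*x_n*(1 - x_n) for a few a.
def orbit(a, x1=0.2, n=200):
    xs = [x1]
    for _ in range(n - 1):
        xs.append(a * xs[-1] * (1.0 - xs[-1]))
    return xs

for a in (2.8, 3.2, 3.9):                # fixed point, 2-cycle, chaotic (typical)
    print(a, [round(x, 4) for x in orbit(a)[-4:]])
```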
We will not pursue this subject in this book, but the interested reader
may look into, for example in [Hao84, Wig90].
1.3 Exercises
1. Consider the initial value problem²
dy/dt = ay(t) − by²(t), y(t0) = y0,
where a, b > 0, t0 , y0 ∈ R. Assume the unique existence of the (local)
solution y = y(t ) in the interval (t1 ,t2 ) with t0 ∈ (t1 ,t2 ).
(c) Use the first part to obtain the solution y in the explicit form
y(t) = ay0 / (by0 + (a − by0) e^{−a(t−t0)})
(d) In each of the cases of the first part, describe the maximal
interval (t∗ ,t ∗ ), where the solution y is defined. This is referred
to as the maximal interval of existence, which will be
discussed in detail in Chapter 4. (Note that t ∗ can be +∞ or t∗
can be −∞). Further, compute the limits
lim_{t↓t∗} y(t) and lim_{t↑t∗} y(t).
² The methods of solutions are described in Chapter 3.
(e) In each of the cases, find dy/dt and d²y/dt², and analyse the shape of the curve.
(f) Find the conditions on y0 so that t∗ = −∞ and/or t ∗ = +∞.
(g) Plot the graphs of the solutions y in the ty plane for different
values of y0 .
(h) Let y = y(t ) be the solution as earlier and z = z(t ) be the
solution to the initial value problem:
dz/dt = az(t) − bz²(t), z(t1) = y0.
Represent z in terms of y. Sketch with different initial times. Do
you observe any property? Describe the observed properties for
the general problem
dy/dt = f(y(t)), y(t0) = y0.
2. Consider the modified population model with a real parameter λ ,
namely
dy/dt = ay(t) − by²(t) − λ, y(t0) = y0.
Do a similar analysis for various values of the parameter. More
precisely, show that there is a critical value λcr such that for
λ > λcr , the behaviour is exactly similar, but for λ < λcr , the
behaviour of the solution is completely different.
3. Consider the linear model of the atomic waste disposal problem:
dV/dt + (cg/W)V(t) = (g/W)(W − B), V(0) = 0,
where V = V (t ) is the velocity at time t.
(a) Find the solution V and find the limit lim_{t→∞} V(t).
(b) Now derive the non-linear model:
(v/(W − B − cv)) dv/dy = g/W, v(0) = 0
1.4 Notes
We have presented a few real world problems to highlight the importance
of modelling using ODEs and their analysis. Of course, the examples are
not exhaustive; in fact, one can find several text books devoted to a
particular topic, for example, mathematical biology, mechanical systems,
etc. We have seen through the atomic waste disposal problem (Section
1.2.3) that, through the simple linear model, we can solve the problem
explicitly, but obtain only an incomplete answer to the question set out therein.
However, a little reformulation gives us a non-linear equation, which in
general is hard to solve, yet gives us a complete answer to the question.
This exhibits the importance of correct modelling and its analysis even if
the solution is not available in explicit form. Such phenomena can be
observed in other models like population growth (Sections 1.2.1 and
1.2.2). One should bear in mind such peculiarities arising in the analysis
of ODEs. In general, it is hard to obtain an explicit, an implicit, or even a
series representation of a solution, leading to the necessity of analysing the
solution in the absence of such forms.
A large number of real life examples are available in Martin Braun
[Bra78, Bra75]. See also [AMR95, TS86].
2 Preliminaries
2.1 Introduction
In this chapter, we present some topics from linear algebra and analysis
which are extensively used in the subsequent chapters of the book. Our
discussion will only be brief and more details and longer proofs may be
found in the references cited at the end of this chapter. Quite often, the
explicit solution may not be available and we may appeal to the analysis
to derive the qualitative nature of the solution, which in turn may help us
to arrive at conclusions about the behaviour of the physical or biological
problems modelled through ODE. Even when the explicit solution is
known, it may be hard to draw significant conclusions regarding the
global behaviour of the system. We have therefore emphasized the
importance of analysis and linear algebra throughout this book, with the
hope that the beginner starts appreciating the essential role of these
subjects in the study of ODE.
Notice that each function fk defined here is continuous on I = [0, 1], but
the limit function f is discontinuous at x = 0. We have thus lost the important
property of continuity under pointwise convergence. Therefore, we
now discuss a stronger convergence under which the continuity property
is preserved. This is the notion of uniform convergence.
Definition 2.2.1
Theorem 2.2.2
f(x) = 0 for 0 ≤ x < 1, and f(x) = 1 for x = 1,
Example 2.2.3
Theorem 2.2.4
lim_{k→∞} ∫_a^b f_k(t) dt = ∫_a^b f(t) dt.
In general, we may not be able to interchange the limit and the integral signs
if the convergence is not uniform.
In view of Theorem 2.2.2 and Theorem 2.2.4, for a given sequence of
functions, extracting a uniformly convergent subsequence is very
important in analysis. In this direction, we need to have conditions under
which one can derive uniformly convergent subsequences. A well-known
theorem is the Arzela–Ascoli theorem. Before stating this result, we
introduce some more concepts.
We discuss the convergence and uniform convergence of series of
functions. Let {u_k} be a sequence of functions defined on I. Consider the
sequence of partial sums f_k = ∑_{i=1}^{k} u_i. If the sequence f_k converges
pointwise (respectively, uniformly) to a function u on I, then we say that
the infinite series, denoted by ∑_{k=1}^{∞} u_k, converges pointwise
(respectively, uniformly) to u on I.
Theorem 2.2.2 and Theorem 2.2.4 are valid for series under appropriate
hypotheses.
Theorem 2.2.5
Definition 2.2.6
Definition 2.2.7
Theorem 2.2.8
A proof can be found in several books, see, for instance, [Rud76, CL72].
Definition 2.2.9
Theorem 2.2.10
Definition 2.2.11
Usually, it is the third property of the norm that does not follow in an
obvious way and needs proof. In the context of Rn , it is called Minkowski’s
inequality. When p = 2, it is the usual Euclidean norm (or distance).
It is convenient to take p = 1 for the discussion in Chapter 4 and we
write k · k1 = | · | and state the definition of Lipschitz continuity now in
terms of this 1−norm.
Definition 2.2.12
Example 2.2.13
(iv) For the function f(t, y) = e^{−t²} y² sin t on D = {(t, y) : 0 ≤ y ≤ 2,
t ∈ R}, we have
|f(t, y1) − f(t, y2)| = |e^{−t²} sin t||y1 + y2||y1 − y2| ≤ 4|y1 − y2|
for any (t, y1), (t, y2) in D. Thus, f(t, y) is Lipschitz continuous on
the strip D.
(v) f(t, y) = t√y on the rectangle D = {(t, y) : 0 ≤ t ≤ 1, 0 ≤ y ≤ 1}.
Note that
|f(1, y) − f(1, 0)| = √y = (1/√y)|y − 0|
and 1/√y → +∞ as y → 0+. Hence, the function f is not Lipschitz
continuous on the rectangle D, but is continuous on D.
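The failure of the Lipschitz condition in example (v) is visible numerically: the difference quotient grows like 1/√y as y → 0+, as the following small sketch shows.

```python
# Sketch: the quotient |f(1,y) - f(1,0)|/|y - 0| = 1/sqrt(y) blows up near 0.
import math

f = lambda t, y: t * math.sqrt(y)
for y in (1e-2, 1e-4, 1e-6):
    print(y, abs(f(1.0, y) - f(1.0, 0.0)) / abs(y))   # 10, 100, 1000
```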
Here we state a sufficient condition for Lipschitz continuity of f(t, y) with
respect to y.
Theorem 2.2.14
Example 2.2.15
Example 2.2.16
Lemma 2.2.17
How do we interpret the symbolic notation 2y dy = dt? This can be done
via the change of variable formula; take f(y) = y³ and g(t) = √t; then,
by Theorem 2.2.18, ∫ y³ dy = ∫ (√t)³ (1/(2√t)) dt = (1/2) ∫ t dt.
2 t 2
We now discuss an important result known as differentiation under the
integral sign.
Theorem 2.2.19
[Generalized Leibnitz Formula]
Theorem 2.2.20
[Taylor’s Formula]
F′(0) = ∇f(x0) · y,
where
∇f(x0) = (∂f/∂x1 (x0), · · · , ∂f/∂xn (x0))
is the gradient of f at x0. Doing a further differentiation of F′(t), we get
F″(t) = ∑_{i,j=1}^{n} (∂²f/∂xi ∂xj)(x0 + ty) yi yj.
Df(x0) is the n × n matrix whose rows are ∇f1(x0), · · · , ∇fn(x0), and we may write
f(x0 + y) = f(x0 ) + Df(x0 )y + O(|y|2 ).
Note that ∇ f (x0 ) · y is the dot product, whereas Df(x0 )y is the action of
the matrix Df(x0 ) on the vector y.
Definition 2.3.1
d(x, y) = (∑_{i=1}^{n} |xi − yi|²)^{1/2} for all x = (x1, · · · , xn), y = (y1, · · · , yn) ∈ Rⁿ or Cⁿ.
This is the standard Euclidean metric or distance. There are many other
metrics we can introduce on Rn and Cn . We will see more examples later.
Let (X, d ) be a metric space. A sequence {xk } in X is said to converge
to a point x ∈ X if for given ε > 0, there exists N ∈ N such that d (xk , x) < ε
for all k ≥ N. This statement may also be written as d (xk , x) → 0 or xk → x
as k → ∞. It is easy to see, from the triangle inequality, that if xk → x and
xk → y, then x = y.
A sequence {xn } ⊂ X is said to be a Cauchy sequence, if d (xn , xm ) → 0
as n, m → ∞. A metric space (X, d ) is said to be a complete metric space if
every Cauchy sequence in X converges in X. A normed linear space which
is a complete metric space (metric induced by the norm) is called a Banach
space (see, Definition 2.4.2).
In particular, if X = Rn , uk ∈ Rn converges to u ∈ Rn if |uk − u| → 0
as k → ∞ and Rn is a Banach space. It is also a Banach space under the
norms
‖x‖_p = (∑_{i=1}^{n} |xi|^p)^{1/p}, 1 ≤ p < ∞,
and
‖x‖_∞ = max_{1≤i≤n} |xi|,
for x ∈ Rn . The function space C [0, 1] or, more generally, C [a, b] with sup
norm is a Banach space. However, it is not a complete space with respect
to k · k1 introduced earlier. The completeness plays a crucial role in the
fixed point theorem to be studied later.
It is also easy to check that for a sequence { fn } ⊂ C [a, b], the statement
fn → f in sup norm is equivalent to saying that fn converges uniformly to
f.
Suppose (X, d ) is a metric space, x ∈ X, r > 0. The set Br (x) ≡ {y ∈ X :
d (x, y) < r} is called an open ball of radius r centred at x. The collection
{Br (x) : x ∈ X, r > 0} forms a basis for a topology in X. This is referred
to as the topology induced by the metric d in X. An open set in X is, by
definition, an arbitrary union of open balls. A subset of X is closed if its
complement is open in X.
Theorem 2.3.2
Corollary 2.3.3
The corollary follows from the theorem. Let x∗ be the unique fixed point
of T k , that is, T k x∗ = x∗ . Applying T , we get T k (T x∗ ) = T x∗ and hence,
T x∗ is also a fixed point of T k . By uniqueness, T x∗ = x∗ and thus, the
unique fixed point of T k is also a fixed point of T . If x1 is another fixed
point of T , that is, T x1 = x1 , then by repeated application of T , we see
that T k x1 = x1 . By uniqueness of the fixed point of T k , we have x1 = x∗
as required.
Definition 2.4.1
(a f )(t ) = a f (t ), t ∈ X,
where on the right are the usual addition and multiplication of real
numbers. Thus, f + g, a f ∈ V whenever f , g ∈ V and a ∈ R. It is easy
to check that V is a vector space with these operations. The additive
identity in V is the zero function: 0(t ) = 0 for all t ∈ X and the
additive inverse of f ∈ V is the function − f defined by (− f )(t ) =
− f (t ), t ∈ X.
2. If we take X = {1, 2, · · · , n} (n, a given positive integer) in
Example 1, then we identify the vector space V with Rn .
3. If instead we take X as an interval in R, then we may consider the
subsets of V consisting of polynomial functions, continuous
functions, continuously differentiable functions, etc. It is easy to
verify that all these are examples of real vector spaces. A
continuously differentiable function is one which is differentiable
and its derivative is also continuous. Higher order continuously
differentiable functions are defined in a similar way.
We now define some important concepts such as linear dependence and
independence of vectors, linear span, basis and dimension.
Definition 2.4.2
which is the same as
|A| = sup_{x∈Rⁿ, x≠0} |Ax|/|x| = sup_{x∈Rⁿ, |x|≤1} |Ax|.
Note that for identity matrix I, we have |I| = 1. Using the properties of | · |
in Rn , it is not hard to verify the following:
Note that the term on the right side is a partial sum of the tail of the
(scalar) exponential e|A| . Thus, {Sk } is a Cauchy sequence and
consequently converges to some S ∈ Mn (R).
Definition 2.5.1
where S = lim_{k→∞} ∑_{j=0}^{k} A^j/j!.
We also write e^A = ∑_{j=0}^{∞} A^j/j!. Note that e^A ∈ Mn(R). Clearly,
|e^A| ≤ e^{|A|}, which is an interesting inequality. The computation of e^A is not easy.
However, if A = diag (λ1 , · · · , λn ) is a diagonal matrix, that is, the main
diagonal entries are λ1 , · · · , λn and all other elements are zero, then Ak
is also a diagonal matrix with diagonal entries λ1k , · · · , λnk (show this by
induction) and hence, eA = diag (eλ1 , · · · , eλn ).
Here are a couple of important observations:
1. Suppose that the matrix A is similar to a matrix B, that is, there exists
a non-singular matrix P such that B = PAP−1 . Then,
B2 = (PAP−1 )(PAP−1 ) = PA(P−1 P)AP−1 = PA2 P−1 ,
and, by induction, we get Bk = PAk P−1 for any k = 1, 2, · · · . This
implies that
eB = P eA P−1 and eA = P−1 eB P (2.5.1)
Thus, eA and eB are also similar.
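The identity (2.5.1) is easy to test numerically; the sketch below (with an arbitrary choice of A and P of ours) compares e^B with P e^A P^{−1} using scipy's matrix exponential.

```python
# Sketch: if B = P A P^{-1}, then e^B = P e^A P^{-1}.
import numpy as np
from scipy.linalg import expm

A = np.diag([1.0, 2.0])                   # e^A = diag(e, e^2) for a diagonal A
P = np.array([[1.0, 1.0], [0.0, 1.0]])    # any non-singular matrix
B = P @ A @ np.linalg.inv(P)

print(np.allclose(expm(B), P @ expm(A) @ np.linalg.inv(P)))   # True
```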
If it is not possible to get any set of n distinct directions invariant under T, then
T will not be diagonalizable.
Invariant subspaces
Let M and N be two subspaces of Rn such that M ∩ N = {0}. We say M
and N are disjoint subspaces though 0 ∈ M ∩ N always. We say that Rn is
a direct sum of M and N, if, by definition, for every x ∈ Rn , there exist
unique y ∈ M, z ∈ N such that x = y + z. We denote the direct sum by
Rn = M ⊕ N. For example, R2 = {(x, 0) : x ∈ R} ⊕ {(0, y) : y ∈ R}.
We can also introduce the direct sum of more than two subspaces
M1 , · · · , Mk as Rn = M1 ⊕ · · · ⊕ Mk , that is, each vector x ∈ Rn has a
unique representation x = u1 + · · · + uk , where ui ∈ Mi , i = 1, 2, · · · , k.
For example,
R3 = {(x, y, 0) : x, y ∈ R} ⊕ {(0, 0, z) : z ∈ R}
Definition 2.5.2
Further,
C−1 eA C = diag(eA1 , · · · , eAk )
The computation of eAi need not be easy in general. We shall next describe
a procedure to find a suitable C so that eAi are easily computed.
It is not hard to see that these real vectors are linearly independent. In
conclusion, we have the following theorem.
Theorem 2.5.3
Let A ∈ Mn (R).
Then, for each λ ∈ σ (A) real or non-real, there exists an invariant
subspace Nλ of Rn such that
dim(Nλ) = algebraic multiplicity of λ, if λ is real, and
dim(Nλ) = twice the algebraic multiplicity of λ, if λ is non-real.
Our aim is to find suitable bases for Nλi and Nµ j so that Aλi and Aµ j have
simple structures. Hence, exp(Aλi ) and exp(Aµ j ) can be computed easily.
Before proceeding further, we illustrate this with an example.
Example 2.5.4
[ B2  I2  O  · · ·  O ]
[ O  B2  I2  · · ·  O ]
[ · · ·  · · ·  · · ·  · · · ]
[ O  O  · · ·  · · ·  B2 ]   (2.5.5)

where B2 = [ a  b ; −b  a ], I2 = [ 1  0 ; 0  1 ], O = [ 0  0 ; 0  0 ] are all 2 × 2 matrices.
This analysis can be worked out for every eigenvalue and we get the final
decomposition known as Jordan decomposition theorem (JDT).
Theorem 2.5.5
J = [ λ  1  0  · · ·  0 ]
    [ 0  λ  1  · · ·  0 ]
    [ · · ·  · · ·  · · ·  · · · ]
    [ 0  0  · · ·  · · ·  λ ]   (2.5.7)

and

N = [ 0  1  0  · · ·  0 ]
    [ 0  0  1  · · ·  0 ]
    [ · · ·  · · ·  · · ·  · · · ]
    [ 0  0  · · ·  · · ·  0 ].
Thus, since I and N commute with each other,
e^J = e^{λI} · e^N = e^λ I e^N = e^λ e^N.
It is easy to see that N^r = N^{r+1} = · · · = O, the zero matrix. Hence,
e^J = e^λ (I + N + · · · + N^{r−1}/(r − 1)!)
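Since N^r = O, the exponential of a Jordan block is a finite sum, which the following sketch evaluates for an illustrative λ and block size r and compares with scipy's expm.

```python
# Sketch: e^J = e^lam * (I + N + ... + N^(r-1)/(r-1)!) for a Jordan block J.
import numpy as np
from math import factorial
from scipy.linalg import expm

lam, r = 2.0, 4
J = lam * np.eye(r) + np.diag(np.ones(r - 1), 1)   # Jordan block of size r
N = J - lam * np.eye(r)                            # nilpotent part, N^r = 0

eJ = np.exp(lam) * sum(np.linalg.matrix_power(N, j) / factorial(j)
                       for j in range(r))
print(np.allclose(eJ, expm(J)))                    # True
```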
where

D = [ O  I2  O  · · ·  O ]
    [ O  O  I2  · · ·  O ]
    [ · · ·  · · ·  · · ·  · · · ]
    [ O  O  O  · · ·  O ]

and B2 = [ a  b ; −b  a ].
Further, it is straightforward to see that e^{B2} = e^a [ cos b  sin b ; −sin b  cos b ]. From
(2.5.6), it follows that
A = C diag(J1, · · · , Jk) C^{−1},
With some more computation, one can prove the following theorem (using
the representation of eJ ). See [CL72].
Theorem 2.5.6
a1 u1 (t2 ) + a2 u2 (t2 ) = 0.
The non-singularity of the matrix implies that a1 = 0 = a2 . Since the class
V is too large, we cannot make a statement about the converse. We now
consider a special class from V . Let C1 (I ) be the class of continuously
differentiable functions defined on I. Clearly C1 (I ) ⊂ V , which again is a
subspace of V . In the class C1 (I ), we get a simpler sufficient condition for
linear independence. For u1 , u2 ∈ C1 (I ), define the Wronskian of u1 , u2 ,
denoted by W = W (t ) = W (u1 , u2 )(t ) by
W (u1 , u2 )(t ) = u1 (t )u̇2 (t ) − u̇1 (t )u2 (t ), t ∈ I,
which is the determinant of the Wronskian matrix [ u1(t)  u2(t) ; u̇1(t)  u̇2(t) ].
It is not hard to see the following. Suppose u1 , u2 ∈ C1 (I ). If there is a
point t0 ∈ I such that W (t0 ) 6= 0, then u1 , u2 are linearly independent.
The converse need not be true. The functions u1 (t ) = t 3 , u2 (t ) = |t|3 ,
t ∈ I = [−1, 1] are in C1 (I ) and are linearly independent, but
W (u1 , u2 )(t ) = 0 for all t ∈ [−1, 1]. This easy verification is left as an
exercise for the reader.
It is interesting and important that this situation does not occur when
we deal with functions which are solutions of linear second order ODE, as
will be shown in Chapter 3.
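The verification left to the reader can also be done symbolically; the following sketch evaluates W(t³, |t|³) at sample points of [−1, 1] using sympy.

```python
# Sketch: the Wronskian of t^3 and |t|^3 vanishes identically on [-1, 1].
import sympy as sp

t = sp.symbols('t', real=True)
u1, u2 = t**3, sp.Abs(t)**3
W = u1 * sp.diff(u2, t) - sp.diff(u1, t) * u2      # Wronskian W(u1, u2)(t)
print([sp.simplify(W.subs(t, v)) for v in (-0.7, -0.1, 0.3, 1.0)])  # all zero
```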
2.7 Exercises
1. Consider f_k : [0, 1] → R defined by
f_k(x) = k²x for 0 ≤ x ≤ 1/k,
f_k(x) = k²(2/k − x) for 1/k ≤ x ≤ 2/k,
f_k(x) = 0 for 2/k ≤ x ≤ 1.
Show that f_k(x) → f ≡ 0 pointwise, but not uniformly, and that
∫_0^1 f_k(x) dx = 1 while ∫_0^1 f(t) dt = 0.
(a) Show that f (x) = |x|1/2 is not locally Lipschitz at 0, that is, f
is not Lipschitz in any interval (a, b) containing the origin. But,
it is Lipschitz in any interval (finite or infinite) away from the
origin. More specifically, prove that it is Lipschitz in (a, b) if
a > 0 and it is Lipschitz in (a, b) with b < 0. Is it Lipschitz in
(0, 1)? Justify your answer.
(b) Write down 3 different solutions for ẋ = |x|1/2 satisfying
x(0) = 0.
4. Show that the matrix A = [ 1  1 ; 0  1 ] is not diagonalizable by
proving A has only one eigenvalue and the corresponding
eigenspace is one dimensional. Thus, it will not be possible to
obtain two linearly independent eigenvectors.
2.8 Notes
In this chapter, we have merely listed some results from analysis and
linear algebra which are used throughout the book. For a comfortable
understanding of the book, the reader is advised to get familiarized with
these basics. A good course on basic analysis and linear algebra will be
sufficient to follow the book. Quite often, the beauty and importance of
many interesting notions like diagonalization, eigenvalues and
eigenvectors are hidden in the abstraction. We have made an effort to
introduce these notions in a very natural way and hence, the
diagonalization of matrices is no longer unreachable to undergraduate
students. There are many books for both linear algebra and analysis; for
example, see [Apo11, BS05, Rud76] for analysis and [Apo11, HK97,
Kum00, Str06] for linear algebra.
3 First and Second Order Linear Equations
f, throughout the rest of this book. Using the area concept and continuity
assumption on f, we indeed prove that such a function y exists. In fact, all
the solutions are given by y(t) = ∫^t f(τ) dτ + C, where C is a constant.
This really is the content of the fundamental theorem of calculus. Thus, if
we know the value of y at some point, say at t0 , that is, y(t0 ) = y0 , then C
can be determined uniquely, as C = y0 and the solution is
y(t) = y0 + ∫_{t0}^{t} f(τ) dτ (3.1.5)
ÿ = f(t, y, ẏ) for t ∈ (a, b),
α1 y(a) + β1 ẏ(a) = γ1, α2 y(b) + β2 ẏ(b) = γ2. (3.1.8)
We remark that boundary value problems are generally more difficult than
initial value problems. We will discuss some of these issues in Chapter 7
and Chapter 9.
Definition 3.1.1
This means that for each t ∈ (ā, b̄), y(t ) ∈ (c, d ) and ẏ(t ) = f (t, y(t )) and
y(t0 ) = y0 . The interval (ā, b̄) is referred to as an interval of existence of
the solution. Here, t0 ∈ (ā, b̄) ⊂ (a, b) for some interval (ā, b̄) and
y(t ) ∈ (c, d ) for all t ∈ (ā, b̄). If (ā, b̄) = (a, b), then we say y is a global
solution to the IVP; otherwise, it is known as a local solution. If the
function f is continuous, then y is continuously differentiable, that is
y ∈ C1 (ā, b̄). It is also possible to define a weaker notion of the solution
concept. Throughout this book, we will assume that f is continuous and
hence, we seek a solution in C1 (ā, b̄).
A similar concept of a solution may be extended to a system of first
order equations. Let f : (a, b) × Ω → Rn be a vector valued continuous
function so that f = ( f1 , · · · , fn ) and each fi is a real valued continuous
function, where Ω is an open domain in Rn . For a given initial value y0 ∈
Ω, the IVP is given by
ẏ = f(t, y), y(t0) = y0. (3.1.9)
Example 3.1.2
Equivalently, ẏ = 2y/t when t ≠ 0. Separating the variables and integrating,
we obtain the general solution as y(t) = Ct², where C is a constant. For
any fixed C, the solution y therefore represents a parabola in the (t, y)
plane passing through the origin. Thus, if we consider the IVP for this
equation with the initial value y(0) = 0, there are infinitely many solutions
satisfying the initial condition, but no solution if the initial value is y(0) =
y0 ≠ 0.
Example 3.1.3
Consider the equation ẏ = −t/y.
We can easily see that y and t satisfy the implicit equation y² + t² = C²,
where C is a constant, which implies y = ±√(C² − t²) and |t| ≤ |C|.
Therefore, for −|C| ≤ t ≤ |C|, there exist solutions. The solution is not
defined for |t| > |C|.
A general regular (that is, the coefficient of highest order term is never
zero) first order linear ODE can be written as
Ly := ẏ + p(t )y = q(t ), (3.1.12)
where, p and q are functions of t. We assume that p and q are continuous
functions of t. For the basic equation, namely the integral calculus
problem, ẏ = f(t), the general solution is given by
y(t) = ∫^t f(τ) dτ + C.
Now recall the linear equation (3.1.12) and consider the corresponding
homogeneous equation Ly = 0; that is, ẏ + p(t )y = 0 or ẏ = −p(t )y.
Writing this formally as ẏ/y = −p(t), an integration gives
(d/dt) log |y(t)| = −p(t)
and therefore
|y(t)| = C exp(−∫^t p(τ) dτ), that is, y(t) exp(∫^t p(τ) dτ) = C̃,
for some arbitrary constant C̃. The reader should verify that if f is a
continuous function defined in an interval in R whose modulus is a
constant, then f itself is a constant. It is also easy to directly verify that y
given by (3.1.13) indeed satisfies Ly = 0.
Remark 3.1.4
Example 3.1.5
Consider the differential equation ẏ + 2ty = t. Here, the I.F. is e^{∫ 2t dt} =
e^{t²}. Thus, e^{t²}(ẏ + 2ty) = t e^{t²}, which implies
(d/dt)(y e^{t²}) = t e^{t²} ⇒ y e^{t²} = (1/2) e^{t²} + C,
or y(t) = 1/2 + C e^{−t²}.
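The same answer can be recovered with a computer algebra system; a quick sketch:

```python
# Sketch: checking Example 3.1.5 with sympy's dsolve.
import sympy as sp

t = sp.symbols('t')
y = sp.Function('y')
print(sp.dsolve(sp.Eq(y(t).diff(t) + 2 * t * y(t), t), y(t)))
# y(t) = C1*exp(-t**2) + 1/2, matching 1/2 + C e^{-t^2} up to the constant's name
```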
Example 3.1.6
Example 3.1.7
Example 3.1.8
Definition 3.2.1
If the differential equation ẏ = f(t, y) can be written as (d/dt) ϕ(t, y(t)) =
0 for a two variable function ϕ in a domain in the (t, y) plane, then the
differential equation is said to be an exact differential equation (EDE).
Example 3.2.2
The equation 1 + cos(t + y) + cos(t + y)ẏ = 0 can be written as (d/dt)[t +
sin(t + y)] = 0 and hence is exact. The solution is implicitly given by
t + sin(t + y) = constant.
Theorem 3.2.3
ϕ(t, y) = ∫_{t0}^{t} M(s, y) ds + h(y),
so that
∂ϕ/∂y = N(t, y) − N(t0, y) + dh/dy.
Therefore, the second relation, namely ∂ϕ/∂y = N, is satisfied if we choose h
such that dh/dy = N(t0, y). But this is an integral calculus problem for h and
we obtain h(y) = ∫_{y0}^{y} N(t0, ξ) dξ. Thus, the required function is given by
ϕ(t, y) = ∫_{t0}^{t} M(s, y) ds + ∫_{y0}^{y} N(t0, ξ) dξ.
We remark that ϕ is determined only up to a constant. Thus, if we change
t0 , y0 in this equation, only the constant term is going to change. Therefore,
the role of t0 , y0 is minimal and one can discard all the constants in the
expression for ϕ. We will observe this in the following examples. First,
make the following definition.
Definition 3.2.4
The DE M(t, y) + N(t, y)ẏ = 0 is said to be exact if ∂M/∂y = ∂N/∂t.
Example 3.2.5
The DE 3y + e^t + (3t + cos y) dy/dt = 0 is exact.
We have
M = 3y + e^t, N = 3t + cos y, and thus ∂M/∂y = 3 = ∂N/∂t.
Therefore, M = ∂ϕ/∂t, that is, ∂ϕ/∂t = 3y + e^t, which gives ϕ(t, y) = 3yt +
e^t + h(y). Differentiating with respect to y, we get N = ∂ϕ/∂y = 3t + dh/dy.
Thus, dh/dy = cos y or h(y) = sin y. We may take the constant of integration
as 0. Hence, ϕ(t, y) = 3yt + e^t + sin y. Therefore, the given DE can be
written as (d/dt) ϕ(t, y) = 0. The solution is given by ϕ(t, y) = 3yt + e^t +
sin y = constant.
We now discuss the notion of an integrating factor. If the DE (3.2.1)
is not exact, we may possibly make it exact by multiplying it with a
suitable function, which is called an integrating factor (I.F.). Multiplying
(3.2.1) by µ (t, y), we get
µ (t, y)M (t, y) + µ (t, y)N (t, y)ẏ = 0. (3.2.4)
Note that if the function µ > 0, then any solution y of (3.2.4) is also a
solution of (3.2.1) and vice versa. Equation (3.2.4) is exact if and only if
∂(µM)/∂y = ∂(µN)/∂t, which implies
(∂µ/∂y)M + µ(∂M/∂y) = (∂µ/∂t)N + µ(∂N/∂t). (3.2.5)
If this equation has a solution µ, then (3.2.4) is exact and µ is an I.F. of
the original equation. As (3.2.5) is a PDE for µ, it is more difficult to
solve and goes beyond the realm of ODE! However, since we have some
freedom in choosing µ, we will try to choose it as simple as possible, say
µ is a function of only t or only y. Fortunately, such an assumption works
in many situations.
Consider a special case: µ = µ(t) is a function of t alone. Then, (3.2.5)
becomes µ(t)(∂M/∂y − ∂N/∂t) = µ̇(t)N and hence,
µ̇(t)/µ(t) = (1/N)(∂M/∂y − ∂N/∂t).
As the expression on the left is a function of t alone, this equation makes
sense only when the expression on the right side is also a function of t
alone, say R(t); then one can find an I.F. µ(t) = exp(∫^t R(t) dt).
Similarly, if the expression (1/M)(∂N/∂t − ∂M/∂y) is a function of y alone,
then we can choose µ as a function of y alone.
Example 3.2.6
Here,
M = p(t)y − q(t), N = 1.
Now
(1/N)(∂M/∂y − ∂N/∂t) = p(t).
Hence, µ(t) = exp(∫^t p(t) dt) is an I.F., as we have already seen in
Section 1.2.
Example 3.2.7
Now
N = t² cos y + 3y² e^t = ∂ϕ/∂y = t² cos y + 3y² e^t + dh/dy.
Therefore, dh/dy = 0 and h(y) = constant. We can take ϕ(t, y) =
t² sin y + y³ e^t. The equation becomes (d/dt)(t² sin y + y³ e^t) = 0, which
implies t² sin y + y³ e^t = k, a constant.
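The exactness test of Definition 3.2.4 is mechanical to verify with a computer algebra system. In the sketch below, M is read off as ∂ϕ/∂t = 2t sin y + y³e^t from the ϕ(t, y) = t² sin y + y³e^t found above (the original statement of the example is inferred from this ϕ).

```python
# Sketch: verify M_y = N_t for M = 2t sin y + y^3 e^t, N = t^2 cos y + 3y^2 e^t.
import sympy as sp

t, y = sp.symbols('t y')
M = 2 * t * sp.sin(y) + y**3 * sp.exp(t)
N = t**2 * sp.cos(y) + 3 * y**2 * sp.exp(t)
print(sp.simplify(sp.diff(M, y) - sp.diff(N, t)) == 0)   # True: the DE is exact
```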
Example 3.2.8
Consider the DE t ẏ − 2y = 0.
For t > 0, we can see that the function µ(t) = 1/t³ is an integrating factor,
since
(d/dt)(y/t²) = (1/t²)ẏ − (2/t³)y = 0.
Thus, y = ct² is a solution for any constant c.
Theorem 3.3.1
Proposition 3.3.2
Proof: The first part of the proposition is trivial to verify. Now, let y
be any solution of (3.3.3) with y0 = y(t0 ) and y1 = ẏ(t0 ). We now show
that there are constants α and β such that y(t ) = αz(t ) + β w(t ), for all
t ∈ I (t0 ). In particular, taking t = t0 , we see that α and β should satisfy
the 2 × 2 matrix system
z(t0)α + w(t0)β = y0, ż(t0)α + ẇ(t0)β = y1. (3.3.4)
z, w satisfy (3.3.3). Thus, W is given by W(t) = C exp(−∫^t p(s) ds), for
some constant C. Hence, W ≡ 0 if C = 0. If C ≠ 0, then W(t) ≠ 0, for
all t.
The following proposition will complete the proof of Proposition 3.3.2.
Proposition 3.3.3
Theorem 3.3.4.
dim(S) = 2.
Hence, the Wronskian is non-zero for all t and by Proposition 3.3.3, z and
w are linearly independent. It now follows from Proposition 3.3.2 that any
solution of (3.3.3) can be written as a linear combination of z and w.
We remark that the aforementioned proposition holds true for an nth order
linear equation as well. Further, the existence of a unique solution for the
aforementioned IVP is guaranteed under the assumption that the functions
p, q are continuous in a compact interval I (t0 ). But, in general, even for
second order equations, it is difficult to find independent solutions to Ly =
0 in explicit form. We present two methods describing the possibility of
obtaining linearly independent solutions. When applicable, these methods
generate two linearly independent solutions.
Method 1: The idea is to remove the term involving the first order
derivative ẏ via an integrating factor. We look for a solution of the form
y = uv, where u and v are to be properly chosen. In this case, ẏ = uv̇ + u̇v
and ÿ = uv̈ + 2u̇v̇ + üv. Substituting in Ly = 0, we get
(uv̈ + 2u̇v̇ + üv) + p(t )(uv̇ + u̇v) + q(t )uv = 0. (3.3.6)
Rearranging the terms in (3.3.6), we obtain
uv̈ + (2u̇ + p(t )u)v̇ + (ü + p(t )u̇)v + q(t )uv = 0.
Now choose u so that the coefficient of v̇ in this equation vanishes. That
is, choose u satisfying
2u̇ + p(t )u = 0, (3.3.7)
which can be easily solved for u. Note that u never vanishes, if it is not
zero initially. The equation satisfied by v now becomes
v̈ + (q(t) + (ü + p(t)u̇)/u) v = 0. (3.3.8)
Since the v̇ term is absent and u is known in (3.3.8), it may be possible to solve
this equation for v, at least in some situations.
Example 3.3.5
Solve ÿ + 2tẏ + (1 + t²)y = 0.
Here p(t) = 2t, q(t) = 1 + t², u(t) = e^{−t²/2}. It is easy to see that v satisfies
v̈ = 0. Thus, v(t) = C1 t + C2 and the solution is given by
y(t) = u(t)v(t) = e^{−t²/2}(C1 t + C2).
r1 = (−b + √(b² − 4ac))/(2a) and r2 = (−b − √(b² − 4ac))/(2a). (3.3.13)
We now analyse various cases, depending on the nature of the discriminant
of the quadratic equation (3.3.12).
Case (i) b2 − 4ac > 0: In this case, the roots r1 and r2 of (3.3.12) are real
and distinct and we get two linearly independent solutions y1 (t ) = er1t and
y2 (t ) = er2t . Hence, the general solution can be written as
y(t ) = Aer1t + Ber2t , (3.3.14)
where, A and B are arbitrary constants.
Case (iii) b² − 4ac < 0: Then, the roots r1 and r2 are complex and e^{r1 t}
and e^{r2 t} are complex valued solutions. Clearly, if y(t) = u(t) + iv(t) is a
complex valued solution, then u and v are real valued solutions. Thus, if
r1 = α + iβ and r2 = α − iβ, β ≠ 0, the two independent solutions are
given by y1(t) = e^{αt} cos βt and y2(t) = e^{αt} sin βt, where α = −b/(2a) and
β = √(4ac − b²)/(2a). Thus, the general solution is given by
y(t) = e^{αt}(A cos βt + B sin βt), where A and B are arbitrary constants.
Theorem 3.3.6
Example 3.3.7
Example 3.3.8
Example 3.3.9
Example 3.3.10
Consider the linear equation
ÿ − ((1 + t)/t) ẏ + (1/t) y = t e^{2t}.
By inspection, we observe that y1 (t ) = et is a solution of the
homogeneous equation. By the method of reduction of order, it can
be shown that y2 (t ) = 1 + t is another (linearly independent)
solution. Therefore, a particular solution of the given
non-homogeneous equation is given by
y_p(t) = e^t ∫ ((1 + t) t e^{2t})/(t e^t) dt − (1 + t) ∫ (e^t · t e^{2t})/(t e^t) dt
= t e^{2t} − (1/2)(1 + t) e^{2t} = (1/2)(t − 1) e^{2t}.
One can verify by direct substitution that this y_p indeed satisfies the equation.
y_p(t) = e^{at}/(a² + pa + q).
If a is a root of (3.3.12), we now look for a solution of the form
y_p = At e^{at}. In fact, one can use the reduction of order method to get
the coefficient as At. A computation will lead to A(2a + p) = 1.
Again, we get a particular solution by choosing A = 1/(2a + p) if
2a + p ≠ 0, as
y_p(t) = t e^{at}/(2a + p).
Now, note that 2a + p = (d/da)(a² + ap + q). Thus, 2a + p = 0 is
equivalent to a being a double root. If a is a double root (that is,
a² + ap + q = 0 and 2a + p = 0), then look for a solution of the
form y_p(t) = At² e^{at}, which will give us A = 1/2. In summary, we have
y_p(t) = e^{at}/(a² + pa + q) if a is not a root of (3.3.12),
y_p(t) = t e^{at}/(2a + p) if a is a simple root of (3.3.12),
y_p(t) = (1/2) t² e^{at} if a is a double root of (3.3.12). (3.3.23)
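The summary (3.3.23) is easy to confirm by direct substitution; here is a sympy sketch with illustrative choices of p, q and a covering the three cases.

```python
# Sketch: check that each y_p in (3.3.23) solves y'' + p y' + q y = e^(a t).
import sympy as sp

t = sp.symbols('t')

def solves(p, q, a, yp):
    return sp.simplify(yp.diff(t, 2) + p * yp.diff(t) + q * yp - sp.exp(a * t)) == 0

print(solves(0, 1, 1, sp.exp(t) / 2))                  # a=1 not a root of r^2+1
print(solves(-3, 2, 1, t * sp.exp(t) / (2 * 1 - 3)))   # a=1 simple root of r^2-3r+2
print(solves(-2, 1, 1, t**2 * sp.exp(t) / 2))          # a=1 double root of r^2-2r+1
```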
Example 3.3.11
Consider ÿ + y = sint.
Example 3.3.12
where ω0 = √(k/m), called the natural frequency of the system. This can
also be written as
y(t) = R cos(ω0 t − δ).
Here, R = √(a² + b²) and δ = tan⁻¹(b/a) are, respectively, the amplitude and
phase angle. Further, T0 = 2π/ω0 is the period of the motion and the motion
is periodically oscillating between −R and R (see Fig. 3.1). Note that the
term involving c is the damping term. Indeed, Newton's law is justified as
the motion never stops.
[Figure 3.1: periodic oscillation between −R and R with period 2π/ω0]
Case (ii) (Damped, free motion: F = 0, c > 0): If r1, r2 are the roots
of the characteristic equation mr² + cr + k = 0, we can write the general
solution as
y(t) = a e^{r1 t} + b e^{r2 t} if c² − 4mk > 0,
y(t) = (a + bt) e^{−(c/2m)t} if c² − 4mk = 0, (3.3.24)
y(t) = e^{−(c/2m)t} [a cos µt + b sin µt] if c² − 4mk < 0,
where µ = √(4mk − c²)/(2m). Further, it is easy to see that r1, r2 are negative
2m
real numbers or have negative real parts. Hence, in the first two cases,
y(t ) → 0 as t → ∞, y(t ) creeps back to the equilibrium position and there
are no oscillations at all. These are referred to as over-damped or critically
damped motions.
[Figure: under-damped oscillation y(t) lying between the envelopes ±R exp(−(c/2m)t)]
and ω0 = √(k/m). The general solution can be written as
m
y(t ) = ϕ (t ) + y p (t ), (3.3.27)
where ϕ is the general solution to the homogeneous equation and as
observed earlier, ϕ (t ) → 0 as t → ∞. Thus, for large time, y(t ) behaves
like y p (t ). The solution y p is called the steady state part of y(t ) and ϕ (t )
is called the transient part.
The first two terms are periodic functions of time. The last term is
oscillatory and its amplitude keeps increasing due to the presence of t.
Thus, if the forcing term F0 cos ω0 t is in resonance with the natural
frequency of the system, then it will cause unbounded oscillations,
leading to mechanical catastrophes.
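A numerical experiment makes this growth visible. The sketch below integrates ÿ + ω0²y = cos ω0t with zero initial data (an illustrative ω0); the exact solution is y(t) = t sin(ω0t)/(2ω0), so the amplitude grows linearly in t.

```python
# Sketch: resonant forcing produces an amplitude growing linearly in time.
import numpy as np
from scipy.integrate import solve_ivp

w0 = 2.0
rhs = lambda t, z: [z[1], -w0**2 * z[0] + np.cos(w0 * t)]
sol = solve_ivp(rhs, (0.0, 50.0), [0.0, 0.0], max_step=0.01)

print(np.abs(sol.y[0][sol.t < 10.0]).max())   # small amplitude early on
print(np.abs(sol.y[0][sol.t > 40.0]).max())   # roughly 5 times larger later
```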
A phenomenon similar to this, a lack of sufficient damping, was the reason
for the collapse of the Tacoma bridge on November 7, 1940 at 11.00 am.
This is also the cause of the collapse of the Broughton suspension bridge
near Manchester. This occurred when a column of soldiers marched in
cadence over the bridge, thereby setting up a periodic force with a rather
large amplitude. The frequency was almost equal to the natural frequency
of the bridge, and thus, large oscillations were induced and the bridge
collapsed. It is for this reason that soldiers are ordered to break cadence
when crossing a bridge.
Among many similarities with mechanical vibrations, electrical
circuits also have the property of resonance. Unlike in mechanical systems,
resonance is put to good use here: the tuning knob of a radio or
television is used to vary the capacitance in such a manner that the
resonant frequency is changed until it agrees with the frequency of the
external signal, that is, from a radio or television station; the amplitude of
the current produced by this signal will be much greater than that of other
signals, so that we get the desired sound quality or picture quality or both.
and
ÿ(t) = ∫_a^t G_tt(t, ξ) r(ξ) dξ + G_t(t, t) r(t) + (d/dt)(G(t, t) r(t)).
Now substituting in (3.3.16), we get
Ly(t) = ∫_a^t LG(·, ξ) r(ξ) dξ + G_t(t, t) r(t) + p(t) G(t, t) r(t)
+ (d/dt)(G(t, t) r(t)).
This motivates us to define G as a solution to the following homogeneous
problem: For fixed ξ ≥ a as a parameter (in fact initial point), define G,
for t ≥ ξ to be the solution of
LG(., ξ ) = 0, G(ξ , ξ ) = 0, Gt (ξ , ξ ) = 1.
Further, define G(t, ξ ) = 0 for t ∈ [a, ξ ]. Then, y given by (3.3.29) will
satisfy the non-homogeneous equation Ly = r satisfying the initial
conditions y(0) = ẏ(0) = 0. The kernel G is called Green’s function
associated with the problem (3.3.16). For a class of problems associated
with second order equations, the process of obtaining G is done in detail
in Chapter 9.
Example 3.4.1
[Figure: (a) the characteristic line through x0 = (x − ct, 0); (b) the profile u(·, t0) = u0(· − ct0) at t = 0 and t = t0]
x(t ), that is, x(t ) = ct + ξ0 . These curves, straight lines in this case, are
called the characteristic curves of (3.4.1). In general, when c is a function
of t and x, these characteristic curves need not be straight lines. Fix ξ = ξ0
and restrict u along this curve (line). Consider the function of one variable
g(t ) = u(x(t ),t ). Now, it is easy to see by the chain rule that
(d/dt) g(t) = (d/dt) u(x(t), t) = u_x ẋ(t) + u_t · 1 = u_x · c + u_t = 0. (3.4.2)
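The constancy of u along characteristics is immediate to check numerically; the sketch below uses an illustrative initial profile u0 and speed c.

```python
# Sketch: u(x, t) = u0(x - c t) is constant along x(t) = c t + xi0.
import numpy as np

c = 1.5
u0 = lambda x: np.exp(-x**2)             # illustrative initial profile
u = lambda x, t: u0(x - c * t)           # the transported solution

xi0 = 0.7
ts = np.linspace(0.0, 5.0, 6)
print(u(c * ts + xi0, ts))               # the same value u0(xi0) at every time
```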
We now discuss a second order linear PDE, namely, the heat equation.
Our only purpose here is to indicate that this equation may be viewed
as an ODE, though in an infinite dimensional (Hilbert) space. As such,
much of the following terminology is not explained in a rigorous manner.
An inquisitive reader may explore this and similar topics after gaining
sufficient knowledge in functional analysis and related topics.
Example 3.4.2
3.5 Exercises
1. Prove that every separable equation is exact.
2. Find the unique solution to the IVP ẏ = 2y/t, y(t0) = y0, where t0 ≠ 0.
Also find the interval of existence and plot the solution for different
values of y0 .
3. Show that the solution of ẏ + (sin t)y = 0, y(0) = 3/2 is given by
y(t) = (3/2) e^{cos t − 1}.
4. The solution of dy/dt + e^{t²} y = 0, y(1) = 2 can be represented as
y(t) = 2 exp(−∫_1^t e^{s²} ds).
5. Classify the following into linear or non-linear:
(a) ẏ = ay − by², ẏ = −t/y, ẏ = −y/t, ẏ(t) = sin(t),
sin y + x cos(ẏ) = 0, ẏ = |y|, yẏ = y,
ẏ = sin y, yẏ = (g/W)(W − B − cy).
(b) (Duffing equation): ÿ + δẏ + αy + βy³ = 0
(c) (van der Pol equation): ÿ − µ (y2 − 1)ẏ + y = 0
(d) (Prey–predator system): ẋ = ax − bxy, ẏ = −cy + dxy
(e) (Epidemiology): Ṡ = −β SI, I˙ = β SI − γI
(f) (Bernoulli equation): ẏ + φ (t )y = ψ (t )yn
(g) (Reduced Bernoulli equation): ẏ + (1 − n)φ (t )y = (1 − n)ψ (t )
(h) (Generalized Riccati equation): ẏ + ψ (t )y2 + φ (t )y + χ (t ) = 0
6. Consider the Bernoulli equation
ẋ + φ x = ψxn ,
where φ , ψ are continuous functions. For n 6= 1, it is non-linear;
show that it can be reduced to a linear equation by the substitution
y = x1−n . Then, solve the equation.
7. Find the general solution of (i) ẋ + et x = et x2 (ii) ẋ + t n x = xn .
8. Consider the Jacobi equation
(a1 + b1t + c1 x)(tdx − xdt ) − (a3 + b3t + c3 x)dx
(c) ẏ + ty = t³y³
(e) ((1/t²) + (3y²/t⁴)) dt = (2y/t³) dy
(f) (t² dy − y² dt)/(t − y)² = 0
17. Find the general solution of the following equations
(a) t² ÿ + t ẏ − y = 0.
(b) t ÿ − (t + n) ẏ + ny = 0.
(d) t (d³y/dt³) = 2 and ÿ = a/y³.
18. Three solutions of a certain second order non-homogeneous linear
equation in R are
ϕ1 (t ) = t 2 , ϕ2 (t ) = t 2 + e2t , ϕ3 (t ) = 1 + t 2 + 2e2t .
Find the general solution of this equation.
19. Three solutions of a certain second order non-homogeneous linear
equation L y = g in R are
ψ1 (t ) = t 2 , ψ2 (t ) = t 2 + e2t , ψ3 (t ) = 1 + t 2 + 2e2t .
Here g is a continuous function in R. Find the solution y of L y = g
satisfying y(0) = 1, ẏ(0) = 2.
20. Assume the unique existence of a solution to the nth order IVP
3.6 Notes
The discussion on linear first and second order equations is available in
every basic book on ODE. In addition to the linear equations, we have
also introduced a section on exact differential equations. On one hand,
we have shown how every first order linear equation can be reduced to an
integral calculus problem, namely ẏ(t ) = h(t ) with the introduction of an
integrating factor (I.F.). This also makes it clear why an IVP for a first
4.1 Introduction
4.1.1 Well-posed problems
In this chapter, we address the problem of the existence and uniqueness
of solutions of initial value problems (IVP). For this purpose, our first
task is to ensure that the given differential equation has a solution. A
mathematical model originating from a real life system may exhibit more
than one solution starting from the same initial condition, though a
unique solution is expected. This may be due to rough approximations
and assumptions made while making a mathematical model of the
physical system. On the other hand, a mathematical model may not have
a solution at all. Similarly, it is also important to study the behaviour of
the solution with respect to the initial data as the initial data is usually
measured by using some devices and is bound to have some small errors.
Continuous dependence of solutions on initial data guarantees that a
small error in the initial data does not cause a drastic change in the
solution of the system. According to the French mathematician Jacques
Hadamard, if an initial value problem arising from a physical
phenomenon passes the above mentioned tests, namely, a solution
exists (existence problem), the solution is unique (uniqueness problem)
and the solution depends continuously on the initial conditions (stability
problem) in appropriate norms, then the IVP is said to be well-posed.
Otherwise, the problem is ill-posed. In this chapter, we will address these
issues and prove results which ensure the well-posedness of an IVP,
under suitable assumptions. We consider the following IVP:
ẏ = f(t, y), y(t₀) = y₀. (4.1.1)
When the function f(t, y) (called the vector field) is not continuous at a
point in the (t, y)-plane, then there may be a possibility of non-existence
of a solution at that point. Similarly, if the vector field is not differentiable
at a point, then there may be a possibility of non-uniqueness of solutions
through that point. The initial condition also plays a crucial role in the
existence and behaviour of the solution of an IVP.
Before proceeding further, we will consider some examples which
exhibit one or more phenomena discussed here. Also see Examples 3.1.2,
3.1.3.
4.1.2 Examples
Example 4.1.1
It has been shown in Chapter 3 that the function y(t) = ce^{2t} is a solution of the equation ẏ = 2y for every arbitrary constant c. Thus, the differential
equation has infinitely many solutions. Geometrically, this is a
one-parameter family of curves.
Let us remark that a differential equation always comes with some
associated physically meaningful conditions such as initial conditions
and boundary conditions. So when we talk about well-posedness, it is for
the differential equation together with the associated conditions provided.
Example 4.1.2
We consider the problem ẏ = (3/t)y,
with various initial conditions to exhibit multiplicity of solutions, non-
existence and a unique existence. Note that the vector field is not defined
at t = 0. By the method of separation of variables, we get y = ct³ as the solution, for an arbitrary constant c. Now consider the initial condition y(0) = 0; we see that y = ct³ is a solution to the initial value
problem for any value of c. Thus, the IVP has infinitely many solutions
with the initial condition y(0) = 0. All the solution curves pass through
(0, 0) in the (t, y)-plane.
On the other hand, if we take the same differential equation but with
the initial condition y(0) = 2, then the IVP has no solution as the general solution is y = ct³. The trouble is due to the singularity at t = 0. It is also
easy to see that the IVP with y(t₀) = y₀, t₀ ≠ 0, has a unique solution
y(t) = (y₀/t₀³) t³.
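The computation above is easy to confirm with a computer algebra system. A minimal sketch using sympy (the call below is standard; the equation is the one of this example):

```python
import sympy as sp

t = sp.symbols('t')
y = sp.Function('y')

# General solution of the singular equation y' = 3y/t.
sol = sp.dsolve(sp.Eq(y(t).diff(t), 3*y(t)/t), y(t))
print(sol)                    # Eq(y(t), C1*t**3)

# Every member of the family vanishes at t = 0, so y(0) = 0 admits
# infinitely many solutions, while y(0) = 2 admits none.
print(sol.rhs.subs(t, 0))     # 0, independently of C1
```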
We now give an example of an initial value problem in which f is continuous but not linear, and which exhibits infinitely many solutions.
Example 4.1.3
Consider the IVP ẏ = 3y^{2/3}, y(0) = 0. For any a ≥ 0, the function
y(t) = 0 for 0 ≤ t ≤ a, and y(t) = (t − a)³ for t > a,
is a solution; thus, there are infinitely many solutions.
Example 4.1.4
ẏ = y², y(0) = y₀ > 0.
By an integration and using y(0) = y₀, we get a solution
y(t) = y₀/(1 − y₀t).
It is the only solution to the problem. We will see this fact in the next section. Note that y(t) is defined only for t < 1/y₀, despite the fact that the
vector field f(t, y) = y² is very smooth on the entire real line. If y₀ is large,
then we have solutions only on a very small interval to the right of t = 0;
however, they exist for all t < 0.
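The blow-up at t = 1/y₀ can also be observed numerically; the sketch below, with illustrative values of y₀, integrates up to just short of the blow-up time:

```python
import numpy as np
from scipy.integrate import solve_ivp

def f(t, y):
    return y**2          # the vector field of this example

for y0 in (1.0, 10.0):
    T = 0.999 / y0       # stop just before the blow-up time 1/y0
    sol = solve_ivp(f, (0.0, T), [y0], rtol=1e-10, atol=1e-12)
    exact = y0 / (1.0 - y0 * sol.t[-1])
    print(y0, sol.y[0, -1], exact)   # the solution has become huge
```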
We now see an example of a nonlinear IVP which does not have a
solution.
Example 4.1.5
Consider the IVP ẏ = f(y), y(0) = 0, where f(y) = −1 for y ≥ 0 and f(y) = 1 for y < 0. Any solution must be decreasing initially as ẏ(0) = −1. Therefore, y(t) < 0 for some
t > 0. However, for all negative values of y, the solution must be
increasing. Since these two statements contradict each other, there is no
solution to this IVP.
Lemma 4.2.1
[Basic Lemma]
A function y is a solution of the IVP (4.1.1) on an interval around t₀ if and only if
∫_{t₀}^{t} ẏ(τ) dτ = ∫_{t₀}^{t} f(τ, y(τ)) dτ,
that is,
y(t) = y₀ + ∫_{t₀}^{t} f(τ, y(τ)) dτ. (4.2.1)
Remark 4.2.2
Lemma 4.2.3
Suppose p and q are non-negative, continuous functions on [a, b] satisfying
p(t) ≤ C + k ∫_{t₀}^{t} q(s) p(s) ds,
for all t ∈ [a, b], where t₀ ∈ [a, b] is fixed and C, k are constants with k ≥ 0. Then,
p(t) ≤ C exp(k ∫_{t₀}^{t} q(s) ds). (4.2.2)
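A quick numerical sanity check of the inequality; the data p, q, C, k below are illustrative and chosen so that the hypothesis holds with equality:

```python
import numpy as np

# p(t) = e^t satisfies p(t) = 1 + int_0^t p(s) ds (q = 1, C = 1, k = 1),
# and Gronwall then gives p(t) <= e^t -- here with equality.
t = np.linspace(0.0, 2.0, 201)
p = np.exp(t)
C, k = 1.0, 1.0
rhs = C + k * np.array([np.trapz(p[:i + 1], t[:i + 1]) for i in range(len(t))])
bound = C * np.exp(k * t)
print(np.all(p <= rhs + 1e-9), np.all(p <= bound + 1e-9))   # True True
```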
Remark 4.2.4
Theorem 4.2.5
Proof: Suppose that y and z are two solutions of the IVP (4.1.1) defined
on an interval [c, d ] contained in the interval [t0 − a,t0 + a] and t0 ∈ [c, d ].
Thus by the basic lemma, we have
y(t) = y₀ + ∫_{t₀}^{t} f(τ, y(τ)) dτ and z(t) = y₀ + ∫_{t₀}^{t} f(τ, z(τ)) dτ.
Theorem 4.3.1
Let (t0 , y0 ) ∈ D and a and b be positive constants such that the rectangle
R defined by
R = {(t, y) : |t − t0 | ≤ a, |y − y0 | ≤ b}
is a subset of D. Let M = max_{(t,y)∈R} |f(t, y)| and h = min(a, b/M). Then, IVP (4.1.1) has a unique solution in the interval |t − t₀| ≤ h.
Fig. 4.1 Picard’s theorem
Consider the interval [t0 ,t0 + h]. Similar arguments hold for the interval
[t0 − h,t0 ]. The proof will be established by the construction of successive
approximations, called Picard’s iterates {yn }, n = 0, 1, 2, · · · and showing
that {yn } converges uniformly to some y defined on [t0 ,t0 + h], a solution
of the integral equation (4.2.1). The basic lemma (Lemma 4.2.1), then
gives the existence of a solution to (4.1.1). The proof will be accomplished
through the following three steps.
Step 1: Here we define Picard’s iterates. For t ∈ [t0 ,t0 + h], let
y0 (t ) = y0
and define successively,
yₙ(t) = y₀ + ∫_{t₀}^{t} f(τ, yₙ₋₁(τ)) dτ. (4.3.1)
Assume, as induction hypothesis, that (t, yₙ₋₁(t)) ∈ R₁ and |yₙ₋₁(t) − y₀| ≤ b hold, for all t ∈ [t₀, t₀ + h]. We
show that the same statements are true when yn−1 is replaced by yn and
that completes the induction argument. Since R1 ⊂ R, we have
| f (t, yn−1 (t ))| ≤ M on [t0 ,t0 + h].
Now consider yₙ(t) = y₀ + ∫_{t₀}^{t} f(τ, yₙ₋₁(τ)) dτ. The aforementioned
induction assumption implies that the definition of yn (t ) makes sense and
yn is continuously differentiable on [t0 ,t0 + h]. Now,
|yₙ(t) − y₀| = |∫_{t₀}^{t} f(τ, yₙ₋₁(τ)) dτ| ≤ ∫_{t₀}^{t} |f(τ, yₙ₋₁(τ))| dτ ≤ M(t − t₀) ≤ Mh ≤ b.
Thus, (t, yn (t )) lies in the rectangle R1 and hence, f (t, yn (t )) is defined
and continuous on [t0 ,t0 + h]. Hence, the said properties hold also for yn
and induction is complete.
|yₙ(t) − yₙ₋₁(t)| ≤ ∫_{t₀}^{t} |f(τ, yₙ₋₁(τ)) − f(τ, yₙ₋₂(τ))| dτ
≤ α ∫_{t₀}^{t} |yₙ₋₁(τ) − yₙ₋₂(τ)| dτ
≤ α ∫_{t₀}^{t} (Mα^{n−2}/(n−1)!) (τ − t₀)^{n−1} dτ, by (4.3.3),
= (Mα^{n−1}/(n−1)!) [(τ − t₀)ⁿ/n] evaluated from τ = t₀ to τ = t
= (Mα^{n−1}/n!) (t − t₀)ⁿ.
Therefore, the inequality is true for n. For the case n = 1, we have
|y₁(t) − y₀| ≤ ∫_{t₀}^{t} |f(τ, y₀)| dτ ≤ M(t − t₀).
Thus, the inequality (4.3.2) is true for n = 1, and hence, it is also true for
any n ≥ 1 as shown earlier using mathematical induction.
Consider the series formed by the constants on the right side in the inequality (4.3.2); this series converges to (M/α)(e^{αh} − 1). Now consider the infinite series
∑_{n=1}^{∞} |yₙ(t) − yₙ₋₁(t)|.
By comparison, the iterates converge uniformly to a limit function y, and
y(t) = y₀ + lim_{n→∞} ∫_{t₀}^{t} f(τ, yₙ(τ)) dτ = y₀ + ∫_{t₀}^{t} f(τ, y(τ)) dτ on [t₀, t₀ + h].
Therefore, y satisfies (4.2.1). Thus, by the basic lemma, the limit function
y(t ) satisfies IVP (4.1.1) on [t0 ,t0 + h]. Using similar arguments, one can
show the existence of a solution on the interval [t0 − h,t0 ]. Thus, Picard’s
iterates converge uniformly to the unique solution of IVP (4.1.1). This
completes the proof.
Remark 4.3.2
Definition 4.3.3
Theorem 4.3.4
R = {(t, y) : |t − t0 | ≤ a, |y − y0 | ≤ b} ,
where a, b are positive real numbers. Let
M = max_{(t,y)∈R} |f(t, y)| and h = min(a, b/M).
Then, for given ε > 0, there exists an ε-approximate solution y for the IVP
(4.1.1) on |t − t0 | ≤ h. Note that h does not depend on ε.
(Figure: an ε-approximate polygonal solution starting at (t₀, y₀), with corner points t₁, t₂, t₃ and segment slopes between −M and M, on the interval up to t = t₀ + h.)
Theorem 4.3.5
Proof: Choose εₙ = 1/n, n = 1, 2, · · · . From Theorem 4.3.4, we have for
each εn , there exists an εn -approximate solution, which we denote by
yn (t ), defined on |t − t0 | ≤ h. This implies that
|yn (t ) − y0 | ≤ b
and hence, |yn (t )| ≤ |y0 | + b. Thus, the family of approximate solutions
{yn } is uniformly bounded. Again, from (4.3.6), we have
|yn (t ) − yn (t˜)| ≤ M|t − t˜|, for all t, t˜ ∈ [t0 ,t0 + h].
Therefore, {yn } is an equicontinuous family of functions; see
Chapter 2. Thus, by the Arzela–Ascoli theorem (Theorem 2.2.8), there
exists a subsequence {ynk } of {yn } such that ynk → y uniformly on
[t0 − h,t0 + h] as nk → ∞. This implies that y is continuous and
|y(t ) − y(t˜)| ≤ M|t − t˜|.
We now prove that this limit function y is a required solution to IVP (4.1.1). Consider the error defined by
∆ₙₖ(t) = ẏₙₖ(t) − f(t, yₙₖ(t)) if ẏₙₖ exists, and ∆ₙₖ(t) = 0 otherwise.
Furthermore, we have |∆ₙₖ| ≤ εₙₖ = 1/nₖ. Therefore, we can pass to the limit in (4.3.7), to get
y(t) = y₀ + ∫_{t₀}^{t} f(τ, y(τ)) dτ.
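The ε-approximate solutions of Theorem 4.3.4 are the familiar Euler polygons, and the uniform limit extracted above is what a numerical integrator approximates. A minimal sketch (the IVP ẏ = y, y(0) = 1 is chosen for illustration):

```python
import numpy as np

def euler_polygon(f, t0, y0, h, n):
    """Polygonal approximation on [t0, t0 + h] built from n segments;
    its slope differs from f(t, y(t)) by an epsilon that shrinks with n."""
    ts = np.linspace(t0, t0 + h, n + 1)
    ys = np.empty(n + 1)
    ys[0] = y0
    for i in range(n):
        ys[i + 1] = ys[i] + (ts[i + 1] - ts[i]) * f(ts[i], ys[i])
    return ts, ys

for n in (4, 16, 64, 256):
    ts, ys = euler_polygon(lambda t, y: y, 0.0, 1.0, 1.0, n)
    print(n, abs(ys[-1] - np.e))   # error at t = 1 decreases as n grows
```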
Define X = {y ∈ C[t₀, t₀ + h] : |y(t) − y₀| ≤ b, for all t ∈ [t₀, t₀ + h]}, which is a closed ball in the Banach space C[t₀, t₀ + h] with the sup norm ‖y‖ = sup_{t∈[t₀,t₀+h]} |y(t)|.
|(Ty₁)(t) − (Ty₂)(t)| ≤ α(t − t₀) ‖y₁ − y₂‖.
Successively applying the first and second inequalities, we get
|(T²y₁)(t) − (T²y₂)(t)| ≤ α ∫_{t₀}^{t} |(Ty₁)(τ) − (Ty₂)(τ)| dτ
≤ α² ∫_{t₀}^{t} (τ − t₀) ‖y₁ − y₂‖ dτ
= α² ((t − t₀)²/2) ‖y₁ − y₂‖.
Hence,
‖T²y₁ − T²y₂‖ ≤ (α²h²/2) ‖y₁ − y₂‖.
An induction argument now gives that, for any n ≥ 1,
‖Tⁿy₁ − Tⁿy₂‖ ≤ (αⁿhⁿ/n!) ‖y₁ − y₂‖.
By choosing n large, the quantity αⁿhⁿ/n! can be made less than 1, and hence, Tⁿ is a contraction. Thus, by the generalized Banach contraction principle (Theorem 2.3.2 and its Corollary 2.3.3), T has a unique fixed point. This completes the proof.
Remark 4.3.7
Example 4.3.8
Consider the IVP ẏ = y, y(0) = 1, and take y₀(t) = 1. We successively get
y₁(t) = 1 + ∫₀ᵗ y₀(τ) dτ = 1 + ∫₀ᵗ 1 dτ = 1 + t,
y₂(t) = 1 + ∫₀ᵗ y₁(τ) dτ = 1 + ∫₀ᵗ (1 + τ) dτ = 1 + t + t²/2!,
y₃(t) = 1 + ∫₀ᵗ y₂(τ) dτ = 1 + t + t²/2! + t³/3!,
and, in general,
yₙ(t) = ∑_{m=0}^{n} tᵐ/m! → eᵗ as n → ∞.
But we know that the solution to the IVP is indeed given by y(t ) = et .
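The iterates are easy to generate symbolically; the following sympy sketch reproduces the partial sums computed above:

```python
import sympy as sp

t, tau = sp.symbols('t tau')

# Picard iteration for y' = y, y(0) = 1:
#   y_n(t) = 1 + int_0^t y_{n-1}(tau) dtau.
y = sp.Integer(1)
for n in range(1, 5):
    y = 1 + sp.integrate(y.subs(t, tau), (tau, 0, t))
    print(n, sp.expand(y))
# 1 + t, 1 + t + t**2/2, ... : the Taylor partial sums of exp(t).
```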
Theorem 4.4.1
Proof: We give a proof when t0 = t˜0 . From the basic lemma, the solutions
y and ỹ satisfy the following integral equations:
y(t) = y₀ + ∫_{t₀}^{t} f(τ, y(τ)) dτ, ỹ(t) = ỹ₀ + ∫_{t₀}^{t} f̃(τ, ỹ(τ)) dτ,
for all t ∈ I. Subtracting the second equation from the first, we get
y(t) − ỹ(t) = y₀ − ỹ₀ + ∫_{t₀}^{t} (f(τ, y(τ)) − f̃(τ, ỹ(τ))) dτ.
Example 4.5.1
Then,
φ₀(t) = y₀ + ∫_{t₀}^{t} f(τ, φ₀(τ)) dτ, for t₀ − h ≤ t ≤ t₁,
φ₁(t) = φ₀(t₁) + ∫_{t₁}^{t} f(τ, φ₁(τ)) dτ, for t₁ ≤ t ≤ t₁ + h₁.
Thus, we have
y(t) = y₀ + ∫_{t₀}^{t} f(τ, y(τ)) dτ
Theorem 4.5.2
Theorem 4.5.3
an interval (a, b̄] with b̄ > b. Similar statements hold at the left end
point a.
Definition 4.5.4
Proposition 4.5.5
Proof: If not, suppose J = (α, β ]. In this case, β < ∞. Then, one can
consider the IVP for the same ODE with initial condition at β , to get
a solution in [β , β + h] for some h > 0. This will produce a solution in
(α, β + h], contradicting the maximality of J. A similar contradiction can be arrived at if α is a point in J.
Theorem 4.5.6
We infer the following from the conclusion of the theorem. Only one of
the following statements is true:
• If y(β−) = lim_{t→β−} y(t) exists, then (β, y(β−)) ∈ D̄ \ D, the boundary of D.
• The solution y becomes unbounded near β , that is, given any large
positive number C, there exists t1 < β such that |y(t1 )| ≥ C.
The following examples illustrate both these situations.
The proof of the theorem follows immediately from Theorem 4.5.3.
Example 4.5.7
Consider the equation ẏ = 1/(ty).
Here, the function f(t, y) = 1/(ty) is defined in the entire (t, y)-plane, except on the t-axis and the y-axis. We will consider the IVP in the first quadrant in
the (t, y)-plane: y(t0 ) = y0 where both t0 , y0 are positive. The solution
is given by y = [2 log(t/t₀) + y₀²]^{1/2}. Therefore, the maximal interval of existence is (α, ∞), where α = t₀ e^{−y₀²/2}, and as t → α−, y(t) → 0 with (α, 0) belonging to the boundary of the domain in question.
Example 4.5.8
Consider the equation ẏ = 1/(t + y).
In this case, we take the domain as {(t, y) : t + y > 0} and impose the initial
condition as y(t₀) = y₀ with t₀ + y₀ > 0. By introducing a new variable u(t) = t + y(t), we see that the solution is implicitly given by
e^{u(t)}/(1 + u(t)) = [e^{y₀}/(1 + t₀ + y₀)] eᵗ.
We notice that the maximal interval of existence in this case is given by (α, ∞), where α = t₀ + log(1 + t₀ + y₀) − (t₀ + y₀) < t₀. Again,
it is not hard to see that as t → α−, (t, y(t )) approaches the boundary of
the domain in question, that is, t + y(t ) → 0.
Example 4.5.9
Consider the IVP: ẏ = (π/2)(1 + y²), y(0) = 0.
We now see that the solution y(t) = tan(πt/2) cannot be extended beyond the interval (−1, 1). Note that if we take any rectangle {(t, y) : |t| ≤ a, |y| ≤ b} around the origin (0, 0), then, as in the local existence theorem, we get the existence of a unique solution in an interval [−h, h], where h = min(a, (2/π) b/(1 + b²)), which is always less than 1/π.
ẋ₁ = f₁(t, x₁, x₂, · · · , xₙ)
ẋ₂ = f₂(t, x₁, x₂, · · · , xₙ)
············
ẋₙ = fₙ(t, x₁, x₂, · · · , xₙ)
x(t) = [x₁(t), x₂(t), · · · , xₙ(t)]ᵀ
and
x₀ = [x₀₁, x₀₂, · · · , x₀ₙ]ᵀ.
Here, the superscript T denotes the transpose of a matrix/vector. Using
these notations, the aforementioned system of differential equations can
be written in the following compact form
ẋ = f(t, x), x(t0 ) = x0 . (4.6.1)
An nth order scalar differential equation can be reduced into a system of
n first order differential equations of this form. Such a representation is
known as the state-space representation of the system.
Example 4.6.1
x2 (t ) = y(1) (t ) = ẋ1 (t )
General Theory of Initial Value Problems 127
x3 (t ) = y(2) (t ) = ẋ2 (t )
············
xn (t ) = y(n−1) (t ) = ẋn−1 (t ).
Then,
ẋ₁(t) = x₂(t)
ẋ₂(t) = x₃(t)
············
ẋₙ₋₁(t) = xₙ(t)
ẋₙ(t) = g(t, x₁(t), x₂(t), . . . , xₙ(t))
and the initial conditions reduce to
x1 (t0 ) = x01 , x2 (t0 ) = x02 , · · · , xn (t0 ) = x0n .
In vector notation, we get
ẋ(t ) = f (t, x(t )) , x(t0 ) = x0 ,
where
x0 = [x01 , x02 , . . . , x0n ]T
or, component-wise,
xᵢ(t) = x₀ᵢ + ∫_{t₀}^{t} fᵢ(τ, x(τ)) dτ.
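This reduction is also how higher order equations are fed to numerical solvers in practice. A small sketch, with ÿ = −y as an illustrative choice of g:

```python
import numpy as np
from scipy.integrate import solve_ivp

# y'' = g(t, y, y') with g = -y, written as x1' = x2, x2' = g.
def rhs(t, x):
    x1, x2 = x
    return [x2, -x1]

sol = solve_ivp(rhs, (0.0, np.pi), [0.0, 1.0], dense_output=True)
# The exact solution of this IVP is y = sin t.
print(abs(sol.sol(np.pi / 2)[0] - 1.0))    # close to 0
```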
Following exactly the procedure used for a single equation (see Theorem 4.3.1), we can show that these Picard iterates converge uniformly to the
Theorem 4.6.2
The interval of existence obtained from the theorem need not be the best
possible interval. One can also prove the results on continuation of
solutions to a larger interval as in the case of a single equation in a similar
fashion. We can also introduce the maximal interval of existence and
eventually obtain global solutions under the assumption that f is a
continuous bounded function and is globally Lipschitz with respect to the
x variable in the domain of definition.
Example 4.6.3
|f(t, x) − f(t, x̃)| ≤ 3|x − x̃|.
Thus, f(t, x) is globally Lipschitz continuous with Lipschitz constant less
than or equal to 3. Hence, by the existence and uniqueness theorem, there
exists a unique solution for the given differential system around the given
initial data.
Let A = [ai j ] be a constant n × n matrix. Then, f(t, x) = Ax is
obviously Lipschitz continuous with Lipschitz constant α = kAk. Thus,
the linear system with constant coefficients ẋ = Ax, x(t0 ) = x0 has a
unique solution, and the solution is global. A detailed study of such linear systems is carried out in Chapter 5.
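The global solution of a constant coefficient system is realised numerically by the matrix exponential. A minimal sketch (the matrix is illustrative):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])       # harmonic oscillator as a system
x0 = np.array([1.0, 0.0])

# x(t) = e^{tA} x0 exists for every t: the solution is global.
for t in (0.5, 1.0, 10.0):
    x = expm(t * A) @ x0
    print(t, x, np.allclose(x, [np.cos(t), -np.sin(t)]))   # True
```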
4.7 Exercises
1. Discuss the existence and uniqueness of the solution of the following
IVPs
(a) ẏ = (2/t)y, y(t₀) = y₀, t₀ ≠ 0.
(b) ẏ = (cot t)y, y(1) = 0.
(ii) ẏ = 1/y, y(0) = 0
(iii) ẏ = |y|^{1/2}, y(0) = 0
θ (0) = a0 , θ̇ (0) = a1 .
(a) ẏ = 1/(1 + y²).
(b) ẏ = 1/(1 − y²).
(c) ẏ = sin t/(1 − y²).
(d) ẏ = 1/(y(1 − y)).
11. Prove the continuity of the solution of the equation given below, in
appropriate norm with respect to the initial data x0 and f
ẋ = f(t, x), x(t0 ) = x0 ,
assuming that f is continuous and Lipschitz continuous with respect
to x variable, in the domain of definition.
12. Consider the n-dimensional control system
ẋ = f(t, x(t ), u(t )), x(t0 ) = x0 ,
where the function f : R × Rn × Rm → Rn is continuous and
Lipschitz continuous with respect to x and u, where the continuous
function u(t ) is an external control input applied to the system.
Prove that the system has a unique solution for a given initial
condition x0 and a given control function u(t ). Also, prove the
following:
(a) Let x be the unique solution with initial state x₀ and x̃ be the unique
solution with initial condition x̃0 , for a fixed control input u. Then
there exists K1 > 0 such that
||x − x̃|| ≤ K1 ||x0 − x̃0 ||.
(b) Let xᵤ be the unique solution with a control u and xᵤ̃ be the unique
solution with a control ũ for a fixed initial state x0 . Then prove that
there exists K2 > 0 such that
||xu − xũ || ≤ K2 ||u − ũ||.
With higher order smoothness in the data f, we can get the corresponding higher order smoothness in the solution.
4.8 Notes
This chapter deals with some important topics on the existence and uniqueness of a solution to an ODE. The significance of these topics is explained through several examples so that a beginner starts appreciating the topics. We have included three results on existence: the Cauchy–Peano existence theorem, existence using Picard's iterates and existence using the fixed point theorem. The first one requires only the minimal assumption of continuity, but uniqueness is not guaranteed. The other two results require the assumption of Lipschitz continuity and uniqueness is guaranteed. Gronwall's inequality is stated and proved, which in turn is
used to prove the uniqueness of solutions. Gronwall’s inequality is also
useful for comparison of different solutions with different coefficients and/or initial data. There are other types of uniqueness results; see for
example [AO12]. For general theory, there are many good books, see for
example [CL72, Sim91, SK07, Tay11, MU78, HSD04]. Continuous
dependence on the data is also discussed in detail. Also discussed is the
topic on continuation of solutions to larger intervals; this leads to the
concept of maximal interval of existence of a solution. In particular, the conditions for global existence of a solution are dealt with. An application
of these results is seen in the proof of Perron’s theorem in Chapter 9. A
brief discussion on systems is also carried out.
5
Linear Systems and
Qualitative Analysis
Definition 5.1.1
Definition 5.2.1
this case to bring out the aspects of diagonalizability. The general case is
more involved and Jordan decomposition is the best possible reduction.
We already know that diagonalizability is equivalent to the existence
of n independent eigenvectors. Note that we do not demand n distinct
eigenvalues. However, the existence of n distinct eigenvalues implies the existence of n independent eigenvectors and hence, diagonalizability is guaranteed (distinctness is a sufficient condition). In general, if the algebraic and
geometric multiplicities are equal for all the eigenvalues of a matrix, then
that matrix is diagonalizable.
The reader can show that the matrix [1 0; 1 1] is not diagonalizable.
Theorem 5.3.1
Proof: If the two eigenvalues λ and µ of A are real and distinct, then
the corresponding eigenvectors x and y are linearly independent. If
λ = µ is the double real eigenvalue of A and the corresponding
eigenspace is two-dimensional, then also we obtain two linearly
independent eigenvectors
x and y. In either of these cases, we obtain, with P = [x y] = [x₁ y₁; x₂ y₂], AP = P diag(λ, µ) = P [λ 0; 0 µ]. From now
onwards, note that we may represent matrices in the form [x y], where x
and y are the column vectors of the matrix in question. Since x and y are
independent, the matrix P is invertible, and we get the first form B1 .
Equivalently,
y₁(t) = e^{ta}(y₀₁ cos(tb) − y₀₂ sin(tb)), y₂(t) = e^{ta}(y₀₁ sin(tb) + y₀₂ cos(tb)).
When we go back to the original system corresponding to A, we may write
the solution to the IVP (5.2.1), respectively as
Example 5.3.2
The matrix A = [−1 −3; 0 2] has eigenvalues λ₁ = −1, λ₂ = 2 with corresponding eigenvectors [1; 0] and [−1; 1]. Hence, P = [1 −1; 0 1] and P⁻¹ = [1 1; 0 1]. Further, B = P⁻¹AP = [−1 0; 0 2]. This implies that a linearly equivalent system is given by ẏ₁ = −y₁, ẏ₂ = 2y₂, which is a diagonal system.
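A two-line numerical check of this diagonalization (numpy's eigensolver normalises eigenvectors, but P can be entered directly):

```python
import numpy as np

A = np.array([[-1.0, -3.0],
              [0.0, 2.0]])
P = np.array([[1.0, -1.0],
              [0.0, 1.0]])

B = np.linalg.inv(P) @ A @ P      # should be diag(-1, 2)
print(B)
print(np.linalg.eigvals(A))       # [-1., 2.]
```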
Example 5.3.3
Example 5.3.4
A = [0 1; −3 −2]. Observe that A has the complex eigenvalues given by λ = −1 + i√2 and µ = −1 − i√2. Thus, a = −1 and b = √2 in (5.3.1). A complex eigenvector corresponding to the eigenvalue −1 + i√2 can be computed as x = [1; λ] = [1; −1] + i [0; √2]. Thus, P = [0 1; √2 −1] and P⁻¹ = (1/√2) [1 1; √2 0]. Finally, the solution to the system is given by
x(t) = e⁻ᵗ P [cos(√2 t) −sin(√2 t); sin(√2 t) cos(√2 t)] P⁻¹ x₀.
In mechanics, the position x(t) and the velocity ẋ(t) are called the phases of the system under consideration. We adopt the same terminology for a first order system (5.4.1) and call the components x₁, x₂, · · · , xₙ of the solution x the phases of (5.4.1). Indeed, if Newton's law is transformed to a first order system, we get x₁ = x and x₂ = ẋ. If
x is a solution of (5.4.1) in some interval I ⊂ R, containing t0 , the set
{x(t ) ∈ Rn : t ∈ I} is called a trajectory or an orbit passing through x0 . In
this scenario, Rn is referred to as the phase space (phase plane if n = 2
and phase line if n = 1) of (5.4.1). Thus, the phase space contains all the
trajectories of (5.4.1) passing through different points of the phase space.
Description of all the trajectories of (5.4.1) in the phase space (or phase plane or phase line) is referred to as the phase portrait of (5.4.1), and the analysis involved in this process may be called the phase space (or phase plane or phase line) analysis of (5.4.1).
We now give an example.
Example 5.4.1
Consider the system ẋ₁ = −x₁, ẋ₂ = x₂, whose solution is x₁(t) = x₀₁e⁻ᵗ, x₂(t) = x₀₂eᵗ; the trajectories are the hyperbolas x₁x₂ = constant.
Flow: Given the dynamical system Φ as described earlier, for any fixed t, introduce the map φₜ : Rⁿ → Rⁿ by φₜ(x₀) = Φ(t + t₀, x₀) = x(t + t₀, x₀).
Then, the collection G = {φt : t ∈ R} is called the flow of the system
(5.4.1).
The notion of a flow gives an entirely different perspective which is
quite useful in applications. For example, when we watch fluid flowing,
we normally do not see the trajectory lines (stream lines); rather, we see
a body of fluid moving. This is the view incorporated in the concept of
flow. More precisely, we would like to see a neighbourhood, say U of x0
moving with time. Thus, φt (U ) is the position of all particles at time t,
whose initial position is in U. The collection G satisfies
φ0 (x) = x, φs (φt (x)) = φs+t (x), φ−t (φt (x)) = φt (φ−t (x)) = x,
(5.4.3)
for all x ∈ Rn . The first property comes from the initial condition,
whereas the second property follows from the uniqueness of a solution to
the IVP. This is known as semigroup property. In other words, the flow is
a semigroup. The last property is a consequence of the second one, and
asserts the existence of inverse and hence, the flow G has properties of a
group. Thus, the flow can be visualized as a group action. We remark that
this notion can even be generalized to PDE. In general, every system may
not have the group structure; for example, the heat equation does not
produce a group structure, but only a semigroup structure. However, the
wave equation does produce a group structure.
In the case of an autonomous linear system, that is, if f(x) = Ax, the dynamical system and flow are, respectively, given by Φ(t, x₀) = e^{(t−t₀)A} x₀ and φₜ = e^{tA}.
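The group property (5.4.3) of the linear flow φₜ = e^{tA} can be verified directly; a quick sketch with an arbitrarily chosen matrix:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
s, t = 0.7, 1.3

# phi_s o phi_t = phi_{s+t}, and phi_{-t} inverts phi_t.
print(np.allclose(expm(s * A) @ expm(t * A), expm((s + t) * A)))  # True
print(np.allclose(expm(-t * A) @ expm(t * A), np.eye(2)))         # True
```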
Now consider the system (5.4.1). Here f(x) is the given information or
data which we view as a vector located at the point x and thus producing
a vector at each point in a domain Ω ⊂ Rn , where the ODE is described.
This is what we call a vector field. Thus, a vector field X in a domain Ω ⊂ Rⁿ
is a mapping such that a vector X (x) ∈ Rn is associated with every x ∈ Ω.
We say that the vector field is smooth if this mapping is smooth. The vector
field associated with the system in Example 5.4.1 is represented in Fig. 5.2.
We may ask a question: what is the connection between the vector field and the solutions of a system? It is easy to see that the tangents to the solution curves give the vector field and, conversely, any curve whose tangents come from the vector field is a solution of the system.
Definition 5.4.2
We would like to state an important point at this stage. Note that x(t ) = x̄
for all t is a solution to the system ẋ = f(x) if x̄ is an equilibrium point.
This means that if the motion starts from the equilibrium point, the
trajectory will remain there forever. In physical problems, especially in
mechanics, it represents the steady state solution, the one that does not
change with time. Hence, we not only view an equilibrium point as a
point, but also as a solution to the system.
Since equilibrium point is a steady state solution, we would be
interested in the behaviour of solutions which start close to an
equilibrium point. This is very important since when we make small
errors, we would like to know whether the trajectory also remains in a
neighbourhood of the equilibrium point. This is the motivation behind
stability analysis of equilibrium points.
For a linear system, that is, f(x) = Ax, observe that x̄ = 0 is always
an equilibrium point and in addition, if A is invertible, this is the only
equilibrium point. In general, the set of all equilibrium points is given by
ker(A). In this section, we will characterize various types of equilibrium
points for a 2 × 2 system. We will do this via various examples. Stability
of nonlinear systems will be studied in Chapter 8.
Saddle Point, Node, Focus and Center: Note that in Example 5.4.1, the
first component x1 (t ) → 0, whereas the second component x2 (t ) → ±∞ as
t → ∞ depending on the initial condition. This equilibrium point is called a
saddle point and it is classified as unstable. In fact, this will be the feature
for any system having two real non-zero eigenvalues with opposite sign
except that the trajectories will remain in four parts separated by a different
set of coordinate axes given by the eigenvectors, and which need not be
the standard coordinate axes (see Example 5.3.2).
Example 5.4.3
Let A = [λ 0; 0 λ], λ > 0.
consider A = [−λ 0; 0 −λ] with λ > 0, and both the trajectories will now approach 0 as t → ∞. See Fig. 5.3(b). This equilibrium in both the cases is referred to as a node. In the first case, we have an unstable node and the second case corresponds to a stable node.
Example 5.4.4
Take A = [2 0; 0 1].
This has two distinct eigenvalues, 2 and 1, having the same sign and the
solution is given by x1 (t ) = x01 e2t , x2 (t ) = x02 et . Eliminating t, we will
get x1 = cx22 (see Fig. 5.4(b)) and an unstable node. Again the arrows will
get reversed if we take negative numbers in A and we get a stable node
(see Fig. 5.4(a)). The situation is exactly the same if we replace 2 and 1 by
any two real numbers with the same sign. More generally, the behaviour
of the trajectories will remain the same for any system having two distinct
real eigenvalues of the same sign except that the trajectories will remain
in four parts separated by a different coordinate system.
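Phase portraits like the ones referred to above are easy to draw. A sketch for the unstable node of Example 5.4.4 (the plotting parameters are arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

# Vector field of x1' = 2 x1, x2' = x2: an unstable node at the origin.
x1, x2 = np.meshgrid(np.linspace(-2, 2, 25), np.linspace(-2, 2, 25))
plt.streamplot(x1, x2, 2.0 * x1, x2, density=1.2)
plt.xlabel('x1')
plt.ylabel('x2')
plt.title('Unstable node: trajectories x1 = c x2**2')
plt.show()
```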
These two examples cover matrices of the form B1 in Theorem 5.3.1. Now,
we consider an example with a double real eigenvalue, but which is not
diagonalizable.
Example 5.4.5
Now, consider A = [λ 1; 0 λ].
Example 5.4.6
Let A = [a −b; b a].
The solution is x(t) = e^{at} [cos(bt) −sin(bt); sin(bt) cos(bt)] x₀. Indeed, the sign of a will determine the stability; the components of the matrix appearing in the solution are periodic, with the sign of b determining the orientation of the rotation. Of course, we take b ≠ 0 to get the complex (non-real) eigenvalues.
Case (i), a = 0: Note that the matrix C = [cos(bt) −sin(bt); sin(bt) cos(bt)] shows
the periodic nature of the trajectories, rotating around the origin. In fact,
C is a rotation matrix with determinant 1. Thus, we have |x(t )| = |Cx0 | =
|x0 | for all t. In other words, x(t ) rotates around the origin along the circle
of radius |x0 | as t increases or decreases. The rotation will be clockwise
if b < 0 and it is counter-clockwise if b > 0. In this case, the equilibrium
point 0 is referred to as a center. See Fig. 5.6.
(Fig. 5.6: a center; left panel a = 0, b < 0; right panel a = 0, b > 0.)
Case (ii), a ≠ 0: Here also the rotation matrix C acts the same way, but the presence of e^{at} changes the amplitude, making a spiral around the origin.
The spiral moves towards infinity as time increases if a > 0 and it tends to
the origin, if a < 0. The situation leads to four different cases and these
are depicted in Fig. 5.7 and Fig. 5.8. The equilibrium point in this case is
referred to as a focus. It is a stable/unstable focus depending on whether
a < 0/a > 0, respectively.
Example 5.4.7
Take the matrix A = [0 0; 0 −2]. Then, x₁(t) = x₀₁, x₂(t) = x₀₂ e^{−2t}.
In this degenerate case, where one eigenvalue is zero, all the points on
the x₁-axis are equilibrium points. Figure 5.9 is self-explanatory. However, note that since the eigenvalues are distinct, A has two linearly independent eigenvectors. Now, consider A = [0 1; 0 0]. The double eigenvalue 0 has
geometric multiplicity one. The reader should work out the further details
and observe the different behaviour in this case compared to the previous
example.
Definition 5.4.8
4. center if B = [0 −b; b 0], b ≠ 0.
A stable node or a focus is also called a sink and an unstable node or focus
is called a source. If det(A) = 0, then the origin is called a degenerate
equilibrium point.
(Figure: classification of the equilibrium point 0 in terms of the discriminant ∆: saddle point, stable and unstable nodes (∆ > 0), stable and unstable foci (∆ < 0), and a center on the line ∆ = 0.)
according to the case when A has eigenvalues: 3 real (need not be distinct) with 3 independent eigenvectors; 3 real with only two independent eigenvectors; 3 real with only a single independent eigenvector; or one real and two complex eigenvalues, respectively.
Example 5.5.1
Consider A = [1 0 0; 0 1 0; 0 0 −1].
Example 5.5.2
that is,
x₁(t) = e^{ta}(x₀₁ cos(tb) − x₀₂ sin(tb)), x₂(t) = e^{ta}(x₀₁ sin(tb) + x₀₂ cos(tb)).
Now consider the same system with specific signs of the eigenvalues; the reader may work out the same problem with different signs for a, b and λ. Let a > 0, b < 0
and λ > 0. If we take the initial point in the x1 x2 -plane, that is, x03 = 0,
then the entire trajectory will remain in the x1 x2 -plane and it is similar to
a planar trajectory corresponding to an unstable focus, rotating clockwise
as b < 0, with increasing amplitude of the spiral as a > 0. On the other
hand, if the initial point is on the x3 -axis, that is, x01 = 0 = x02 , then the
trajectory will remain on the x3 -axis and, since λ > 0, tend to ±∞ along the
x3 -axis as t → ∞, according to whether x03 is positive or negative. So, if we
put both these arguments together, for a general initial point, the trajectory
will spiral around the x3 -axis, increasing the distance of the spiral from
the x3 -axis, but moving towards ±∞. See Fig. 5.12(b). As another case,
if we take a = 0 and b > 0, we get spirals around the x3 -axis, maintaining
the same distance from the x3 -axis, moving towards ±∞, since λ > 0, but
in the counter-clockwise direction since b > 0. See Fig. 5.12(a).
Theorem 5.5.3
Theorem 5.5.4
" #
cos(b j t ) − sin(b j t )
Note that etB j = ea j t .
sin(b j t ) cos(b j t )
" #
cos(b j t ) − sin(b j t )
Further, observe that represents a pure
sin(b j t ) cos(b j t )
rotation.
for i = 1, · · · , n. Finally,
e^{tA} = exp(t ∑_{j=1}^{n} λ_j P_j).
Definition 5.5.5
A matrix N is said to be nilpotent of order k if there exists an integer k ≥ 1 such that N^{k−1} ≠ 0 and Nᵏ = 0.
For a nilpotent matrix N of order k, we have e^N = ∑_{i=0}^{k−1} Nⁱ/i!. For example, the matrix N = [0 1; 0 0] is nilpotent of order 2 and e^N = I + N = [1 1; 0 1]. The n × n matrix whose only non-zero entries are 1 along the first off-diagonal, that is,
N = [0 1 0 · · · 0; 0 0 1 · · · 0; · · · ; 0 0 0 · · · 1; 0 0 0 · · · 0],
is nilpotent of order n.
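For a nilpotent matrix the exponential series terminates, so the finite sum is exact. A quick check for the 3 × 3 shift matrix:

```python
import numpy as np
from scipy.linalg import expm

N = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])   # nilpotent of order 3: N^3 = 0

finite_sum = np.eye(3) + N + (N @ N) / 2.0   # e^N = I + N + N^2/2!
print(np.allclose(expm(N), finite_sum))      # True
```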
Definition 5.5.6
[Generalized eigenvector] Let λ be an eigenvalue; then any vector v
satisfying (A − λ I)k v = 0 for some k ≥ 1 is called a generalized
eigenvector; if k = 1, it is the usual eigenvector.
It is a known fact that the smallest such k is less than or equal to the algebraic multiplicity of λ. For example, in A = [1 1; 0 1], the vector [1; 0] is an eigenvector. Since we are in dimension two and the algebraic multiplicity of the eigenvalue 1 is 2, the matrix A satisfies (A − I)²v = 0 for any vector v. Hence, any vector can be taken as a generalized eigenvector. We now state the following theorem without proof.
Theorem 5.5.7
Let λ1 , λ2 , · · · , λn be real eigenvalues of an n × n matrix A counted
according to their (algebraic) multiplicity. Then, there exists an
invertible matrix P = [v1 v2 · · · vn ] consisting of generalized
eigenvectors of A such that A = S + N, where S is diagonalizable
using P, that is P−1 SP = diag[λ1 , · · · , λn ] and N = A − S is nilpotent
of order k less than or equal to n. Further, SN = NS.
Since S and N commute, the solution to the linear system (5.2.1) is given by
x(t) = P diag[e^{λ₁t}, · · · , e^{λₙt}] P⁻¹ [I + tN + · · · + t^{k−1}N^{k−1}/(k−1)!] x₀. (5.5.1)
Example 5.5.8
Solve the linear system with A = [3 1; −1 1].
Example 5.5.9
Let A = [1 0 0; −1 2 0; 1 1 2].
linearly independent of v₂; of course, it is also linearly independent of v₁. Thus,
P = [1 0 0; 1 0 1; −2 1 0], P⁻¹ = [1 0 0; 2 0 1; −1 1 0].
Now compute
S = P diag(1, 2, 2) P⁻¹ = [1 0 0; −1 2 0; 2 0 2] and N = A − S = [0 0 0; 0 0 0; −1 1 0],
with N² = 0. The solution is given by
x(t) = [eᵗ 0 0; eᵗ − e^{2t} e^{2t} 0; −2eᵗ + (2 − t)e^{2t} te^{2t} e^{2t}] x₀.
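The closed form can be checked against the matrix exponential (the value of t is arbitrary):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 0.0, 0.0],
              [-1.0, 2.0, 0.0],
              [1.0, 1.0, 2.0]])
t = 0.8
et, e2t = np.exp(t), np.exp(2 * t)

# The solution matrix obtained from the S + N decomposition above.
M = np.array([[et, 0.0, 0.0],
              [et - e2t, e2t, 0.0],
              [-2 * et + (2 - t) * e2t, t * e2t, e2t]])
print(np.allclose(M, expm(t * A)))   # True
```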
If all the generalized eigenvectors are complex, we have the following
theorem.
Theorem 5.5.10
Thus, in the case of all eigenvalues complex, the solution to (5.2.1) is given by
x(t) = P diag[e^{tB₁}, · · · , e^{tBₖ}] P⁻¹ [I + tN + · · · + t^{m−1}N^{m−1}/(m−1)!] x₀.
Here e^{tB_j} = e^{a_j t} [cos(b_j t) −sin(b_j t); sin(b_j t) cos(b_j t)].
The final result is the Jordan form for a general matrix A.
Theorem 5.5.11
[The Jordan Canonical Form] Let A be a real matrix of order
n = k + 2m with real eigenvalues λ1 , λ2 , · · · , λk and complex
eigenvalues λ_j = a_j + ib_j, λ̄_j = a_j − ib_j, j = k + 1, · · · , k + m. Then,
there exists a basis {v1 , v2 , · · · , vk , vk+1 , uk+1 , · · · , vk+m , uk+m } of Rn ,
where v j , j = 1, 2, · · · , k, w j = u j + iv j , j = k + 1, · · · , k + m are
generalized eigenvectors corresponding to the eigenvalues λ j , such
that
P = [v1 v2 ··· vk vk+1 uk+1 ··· vk+m uk+m ]
is invertible and P−1 AP = diag[B1 , B2 , · · · , Br ] is block diagonal with
Jordan blocks B j , j = 1, · · · , r for some r. Further, B j takes one of the
following two forms
B_j = [λ 1 0 · · · 0; 0 λ 1 · · · 0; · · · ; 0 0 · · · λ 1; 0 0 · · · 0 λ] or B_j = [D I₂ 0₂ · · · 0₂; 0₂ D I₂ · · · 0₂; · · · ; 0₂ · · · · · · D I₂; 0₂ · · · · · · 0₂ D],
where D is a 2 × 2 block of the form [a_j −b_j; b_j a_j], I₂ is the 2 × 2 identity matrix and 0₂ is the 2 × 2 zero matrix.
e^{tB} = e^{tλ} e^{tN} = e^{tλ} [1 t t²/2! · · · t^{m−1}/(m−1)!; 0 1 t · · · t^{m−2}/(m−2)!; · · · ; 0 0 0 · · · 1].
Definition 5.6.1
Recall Example 5.4.1, where x1 and x2 axes are two invariant subspaces,
namely E1 = {(x1 , 0), x1 ∈ R} and E2 = {(0, x2 ), x2 ∈ R}. For any initial
condition (x01 , 0) ∈ E1 , we have the solution x(t ) = (x1 (t ), 0) ∈ E1 for
all t. Further, x1 (t ) → 0 as t → ∞. This subspace is referred to as a stable
subspace. On the other hand, for E2 , any solution which starts in E2 ,
remains there for all t, but now it goes to ±∞ as t → ∞. In this case, the
subspace is called an unstable subspace.
In Example 5.4.4, both axes are unstable invariant subspaces and
hence, the entire R2 space is unstable. In Example 5.5.1, the x1 x2 -plane is
the unstable subspace, whereas the x3 -axis is the stable subspace.
The subspace generated by the generalized eigenvectors of an
eigenvalue λ of a matrix A is called the generalized eigenspace
corresponding to λ .
Proposition 5.6.2
v̄ j ∈ ker(A − λ I)k j −1
and thus, v̄ j is a generalized eigenvector and v̄ j ∈ E. Finally, it follows
that Av j = λ v j + v̄ j ∈ E. Hence the proposition.
Now, the entire space Rn can be decomposed into stable, unstable and
center spaces. This is the content of the following theorem. Given an n ×
n real matrix A, denote by E s , E u and E c the subspaces spanned by the
Theorem 5.6.3
Proof: To see the invariance under the flow, it suffices to consider one
of the subspaces, say for E s . Let z be a generalized eigenvector, then, by
Proposition 5.6.2, it follows that Az ∈ E s and hence, Ak z ∈ E s for any
positive integer k. Therefore, it follows that
e^{tA} z = lim_{k→∞} ∑_{j=0}^{k} (tʲ Aʲ z)/j! ∈ Eˢ.
Remark 5.6.4
where I is the identity matrix. Clearly, Φ̃(t, t₀) = Φ(t − t₀) = e^{(t−t₀)A} will satisfy the same differential system with the initial time at t₀, that is,
(d/dt) Φ̃(t, t₀) = A Φ̃(t, t₀), Φ̃(t₀, t₀) = I, (5.7.3)
and the ith column of Φ̃ will satisfy (5.7.1) with g = 0 and x(t0 ) = ei .
Definition 5.7.1
Proposition 5.7.2
flow etA to a varying vector. Thus, look for a solution of the form
x(t ) = etA y(t ), where y(t ) is to be determined so that x(t ) satisfies
(5.7.1).
A simple computation yields
ẋ(t ) = AetA y(t ) + etA ẏ(t ) = Ax(t ) + etA ẏ(t ).
Thus, we need to choose y which satisfies etA ẏ(t ) = g(t ). In other words,
y(t) = y(t₀) + ∫_{t₀}^{t} e^{−sA} g(s) ds = e^{−t₀A} x₀ + ∫_{t₀}^{t} e^{−sA} g(s) ds.
t0 t0
x(t) = e^{(t−t₀)A} x₀ + ∫_{t₀}^{t} e^{(t−s)A} g(s) ds = Φ(t − t₀) x₀ + ∫_{t₀}^{t} Φ(t − s) g(s) ds. (5.7.4)
In the case of the finite dimensional linear control theory, we have g(t ) =
Bu(t ), where B is an n × r matrix and u is an r × 1 control vector. In this
case,
x(t) = Φ(t − t₀) x₀ + ∫_{t₀}^{t} Φ(t − s) B u(s) ds. (5.7.5)
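Formula (5.7.4) is directly computable by quadrature; a minimal sketch for a forced system (A and the forcing g are illustrative):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
g = lambda s: np.array([0.0, np.sin(3.0 * s)])   # forcing term
x0 = np.array([1.0, 0.0])
t0, t = 0.0, 2.0

# x(t) = Phi(t - t0) x0 + int_{t0}^{t} Phi(t - s) g(s) ds,
# the integral evaluated with the trapezoidal rule.
s = np.linspace(t0, t, 2001)
vals = np.array([expm((t - si) * A) @ g(si) for si in s])
x = expm((t - t0) * A) @ x0 + np.trapz(vals, s, axis=0)
print(x)
```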
Remark 5.7.3
The last two properties together are known as semi-group properties and
in this particular ODE system, we have group structure due to the first
property explained here. As remarked earlier, when we consider a PDE,
the heat equation, for example, we may not get a group structure, but a
semi-group structure. We have also noted earlier that it is possible to study
ODEs in infinite dimensional spaces such as a Hilbert space, a Banach
space, etc.
Now the solution to (5.7.6) is given by
x(t ) = Φ(t,t0 )x0 = Ψ(t )Ψ−1 (t0 )x0 .
The matrix Φ(t,t0 ) is known as the transition matrix. The solution of the
non-homogeneous system (5.7.7) is given by
x(t) = Φ(t, t₀) x₀ + ∫_{t₀}^{t} Φ(t, t₀) Φ⁻¹(s, t₀) g(s) ds
= Φ(t, t₀) x₀ + ∫_{t₀}^{t} Φ(t, s) g(s) ds. (5.7.9)
Proposition 5.7.4
Example 5.7.5
Let A(t) = [1 1+t; 0 t].
which gives
det(Φ(t + ∆t,t0 )) = det(I + A(t )∆t ) det Φ(t,t0 ) + O((∆t )2 ).
For the first term, we have
det(I + A(t)∆t) = 1 + ∆t tr(A(t)) + O((∆t)²).
Thus, as ∆t → 0, we see that (d/dt) det(Φ(t, t₀)) = tr(A(t)) det(Φ(t, t₀)). This, upon integration, produces Abel's formula
det Φ(t, t₀) = exp(∫_{t₀}^{t} tr(A(s)) ds).
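Abel's formula admits a direct numerical check; a sketch with an illustrative non-autonomous matrix:

```python
import numpy as np
from scipy.integrate import solve_ivp

A = lambda t: np.array([[np.sin(t), 1.0],
                        [0.0, np.cos(t)]])

# Integrate Phi' = A(t) Phi, Phi(0) = I, as a flattened 4-dimensional system.
def rhs(t, phi):
    return (A(t) @ phi.reshape(2, 2)).ravel()

T = 2.0
sol = solve_ivp(rhs, (0.0, T), np.eye(2).ravel(), rtol=1e-10, atol=1e-12)
det_phi = np.linalg.det(sol.y[:, -1].reshape(2, 2))

# tr A(s) = sin s + cos s, so int_0^T tr A = (1 - cos T) + sin T.
print(det_phi, np.exp((1.0 - np.cos(T)) + np.sin(T)))   # the two agree
```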
but λ x will also satisfy the same system with the same initial condition.
Hence, by uniqueness, we get x(t + T ) = λ x(t ) for all t. This is quasi-
periodicity. Moreover, since λ 6= 0, one can choose an α so that λ = eαT .
Now, it is easy to see that x(t ) = eαt z(t ), where z is a periodic function of
period T . Thus, we have the following theorem.
Theorem 5.7.6
Note that the eigenvalues may, in general, be complex and hence the solutions appear to be complex valued; but this is not the case. This requires a little more work. The interested reader is referred to [Inc26, Lef77] for further
reading.
5.8 Exercises
1. Let A be an n × n matrix with an eigenvalue λ0 of multiplicity n.
Show that the standard basis can be chosen as the basis of
generalized eigenvectors so that B = I, which allows us to write
A = S + N in the appropriate theorem and then represent the
solution.
2. Use the decomposition in Exercise 1 and solve the system ẋ = Ax, where A = [2 0 0; −1 2 0; 1 1 2].
3. Find the general solution and phase portraits of the following
systems
(a) ẋ1 = −x1 + x2 , ẋ2 = −x2 .
(b) ẋ1 = x1 , ẋ2 = 5x2 .
(c) ẋ1 = −x1 − 3x2 , ẋ2 = −2x2 .
(d) ẋ1 = −x2 , ẋ2 = −x1 , ẋ3 = x3 .
(a) [0 −1; 1 1], [1 −1; 1 0], [0 1; 0 −1], [1 1; 1 1], [1 1; −1 1], [1 1; 0 1].
(b) [1 0 0; 0 0 1; 0 1 0], [1 0 0; 0 1 1; 0 0 −1], [1 0 0; 0 0 −1; 0 1 0], [1 1 0; 0 1 1; 0 0 −1], [1 0 0; 1 2 0; 1 2 3], [1 0 0; −1 2 0; 1 0 2].
(c) [−1 0 0 0; 1 −2 0 0; 1 −2 3 0; 1 2 3 −4], [−1 0 0 0; 1 2 0 0; 1 0 2 0; 1 1 0 2], [−3 1 4 0; 0 −3 1 0; 0 0 −3 0; 0 0 0 −3], [2 1 4 0; 0 2 1 −1; 0 0 2 1; 0 0 0 2].
18. The result in the previous exercise, in general, is not true if A(t) and ∫_{t₀}^{t} A(s) ds do not commute. To see this (see Example 5.7.5), work out the details with the following matrix: A(t) = [1 1+t; 0 t]. Also find the solution to the corresponding IVP.
5.9 Notes
Qualitative analysis of linear systems is the main concept in this chapter.
This chapter is also a precursor to the study of stability analysis of
nonlinear systems carried out in Chapter 8. A good reference for this
chapter among others is [Per01]; see also [Tay11, Sim91, SK07, CL72,
HSD04]. A detailed study of 2 × 2 systems is done here by directly
developing the required linear algebra. However, for higher order
systems, the analysis is done by borrowing the Jordan decomposition
theorem from linear algebra. The other notions that have been introduced
are dynamical systems, flow, invariant subspaces, which will also be
useful in the study of nonlinear systems. Non-homogeneous and
non-autonomous systems are studied by introducing the concepts of
fundamental matrix and transition matrix. Floquet theory, which concerns non-autonomous systems with periodic coefficients, is also briefly discussed.
6
Series Solutions: Frobenius Theory
6.1 Introduction
In Chapter 3, we have seen that the solutions of linear first order
equations can be obtained in explicit form by converting the problem
essentially to an integral calculus problem. We have also seen that there is
no general procedure to obtain the solutions of linear second order
equations with variable coefficients, in explicit form. Nevertheless, we
could obtain valuable information about the solutions by exploiting the
linearity, superposition principle, etc. In this chapter, we consider a class
of linear second order equations whose solutions may be written down in
explicit form. Since the solutions will be in the form of an infinite
(power) series, eliciting the qualitative behavior of solutions will be
difficult. The results of this chapter are collectively called Frobenius
theory. Some important equations such as Bessel’s equation, Hermite
equation, Chebyshev equation, Laguerre equation, etc., are included in
the class of equations considered here. Owing to the importance of these
equations, which appear in applications frequently, the major properties
of their solutions have been tabulated in mathematical handbooks. The
interested reader may refer to [AS72]. We restrict our discussion to the
real domain. There are also very interesting and important results for
equations in the complex domain and the reader is referred to [Inc26].
Definition 6.2.1
[Analyticity] A function f : (a, b) → R, where (a, b) is an open
interval in R, is said to be (real) analytic at t0 ∈ (a, b) if there exists
δ > 0 such that (t0 − δ ,t0 + δ ) ⊂ (a, b) and
f(t) = ∑_{n=0}^{∞} aₙ (t − t₀)ⁿ,
for all t ∈ (t₀ − δ, t₀ + δ), where the aₙ are real numbers; that is, f(t) is
represented as a convergent power series in t − t0 in a neighborhood
of t0 . If f is analytic at every point in the interval (a, b), we say that f
is analytic in (a, b).
We now recall certain facts about convergent power series which will be
needed in what follows. For details, see [Apo11, Rud76].
Consider a real power series ∑_{n=0}^{∞} aₙtⁿ and put R⁻¹ = lim sup_{n→∞} |aₙ|^{1/n}.
Then, the given power series converges for all t satisfying |t| < R and
Then, the given power series converges for all t satisfying |t| < R and
diverges for |t| > R; the case of |t| = R is, in general, inconclusive. The
number R is called the radius of convergence of the power series. Note
that R can also take the value 0 or ∞. Put f(t) = ∑_{n=0}^{∞} aₙtⁿ for t ∈ (−R, R).
The following statements hold:
1. The series converges uniformly in any compact subset of (−R, R).
2. The function f is infinitely differentiable in (−R, R) and
f^{(k)}(t) = ∑_{n=0}^{∞} (n+1)(n+2) · · · (n+k) aₙ₊ₖ tⁿ,
Remark 6.2.2
If aₙ ≠ 0 after a certain stage and lim_{n→∞} |aₙ₊₁|/|aₙ| = l, then it is well known that lim_{n→∞} |aₙ|^{1/n} also equals l. Thus, we have an alternative way of calculating the radius of convergence, when applicable.
Example 6.2.3
Theorem 6.2.4
Remark 6.2.5
and
ÿ(t) = ∑_{n=0}^{∞} (n+1)(n+2) aₙ₊₂ tⁿ. (6.3.4)
Therefore, we have
(n+1)(n+2) aₙ₊₂ + aₙ = 0, n = 0, 1, 2, · · · .
The two power series in (6.3.6) are very familiar to us; they represent cos t and sin t respectively. Thus,
y(t) = a₀ cos t + a₁ sin t,
where a0 and a1 are arbitrary real constants. Thus, the series (6.3.2) for y
converges for all t ∈ R. This may be expected as the coefficients in (6.3.1)
are analytic in R.
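The recursion aₙ₊₂ = −aₙ/((n+1)(n+2)) can be run mechanically; starting from (a₀, a₁) = (1, 0) it reproduces the Taylor coefficients of cos t, as the sketch below confirms:

```python
import math

def series_coeffs(a0, a1, N):
    """Coefficients of the power series solution of y'' + y = 0."""
    a = [0.0] * N
    a[0], a[1] = a0, a1
    for n in range(N - 2):
        a[n + 2] = -a[n] / ((n + 1) * (n + 2))
    return a

print(series_coeffs(1.0, 0.0, 8))                        # cos t coefficients
print([(-1)**k / math.factorial(2 * k) for k in range(4)])
```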
Of course, we would have obtained the aforementioned solution
without going through the exercise of power series, as (6.3.1) is an
equation with constant coefficients. Nevertheless, this exercise contains
all the ingredients of a general procedure to obtain series solutions to
linear equations with analytic coefficients. In general, we will not be as
lucky as in this example to recognize the power series in terms of familiar
functions.
We consider one more example before stating the general result. The
second order equation
ÿ − 2t ẏ + 2py = 0, (6.3.7)
where p is a real constant, is termed as Hermite’s equation. If we again
assume the solution in the form (6.3.2), then we obtain from (6.3.7), after
the substitution of expressions in (6.3.2), (6.3.3) and (6.3.4),
∑_{n=0}^{∞} [(n+1)(n+2) aₙ₊₂ − 2n aₙ + 2p aₙ] tⁿ = 0. (6.3.8)
y(t ) = a0 y1 (t ) + a1 y2 (t ), (6.3.9)
where y1 and y2 are given by the following series
y₁(t) = 1 − (2p/2!) t² + (2² p(p−2)/4!) t⁴ − (2³ p(p−2)(p−4)/6!) t⁶ + · · · (6.3.10)
and
y₂(t) = t − (2(p−1)/3!) t³ + (2² (p−1)(p−3)/5!) t⁵ − (2³ (p−1)(p−3)(p−5)/7!) t⁷ + · · · (6.3.11)
By the simple ratio test, it is straightforward to verify that both these series
converge for all t ∈ R. It is also not difficult to see that they are linearly
independent and hence they span the solution space of Hermite’s equation.
We now make the following observations.
First, note that unless p is a non-negative integer, the infinite series for
y1 and y2 do not terminate. If p is a non-negative even integer, the series
for y1 terminates and y1 becomes a polynomial of degree p. Similarly, if
p is a non-negative odd integer, y2 becomes a polynomial of degree p.
Any other polynomial solution of Hermite’s equation is a multiple of one
of these polynomials. It is not difficult to compute these polynomials for
small p. For example, when p = 0, 1, 2, 3, the respective polynomials are
given by 1, t, 1 − 2t², t − (2/3)t³.
Since any constant multiple of these polynomials is also a solution of
Hermite’s equation with p a non-negative integer, it is customary to take
the coefficient of t n , the leading term, as 2n . The resulting polynomials
are then termed as Hermite polynomials and are denoted by Hn (t ). Thus,
H0 (t ) = 1, H1 (t ) = 2t and H3 (t ) = 8t 3 −12t. Hermite polynomials appear
frequently in several applications, especially in quantum mechanics.
The following interesting formula for Hₙ may be deduced from the expressions (6.3.10) and (6.3.11):
Hₙ(t) = (−1)ⁿ e^{t²} (dⁿ/dtⁿ) e^{−t²}.
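This formula generates the polynomials mechanically; a brief sympy sketch (sympy's built-in hermite is used for comparison):

```python
import sympy as sp

t = sp.symbols('t')

def hermite_rodrigues(n):
    """H_n(t) = (-1)^n e^{t^2} d^n/dt^n e^{-t^2}."""
    return sp.expand((-1)**n * sp.exp(t**2) * sp.diff(sp.exp(-t**2), t, n))

for n in range(4):
    print(n, hermite_rodrigues(n), sp.hermite(n, t))
# 1, 2*t, 4*t**2 - 2, 8*t**3 - 12*t: leading coefficient 2^n, as stated.
```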
Theorem 6.3.1
Proof: The uniqueness question has already been dealt with in detail in
Chapter 3. We may take t0 = 0, by changing the variable, if necessary.
Suppose
P(t) = ∑_{n=0}^{∞} pₙtⁿ and Q(t) = ∑_{n=0}^{∞} qₙtⁿ, (6.3.13)
Assuming for the moment that the series for y converges in a small interval
around 0 and the operations of term-by-term differentiation are legitimate,
we obtain
ẏ(t) = ∑_{n=0}^{∞} (n+1) aₙ₊₁ tⁿ, (6.3.15)
and
ÿ(t) = ∑_{n=0}^{∞} (n+1)(n+2) aₙ₊₂ tⁿ. (6.3.16)
" #
∞ n
= ∑ ∑ (k + 1) pn−k ak+1 tn (6.3.17)
n=0 k =0
" #
∞ n
= ∑ ∑ qn−k ak t n. (6.3.18)
n=0 k =0
n
where γn = ∑ αk βn−k .
k =0
(n+1)(n+2)|aₙ₊₂| ≤ (M/rⁿ) ∑_{k=0}^{n} [(k+1)|aₖ₊₁| + |aₖ|] rᵏ + M|aₙ₊₁| r,
where the term M|aₙ₊₁|r is added for the purpose of what follows. Now define b₀ = |a₀|, b₁ = |a₁| and recursively
(n+1)(n+2) bₙ₊₂ = (M/rⁿ) ∑_{k=0}^{n} [(k+1)bₖ₊₁ + bₖ] rᵏ + M bₙ₊₁ r. (6.3.20)
It follows that |aₙ| ≤ bₙ for all n. We now consider the ratios bₙ₊₁/bₙ for large n, for the application of the ratio test. Replace first n by n − 1 and then by n − 2 in (6.3.20) to obtain
n(n+1) bₙ₊₁ = (M/r^{n−1}) ∑_{k=0}^{n−1} [(k+1)bₖ₊₁ + bₖ] rᵏ + M bₙ r
and
(n−1)n bₙ = (M/r^{n−2}) ∑_{k=0}^{n−2} [(k+1)bₖ₊₁ + bₖ] rᵏ + M bₙ₋₁ r.
Multiplying the first expression here by r and using the second, we obtain
r n(n+1) bₙ₊₁ = (M/r^{n−2}) ∑_{k=0}^{n−2} [(k+1)bₖ₊₁ + bₖ] rᵏ + rM(n bₙ + bₙ₋₁) + M bₙ r².
Example 6.4.1
ÿ = ∑_{n=0}^{∞} aₙ (m+n)(m+n−1) t^{m+n−2} = t^{m−2} ∑_{n=0}^{∞} aₙ (m+n)(m+n−1) tⁿ.
For the terms P(t)ẏ and Q(t)y, using (6.4.2) and (6.4.3), we get
P(t)ẏ = ((1/t) ∑_{n=0}^{∞} pₙtⁿ)(∑_{n=0}^{∞} aₙ(m+n) t^{m+n−1})
= t^{m−2} ∑_{n=0}^{∞} [∑_{k=0}^{n} pₙ₋ₖ aₖ (m+k)] tⁿ
= t^{m−2} ∑_{n=0}^{∞} [∑_{k=0}^{n−1} pₙ₋ₖ aₖ (m+k) + p₀ aₙ (m+n)] tⁿ
and
Q(t)y = ((1/t²) ∑_{n=0}^{∞} qₙtⁿ)(∑_{n=0}^{∞} aₙ t^{m+n})
= t^{m−2} ∑_{n=0}^{∞} [∑_{k=0}^{n} qₙ₋ₖ aₖ] tⁿ
= t^{m−2} ∑_{n=0}^{∞} [∑_{k=0}^{n−1} qₙ₋ₖ aₖ + q₀ aₙ] tⁿ.
After the substitution of these expressions for ÿ, P(t )ẏ and Q(t )y in (6.4.1)
and canceling the common factor t m−2 throughout, we obtain
"
∞
∑ an {(m + n)(m + n − 1) + (m + n) p0 + q0 }
n=0
#
n−1
+ ∑ ak {(m + k) pn−k + qn−k } t n = 0. (6.4.5)
k =0
a₀[m(m−1) + mp₀ + q₀] = 0,
a₁[m(m+1) + (m+1)p₀ + q₀] + a₀(mp₁ + q₁) = 0,
a₂[(m+1)(m+2) + (m+2)p₀ + q₀] + a₀(mp₂ + q₂) + a₁[(m+1)p₁ + q₁] = 0,
··· ··· ···
aₙ[(m+n−1)(m+n) + (m+n)p₀ + q₀] + a₀(mpₙ + qₙ) + · · · + aₙ₋₁[(m+n−1)p₁ + q₁] = 0,
··· ··· ···
(6.4.6)
a₀ f(m) = 0,
a₁ f(m+1) + a₀(mp₁ + q₁) = 0,
a₂ f(m+2) + a₀(mp₂ + q₂) + a₁[(m+1)p₁ + q₁] = 0,
··· ··· ···
aₙ f(m+n) + a₀(mpₙ + qₙ) + · · · + aₙ₋₁[(m+n−1)p₁ + q₁] = 0,
··· ··· ···
(6.4.7)
Since a₀ ≠ 0, it follows that f(m) = 0, that is,
m(m − 1) + mp0 + q0 = 0. (6.4.8)
This is the indicial equation, which determines the possible values of the
exponent m in the assumed expression for the solution y. Let m1 and m2
be the roots of (6.4.8). If we choose m = m1 , then, from the
aforementioned expressions, we see that an is determined in terms of
a0 , a1 , · · · , an−1 , successively for n = 1, 2, · · · , provided that
f(m+n) ≠ 0. The process breaks off if f(m+n) = 0. Thus, if
m1 = m2 + n, for some positive integer n, the choice m = m1 gives a
formal solution, but in general, the choice m = m2 does not, since
f (m2 + n) = f (m1 ) = 0. If m1 = m2 , then also we obtain only one
formal solution. In all the other cases, when the roots of the indicial
equation are real, we obtain two linearly independent formal solutions.
The roots of the indicial equation may also be complex, and therefore,
this procedure leads to a formal series with complex coefficients. Since
we are only interested in real solutions, we need to consider real and
imaginary parts of these formal solutions, which in general is quite
complicated and requires tools from complex analysis. We will not
pursue these topics here and the interested reader may refer to [Inc26] for
a discussion on differential equations in the complex domain.
We now state the foregoing discussion in the following theorem.
Theorem 6.4.2
The series in (6.4.9) and (6.4.10) are called Frobenius series. In a specific
problem, it is much preferable to start with a series of the form (6.4.4)
and derive the indicial equation and recursion relations. However, the
recursion formula (6.4.7) finds its main application in the proof of
Theorem 6.4.2, which is similar to the one in the previous section, but is
more delicate because of the presence of the terms f (m + n). We will not
present a proof here and the reader is referred to [Sim91] for details.
The theorem leaves unanswered the cases of m1 = m2 and when m1 −
m2 is a positive integer.
Suppose m1 = m2 and y1 is a solution given by the Frobenius series. We
may now proceed to find a second independent solution by the procedure
described in Chapter 3. Let y2 = y1 v be another solution, where v is a
non-constant function. Then,
v̇ = (1/y₁²) exp(−∫ P(t) dt)
= (1/(t^{2m₁}(a₀ + a₁t + · · ·)²)) exp(−∫ [p₀/t + p₁ + · · ·] dt)
= (1/(t^{2m₁}(a₀ + a₁t + · · ·)²)) exp(−p₀ log t − p₁t − · · ·)
= (1/(t(a₀ + a₁t + · · ·)²)) exp(−p₁t − · · ·)
= (1/t) g(t), say,
where we have used the fact that 2m₁ + p₀ = 1 when m₁ = m₂ and g is an analytic function at t = 0 with g(0) = 1/a₀².
(m+1)² a₁ = 0,
(m+2)² a₂ + 1 = 0,
(m+3)² a₃ + a₁ = 0,
··· ···
Then, unless m is a negative integer, we have aₖ = 0 for odd integers k and
a₂ = −1/(m+2)²,
a₄ = −a₂/(m+4)² = 1/((m+2)²(m+4)²),
·········
Substituting these values in (6.4.12) and (6.4.13), we infer that if
y = tᵐ (1 − t²/(m+2)² + t⁴/((m+2)²(m+4)²) − · · ·) (6.4.14)
and if m is not a negative integer, then
t ÿ + ẏ + t y = m² t^{m−1}. (6.4.15)
Choosing m = 0 in (6.4.14) and (6.4.15), we see that
y = 1 − t²/2² + t⁴/(2²·4²) − · · · (6.4.16)
is a solution of Bessel's equation
t ÿ + ẏ + t y = 0. (6.4.17)
The series in (6.4.16) is denoted by J0 (t ) and is called Bessel’s function
of zero order of the first kind. It is easy to see that J0 (t ) is an even
function of t and converges for all t ∈ R with J0 (0) = 1. We can also see
that the indicial equation for Bessel’s equation is given by m2 = 0; thus,
its roots are equal and equal to zero. We now proceed to find another
independent solution of Bessel’s equation. The general procedure tells us
that the second solution involves a logarithm term. We are going to derive
an expression for the same using (6.4.13). Differentiating both the sides
of (6.4.13) with respect to m and then choosing m = 0, we obtain
t Ÿ₀ + Ẏ₀ + t Y₀ = 0,
where Y₀ = ∂y/∂m evaluated at m = 0. Now, from (6.4.14),
∂y/∂m = tᵐ log t (1 − t²/(m+2)² + t⁴/((m+2)²(m+4)²) − · · ·)
+ tᵐ [ (2t²/(m+2)²)(1/(m+2)) − (2t⁴/((m+2)²(m+4)²))(1/(m+2) + 1/(m+4))
+ (2t⁶/((m+2)²(m+4)²(m+6)²))(1/(m+2) + 1/(m+4) + 1/(m+6)) − · · · ].
Hence, putting m = 0, we obtain
Y₀(t) = J₀(t) log t + t²/2² − (t⁴/(2²·4²))(1 + 1/2) + (t⁶/(2²·4²·6²))(1 + 1/2 + 1/3) − · · · , (6.4.18)
which is called Bessel’s function of the second kind of order zero. Using
1 + 1/2 + · · · + 1/n = log n + γ + εₙ,
where γ is Euler’s constant and εn → 0 as n → ∞, it is straightforward
to check that the power series in (6.4.18) (excluding the term J0 (t ) logt)
converges for all values of t. It follows that the general solution of Bessel’s
equation is given by
y = AJ0 + BY0 ,
for arbitrary constants A, B.
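The series for J₀ converges fast and can be compared against scipy's implementation; a brief sketch:

```python
import numpy as np
from scipy.special import j0

def J0_partial(t, N):
    """Partial sum of 1 - t^2/2^2 + t^4/(2^2 4^2) - ... with N terms."""
    s, term = 1.0, 1.0
    for k in range(1, N):
        term *= -t**2 / (4.0 * k * k)   # ratio of consecutive terms
        s += term
    return s

for t in (0.5, 2.0, 5.0):
    print(t, J0_partial(t, 20), j0(t))   # the values agree closely
```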
Remark 6.4.3
6.5 Exercises
1. Show that the function f in Example 6.2.3 is in C∞ (R) and
f (n) (0) = 0 for n = 1, 2, · · · .
2. Let
y₁(t) = t^{r₁} ∑_{n=0}^{∞} aₙtⁿ, y₂(t) = t^{r₂} ∑_{n=0}^{∞} bₙtⁿ,
for t > 0. Here r1 and r2 are real and unequal; a0 , b0 are non-zero.
State and prove a general theorem concerning the linear
independence of y1 and y2 .
3. Consider the second order Euler equation
t 2 ÿ + at ẏ + by = 0,
where a, b are real. Find two linearly independent solutions of the
equation in each of the following cases by applying the method of
Frobenius.
(a) a = 1/2, b = −1/2.
(b) a = −5, b = 9.
4. Discuss the solution of Legendre’s equation
(1 − t 2 )ÿ − 2t ẏ + a(a + 1)y = 0
in the neighborhoods of t = 1 and t = −1.
5. For each of the following equations, write the indicial equation and
find its roots. Write the form of two linearly independent solutions
6.6 Notes
In this chapter, we have considered a couple of classes of linear second
order equations with variable coefficients whose solutions can be
obtained explicitly, in the form of a power series; see [Sim91]. The
analysis mainly involves proving the convergence of the power series,
obtained heuristically, of a solution. For power series solutions of a
system of linear equations, the reader is referred to [Tay11].
7
Regular Sturm–Liouville Theory
7.1 Introduction
In this chapter, we are going to study certain boundary value problems
(BVP) associated with regular second order linear equations containing a
parameter. More specifically, we will be looking for non-trivial solutions
of the following equation
$$L u(t) \equiv -\frac{d}{dt}\left(p(t)\frac{du}{dt}\right) + q(t)u(t) = \lambda\,\rho(t)u(t).$$
are (infinite) power series, we need to discuss their convergence; the
tools required come from functional analysis (Hilbert space theory), and the
discussion of these topics is outside the purview of the present book.
Before proceeding further, let us consider a simple example
L u = −ü.
It is easy to see that non-trivial solutions to the BVP
L u = λ u, u(0) = u(π ) = 0,
exist if and only if λ = n2 , n ∈ Z \ {0}. Thus, the eigenvalues are n2 ,
n a non-zero integer and the corresponding eigenfunctions are sin(nt ).
If, instead, we consider the boundary conditions as u(−π ) = u(π ) = 0,
the eigenvalues are now n2 /4, n ∈ Z \ {0}, and the eigenfunctions are
sin(nt/2) and cos(nt/2) for n even and n odd respectively. A reader
familiar with Fourier (sine and cosine) series recognizes that any suitable
function satisfying the given boundary conditions, can be written as an
infinite series involving the corresponding eigenfunctions.
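These eigenvalues can also be checked numerically. The sketch below, assuming NumPy is available, discretizes −ü on [0, π] with Dirichlet boundary conditions by finite differences and recovers the leading eigenvalues n²:

```python
# Finite-difference check that -u'' = lambda * u, u(0) = u(pi) = 0,
# has eigenvalues n^2; a minimal sketch assuming NumPy.
import numpy as np

N = 500
h = np.pi / N
# second-difference matrix for -u'' at the interior nodes t_1, ..., t_{N-1}
A = (np.diag(2.0 * np.ones(N - 1)) - np.diag(np.ones(N - 2), 1)
     - np.diag(np.ones(N - 2), -1)) / h**2
print(np.sort(np.linalg.eigvalsh(A))[:4])   # approximately 1, 4, 9, 16
```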
If we now consider the periodic boundary condition u(−π ) = u(π ),
we do obtain the situation of the Fourier series. However, the question of
convergence, especially that of point-wise convergence, of a Fourier series
is a delicate issue.
From this example, we learn that the form of the boundary conditions
plays an important role in the determination of the eigenvalues; it is also
important in making the operator L self-adjoint and in obtaining the
orthogonality of the eigenfunctions, as we will see in the next section.
Theorem 7.2.1
Let
Let
$$L = -\frac{d}{dt}\left(p(t)\frac{d}{dt}\right) + q(t),$$
c1 u1 (b) + c2 u2 (b) = 0.
Therefore, (7.2.2) has only trivial solutions if and only if the matrix
$$A = \begin{bmatrix} u_1(a) & u_2(a) \\ u_1(b) & u_2(b) \end{bmatrix}$$
is non-singular.
It turns out that the more interesting situation is when (7.2.2) has
non-trivial solutions. This will lead to the existence of eigenvalues and
eigenfunctions for L .
We are now going to obtain the self-adjointness of L . To this end, we
introduce the following inner product. For any continuous functions u, v
defined on [a, b], define the inner product by
$$\langle u, v\rangle = \int_a^b u(t)v(t)\, dt.$$
(If u, v are complex valued, v(t) should be replaced by $\overline{v(t)}$, the complex
conjugate.)
We impose the following boundary conditions on functions in C2 [a, b]:
$$\alpha_1 u(a) + \alpha_2 \dot{u}(a) = 0, \quad |\alpha_1| + |\alpha_2| > 0,$$
$$\beta_1 u(b) + \beta_2 \dot{u}(b) = 0, \quad |\beta_1| + |\beta_2| > 0. \qquad (7.2.3)$$
Theorem 7.2.2
Remark 7.2.3
Remark 7.2.4.
Theorem 7.3.1
and therefore
$$\left|\frac{\partial F}{\partial \theta}\right| \le \sup_{t\in[a,b]} |Q(t)| + \sup_{t\in[a,b]} \frac{1}{|P(t)|}.$$
Hence, we obtain a unique solution θ defined on [a, b] for any initial value
θ (a) = γ. Once θ is known, (7.3.5) for r gives
$$r(t) = r(a)\exp\left(-\frac{1}{2}\int_a^t \left[Q(s) - \frac{1}{P(s)}\right]\sin 2\theta(s)\, ds\right),$$
for all t ∈ [a, b]. Each solution of the Prüfer system depends on an initial
amplitude r (a) and an initial phase γ = θ (a). Changing r (a) just
multiplies the solution u by a constant factor. Thus, the zeros of any
solution u can be located by studying only the ODE for the phase θ .
From (7.3.2), we see that the zeros of any non-trivial solution u of
(7.3.1) occur where the phase function θ assumes the values nπ, n ∈ Z. At
these points, cos2 θ = 1 and θ̇ > 0, as follows from (7.3.4). Geometrically,
this means that the curve (P(t )u̇(t ), u(t )), t ∈ [a, b] in the (Pu̇, u) plane,
corresponding to a solution u can cross the Pu̇-axis at θ = nπ only counter-
clockwise.
The advantage of Prüfer substitution in studying the zeros of the
solution u is now evident from (7.3.4) satisfied by the phase variable. It is
only a first order equation for θ and does not contain r and the solution
exists in [a, b] for any given initial condition. We will now make the
following observations, which will be useful when we consider S–L
systems.
1. If θ is a solution of (7.3.4), so are −θ and θ + nπ for any integer n.
We may thus fix the initial condition θ (a) = γ ∈ [0, π ).
2. If we are just interested in the location of the zeros of u, then, it is
sufficient to solve the first order equation (7.3.4) and find the points
where θ takes the values nπ, n a positive integer.
3. Fix a non-negative integer n. If there is a tₙ ∈ [a, b] such that θ(tₙ) =
nπ, then, from (7.3.4), it follows that θ̇(tₙ) = 1/P(tₙ) > 0. Hence,
θ(t) > nπ for t > tₙ, close to tₙ.
We claim that θ (t ) > nπ for all t > tn . For, if there is a t > tn
such that θ (t ) = nπ, then we would have that θ̇ (t ) ≤ 0. But this
contradicts the fact that θ̇(t) = 1/P(t) > 0, which follows from
(7.3.4). Though θ need not be a monotonically increasing function
(it is if Q is also non-negative), it remains above the line θ = nπ, (n
any non-negative integer) once it crosses that line, for all future
times. In particular, for the chosen initial condition, θ > 0 in (a, b].
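Since (7.3.4) is a scalar first order equation, it is also convenient for computation. The following sketch, assuming SciPy is available, integrates the phase equation for ü + 25u = 0 on [0, π] (so P = 1, Q = 25) and counts zeros from the growth of θ:

```python
# Integrate the Prufer phase equation (7.3.4) for u'' + 25u = 0 on [0, pi]:
# theta increases by pi each time the solution u vanishes.
import numpy as np
from scipy.integrate import solve_ivp

def phase(t, th):
    Q, P = 25.0, 1.0
    return [Q * np.sin(th[0]) ** 2 + np.cos(th[0]) ** 2 / P]

sol = solve_ivp(phase, [0.0, np.pi], [0.0], rtol=1e-10, atol=1e-12)
print(sol.y[0, -1] / np.pi)   # ~5.0: u = sin(5t) has five zeros in (0, pi]
```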
The existence of zeros of a non-trivial solution will now be established using
comparison theorems.
Theorem 7.3.2
Theorem 7.3.3
Corollary 7.3.4
[Corollary to Theorem 7.3.2] For any t1 ∈ (a, b], either f (t1 ) < g(t1 )
or f ≡ g in [a,t1 ].
Corollary 7.3.5
Proof: Suppose the conclusion is false. Then, we can find a t1 > a such
that f (t1 ) = g(t1 ). Now, consider the functions φ and ψ defined by
φ (t ) = f (−t ), ψ (t ) = g(−t ), t ∈ [−t1 , −a].
Then, φ and ψ satisfy the DEs
φ̇ (t ) = −F (−t, φ (t )), ψ̇ (t ) = −G(−t, ψ (t )),
for t ∈ [−t1 , −a] and satisfy the condition φ (−t1 ) = ψ (−t1 ). Since
−F (−t, y) ≥ −G(−t, y), we can apply Theorem 7.3.3 in the interval
[−t1 , −a]. We conclude that φ (−a) ≥ ψ (−a). This implies f (a) ≥ g(a),
a contradiction. The proof is complete.
Theorem 7.3.6
We are now in a position to discuss the oscillation results for the solutions
of the Sturm–Liouville system
$$L u(t) \equiv -\frac{d}{dt}\left(p(t)\frac{du}{dt}\right) + q(t)u(t) = \lambda\,\rho(t)u(t), \qquad (7.3.8)$$
satisfying the boundary conditions (7.2.3). Comparing (7.3.8) with the
equation (7.3.1), we find that P = p and Q = λ ρ − q. We would like to
study the number of zeros of a non-trivial solution of (7.3.8) as the real
parameter λ varies. Denote by θ (t, λ ), the corresponding phase variable.
Then, by (7.3.4), we have
$$\dot{\theta}(t, \lambda) = [\lambda\rho(t) - q(t)]\sin^2\theta(t, \lambda) + \frac{1}{p(t)}\cos^2\theta(t, \lambda), \qquad (7.3.9)$$
for t ∈ [a, b]. Recall that p is a positive C1 function and ρ > 0, q are
continuous functions defined on [a, b]. Now, fix a real number γ and
consider the solution θ (t, λ ) of (7.3.9) satisfying θ (a, λ ) = γ, for all λ .
Here, γ is determined by the boundary condition (7.2.3) at a as
$$\alpha_1 \sin\gamma + \frac{\alpha_2}{p(a)}\cos\gamma = 0. \qquad (7.3.10)$$
There is a unique solution γ ∈ [0, π) of (7.3.10). If α₁ ≠ 0, we have tan γ =
−α₂/(α₁ p(a)); if α₁ = 0, then put γ = π/2 (tan γ = ∞).
We are now going to obtain the following results as a direct
consequence of the comparison theorems and their corollaries proved
earlier. See also the observations made on the phase function in the
previous section.
Lemma 7.3.7
For any fixed t > a, the phase variable θ (t, λ ) is a strictly increasing
function of λ .
Lemma 7.3.8
Proof: This follows from the third observation made earlier in the
previous section.
Lemma 7.3.9
Theorem 7.3.10
[Oscillation Theorem]
and t − a < (γ₁ − γ)/(K − m). Thus, the solution curve lies below the straight line in
[a, a₁] for some a₁ > a. Now suppose, if possible, that θ(t, λ) > s(t) for
some t ∈ [a,t1 ]. We will obtain a contradiction for a choice of λ .
By continuity, we can find the smallest t∗ ∈ [a,t1 ] such that θ (t∗ , λ ) =
s(t∗ ) with θ̇ (t∗ , λ ) ≥ m, since the solution curve can cross the straight line
only from below. Then, observe that θ (t∗ , λ ) = s(t∗ ) = γ1 + m(t∗ − a).
Substituting the expression for m and the upper bound for γ1 , we see that
θ(t∗, λ) ∈ [ε, π − ε]. Therefore, sin θ(t∗, λ) ≥ sin ε.
Let ρ∗ = inf_{t∈[a,t₁]} ρ(t) and choose Λ = (m − K)/(ρ∗ sin²ε). Then, remembering
that λ < 0, we obtain the following contradiction:
$$m \le \dot{\theta}(t_*, \lambda) \le \lambda\rho(t_*)\sin^2\theta(t_*, \lambda) + K \le \lambda\rho_*\sin^2\varepsilon + K < m,$$
provided that λ < Λ. This shows that θ (t1 , λ ) < ε if λ < Λ and completes
the proof.
Lemma 7.3.11
Theorem 7.4.1
7.5 Exercises
1. Let u, v satisfy the following equations
$$\frac{d}{dt}\left(P_1(t)\frac{du}{dt}(t)\right) - Q_1(t)u(t) = 0,$$
$$\frac{d}{dt}\left(P_2(t)\frac{dv}{dt}(t)\right) - Q_2(t)v(t) = 0,$$
on some interval in R, where P1 ≥ P2 > 0 are differentiable functions
and Q1 ≥ Q2 are continuous functions. If v does not vanish at any
point in a closed interval [a, b], show that
$$\left[\frac{u}{v}\left(P_1\dot{u}v - P_2u\dot{v}\right)\right]_a^b = \int_a^b (Q_1 - Q_2)u^2\, dt + \int_a^b (P_1 - P_2)\dot{u}^2\, dt + \int_a^b P_2\,\frac{(\dot{u}v - u\dot{v})^2}{v^2}\, dt,$$
where [χ]ₐᵇ = χ(b) − χ(a). This formula is known as Picone's
formula. Deduce the Sturm comparison theorem from it.
2. Suppose u satisfies the following equations
$$\frac{d}{dt}\left(P_1(t)\frac{du}{dt}(t)\right) - Q_1(t)u(t) = 0,$$
$$\frac{d}{dt}\left(P_2(t)\frac{du}{dt}(t)\right) - Q_2(t)u(t) = 0,$$
7.6 Notes
We have studied regular Sturm–Liouville boundary value problems in
this chapter, mainly concentrating on the existence of eigenvalues and the
corresponding eigenfunctions. We have followed the approach in [BR03],
based on the Prüfer substitution. For other approaches to
this problem, the reader is referred to [Inc26, CL72, Sim91, SK07],
among others. We have not done the expansion in terms of
eigenfunctions, as this requires tools from Hilbert space. We also have
not considered the more difficult topic of singular S–L systems.
Representation of solutions of BVP through Green’s function will be
taken up in Chapter 9. We may also use the integral operator defined
through Green’s function to show the existence of eigenvalues and
eigenfunctions; but this also requires tools from functional analysis
(compact operators).
8
Qualitative Theory
8.1 Introduction
Nonlinear dynamics, essentially concentrated around the study of
planetary motions, has some claim to be the most ancient of scientific
problems, perhaps as old as geometry. It, therefore, seems surprising that
until the twentieth century, geometric methods in nonlinear dynamics
were not much pursued. Henri Poincaré is universally acknowledged as
the founder of geometric dynamics, followed by G. D. Birkhoff. But apart
from a few instances, such as the stability analysis of Liapunov,¹
Poincaré's ideas seemed to have had little impact on applied dynamics for
almost half a century. A reason perhaps could be that Poincaré and
Birkhoff concentrated on conservative systems motivated by problems in
celestial mechanics. Dissipative systems, on the other hand, have the
property that an evolving ensemble of states occupies a region of phase
space whose volume decreases with time. Over a long period of time, this
contraction tends to simplify the topological structure of the orbits
in the phase space; this may be true even in an infinite dimensional phase
space, for example, governed by a partial differential equation.
In this chapter, we study the qualitative behavior of solutions to
nonlinear ODE. We wish to do this by plotting the phase portrait of these
systems similar to the one that was done for 2 × 2 linear systems in
Chapter 5. The material in this chapter will be developed through
important examples described in Chapter 1, which will be recalled in the
sequel frequently.
We close this section with a few remarks on the phase portrait. There
is a similarity between the plotting of a phase portrait of a system and
plotting of a plane or space curve given by a parametric representation. In
both the situations, we suppress the independent variable t while plotting
the curve in question. We have already observed this in great detail while
analyzing 2 × 2 linear systems. In general, it will be more difficult to have
a complete phase portrait for nonlinear systems. Physically, the position
vector x(t ) and its velocity vector ẋ(t ) are called phases of the system,
hence the name phase portrait.
Definition 8.2.1
Lemma 8.2.3 shows that any solution passing through x0 may be used to
define O (x0 ) or O + (x0 ) unambiguously. Generally speaking, the phase
space (plane) analysis is about describing all the (positive) orbits of
(8.2.1). The other terminologies used for orbit are trajectory and path.
We will now discuss some important properties of solutions of
autonomous systems. In the following results, statements regarding t
refer to all t ∈ R.
Lemma 8.2.2
Lemma 8.2.3
This lemma shows that O (x0 ) or O + (x0 ) is the same set whether x or y
is used in its definition.
Corollary 8.2.4
Lemma 8.2.5.
Lemma 8.2.6
Remark 8.2.7
Definition 8.2.8
Lemma 8.2.9
Proof: For any fixed h > 0, x(t + h) is also a solution and converges to
ξ as t → ∞. By the mean value theorem, we have
$$x(t + h) - x(t) = h\,\dot{x}(\tilde{t}) = h\,\mathbf{f}(x(\tilde{t}))$$
for some t̃ between t and t + h. Hence, t̃ → ∞ and x(t + h) − x(t) → 0
as t → ∞. By continuity, we therefore get hf(ξ ) = 0 and conclude that
f(ξ ) = 0 as required.
Thus, if a solution x(t ) has a finite limit as t → ±∞, then, the limit is an
equilibrium point. In one dimension, it is easy to see that in the absence
of equilibrium points, all orbits will be unbounded and will not have finite
limits as t → ±∞.
8.2.1 Examples
Example 8.2.10
Example 8.2.11
the equilibrium points are nπ, n ∈ Z. All these equilibrium points are
isolated.
Example 8.2.12
If we take the negative sign in the second equation, then the origin (0, 0) is
the only equilibrium point. On the other hand, for the case of the positive
sign, (0, 0) and (±1, 0) are the equilibrium points. In either case, they are
isolated.
Example 8.2.13
Writing the van der Pol equation (1.2.34) as the following 2D system:
$$\dot{x} = y, \qquad \dot{y} = -\mu(x^2 - 1)y - x, \qquad (8.2.3)$$
Example 8.2.14
We find that (nπ, 0), n ∈ Z are the equilibrium points and each one of them
is isolated.
Example 8.2.15
The origin (0, 0) is the only equilibrium point of this system (Why?).
Example 8.2.16
Writing this as a first order system in x, ẋ, we see that each point on the
line ẋ = 0 is an equilibrium point. Hence, none of the equilibrium points
is isolated.
Definition 8.3.1
Definition 8.3.2
8.3.1 Linearization
We now discuss the linearization around an equilibrium point of (8.2.1).
We assume that f in (8.2.1) is a C2 function. If x̄ is an equilibrium point,
then by Taylor’s formula (see Chapter 2), we have
f(x̄ + y) = f(x̄) + Ay + O(|y|2 ) = Ay + O(|y|2 ), (8.3.1)
where $A = D\mathbf{f}(\bar{x}) \equiv \left(\dfrac{\partial f_i}{\partial x_j}(\bar{x})\right)$ denotes the Jacobian matrix of f at x̄.
Writing x = x̄ + y and ignoring quadratic and higher order terms in y, we
obtain from (8.2.1) and (8.3.1), the following linear system:
ẏ = Ay. (8.3.2)
Theorem 8.3.3
Theorem 8.3.4
ẋ = x², ẋ = −µx + x², µ > 0.
for t ∈ [0,t ∗ ]. The hypothesis (8.3.4) on f means that given any ε > 0, there
is a δ > 0, depending only on ε such that
$$|f(t, x)| \le \frac{\varepsilon}{K}\,|x|, \qquad (8.3.6)$$
for all x satisfying |x| ≤ δ and for all t. Therefore, we have from (8.3.5)
and (8.3.6), using (8.3.4),
$$|x(t)| \le K e^{-\sigma t}|x_0| + \varepsilon \int_0^t e^{-\sigma(t-s)}|x(s)|\, ds, \qquad (8.3.7)$$
as long as the solution x(t ) satisfies the condition |x(t )| ≤ δ , for t ∈ [0,t ∗ ];
this may be achieved by choosing small |x0 | and t ∗ if necessary. Next, by
multiplying the inequality (8.3.7) throughout by eσt and using Gronwall’s
inequality, we obtain
$$e^{\sigma t}|x(t)| \le K|x_0|e^{\varepsilon t}.$$
Choosing ε = σ/2, we obtain the a priori estimate
$$|x(t)| \le K|x_0|e^{-\sigma t/2}, \qquad (8.3.8)$$
provided that |x(t )| ≤ δ . If we choose x0 such that K|x0 | ≤ δ , then, from
(8.3.8), we see that |x(t )| ≤ δ , for all t, where the local solution exists.
Therefore, for the chosen initial data x0 , all the aforementioned
arguments are justified and the solution x(t ) satisfies (8.3.8) in its interval
of existence. This allows us to extend the solution for all t ≥ 0 and (8.3.8)
Definition 8.3.5
8.3.2 Examples
Example 8.3.6
In this case, we also have an explicit formula for the solution, from which
these conclusions may be drawn. We have, with α as earlier
$$x(t) = \begin{cases} n\pi + 2\arctan(Ce^{-t}) & \text{if } n \text{ is odd,} \\ n\pi + 2\arctan(Ce^{t}) & \text{if } n \text{ is even,} \end{cases}$$
Example 8.3.7
In this case, we know that the solutions do not exist for all time; but there is
a maximum interval of existence depending on the initial condition. Here
0 is the only equilibrium point and linearization gives the equation ẏ = 0.
Thus, in the linearization, 0 is stable but not asymptotically stable. For the
nonlinear equation, we have the solution in explicit form:
$$x(t) = \frac{x_0}{1 - x_0 t}$$
with x(0) = x₀. The solution is defined in the interval (1/x₀, ∞) if x₀ < 0 and
in the interval (−∞, 1/x₀) if x₀ > 0. Since any solution is always increasing,
we see that 0 is an unstable equilibrium point and its behavior is more like
a saddle point: if x₀ < 0, then x(t) → 0 as t → ∞ and when x₀ > 0, then
x(t) → ∞ as t → 1/x₀.
Example 8.3.8
The only equilibrium point is (0, 0). The corresponding linearized system
is
ẋ = −y, ẏ = x
and (0, 0) is stable, but not asymptotically stable for the linearized
system. However, by considering the original equations in polar
coordinates, we see that ṙ = r3 , where r2 = x2 + y2 . Thus, r is increasing
and the orbits starting near the origin spiral away from the origin as t
Example 8.3.9
The equilibrium points are (0, 0) and (±1, 0). We now discuss the
linearization of the system around each of these equilibrium points. At
any point (x, y) ∈ R2 , the Jacobian of the right side functions is given by
$$\begin{bmatrix} 0 & 1 \\ 1 - 3x^2 & -\delta \end{bmatrix}.$$
At (0, 0), this becomes $\begin{bmatrix} 0 & 1 \\ 1 & -\delta \end{bmatrix}$, whose eigenvalues are
$\frac{1}{2}\left(-\delta \pm \sqrt{\delta^2 + 4}\right)$. Hence, for δ ≥ 0, there is always one positive
eigenvalue and the equilibrium point (0, 0) is linearly unstable.
At (±1, 0), the Jacobian matrix is given by
$$\begin{bmatrix} 0 & 1 \\ -2 & -\delta \end{bmatrix}.$$
Here, the eigenvalues are $\frac{1}{2}\left(-\delta \pm \sqrt{\delta^2 - 8}\right)$. Hence, the equilibrium points
(±1, 0) are asymptotically stable when δ > 0. If δ = 0, the eigenvalues are
±√2 i and the equilibrium points are now stable, but not asymptotically
stable, in the linear approximation.
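These conclusions can be checked mechanically. The sketch below, assuming NumPy and writing the system as ẋ = y, ẏ = x − x³ − δẋ (the system consistent with the Jacobian displayed above), evaluates the eigenvalues at each equilibrium:

```python
# Eigenvalues of the Jacobian at the equilibria of x' = y, y' = x - x^3 - delta*y,
# the damped Duffing system consistent with the Jacobian above; assumes NumPy.
import numpy as np

def J(x, delta):
    return np.array([[0.0, 1.0], [1.0 - 3.0 * x**2, -delta]])

delta = 0.5
for xeq in (0.0, 1.0, -1.0):
    lam = np.linalg.eigvals(J(xeq, delta))
    print(xeq, lam, bool(np.all(lam.real < 0)))
# (0,0) has a positive eigenvalue (unstable); (+-1, 0) are asymptotically stable
```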
Example 8.3.10
The corresponding Jacobian matrix is $\begin{bmatrix} 0 & 1 \\ -1 & \mu \end{bmatrix}$, whose eigenvalues
are $\frac{1}{2}\left(\mu \pm \sqrt{\mu^2 - 4}\right)$. Thus, (0, 0) is asymptotically stable for µ < 0 and
unstable for µ > 0. For µ = 0, it is stable in the linear approximation; the
original system itself is linear when µ = 0.
Example 8.3.11
We next consider the equilibrium points $\left(\pm\sqrt{b(R-1)}, \pm\sqrt{b(R-1)}, R-1\right)$,
which exist when R > 1. The corresponding Jacobian matrix is given by
$$\begin{bmatrix} -\sigma & \sigma & 0 \\ 1 & -1 & \mp\sqrt{b(R-1)} \\ \pm\sqrt{b(R-1)} & \pm\sqrt{b(R-1)} & -b \end{bmatrix},$$
whose eigenvalues are the roots of the cubic equation
$$\lambda^3 + (\sigma + b + 1)\lambda^2 + (R + \sigma)b\lambda + 2\sigma b(R - 1) = 0.$$
The analysis of this cubic equation becomes more difficult, as there are
three parameters. Being a cubic equation with real coefficients, there is
always a real root, which can be shown to be negative. The Hurwitz
criterion (see, for instance, [Mer97]) shows that the eigenvalues all have
negative real parts if and only if
$$R < \sigma(\sigma + b + 3)(\sigma - b - 1)^{-1},$$
assuming σ > b + 1.
We will not discuss this further and the interested reader can refer to many
works on the subject. For example, see [GH83, Hao84, Wig90]. However,
we wish to make the following remark on the value R = 1, which is special
as observed earlier. As the value of R moves from the region R < 1 to
the region R > 1, we have observed either a change in the stability of an
equilibrium point or increase in the number of equilibrium points. For this
reason, R = 1 is referred to as a bifurcation point. The topic of bifurcation
is an important and difficult part of the qualitative analysis. The interested
reader may refer to the works cited earlier.
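The threshold just described is easy to test numerically. The following sketch, assuming NumPy and using the classical parameter values σ = 10, b = 8/3, examines the roots of the cubic on either side of the critical value:

```python
# Roots of the characteristic cubic at the nontrivial equilibria, against the
# Hurwitz bound R_c = sigma*(sigma + b + 3)/(sigma - b - 1); assumes NumPy.
import numpy as np

sigma, b = 10.0, 8.0 / 3.0
Rc = sigma * (sigma + b + 3.0) / (sigma - b - 1.0)   # ~24.74 for these values
for R in (0.9 * Rc, 1.1 * Rc):
    coeffs = [1.0, sigma + b + 1.0, (R + sigma) * b, 2.0 * sigma * b * (R - 1.0)]
    roots = np.roots(coeffs)
    print(R, bool(np.all(roots.real < 0)))   # True below Rc, False above
```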
Example 8.3.12
where
$$A(t) = \begin{bmatrix} -1 + \frac{3}{2}\cos^2 t & 1 - \frac{3}{2}\cos t\sin t \\ -1 - \frac{3}{2}\cos t\sin t & -1 + \frac{3}{2}\sin^2 t \end{bmatrix}.$$
The eigenvalues of A(t) are given by $\frac{1}{4}\left(-1 \pm \sqrt{7}\,i\right)$ for all t and thus,
they have negative real parts. However, this system has the following two
linearly independent solutions:
$$v_1(t) = e^{t/2}\begin{bmatrix} -\cos t \\ \sin t \end{bmatrix}, \qquad v_2(t) = e^{-t}\begin{bmatrix} \sin t \\ \cos t \end{bmatrix}.$$
Definition 8.4.1
A C1 function V : Ω → R satisfying
(1) V (0) = 0, V (x) > 0 for all x ∈ Ω \ {0},
(2) ∇V · f ≤ 0 in Ω
is called a Liapunov function for (8.2.1).
Theorem 8.4.2
² Later, while discussing conservative equations, we will see that V may be taken as the sum of
kinetic energy and potential energy. Thus, V_c may be thought of as the surface at energy level c.
Since k > 0, V(x(t)) < 0 for large t, which contradicts the positivity of
V. Thus, L = 0, that is, lim_{t→∞} V(x(t)) = 0. We leave it as an exercise to the
reader to show that lim_{t→∞} x(t) exists and the limit is 0, as V(x) > 0 for x ≠ 0.
Thus, 0 is asymptotically stable. The proof is complete.
Theorem 8.4.3
Actually, one does not need to assume the condition (2) that every sphere
around 0 contains a point where V > 0; it may be replaced by a weaker
assumption. Theorem 8.4.3 follows from the following theorem, due to
Chetaev.
Theorem 8.4.4
Claim: This positive orbit crosses C1 after a finite time greater than t0 .
Hence, one more integration shows that V(x(t)) ≥ V(x₀) + m(t − t₀) for
all t > t0 , which contradicts the boundedness of V in K. This completes
the proof.
Example 8.4.5
Example 8.4.6
Choosing c₁ = c₃ > 0 and c₂ = 2c₁, we find that V(x) > 0 for x ≠ 0 and
∇V · f ≡ 0. Thus, the orbits of the system lie on the ellipsoids
$$x_1^2 + 2x_2^2 + x_3^2 = c^2,$$
where c is a constant. These are ellipsoids surrounding the origin and
therefore (0, 0, 0) is stable, but not asymptotically stable. This conclusion
follows directly from the nature of the trajectories. The Liapunov theorem
is not applicable here as the equilibrium point (0, 0, 0) is not isolated.
Note that this system has many more equilibrium points. In fact, the
points (0, 0, c), (0, b, 2) and (a, 0, 1), where a, b, c ∈ R, are all equilibrium
points and none of them is isolated! The reader should be able to construct
suitable Liapunov functions and do the stability analysis.
Example 8.4.7
[Slight modification of Example 8.4.6] Consider
The reader can verify that the origin (0, 0, 0) is the only equilibrium point.
The Jacobian matrix is the same as in Example 8.4.6. If we take V(x) = x₁² +
2x₂² + x₃², we find that ∇V · f = −2(x₁⁴ + 2x₂⁴ + x₃⁴) < 0 for x ≠ 0. Thus,
(0, 0, 0) is asymptotically stable.
Example 8.4.8
Consider the 2D system
ẋ = x² + 2y⁵, ẏ = xy².
The origin (0, 0) is the only equilibrium point. It is easy to see that the
linearization does not reveal much regarding the nature of the equilibrium
point. Consider the Liapunov function V(x, y) = x² − y⁴. This is not a
positive definite function, but it has subsets in any neighborhood of the
origin where it is positive. These subsets are bounded by the parabolas
x = y² and x = −y². Next, along a trajectory (x(t), y(t)) of the given
system, we find that V̇ = ∇V · f = 2x(x² + 2y⁵) − 4y³ · xy² = 2x³, which is
positive whenever x > 0.
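This computation is quick to confirm symbolically; a minimal sketch, assuming SymPy is available:

```python
# SymPy check that V = x^2 - y^4 has derivative 2x^3 along trajectories of
# x' = x^2 + 2y^5, y' = x*y^2, as computed in this example.
import sympy as sp

x, y = sp.symbols('x y')
f = sp.Matrix([x**2 + 2*y**5, x*y**2])
V = x**2 - y**4
Vdot = sp.simplify((sp.Matrix([V]).jacobian([x, y]) * f)[0])
print(Vdot)   # 2*x**3: positive where x > 0, so Chetaev-type instability applies
```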
Definition 8.5.1
Example 8.5.2
$$x\left(y - \frac{x^2}{3}\right) = c,$$
c a constant; c = x₀(y₀ − x₀²/3), x₀, y₀ being the initial values of x and y
respectively.
We choose c = 0. If x0 = 0, then x(t ) = 0 for all t. This in turn gives
the y-axis, which is an invariant set for the given system. It is a stable
manifold, denoted by W^s(0, 0), which is the same as E^s, as any solution of
the given system starting on the y-axis remains there for all t and converges
to 0 as t → ∞. Now suppose x₀ ≠ 0. Then, as c is assumed to be 0, we
have y₀ − x₀²/3 = 0. It is not difficult to see that the parabola given by
$$W^u(0, 0) = \left\{(x, y) \in \mathbb{R}^2 : y = \frac{x^2}{3}\right\}$$
is an invariant manifold of the given nonlinear system. It is unstable as
any non-trivial solution starting on this parabola remains there and moves
away from the origin as t increases.
Example 8.5.3
Theorem 8.5.4
For the description of the subspaces E^s and E^u, see Theorem 5.6.3. A more
delicate and detailed analysis is contained in the following:
Theorem 8.5.5
[Hartman–Grobman Theorem]
8.6.1 Examples
Example 8.6.1
Solving the equation to obtain the solution explicitly, we see that every
solution is periodic. The same conclusion may be reached by analyzing
(8.6.2), which turns out to be
$$(\dot{x}(t))^2 + k(x(t))^2 = 2E$$
and E is obtained from the initial conditions. The orbits in this case are
ellipses (circles if k = 1) surrounding the origin in the phase plane.
Example 8.6.2
Fig. 8.4 Potential function and phase plane for the pendulum equation
Example 8.6.3
$$\frac{1}{2}\dot{x}^2 - \frac{x^2}{2} + \frac{x^4}{4} = E.$$
We will now do a similar phase portrait analysis as in the previous
example. Note that V is symmetric around the origin, that is,
V(x) = V(−x), and attains its minimum at ±1 with V(±1) = −1/4 and
V(±√2) = 0. Thus, E ≥ −1/4 and we consider the following cases. See
Fig. 8.5.
Case 1: E = −1/4: Here, we obtain only the equilibrium solutions
(±1, 0).
Case 2: −1/4 < E < 0: In this case, the values of x are restricted to
the symmetric intervals around the equilibrium points (±1, 0) of length
2b where b > 0 satisfies V (±1 ± b) = E; see the graph of the potential
function in Fig. 8.5. We, then, obtain periodic solutions surrounding each
of these equilibrium points separately.
Case 3: E = 0: Now, we first obtain the equilibrium solution (0, 0).
Thus, any other orbit can reach this equilibrium point only as t → ±∞.
Any such solution x therefore lies either in the interval (0, √2] or
[−√2, 0) and is thus bounded. The direction of the orbit for increasing t
can easily be determined and is shown in Fig. 8.5. In this case, we obtain
2 orbits, one with x(0) > 0 and the other with x(0) < 0. Each one of
Fig. 8.5 Potential function and phase plane for Duffing's equation, with energy levels E = −1/4, −1/4 < E < 0, E = 0 and E > 0
A good way to visualize the variation of the angle the vector v makes with
the x-axis as it moves in the positive direction along Γ, is to place all these
³ A reader familiar with one-variable complex analysis will realize that the Poincaré index is
similar to the notion of the winding number of a closed curve in the plane.
vectors at one point and see how these vectors rotate. The reader should
try this with the simple examples mentioned a little later and more.
Since the angle φ = arctan(v2 /v1 ), it is not difficult to see that the
analytic expression for the index is as follows. Suppose the vector field is
given by v = (v1 , v2 ), where v1 , v2 are smooth real valued functions. Then,
$$I_v(\Gamma) = \frac{1}{2\pi}\int_\Gamma d\phi = \frac{1}{2\pi}\int_\Gamma d\left(\arctan\frac{v_2(x, y)}{v_1(x, y)}\right) = \frac{1}{2\pi}\int_\Gamma \frac{v_1\, dv_2 - v_2\, dv_1}{v_1^2 + v_2^2}. \qquad (8.7.2)$$
It is important to keep the direction right as far as the line integral is
concerned; it is always in the counter-clockwise direction, which we may
call positive direction. Using the parametric representation of Γ, the line
integral in (8.7.2) may be expressed as the following one-dimensional
integral:
$$I_v(\Gamma) = \frac{1}{2\pi}\int_a^b \left(v_1^2 + v_2^2\right)^{-1}\left(v_1\frac{dv_2}{ds} - v_2\frac{dv_1}{ds}\right)\, ds, \qquad (8.7.3)$$
where, in the integrand, vi = vi (x(s), y(s)), i = 1, 2. The derivatives with
respect to s may be evaluated, using the chain rule, in terms of the partial
derivatives of v1 , v2 and the derivatives of x, y. Before proceeding to see
the relevance of the index with the periodic orbits of (8.7.1), we will see
some examples.
Example 8.7.1
Let Γ be the unit circle centered at (−2, 0) and the vector field be
v1 (x, y) = x, v2 (x, y) = y. We now find that Iv (Γ) = 0.
Example 8.7.3
Let Γ again be the unit circle centered at the origin and the vector field
be given by v₁(x, y) = x² − y², v₂(x, y) = 2xy.
The reader should try to visualize the rotation of the given vector field
along the positive direction on Γ as described earlier. We leave it to the
reader as an exercise to show that Iv(Γ) = 2. On the other hand, if we now
take v₁(x, y) = x² − y², v₂(x, y) = −2xy, the index will be −2.
These are typical examples covering many vector fields with isolated
equilibrium points.
Theorem 8.7.4
Let Γ be a closed Jordan curve such that Γ and its interior DΓ do not
contain any equilibrium points of a smooth vector field v. Then,
Iv (Γ) = 0.
4 Green’s theorem is more generally valid for any bounded open set with a smooth boundary.
where
$$Q = -\left(v_1^2 + v_2^2\right)^{-1}\left(v_1\frac{\partial v_2}{\partial x} - v_2\frac{\partial v_1}{\partial x}\right), \qquad P = \left(v_1^2 + v_2^2\right)^{-1}\left(v_1\frac{\partial v_2}{\partial y} - v_2\frac{\partial v_1}{\partial y}\right),$$
and they satisfy the conditions of Green’s theorem, as the denominator is
never zero on Γ and in its interior DΓ . Hence, using (8.7.4), we obtain
$$I_v(\Gamma) = \frac{1}{2\pi}\iint_{D_\Gamma}\left(\frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y}\right)\, dx\, dy.$$
It is easy to check that the integrand, after the evaluation of the partial
derivatives, is identically zero. Thus, Iv (Γ) = 0 and the proof is complete.
Corollary 8.7.5
Suppose Γ1 and Γ2 are two simple closed curves in the plane, one
lying in the interior of the other, such that the vector field v has no
equilibrium points on Γ1 , Γ2 and the ‘annular’ region between them.
Then, Iv (Γ1 ) = Iv (Γ2 ).
The conclusion of the corollary is usually stated as: If a closed curve not
containing any equilibrium points of v, is continuously deformed without
crossing any equilibrium points of v, then the index is unchanged. Thus,
the index is in a way independent of the closed curve in question and
is associated with some special points in the plane. This enables us to
talk of the index of an isolated equilibrium point x0 of the vector field
v as the index of any Jordan curve containing x0 in its interior, and no
other equilibrium point of v. This will be denoted by Iv (x0 ). The reader is
advised to look at the examples discussed earlier keeping this discussion
in mind. For isolated equilibrium points, it can be shown that the index
of an equilibrium point, computed from the linearized system, is the same
for the nonlinear system as well. For more details, see [CL72, JS03]. For
linear systems, we can do the computations easily and find the following.
Suppose the vector field v is linear and is given by
v₁(x, y) = ax + by, v₂(x, y) = cx + dy, ad − bc ≠ 0.
Then, the origin is the only equilibrium point of v and one finds, using
(8.7.4), that Iv (0) = the sign of ad − bc. This is left as an exercise to the
reader.
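A numerical sanity check of these indices is straightforward; the sketch below (assuming NumPy; the helper `index` is ours, not from the text) accumulates the continuous angle of v along the circle:

```python
# Numerical winding-number check of the indices in the examples above.
import numpy as np

def index(v1, v2, center=(0.0, 0.0), n=4000):
    s = np.linspace(0.0, 2.0 * np.pi, n + 1)       # closed parametrization
    x, y = center[0] + np.cos(s), center[1] + np.sin(s)
    phi = np.unwrap(np.arctan2(v2(x, y), v1(x, y)))  # continuous angle of v
    return (phi[-1] - phi[0]) / (2.0 * np.pi)

print(index(lambda x, y: x, lambda x, y: y))                     # 1.0
print(index(lambda x, y: x, lambda x, y: y, center=(-2, 0)))     # 0.0
print(index(lambda x, y: x**2 - y**2, lambda x, y: 2 * x * y))   # 2.0
a, b, c, d = 1.0, 2.0, 3.0, 4.0                                  # ad - bc < 0
print(index(lambda x, y: a*x + b*y, lambda x, y: c*x + d*y))     # -1.0
```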
The idea of the proof of the corollary can also be extended to prove the
following general result.
Theorem 8.7.6
Theorem 8.7.7
Let Γ be a closed Jordan curve with a continuous tangent vector v at
each point of Γ, which has no equilibrium points on Γ. Then, Iv (Γ)= 1.
Fig. 8.7 The triangular region T in the (s, t)-plane, with vertices (a, a), (b, b) and (a, b), and the auxiliary vector field ū
We will now construct an auxiliary vector field ū, which will be used to
prove the theorem. Let
T = {(s,t ) : a ≤ s ≤ b, s ≤ t ≤ b},
be the triangular region in the (s,t )-plane. Define ū on T by
ū(s, s) = u(s),
for a ≤ s ≤ b and ū(a, b) = −u(a); at all other points in T , define ū(s,t ) to
be the unit vector in the direction from p(s) to p(t ) on Γ. See Figure 8.7.
Let θ (s,t ) be the angle the vector ū(s,t ) makes with the positive x-axis.
Therefore, θ (a, a) = 0. Since Γ is assumed to lie in the region y ≥ 0,
θ (a,t ) varies from 0 to π as t varies from a to b. Similarly, θ (s, b) varies
from π to 2π as s varies from a to b. Also, by the definition, ū does not
vanish on the boundary Γ̄ of T . Therefore, by Theorem 8.7.4, we conclude
that Iū (Γ̄) = 0. This means that the variation of θ (s, s) as s varies from a
to b is 2π. But this is precisely the same as saying that the variation of the
angle that u makes with the positive x-axis as Γ is traversed once in the
positive direction is 2π. Hence, Iu (Γ) = 1 and the proof is complete.
Theorem 8.7.8
Theorem 8.7.9
Theorem 8.7.9, thus, asserts that equilibrium points are necessary for the
existence of periodic orbits in the plane.
Example 8.7.10
This system does not have any equilibrium points, but has periodic orbits
given by (cost, sint, c), for arbitrary constants c.
Theorem 8.7.11
which vanishes using (8.7.1) and, by hypothesis, the right side is either
positive or negative. This contradiction proves the theorem.
Theorem 8.7.12
orbit which lies in Ω at t = t0 and remains there for all t > t0 , then, C
itself is a periodic orbit or it spirals towards a periodic orbit as t → ∞.
Example 8.7.13
where t₀ is the initial time and c = (1 − r₀²)/r₀² is given in terms of the initial
condition r(t₀) = r₀ > 0. Hence, every orbit that either begins inside
or outside the circle r = 1, spirals towards the circle r = 1 as t → ∞. Thus,
r = 1 is the only periodic orbit for the given system and it is a stable limit
cycle.
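The closed-form expression can be compared with a direct numerical integration. The sketch below assumes the system has the polar form ṙ = r(1 − r²), θ̇ = 1, which is consistent with the solution formula above, and uses SciPy:

```python
# Compare r(t)^2 = 1/(1 + c*exp(-2(t - t0))), c = (1 - r0^2)/r0^2, with a
# direct integration of the assumed polar system r' = r(1 - r^2), theta' = 1.
import numpy as np
from scipy.integrate import solve_ivp

r0, t0, T = 0.2, 0.0, 8.0
sol = solve_ivp(lambda t, u: [u[0] * (1.0 - u[0] ** 2), 1.0],
                [t0, T], [r0, 0.0], rtol=1e-10)
c = (1.0 - r0**2) / r0**2
print(sol.y[0, -1], 1.0 / np.sqrt(1.0 + c * np.exp(-2.0 * (T - t0))))
# both values are close to 1: the orbit spirals onto the limit cycle r = 1
```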
The following theorem, due to Liénard (see [Sim91, SK07]), on the
existence of periodic orbits of second order equations, has particularly
verifiable hypotheses compared to the Poincaré–Bendixson theorem.
Theorem 8.7.14
Example 8.7.15
(van der Pol Equation) Applying Liénard's theorem to the van der Pol
equation,
$$\ddot{x} + \mu(x^2 - 1)\dot{x} + x = 0, \qquad \mu > 0,$$
we see that it possesses a unique periodic orbit, which is a stable limit
cycle.
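The limit cycle guaranteed by Liénard's theorem is easy to observe numerically. A minimal sketch assuming SciPy (the final amplitude near 2 is an empirical observation, not part of the theorem):

```python
# Two different initial states of the van der Pol equation approach the
# same periodic orbit; assumes SciPy.
import numpy as np
from scipy.integrate import solve_ivp

mu = 1.0
def vdp(t, u):
    x, y = u
    return [y, -mu * (x**2 - 1.0) * y - x]

for u0 in ([0.1, 0.0], [4.0, 0.0]):
    sol = solve_ivp(vdp, [0.0, 60.0], u0, rtol=1e-9, max_step=0.05)
    # the amplitude over the tail of the run settles near 2 in both cases
    print(u0, np.max(np.abs(sol.y[0][sol.t > 50.0])))
```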
8.8 Exercises
1. In the following systems, find the equilibrium points, draw the phase
portraits and find explicit solutions wherever possible.
(a) Consider the Lotka–Volterra prey–predator model discussed in
Example 1.2.6. This system is given by ẋ = ax − bxy, ẏ =
−cy + dxy, where a, b, c, d are all positive real numbers.
(b) Consider the system ẋ = y + sin x, ẏ = x − cos x.
(c) Show that all solutions of the system
ẋ = x² + y sin x,
ẏ = 1 + xy cos y,
which start in the first quadrant must remain in the first quadrant
for all future time.
(d) (Competition between two species) Consider the system
ẋ = ax − bxy − ex²,
ẏ = −cy + dxy − f y²,
where the constants a, b, . . . are all positive. If c/d > a/e,
show that every orbit that starts in the first quadrant approaches
the equilibrium point (a/e, 0) as t → ∞. (Note that this system
is a generalization of the logistic model with two competing
species, that is, the Lotka–Volterra prey–predator model).
2. Consider the equation ẍ = e^x. There are no equilibrium points!
Hence, all solutions are unbounded. Find the solution explicitly
using the equation of conservation of energy (8.6.2).
3. Do the same as in the previous exercise for the equation ẍ = −e^x. Is
there any difference?
4. Work out all the details in Example 8.6.2 of the pendulum equation
when E = 2k.
5. Work out all the details in Example 8.6.3 of Duffing’s equation with
no damping when E = 0.
6. Consider the equation −ẍ + xẋ = 0. For this equation, all the
equilibrium points are non-isolated! Solve this equation explicitly.
ẏ = Cx + Dy.
10. Draw the phase portrait of ẍ = ½(x² − 1). Derive the formulas for the
solution x in Example 8.2.6, using the method of separation of variables.
11. Work out all the details in Example 8.6.11.
12. Let (x, y) be a solution of the system in Example 8.5.2 with initial
data (x0 , y0 ) at t = 0. Show that
(a) x(y − x²/3) is a constant.
(b) y(t) − x(t)²/3 = e^{−t}(y₀ − x₀²/3) for all t ∈ R.
8.9 Notes
In this chapter, we have studied the very basic notion of stability of an
equilibrium point of an autonomous system. This is an important aspect
in many physical systems such as mechanical systems, motion of a
satellite, etc. The stability analysis of a hyperbolic equilibrium point of a
general autonomous system follows from the analysis of the linearized
system. This follows from the theorems of Perron and
Hartman–Grobman. However, the case of a non-hyperbolic equilibrium
point is more delicate. We have discussed a powerful tool, namely the
Liapunov function, to deal with this situation. Though the stability results
are easy to state and prove using the method of Liapunov, it is not trivial
to construct a Liapunov function for a general system. For polynomial
vector fields, one may try to use the quadratic forms to generate a
Liapunov function.
In the same way, one can consider the stability of an orbit of a
solution. However, a notion called structural stability is required, which is
not considered here, and which is more involved and complicated.
Another aspect we have completely left out is the study of stability of a
periodic orbit. We have briefly mentioned this during our discussion on
Floquet theory, which concerns linear systems with periodic coefficients. An
interested reader, after thoroughly going through the present chapter, may
look into more advanced texts regarding the other aspects of stability
theory. A good list of references is [CL72, HSD04, Wig90, Per01, JS03].
9
Two Point Boundary Value
Problems
9.1 Introduction
In this chapter, we discuss some boundary value problems (BVPs) for
linear and nonlinear second order equations. These problems arise in a
vast number of practical situations, ranging from physics and engineering to
biology. For a very good collection of such problems and their detailed
descriptions, refer to [AMR95].
The analysis, in the linear case, makes use of the existence of two
linearly independent solutions, discussed thoroughly in Chapter 3. These
are, then, used to construct the so-called Green’s function of the given
BVP, which in turn will generate the required solution.
The nonlinear case is more delicate. We describe a well-known
method, shooting method, to prove the existence and uniqueness of a
solution to BVP. Several examples will be discussed in detail as
illustrations of the theory developed. Doing a phase plane analysis, at
least in the case of autonomous equations, may help decide whether a
solution to the given BVP is possible or not. However, in most situations
one needs to use a suitable numerical scheme to obtain a solution.
The study of the existence and uniqueness of solutions to a BVP is
more difficult than that of an IVP, even in the linear case. We first look at
some examples to see these difficulties.
Example 9.1.1
Example 9.1.2
This represents a steady state heat flow in a rod; the boundary conditions
represent the heat fluxes at the ends of the rod. The given function f
represents the external heating or cooling of the rod.
Assuming that a solution exists, we obtain after an integration that
$$\int_0^1 f(t)\, dt = -\dot{u}(1) + \dot{u}(0) = \gamma_2 + \gamma_1.$$
The left hand side represents the total heat (or cooling) supplied to the rod and
the term on the right side represents the total heat flux at the ends.
Therefore, we immediately conclude that no solution exists if, for
example, f ≡ 1 and γ1 = γ2 = 0. On the other hand, if f (t ) = sin(2πt ),
0 ≤ t ≤ 1 and γ1 = γ2 = 0, then there are infinitely many solutions given
by
$$u(t) = a - \frac{t}{2\pi} + \frac{1}{4\pi^2}\sin(2\pi t),$$
with an arbitrary constant a.
Example 9.1.3
Here a0 , a1 , a2 are real constants. The reader should work out the details
to find out various cases of existence and non-existence of solutions.
" # " #
c1 0
(AW(a) + BW(b)) = . (9.2.5)
c2 0
y1 (t ) y2 (t )
Here, W(t ) = denotes the Wronskian matrix of the
ẏ1 (t ) ẏ2 (t )
solutions y1 and y2 at t. The aforementioned system of linear algebraic
equations has a non-trivial solution if and only if the matrix
AW(a) + BW(b) is singular, that is, its rank is either 0 or 1; notice that
y ≡ 0 is always a solution of (9.2.3) and (9.2.4) (the trivial solution).
Also, note that W(t ) is non-singular and the rank of AW(a) + BW(b)
does not depend on any particular choice of a pair of linearly independent
solutions.
Now, let y0 be any particular solution of the non-homogeneous
equation (9.2.1). Then, any general solution of (9.2.1) is given by
y = y0 + c1 y1 + c2 y2 for arbitrary real constants c1 and c2 . If this y were
to satisfy the boundary conditions (9.2.2), then c1 , c2 should satisfy
" # " #
c1 γ1 − ξ1
(AW(a) + BW(b)) = γ −ξ ≡ , (9.2.6)
c2 γ2 − ξ2
y0 (a) y0 (b)
ξ1
where =A +B . The system of algebraic equations
ξ2 ẏ0 (a) ẏ0 (b)
(9.2.6) has a solution if and only if
rank (AW(a) + BW(b)) = rank [AW(a) + BW(b) γ − ξ ] . (9.2.7)
Note that this condition on ranks is automatically satisfied if
AW(a) + BW(b) is non-singular. We now state the foregoing discussion
in the following theorem.
Theorem 9.2.1
solution is not unique. In this case, the BVP (9.2.3) and (9.2.4) has
non-trivial solutions.
For simplicity of presentation, we now consider the BVP (9.2.1) with the
following homogeneous boundary conditions, replacing the general
boundary conditions in (9.2.2):
y(a) = 0 and y(b) = 0. (9.2.8)
Fix α, β . We wish to derive a formula for the solution y, when it exists,
in the form of an integral involving the ‘input’ function g, much similar to
the case of first order linear equations. We rewrite (9.2.1) in the equivalent
form as
$$\frac{d}{dt}\left[p(t)\dot{y}(t)\right] + q(t)y(t) = f(t), \qquad (9.2.9)$$
where p is a positive C¹ function on [a, b]. Given (9.2.9), it is obvious
that it can be written in the form (9.2.1) by taking α(t) = ṗ(t)/p(t),
β(t) = q(t)/p(t) and g(t) = f(t)/p(t). Conversely, given (9.2.1), we
define $p(t) = \exp\left(\int_a^t \alpha(s)\, ds\right)$ and find that (9.2.1) can be put in the form
(9.2.9) with q(t) = p(t)β(t) and f(t) = p(t)g(t).
We begin with a heuristic description of the method to obtain a
solution to the problem (9.2.9) satisfying the boundary conditions (9.2.8).
Let u1 , u2 be two linearly independent solutions of (9.2.9) with f = 0,
that is, the homogeneous equation corresponding to (9.2.9). By the
method of variation of parameters, we find a general solution of (9.2.9) as
$$y(t) = Au_1(t) + Bu_2(t) + \int_a^t \left[u_1(s)u_2(t) - u_1(t)u_2(s)\right]\frac{f(s)}{W(s)}\, ds, \qquad (9.2.10)$$
where A, B are constants and W is the Wronskian of u1 and u2 defined by
W (t ) = u1 (t )u̇2 (t ) − u̇1 (t )u2 (t ). By the linear independence of u1 , u2 , it
follows that W is never zero. If we now require that y given by (9.2.10)
satisfy the boundary conditions (9.2.8), then we must have, using (9.2.10),
¹ Note that the existence of w₁ and w₂ is not automatic and the given boundary conditions play an
important role in their existence.
$$G(t, s) = \begin{cases} w_1(s)w_2(t) & \text{if } a \le s \le t, \\ w_1(t)w_2(s) & \text{if } t < s \le b. \end{cases}$$
We now directly verify that y given by (9.2.13) is a solution of (9.2.9)
satisfying the boundary conditions (9.2.8), after suitably normalizing
w1 , w2 . Clearly y satisfies (9.2.8) as w1 (a) = 0 and w2 (b) = 0.
Note that
$$\lim_{s\to t^-}\frac{\partial}{\partial t}G(t, s) - \lim_{s\to t^+}\frac{\partial}{\partial t}G(t, s) = w_1(t)\dot{w}_2(t) - \dot{w}_1(t)w_2(t).$$
The right side expression being the Wronskian of w1 , w2 , it follows from
(9.2.9) that this limit equals C/p(t ), where C is a constant. This follows
from the fact that the Wronskian satisfies a first order linear equation; see
Chapter 3. We may normalize w1 , w2 so that C = 1. With this
normalization, we have²
$$\dot{y}(t) = \int_a^t \frac{\partial G}{\partial t}(t, s)f(s)\, ds + G(t, t)f(t) + \int_t^b \frac{\partial G}{\partial t}(t, s)f(s)\, ds - G(t, t)f(t),$$
and therefore,
$$\dot{y}(t) = \dot{w}_2(t)\int_a^t w_1(s)f(s)\, ds + \dot{w}_1(t)\int_t^b w_2(s)f(s)\, ds.$$
Now multiply this expression by p(t ) and differentiate once again with
respect to t to obtain
$$\frac{d}{dt}\left[p(t)\dot{y}(t)\right] = \int_a^t w_1(s)\frac{d}{dt}\left[p(t)\dot{w}_2(t)\right]f(s)\, ds + p(t)w_1(t)\dot{w}_2(t)f(t)$$
$$+ \int_t^b \frac{d}{dt}\left[p(t)\dot{w}_1(t)\right]w_2(s)f(s)\, ds - p(t)\dot{w}_1(t)w_2(t)f(t).$$
Using the normalization and that w1 , w2 satisfy the homogeneous equation
(9.2.9), we see that the expression on the right equals −q(t )y(t ) + f (t ).
This completes the verification that y is indeed a solution of the BVP. We
take up the uniqueness question in the next section.
2 See differentiation under the integral sign in Chapter 2.
$$G(t, s) = \begin{cases} \Phi(t)\Phi^{-1}(s) - \Phi(t)\mathbf{Y}^{-1}N\Phi(b)\Phi^{-1}(s), & \text{if } a \le s \le t; \\ -\Phi(t)\mathbf{Y}^{-1}N\Phi(b)\Phi^{-1}(s), & \text{if } t < s \le b. \end{cases}$$
9.2.2 Examples
Example 9.2.2
In particular, if f ≡ 1, we find that y(t) = ½ t(t − 1).
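The Green's function representation can be checked by quadrature. The sketch below assumes the setting of this example is ÿ = f(t), y(0) = y(1) = 0, for which w₁(t) = t, w₂(t) = t − 1 and G(t, s) = s(t − 1) for s ≤ t, t(s − 1) for s > t; it uses SciPy:

```python
# Quadrature check of y(t) = int_0^1 G(t, s) f(s) ds for the assumed BVP
# y'' = f(t), y(0) = y(1) = 0, against the closed form for f = 1.
import numpy as np
from scipy.integrate import quad

def G(t, s):
    return s * (t - 1.0) if s <= t else t * (s - 1.0)

f = lambda s: 1.0
for t in (0.25, 0.5, 0.75):
    y, _ = quad(lambda s: G(t, s) * f(s), 0.0, 1.0, points=[t])
    print(t, y, 0.5 * t * (t - 1.0))   # the two columns agree
```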
Example 9.2.3
Since sin(2t) and cos(2t) form a basis for the solution space of the
homogeneous equation, we choose w₁(t) = sin(2t) and w₂(t) = −½ cos(2t)
to satisfy the required boundary conditions and normalization
condition. Thus, Green's function is given by
$$G(t, s) = \begin{cases} -\frac{1}{2}\sin(2s)\cos(2t), & \text{if } 0 \le s \le t, \\ -\frac{1}{2}\sin(2t)\cos(2s), & \text{if } t < s \le 1. \end{cases}$$
Example 9.2.4
Theorem 9.3.1
y(t) = y_j(t) ≡ u(t; s_j),
where u solves the IVP (9.3.3).
Theorem 9.3.2
Corollary 9.3.3
with the boundary conditions (9.3.2) and the same conditions on the
coefficients a0 , a1 , b0 , b1 as in Theorem 9.3.1 and Theorem 9.3.2. If p, q
are continuous in [a, b] and q > 0 in [a, b], then, the linear BVP has a
unique solution.
Proof: (of Theorem 9.3.2) It suffices to show that (9.3.5) has a unique
root. If u(t; s) denotes a solution of (9.3.3), put ξ(t) = (∂u/∂s)(t; s). By
differentiating (9.3.3) with respect to s, we obtain
ξ̈ = p(t )ξ̇ + q(t )ξ , (9.3.7)
where
$$p(t) = \frac{\partial f}{\partial \dot{u}}(t, u(t; s), \dot{u}(t; s)) \quad \text{and} \quad q(t) = \frac{\partial f}{\partial u}(t, u(t; s), \dot{u}(t; s)).$$
and thus,
$$\dot{\xi}(t) = \frac{\partial \dot{u}}{\partial s}(t; s) > a_0\exp(-M(t - a)) \ge 0,$$
9.3.1 Examples
Example 9.3.4
This is an equation in conservative form and its phase portrait is not
hard to draw; it is depicted in Fig. 9.1. Note that E = E(t) = ½u̇²(t) −
2u(t) − ⅓u³(t) is the total conserved energy.
If we do not insist that u(1) = 0, but just require that u(b) = 0 for
some b > 0, then we obtain an infinite number of solutions of (9.3.8) by
choosing u(0) = 0 and u̇(0) < 0, as can be seen from the phase portraits
shown in Fig. 9.1.
The requirement of the latter condition that u̇(0) < 0 may also be seen as
follows. From the equation in (9.3.8), we see that u is convex. Thus, if it
satisfies the boundary conditions in (9.3.8), we must have u̇(0) < 0 and
u̇(1) > 0. It is not possible to integrate the given equation explicitly, but
the equation may be solved numerically. We find that if we ‘shoot’ with
u̇(0) slightly less than −1, we obtain a solution of (9.3.8).
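The shooting procedure suggested here is easily automated. The following sketch assumes SciPy and takes the equation in the form ü = 2 + u², which is consistent with the conserved energy E given above; it locates the correct initial slope by root finding:

```python
# Shooting sketch for the BVP u'' = 2 + u^2, u(0) = u(1) = 0 (the form
# consistent with the conserved energy above); assumes SciPy.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

def endpoint(s):
    # integrate the IVP u(0) = 0, u'(0) = s and return u(1; s)
    sol = solve_ivp(lambda t, u: [u[1], 2.0 + u[0] ** 2],
                    [0.0, 1.0], [0.0, s], rtol=1e-10)
    return sol.y[0, -1]

s_star = brentq(endpoint, -2.0, -0.5)
print(s_star)   # slightly less than -1, as suggested by the phase portrait
```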
Example 9.3.5
Here, λ > 0. A complete phase portrait of the equation is shown in Fig. 9.2.
Note that in this case a solution u is concave and therefore, we need to
‘shoot’ with u̇(0) > 0 to possibly obtain a solution of (9.3.9). In the present
situation, an explicit solution is possible. Here, the constant conserved
energy is given by E = E(t) = ½u̇²(t) + λe^{u(t)}.
$$\frac{\kappa}{\cosh(\kappa)} = \frac{\sqrt{\lambda}}{2\sqrt{2}}.$$
The function of κ > 0 on the left side has a unique positive maximum
and tends to 0 as κ → ∞. It follows, therefore, that there is a critical value
λcr of λ such that (9.3.9) has no solution for λ > λcr, a unique solution for
λ = λcr, and two solutions for λ < λcr.
For a slightly different representation of the solution, see [AMR95].
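The critical value λcr can be computed from the displayed equation: maximizing κ/cosh κ amounts to solving κ tanh κ = 1, and then λcr = 8(κ/cosh κ)². A minimal sketch assuming SciPy (the value near 3.5138 is the well-known critical parameter for this problem):

```python
# Critical parameter for (9.3.9): kappa* solves kappa*tanh(kappa) = 1 and
# lambda_cr = 8*(kappa*/cosh(kappa*))^2; assumes SciPy.
import numpy as np
from scipy.optimize import brentq

kappa_star = brentq(lambda k: k * np.tanh(k) - 1.0, 0.5, 2.0)
lam_cr = 8.0 * (kappa_star / np.cosh(kappa_star)) ** 2
print(kappa_star, lam_cr)   # ~1.1997 and ~3.5138
```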
Example 9.3.6
9.4 Exercises
1. Determine the values of λ for which a Green’s function can be
constructed for the equation ÿ + λ y = f (t ), with the following
prescribed boundary conditions. Construct a Green’s function for
all such values of λ :
(a) y(0) = y(1) = 0.
9.5 Notes
Two point BVPs are studied in this chapter for both linear and nonlinear
second order equations. For linear equations, we take advantage of
Let u0 = u0 (s) be a given function defined on the initial curve. The IVP
for the PDE can be defined as follows: Find u = u(x, y) satisfying the PDE
(10.1.1) together with the initial condition
u(x0 (s), y0 (s)) = u0 (s) (10.1.2)
for all s ∈ [0, 1]. The problem of local solvability of IVP is to find u in a
neighborhood in Ω of the initial curve satisfying (10.1.1) and (10.1.2).
Example 10.1.1
Consider the transport equation uy (x, y) + kux (x, y) = 0 with the initial
condition u(x, 0) = u0 (x) on Γ : y = 0 considered in (3.4.1).
Here a = k, b = 1, c = d = 0. Thus, dy/dx = 1/k, or x − ky = constant, are the
characteristics which we have already seen in Example 3.4.1. Note that
we had used t instead of y.
Example 10.1.2
Consider the PDE xuₓ + yu_y = αu with u = φ(x) on the initial curve
y = 1.
It is easy to see that y = cx, c constant, are the characteristic curves and
along any of these curves, u satisfies
$$\frac{d}{dx}u(x, cx) = u_x(x, cx) + u_y(x, cx)\cdot c = u_x + \frac{y}{x}u_y = \frac{\alpha}{x}\,u(x, cx),$$
whose solution is given by u(x, cx) = kx^α. Here, k = k(c) depends on c,
which may differ from characteristic to characteristic. Thus, we have the
general solution u(x, y) = k(y/x)x^α, where k is an arbitrary function. Now
applying the condition u = φ(x) at y = 1, we get
$$\phi(x) = k\left(\frac{1}{x}\right)x^{\alpha} \quad\text{or}\quad k(x) = \phi\left(\frac{1}{x}\right)x^{\alpha},$$
so that u(x, y) = φ(x/y) y^α.
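This closed form can be verified symbolically; a minimal sketch, assuming SymPy is available:

```python
# SymPy check: u(x, y) = phi(x/y) * y^alpha solves x*u_x + y*u_y = alpha*u
# with u(x, 1) = phi(x), as derived in this example.
import sympy as sp

x, y, alpha = sp.symbols('x y alpha', positive=True)
phi = sp.Function('phi')
u = phi(x / y) * y**alpha
residual = sp.simplify(x * sp.diff(u, x) + y * sp.diff(u, y) - alpha * u)
print(residual, u.subs(y, 1))   # 0 and phi(x)
```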
Definition 10.2.1
satisfies
$$\frac{dz}{dt} = u_x\frac{dx}{dt} + u_y\frac{dy}{dt} = au_x + bu_y = c.$$
Thus, the curve (x(t), y(t), z(t)), with z(t) = u(x(t), y(t)), satisfies the system
(10.2.4) and hence, it is the characteristic through (x₀, y₀, z₀). Moreover,
it lies on ∑ by definition. Further, if two integral surfaces intersect at a
point, then the characteristic curve through the point would lie on both
the surfaces and hence, they intersect along the whole characteristic
through this common point. With this detailed discussion, we can now
formulate the initial value problem as follows:
Theorem 10.2.2
space curve Γ̄0 by (10.2.5) and assume that the transversality condition
holds:
$$a(x_0(s), y_0(s), u_0(s))\frac{dy_0}{ds} - b(x_0(s), y_0(s), u_0(s))\frac{dx_0}{ds} \ne 0 \qquad (10.2.6)$$
for all 0 ≤ s ≤ 1. Then, there exists a unique solution u(x, y) defined in
some neighborhood of the initial curve Γ0 , which satisfies the PDE
(10.2.2) and the initial conditions
u(x0 (s), y0 (s)) = u0 (s). (10.2.7)
The theorem thus, confirms that there is an integral surface through the
space curve Γ̄0 in some neighborhood.
for small t, with x(0, s) = x0 (s), y(0, s) = y0 (s), u(0, s) = u(x0 (s),
y0 (s)) = u0 (s). Note that the derivatives of x, y, u with respect to s and t
are continuous. Thus, we can solve for u along the characteristic curve.
But we need to do more because we need to solve for u in any arbitrary
point in the neighborhood of the initial space curve as we discussed
earlier in the linear case; in other words, we need to answer the question:
does there exist a characteristic curve passing through any arbitrary point
in the neighborhood and meeting the initial space curve? Since the
Jacobian
$$\left.\frac{\partial(x, y)}{\partial(s, t)}\right|_{t=0} = \left.\begin{vmatrix} \dfrac{\partial x}{\partial s} & \dfrac{\partial x}{\partial t} \\[4pt] \dfrac{\partial y}{\partial s} & \dfrac{\partial y}{\partial t} \end{vmatrix}\right|_{t=0} = b\frac{dx_0}{ds} - a\frac{dy_0}{ds} \ne 0,$$
we can invoke the inverse function theorem ([Apo11, Rud76]) to obtain
s,t as functions of x, y in some neighborhood of the initial curve t = 0, say
s = s(x, y),t = t (x, y). Now define
ϕ (x, y) = u(s(x, y),t (x, y)).
One can verify that ϕ is the unique solution satisfying the initial
conditions.
Example 10.2.3
Remark 10.3.1
In the quasi-linear case, the cone degenerates into a straight line whose
direction is given by (a, b, c).
At each point, the surface will be tangent to a Monge cone. The line of
contact of the surface and the cones define a field of directions on the
surface called the characteristic directions and the integral curves of this
field define a family of characteristic curves. The Monge cone at
(x0 , y0 , z0 ) is the envelope of the one parameter family of planes (whose
normal is ( p, q, −1)) which can be written as
z − z0 = p(x − x0 ) + q(y − y0 ), (10.3.3)
where p, q solves (10.3.2). By solving (10.3.2) for q in terms of p as
q = q(x0 , y0 , z0 , p), we can write (10.3.3) as
z − z0 = p(x − x0 ) + q(x0 , y0 , z0 , p)(y − y0 )
which is a one parameter family of planes describing the Monge cone.
Differentiating with respect to p, we get
$$0 = (x - x_0) + (y - y_0)\frac{dq}{dp}.$$
From (10.3.2), we have
$$\frac{dF}{dp} = F_p + F_q\frac{dq}{dp} = 0. \qquad (10.3.4)$$
Eliminating dq/dp, the equations describing the Monge cone can be written
as
$$F(x_0, y_0, z_0, p, q) = 0, \qquad z - z_0 = p(x - x_0) + q(y - y_0), \qquad \frac{x - x_0}{F_p} = \frac{y - y_0}{F_q}. \qquad (10.3.5)$$
Given p and q, the last two equations give the line of contact between the
tangent plane and the cone. The last two equations can be written as
$$\frac{x - x_0}{F_p} = \frac{y - y_0}{F_q} = \frac{z - z_0}{pF_p + qF_q}. \qquad (10.3.6)$$
Thus, on the given integral surface, at each point, p₀ = p(x₀, y₀) and q₀ =
q(x₀, y₀) are known, and the tangent plane
$$z - z_0 = p_0(x - x_0) + q_0(y - y_0),$$
together with the third equation in (10.3.5) determines the line of contact
with the Monge cone given by (10.3.6) or the characteristic direction.
Thus, the characteristic curves are given by the system of ODE
$$\frac{dx}{F_p} = \frac{dy}{F_q} = \frac{dz}{pF_p + qF_q},$$
or
$$\frac{dx}{dt} = F_p, \qquad \frac{dy}{dt} = F_q, \qquad \frac{dz}{dt} = pF_p + qF_q. \qquad (10.3.7)$$
As there are five unknowns x(t ), y(t ), z(t ), p(t ), q(t ), we need two more
equations to complete the system (10.3.7). But along a characteristic curve
on the given integral surface, we have
$$\frac{dp}{dt} = p_x\frac{dx}{dt} + p_y\frac{dy}{dt} = p_xF_p + p_yF_q, \qquad \frac{dq}{dt} = q_xF_p + q_yF_q. \qquad (10.3.8)$$
However, px , py , qx , qy are second derivatives of u which are undesirable;
we need to eliminate them. Differentiating the given PDE with respect to
x and y, we obtain
$$F_x + F_z p + F_p p_x + F_q q_x = 0, \qquad F_y + F_z q + F_p p_y + F_q q_y = 0,$$
so that (10.3.8) becomes
$$\frac{dp}{dt} = -F_x - F_z p, \qquad \frac{dq}{dt} = -F_y - F_z q, \qquad (10.3.9)$$
where we have used $p_y = \dfrac{\partial^2 u}{\partial y\,\partial x} = q_x$. Thus, on the integral surface
z = u(x, y), we have a family of characteristic curves with coordinates
x(t ), y(t ), z(t ) along with the numbers p(t ), q(t ) and which is given by
the system (10.3.7), (10.3.9). Moreover along the curve, we have
$$\frac{dF}{dt} = F_x\frac{dx}{dt} + F_y\frac{dy}{dt} + F_z\frac{dz}{dt} + F_p\frac{dp}{dt} + F_q\frac{dq}{dt},$$
and we readily see that dF/dt = 0 using (10.3.7) and (10.3.9), showing that
F = constant, is an integral of ODE. Thus, if F = 0 is satisfied at an
initial point x0 , y0 , z0 , p0 , q0 for t = 0, then (10.3.7), (10.3.9) will determine
a unique solution x(t ), y(t ), z(t ), p(t ), q(t ) passing through this point and
along which F = 0 will be satisfied for all t.
Hence, a solution can be interpreted using these five numbers and is
called a strip, that is, a space curve x = x(t ), y = y(t ), z = z(t ) and, along
it, a family of tangent planes whose normal directions are ( p(t ), q(t ), −1).
Theorem 10.3.2
Example 10.3.3
Remark 10.3.4
Remark 10.3.5
One can easily deduce the quasi-linear and linear case from the general
equations. In the quasi-linear case
F (x, y, z, p, q) = a(x, y, z) p + b(x, y, z)q − c(x, y, z) = 0.
Thus, F_p = a, F_q = b and pF_p + qF_q = c; hence, the equations in
(10.3.7) are independent of p and q, and so they can be solved to
determine the characteristic curves (x(t), y(t), z(t)). But, in the
nonlinear case, one has to solve for (x(t ), y(t ), z(t )) together with the
direction numbers p and q. Moreover, in the quasi-linear case, the
Monge cone equations (10.3.6) reduce to
$$\frac{x - x_0}{a} = \frac{y - y_0}{b} = \frac{z - z_0}{c},$$
which represents the equation of a line in the space showing that the
Monge cone degenerates to a line.
In the linear case, a and b are independent of z as well, so that the
first two equations in (10.3.7) form a complete system for x and y;
the characteristic curves are plane curves, that is, the curves lie on the
(x, y) plane. Moreover, the third equation reduces to
$$\frac{du}{dt}(x(t), y(t)) = \frac{dz}{dt}(t) = c(x(t), y(t)),$$
which can be solved to obtain u.
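The characteristic system (10.3.7), (10.3.9) can also be integrated numerically. The sketch below, assuming SciPy, does this for the eikonal equation F = p² + q² − 1 = 0 with u = 0 on the line x + y = 1 (compare Exercise 9 of this chapter), where the strip condition forces p = q = ±1/√2 on the initial line:

```python
# Characteristic-strip integration of (10.3.7) + (10.3.9) for the eikonal
# equation F = p^2 + q^2 - 1 = 0 with u = 0 on x + y = 1; assumes SciPy.
import numpy as np
from scipy.integrate import solve_ivp

def strip(t, w):
    x, y, z, p, q = w
    # dx/dt = F_p, dy/dt = F_q, dz/dt = p*F_p + q*F_q, dp/dt = dq/dt = 0
    return [2 * p, 2 * q, 2 * (p**2 + q**2), 0.0, 0.0]

s, p0 = 0.3, 1.0 / np.sqrt(2.0)          # a point (s, 1 - s) on the initial line
sol = solve_ivp(strip, [0.0, 0.5], [s, 1.0 - s, 0.0, p0, p0], rtol=1e-10)
x, y, z = sol.y[0, -1], sol.y[1, -1], sol.y[2, -1]
print(z, (x + y - 1.0) / np.sqrt(2.0))   # agree: u = (x + y - 1)/sqrt(2)
```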
Theorem 10.4.1
Example 10.4.2
10.5 Exercises
1. Find and sketch some sample characteristic curves of the PDE
(x + 2)ux + 2yuy = 2u
(b) $u(x, 0) = \begin{cases} 1 & \text{if } x < 0 \\ 0 & \text{if } x \ge 0 \end{cases}$
(c) $u(x, 0) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x \ge 1, \end{cases}$ and u(x, 0) is smooth and
increasing.
8. Find the integral surface of the equation $x\left(\dfrac{\partial u}{\partial x}\right)^2 + y\,\dfrac{\partial u}{\partial y} = u$
passing through the line y = 1, x + z = 0.
9. Consider the equation p² + q² = 1 with initial condition u(x, y) = 0
on the line x + y = 1. Show that there are two solutions, given by
$u(x, y) = \pm\frac{1}{\sqrt{2}}(x + y - 1)$, using the method of characteristics.
10.6 Notes
The purpose of this chapter is not to give an expository introduction to
PDE, but to show how the ODE play an important role in the analysis of
first order PDE. The reader can refer to [Eva98, Joh75, RR04, PR96] for
further discussion and more details.
Appendix A
Poincaré–Bendixson and
Liénard's Theorems
A.1 Introduction
In this appendix (see [CL72]), we present a proof of the
Poincaré–Bendixson theorem concerning the existence of periodic orbits
of two-dimensional autonomous systems. We also discuss Liénard's
theorem. First, we discuss some basic notions of limit sets. Consider an
n-dimensional autonomous system
ẋ = f(x), (A.1.1)
where f : Rn → Rn is a continuous, locally Lipschitz function. Denote by
φt (x0 ), the unique solution x of (A.1.1) with x(0) = x0 , for t ∈ Ix0 , where
Ix0 is the corresponding maximal interval of existence.
Recall the definition of an invariant set. A set A ⊂ Rn is said to be
invariant with respect to (A.1.1), if φt (x) ∈ A for every x ∈ A and t ∈ Ix .
Next, recall the definitions of the orbit O(x), the positive (semi) orbit O⁺(x)
and the negative (semi) orbit O⁻(x) through a given point x ∈ Rn:

O(x) = {φt(x) : t ∈ Ix}, O⁺(x) = {φt(x) : t ∈ Ix, t ≥ 0}, O⁻(x) = {φt(x) : t ∈ Ix, t ≤ 0}.

In particular, a set A ⊂ Rn is invariant if and only if

A = ⋃_{x∈A} O(x).
We now introduce the notions of α-limit and ω-limit sets (observe that α
and ω are, respectively, the first and the last letters of the Greek alphabet).
Definition A.1.1
Given x ∈ Rn, the α-limit set and the ω-limit set of x, with respect to
(A.1.1), are defined, respectively, by

α(x) = αf(x) = ⋂_{y∈O(x)} cl O⁻(y)

and

ω(x) = ωf(x) = ⋂_{y∈O(x)} cl O⁺(y),

where cl denotes the closure.
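A numerical caricature of this definition may help: integrate φt(x0) forward for a long time and keep only the tail of the trajectory. The planar system below (a standard example with an attracting unit circle; it does not appear in this appendix and is used only as an illustration, in Python with NumPy/SciPy) has ω(x0) equal to the unit circle for every x0 ≠ 0.

import numpy as np
from scipy.integrate import solve_ivp

# In polar coordinates this system reads r' = r(1 - r^2), theta' = 1,
# so every nonzero orbit spirals onto the unit circle.
def f(t, x):
    r2 = x[0]**2 + x[1]**2
    return [x[0]*(1 - r2) - x[1], x[1]*(1 - r2) + x[0]]

t_eval = np.linspace(0, 100, 20001)
sol = solve_ivp(f, (0, 100), [0.1, 0.0], t_eval=t_eval, rtol=1e-9)
tail = sol.y[:, sol.t > 90]          # points phi_s(x0) for large s
radii = np.hypot(tail[0], tail[1])
print(radii.min(), radii.max())      # both close to 1: omega(x0) is the unit circle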
Theorem A.1.2
Given x ∈ Rn, we have

ω(x) = ⋂_{t>0} cl At,   (A.1.2)

where
At = {φs(x) : s > t}.

Moreover, y ∈ ω(x) if and only if there exists a sequence tk ↗ ∞ such
that φtk(x) → y.
Using (A.1.2), we now establish the second property. If y ∈ ω(x), then
there exists a sequence tk ↗ ∞ such that y ∈ cl Atk for k ∈ N. Then, there is
also a sequence sk ↗ ∞, sk ≥ tk, such that φsk(x) → y. Conversely, if there
exists a sequence tk ↗ ∞ such that φtk(x) → y, then y ∈ cl Atk for k ∈ N,
and hence,

y ∈ ⋂_{k=1}^∞ cl Atk = ⋂_{t>0} cl At = ω(x),

by (A.1.2).
Lemma A.2.1
continuity, we then have a · f(z) = 0 for some z along the line segment
joining x(t1 ) and y(t2 ) along L. This again contradicts the transversality
condition, proving the second statement. Some such crossings are
depicted in Fig. A.1.
For the last statement in the theorem, let the equation for L be given by
a · x + b = 0 for some non-zero vector a ∈ R2 . By continuity of f, there is a
circle around x0 containing only regular points. The solution φt (x) passing
through any x inside this circle at t = 0 is continuous in (t, x) in an open set
containing (0, x0 ); this follows from the continuous dependence on initial
data. Put L(t, x) = a · φt(x) + b. Then, L(0, x0) = 0 and ∂L/∂t (0, x0) ≠ 0,
by transversality. Hence, by the implicit function theorem, there is a circle
C, centered at x0, and a continuous function t = t(x) defined inside C
satisfying t(x0) = 0 and L(t(x), x) = 0 for all x inside C. By continuity of
t at x0, it now follows that, given any ε > 0, there is a circle Cε ⊂ C,
centered at x0, such that |t(x)| < ε for all x inside Cε. Hence, the orbit
passing through any x inside Cε at t = 0 crosses L at the time t(x), and
|t(x)| < ε.
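The crossing time t(x) furnished by the implicit function theorem can also be computed directly, by solving L(t, x) = a · φt(x) + b = 0 for t with a one-dimensional root finder. A hypothetical sketch, reusing the attracting-circle system from the earlier illustration, with the transversal L taken to be the x1-axis near the regular point (1, 0) (all of these choices are illustrative, not from the text):

import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

def f(t, x):
    r2 = x[0]**2 + x[1]**2
    return [x[0]*(1 - r2) - x[1], x[1]*(1 - r2) + x[0]]

# transversal L: a . x + b = 0 with a = (0, 1), b = 0, i.e., the x1-axis;
# at (1, 0) the field is (0, 1), so L is indeed transversal there.
a, b = np.array([0.0, 1.0]), 0.0

def L(t, x0):
    # a . phi_t(x0) + b, with phi_t obtained by integrating from 0 to t
    if t == 0.0:
        phi = np.array(x0)
    else:
        phi = solve_ivp(f, (0, t), x0, rtol=1e-10, atol=1e-12).y[:, -1]
    return a @ phi + b

x0 = [0.98, -0.05]                       # a point near (1, 0)
t_cross = brentq(lambda t: L(t, x0), -0.2, 0.2)
print(t_cross)                           # small crossing time, |t(x)| < eps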
Proposition A.2.2
Theorem A.2.3
Theorem A.2.4
Proof (of Theorem A.2.4) Since ω(x) ⊂ cl O⁺(x), the set ω(x) contains
at most a finite number of equilibrium points. If ω(x) consists only of
equilibrium points, then it is necessarily a single equilibrium point,
because of connectedness.
Now, assume that ω(x) contains some regular points as well, and at least
one periodic orbit O(p). We claim that ω(x) is precisely this periodic orbit. For
otherwise, by connectedness, there would exist a sequence {xk } ⊂ ω (x) \
O (p) and a point x0 ∈ O (p) such that xk → x0 as k → ∞. Next consider a
transversal L to f such that x0 ∈ L. It follows from Proposition A.2.2 that
ω (x) ∩ L = {x0 }. On the other hand, proceeding as in Proposition A.2.2,
we infer that O + (xk ) ⊂ ω (x) intersects L for sufficiently large k. Since
ω (x) ∩ L = {x0 }, it follows that xk ∈ O (x0 ) = O (p) for sufficiently large
k, which contradicts the choice of the sequence {xk }. Therefore, ω (x) is
a periodic orbit.
Finally, assume that ω (x) contains regular points, but no periodic
orbit. We show that for any regular p ∈ ω (x), the sets ω (p) and α (p) are
equilibrium points.
If p ∈ ω (x) is a regular point, notice that ω (p) ⊂ ω (x). If q ∈ ω (p) is
a regular point and L is a transversal to f containing q in its interior, then
by Proposition A.2.2, we have
Theorem A.3.1
for the system (A.3.2). Thus, any periodic orbit must surround the origin.
Now

ẍ + f(x)ẋ = d/dt [ dx/dt + ∫₀ˣ f(ξ) dξ ] = d/dt [ y + F(x) ],
which suggests that we introduce a new variable
z = y + F(x).
Thus, (A.3.2) can be written in the equivalent form

ẋ = z − F(x),
ż = −g(x),     (A.3.3)

in the (x, z) plane. Again, the origin is the only equilibrium point for
(A.3.3) and the usual existence and uniqueness result holds. Because of
assumption (2), the correspondence (x, y) ↔ (x, z) between the points in
the two planes is one–one and continuous both ways. Therefore, periodic
orbits correspond to periodic orbits and the configurations of orbits in the
two planes are qualitatively similar. The orbits of (A.3.3) satisfy the
differential equation
dz/dx = −g(x) / (z − F(x)).     (A.3.4)
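Before describing the orbits qualitatively, it may help to see the system (A.3.3) numerically. The sketch below (Python with NumPy/SciPy) uses the van der Pol choice f(x) = μ(x² − 1), g(x) = x, hence F(x) = μ(x³/3 − x), a standard example satisfying the hypotheses of the theorem; the text itself does not fix a particular f and g.

import numpy as np
from scipy.integrate import solve_ivp

mu = 1.0
F = lambda x: mu*(x**3/3 - x)  # odd; negative on (0, sqrt(3)), increasing afterwards
g = lambda x: x                # odd; positive for x > 0

def lienard(t, w):
    x, z = w
    return [z - F(x), -g(x)]   # the system (A.3.3)

sol = solve_ivp(lienard, (0, 60), [0.0, 3.0], rtol=1e-10, dense_output=True)
late = sol.sol(np.linspace(50, 60, 2000))  # after transients have died out
print(late[0].min(), late[0].max())        # x-extent of the periodic orbit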
The orbits of (A.3.3) may easily be described using the hypothesis in the
theorem and (A.3.4). We will now make the following observations which
will help in understanding the directions of the orbit in Fig. A.4.
First note that since g and F are odd functions, (A.3.3) and (A.3.4) are
unchanged when x, z are replaced by −x, −z. This means that any curve
symmetric to an orbit with respect to the origin is also an orbit. Therefore,
if we know an orbit in the right half plane (x > 0), it is also known in the
left half plane (x < 0) by reflection through the origin.
Next, if an orbit starts on the z-axis with positive z coordinate (denoted
by P in Fig. A.4), then, the orbit is horizontal (parallel to the x-axis) at
this point. The x coordinate of the orbit is increasing, and thus the orbit moves into
the right half plane, until the orbit meets the curve z = F (x) (denoted by
Q in Fig. A.4), where the orbit becomes vertical, that is, parallel to the
z-axis; after crossing the curve z = F (x), the x coordinate of the orbit
starts decreasing, which continues up to the time when the orbit meets the
z-axis again (denoted by R in Fig. A.4). As long as the orbit is in the
right half plane, the z coordinate is decreasing. Let b be the abscissa (the x
coordinate) of the point Q and denote by Cb the orbit described previously.
[Fig. A.4: the orbit Cb in the (x, z) plane, shown with the curve z = F(x), the vertical lines x = a and x = b, the origin O, and the points P, S, T, R on the orbit.]
It is not hard to see that when the orbit is continued beyond P and R into
the left half plane, the result will be a periodic orbit if and only if the
distances OP and OR are equal, by using the reflection through the origin;
O is the origin. Therefore, to show that there is a unique periodic orbit, it
suffices to show that there is a unique value of b which gives OP = OR.
To this end, we introduce the function
G(x) = ∫₀ˣ g(ξ) dξ

and the function E(x, z) = G(x) + z²/2. Note that E(0, z) = z²/2. Along
any orbit, we have
Ė = g(x)ẋ + zż = F(x)ż,
which may be written as dE = F dz. Now evaluate the line integral of F
along the orbit Cb from P to R, to obtain
I(b) = ∫_PR F dz = ∫_PR dE = ER − EP = (1/2)(OR² − OP²).
Thus, it suffices to show the existence of a unique b such that I (b) = 0.
For b ≤ a (see the hypothesis), we have F < 0 and ż < 0. Hence, I (b) >
0. For b > a, write I (b) = I1 (b) + I2 (b), where
I1(b) = ∫_PS F dz + ∫_TR F dz   and   I2(b) = ∫_ST F dz.
See Fig. A.4. Since F < 0 and ż < 0 as we move along Cb from P to S and
from T to R, we have I1 (b) > 0. On the other hand, when we move from S
to T along Cb , we have F > 0 and ż < 0, so I2 (b) < 0. Therefore, we need
to find a b such that I1 (b) = −I2 (b) > 0.
We now show that I (b) is a decreasing function of b for b ≥ a and
tends to −∞ as b → ∞. Since I (a) > 0, this gives, by continuity, a unique
b0 such that I (b0 ) = 0. We, then, have the unique periodic orbit Cb0 .
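This argument can be mirrored numerically. It is convenient to parametrize the half-orbit by its starting height OP = z0 on the z-axis instead of by the abscissa b (each b corresponds to exactly one such z0), compute I = (1/2)(OR² − OP²) from the first return to the z-axis, and locate the zero with a root finder. A sketch, again with the van der Pol choice of F and g from the earlier illustration (the helper names I_of and z_star are hypothetical):

import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

mu = 1.0
F = lambda x: mu*(x**3/3 - x)
g = lambda x: x

def lienard(t, w):
    x, z = w
    return [z - F(x), -g(x)]

def I_of(z0):
    # follow the orbit from P = (0, z0) to its first return R = (0, zR),
    # detected as x decreasing through 0; then I = E_R - E_P = (zR^2 - z0^2)/2
    ev = lambda t, w: w[0]
    ev.terminal, ev.direction = True, -1.0
    sol = solve_ivp(lienard, (0, 100), [0.0, z0], events=ev,
                    rtol=1e-10, atol=1e-12)
    zR = sol.y_events[0][0][1]
    return 0.5*(zR**2 - z0**2)

z_star = brentq(I_of, 0.5, 5.0)   # I > 0 for small z0, I < 0 for large z0
print(z_star)                     # the unique height giving OP = OR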
From (A.3.4), it follows that
F dz = F (dz/dx) dx = [−g(x)F(x) / (z − F(x))] dx.

Hence, the effect of increasing b is to raise the arc PS (the arc PS is part
of the orbit) and to lower the arc TR, which decreases the magnitude of
−g(x)F(x)/(z − F(x))
for a given x ∈ (0, a). Since the limits of integration for I1 (b) are fixed,
the result is a decrease in I1(b). Furthermore, since F is positive and
non-decreasing for x > a, we see that an increase in b gives an increase
in the positive number −I2(b), and hence a decrease in I2(b). This proves