Caltech Vector Calculus 3
3.1
at the function y = f(x) with f(x) = √(1 − x²), whose graph is C⁺. Note that (by the
chain rule) f′(x) = −x(1 − x²)^(−1/2). The slope of the tangent to C⁺ at a is then given
by f′(1/2) = −1/√3. Hence the tangent line to C at a is the line L passing through (1/2, √3/2)
of slope −1/√3. This is the unique line passing through a in the direction of the tangent
vector (−√3/2, 1/2). (In other words, L is the unique line passing through a and parallel
to the line joining the origin to (−√3/2, 1/2).) The situation is similar for any plane curve
which is the graph of a one-variable function f, i.e. of the form γ(t) = (t, f(t)). So the
definition of a tangent vector above is very reasonable.
At points t0 where γ′(t0) = 0 the curve may or may not look smooth.
Examples. a) Consider γ(t) = (t², t³). This curve has a cusp at (0, 0).
b) Consider the curve defined on [−1, 1] by γ(t) = (t², 0) for −1 ≤ t ≤ 0 and γ(t) = (0, t²)
for 0 ≤ t ≤ 1. This curve is differentiable at 0 (check that each component is; see Ch. 2, Lemma
2) with derivative γ′(0) = (0, 0). The curve has a corner at (0, 0).
c) Consider γ(t) = (t|t|, t⁴). This curve is differentiable at 0 with derivative γ′(0) = (0, 0)
but looks perfectly smooth at γ(0) = (0, 0). In fact it is just a reparametrization of the
parabola t ↦ (t, t²), which has a well-defined tangent space spanned by (1, 0) at (0, 0). So
sometimes the vanishing of γ′(t) is just an artefact of the parametrization, whereas the curve
itself has a meaningful tangent space.
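The point of example c) can be probed numerically. The sketch below (names are ours) verifies that γ′(0) = (0, 0) by central differences while every point of the image satisfies y = x²:

```python
# Example c): gamma(t) = (t|t|, t^4) traces the parabola y = x^2,
# even though gamma'(0) = (0, 0).
def gamma(t):
    return (t * abs(t), t ** 4)

# central-difference approximation of gamma'(0): both components are ~0
h = 1e-6
d0 = tuple((gamma(h)[i] - gamma(-h)[i]) / (2 * h) for i in range(2))
print(max(abs(c) for c in d0) < 1e-5)  # True

# the image lies on the parabola y = x^2
for t in (-2.0, -0.5, 0.0, 0.3, 1.7):
    x, y = gamma(t)
    assert abs(y - x * x) < 1e-9
print("image lies on y = x^2")
```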
3.2
In the last paragraph we defined the tangent line to a parametrized curve. This is a
set defined by a map into Rⁿ. Here we look at a different kind of set, defined by
maps from Rⁿ.
Definition. Let f : D → R, D ⊂ Rⁿ, be a scalar field. For each c ∈ R, the level set of f
at c is given by
Lc(f) = {x ∈ D | f(x) = c}.
Example. f(x) = height above sea level, where x is a point on Mount Wilson (sitting in
R³). Lc(f) consists of all points of constant height above sea level, just like the contour lines on a
topographic map.
Definition. Let f : D → R, D ⊂ Rⁿ, be a differentiable scalar field and a ∈ Lc(f). If
∇f(a) ≠ 0 then we define the tangent space τa(Lc(f)) to Lc(f) at a to be the vector
space
τa(Lc(f)) = {x ∈ Rⁿ | ∇f(a) · x = 0}.
This is the solution set of one nonzero linear equation in Rⁿ, hence a vector space of dimension
n − 1. One calls a point a ∈ Lc(f) where ∇f(a) ≠ 0 a smooth point of Lc(f). The idea
is that a small neighborhood of a in Lc(f) looks just like a small neighborhood of a
point in Rⁿ⁻¹, though we won't make this statement precise.
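For instance, taking the unit circle from 3.1 as the level set L0(g) with g(x, y) = x² + y² − 1, a short check (a sketch; the variable names are ours) recovers the tangent direction found there:

```python
import math

# Unit circle as L0(g), g(x, y) = x^2 + y^2 - 1; a = (1/2, sqrt(3)/2) is a smooth
# point since grad g(a) = (2*a1, 2*a2) = (1, sqrt(3)) is nonzero.
a = (0.5, math.sqrt(3) / 2)
grad_g = (2 * a[0], 2 * a[1])

# In R^2 the line {x : grad_g . x = 0} is spanned by the 90-degree rotation of grad_g.
v = (-grad_g[1], grad_g[0])

print(grad_g[0] * v[0] + grad_g[1] * v[1])  # 0.0: v lies in the tangent space
print(v[1] / v[0])                          # slope -1/sqrt(3), as computed in 3.1
```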
Definition. Let f : Rⁿ → Rᵐ be a vector field with components (f1, ..., fm). Then for
c = (c1, ..., cm) ∈ Rᵐ the level set
Lc(f) := {x ∈ Rⁿ | f(x) = c} = Lc1(f1) ∩ ... ∩ Lcm(fm)
is defined as before, and the tangent space
τa(Lc(f)) := {x ∈ Rⁿ | (Ta f)(x) = 0}
is defined if Ta f has the largest possible rank, namely m. This latter condition is equivalent to
requiring ∇f1(a), ..., ∇fm(a) to be linearly independent. We also have
τa(Lc(f)) = τa(Lc1(f1)) ∩ ... ∩ τa(Lcm(fm))
and dimR τa(Lc(f)) = n − m.
Once we have the notion of tangent space, a normal vector is defined as before for both
classes of examples. Note that n − m is also the (intuitive) dimension of Lc(f).
Examples:
(1) Find a normal to the surface S given by z = x²y² + y + 1 at (0, 0, 1): Set f(x, y, z) =
x²y² + y − z + 1. Then z = x²y² + y + 1 iff (x, y, z) lies on L0(f). Since f is a polynomial
function, it is differentiable. We have:
∇f(a) = (2xy², 2x²y + 1, −1)|(0,0,1) = (0, 1, −1),
which is a normal to L0(f) = S by the Proposition above. So a unit normal is given by
n = (0, 1/√2, −1/√2).
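A numeric check of example (1) (a sketch; the helper names are ours):

```python
import math

# f(x, y, z) = x^2 y^2 + y - z + 1, so S = L0(f) is the surface z = x^2 y^2 + y + 1.
def grad_f(x, y, z):
    return (2 * x * y ** 2, 2 * x ** 2 * y + 1, -1.0)

g = grad_f(0.0, 0.0, 1.0)
print(g)  # (0.0, 1.0, -1.0)

length = math.sqrt(sum(c * c for c in g))
n = tuple(c / length for c in g)
print(n)  # the unit normal (0, 1/sqrt(2), -1/sqrt(2))
```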
(2) (Physics example) Let P be a unit mass particle at (x, y, z) in R³. Denote by ~F the
gravitational force on P exerted by a point mass M at the origin. Then Newton's
gravitational law tells us that
~F = −(GM/r²)(~r/r),
where ~r = x~i + y~j + z~k, r = ||~r||, and G is the gravitational constant. (Here ~i, ~j, ~k denote
the standard basis vectors (1, 0, 0), (0, 1, 0), (0, 0, 1).) Put V = −GM/r, which is called
the gravitational potential. By the chain rule, ∂V/∂x equals (GM/r²)(∂r/∂x), and similarly for
∂V/∂y and ∂V/∂z. Therefore
∇V = (GM/r²)(∂r/∂x, ∂r/∂y, ∂r/∂z) = (GM/r²)(x/r, y/r, z/r) = −~F.
For c < 0, the level sets Lc(V) are spheres centered at the origin of radius −GM/c, and ~F
is evidently perpendicular to each of them. Note that ~F gets stronger as r gets smaller, as
physical intuition would suggest.
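The identity ∇V = −~F can be confirmed numerically. The sketch below uses GM = 1 (our choice of units, not from the notes) and a central-difference gradient:

```python
import math

GM = 1.0  # arbitrary units for this check

def V(x, y, z):
    # gravitational potential V = -GM/r
    return -GM / math.sqrt(x * x + y * y + z * z)

def F(x, y, z):
    # Newton: F = -(GM/r^2)(r_vec/r) = -GM r_vec / r^3
    r = math.sqrt(x * x + y * y + z * z)
    return (-GM * x / r ** 3, -GM * y / r ** 3, -GM * z / r ** 3)

def grad_V(x, y, z, h=1e-6):
    # central-difference gradient of V
    return (
        (V(x + h, y, z) - V(x - h, y, z)) / (2 * h),
        (V(x, y + h, z) - V(x, y - h, z)) / (2 * h),
        (V(x, y, z + h) - V(x, y, z - h)) / (2 * h),
    )

p = (1.0, 2.0, 2.0)  # a point with r = 3
print(all(abs(g + f) < 1e-8 for g, f in zip(grad_V(*p), F(*p))))  # True: grad V = -F
```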
(3) Let f = x³ + y³ + z³ and g = x + y + z. The level set S := L10(f) ∩ L4(g) is a curve.
The set S ∩ {(x, y, z) ∈ R³ | x ≥ 0, y ≥ 0, z ≥ 0} is closed and bounded, hence compact (verify
this!).
3.3
([π] = 3, [√2] = 1 and [−π] = −4.) Then it is easy to see that f(x) is always non-negative
and that f(n) = 0 for all integers n. Hence the integers n are all relative minima for f. The
graph of f is continuous except at the integers and is periodic of period 1. Moreover, the
image of f is the half-open interval [0, 1). We claim that there is no relative maximum.
Indeed, the function is increasing (with slope 1) on [n, n + 1) for any integer n, so any relative
maximum on [n, n + 1) would have to be an absolute maximum, and this does not exist; f(x) can
get arbitrarily close to 1 but can never attain it.
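These claims are easy to probe numerically; the sketch below takes f(x) = x − [x], the fractional-part function under discussion (with [x] the greatest integer ≤ x):

```python
import math

def f(x):
    # fractional part: x - [x], where [x] = floor(x)
    return x - math.floor(x)

print(math.floor(math.pi), math.floor(math.sqrt(2)), math.floor(-math.pi))  # 3 1 -4
print(f(5.0), f(-3.0))              # 0.0 0.0: integers are relative minima
print(f(2.25) == f(3.25) == 0.25)   # True: periodic of period 1
print(f(7 - 1e-9))                  # close to 1, but the value 1 is never attained
```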
In the first two examples above, f was continuous, even differentiable. In such cases the
relative extrema exert an influence on the derivative, which can be used to locate them.
Lemma 1 If a = (a1, . . . , an) is an interior point of D and a relative extremum of a differentiable scalar field f, then ∇f(a) is zero.
Proof. For each j = 1, . . . , n, consider the one-variable function gj(t) := f(a + t ej), where
ej is the j-th standard basis vector, so that gj(0) = f(a). Since f has a relative extremum at a,
gj has a local extremum at 0. Since f is differentiable, gj is differentiable as well, and we see,
by Ma 1a material, that gj′(0) = 0 for every j. But gj′(0) is none other than ∂f/∂xj (a). Since
∇f(a) = (∂f/∂x1 (a), . . . , ∂f/∂xn (a)), it must be zero.
QED.
Definition. A stationary point of a differentiable scalar field f is a point a where
∇f(a) = 0. It is called a saddle point if it is not a relative extremum.
Some call a stationary point a critical point. In the language of the previous section,
stationary points are the non-smooth points of the level set. But here we focus on the
function rather than its level sets, hence the new terminology.
Note that if a is a saddle point, then by definition, every open ball Ba(r) contains points
x with f(x) ≥ f(a) and points x′ such that f(x′) ≤ f(a).
Examples (contd.):
(3) Let f(x, y) = x² − y², for all (x, y) ∈ R². Being a polynomial, f is differentiable
everywhere, and ∇f = (2x, −2y). So the origin is the unique stationary point. We claim
that it is a saddle point. Indeed, consider any open ball B0(r) centered at 0. If we move
away from 0 along the y direction (inside B0(r)), the function is z = −y², and so the origin
is a local maximum along this direction. But if we move away from 0 in the x direction,
we see that the origin is a local minimum along that direction. So f(x, y) − f(0, 0) is both
positive and negative in B0(r), and this is true for any r > 0. Hence the claim. Note that if
we graph z = x² − y² in R³, the picture looks quite a bit like a real saddle around the origin.
This is the genesis of the mathematical terminology of a saddle point.
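The two one-dimensional slices used in this argument can be written out directly (a minimal sketch):

```python
# f(x, y) = x^2 - y^2 restricted to the two coordinate directions through the origin.
def f(x, y):
    return x * x - y * y

eps = 1e-3  # small enough to stay inside any given ball B_0(r)
print(f(eps, 0.0) > f(0.0, 0.0))  # True: local minimum along the x direction
print(f(0.0, eps) < f(0.0, 0.0))  # True: local maximum along the y direction
# So f - f(0,0) changes sign in every ball around 0: the origin is a saddle point.
```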
(4) The above example can be jazzed up to higher dimensions as follows: Fix positive
integers n, k with k < n, and define f : Rⁿ → R by
f(x1, . . . , xn) = x1² + · · · + xk² − x_{k+1}² − · · · − xn².
3.4
Given a stationary point a of a nice two-variable function (scalar field) f , say, with continuous
second partial derivatives at a, there is a test one can apply to determine whether it is a
relative maximum or a relative minimum or neither. This generalizes in a non-obvious way
the second derivative test in the one-variable situation. One crucial difference is that in
that (one-variable) case, one does not require the continuity of the second derivative at a.
This added hypothesis is needed in our (2-variable) case because otherwise the second mixed partial
derivatives, namely ∂²f/∂x∂y and ∂²f/∂y∂x, need not be equal at a.
First we need to define the Hessian. Suppose we are given a function f : D → R with
D ⊂ Rⁿ, and an interior point a ∈ D where the second partial derivatives ∂²f/∂xi∂xj exist
for all 1 ≤ i, j ≤ n. We put
Hf(a) = (∂²f/∂xi∂xj (a))1≤i,j≤n ,
which is clearly an n × n matrix. It is a symmetric matrix iff all the mixed partial derivatives
are independent of the order in which the partial differentiation is performed. The Hessian
determinant of f at a is by definition the determinant of Hf (a), denoted det(Hf (a)).
We will make use of this notion in the plane.
Theorem 1 Let D ⊂ R², f : D → R a scalar field, and a ∈ D° (the interior). Assume
that all the second partial derivatives of f exist and are continuous at a. Suppose a is a
stationary point. Then
(i) If ∂²f/∂x² (a) < 0 and det(Hf(a)) > 0, then a is a relative maximum for f;
(ii) If ∂²f/∂x² (a) > 0 and det(Hf(a)) > 0, then a is a relative minimum for f;
(iii) If det(Hf (a)) < 0, a is a saddle point for f .
The test does not say anything if Hf(a) is singular, i.e. if it has zero determinant. Since
the second partial derivatives of f exist and are continuous at a, we know from the previous
chapter that
∂²f/∂x∂y (a) = ∂²f/∂y∂x (a).
This means the matrix Hf(a) is symmetric (and real). So by a theorem in Linear Algebra
(Ma 1b), we know that we can diagonalize it. In other words, we can find an invertible matrix
M such that
M⁻¹ Hf(a) M = ( λ1 0 ; 0 λ2 ),
where λ1, λ2 are the eigenvalues. (We can even choose M to be an orthogonal matrix, i.e.,
with M⁻¹ being the transpose matrix Mᵗ.) Then we have
(D)
det(Hf(a)) = λ1 λ2,
and
(T)
tr(Hf(a)) = ∂²f/∂x² (a) + ∂²f/∂y² (a) = λ1 + λ2,
where tr denotes the trace. (Recall that the determinant and the trace do not change when
a matrix is conjugated by another matrix, such as M.)
Now suppose Hf(a) has positive determinant, so that
∂²f/∂x² (a) · ∂²f/∂y² (a) > (∂²f/∂x∂y (a))² ≥ 0.
In this case, by (D), λ1 and λ2 have the same sign. Using (T), we then see that ∂²f/∂x² (a)
and ∂²f/∂y² (a) must have the same sign as λ1 and λ2. When this sign is positive (resp.
negative), a is a relative minimum (resp. relative maximum) for f. When the Hessian
determinant is negative at a, λ1 and λ2 have opposite signs, which means that f increases in
some direction and decreases in another. In other words, a is a saddle point for f.
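The (D)/(T) reasoning can be illustrated on the saddle example f(x, y) = x² − y², whose Hessian is constant (a sketch; the eigenvalues are computed from the trace and determinant of a symmetric 2×2 matrix):

```python
# Hessian of f(x, y) = x^2 - y^2 at any point: [[2, 0], [0, -2]].
h11, h12, h22 = 2.0, 0.0, -2.0

det = h11 * h22 - h12 * h12   # (D): det = lam1 * lam2
tr = h11 + h22                # (T): tr  = lam1 + lam2
disc = (tr * tr - 4 * det) ** 0.5
lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2

print(det)          # -4.0: negative, so part (iii) gives a saddle point
print(lam1, lam2)   # 2.0 -2.0: eigenvalues of opposite sign
```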
3.5
Very often in practice one needs to find the extrema of a scalar field subject to some constraints imposed by the geometry of the situation. More precisely, one starts with a scalar
field f : D → R, with D ⊂ Rⁿ, and a subset S of the interior of D, and asks what the
maxima and minima of f are when restricted to S.
For example, we could consider the saddle function f(x, y) = x² − y² with D = R², which
has no relative extrema, and ask for the constrained extrema relative to
S = unit circle : x² + y² = 1.
This can be solved explicitly. A good way to do it is to look at the level curves Lc(f). For
c = 0, we get the union of the lines y = x and y = −x. For c ≠ 0, we get hyperbolas with
foci (±√(2c), 0) if c > 0 and (0, ±√(−2c)) if c < 0. It is clear by inspection that the constrained
extrema of f on S are at the four points (1, 0), (−1, 0), (0, 1), (0, −1); the former two are
constrained maxima (with value 1) and the latter two are constrained minima (with value
−1). If you like a more analytical proof, see below.
The question of finding constrained extrema is very difficult to solve in general. But
luckily, if f is differentiable and if S is given as the intersection of a finite number of level
sets of differentiable functions (with independent gradients), there is a beautiful method
due to Lagrange to find the extrema, which is efficient at least in low dimensions. Here is
Lagrange's result.
Theorem 2 Let f be a differentiable scalar field on Rⁿ. Suppose we are given scalars
c1, c2, . . . , cr and differentiable functions g1, g2, . . . , gr on Rⁿ, so that the constraint set S is
the intersection of the level sets Lc1(g1), Lc2(g2), . . . , Lcr(gr). Let a be a point on S where
∇g1(a), ∇g2(a), . . . , ∇gr(a) are linearly independent. Then we have the following:
If a is a constrained extremum of f on S, then there are scalars λ1, λ2, . . . , λr (called the
Lagrange multipliers) such that
∇f(a) = λ1 ∇g1(a) + · · · + λr ∇gr(a).
(3.1)
Proof. We use the fact, mentioned for r = 1 in the previous section, that one can always
find a differentiable curve γ : R → S with γ(0) = a and γ′(0) any given tangent vector
in τa(S) (note that S is a generalized level set, and the linear independence of the ∇gi(a)
ensures that τa(S) is defined). This is not so easy to prove, and we may do it later in the
course if there is time left.
So if a is a constrained extremum of f on S, we take such a curve and notice that a is
also an extremum of f on the image of γ. So the one-variable function h(t) := f(γ(t)) has
vanishing derivative at t = 0, which by the chain rule implies
∇f(a) · γ′(0) = 0.
Since γ′(0) was an arbitrary tangent vector, we find that ∇f(a) is orthogonal to the whole
tangent space (i.e. it is a normal vector). The normal space N, say, has dimension r since it
is the solution space of a system of linear equations of rank n − r (picking a basis v1, ..., v_{n−r} of
τa(S), the system of equations is vi · x = 0). On the other hand, the ∇gi(a) all lie in N and
they are linearly independent by assumption, so they form a basis of N. Since we've found
that ∇f(a) lies in N, ∇f(a) must be a linear combination of these basis vectors. Done.
Note that the condition (3.1) is only a necessary condition for a to be a relative extremum
of f on the constraint set S. Once you find such a point a you still have to decide whether
it is a maximum, minimum or saddle point. In other words, we have the same situation as
with the condition ∇f(a) = 0 for unconstrained extrema.
Now let us use (a superspecial case of) this Theorem to check the answer we gave above
for the problem of finding constrained extrema of f(x, y) = x² − y² relative to the unit circle
S centered at (0, 0). Here r = 1.
Note first that S is the level set L0(g), where g(x, y) = x² + y² − 1. We need to solve
∇f = λ∇g. Since ∇f = (2x, −2y) and ∇g = (2x, 2y), we are left to solve the simultaneous
equations x = λx and −y = λy. For this to happen, x and y cannot both be non-zero,
as otherwise we would get λ = 1 from the first equation and λ = −1 from the second! They
cannot both be zero either, as (x, y) is constrained to lie on S and so must satisfy x² + y² = 1.
So, either x = 0, y = ±1, when λ = −1, or x = ±1, y = 0, in which case λ = 1. So the
four candidates for constrained extrema are (0, ±1), (±1, 0). One can now check that they
are indeed constrained extrema. We get the same answer as before.
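The four candidates can be checked against (3.1) mechanically (a sketch for the f and g above; names are ours):

```python
# Lagrange check for f(x, y) = x^2 - y^2 on the circle g(x, y) = x^2 + y^2 - 1 = 0.
def grad_f(x, y):
    return (2 * x, -2 * y)

def grad_g(x, y):
    return (2 * x, 2 * y)

# candidate point -> its multiplier lambda
candidates = {(1, 0): 1, (-1, 0): 1, (0, 1): -1, (0, -1): -1}
for (x, y), lam in candidates.items():
    assert x * x + y * y == 1                                    # lies on S
    assert grad_f(x, y) == tuple(lam * c for c in grad_g(x, y))  # condition (3.1)
print("all four points satisfy grad f = lam * grad g")
```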
We come back to the example of the set S defined as the intersection of the two level sets
x³ + y³ + z³ = 10 and x + y + z = 4. Suppose we want to find points on S with maximal or
minimal distance from the origin. We need to look at f(x, y, z) = x² + y² + z² and solve the
system of five equations in five variables
x³ + y³ + z³ = 10
x + y + z = 4
∇f(x, y, z) = (2x, 2y, 2z) = λ1 (3x², 3y², 3z²) + λ2 (1, 1, 1).
These are nonlinear equations for which there is no simple method like the Gauss-Jordan
algorithm for linear equations. It's a good idea in every such problem to look at symmetries.
Our set S is invariant under any permutation of the coordinates x, y, z, so it has a sixfold
symmetry. Each of x, y, z satisfies the quadratic equation 2t = 3λ1 t² + λ2, or
t² − (2/(3λ1)) t + λ2/(3λ1) = 0.
A quadratic equation has at most two roots, so at least two of the coordinates x, y, z must
coincide. Consider the case x = z. Substituting y = 4 − 2x into x³ + y³ + z³ = 10 gives
2x³ + (4 − 2x)³ = 10, which simplifies to
(∗)
x³ − 8x² + 16x − 9 = 0,
and it factors as (x − 1)(x² − 7x + 9). The root x = 1 gives the point a = (1, 2, 1). The two
roots of the quadratic part are x = (7 ± √13)/2, and for each of them we find a corresponding
point (x, y, z), namely
a′ = ((7 + √13)/2, −3 − √13, (7 + √13)/2),  a″ = ((7 − √13)/2, −3 + √13, (7 − √13)/2).
We also have
f(a) = 6 and f(a″) = 53 − 13√13 ≈ 6.12.
However, we still don't know if these three points are maxima, minima or saddle points.
Suppose now we look only at the part S⁺ of S where all of x, y, z are nonnegative. It is
easy to see that S⁺ is a closed set (being the intersection of inverse images of closed sets of
the real line under continuous maps) and that S⁺ is bounded. Hence S⁺ is compact, and so
f must take on an absolute minimum and an absolute maximum somewhere on S⁺. Now S
has no intersection with the coordinate hyperplanes where some coordinate is zero (e.g. if
z = 0 then y = 4 − x, and x³ + (4 − x)³ = 10 is a quadratic equation for x with no real roots).
Since a′ ∉ S⁺, the only candidates for absolute extrema of f on S⁺ which satisfy x = z are
the two points a, a″. By looking at the values of f we find that f has an absolute minimum
at a and an absolute maximum at a″.
Taking into account the possibilities where the other two coordinates coincide, we've
found three absolute maxima and three absolute minima, and no other relative extrema, of
f on S⁺.
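The arithmetic of this example is easy to confirm numerically (a sketch; it assumes the substitution y = 4 − 2x used in the case x = z):

```python
import math

# Case x = z: substituting y = 4 - 2x into x^3 + y^3 + z^3 = 10 leads to
# the cubic x^3 - 8x^2 + 16x - 9 = (x - 1)(x^2 - 7x + 9) = 0.
def cubic(x):
    return x ** 3 - 8 * x ** 2 + 16 * x - 9

roots = [1.0, (7 + math.sqrt(13)) / 2, (7 - math.sqrt(13)) / 2]
print(all(abs(cubic(r)) < 1e-9 for r in roots))  # True

def point(x):
    # the point (x, 4 - 2x, x) lies on the plane x + y + z = 4
    return (x, 4 - 2 * x, x)

def f(p):
    # squared distance from the origin
    return sum(c * c for c in p)

a = point(1.0)
a2 = point((7 - math.sqrt(13)) / 2)   # the point called a'' in the text
print(f(a))                           # 6.0
print(abs(f(a2) - (53 - 13 * math.sqrt(13))) < 1e-9)  # True: value 53 - 13*sqrt(13), about 6.12
```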
Had we found more points on S⁺ where (3.1) is satisfied, the point with the largest (resp.
smallest) value of f would be the absolute maximum (resp. minimum) but for points with
intermediate values we would still need additional arguments to decide whether they are
relative maxima, relative minima or saddle points.
L1 and L2 meet if and only if (a1, a2) ≠ λ(b1, b2) for every scalar λ. In other words, the determinant
det( a1 a2 ; b1 b2 )
should not be zero, which holds if and only if ∇f is not proportional to ∇g.