Convexity and Differentiable Functions
We know that half-planes in $\mathbb{R}^2$ and half-spaces in $\mathbb{R}^3$ are fundamental examples
of convex sets. Many of these examples are defined by inequalities of the form
$y \geq f(x_1, x_2, \dots, x_k)$, where $f$ is a first degree polynomial in the coordinates $x_j$ and
$k = 1$ or $2$ depending upon whether we are looking at $\mathbb{R}^2$ or $\mathbb{R}^3$. Our objective here is to
derive a simple criterion for recognizing other convex sets defined by inequalities of the
form $y \geq f(x_1, x_2, \dots, x_k)$, where $f$ is a function with continuous second partial
derivatives; the set defined in this manner is sometimes called the epigraph of the real
valued function $f$.
(Source: http://en.wikipedia.org/wiki/File:Epigraph_convex.svg)
Many functions in elementary calculus have convex epigraphs. In particular, if we take
$f(x) = x^2$ or $e^x$, then inspection of the graphs strongly suggests that their epigraphs
are convex; our recognition criterion will prove these statements (and also yield many
others of the same type).
(Source: http://algebra.freehomeworkmathhelp.com/Relations_and_Functions/Graphs/Graphs_of_Algebra_Functions/graphs_of_algebra_functions.html)
Nearly all of our discussion generalizes to a class of examples known as convex
functions, but our discussion will be limited because we are mainly interested in finding
examples. Here are some references for the more general results about convex
functions:
We shall begin by considering convex functions of one variable, and afterwards we shall
explain how everything can be extended to functions of two (or more) variables.
Definition. If $K$ is a convex subset of $\mathbb{R}^n$ and $f$ is a real valued function on $K$, then $f$
is said to be convex if for each $x, y$ in $K$ and each $t$ in the open interval $(0, 1)$ we
have
$$f(t x + (1 - t) y) \leq t f(x) + (1 - t) f(y).$$
In calculus textbooks (but practically nowhere else!) such functions are often said to be
concave upward.
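The defining inequality can be spot-checked numerically. The following is a minimal sketch, not a proof: the helper name `violates_convexity`, the sample grid on $[-3, 3]$, and the tolerance are all assumptions made for illustration.

```python
import math

def violates_convexity(f, xs, ts, tol=1e-12):
    """Return the first (x, y, t) with f(t*x + (1-t)*y) > t*f(x) + (1-t)*f(y), else None.

    tol is an assumed allowance for floating-point rounding.
    """
    for x in xs:
        for y in xs:
            for t in ts:
                lhs = f(t * x + (1 - t) * y)
                rhs = t * f(x) + (1 - t) * f(y)
                if lhs > rhs + tol:
                    return (x, y, t)
    return None

xs = [i / 4 for i in range(-12, 13)]   # sample points in [-3, 3]
ts = [k / 10 for k in range(1, 10)]    # sample values of t in (0, 1)

print(violates_convexity(lambda x: x * x, xs, ts))   # x^2: no violation expected
print(violates_convexity(math.exp, xs, ts))          # e^x: no violation expected
print(violates_convexity(math.sin, xs, ts))          # sin is not convex on [-3, 3]
```

A sampled check of this kind can only exhibit a counterexample; the absence of one on a finite grid is merely consistent with convexity.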
(Source: http://withfriendship.com/user/levis/convex-function.php)
Convex functions have been studied extensively in both theoretical and applied
mathematics. Further information can be found in the following online article:
http://en.wikipedia.org/wiki/Convex_function
t f (x) + (1 − t) f (y) ≥ f (t x + (1 − t) y )
t u + (1 − t) v ≥ t f (x) + (1 − t) f (y)
and the convexity of f shows that the right hand side is greater than or equal to f (t x + (1 − t) y ).
Combining these, we obtain the inequality in the first sentence of the paragraph.
We shall now state the main result; versions of it are implicit in the discussions of curve
sketching that appear in standard calculus texts.
Theorem 2. Let $K \subset \mathbb{R}$ be an interval, and let $f$ be a real valued function on $K$ with a continuous
second derivative. If $f''$ is nonnegative everywhere, then $f$ is convex on $K$.
The next result contains the main steps in the proof of Theorem 2.
Lemma 3. Let $f$ be a real valued function on the closed interval $[a, b]$ with a continuous second
derivative. Suppose further that $f''$ is nonnegative on $[a, b]$, and let $g$ be the linear function with
$f(a) = g(a)$ and $f(b) = g(b)$. Then $f(x) \leq g(x)$ for all $x \in [a, b]$.
Proof of Lemma 3. Define a new function $h$ on $[a, b]$ by $h = g - f$. Then by construction we
have $h(a) = h(b) = 0$ and $h''(x) = g''(x) - f''(x) = 0 - f''(x)$ because the second derivative of the
linear function $g$ is zero; since $f'' \geq 0$ it follows that $h'' \leq 0$. The preceding observations then yield
the following properties of the function $h$:
(i) For some $C \in (a, b)$ we have $h'(C) = 0$.
(ii) The derivative $h'$ is nonincreasing.
The first of these is a consequence of Rolle's Theorem, and the second follows because the derivative
$h''$ of $h'$ is nonpositive. If we combine (i) and (ii) we see that $h'(x) \geq 0$ for $x \leq C$ and $h'(x) \leq 0$
for $x \geq C$.
In order to prove Lemma 3, we must show that $h(x) \geq 0$ everywhere. Since $h(a) = 0$ and
$h' \geq 0$ for $x \leq C$, it follows that $h(x) \geq 0$ for $x \leq C$. We need to prove the same conclusion for
$x \geq C$; assume this is false, and assume specifically that $h(D) < 0$ for some $D \in (C, b)$. Since
$h' \leq 0$ for $x \geq D$, we must also have $h(x) < 0$ for all $x \in (D, b]$. In particular, this implies that
$h(b) < 0$, which is a contradiction because we know that $h(b) = 0$. The source of this contradiction
is our supposition that $h(D) < 0$ for some $D$, and thus we must have $h \geq 0$ everywhere.
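The conclusion of Lemma 3 says that the graph of $f$ lies on or below the chord joining its endpoints. A minimal numerical sketch of that statement follows; the helper names `chord` and `max_excess_over_chord` and the sample count are hypothetical choices, not part of the notes.

```python
import math

def chord(f, a, b):
    """Return the linear function g with g(a) = f(a) and g(b) = f(b)."""
    slope = (f(b) - f(a)) / (b - a)
    return lambda x: f(a) + slope * (x - a)

def max_excess_over_chord(f, a, b, n=1000):
    """Largest sampled value of f(x) - g(x) on [a, b].

    Lemma 3 predicts a value <= 0 (up to rounding) whenever f'' >= 0 on [a, b].
    """
    g = chord(f, a, b)
    points = [a + (b - a) * k / n for k in range(n + 1)]
    return max(f(x) - g(x) for x in points)

print(max_excess_over_chord(math.exp, -1.0, 2.0))          # f'' = e^x > 0
print(max_excess_over_chord(lambda x: -x * x, 0.0, 1.0))   # f'' = -2 < 0, so f rises above the chord
```

The second call uses a function whose second derivative is negative, to show that the hypothesis on $f''$ cannot simply be dropped.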
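Proof of Theorem 2. Let $a, b \in K$; our objective is to prove that if $t \in (0, 1)$ then
$$f((1 - t) a + t b) \leq (1 - t) f(a) + t f(b).$$
We claim that, without loss of generality, we may assume $a$ is less than $b$. The inequality is trivial
if $a = b$, and if $a > b$ we may interchange the roles of $a$ and $b$, because for each $t \in (0, 1)$ we may
rewrite $(1 - t) a + t b$ as $(1 - s) b + s a$, where $s = 1 - t$ also lies in $(0, 1)$.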
Assuming $a < b$, take $g$ to be the linear function defined in Lemma 3, let $t \in (0, 1)$ and set $x$
equal to $(1 - t) a + t b$; since $x \in (a, b)$ there is a unique solution $t$ to this equation given by
$$t = \frac{x - a}{b - a}$$
and this solution lies in $(0, 1)$. We can now apply Lemma 3 to conclude that $f(x) \leq g(x)$. Since $g$
is a linear function we have
$$g(x) = g((1 - t) a + t b) = (1 - t) g(a) + t g(b) = (1 - t) f(a) + t f(b)$$
and by the preceding sentence we know this is greater than or equal to $f((1 - t) a + t b)$. Therefore
$f$ is a convex function.
EXAMPLES. Theorem 2 implies that both $f(x) = x^2$ and $f(x) = e^x$ are convex because their
second derivatives are the positive valued functions $2$ (the constant function) and $e^x$ respectively.
Similarly, $f(x) = 1/x$ is convex on the open half-line defined by $x > 0$ because $f''(x) = 2/x^3$ is
positive for $x > 0$.
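The second derivatives quoted in these examples can be sanity-checked with a central-difference estimate. This is only an illustrative sketch: the function name `second_derivative`, the step size, and the sample points are assumptions, and a finite-difference estimate is no substitute for the symbolic computation.

```python
import math

def second_derivative(f, x, h=1e-4):
    """Central-difference estimate of f''(x); h is an assumed step size."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

examples = {
    "x^2": (lambda x: x * x, [-2.0, 0.0, 1.5]),
    "e^x": (math.exp, [-1.0, 0.0, 2.0]),
    "1/x": (lambda x: 1.0 / x, [0.5, 1.0, 4.0]),   # sampled only on x > 0
}

for name, (f, points) in examples.items():
    estimates = [second_derivative(f, x) for x in points]
    # Theorem 2 applies when the second derivative is nonnegative on the interval.
    print(name, all(d > 0 for d in estimates))
```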
Although a few complications arise, we can prove a corresponding Second Derivative Test for
recognizing convex functions of finitely many (say $n$) variables. The first of these complications is
standard in multivariable differential calculus; namely, we must restrict attention to open convex
subsets of $\mathbb{R}^n$ if $n \geq 2$. Likewise, in analogy with the second derivative tests for relative maxima
and minima, we need to consider certain algebraic properties of the Hessian matrix
$$H(f) = \left( \frac{\partial^2 f}{\partial x_i \partial x_j} \right)$$
which is symmetric (mixed partials do not depend upon the order in which the partial derivatives
are taken).
Algebraic digression. If $A = (a_{i,j})$ is a symmetric $n \times n$ matrix, then $A$ is said to be positive
definite if for each nonzero vector $v = (v_1, \dots, v_n)$ we have
$$\sum_{i,j} a_{i,j} v_i v_j > 0.$$
The standard test for recognizing such matrices is the principal minors test:
Given a symmetric matrix $A$ as above, let $A_k$ be the $k \times k$ submatrix generated by the
first $k$ rows and columns of $A$. Then $A$ is positive definite if and only if $\det A_k > 0$ for
$k = 1, \dots, n$.
See pages 84 and 88–90 of the document
http://math.ucr.edu/~res/math132/linalgnotes.pdf
for further information, including a proof of this fact.
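The principal minors test translates directly into a short program. The sketch below is one possible rendering in plain Python, with hypothetical helper names `det` and `is_positive_definite`; it assumes the input is a symmetric matrix given as a list of rows and uses cofactor expansion, which is only sensible for small $n$.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (adequate for small n)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        # Delete row 0 and column j to form the minor.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def is_positive_definite(a):
    """Principal minors test: det A_k > 0 for k = 1, ..., n.

    Assumes a is symmetric; symmetry is not checked here.
    """
    n = len(a)
    return all(det([row[:k] for row in a[:k]]) > 0 for k in range(1, n + 1))

print(is_positive_definite([[2, -1, 0], [-1, 2, -1], [0, -1, 2]]))  # minors 2, 3, 4
print(is_positive_definite([[1, 2], [2, 1]]))                       # minors 1, -3
```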
The relevance of positive definite matrices arises from the following observation.
Lemma 4. Let $U$ be a convex open subset of $\mathbb{R}^n$, let $f$ be a real valued function on $U$ with continuous
second partial derivatives, let $x$ and $y$ be distinct points of $U$, and write $v = y - x$ (hence $v$ is
nonzero). If $\varphi(t) = f(x + t v)$ for $t$ in some open interval containing $[0, 1]$, then
$$\varphi''(t) = \sum_{i,j} \frac{\partial^2 f}{\partial x_i \partial x_j}(x + t v) \, v_i v_j.$$
NOTE. The convexity and openness of $U$ imply that $\varphi(t)$ can always be defined for all $t$ in some
open interval containing $[0, 1]$.
Proof of Lemma 4. This follows immediately from successive applications of the Chain Rule to
$\varphi(t)$ and $\varphi'(t)$.
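The formula of Lemma 4 can be compared against a finite-difference estimate for a specific function. In the sketch below the test function $f(x, y) = 1/(xy)$ on the open first quadrant, its hand-computed Hessian, the base point, the direction, and the step size are all choices made here for illustration.

```python
def f(x, y):
    """Test function f(x, y) = 1/(x*y), defined for x, y > 0."""
    return 1.0 / (x * y)

def hessian(x, y):
    """Hand-computed Hessian of f(x, y) = 1/(x*y)."""
    mixed = 1.0 / (x**2 * y**2)
    return [[2.0 / (x**3 * y), mixed],
            [mixed, 2.0 / (x * y**3)]]

def quadratic_form(h, v):
    """Sum over i, j of h[i][j] * v[i] * v[j]."""
    return sum(h[i][j] * v[i] * v[j] for i in range(2) for j in range(2))

def phi_second(p, v, t, step=1e-4):
    """Central-difference estimate of phi''(t), where phi(t) = f(p + t*v)."""
    phi = lambda s: f(p[0] + s * v[0], p[1] + s * v[1])
    return (phi(t + step) - 2.0 * phi(t) + phi(t - step)) / step**2

p, v, t = (1.0, 2.0), (1.0, -0.5), 0.3
point = (p[0] + t * v[0], p[1] + t * v[1])
print(phi_second(p, v, t))                    # numerical phi''(t)
print(quadratic_form(hessian(*point), v))     # right hand side of Lemma 4
```

The two printed values should agree to several decimal places, which is what the lemma predicts.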
By construction ϕ(0) = f (x) and ϕ(1) = f (y), so the convexity of f follows from substitution of
these values into the right hand side of the display above.
If we specialize to the case $n = 2$ the Principal Minors Test for the Hessian of $f$ reduces to the
pair of inequalities
$$\frac{\partial^2 f}{\partial x_1^2} > 0, \qquad \det \left( \frac{\partial^2 f}{\partial x_i \partial x_j} \right) > 0$$
and computations for specific examples are often very easy.
EXAMPLES. If $K = \mathbb{R}^2$, then the functions $f(x, y) = x^2 + y^2$ and $f(x, y) = e^x + e^y$ are convex on $K$
by Theorem 5 and the preceding simplification of the Principal Minors Test. Similar considerations
show that if $K$ is the open first quadrant defined by $x > 0$ and $y > 0$, then $f(x, y) = 1/(xy)$ is convex
on $K$.
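For the last example the computation is short enough to write out. The following worked derivation is a sketch added for illustration; it simply evaluates the two principal minors for $f(x, y) = 1/(xy)$ with $x > 0$ and $y > 0$.

```latex
\begin{align*}
\frac{\partial f}{\partial x} &= -\frac{1}{x^2 y}, &
\frac{\partial f}{\partial y} &= -\frac{1}{x y^2}, \\
\frac{\partial^2 f}{\partial x^2} &= \frac{2}{x^3 y}, &
\frac{\partial^2 f}{\partial y^2} &= \frac{2}{x y^3}, &
\frac{\partial^2 f}{\partial x \, \partial y} &= \frac{1}{x^2 y^2}, \\
\det H(f) &= \frac{2}{x^3 y} \cdot \frac{2}{x y^3} - \left( \frac{1}{x^2 y^2} \right)^2
           = \frac{3}{x^4 y^4}.
\end{align*}
```

Both $\partial^2 f / \partial x^2$ and $\det H(f)$ are positive when $x > 0$ and $y > 0$, so the two inequalities of the Principal Minors Test hold throughout the open first quadrant.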
Exercise
Show that if K is an open convex set and f is a convex function on K then the open epigraph
consisting of all (x, u) ∈ K × R such that u > f (x) (i.e., we have strict inequality) is also a convex
set. [Hint: Imitate the relevant portion of the proof for Theorem 1.]