Convex Optimisation Solutions
Convex functions
[Figure: level curves of the two functions, labeled 1, 2, 3.]
Solution. The first function could be quasiconvex because the sublevel sets appear to be
convex. It is definitely not concave or quasiconcave because the superlevel sets are not
convex.
It is also not convex, for the following reason. We plot the function values along the
dashed line labeled I.
[Figure: level curves labeled 1, 2, 3; panel II.]
Along this line the function passes through the points marked as black dots in the figure
below. Clearly along this line segment, the function is not convex.
Exercises
[Figure: level curves labeled 1, 2, 3.]
If we repeat the same analysis for the second function, we see that it could be concave
(and therefore it could be quasiconcave). It cannot be convex or quasiconvex, because
the sublevel sets are not convex.
3.3 Inverse of an increasing convex function. Suppose $f : \mathbf{R} \to \mathbf{R}$ is increasing and convex on its domain $(a, b)$. Let $g$ denote its inverse, i.e., the function with domain $(f(a), f(b))$ and $g(f(x)) = x$ for $a < x < b$. What can you say about convexity or concavity of $g$?
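Solution. $g$ is concave. Its hypograph is
$$
\begin{aligned}
\operatorname{hypo} g &= \{(y,t) \mid t \le g(y)\} \\
&= \{(y,t) \mid f(t) \le f(g(y))\} \quad \text{(because $f$ is increasing)} \\
&= \{(y,t) \mid f(t) \le y\} \\
&= \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \operatorname{epi} f.
\end{aligned}
$$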
For differentiable $g$, $f$, we can also prove the result as follows. Differentiate $g(f(x)) = x$ once to get
$$g'(f(x)) = 1/f'(x),$$
so $g$ is increasing. Differentiate again to get
$$g''(f(x)) = -\frac{f''(x)}{f'(x)^3},$$
so $g$ is concave.
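The second-derivative formula for the inverse can be sanity-checked numerically. The sketch below is only an illustration under an assumed example, $f(x) = e^x$ with inverse $g(y) = \log y$; the helper names are hypothetical and not part of the original solution.

```python
import math

def inverse_second_derivative(f1, f2, x):
    """Value of g''(f(x)) predicted by the formula -f''(x) / f'(x)**3."""
    return -f2(x) / f1(x) ** 3

def second_difference(g, y, h=1e-3):
    """Central finite-difference estimate of g''(y)."""
    return (g(y + h) - 2.0 * g(y) + g(y - h)) / (h * h)

# Assumed example: f = exp (increasing and convex), so f' = f'' = exp and g = log.
xs = [0.1 * k for k in range(1, 20)]
predicted = [inverse_second_derivative(math.exp, math.exp, x) for x in xs]
estimated = [second_difference(math.log, math.exp(x)) for x in xs]
```

Every predicted value is negative, in line with the concavity of $g$, and the finite-difference estimates of $g''$ should agree with the formula up to discretization error.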
3.4 [RV73, page 15] Show that a continuous function $f : \mathbf{R}^n \to \mathbf{R}$ is convex if and only if for every line segment, its average value on the segment is less than or equal to the average of its values at the endpoints of the segment: For every $x, y \in \mathbf{R}^n$,
$$\int_0^1 f(x + \lambda(y-x))\,d\lambda \le \frac{f(x)+f(y)}{2}.$$
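Solution. First suppose that $f$ is convex. Jensen's inequality gives $f(x + \lambda(y-x)) \le f(x) + \lambda(f(y)-f(x))$ for $0 \le \lambda \le 1$. Integrating both sides over $\lambda \in [0,1]$, we get
$$\int_0^1 f(x+\lambda(y-x))\,d\lambda \le \int_0^1 \bigl(f(x) + \lambda(f(y)-f(x))\bigr)\,d\lambda = \frac{f(x)+f(y)}{2}.$$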
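Now we show the converse. Suppose $f$ is not convex. Then there are $x$ and $y$ and $\theta_0 \in (0,1)$ such that
$$f(\theta_0 x + (1-\theta_0) y) > \theta_0 f(x) + (1-\theta_0) f(y).$$
Because $f$ is continuous, the gap $F(\theta) = f(\theta x + (1-\theta)y) - \theta f(x) - (1-\theta) f(y)$ is continuous, vanishes at $\theta = 0$ and $\theta = 1$, and is positive at $\theta_0$. Let $\alpha$ be the largest zero of $F$ below $\theta_0$ and $\beta$ the smallest zero of $F$ above $\theta_0$, and define $u = \alpha x + (1-\alpha)y$ and $v = \beta x + (1-\beta)y$. Then $F > 0$ on $(\alpha, \beta)$, so $f(u + \lambda(v-u)) > f(u) + \lambda(f(v) - f(u))$ for $0 < \lambda < 1$. Integrating over $\lambda \in [0,1]$ yields
$$\int_0^1 f(u + \lambda(v-u))\,d\lambda > \frac{f(u)+f(v)}{2}.$$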
In other words, the average of f over the interval [u, v] exceeds the average of its values
at the endpoints. This proves the converse.
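As an illustration of the criterion, the following sketch compares the segment average with the endpoint average for scalar functions. It assumes $n = 1$, a midpoint-rule approximation of the integral, and two example functions chosen here (the exponential, which is convex, and the sine on $[0, \pi]$, which is not); none of these choices come from the original text.

```python
import math

def segment_average(f, x, y, n=10000):
    """Midpoint-rule estimate of the integral of f(x + lam*(y - x)) over lam in [0, 1]."""
    h = 1.0 / n
    return h * sum(f(x + (k + 0.5) * h * (y - x)) for k in range(n))

def endpoint_average(f, x, y):
    """Average of the values of f at the two endpoints."""
    return 0.5 * (f(x) + f(y))

# Convex example: the endpoint average dominates the segment average.
convex_gap = endpoint_average(math.exp, -1.0, 2.0) - segment_average(math.exp, -1.0, 2.0)

# Non-convex example: sin on [0, pi] has segment average 2/pi but endpoint average 0.
nonconvex_gap = endpoint_average(math.sin, 0.0, math.pi) - segment_average(math.sin, 0.0, math.pi)
```

A positive `convex_gap` and a negative `nonconvex_gap` match the two directions of the equivalence on these particular segments.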
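3.5 [RV73, page 22] Running average of a convex function. Suppose $f : \mathbf{R} \to \mathbf{R}$ is convex, with $\mathbf{R}_+ \subseteq \operatorname{dom} f$. Show that its running average $F$, defined as
$$F(x) = \frac{1}{x}\int_0^x f(t)\,dt, \qquad \operatorname{dom} F = \mathbf{R}_{++},$$
is convex.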
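Solution. Assuming $f$ is differentiable, $F$ is twice differentiable with
$$
\begin{aligned}
F'(x) &= -(1/x^2)\int_0^x f(t)\,dt + f(x)/x, \\
F''(x) &= (2/x^3)\int_0^x f(t)\,dt - 2f(x)/x^2 + f'(x)/x \\
&= (2/x^3)\int_0^x \bigl(f(t) - f(x) - f'(x)(t-x)\bigr)\,dt.
\end{aligned}
$$
Convexity of $f$ gives $f(t) \ge f(x) + f'(x)(t-x)$ for all $t$, so the integrand is nonnegative and $F''(x) \ge 0$.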
Examples
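3.15 A family of concave utility functions. For $0 < \alpha \le 1$ let
$$u_\alpha(x) = \frac{x^\alpha - 1}{\alpha}.$$
Solution. By inspection, $u_\alpha(1) = (1^\alpha - 1)/\alpha = 0$. The derivative is
$$u_\alpha'(x) = x^{\alpha-1},$$
which is positive for all $x > 0$ (since $0 < \alpha < 1$), so these functions are increasing. To show concavity, we examine the second derivative:
$$u_\alpha''(x) = (\alpha - 1)x^{\alpha-2}.$$
Since this is negative for all $x > 0$, we conclude that $u_\alpha$ is strictly concave.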
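The three properties of $u_\alpha(x) = (x^\alpha - 1)/\alpha$ (zero at $x = 1$, increasing, concave) can be spot-checked on a grid. This is a rough sketch rather than a proof; the grid, the sample values of $\alpha$, and the helper names are assumptions made here for illustration.

```python
def u(x, alpha):
    """Utility u_alpha(x) = (x**alpha - 1) / alpha for 0 < alpha <= 1."""
    return (x ** alpha - 1.0) / alpha

def is_midpoint_concave(f, points, tol=1e-12):
    """Check f((a+b)/2) >= (f(a)+f(b))/2 on all pairs of sample points."""
    return all(f(0.5 * (a + b)) >= 0.5 * (f(a) + f(b)) - tol
               for a in points for b in points)

grid = [0.25 * k for k in range(1, 41)]   # sample points in (0, 10]
alphas = [0.1, 0.5, 0.9]

# For each alpha: (value at 1, increasing on the grid?, midpoint concave on the grid?)
checks = {
    a: (
        u(1.0, a),
        all(u(s, a) < u(t, a) for s, t in zip(grid, grid[1:])),
        is_midpoint_concave(lambda x, a=a: u(x, a), grid),
    )
    for a in alphas
}
```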
3.16 For each of the following functions determine whether it is convex, concave, quasiconvex,
or quasiconcave.
(a) $f(x) = e^x - 1$ on $\mathbf{R}$.
Solution. Strictly convex, and therefore quasiconvex. Also quasiconcave but not
concave.
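(b) $f(x_1, x_2) = x_1 x_2$ on $\mathbf{R}^2_{++}$.
Solution. The Hessian of $f$ is
$$\nabla^2 f(x) = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},$$
which has eigenvalues $\pm 1$, so it is neither positive semidefinite nor negative semidefinite. Therefore $f$ is neither convex nor concave.
(c) $f(x_1, x_2) = 1/(x_1 x_2)$ on $\mathbf{R}^2_{++}$.
Solution. The Hessian of $f$ is
$$\nabla^2 f(x) = \frac{1}{x_1 x_2} \begin{bmatrix} 2/x_1^2 & 1/(x_1 x_2) \\ 1/(x_1 x_2) & 2/x_2^2 \end{bmatrix} \succeq 0.$$
Therefore $f$ is convex.
(d) $f(x_1, x_2) = x_1/x_2$ on $\mathbf{R}^2_{++}$.
Solution. The Hessian of $f$ is
$$\nabla^2 f(x) = \begin{bmatrix} 0 & -1/x_2^2 \\ -1/x_2^2 & 2x_1/x_2^3 \end{bmatrix},$$
which has negative determinant and is therefore indefinite. Hence $f$ is neither convex nor concave.
(e) $f(x_1, x_2) = x_1^2/x_2$ on $\mathbf{R} \times \mathbf{R}_{++}$.
Solution. The Hessian of $f$ is
$$\nabla^2 f(x) = \begin{bmatrix} 2/x_2 & -2x_1/x_2^2 \\ -2x_1/x_2^2 & 2x_1^2/x_2^3 \end{bmatrix} = (2/x_2) \begin{bmatrix} 1 \\ -x_1/x_2 \end{bmatrix} \begin{bmatrix} 1 & -x_1/x_2 \end{bmatrix} \succeq 0.$$
Therefore $f$ is convex.
(f) $f(x_1, x_2) = x_1^\alpha x_2^{1-\alpha}$, where $0 \le \alpha \le 1$, on $\mathbf{R}^2_{++}$.
Solution. The Hessian of $f$ is
$$
\begin{aligned}
\nabla^2 f(x) &= \begin{bmatrix} \alpha(\alpha-1)x_1^{\alpha-2}x_2^{1-\alpha} & \alpha(1-\alpha)x_1^{\alpha-1}x_2^{-\alpha} \\ \alpha(1-\alpha)x_1^{\alpha-1}x_2^{-\alpha} & (1-\alpha)(-\alpha)x_1^{\alpha}x_2^{-\alpha-1} \end{bmatrix} \\
&= \alpha(1-\alpha)x_1^{\alpha}x_2^{1-\alpha} \begin{bmatrix} -1/x_1^2 & 1/(x_1x_2) \\ 1/(x_1x_2) & -1/x_2^2 \end{bmatrix} \\
&= -\alpha(1-\alpha)x_1^{\alpha}x_2^{1-\alpha} \begin{bmatrix} 1/x_1 \\ -1/x_2 \end{bmatrix} \begin{bmatrix} 1/x_1 \\ -1/x_2 \end{bmatrix}^T \preceq 0.
\end{aligned}
$$
Therefore $f$ is concave.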
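The sign patterns of these $2 \times 2$ Hessians can be checked at a sample point with the closed-form eigenvalues of a symmetric matrix. The sketch below assumes the functions $x_1 x_2$, $1/(x_1 x_2)$, $x_1/x_2$ and $x_1^2/x_2$ and an arbitrary test point $(1.5, 0.7)$; the helper names are hypothetical.

```python
import math

def eig2(a, b, d):
    """Eigenvalues (ascending) of the symmetric matrix [[a, b], [b, d]]."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return mean - radius, mean + radius

def hessians(x1, x2):
    """Hessian entries (a, b, d) at a point with x1 > 0 and x2 > 0."""
    return {
        "x1*x2": (0.0, 1.0, 0.0),
        "1/(x1*x2)": (2.0 / (x1**3 * x2), 1.0 / (x1**2 * x2**2), 2.0 / (x1 * x2**3)),
        "x1/x2": (0.0, -1.0 / x2**2, 2.0 * x1 / x2**3),
        "x1**2/x2": (2.0 / x2, -2.0 * x1 / x2**2, 2.0 * x1**2 / x2**3),
    }

# Eigenvalue pairs at the assumed test point (1.5, 0.7).
spectra = {name: eig2(*entries) for name, entries in hessians(1.5, 0.7).items()}
```

Mixed signs indicate an indefinite Hessian, two positive eigenvalues a positive definite one, and a zero eigenvalue paired with a positive one a rank-one positive semidefinite Hessian.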
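3.17 Suppose $p < 1$, $p \ne 0$. Show that the function
$$f(x) = \Bigl(\sum_{i=1}^n x_i^p\Bigr)^{1/p}$$
with $\operatorname{dom} f = \mathbf{R}^n_{++}$ is concave. This includes as special cases $f(x) = (\sum_{i=1}^n x_i^{1/2})^2$ and the harmonic mean $f(x) = (\sum_{i=1}^n 1/x_i)^{-1}$. Hint. Adapt the proofs for the log-sum-exp function and the geometric mean in §3.1.5.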
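Solution. The first derivatives of $f$ are given by
$$\frac{\partial f(x)}{\partial x_i} = \Bigl(\sum_{i=1}^n x_i^p\Bigr)^{(1-p)/p} x_i^{p-1} = \Bigl(\frac{f(x)}{x_i}\Bigr)^{1-p}.$$
The second derivatives are
$$\frac{\partial^2 f(x)}{\partial x_i^2} = \frac{1-p}{f(x)} \Bigl(\frac{f(x)^2}{x_i^2}\Bigr)^{1-p} - \frac{1-p}{x_i} \Bigl(\frac{f(x)}{x_i}\Bigr)^{1-p}$$
and, for $i \ne j$,
$$\frac{\partial^2 f(x)}{\partial x_i \partial x_j} = \frac{1-p}{x_i} \Bigl(\frac{f(x)}{x_i}\Bigr)^{-p} \Bigl(\frac{f(x)}{x_j}\Bigr)^{1-p} = \frac{1-p}{f(x)} \Bigl(\frac{f(x)^2}{x_i x_j}\Bigr)^{1-p}.$$
Hence, for any $y \in \mathbf{R}^n$,
$$y^T \nabla^2 f(x)\, y = \frac{1-p}{f(x)} \Biggl( \Bigl(\sum_{i=1}^n y_i \Bigl(\frac{f(x)}{x_i}\Bigr)^{1-p}\Bigr)^2 - \sum_{i=1}^n y_i^2 \Bigl(\frac{f(x)}{x_i}\Bigr)^{2-p} \Biggr).$$
This is nonpositive by the Cauchy-Schwarz inequality $(a^T b)^2 \le \|a\|_2^2 \|b\|_2^2$ applied to $a_i = (x_i/f(x))^{p/2}$ and $b_i = y_i (f(x)/x_i)^{1-p/2}$, because $\|a\|_2^2 = \sum_i x_i^p / f(x)^p = 1$. Therefore $f$ is concave.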
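Concavity of $f(x) = (\sum_i x_i^p)^{1/p}$ for $p < 1$, $p \ne 0$ can also be probed by testing midpoint concavity on random positive vectors. The sketch below is an empirical check only; the dimension, sampling range, seed, and tolerance are arbitrary assumptions.

```python
import random

def power_mean_sum(x, p):
    """f(x) = (sum_i x_i**p)**(1/p) for positive x and p < 1, p != 0."""
    return sum(xi ** p for xi in x) ** (1.0 / p)

def midpoint_concavity_violations(p, n=4, trials=2000, seed=0, tol=1e-9):
    """Count random pairs (x, y) with f((x+y)/2) < (f(x)+f(y))/2 - tol."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        x = [rng.uniform(0.1, 5.0) for _ in range(n)]
        y = [rng.uniform(0.1, 5.0) for _ in range(n)]
        mid = [0.5 * (a + b) for a, b in zip(x, y)]
        average = 0.5 * (power_mean_sum(x, p) + power_mean_sum(y, p))
        if power_mean_sum(mid, p) < average - tol:
            bad += 1
    return bad

# p = 1/2 and p = -1 are the two special cases named in the exercise.
violations = {p: midpoint_concavity_violations(p) for p in (0.5, -1.0, -3.0)}
```

For a concave function no violations should be found; a nonzero count would point to an error in the function or in the claim.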
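(e) $f(x, t) = -\log(t^p - \|x\|_p^p)$ where $p > 1$ and $\operatorname{dom} f = \{(x, t) \mid t > \|x\|_p\}$. You can use the fact that $\|x\|_p^p / u^{p-1}$ is convex in $(x, u)$ for $u > 0$ (see exercise 3.23).
Solution. Express $f$ as
$$
\begin{aligned}
f(x, t) &= -\log t^{p-1} - \log\bigl(t - \|x\|_p^p / t^{p-1}\bigr) \\
&= -(p-1)\log t - \log\bigl(t - \|x\|_p^p / t^{p-1}\bigr).
\end{aligned}
$$
The first term is convex. The second term is the composition of a decreasing convex function and a concave function, and is also convex.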
3.23 Perspective of a function.
(a) Show that for $p > 1$,
$$f(x, t) = \frac{|x_1|^p + \cdots + |x_n|^p}{t^{p-1}} = \frac{\|x\|_p^p}{t^{p-1}}$$
is convex on $\{(x, t) \mid t > 0\}$.
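(b) Show that
$$f(x) = \frac{\|Ax + b\|_2^2}{c^T x + d}$$
is convex on $\{x \mid c^T x + d > 0\}$.
$$\begin{bmatrix} v \\ w \end{bmatrix}^T \begin{bmatrix} I/t & -y/t^2 \\ -y^T/t^2 & y^T y/t^3 \end{bmatrix} \begin{bmatrix} v \\ w \end{bmatrix}$$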
(d)
Pn
i=1
$$\sum_{i=1}^n p_i a_i^2 - \Bigl(\sum_{i=1}^n p_i a_i\Bigr)^2,$$
where $k = \max\{i = 1, \ldots, n \mid i < \alpha\}$ is the largest integer less than $\alpha$, and $p_{[i]}$ is the $i$th largest component of $p$. We know that $\sum_{i=1}^k p_{[i]}$ is a convex function of $p$, so the inequality $\sum_{i=1}^k p_{[i]} < 0.9$ defines a convex set.
In general, $f(p)$ is not quasiconvex. For example, we can take $n = 2$, $a_1 = 0$ and $a_2 = 1$, and $p^1 = (0.1, 0.9)$ and $p^2 = (0.9, 0.1)$. Then $f(p^1) = f(p^2) = 1$, but $f((p^1 + p^2)/2) = f(0.5, 0.5) = 2$.
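The counterexample can be reproduced in a few lines. The sketch assumes that $f(p)$ is the smallest number of outcomes whose total probability reaches 90%, which is consistent with the characterization through the $k$ largest components of $p$; the function name and the small tolerance guarding against rounding are choices made here.

```python
def smallest_cover_size(p, level=0.9, tol=1e-12):
    """Fewest outcomes whose total probability is at least `level`."""
    total = 0.0
    for count, mass in enumerate(sorted(p, reverse=True), start=1):
        total += mass
        if total >= level - tol:
            return count
    return len(p)

p1 = (0.1, 0.9)
p2 = (0.9, 0.1)
mid = tuple(0.5 * (a + b) for a, b in zip(p1, p2))

# Values at p1, p2 and at their midpoint (0.5, 0.5).
values = (smallest_cover_size(p1), smallest_cover_size(p2), smallest_cover_size(mid))
```

The midpoint value exceeds both endpoint values, so the sublevel set $\{p \mid f(p) \le 1\}$ is not convex.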
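(h) The minimum width interval that contains 90% of the probability, i.e.,
$$\inf\{\beta - \alpha \mid \operatorname{prob}(\alpha \le x \le \beta) \ge 0.9\}.$$
Solution. The minimum width interval that contains 90% of the probability must be of the form $[a_i, a_j]$ with $1 \le i \le j \le n$, because
$$\operatorname{prob}(\alpha \le x \le \beta) = \sum_{k=i}^j p_k = \operatorname{prob}(a_i \le x \le a_j),$$
where $i = \min\{k \mid a_k \ge \alpha\}$ and $j = \max\{k \mid a_k \le \beta\}$.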
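We show that the function is quasiconcave. We have $f(p) \ge \gamma$ if and only if all intervals of width less than $\gamma$ have a probability less than 90%, i.e.,
$$\sum_{k=i}^j p_k < 0.9$$
for all $i$, $j$ that satisfy $a_j - a_i < \gamma$. These are linear inequalities in $p$, so they define a convex set.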
3.25 Maximum probability distance between distributions. Let $p, q \in \mathbf{R}^n$ represent two probability distributions on $\{1, \ldots, n\}$ (so $p, q \succeq 0$, $\mathbf{1}^T p = \mathbf{1}^T q = 1$). We define the maximum probability distance $d_{mp}(p, q)$ between $p$ and $q$ as the maximum difference in probability assigned by $p$ and $q$, over all events:
$$d_{mp}(p, q) = \max\{|\operatorname{prob}(p, C) - \operatorname{prob}(q, C)| \mid C \subseteq \{1, \ldots, n\}\}.$$
Here $\operatorname{prob}(p, C)$ is the probability of $C$, under the distribution $p$, i.e., $\operatorname{prob}(p, C) = \sum_{i \in C} p_i$.
Find a simple expression for $d_{mp}$, involving $\|p - q\|_1 = \sum_{i=1}^n |p_i - q_i|$, and show that $d_{mp}$ is a convex function on $\mathbf{R}^n \times \mathbf{R}^n$. (Its domain is $\{(p, q) \mid p, q \succeq 0,\ \mathbf{1}^T p = \mathbf{1}^T q = 1\}$, but it has a natural extension to all of $\mathbf{R}^n \times \mathbf{R}^n$.)
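Solution. Noting that
$$\operatorname{prob}(p, C) - \operatorname{prob}(q, C) = -\bigl(\operatorname{prob}(p, \tilde C) - \operatorname{prob}(q, \tilde C)\bigr),$$
where $\tilde C = \{1, \ldots, n\} \setminus C$, we can just as well express $d_{mp}$ as
$$d_{mp}(p, q) = \max\{\operatorname{prob}(p, C) - \operatorname{prob}(q, C) \mid C \subseteq \{1, \ldots, n\}\}.$$
This shows that $d_{mp}$ is convex, since it is the maximum of $2^n$ linear functions of $(p, q)$. Let's now identify a subset $C$ that maximizes
$$\operatorname{prob}(p, C) - \operatorname{prob}(q, C) = \sum_{i \in C} (p_i - q_i).$$
The solution is
$$C^\star = \{i \in \{1, \ldots, n\} \mid p_i > q_i\}.$$
Let's show this. The indices for which $p_i = q_i$ clearly don't matter, so we will ignore them, and assume without loss of generality that for each index, $p_i > q_i$ or $p_i < q_i$. Now consider any other subset $C$. If there is an element $k$ in $C^\star$ but not $C$, then by adding $k$ to $C$ we increase $\operatorname{prob}(p, C) - \operatorname{prob}(q, C)$ by $p_k - q_k > 0$, so $C$ could not have been optimal. Conversely, suppose that $k \in C \setminus C^\star$, so $p_k - q_k < 0$. If we remove $k$ from $C$, we'd increase $\operatorname{prob}(p, C) - \operatorname{prob}(q, C)$ by $q_k - p_k > 0$, so $C$ could not have been optimal.
Thus, we have $d_{mp}(p, q) = \sum_{p_i > q_i} (p_i - q_i)$. Now let's express this in terms of $\|p - q\|_1$. Using
$$\sum_{p_i > q_i} (p_i - q_i) + \sum_{p_i \le q_i} (p_i - q_i) = \mathbf{1}^T p - \mathbf{1}^T q = 0,$$
we obtain
$$d_{mp}(p, q) = \frac{1}{2} \sum_{p_i > q_i} (p_i - q_i) - \frac{1}{2} \sum_{p_i \le q_i} (p_i - q_i) = \frac{1}{2} \|p - q\|_1.$$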
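For small $n$ the maximum over all $2^n$ events can be enumerated directly and compared with $\tfrac{1}{2}\|p - q\|_1$. The sketch below does this for one pair of distributions chosen here as an example; the function names are hypothetical.

```python
from itertools import combinations

def d_mp_bruteforce(p, q):
    """Maximum of |prob(p, C) - prob(q, C)| over all subsets C of the index set."""
    n = len(p)
    best = 0.0
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            gap = abs(sum(p[i] - q[i] for i in subset))
            best = max(best, gap)
    return best

def half_l1(p, q):
    """Half of the l1 distance between p and q."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Assumed example distributions on {1, 2, 3, 4}.
p = [0.5, 0.2, 0.2, 0.1]
q = [0.1, 0.6, 0.1, 0.2]
```

Here the maximizing event is $\{1, 3\}$, the set of indices with $p_i > q_i$, and both quantities equal $0.5$ up to rounding.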
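(a) Let $A = \operatorname{conv} B$. Since $B \subseteq A$, we obviously have $S_B(y) \le S_A(y)$. Suppose we have strict inequality for some $y$, i.e.,
$$y^T u < y^T v$$
for all $u \in B$ and some $v \in A$. This leads to a contradiction, because by definition $v$ is the convex combination of a set of points $u_i \in B$, i.e., $v = \sum_i \theta_i u_i$, with $\theta_i \ge 0$, $\sum_i \theta_i = 1$. Since
$$y^T u_i < y^T v$$
for all $i$, this would imply
$$y^T v = \sum_i \theta_i y^T u_i < \sum_i \theta_i y^T v = y^T v.$$
We conclude that $S_B(y) = S_A(y)$.
(b) For the sum $A + B$ we have
$$
\begin{aligned}
S_{A+B}(y) &= \sup\{y^T (u + v) \mid u \in A,\ v \in B\} \\
&= \sup\{y^T u \mid u \in A\} + \sup\{y^T v \mid v \in B\} \\
&= S_A(y) + S_B(y).
\end{aligned}
$$
(c) For the union $A \cup B$ we have
$$
\begin{aligned}
S_{A \cup B}(y) &= \sup\{y^T u \mid u \in A \cup B\} \\
&= \max\{\sup\{y^T u \mid u \in A\},\ \sup\{y^T v \mid v \in B\}\} \\
&= \max\{S_A(y), S_B(y)\}.
\end{aligned}
$$
(d) Obviously, if $A \subseteq B$, then $S_A(y) \le S_B(y)$ for all $y$. We need to show that if $A \not\subseteq B$, then $S_A(y) > S_B(y)$ for some $y$.
Suppose $A \not\subseteq B$. Consider a point $\bar x \in A$, $\bar x \notin B$. Since $B$ is closed and convex, $\bar x$ can be strictly separated from $B$ by a hyperplane, i.e., there is a $y \ne 0$ such that $y^T \bar x > \sup_{x \in B} y^T x = S_B(y)$. It follows that $S_B(y) < y^T \bar x \le S_A(y)$.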
Conjugate functions
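3.36 Derive the conjugates of the following functions.
(a) Max function. $f(x) = \max_{i=1,\ldots,n} x_i$ on $\mathbf{R}^n$.
Solution. We will show that
$$f^*(y) = \begin{cases} 0 & \text{if } y \succeq 0,\ \mathbf{1}^T y = 1 \\ \infty & \text{otherwise.} \end{cases}$$
We first verify the domain of $f^*$. First suppose $y$ has a negative component, say $y_k < 0$. If we choose a vector $x$ with $x_k = -t$, $x_i = 0$ for $i \ne k$, and let $t$ go to infinity, we see that
$$x^T y - \max_i x_i = -t y_k \to \infty,$$
so $y$ is not in $\operatorname{dom} f^*$. Next, assume $y \succeq 0$ but $\mathbf{1}^T y > 1$. We choose $x = t\mathbf{1}$ and let $t$ go to infinity, to show that
$$x^T y - \max_i x_i = t \mathbf{1}^T y - t$$
is unbounded above. Similarly, when $y \succeq 0$ and $\mathbf{1}^T y < 1$, we choose $x = -t\mathbf{1}$ and let $t$ go to infinity.
The remaining case for $y$ is $y \succeq 0$ and $\mathbf{1}^T y = 1$. In this case we have
$$x^T y \le \max_i x_i$$
for all $x$, and therefore $x^T y - \max_i x_i \le 0$ for all $x$, with equality for $x = 0$. Therefore $f^*(y) = 0$.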
(b) Sum of largest elements. $f(x) = \sum_{i=1}^r x_{[i]}$ on $\mathbf{R}^n$.
Solution. The conjugate is
$$f^*(y) = \begin{cases} 0 & 0 \preceq y \preceq \mathbf{1},\ \mathbf{1}^T y = r \\ \infty & \text{otherwise.} \end{cases}$$
We first verify the domain of $f^*$. Suppose $y$ has a negative component, say $y_k < 0$. If we choose a vector $x$ with $x_k = -t$, $x_i = 0$ for $i \ne k$, and let $t$ go to infinity, we see that
$$x^T y - f(x) = -t y_k \to \infty,$$
so $y$ is not in $\operatorname{dom} f^*$.
Next, suppose $y$ has a component greater than 1, say $y_k > 1$. If we choose a vector $x$ with $x_k = t$, $x_i = 0$ for $i \ne k$, and let $t$ go to infinity, we see that
$$x^T y - f(x) = t y_k - t \to \infty,$$
so $y$ is not in $\operatorname{dom} f^*$.
Finally, assume that $\mathbf{1}^T y \ne r$. We choose $x = t\mathbf{1}$ and find that
$$x^T y - f(x) = t \mathbf{1}^T y - t r$$
is unbounded above, as $t \to \infty$ or $t \to -\infty$.
If $y$ satisfies all the conditions we have
$$x^T y \le f(x)$$
for all $x$, with equality for $x = 0$. Therefore $f^*(y) = 0$.
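(c) Piecewise-linear function on $\mathbf{R}$. $f(x) = \max_{i=1,\ldots,m} (a_i x + b_i)$ on $\mathbf{R}$, with $a_1 < \cdots < a_m$ and none of the functions $a_i x + b_i$ redundant.
Solution. We can write $f^*$ as
$$f^*(y) = \sup_x \Bigl( xy - \max_{i=1,\ldots,m} (a_i x + b_i) \Bigr).$$
We see that $\operatorname{dom} f^* = [a_1, a_m]$, since for $y$ outside that range, the expression inside the supremum is unbounded above. For $a_i \le y \le a_{i+1}$, the supremum in the definition of $f^*$ is reached at the breakpoint between the segments $i$ and $i + 1$, i.e., at the point $-(b_{i+1} - b_i)/(a_{i+1} - a_i)$, so we obtain
$$f^*(y) = -b_i - (b_{i+1} - b_i) \frac{y - a_i}{a_{i+1} - a_i}.$$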
(d) Power function. $f(x) = x^p$ on $\mathbf{R}_{++}$, where $p > 1$. Repeat for $p < 0$.
Solution. We'll use standard notation: we define $q$ by the equation $1/p + 1/q = 1$, i.e., $q = p/(p - 1)$.
We start with the case $p > 1$. Then $x^p$ is strictly convex on $\mathbf{R}_+$. For $y \le 0$ the supremum of $yx - x^p$ over $x > 0$ is approached as $x \to 0$, so $f^*(y) = 0$. For $y > 0$ the function achieves its maximum at $x = (y/p)^{1/(p-1)}$, where it has value
$$y (y/p)^{1/(p-1)} - (y/p)^{p/(p-1)} = (p - 1)(y/p)^q.$$
Therefore we have
$$f^*(y) = \begin{cases} 0 & y \le 0 \\ (p - 1)(y/p)^q & y > 0. \end{cases}$$
For $p < 0$ similar arguments show that $\operatorname{dom} f^* = -\mathbf{R}_+$ and
$$f^*(y) = \frac{p}{q} (y/p)^q.$$
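The closed-form conjugate of the power function can be compared with a brute-force supremum of $yx - x^p$ over a grid of $x > 0$. This is a numerical sketch only: the grid range, the step, and the two test cases ($p = 3$, $y = 2$ and $p = -1$, $y = -4$) are assumptions made here, and the formula is written as $(p - 1)(y/p)^q$, which equals $(p/q)(y/p)^q$ because $p/q = p - 1$.

```python
def conjugate_on_grid(y, p, x_max=10.0, step=1e-4):
    """Largest value of y*x - x**p over the grid x = step, 2*step, ..., x_max."""
    n = int(x_max / step)
    return max(y * (k * step) - (k * step) ** p for k in range(1, n + 1))

def conjugate_formula(y, p):
    """(p - 1) * (y / p)**q with q = p / (p - 1); requires y / p > 0."""
    q = p / (p - 1.0)
    return (p - 1.0) * (y / p) ** q

# Case p > 1 with y > 0, and case p < 0 with y < 0.
grid_pos, exact_pos = conjugate_on_grid(2.0, 3.0), conjugate_formula(2.0, 3.0)
grid_neg, exact_neg = conjugate_on_grid(-4.0, -1.0), conjugate_formula(-4.0, -1.0)
```

In the second case $f(x) = 1/x$ and the supremum of $-4x - 1/x$ is attained at $x = 1/2$ with value $-4$, which the formula reproduces.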
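(e) Negative geometric mean. $f(x) = -(\prod_i x_i)^{1/n}$ on $\mathbf{R}^n_{++}$.
Solution. The conjugate is
$$f^*(y) = \begin{cases} 0 & \text{if } y \preceq 0,\ \bigl(\prod_i (-y_i)\bigr)^{1/n} \ge 1/n \\ \infty & \text{otherwise.} \end{cases}$$
We first verify the domain of $f^*$. Assume $y$ has a positive component, say $y_k > 0$. Then we can choose $x_k = t$ and $x_i = 1$, $i \ne k$, to show that
$$x^T y - f(x) = t y_k + \sum_{i \ne k} y_i + t^{1/n}$$
is unbounded above as a function of $t > 0$. Hence the condition $y \preceq 0$ is indeed required.
Next assume that $y \prec 0$, but $\bigl(\prod_i (-y_i)\bigr)^{1/n} < 1/n$. We choose $x_i = -t/y_i$, and obtain
$$x^T y - f(x) = -tn + t \Bigl( \prod_i \bigl(-\tfrac{1}{y_i}\bigr) \Bigr)^{1/n} \to \infty$$
as $t \to \infty$. This demonstrates that the second condition for the domain of $f^*$ is also needed.
Now assume that $y \preceq 0$ and $\bigl(\prod_i (-y_i)\bigr)^{1/n} \ge 1/n$, and $x \succeq 0$. The arithmetic-geometric mean inequality states that
$$-\frac{x^T y}{n} \ge \Bigl( \prod_i (-y_i x_i) \Bigr)^{1/n} \ge \frac{1}{n} \Bigl( \prod_i x_i \Bigr)^{1/n},$$
i.e., $x^T y \le f(x)$. Since $x^T y - f(x) \to 0$ as $x \to 0$, we conclude that $f^*(y) = 0$.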
We first verify the domain. Suppose kyk2 u. Choose x = sy, t = s(kxk2 + 1) >
skyk2 su, with s 0. Then
y T x + tu > sy T y su2 = s(u2 y T y) 0,
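3.52 [MO79, 3.E.2] Log-convexity of moment functions. Suppose $f : \mathbf{R} \to \mathbf{R}$ is nonnegative with $\mathbf{R}_+ \subseteq \operatorname{dom} f$. For $x \ge 0$ define
$$\phi(x) = \int_0^\infty u^x f(u)\,du.$$
Show that $\phi$ is a log-convex function. Use this to show that the Gamma function,
$$\Gamma(x) = \int_0^\infty u^{x-1} e^{-u}\,du,$$
is log-convex for $x \ge 1$.
Solution. $g(x, u) = u^x f(u)$ is log-convex (as well as log-concave) in $x$ for all $u > 0$. It follows directly from the property on page 106 that
$$\phi(x) = \int_0^\infty g(x, u)\,du = \int_0^\infty u^x f(u)\,du$$
is log-convex.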
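Log-convexity of the Gamma function for $x \ge 1$ can be spot-checked with `math.lgamma`, which returns $\log \Gamma(x)$ for positive $x$. The sketch tests midpoint convexity of $\log \Gamma$ on an assumed grid over $[1, 10]$; it illustrates the claim and is not a proof.

```python
import math

def is_midpoint_log_convex(points, tol=1e-12):
    """Check lgamma((a+b)/2) <= (lgamma(a) + lgamma(b))/2 on all pairs of points."""
    return all(
        math.lgamma(0.5 * (a + b)) <= 0.5 * (math.lgamma(a) + math.lgamma(b)) + tol
        for a in points for b in points
    )

grid = [1.0 + 0.25 * k for k in range(37)]   # x from 1 to 10
gamma_is_log_convex_on_grid = is_midpoint_log_convex(grid)
```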
3.53 Suppose $x$ and $y$ are independent random vectors in $\mathbf{R}^n$, with log-concave probability density functions $f$ and $g$, respectively. Show that the probability density function of the sum $z = x + y$ is log-concave.
Solution. The probability density function of $x + y$ is the convolution $f * g$, and the convolution of two log-concave functions is log-concave.
3.54 Log-concavity of Gaussian cumulative distribution function. The cumulative distribution function of a Gaussian random variable,
$$f(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-t^2/2}\,dt,$$
is log-concave. This follows from the general result that the convolution of two log-concave functions is log-concave. In this problem we guide you through a simple self-contained proof that $f$ is log-concave. Recall that $f$ is log-concave if and only if $f''(x) f(x) \le f'(x)^2$ for all $x$.
(a) Verify that $f''(x) f(x) \le f'(x)^2$ for $x \ge 0$. That leaves us the hard part, which is to show the inequality for $x < 0$.
(b) Verify that for any $t$ and $x$ we have $t^2/2 \ge -x^2/2 + xt$.
(c) Using part (b) show that $e^{-t^2/2} \le e^{x^2/2 - xt}$. Conclude that
$$\int_{-\infty}^x e^{-t^2/2}\,dt \le e^{x^2/2} \int_{-\infty}^x e^{-xt}\,dt.$$
(d) Use part (c) to verify that $f''(x) f(x) \le f'(x)^2$ for $x \le 0$.
Solution. The derivatives of $f$ are
$$f'(x) = e^{-x^2/2}/\sqrt{2\pi}, \qquad f''(x) = -x e^{-x^2/2}/\sqrt{2\pi}.$$
(a) $f''(x) \le 0$ for $x \ge 0$.
(b) Since $t^2/2$ is convex we have
$$t^2/2 \ge x^2/2 + x(t - x) = xt - x^2/2.$$
This is the general inequality
$$g(t) \ge g(x) + g'(x)(t - x),$$
which holds for any differentiable convex function, applied to $g(t) = t^2/2$.
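(c) Take exponentials of the inequality in part (b) to get $e^{-t^2/2} \le e^{x^2/2 - xt}$, and integrate over $t \le x$.
(d) For $x < 0$, the inequality $f''(x) f(x) \le f'(x)^2$ reduces to
$$-x e^{-x^2/2} \int_{-\infty}^x e^{-t^2/2}\,dt \le e^{-x^2},$$
i.e.,
$$\int_{-\infty}^x e^{-t^2/2}\,dt \le \frac{e^{-x^2/2}}{-x}.$$
This follows from part (c) because
$$\int_{-\infty}^x e^{-xt}\,dt = \frac{e^{-x^2}}{-x}.$$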
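The inequality $f''(x) f(x) \le f'(x)^2$ for the Gaussian cumulative distribution function, and the tail bound used for $x < 0$, can be checked numerically with the standard library. In the sketch below the CDF is written with `math.erfc` so that it stays accurate for negative $x$; the grid over $[-6, 6]$ and the helper names are assumptions made here.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)

def cdf(x):
    """Gaussian CDF f(x), written with erfc to stay accurate for negative x."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def pdf(x):
    """Derivative f'(x) = exp(-x**2 / 2) / sqrt(2*pi)."""
    return math.exp(-0.5 * x * x) / SQRT_2PI

def log_concavity_gap(x):
    """f'(x)**2 - f''(x)*f(x), using f''(x) = -x * f'(x)."""
    return pdf(x) ** 2 + x * pdf(x) * cdf(x)

grid = [-6.0 + 0.25 * k for k in range(49)]   # x from -6 to 6
gaps = [log_concavity_gap(x) for x in grid]

# Tail bound for x < 0: integral of exp(-t**2/2) up to x is at most exp(-x**2/2) / (-x).
tail_bound_holds = all(
    SQRT_2PI * cdf(x) <= math.exp(-0.5 * x * x) / (-x) for x in grid if x < 0
)
```

All gaps should be nonnegative; for negative $x$ the gap is small relative to its two terms, which is why that case needs the tail bound rather than a sign argument.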
$$f(x) = \int_{-\infty}^x g(t)\,dt = \int_{-\infty}^x e^{-h(t)}\,dt$$
(a) Express the derivatives of $f$ in terms of the function $h$. Verify that $f''(x) f(x) \le (f'(x))^2$ if $h'(x) \ge 0$.
(b) Assume that $h'(x) < 0$. Use the inequality
$$h(t) \ge h(x) + h'(x)(t - x)$$
(which follows from convexity of $h$), to show that
$$\int_{-\infty}^x e^{-h(t)}\,dt \le \frac{e^{-h(x)}}{-h'(x)}.$$
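Solution.
(a) The derivatives of $f$ are $f'(x) = e^{-h(x)}$ and $f''(x) = -h'(x) e^{-h(x)}$. Log-concavity means
$$-h'(x) e^{-h(x)} \int_{-\infty}^x e^{-h(t)}\,dt \le e^{-2h(x)},$$
which holds if $h'(x) \ge 0$, because the left-hand side is then nonpositive.
(b) Assume $h'(x) < 0$. Taking exponentials of the inequality $h(t) \ge h(x) + h'(x)(t - x)$ and integrating gives
$$\int_{-\infty}^x e^{-h(t)}\,dt \le e^{x h'(x) - h(x)} \int_{-\infty}^x e^{-t h'(x)}\,dt = e^{x h'(x) - h(x)}\, \frac{e^{-x h'(x)}}{-h'(x)} = \frac{e^{-h(x)}}{-h'(x)}.$$
Multiplying both sides by $-h'(x) e^{-h(x)} > 0$ yields
$$-h'(x) e^{-h(x)} \int_{-\infty}^x e^{-h(t)}\,dt \le e^{-2h(x)}.$$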