PS Answers Fall2022 Merged
1 Optimization
Consider the function f defined for all (x, y) by
f (x, y) = (1 + y)3 x2 + y 2 .
Prove that f has a unique stationary point at (0, 0) which is a local minimum, but f has no global minimum.
Answer:
FOC:
∂f/∂x = 2(1 + y)³x = 0,   (1)
∂f/∂y = 3x²(1 + y)² + 2y = 0.   (2)
From (1) ⇒ x = 0 or y = −1. If x = 0, then from (2) ⇒ y = 0. If y = −1, then (2) reads 3x²·0 + 2·(−1) = −2 ≠ 0, a contradiction.
⇒ Only (0, 0) is stationary point.
Check the Hessian: f″₁₁ = 2(1 + y)³, f″₂₂ = 6x²(1 + y) + 2, f″₁₂ = f″₂₁ = 6x(1 + y)². At the point (0, 0),

H = [ 2  0 ]
    [ 0  2 ]

which is positive definite, so (0, 0) is a local minimum. However, f has no global minimum: for any fixed x ≠ 0, f(x, y) = (1 + y)³x² + y² → −∞ as y → −∞, since the cubic term dominates the quadratic one.
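The whole argument can be checked symbolically; a small sketch (the function and points are from the exercise, the use of SymPy is my own):

```python
import sympy as sp

# f(x, y) = (1 + y)^3 x^2 + y^2: unique stationary point at (0, 0),
# positive definite Hessian there, yet f is unbounded below.
x, y = sp.symbols('x y', real=True)
f = (1 + y)**3 * x**2 + y**2

stationary = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True)
H = sp.hessian(f, (x, y)).subs({x: 0, y: 0})

print(stationary)                 # only (0, 0)
print(H)                          # Matrix([[2, 0], [0, 2]])
print(f.subs({x: 1, y: -100}))    # a large negative value: no global minimum
```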
where f is a real-valued C 1 function, θ ∈ Θ is a parameter vector, and X(θ) is a closed, convex subset of RN
for all θ. Denote the gradient by Df (x; θ). Determine whether the following statements are true or false:
1
Answer:
3. False. If Df(x∗; θ) ≠ 0, then x∗ may be a corner solution yet still a local extremum.
4. True. We want to show that if f is strictly quasi-concave and x∗ is a local max, then x∗ is a global max. By definition of a local max, there exists δ > 0 such that f(x∗) ≥ f(x) for all x ∈ X(θ) with ||x − x∗|| < δ. To prove this by contradiction, let us assume that x∗ is not a global maximum. Then there exists x′ ∈ X(θ), x′ ≠ x∗, such that f(x′) > f(x∗). Consider xλ := λx′ + (1 − λ)x∗, 0 ≤ λ ≤ 1. Since X(θ) is convex and x′, x∗ ∈ X(θ), we have xλ ∈ X(θ) for all λ ∈ [0, 1]. On the other hand, for λ small enough, clearly ||xλ − x∗|| < δ. From the definition of strict quasi-concavity we have
f(xλ) > min{f(x′), f(x∗)} = f(x∗),
and this holds for every such λ, in particular for λ small enough. This contradicts x∗ being a local maximum (since f(xλ) > f(x∗) even when ||xλ − x∗|| < δ). Thus, if f is strictly quasi-concave, "x∗ is not a global max" implies "x∗ is not a local max", and we arrive at a contradiction.
5. True. If f is strictly concave on X × Θ, then the value function is strictly concave. The value function inherits this property from the objective function.
3 Cobb-Douglas production
Consider a competitive firm that produces a single output y using two inputs x1 and x2 . The firm’s
production technology is described by a Cobb-Douglas function
y(x) = f(x1, x2) = x1^α x2^β,
where α + β < 1, α > 0 and β > 0. Taking as given the output price p and the input prices w1 and w2 , the
firm maximizes its profits given by
Π(x) = py(x) − wT x.
1. Does the production function exhibit decreasing, constant, or increasing returns to scale? Show.
2. Write the first-order conditions and check whether sufficient conditions for a maximum are satisfied.
3. Solve for the firm’s factor demands, giving the optimal input levels x∗i as functions of input and output
prices.
Answer:
1.
f(λx1, λx2) = (λx1)^α (λx2)^β = λ^{α+β} f(x1, x2).
So the production function f(x1, x2) is homogeneous of degree α + β; since α + β < 1, it exhibits decreasing returns to scale.
2. For max_{x1,x2} Π = py(x) − w′x, the FOC is py′(x) − w = 0. It is sufficient for a maximum if Π(x) is concave. To check this, we first show that y(x) is strictly concave provided α + β < 1, i.e. that its Hessian is negative definite. The partial derivatives are
∂y/∂x1 = αx1^{α−1} x2^β,   ∂y/∂x2 = βx1^α x2^{β−1},
∂²y/∂x1² = α(α − 1)x1^{α−2} x2^β,   ∂²y/∂x2² = β(β − 1)x1^α x2^{β−2},   ∂²y/∂x1∂x2 = αβx1^{α−1} x2^{β−1},
so

H = [ α(α − 1)x1^{α−2} x2^β     αβx1^{α−1} x2^{β−1}    ]
    [ αβx1^{α−1} x2^{β−1}       β(β − 1)x1^α x2^{β−2}  ].

One way to check is to show that the leading principal minors satisfy d1 < 0, d2 > 0:
d1 = α(α − 1)x1^{α−2} x2^β < 0 since α − 1 < 0,
d2 = |H| = αβ(α − 1)(β − 1)x1^{2(α−1)} x2^{2(β−1)} − α²β² x1^{2(α−1)} x2^{2(β−1)}
         = αβ x1^{2(α−1)} x2^{2(β−1)} (1 − (α + β)) > 0 since α + β < 1,
which implies that the Hessian of y is negative definite ⇒ the Hessian of Π = pH is negative definite ⇒ Π is strictly concave, so the sufficient condition is satisfied.
3. By the FOC:
∂Π/∂x1 = 0 ⇒ pαx1^{α−1} x2^β = w1   (1)
∂Π/∂x2 = 0 ⇒ pβx1^α x2^{β−1} = w2   (2)
Dividing (1) by (2): (α/β)(x2/x1) = w1/w2 ⇒ x2 = (βw1/(αw2)) x1   (3).
Substituting (3) into (1) and solving:
x1∗ = [ (pα/w1)(βw1/(αw2))^β ]^{1/(1−α−β)}.
x2∗ can be obtained in the same way:
x2∗ = [ (pβ/w2)(αw2/(βw1))^α ]^{1/(1−α−β)}.
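A quick numerical sanity check of these factor demands (the parameter values are illustrative, not from the problem set):

```python
# Verify that the closed-form factor demands satisfy the FOCs (1) and (2).
alpha, beta = 0.3, 0.5
p, w1, w2 = 1.0, 1.0, 1.0

x1 = ((p * alpha / w1) * (beta * w1 / (alpha * w2)) ** beta) ** (1 / (1 - alpha - beta))
x2 = (beta * w1 / (alpha * w2)) * x1   # relation (3)

foc1 = p * alpha * x1 ** (alpha - 1) * x2 ** beta - w1
foc2 = p * beta * x1 ** alpha * x2 ** (beta - 1) - w2
print(foc1, foc2)  # both ~ 0
```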
max_{x1,x2} α ln x1 + (1 − α) ln x2,
s.t. p1x1 + p2x2 = w. Since x2 = (w − p1x1)/p2, we can rewrite this as
max_{x1} α ln x1 + (1 − α) ln((w − p1x1)/p2).
Solving it, we have
x1 = αw/p1,
x2 = (1 − α)w/p2.
where
D = {(x, y, z) ∈ R3 | x2 + ay 2 ≤ a, x2 + az 2 ≤ a, a > 0}.
2. For which points (x, y, z) and parameter a does constraint qualification hold?
6. Without computations, what can you say about the solution of the optimization problem if:
Answer:
1. The objective function f(x, y, z) is continuous, and the constraint set D is closed and bounded. Therefore, by Weierstrass' Theorem, we can conclude that f(x, y, z) has a maximum on D.
The rank of the Jacobian, rank([∇g1; ∇g2]), can be lower than 2 if:
i. x = 0, the rank of the Jacobian is 0, but the constraints cannot be binding with a > 0.
Therefore the rank of the Jacobian is equal to the maximum number of binding constraints
and constraint qualification is satisfied.
ii. x ≠ 0: the rank of the Jacobian is 1, but both constraints become identical (x² ≤ a). With only one constraint, the maximum number of binding constraints is 1 and equal to the rank of the Jacobian in this case too. Therefore the constraint qualification is satisfied.
1: ∂L/∂x = 2 − 2λ1x − 2λ2x = 0,
2: ∂L/∂y = 1 − 2λ1ay = 0,
3: ∂L/∂z = a − 2λ2az = 0,
λ1 ≥ 0,   a(1 − y²) − x² ≥ 0,   λ1(a(1 − y²) − x²) = 0,
λ2 ≥ 0,   a(1 − z²) − x² ≥ 0,   λ2(a(1 − z²) − x²) = 0.
4. From 1: x∗ = 1/(λ1 + λ2).
From 2: y∗ = 1/(2aλ1) ⇒ λ1 ≠ 0 ⇒ a(1 − y²) − x² = 0.
From 3: z∗ = 1/(2λ2) ⇒ λ2 ≠ 0 ⇒ a(1 − z²) − x² = 0.
Since both constraints bind, a(1 − y²) = a(1 − z²) = x², so y∗ = z∗, which gives λ2 = aλ1 and hence x∗ = 1/(λ1 + λ2) = a/(λ2(a + 1)). Then
x² = a(1 − z²) = a(1 − 1/(4λ2²)) ⇒ (4aλ2² − a)/(4λ2²) = a²/(λ2²(a + 1)²),
⇒ 4aλ2² − a = 4a²/(a + 1)²,
⇒ λ2² = 1/4 + a/(a + 1)²,
⇒ λ2∗ = √(1/4 + a/(a + 1)²).
Then we find λ1 = (1/a)√(1/4 + a/(a + 1)²) and:
x∗ = a / ((a + 1)√(1/4 + a/(a + 1)²)),
y∗ = z∗ = 1 / (2√(1/4 + a/(a + 1)²)).
6. (a) The domain D is no longer bounded. In this case y becomes free and there is no longer an optimum.
(b) The domain D is bounded and by Weierstrass we can find an optimum.
Mathematics for Economics and Finance (Fall 2022)
Problem Set 14: Statistics and Static Optimization
Professor: Roxana Mihet
TA: Samy
Due Dec 23
⇒ f has only one stationary point, x = (0, 0, 0). The Hessian matrix is

H(x) = [  2  −1   2 ]
       [ −1   2   1 ]
       [  2   1   6 ],

and the leading principal minors of the Hessian matrix are D1 = 2, D2 = 3, D3 = 4. H(x) is positive definite, and thus f(x) is strictly convex. The point x = (0, 0, 0) is a local (and a global) minimum.
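A numerical check of the minors and of positive definiteness (the matrix is from the solution above):

```python
import numpy as np

# Leading principal minors and eigenvalue test for the Hessian H(x).
H = np.array([[2.0, -1.0, 2.0],
              [-1.0, 2.0, 1.0],
              [2.0, 1.0, 6.0]])

minors = [np.linalg.det(H[:k, :k]) for k in (1, 2, 3)]
print([round(m) for m in minors])         # [2, 3, 4]
print(bool(np.all(np.linalg.eigvalsh(H) > 0)))  # True: positive definite
```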
x2 + y 2 = 100.
L(x, y, λ) = ax + by + λ(100 − x2 − y 2 ).
Substitute from the first two into the third to get
100 = (a² + b²)/(4λ²) ⇒ λ = √(a² + b²)/20,
x = a/(2λ) = 10a/√(a² + b²),
y = b/(2λ) = 10b/√(a² + b²).
3 Complements or substitutes?
An agent consumes two goods, x1 and x2, with prices p1 and p2, respectively. Her utility function is of the form U(x1, x2) = α(x1^α + x2^α), with α < 1. Verify that the utility function is strictly concave. Derive the demand function of the agent. In which direction does the demand for good 1 change if there is an increase in the price of good 2?

max_{x1,x2} α(x1^α + x2^α)   s.t.   p1x1 + p2x2 = y.
L = α(x1^α + x2^α) + λ[y − p1x1 − p2x2].
FOC:
∂L/∂x1 = α²x1^{α−1} − λp1 = 0 ⇒ λ = α²/(p1 x1^{1−α})   (1)
∂L/∂x2 = α²x2^{α−1} − λp2 = 0 ⇒ λ = α²/(p2 x2^{1−α})   (2)
To verify that the utility function is strictly concave, check the Hessian matrix of U(x1, x2) = α(x1^α + x2^α):
U′1 = α²x1^{α−1}, U′2 = α²x2^{α−1}, U″11 = α²(α − 1)x1^{α−2}, U″22 = α²(α − 1)x2^{α−2}, U″12 = U″21 = 0 ⇒

H = [ α²(α − 1)x1^{α−2}          0          ]
    [        0           α²(α − 1)x2^{α−2}  ].

Check the leading principal minors: D1 = α²(α − 1)x1^{α−2} < 0, because α < 1; D2 = α⁴(α − 1)²x1^{α−2} x2^{α−2} > 0. =⇒ H is negative definite. =⇒ U(x1, x2) is strictly concave.
To find the demand function, using (1) and (2):
α²/(p1 x1^{1−α}) = α²/(p2 x2^{1−α})
⇒ (x2/x1)^{1−α} = p1/p2
⇒ x2/x1 = (p1/p2)^{1/(1−α)}   (3).
Combining (3) with the budget constraint p1x1 + p2x2 = y:
x1∗ = x1(p1, p2, y) = y / (p1 + p2(p1/p2)^{1/(1−α)}) = y / (p1[1 + (p1/p2)^{α/(1−α)}]).
The demand for the other good is almost identical, but with the roles of p1 and p2 reversed. Differentiating,
∂x1∗/∂p2 = [ y (α/(1 − α)) (p1/p2)^{α/(1−α)} ] / ( p2 p1 [1 + (p1/p2)^{α/(1−α)}]² ).
Because 1 − α > 0, the sign of ∂x1∗/∂p2 is the same as the sign of α. If α > 0, the goods are substitutes; if α < 0, the goods are complements, meaning that an increase in the price of either good reduces the demand for both.
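The sign claim can be verified symbolically; a sketch with illustrative numbers (the demand formula is from above, the parameter values are my own):

```python
import sympy as sp

# Demand for good 1 and its cross-price derivative with respect to p2.
p1, p2, y = sp.symbols('p1 p2 y', positive=True)
a = sp.Symbol('alpha', real=True)
x1_star = y / (p1 * (1 + (p1 / p2) ** (a / (1 - a))))

dx1_dp2 = sp.diff(x1_star, p2)
val_pos = float(dx1_dp2.subs({p1: 1, p2: 2, y: 10, a: sp.Rational(1, 2)}))
val_neg = float(dx1_dp2.subs({p1: 1, p2: 2, y: 10, a: -1}))
print(val_pos > 0, val_neg < 0)  # substitutes when alpha > 0, complements when alpha < 0
```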
where x1 indicates her expenditure on good 1 and x2 her expenditure on good 2. The prices for the commodities are p1 and p2 respectively, both positive. And the consumer cannot spend more than her income I, also a positive value.
1. Assuming that consumption of either commodity must be non-negative, formulate the utility maximization problem of this consumer.
4. Suppose p1 = 2, p2 = 1, and I = 6. Using Kuhn-Tucker’s Theorem, or otherwise, find the pair (x1 , x2 )
that maximizes the consumer’s utility.
Answer:
1. The optimization problem becomes
2. The objective function u(x1, x2) is continuous. The constraint set D is closed and bounded, because for every (x1, x2) ∈ D we have 0 ≤ x1 ≤ I/p1, 0 ≤ x2 ≤ I/p2 ⇒ by Weierstrass' Theorem, there must be a maximum of u on D.
5 Dynamic programming: Minimizing quadratic costs
The agent solves
min_{vt} Σ_{t=0}^∞ β^t (x_t² + v_t²)
Answer: We approach this problem by using the Bellman equation and the guess-and-verify method. We guess a value function of the form
V(x_t) = A x_t².
Replacing the guess in the Bellman equation, and using the transition equation x_{t+1} = 2x_t + v_t, we get
A x_t² = min_{vt} { x_t² + v_t² + βA(2x_t + v_t)² }.
The FOC with respect to v_t gives 2v_t + 2βA(2x_t + v_t) = 0, i.e. v_t = −2βA x_t/(1 + βA).
This result is used to find the constant A. Putting it back in the Bellman equation, we get
A x_t² = x_t² + (2βA x_t/(1 + βA))² + βA(2x_t − 2βA x_t/(1 + βA))²,
A x_t² = (1 + (2βA/(1 + βA))²) x_t² + βA(2 − 2βA/(1 + βA))² x_t².
Note that we can drop the term x_t², since the Bellman equation is valid for all values of x_t, and we obtain a quadratic equation for A, which gives the value of the constant given β:
A = 1 + (2βA/(1 + βA))² + βA(2 − 2βA/(1 + βA))².
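The constant A can be found numerically by iterating on this equation; a sketch with an illustrative β (not part of the original solution):

```python
# Fixed-point iteration on A = 1 + (2bA/(1+bA))^2 + bA(2 - 2bA/(1+bA))^2.
beta = 0.5
A = 1.0
for _ in range(200):
    A = (1 + (2 * beta * A / (1 + beta * A)) ** 2
         + beta * A * (2 - 2 * beta * A / (1 + beta * A)) ** 2)

residual = A - (1 + (2 * beta * A / (1 + beta * A)) ** 2
                + beta * A * (2 - 2 * beta * A / (1 + beta * A)) ** 2)
print(round(A, 4), abs(residual) < 1e-10)
```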
6 Dynamic programming question on Final Exam
Consider an economy where the utility of the representative agent is given by
Σ_{t=0}^∞ β^t u(C_t, L_t),   β ∈ (0, 1),
where Ct is the consumption and Lt the leisure in period t. Assume that the representative agent has the
following instantaneous utility function
u(C_t, L_t) = ln(C_t) + θ L_t^{1−γ}/(1 − γ).
Denote output by Yt , capital by Kt , and the number of hours worked by Ht . The production function is
given by:
Yt = AKtα Ht1−α , (1)
where A is a constant productivity parameter. The total available time in each period is normalized to 1.
So, H_t + L_t = 1. The law of motion of capital is
K_{t+1} = (1 − δ)K_t + I_t,
where δ ∈ (0, 1) is the depreciation rate and I_t denotes investment in period t. K_0 is given. The resource
constraint in period t is given by Yt = Ct + It . We want to find the optimal paths of the choice variables for
t = 0, 1, ..., ∞.
2. List all choice variables and the required state variables in this problem.
4. Derive the necessary first order conditions and the intertemporal Euler equation expressing the marginal
rate of substitution in terms of Yt+1 and Kt+1 .
Dynamic optimization:
1. Richard Bellman (1957) stated his Principle of Optimality as follows: “An optimal policy has the
property that whatever the initial state and initial decision are, the remaining decisions must constitute
an optimal policy with regard to the state resulting from the first decision.” Bellman (1957, p83).
V(K_t) = max_{H_t, K_{t+1}} { ln[A K_t^α H_t^{1−α} − K_{t+1} + (1 − δ)K_t] + θ(1 − H_t)^{1−γ}/(1 − γ) + βV(K_{t+1}) }
4. FOC with respect to H_t:
(1 − α)A K_t^α H_t^{−α} / C_t = θ(1 − H_t)^{−γ}
⇔ (1 − α)(Y_t/H_t) / C_t = θ(1 − H_t)^{−γ}
⇔ (1 − α) Y_t/H_t = θC_t/(1 − H_t)^γ.
FOC with respect to K_{t+1}:
1/C_t = βV′(K_{t+1}).
Rewrite the Bellman equation for the next time step t + 1 and take the derivative with respect to K_{t+1}:
V′(K_{t+1}) = (α Y_{t+1}/K_{t+1} + (1 − δ)) / C_{t+1},
and so we obtain
1/C_t = β (α Y_{t+1}/K_{t+1} + (1 − δ)) / C_{t+1}.
After rewriting we obtain the following Euler equation:
C_{t+1} = (α Y_{t+1}/K_{t+1} + (1 − δ)) βC_t.   (3)
Mathematics for Economics and Finance (Fall 2022)
Problem Set 1 (Optional): Sets, functions, logic, proofs
Professor: Roxana Mihet
TA: Eliott Gabrielle
You do not need to hand in any solutions!
Solutions: Sept 23
1 Exercise: Sets
The consumption set of a consumer is:
Answer: y′ is a minimal consumption threshold. Imagine the good y is food. It is reasonable to assume that there is no real benefit from consuming less than the minimum you need to survive. Consuming less is not an admissible solution of any eventual optimization problem.
Comment: Note that in the notation y′ the prime serves to distinguish the point y′ from y. It has no relation to a derivative.
2 Exercise: Functions
Let L(X, Y ) be the vector space of all linear mappings from vector space X to vector space Y . Let
T ∈ L(Y, Z),S ∈ L(X, Y ). Show that the composition of the two linear functions, T (S(.)), is linear.
Let x′, x″ ∈ X and a, b ∈ R. Then T(S(ax′ + bx″)) = T(aS(x′) + bS(x″)) = aT(S(x′)) + bT(S(x″)), so T(S(·)) is linear.
3 Exercise: Functions
Prove there exists a bijection between the natural numbers and the integers. (Hint: define a function f
separately on the odd and even positive integers.)
Remember that according to our definition, a function is a bijection iff it is both one-to-one and onto.
Define f : N → Z by f(x) = x/2 if x is even and f(x) = −(x + 1)/2 if x is odd. We will prove that this function is a bijection, by first showing that it is one-to-one and then showing that it is onto.
One-to-one: Suppose f(x) = f(y). Then both values must have the same sign, so x and y are either both even or both odd. In the first case f(x) = x/2 and f(y) = y/2, so f(x) = f(y) =⇒ x/2 = y/2 =⇒ x = y. The second case is very similar: f(x) = −(x + 1)/2 and f(y) = −(y + 1)/2, so f(x) = f(y) =⇒ x + 1 = y + 1 =⇒ x = y. Thus f is one-to-one.
Onto: If y is positive, then f(2y) = y. Therefore, y has a pre-image. If y is negative, then f(−(2y + 1)) = y (note that −(2y + 1) is a positive odd number). Therefore, y has a pre-image. Thus, f is onto.
Since f is a bijective function, this tells us that N and Z have the same size! Another way to describe this
mapping is: positive integers ⇐⇒ even natural numbers, negative integers ⇐⇒ odd natural numbers.
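An explicit version of the map, taking N to include 0 so that f(0) = 0 (a convention assumed here, since the text does not fix it):

```python
# f(x) = x/2 for even x, f(x) = -(x + 1)/2 for odd x.
def f(x: int) -> int:
    return x // 2 if x % 2 == 0 else -(x + 1) // 2

N = range(0, 21)
image = [f(x) for x in N]
print(sorted(image))  # the integers -10..10, each exactly once
```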
4 Exercise: Logic
Negate the statements:
4. There is a country in which every citizen has a friend who knows all the constellations.
5. In every village, there is a person who knows everybody else in that village.
Answer:
3. ∃(y, z) ∈ R², y < z, such that ∀x ∈ R, x ≤ y or x ≥ z.
4. In every country there is a citizen who has no friend who knows all the constellations.
5. There exists a village where nobody knows everyone else in that village. In other words: in at least one village, each person fails to know at least one other person.
Answer: We have A = there are 400 new bachelor students, B = at least two of them have a birthday on
the same day. Suppose the opposite is true: ¬B = there are no two people with the same day of birth. Then
we show that ¬A follows, that is ¬A = there cannot be 400 first-year students enrolled. Thus ¬B =⇒ ¬A.
Since there are at most 366 days in a year, the largest number of students that can have pairwise distinct birthdays is 366 — one for each day of the calendar. Thus 400 students are too many for the available 366 days, and some of the students must share their day of birth.
6 Exercise: Induction
Prove that 1×2 + 2×3 + ... + n×(n + 1) = n(n + 1)(n + 2)/3 for all natural n > 1. (Hint: use mathematical induction.)
The base case, n = 2:
1×2 + 2×3 = 8 = 2(2 + 1)(2 + 2)/3.
Now assume (inductive hypothesis) that the equation holds for n = k:
k(k + 1)(k + 2)
1 × 2 + 2 × 3 + ... + k(k + 1) =
3
The inductive step:
We verify that the equation holds for n = k + 1 (given that it holds for n = k):
1×2 + 2×3 + ... + k(k + 1) + (k + 1)(k + 2) = k(k + 1)(k + 2)/3 + (k + 1)(k + 2) = (k + 1)(k + 2)(k + 3)/3.
Thus we conclude that 1×2 + 2×3 + ... + n(n + 1) = n(n + 1)(n + 2)/3 holds for all natural n > 1.
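A brute-force check of the identity for small n:

```python
# Verify 1*2 + 2*3 + ... + n*(n+1) = n(n+1)(n+2)/3 for n = 1..49.
for n in range(1, 50):
    lhs = sum(k * (k + 1) for k in range(1, n + 1))
    assert lhs == n * (n + 1) * (n + 2) // 3
print("identity holds for n = 1..49")
```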
bond? (Hint: Compute S = 1 + k + k 2 + k 3 + .. + k n first.)
Answer: The sum S = 1 + k + k² + k³ + ... + kⁿ = (1 − k^{n+1})/(1 − k). Here, k = 1/(1 + r).
The price of the bond is the present discounted value of cash flows we get from it.
Mathematics for Economics and Finance (Fall 2022)
Problem Set 2: Linear algebra I
Professor: Roxana Mihet
TA: Eliott Gabrielle
You do not need to hand-in your answers
Due September 30th
1 + x + x² + ... + xⁿ = (1 − x^{n+1})/(1 − x)   (1)

S = 1 + x + x² + x³ + ... + xⁿ
xS = x + x² + x³ + ... + xⁿ + x^{n+1}
S − xS = 1 − x^{n+1}
S = (1 − x^{n+1})/(1 − x)

Or by mathematical induction:
1.) n = 1: 1 + x = (1 − x²)/(1 − x).
2.) Assume that it is true for n = k: 1 + x + ... + x^k = (1 − x^{k+1})/(1 − x).
3.) For n = k + 1: 1 + x + ... + x^k + x^{k+1} = (1 − x^{k+1})/(1 − x) + x^{k+1} = (1 − x^{k+1} + x^{k+1} − x^{k+2})/(1 − x) = (1 − x^{k+2})/(1 − x).
3 Exercise: Rank, determinant, trace, inverse of a matrix
1. Compute

D = | 3   1  −1   2 |
    | −5  1   3  −4 |
    | 2   0   1  −1 |
    | 1  −5   3  −3 |.
Answer:
1. We first simplify the matrix in order to make the computation of the determinant faster.
Reminder: If we add or subtract, from one row, a multiple of another row, then the determinant does
not change.
We first subtract 3 times the third row to the fourth row (row4 − 3 × row3 ).
    | 3   1  −1   2 |             | 3   1  −1   2 |
D = | −5  1   3  −4 |  −→  D1 =   | −5  1   3  −4 |
    | 2   0   1  −1 |             | 2   0   1  −1 |
    | 1  −5   3  −3 |             | −5 −5   0   0 |
Since det(D) = det(D1), we compute det(D1) by applying the Laplace expansion along the last row:

det(D1) = (−1)^{4+1}(−5) | 1  −1   2 |  +  (−1)^{4+2}(−5) | 3  −1   2 |  + 0 + 0
                         | 1   3  −4 |                    | −5  3  −4 |
                         | 0   1  −1 |                    | 2   1  −1 |

        = 5·[−3 + 0 + 2 + 0 + 4 − 1] − 5·[−9 + 8 − 10 − 12 + 12 + 5] = 5·2 − 5·(−6) = 40.
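A quick numerical confirmation of the hand computation:

```python
import numpy as np

# Check the determinant of D computed by Laplace expansion above.
D = np.array([[3, 1, -1, 2],
              [-5, 1, 3, -4],
              [2, 0, 1, -1],
              [1, -5, 3, -3]])
print(round(np.linalg.det(D)))  # 40
```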
4 Gaussian Elimination Method
For the matrix below

A = [ 3  2  4 ]
    [ 2  0  2 ]
    [ 4  2  3 ].
Find its inverse using Gaussian elimination and answer if the matrix is 1) symmetric, 2) idempotent, 3)
orthogonal.
Answer:
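The worked steps for this part did not survive the merge; as a stand-in check (a sketch, not the original solution), one can compute the inverse numerically and test the three properties asked about:

```python
import numpy as np

# Inverse of A and the symmetric / idempotent / orthogonal checks.
A = np.array([[3.0, 2.0, 4.0],
              [2.0, 0.0, 2.0],
              [4.0, 2.0, 3.0]])
A_inv = np.linalg.inv(A)

print(np.allclose(A, A.T))                  # symmetric: True
print(np.allclose(A @ A, A))                # idempotent: False
print(np.allclose(A @ A.T, np.eye(3)))      # orthogonal: False
print(np.allclose(A @ A_inv, np.eye(3)))    # inverse checks out: True
```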
Answer:
1. False. If the rows of matrix A are linearly independent, then m cannot exceed n.
2. False. If the rows of matrix A are linearly dependent, then m and n can stand in any relation. If the rows of matrix A are linearly dependent, then rank(A) = row rank(A) = k with k < m. At the same time, we do not know anything about the linear independence of the columns, so only k ≤ n. Thus k < m and k ≤ n, which does not allow us to conclude anything about the relation between m and n.
3. False. We can have at most n linearly independent rows, so if m > n, then the rows of matrix A are
linearly dependent.
4. True. We can have at most n linearly independent rows, so if m > n, then the rows of matrix A are
linearly dependent.
Mathematics for Economics and Finance (Fall 2022)
Problem Set 3: Linear algebra: systems of equations
Professor: Roxana Mihet
TA: Eliott Gabrielle
Due Oct 7th
2. If the market is complete, what can you say about the rank of matrix X?

4. Consider an economy with 3 states and 3 assets. The payoff matrix is:
X = [ 3  0  1 ]
    [ 5  1  4 ].
    [ 9  1  9 ]
Is this market complete? Are there any redundant assets?

5. Consider an economy with 3 states and 4 assets. The payoff matrix is:
X = [ 1  3  5  ]
    [ 1  1  3  ]
    [ 3  9  15 ].
    [ 5  5  15 ]
Is this market complete? Are there any redundant assets?
Answer:
1. If the market is complete =⇒ m ≥ n: we have to have at least as many assets as there are states in the economy; otherwise (m < n) we are not able to replicate any desired combination of payoffs.
2. If the market is complete =⇒ rank(X) = n. Even if m ≥ n, it could be that rank(X) < n; in this case we are not able to replicate all combinations =⇒ rank(X) = n is a necessary and sufficient condition.
3. There are no redundant assets if m = n. If m > n and rank(X) = n =⇒ some of the assets are linear combinations of other assets =⇒ they are redundant.
4. X = [ 3  0  1 ]
       [ 5  1  4 ]    n = 3, m = 3.
       [ 9  1  9 ]
Let us find rank(X):
[ 3  0  1 ]      [ 3    0  1 ]      [ 3    0  1 ]      [ 0  0  1 ]
[ 5  1  4 ] −→   [ −7   1  0 ] −→   [ −7   1  0 ] −→   [ 0  1  0 ].
[ 9  1  9 ]      [ −18  1  0 ]      [ −11  0  0 ]      [ 1  0  0 ]
So rank(X) = 3 = n: the market is complete and there are no redundant assets.
5. X = [ 1  3  5  ]
       [ 1  1  3  ]    n = 3, m = 4.
       [ 3  9  15 ]
       [ 5  5  15 ]
Here row 3 = 3 × row 1 and row 4 = 5 × row 2, so rank(X) = 2 < n = 3: the market is not complete, and assets 3 and 4 are redundant.
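The rank statements for both payoff matrices can be confirmed numerically:

```python
import numpy as np

# Rank checks for the two payoff matrices (n = 3 states in both cases).
X4 = np.array([[3, 0, 1], [5, 1, 4], [9, 1, 9]])
X5 = np.array([[1, 3, 5], [1, 1, 3], [3, 9, 15], [5, 5, 15]])

print(np.linalg.matrix_rank(X4))  # 3 = n: complete, no redundant assets
print(np.linalg.matrix_rank(X5))  # 2 < n: incomplete, redundant assets
```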
2 Determinant, inverse
Calculate det(A), A−1 for
A = [ 1  4  7 ]
    [ 3  2  5 ]
    [ 5  2  8 ].
Answer:
Finding the determinant (expanding along the first column):

det(A) = (−1)^{1+1}·1·| 2 5 | + (−1)^{2+1}·3·| 4 7 | + (−1)^{3+1}·5·| 4 7 |
                      | 2 8 |                | 2 8 |                | 2 5 |
       = (2·8 − 2·5) − 3(4·8 − 2·7) + 5(4·5 − 2·7) = 6 − 54 + 30 = −18.
Alternatively, by row reduction ((−3)·row1 + row2, (−5)·row1 + row3), which does not change the determinant:

det(A) = | 1  4   7  |   | 1   4    7  |             | 1  4  7 |
         | 3  2   5  | = | 0  −10  −16 | = (−2)(−9)·| 0  5  8 |
         | 5  2   8  |   | 0  −18  −27 |             | 0  2  3 |
       = 18 · 1 · (5·3 − 8·2) = 18 · (−1) = −18.
Inverse matrix.
Gauss elimination method:

[ 1  4  7 | 1  0  0 ]
[ 3  2  5 | 0  1  0 ]   (−3)·row1 + row2, (−5)·row1 + row3
[ 5  2  8 | 0  0  1 ]

→ [ 1   4    7  |  1  0  0 ]
  [ 0  −10  −16 | −3  1  0 ]   row2 × (−0.1)
  [ 0  −18  −27 | −5  0  1 ]

→ [ 1   4    7  |  1    0   0 ]
  [ 0   1   1.6 | 0.3 −0.1  0 ]   (−4)·row2 + row1, 18·row2 + row3
  [ 0  −18  −27 | −5    0   1 ]

→ [ 1  0  0.6 | −0.2  0.4  0 ]
  [ 0  1  1.6 |  0.3 −0.1  0 ]   row3 × (1/1.8)
  [ 0  0  1.8 |  0.4 −1.8  1 ]

→ [ 1  0  0.6 | −0.2   0.4    0   ]
  [ 0  1  1.6 |  0.3  −0.1    0   ]   (−0.6)·row3 + row1, (−1.6)·row3 + row2
  [ 0  0   1  |  4/18  −1   10/18 ]

→ [ 1  0  0 | −6/18   1    −6/18  ]
  [ 0  1  0 | −1/18  1.5  −16/18  ]   ⇒
  [ 0  0  1 |  4/18  −1    10/18  ]

A⁻¹ = [ −6/18   1    −6/18  ]            [  6  −18    6 ]
      [ −1/18  1.5  −16/18  ] = −(1/18)·[  1  −27   16 ]
      [  4/18  −1    10/18  ]            [ −4   18  −10 ]
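A numerical check of both the determinant and the inverse obtained above:

```python
import numpy as np

# Verify det(A) = -18 and the inverse found by Gaussian elimination.
A = np.array([[1.0, 4.0, 7.0],
              [3.0, 2.0, 5.0],
              [5.0, 2.0, 8.0]])
A_inv_hand = -np.array([[6.0, -18.0, 6.0],
                        [1.0, -27.0, 16.0],
                        [-4.0, 18.0, -10.0]]) / 18.0

print(round(np.linalg.det(A)))                    # -18
print(np.allclose(np.linalg.inv(A), A_inv_hand))  # True
```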
Method of the Adjoint Matrix:
For the computation of the inverse matrix A⁻¹ we use the following algorithm:
A⁻¹ = (1/det(A))·adj(A) = (d_ij),   where   d_ij = (1/det(A))·(−1)^{i+j} det(M_ji).

d11 = −(1/18)(−1)^{1+1} |2 5; 2 8| = −(1/18)(16 − 10) = −6/18
d21 = −(1/18)(−1)^{2+1} |3 5; 5 8| = (1/18)(24 − 25) = −1/18
d31 = −(1/18)(−1)^{3+1} |3 2; 5 2| = −(1/18)(6 − 10) = 4/18
d12 = −(1/18)(−1)^{1+2} |4 7; 2 8| = (1/18)(32 − 14) = 18/18
d22 = −(1/18)(−1)^{2+2} |1 7; 5 8| = −(1/18)(8 − 35) = 27/18
d32 = −(1/18)(−1)^{3+2} |1 4; 5 2| = (1/18)(2 − 20) = −18/18
d13 = −(1/18)(−1)^{1+3} |4 7; 2 5| = −(1/18)(20 − 14) = −6/18
d23 = −(1/18)(−1)^{2+3} |1 7; 3 5| = (1/18)(5 − 21) = −16/18
d33 = −(1/18)(−1)^{3+3} |1 4; 3 2| = −(1/18)(2 − 12) = 10/18

      [ d11 d12 d13 ]   [ −6/18   18/18   −6/18 ]            [  6  −18    6 ]
A⁻¹ = [ d21 d22 d23 ] = [ −1/18   27/18  −16/18 ] = −(1/18)·[  1  −27   16 ]
      [ d31 d32 d33 ]   [  4/18  −18/18   10/18 ]            [ −4   18  −10 ]
M⁰ = I_n − (1/n)·1·1′

where 1 is an n × 1 vector of ones. It can be used to transform data to deviations from their mean.
a) Show that M⁰ is symmetric idempotent.
Hint: it must satisfy (M⁰)′M⁰ = M⁰.
b) To obtain the sum of squared deviations about the mean (Σ_{i=1}^n (x_i − x̄)²), compute x′(M⁰)′M⁰x.
Answer:
Clearly, M⁰ is symmetric, (M⁰)′ = M⁰ (all entries off the diagonal are equal).

          [ 1 − 1/n   −1/n    ···    −1/n  ] [ 1 − 1/n   −1/n    ···    −1/n  ]
(M⁰)′M⁰ = [  −1/n    1 − 1/n  ···    −1/n  ] [  −1/n    1 − 1/n  ···    −1/n  ]
          [   ···      ···    ···     ···  ] [   ···      ···    ···     ···  ]
          [  −1/n     −1/n    ···  1 − 1/n ] [  −1/n     −1/n    ···  1 − 1/n ]

The typical diagonal entry is (1 − 1/n)² + (n − 1)(−1/n)² = 1 − 2/n + n/n² = 1 − 1/n, and the typical off-diagonal entry is −2(1/n)(1 − 1/n) + (n − 2)(1/n²) = −2/n + n/n² = −1/n. Hence
(M⁰)′M⁰ = M⁰.
Therefore, M⁰ is idempotent, M⁰M⁰ = M⁰.
2.
M⁰x = (I_n − (1/n)·1·1′)x = (x1, x2, ..., xn)′ − ((1/n)Σ_{i=1}^n x_i)·1 = x − x̄·1.

x′(M⁰)′M⁰x = (x − x̄1)′(x − x̄1) = Σ_{i=1}^n (x_i − x̄)².
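A small numerical demonstration of the centering matrix (the vector x is an arbitrary example):

```python
import numpy as np

# M0 = I - (1/n) * 1 1': symmetric, idempotent, demeans a vector.
n = 5
M0 = np.eye(n) - np.ones((n, n)) / n
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

print(np.allclose(M0, M0.T))                  # symmetric
print(np.allclose(M0 @ M0, M0))               # idempotent
print(np.allclose(M0 @ x, x - x.mean()))      # deviations from the mean
print(x @ M0.T @ M0 @ x, ((x - x.mean()) ** 2).sum())  # both ~ 40
```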
3. If we interchange the last two rows of matrix A, the determinant of A is not affected.
Answer:
1. False. It might be the case that n is smaller than m but neither the columns nor the rows are linearly independent, so the matrix does not have full rank. Take A = 1₃ₓ₂ with rk(A) = 1 as a counterexample.
2. False. If m = n, A is indeed a square matrix, but it does not mean it has full rank. Take A = 12×2 as
a counterexample.
3. False. The determinant is multiplied by −1. Notice however that the rank is not affected.
A = [ 2   1  1 ]
    [ 2  −3  2 ]
    [ 2  −2  3 ]
where each column is a different asset, and each row is a different scenario. Your goal is to determine
the portfolio allocation in each asset class, denoted θ ∈ R3 , given the client’s target payoff b ∈ R3 in each
scenario. In other words, you want to determine θ such that Aθ = b. To make sure that this is feasible, you
determine the following:
1. Calculate |A|.
3. Calculate A−1 .
Answer:
1. |A| = −10.
3.
A⁻¹ = [  1/2   1/2  −1/2 ]
      [  1/5  −2/5   1/5 ]
      [ −1/5  −3/5   4/5 ]
4. Aθ = b ⇒ θ = A⁻¹b ⇒
θ1 = (1/2)b1 + (1/2)b2 − (1/2)b3
θ2 = (1/5)b1 − (2/5)b2 + (1/5)b3
θ3 = −(1/5)b1 − (3/5)b2 + (4/5)b3
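A numerical check of the inverse and of the portfolio rule θ = A⁻¹b (the matrix A below is reconstructed from the stated inverse and |A| = −10, since its first rows were garbled in the merged file; the target payoff b is an arbitrary example):

```python
import numpy as np

# Portfolio allocation: theta = A^{-1} b should replicate the target payoff b.
A = np.array([[2.0, 1.0, 1.0],
              [2.0, -3.0, 2.0],
              [2.0, -2.0, 3.0]])
A_inv_hand = np.array([[0.5, 0.5, -0.5],
                       [0.2, -0.4, 0.2],
                       [-0.2, -0.6, 0.8]])

b = np.array([10.0, 20.0, 30.0])  # illustrative target payoff
print(round(np.linalg.det(A)))                    # -10
print(np.allclose(np.linalg.inv(A), A_inv_hand))  # True
print(np.allclose(A @ (A_inv_hand @ b), b))       # True: payoff replicated
```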
1. For which values of the constants p and q does the system have a unique solution, several solutions or
no solution?
For exercises (b) and (c) assume that the rank of matrix A is equal to the size of vector x. In other words, if x is n × 1, the rank of A has to be n. (This condition implies that (A′A)⁻¹ exists.)
1. Show that if the solution of the system exists and is unique, then B is in the sub-space generated by
the column vectors of A (Hint: B can be rewritten as a linear combination of the column vectors of
A).
Ax = B − ε,
where ε is constructed in such a way that B − ε is in the sub-space generated by the column vectors of A.
To be efficient, we want to minimize the norm of ε. What should we do and what is the final solution?
(You can provide a graphical intuition in parallel with your mathematical results.)
Answer:
         | 1   1  1 |
det(A) = | 2  −3  2 | = −5p + 15 ≠ 0
         | 3  −2  p |
If p ≠ 3, then the matrix is invertible and the system admits one solution, whatever the value of q.
If p = 3, then we can simplify the system as follows:

[ 1   1  1 | 2q ]                      [ 1   1  1 |  2q ]
[ 2  −3  2 | 4q ]   L3 − L2 − L1 =⇒   [ 2  −3  2 |  4q ]
[ 3  −2  3 |  q ]                      [ 0   0  0 | −5q ]

If q ≠ 0 and p = 3, then the system admits no solutions. If q = 0 and p = 3, then the system admits infinitely many solutions.
The 2nd way:

Let A = [ 1   1  1 ]  and  b = [ 2q ]
        [ 2  −3  2 ]           [ 4q ] , then we rewrite the system as Ax = b.
        [ 3  −2  p ]           [  q ]

Homogeneous case: q = 0 ⇒ b = 0.
If |A| ≠ 0, which means p ≠ 3, then ∃! solution, the trivial solution 0.
If p = 3, then there exist infinitely many solutions.
Inhomogeneous case: b ≠ 0 (q ≠ 0).
If |A| ≠ 0, which means p ≠ 3, then ∃ a unique solution.
If |A| = 0, i.e. p = 3, then rank(A) = 2, while

à = [ 1   1  1 | 2q ]     [ 1   1  1 | 2 ]
    [ 2  −3  2 | 4q ]  ∼  [ 2  −3  2 | 4 ]
    [ 3  −2  3 |  q ]     [ 3  −2  3 | 1 ]

and rank(Ã) = 3, so there is no solution.
2. We know that Ax = B holds. We can rewrite A = {a1, a2, a3}, where ai is the column vector i of matrix A, ∀i = 1, 2, 3. So B = x1a1 + x2a2 + x3a3 = Σ_{i=1}^3 xi ai.
3. We want to minimize the norm of ε; in other words, we want to minimize the distance between the vector B and the sub-space generated by A. Thus, we have to project B orthogonally onto the sub-space generated by A, which means ε ⊥ A:
A′ε = A′(B − Ax) = A′B − A′Ax = 0.
This gives us the set of normal equations A′B = A′Ax. If A′A is full rank, then we can premultiply by (A′A)⁻¹ and obtain the standard OLS estimator:
x = (A′A)⁻¹A′B.
But: as we have seen in (a), the system of equations admits no solutions if p = 3. In this case, det(A) = 0 and the matrix A is not invertible. This implies that (A′A) is not invertible either.
7 Cramer’s Rule
Consider the following system of equations:
5P1 + P2 − P3 = 9
−2P1 + 5P2 − P3 = 3
−2P1 − 2P2 + 14P3 = 34
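The worked answer for this system did not survive the merge; a sketch of Cramer's rule applied numerically (the solution it returns should satisfy all three equations):

```python
import numpy as np

# Cramer's rule: P_j = det(A_j) / det(A), where A_j has column j
# replaced by the right-hand side d.
A = np.array([[5.0, 1.0, -1.0],
              [-2.0, 5.0, -1.0],
              [-2.0, -2.0, 14.0]])
d = np.array([9.0, 3.0, 34.0])

detA = np.linalg.det(A)
P = []
for j in range(3):
    Aj = A.copy()
    Aj[:, j] = d                   # replace column j by the right-hand side
    P.append(np.linalg.det(Aj) / detA)

print(np.round(P, 6))
print(np.allclose(A @ np.array(P), d))  # True: solution checks out
```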
Mathematics for Economics and Finance (Fall 2022)
Problem Set 4: Linear algebra: systems of equations
Professor: Roxana Mihet
TAs: Elliot
Due Oct 14th
x1 + x2 + x3 = 2q
2x1 − 3x2 + 2x3 = 4q
3x1 − 2x2 + px3 = q.
Answer:
Let
A = [ 1   1  1 ]  and  b = [ 2q ]
    [ 2  −3  2 ]           [ 4q ] , then we rewrite the system as Ax = b.
    [ 3  −2  p ]           [  q ]
det(A) = −5p + 15, so for p ≠ 3 the system has a unique solution. For p = 3, row reduction gives the last row (0 0 0 | −5q): if q = 0 there are infinitely many solutions, and if q ≠ 0 then rank(Ã) = 3 > rank(A) = 2, so there is no solution.
Let
d = [ 10,000 ]
    [ 20,000 ]  be "a demand vector", and
    [ 10,000 ]

C = [ 0    0.5  0.5  ]
    [ 0.4  0.3  0.05 ]  be an input matrix.
    [ 0.2  0    0.35 ]

We assume that all produced goods not used in the subsequent production process are consumed. Thus, X, C and d are linked by the following equation: X − CX = d. Hence

X = (I − C)⁻¹ d = [ 1.82  1.3  1.5 ] [ 10,000 ]   [ 59,200 ]
                  [ 1.08  2.2   1  ] [ 20,000 ] = [ 64,800 ]
                  [ 0.56  0.4   2  ] [ 10,000 ]   [ 33,600 ]

So the production schedule should be $59,200 labor, $64,800 transportation and $33,600 food.
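The Leontief computation can be reproduced in a few lines:

```python
import numpy as np

# Leontief model: solve (I - C) X = d for the production schedule X.
C = np.array([[0.0, 0.5, 0.5],
              [0.4, 0.3, 0.05],
              [0.2, 0.0, 0.35]])
d = np.array([10_000.0, 20_000.0, 10_000.0])

X = np.linalg.solve(np.eye(3) - C, d)
print(np.round(X))  # [59200. 64800. 33600.]
```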
D1 = 3 > 0.

D2 = | 3  −1 | = 3·1 − (−1)·(−1) = 2 > 0.
     | −1  1 |

     | 3  −1  1 |
D3 = | −1  1  2 | = (−1)^{1+1}·3·| 1 2 | + (−1)^{2+1}·(−1)·| −1 1 | + (−1)^{3+1}·1·| −1 1 |
     | 1   2  6 |                | 2 6 |                    |  2 6 |                |  1 2 |
   = 3·(6 − 4) + 1·(−6 − 2) + 1·(−2 − 1) = 6 − 8 − 3 = −5 < 0.
D1 > 0, D2 > 0, D3 < 0 ⇒ not positive definite, not negative definite.
The arbitrary minors are:
Answer:
1. In matrix notation

QID = x′Ax, where A = [  1  −1  −1 ]
                      [ −1   1   0 ].
                      [ −1   0   1 ]

Leading principal minors are:
D1 = 1 > 0.

D2 = | 1  −1 | = 1·1 − (−1)·(−1) = 0.
     | −1  1 |

     | 1  −1  −1 |
D3 = | −1  1   0 | = (−1)^{1+1}·1·| 1 0 | + (−1)^{2+1}·(−1)·| −1 −1 | + (−1)^{3+1}·(−1)·| −1 −1 |
     | −1  0   1 |                | 0 1 |                   |  0  1 |                    |  1  0 |
   = 1 − 1 − 1 = −1 < 0.
D1 > 0, D2 = 0, D3 < 0 ⇒ not positive definite, not negative definite.
The arbitrary (principal) minors are:
Leading principal minors:
D1 = 2 > 0.

D2 = | 2  −1 | = 2 − (−1)(−1) = 1 > 0.
     | −1  1 |

     | 2  −1  −1 |
D3 = | −1  1   0 | = (−1)^{1+3}·(−1)·| −1 1 | + (−1)^{3+3}·1·| 2 −1 | = −1 + 1 = 0.
     | −1  0   1 |                   | −1 0 |                | −1 1 |

Arbitrary principal minors:
m = 1: the diagonal entries 2, 1, 1 are all > 0.
m = 2: delete row/column 1: ∆2 = | 1 0 | = 1 > 0;
                                 | 0 1 |
       delete row/column 2: ∆2 = | 2 −1 | = 1 > 0;
                                 | −1 1 |
       delete row/column 3: ∆2 = | 2 −1 | = 1 > 0.
                                 | −1 1 |
m = 3: ∆3 = | 2  −1  −1 |
            | −1  1   0 | = 0 ≥ 0.
            | −1  0   1 |

=⇒ All principal minors of matrix A are non-negative (∆1 ≥ 0, ∆2 ≥ 0, ∆3 ≥ 0), so A is positive semidefinite.
2. The corresponding matrix is A = [ 2  −1  −1 ]
                                   [ −1  1   0 ].
                                   [ −1  0   1 ]
It is symmetric since aij = aji. Since det A = 0, A⁻¹ doesn't exist, so A is not invertible. It is not orthogonal because the determinant of an orthogonal matrix must equal ±1 (see Exercise 1.4.2 from Exercise Session 2).
3.
           | 2−λ  −1   −1  |
|A − λI| = | −1   1−λ   0  | = (−1)^{1+3}·(−1)·| −1  1−λ | + (−1)^{3+3}·(1−λ)·| 2−λ  −1  | = 0
           | −1    0   1−λ |                   | −1   0  |                    | −1   1−λ |

=⇒ −(1 − λ) + (1 − λ)((2 − λ)(1 − λ) − 1) = (1 − λ)((2 − λ)(1 − λ) − 2) = (1 − λ)(λ² − 3λ) = λ(1 − λ)(λ − 3) = 0.

Eigenvalues: λ1 = 0, λ2 = 1, λ3 = 3.
4. λ1 = 0:
           [ 2  −1  −1 ]  row1 + row2  [ 1   0  −1 ]
(A − λI) = [ −1  1   0 ]      −→       [ −1  1   0 ]  −→  [ −1  1   0 ]
           [ −1  0   1 ]               [ −1  0   1 ]      [  1  0  −1 ]

[ −1  1   0 ] [x1]   [0]       −x1 + x2 = 0       x2 = x1
[  1  0  −1 ] [x2] = [0]  =⇒    x1 − x3 = 0  =⇒   x3 = x1.
              [x3]

=⇒ x = (c, c, c)′ ∀c ∈ R =⇒ λ1 = 0, a1 = (1, 1, 1)′.

λ2 = 1:
           [ 1  −1  −1 ]  row1 + row2
(A − λI) = [ −1  0   0 ]      −→       [  0  −1  −1 ]
           [ −1  0   0 ]               [ −1   0   0 ]

[  0  −1  −1 ] [x1]   [0]       −x2 − x3 = 0       x1 = 0
[ −1   0   0 ] [x2] = [0]  =⇒      −x1 = 0    =⇒   x3 = −x2.
               [x3]

=⇒ x = (0, c, −c)′ ∀c ∈ R =⇒ λ2 = 1, a2 = (0, 1, −1)′.

λ3 = 3:
           [ −1  −1  −1 ]  row1 − row2; (−1)·row2 + row3; (−1)·row3
(A − λI) = [ −1  −2   0 ]                   −→                       [ 0  1  −1 ]
           [ −1   0  −2 ]                                            [ 1  0   2 ]

[ 0  1  −1 ] [x1]   [0]       x2 − x3 = 0        x1 = −2x3
[ 1  0   2 ] [x2] = [0]  =⇒   x1 + 2x3 = 0  =⇒   x2 = x3.
             [x3]

=⇒ x = (−2c, c, c)′ ∀c ∈ R =⇒ λ3 = 3, a3 = (−2, 1, 1)′.
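The eigenvalues and eigenvectors can be confirmed numerically:

```python
import numpy as np

# Check eigenvalues {0, 1, 3} and the eigenvectors a1, a2, a3 found above.
A = np.array([[2.0, -1.0, -1.0],
              [-1.0, 1.0, 0.0],
              [-1.0, 0.0, 1.0]])

print(np.linalg.eigvalsh(A))  # ascending: approximately 0, 1, 3
for lam, vec in [(0, [1, 1, 1]), (1, [0, 1, -1]), (3, [-2, 1, 1])]:
    vec = np.array(vec, dtype=float)
    print(np.allclose(A @ vec, lam * vec))  # True for each pair
```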
5. From the lecture notes we know three different ways to decompose a matrix A, namely
1. Spectral decomposition (or eigendecomposition): A = CΛC 0 , where Λ is a diagonal matrix with all
eigenvalues and matrix C contains all corresponding eigenvectors.
2. LU decomposition: A = LU , where L is a lower triangular and U is an upper triangular matrix.
3. Cholesky decomposition: A = LL0 = U 0 U , with L and U as above.
1. Spectral decomposition: the corresponding eigenvector matrix C = (a1, a2, a3) is given by

C = [ 1   0  −2 ]
    [ 1   1   1 ].
    [ 1  −1   1 ]

                              [ 2   0  −2 ]ᵀ   [ 1/3   1/3   1/3 ]
C⁻¹ = (1/det C)·adj(C) = (1/6)[ 2   3   1 ]  = [  0    1/2  −1/2 ]
                              [ 2  −3   1 ]    [ −1/3  1/6   1/6 ]

In the end we arrive at the following eigendecomposition:

A = [ 1   0  −2 ] [ 0  0  0 ] [ 1/3   1/3   1/3 ]
    [ 1   1   1 ] [ 0  1  0 ] [  0    1/2  −1/2 ].
    [ 1  −1   1 ] [ 0  0  3 ] [ −1/3  1/6   1/6 ]

Remark. One may also normalize the eigenvectors and obtain an orthogonal matrix C̃ and thus the following matrix decomposition: A = C̃ΛC̃′. The normalized eigenvectors are ã1 = (1/√3, 1/√3, 1/√3)′, ã2 = (0, 1/√2, −1/√2)′, ã3 = (−2/√6, 1/√6, 1/√6)′, and thus the resulting decomposition is

A = [ 1/√3    0     −2/√6 ] [ 0  0  0 ] [  1/√3   1/√3   1/√3 ]
    [ 1/√3   1/√2    1/√6 ] [ 0  1  0 ] [   0     1/√2  −1/√2 ].
    [ 1/√3  −1/√2    1/√6 ] [ 0  0  3 ] [ −2/√6   1/√6   1/√6 ]
2. The LU decomposition corresponds to the Gaussian elimination algorithm. Our first step is to find a reduced form of A such that we are left with an upper triangular matrix, tracking the row operations on the identity:

[ 2  −1  −1 | 1  0  0 ]   (1/2)·row1; (1/2)·row1 + row2;   [ 1  −1/2  −1/2 | 1/2  0  0 ]
[ −1  1   0 | 0  1  0 ]        (1/2)·row1 + row3           [ 0   1/2  −1/2 | 1/2  1  0 ]   row2 + row3
[ −1  0   1 | 0  0  1 ]              ∼                     [ 0  −1/2   1/2 | 1/2  0  1 ]       ∼

[ 1  −1/2  −1/2 | 1/2  0  0 ]
[ 0   1/2  −1/2 | 1/2  1  0 ].
[ 0    0     0  |  1   1  1 ]

The first part is the matrix U and the second part is L⁻¹. Now, we go on by finding the inverse of L⁻¹, that is L:

[ 1/2  0  0 | 1  0  0 ]   2·row1; row2 − (1/2)·row1;   [ 1  0  0 |  2   0  0 ]
[ 1/2  1  0 | 0  1  0 ]     row3 − row1 − row2         [ 0  1  0 | −1   1  0 ].
[  1   1  1 | 0  0  1 ]            ∼                   [ 0  0  1 | −1  −1  1 ]

The second matrix is the searched lower triangular matrix L. After all the computations we end up with

A = LU = [  2   0  0 ] [ 1  −1/2  −1/2 ]
         [ −1   1  0 ] [ 0   1/2  −1/2 ].
         [ −1  −1  1 ] [ 0    0     0  ]
Matrix LL′ is symmetric by definition. Now, we can just use the lower part of the matrix and find the searched values, i.e.

    [ 2  −1  −1 ]   [ l11²                                            ]
A = [ −1  1   0 ] = [ l11·l21    l22² + l21²                          ]   (lower part).
    [ −1  0   1 ]   [ l11·l31    l21·l31 + l22·l32    l33² + l31² + l32² ]

After solving the resulting system of 6 non-linear equations with 6 unknowns we can obtain 4 possible matrices L. One of them is

L = [   √2      0     0 ]
    [ −1/√2    1/√2   0 ].
    [ −1/√2   −1/√2   0 ]
6. One may note that Aⁿ = (CΛC⁻¹)ⁿ = CΛC⁻¹·CΛC⁻¹ ··· CΛC⁻¹ = CΛ(C⁻¹C)Λ(C⁻¹C) ··· ΛC⁻¹ = CΛⁿC⁻¹.

     [ 0  0  0 ]ⁿ   [ 0  0  0  ]
Λⁿ = [ 0  1  0 ]  = [ 0  1  0  ].
     [ 0  0  3 ]    [ 0  0  3ⁿ ]

Thus

Aⁿ = [ 1   0  −2 ] [ 0  0  0  ] [ 1/3   1/3   1/3 ]   [ 2·3ⁿ⁻¹         −3ⁿ⁻¹               −3ⁿ⁻¹          ]
     [ 1   1   1 ] [ 0  1  0  ] [  0    1/2  −1/2 ] = [ −3ⁿ⁻¹    (1/2)·3ⁿ⁻¹ + 1/2   (1/2)·3ⁿ⁻¹ − 1/2 ].
     [ 1  −1   1 ] [ 0  0  3ⁿ ] [ −1/3  1/6   1/6 ]   [ −3ⁿ⁻¹    (1/2)·3ⁿ⁻¹ − 1/2   (1/2)·3ⁿ⁻¹ + 1/2 ]
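A spot check of the closed form for a sample n:

```python
import numpy as np

# Verify A^n = C diag(0, 1, 3^n) C^{-1} against direct matrix powers.
A = np.array([[2.0, -1.0, -1.0],
              [-1.0, 1.0, 0.0],
              [-1.0, 0.0, 1.0]])
n = 5
An = np.linalg.matrix_power(A, n)

t = 3.0 ** (n - 1)
An_formula = np.array([[2 * t, -t, -t],
                       [-t, t / 2 + 0.5, t / 2 - 0.5],
                       [-t, t / 2 - 0.5, t / 2 + 0.5]])
print(np.allclose(An, An_formula))  # True
```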
5. Basis, subspaces
Let u1 = (3, 3, 4)′ and u2 = (−1, −1, 5)′ be a basis for the subspace W of Rⁿ.
5. Compute the corresponding residual terms ev and ew .
Answer:
1.
3 −1
A = 3 −1
4 5
.
2.
1 1
2 2 0
P = A(A0 A)−1 A0 =
1 1
0
2 2
0 0 1
8
.
3. n = 3 and dim(W ) = 2.
4.
     [ 1/2 1/2 0 ] [ 1 ]   [ 3/2 ]
Pv = [ 1/2 1/2 0 ] [ 2 ] = [ 3/2 ]
     [ 0   0   1 ] [ 3 ]   [ 3   ]

     [ 1/2 1/2 0 ] [ 1 ]   [ 1 ]
Pw = [ 1/2 1/2 0 ] [ 1 ] = [ 1 ]
     [ 0   0   1 ] [ 1 ]   [ 1 ]

5.
ev = v − Pv = (−1/2, 1/2, 0)′
ew = w − Pw = (0, 0, 0)′
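A NumPy sketch of the whole projection exercise (assuming, as read off the computation above, v = (1, 2, 3)′ and w = (1, 1, 1)′):

```python
import numpy as np

# Basis vectors of W as columns of A
A = np.array([[3., -1.],
              [3., -1.],
              [4., 5.]])
P = A @ np.linalg.inv(A.T @ A) @ A.T
assert np.allclose(P, [[0.5, 0.5, 0.], [0.5, 0.5, 0.], [0., 0., 1.]])

v = np.array([1., 2., 3.])
w = np.array([1., 1., 1.])
assert np.allclose(P @ v, [1.5, 1.5, 3.])
assert np.allclose(v - P @ v, [-0.5, 0.5, 0.])   # e_v
assert np.allclose(w - P @ w, [0., 0., 0.])      # e_w: w already lies in W
```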
Answer:
1.
[ 1  2  3 ]   [ x ]   [ 0 ]
[ 2  4  6 ] · [ y ] = [ 0 ]
[−1  4  1 ]   [ z ]   [ 0 ]
[ 2 −8 −2 ]

[ 1  2  3 ]   [ 1  2   3 ]   [ 3 0 5 ]
[ 2  4  6 ]   [ 0  0   0 ]   [ 0 0 0 ]
[−1  4  1 ] ⇔ [ 0  6   4 ] ⇔ [ 0 3 2 ]
[ 2 −8 −2 ]   [ 0 −12 −8 ]   [ 0 0 0 ]

The solution to the system is (x, y, z)′ = a·(5, 2, −3)′, a ∈ R, and Null(A) = span{(5, 2, −3)′}.

2. The column space is the subspace that is spanned by the columns of a matrix. In other words, it is the subspace built by taking all the possible (linear) combinations of the columns of a matrix.
Using the simplified matrix from the previous point,

[ 3 0 5 ]
[ 0 0 0 ]
[ 0 3 2 ]
[ 0 0 0 ]

one can say that the first and the second columns of the matrix are linearly independent, while the third column is a linear combination of them. The corresponding columns in the initial matrix A will present the basis of its column space:

Col(A) = span{ (1, 2, −1, 2)′, (2, 4, 4, −8)′ }.

Any solution of Ax = 0 satisfies

x1 = (5/2)·x2,   x1 = −(5/3)·x3,   x2 = −(2/3)·x3.

Therefore, the columns of A are not linearly independent. We have dim Col(A) = 2.
Let us now find a basis for the null space of A, which is the set of all vectors x such that Ax = 0. From the first part, we have

x1 = (5/2)·x2,   x1 = −(5/3)·x3,   x2 = −(2/3)·x3.

Therefore, Null(A) = {(5t, 2t, −3t)′ : t ∈ R}. A basis for this space is {(5, 2, −3)′}.
Note that in practice, it is common to normalize the vectors of a basis, which can be done by dividing each vector by its norm.
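A small NumPy check of the rank and null-space computations (a sketch, not part of the original solution):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [-1., 4., 1.],
              [2., -8., -2.]])

# Rank 2 implies dim Null(A) = 3 - 2 = 1
assert np.linalg.matrix_rank(A) == 2

# (5, 2, -3)' spans the null space: A @ n = 0
n = np.array([5., 2., -3.])
assert np.allclose(A @ n, 0.)
```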
Let
    [ −2 −1  4 ]
A = [  2  1 −2 ] ,   x1 = (1, 0, 1)′,  x2 = (−1, 1, 0)′,  x3 = (1, 1, 1)′.
    [ −1 −1  3 ]
1. Verify that x1 , x2 and x3 are eigenvectors of A and find the corresponding eigenvalues.
2. Let
B = A · A.
Show that
B · x2 = x2
and
B · x3 = x3 .
Is it true that
B · x1 = x1 ?
1.
           | −2−λ   −1     4  |
|A − λI| = |   2   1−λ    −2  |
           |  −1    −1   3−λ  |

(expanding along the first column)

= −(2+λ)·| 1−λ −2 ; −1 3−λ | − 2·| −1 4 ; −1 3−λ | − | −1 4 ; 1−λ −2 |
= −(2 + λ)(3 − 4λ + λ² − 2) − 2(−3 + λ + 4) − (2 − 4 + 4λ)
= −λ³ + 2λ² + 7λ − 2 − 2λ − 2 − 4λ + 2 = −λ³ + 2λ² + λ − 2
= −(λ − 1)(λ − 2)(λ + 1) = 0
⟹ Eigenvalues λ = 1, λ = 2, λ = −1.
λ = 1:
(A − λI)X = 0

[ −3 −1  4 ]  row1 − 3·row3   [  0  2 −2 ]
[  2  0 −2 ]  row2 + 2·row3 → [  0 −2  2 ]
[ −1 −1  2 ]  (−1)·row3       [  1  1 −2 ]

  [ 0 1 −1 ]                [ 0 1 −1 ]
→ [ 1 1 −2 ]  row2 − row1 → [ 1 0 −1 ]

[ 0 1 −1 ] [x]   [0]      { y − z = 0     { y = z
[ 1 0 −1 ] [y] = [0]  ⟹  { x − z = 0  ⟹  { x = z
           [z]   [0]

Take z = 1 ⟹ (1, 1, 1)′ is an eigenvector that corresponds to λ = 1.

λ = 2:
(A − λI)X = 0

[ −4 −1  4 ]  row1 − 4·row3   [  0  3  0 ]
[  2 −1 −2 ]  row2 + 2·row3 → [  0 −3  0 ]
[ −1 −1  1 ]  (−1)·row3       [  1  1 −1 ]

  [ 0 1  0 ]                [ 0 1  0 ]
→ [ 1 1 −1 ]  row2 − row1 → [ 1 0 −1 ]

[ 0 1  0 ] [x]   [0]      { y = 0
[ 1 0 −1 ] [y] = [0]  ⟹  { x = z
           [z]   [0]

Take z = 1 ⟹ (1, 0, 1)′ is an eigenvector that corresponds to λ = 2.

λ = −1:
(A − λI)X = 0

[ −1 −1  4 ]    [ 1 1 −4 ]                [ 1 1 −4 ]  row1 + 4·row2   [ 1 1 0 ]
[  2  2 −2 ]  → [ 1 1 −1 ]  row2 − row1 → [ 0 0  3 ]               → [ 0 0 1 ]
[ −1 −1  4 ]

[ 1 1 0 ] [x]   [0]      { x + y = 0     { x = −y
[ 0 0 1 ] [y] = [0]  ⟹  { z = 0      ⟹  { z = 0
          [z]   [0]

Take y = 1 ⟹ (−1, 1, 0)′ is an eigenvector that corresponds to λ = −1.
2. x2 = (−1, 1, 0)′ corresponds to λ = −1, so Ax2 = −x2 and hence Bx2 = A(Ax2) = A(−x2) = x2. Similarly, x3 = (1, 1, 1)′ corresponds to λ = 1, so Bx3 = A(Ax3) = Ax3 = x3. For x1 = (1, 0, 1)′, which corresponds to λ = 2, we get Bx1 = A(Ax1) = A(2x1) = 4x1 ≠ x1, so it is not true that Bx1 = x1.
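The eigenvalues and the action of B = A² can be confirmed numerically (a sketch, using the matrix and eigenvectors as reconstructed above):

```python
import numpy as np

A = np.array([[-2., -1., 4.],
              [2., 1., -2.],
              [-1., -1., 3.]])
vals = np.sort(np.linalg.eigvals(A).real)
assert np.allclose(vals, [-1., 1., 2.])

B = A @ A
x_neg = np.array([-1., 1., 0.])   # eigenvector for lambda = -1
x_one = np.array([1., 1., 1.])    # eigenvector for lambda = 1
x_two = np.array([1., 0., 1.])    # eigenvector for lambda = 2
assert np.allclose(B @ x_neg, x_neg)      # (-1)^2 = 1: fixed by B
assert np.allclose(B @ x_one, x_one)      # 1^2 = 1: fixed by B
assert np.allclose(B @ x_two, 4 * x_two)  # 2^2 = 4: scaled, not fixed
```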
Mathematics for Economics and Finance (Fall 2022)
Problem Set 5: Linear algebra, Calculus
Professor: Roxana Mihet
TAs: Madhushree A
Due Oct 21st
1 Spectral theorem
Let Q be the following quadratic form:
1. Q = xᵀAx where x = (x1, x2, x3)′ and

    [  1 −1 −1 ]
A = [ −1  2  0 ]
    [ −1  0  2 ]
2. det(A − λIₙ) = 0 ⇔
| 1−λ  −1    −1  |
| −1   2−λ    0  | = 0 ⇔ (1−λ)(2−λ)² − (4 − 2λ) = 0 ⇔ λ³ − 5λ² + 6λ = 0
| −1    0   2−λ |
Thus, we have 3 eigenvalues: λ1 = 0, λ2 = 2, λ3 = 3. Two are positive, and one is equal to zero. This
implies that the quadratic form Q is positive semidefinite.
3. For λ1 = 0:

         [  1 −1 −1 ]  row2 + row1,      [ 1 −1 −1 ]
A − λIₙ = [ −1  2  0 ]  row3 + row1,    → [ 0  1 −1 ]
         [ −1  0  2 ]  then row3 + row2  [ 0  0  0 ]

[ 1 −1 −1 ] [x1]   [0]      { x1 = 2x2
[ 0  1 −1 ] [x2] = [0]  ⟹  { x2 = x3
[ 0  0  0 ] [x3]   [0]

⟹ x = (2c, c, c)′ ∀c ∈ R is an eigenvector for λ1 = 0.
To get the orthonormal eigenvector, we rescale the eigenvector by its norm such that the new norm is 1. Putting c = 1, the actual norm is √(2² + 1² + 1²) = √6, so:

x̂1 = (2/√6, 1/√6, 1/√6)′   (λ1 = 0)

For λ2 = 2:

         [ −1 −1 −1 ]   [ −1 −1 −1 ] [x1]   [0]      { x1 = 0
A − λIₙ = [ −1  0  0 ] → [ −1  0  0 ] [x2] = [0]  ⟹  { x2 = −x3
         [ −1  0  0 ]   [  0  0  0 ] [x3]   [0]

⟹ x = (0, −c, c)′ ∀c ∈ R (λ2 = 2), and after normalizing,

x̂2 = (0, −1/√2, 1/√2)′   (λ2 = 2)

For λ3 = 3:

         [ −2 −1 −1 ]                [ −1  0 −1 ]   [ −1  0 −1 ] [x1]   [0]      { x1 = −x2
A − λIₙ = [ −1 −1  0 ]  row1 − row2 → [ −1 −1  0 ] → [ −1 −1  0 ] [x2] = [0]  ⟹  { x2 = x3
         [ −1  0 −1 ]                [ −1  0 −1 ]   [  0  0  0 ] [x3]   [0]

⟹ x = (−c, c, c)′ ∀c ∈ R (λ3 = 3), and after normalizing,

x̂3 = (−1/√3, 1/√3, 1/√3)′   (λ3 = 3)
• Λ = diag(λ1 , λ2 , λ3 )
Because Ĉ is an orthogonal matrix (by definition), Ĉ′Ĉ = Iₙ. From this statement, one may note that Ĉ⁻¹ = Ĉ′.
5. Because one of the eigenvalues is equal to 0, we can't find an inverse.
But assuming we could, from Definition 2.22 of the lecture notes, we know that the inverse matrix of A is just ĈΛ⁻¹Ĉ′. By using the spectral decomposition in this specific case (when the matrix is symmetric), we don't need to do any inverse computation:
• Ĉ doesn't change.
• Λ⁻¹ = diag(1/λ1, 1/λ2, 1/λ3), which here would formally be diag(∞, 1/2, 1/3).
• We know everything to compute the inverse matrix. But in our case, as told before, there is no inverse matrix!
2 Matrix Decomposition
Let A be a symmetric matrix with
    [  2  1 −2 ]
A = [  1  2 −2 ]
    [ −2 −2  5 ]
1. Show that the characteristic polynomial may be written in the form p(λ) = (λ − 1)²(λ − 7).
2. Show that the vector equation Ax = x reduces to a single scalar equation. Find orthonormal eigenvec-
tors x, y corresponding to the eigenvalue λ = 1.
              | 2−λ   1   −2  |
det(A − λI) = |  1   2−λ  −2  |
              | −2   −2   5−λ |

= (2−λ)·| 2−λ −2 ; −2 5−λ | − | 1 −2 ; −2 5−λ | − 2·| 1 2−λ ; −2 −2 |
= (2 − λ)[(2 − λ)(5 − λ) − 4] − (5 − λ − 4) − 2(−2 + 4 − 2λ)
= (2 − λ)(6 − 7λ + λ²) − 1 + λ + 4λ − 4
= −(λ − 1)²(λ − 7),

so the characteristic polynomial is p(λ) = det(λI − A) = (λ − 1)²(λ − 7).

2. Ax = x ⇒ (A − I)x = 0, so each vector x that solves Ax = x is at the same time an eigenvector of A that corresponds to λ = 1.
Let's find the eigenvectors corresponding to λ = 1:

         [ x1 ]   [ 0 ]     [ 2−1   1   −2  ] [ x1 ]   [ 0 ]
(A − λI) [ x2 ] = [ 0 ]  ⇔  [  1   2−1  −2  ] [ x2 ] = [ 0 ]
         [ x3 ]   [ 0 ]     [ −2   −2  5−1  ] [ x3 ]   [ 0 ]

All three rows are proportional, so the vector equation reduces to the single scalar equation

x1 + x2 − 2x3 = 0 ⇒ x3 = (x1 + x2)/2,

and likewise for a second solution y,

y1 + y2 − 2y3 = 0 ⇒ y3 = (y1 + y2)/2.

Take x = (1, 1, 1)′ and normalize: x̂ = (1/√3, 1/√3, 1/√3)′. We are looking for orthogonal vectors: x̂′y = 0. Thus

x̂′y = (1/√3)·(y1 + y2 + (y1 + y2)/2) = (3/(2√3))·(y1 + y2) = 0

⇒ y1 = −y2,   y3 = (y1 + y2)/2 = (y1 − y1)/2 = 0.

Hence, any vector y of the form y = (y1, −y1, 0)′ is an eigenvector corresponding to the eigenvalue 1, and y is orthogonal to x. Take y1 = 1.
Normalizing vector y results in:

ŷ = y/‖y‖ = (1/√(1 + 1))·(1, −1, 0)′ = (1/√2, −1/√2, 0)′.
These two vectors define a hyperplane that is not changed under linear transformation Ax.
3. For λ = 7 we solve (A − 7I)z = 0. After a few elementary transformations we obtain the following system in reduced form:

[ 1 1  1  ] [ z1 ]   [ 0 ]      { z1 + z2 + z3 = 0
[ 0 1 1/2 ] [ z2 ] = [ 0 ]  ⇒   { z2 + (1/2)·z3 = 0
            [ z3 ]   [ 0 ]

Take, e.g., z3 = 1; then the solution vector is given by z = (−0.5, −0.5, 1)′ or, after normalising, ẑ = (−1/√6, −1/√6, 2/√6)′.

4. Using the spectral theorem for symmetric matrices (matrix A is symmetric) we can write D = S⁻¹AS, where D is a diagonal matrix with principal diagonal composed of eigenvalues and matrix S is composed of the corresponding (orthonormal) eigenvectors. In our case:

    [ 1/√3   1/√2  −1/√6 ]       [ 1 0 0 ]              [  1/√3   1/√3  1/√3 ]
S = [ 1/√3  −1/√2  −1/√6 ] , D = [ 0 1 0 ] , S⁻¹ = S′ = [  1/√2  −1/√2  0    ]
    [ 1/√3    0     2/√6 ]       [ 0 0 7 ]              [ −1/√6  −1/√6  2/√6 ]
Check that D = S⁻¹AS:

       [  1/√3   1/√3  1/√3 ] [  2  1 −2 ] [ 1/√3   1/√2  −1/√6 ]
S′AS = [  1/√2  −1/√2  0    ] [  1  2 −2 ] [ 1/√3  −1/√2  −1/√6 ]
       [ −1/√6  −1/√6  2/√6 ] [ −2 −2  5 ] [ 1/√3    0     2/√6 ]

       [  1/√3   1/√3   1/√3 ] [ 1/√3   1/√2  −1/√6 ]   [ 1 0 0 ]
     = [  1/√2  −1/√2   0    ] [ 1/√3  −1/√2  −1/√6 ] = [ 0 1 0 ] = D.
       [ −7/√6  −7/√6  14/√6 ] [ 1/√3    0     2/√6 ]   [ 0 0 7 ]

The inverse can now be obtained without any further elimination:

A⁻¹ = SD⁻¹S⁻¹ = S·diag(1, 1, 1/7)·S′

    [ 1/√3   1/√2  −1/√6 ] [ 1 0  0  ] [  1/√3   1/√3  1/√3 ]   [  6/7  −1/7  2/7 ]
  = [ 1/√3  −1/√2  −1/√6 ] [ 0 1  0  ] [  1/√2  −1/√2  0    ] = [ −1/7   6/7  2/7 ]
    [ 1/√3    0     2/√6 ] [ 0 0 1/7 ] [ −1/√6  −1/√6  2/√6 ]   [  2/7   2/7  3/7 ]
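A NumPy sketch tying the pieces together: S′AS diagonalizes A, and SD⁻¹S′ reproduces the inverse computed above:

```python
import numpy as np

A = np.array([[2., 1., -2.],
              [1., 2., -2.],
              [-2., -2., 5.]])

# Orthonormal eigenvectors from the text, stacked as columns of S
S = np.column_stack([
    np.array([1., 1., 1.]) / np.sqrt(3),
    np.array([1., -1., 0.]) / np.sqrt(2),
    np.array([-1., -1., 2.]) / np.sqrt(6),
])
D = np.diag([1., 1., 7.])

assert np.allclose(S.T @ A @ S, D)            # D = S^{-1} A S (S orthogonal)
Ainv = S @ np.diag([1., 1., 1/7]) @ S.T       # A^{-1} = S D^{-1} S'
assert np.allclose(Ainv, np.array([[6., -1., 2.],
                                   [-1., 6., 2.],
                                   [2., 2., 3.]]) / 7)
assert np.allclose(A @ Ainv, np.eye(3))
```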
4 Concavity
Is the function f(x) = 1 − x² concave? Is it strictly concave? Do the same properties still hold for the function f(x, y) = 1 − x²?

Answer: f(x) = 1 − x² is concave and even strictly concave, since f″(x) = −2 < 0. The function f(x, y) = 1 − x² is still concave, but it is no longer strictly concave: it is constant in y, so for two points differing only in y the concavity inequality holds with equality. For strict concavity the inequality should be strict.
5 Function continuity
a) Let f, g be two continuous functions on [a, b] ⊂ R and let x0 ∈ (a, b) with f(x0) = g(x0). We then define

h(x) = { f(x) if x ∈ [a, x0),
       { g(x) if x ∈ [x0, b].
Answer:
a) Let us first observe that h(x) is continuous on [a, x0) since f is continuous by definition, and h is continuous on (x0, b] since g is continuous. The only point we have to verify is x0. Recall the definition of a continuous function in terms of limits: a function h(x) is continuous at a point x0 ∈ (a, b) if
1. h(x0) exists,
2. lim_{x→x0} h(x) exists, and
3. lim_{x→x0} h(x) = h(x0).
1. h(x0) = f(x0) = g(x0) = L.
2. The limit exists if both one-sided limits exist and agree:
– lim_{x↑x0} h(x) = lim_{x↑x0} f(x) = f(x0) = L,
– lim_{x↓x0} h(x) = lim_{x↓x0} g(x) = g(x0) = L.
3. lim_{x→x0} h(x) = L = h(x0).
b) Let us observe that h(x) is continuous on (−∞, −1) ∪ (−1, 2) ∪ (2, ∞) because f1 (x) = 1, f2 (x) = x2
and f3 (x) = x + 2 are all continuous on R. We therefore only have to verify the continuity of h(x) at
x0 = −1 and x1 = 2.
We have verified that h(x) is continuous at both points x0 and x1 . It is therefore continuous on R.
[Figure: graph of h(x) for x ∈ [−5, 5].]
Mathematics for Economics and Finance (Fall 2022)
Problem Set 6: Calculus
Professor: Roxana Mihet
TA: Madhushree
Due Oct 28
1 Limits
Find the following limits
1. lim_{n→∞} 4ⁿ/n!,
2. lim_{n→∞} (n² + 3n)/(2n² + 1),
3. lim_{x→0} x²·sin(1/x),
4. lim_{x→∞} √((x³ + 7x)/(9x³ + 5)).
Answer:
1. We want to find lim_{n→∞} 4ⁿ/n! = lim_{n→∞} (4·4···4)/(1·2···n). We can write

4ⁿ/n! = (4·4···4)/(1·2···n) ≤ (4·4·4·4)/(1·2·3·4) · (4/5)ⁿ⁻⁴ → 0,

where the first factor is constant and (4/5)ⁿ⁻⁴ → 0 because 4/5 < 1. So by the Sandwich theorem, 0 ≤ lim_{n→∞} 4ⁿ/n! ≤ 0, hence

lim_{n→∞} 4ⁿ/n! = 0.

2. lim_{n→∞} (n² + 3n)/(2n² + 1) = lim_{n→∞} (1 + 3/n)/(2 + 1/n²) = 1/2.

3. lim_{x→0} x²·sin(1/x):

−1 ≤ sin(1/x) ≤ 1 ⇒ bounded,
x² ≥ 0 ∀x ⇒ −x² ≤ x²·sin(1/x) ≤ x².

Since lim_{x→0} x² = 0 and lim_{x→0} −x² = 0, by the Sandwich theorem lim_{x→0} x²·sin(1/x) = 0.

4. lim_{x→∞} √((x³ + 7x)/(9x³ + 5)) = lim_{x→∞} √( x³(1 + 7/x²) / (x³(9 + 5/x³)) ) = √(1/9) = 1/3.
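The four limits can be illustrated numerically (a sketch with finite-argument evaluations and assumed tolerances, not a proof):

```python
import math

assert 4**60 / math.factorial(60) < 1e-30                          # limit 1 -> 0

n = 10**6
assert abs((n**2 + 3*n) / (2*n**2 + 1) - 0.5) < 1e-5               # limit 2 -> 1/2

x = 1e-4
assert abs(x**2 * math.sin(1/x)) <= x**2                           # limit 3 squeezed to 0

x = 1e6
assert abs(math.sqrt((x**3 + 7*x) / (9*x**3 + 5)) - 1/3) < 1e-6    # limit 4 -> 1/3
```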
2 Homothetic preferences
Suppose that in a two-commodity world, the consumer's utility function takes the form u(x) = [α1·x1^ρ + α2·x2^ρ]^(1/ρ), α1 + α2 = 1 (CES).
1. Show that when ρ = 1, indifference curves become linear.
2. Show that as ρ → 0, this utility function comes to represent the same preferences as the (generalized) Cobb-Douglas utility function u(x) = x1^α1 · x2^α2.
3. Show that as ρ → −∞, indifference curves become "right angles"; that is, this utility function has in the limit the indifference map of the Leontief utility function u(x1, x2) = min{x1, x2}.
Answer:
2. Since every monotonic transformation of a utility function represents the same preferences, we shall consider

ũ(x) = ln u(x) = (1/ρ)·ln(α1·x1^ρ + α2·x2^ρ).

By L'Hôpital's rule,

lim_{ρ→0} ũ(x) = lim_{ρ→0} (α1·x1^ρ·ln x1 + α2·x2^ρ·ln x2)/(α1·x1^ρ + α2·x2^ρ) = (α1·ln x1 + α2·ln x2)/(α1 + α2) = α1·ln x1 + α2·ln x2,

which is exactly ln(x1^α1 · x2^α2), the log of the Cobb-Douglas utility.
3. Without loss of generality assume x1 = min{x1, x2}. For ρ < 0 we have x2^ρ ≤ x1^ρ, so α1·x1^ρ + α2·x2^ρ ≤ (α1 + α2)·x1^ρ. Since t ↦ t^(1/ρ) is decreasing for ρ < 0, this yields

(α1·x1^ρ + α2·x2^ρ)^(1/ρ) ≥ (α1 + α2)^(1/ρ)·x1.

Therefore, using α1·x1^ρ ≤ α1·x1^ρ + α2·x2^ρ on the other side,

(α1)^(1/ρ)·x1 ≥ (α1·x1^ρ + α2·x2^ρ)^(1/ρ) ≥ (α1 + α2)^(1/ρ)·x1.

Letting ρ → −∞, both bounds tend to x1, so lim_{ρ→−∞} [α1·x1^ρ + α2·x2^ρ]^(1/ρ) = x1 = min{x1, x2} by the Sandwich theorem.
3 Sandwich Theorem
Use the Sandwich theorem to prove that:

lim_{x→0} x·cos(1/x) = 0    (1)

We have |x·cos(1/x)| ≤ |x|. Hence

−|x| ≤ x·cos(1/x) ≤ |x|.

Since lim_{x→0} −|x| = lim_{x→0} |x| = 0, the Sandwich theorem gives lim_{x→0} x·cos(1/x) = 0.
4 Sandwich theorem
Consider the function:

f(x) = { 1 + 2x² if x is rational,
       { 1 + x⁴ if x is irrational.
Answer: Since we are considering the limit as x approaches 0, we may assume that |x| ≤ 1. In this case, we have x⁴ ≤ x² ≤ 2x². Hence for any such x,

1 ≤ f(x) ≤ 1 + 2x².

Since lim_{x→0} (1 + 2x²) = 1, the Sandwich theorem gives

lim_{x→0} f(x) = 1.
5 Homogeneous functions
Is a linear production function homogeneous? Is it homothetic?
F (K; L) = αK + βL
where K denotes capital, L denotes labor, and α and β are strictly positive constants.
6 Homogeneous function
What is the degree of homogeneity of this function?
• f (x, y) = x2 + y 2
• f (x, y) = x4 + y 4 + 2
Answer:
• f(x, y) = x² + y² is HOM(2): f(tx, ty) = t²·(x² + y²) = t²·f(x, y).
• f(x, y) = x⁴ + y⁴ + 2 is not homogeneous of any degree, since f(tx, ty) = t⁴·(x⁴ + y⁴) + 2 cannot be written as t^k·f(x, y).
1. M is open.
2. The set of points of the set M which are isolated is not empty.
3. M is unbounded.
4. There exists a limit point of the set M which does not belong to M .
Answer:
1. False. It is not known if all the boundary points do not belong to the set.
2. False. Isolated point is a point of the set, for which there exists a neighborhood with no other points
of this set. We do not have enough information to claim if there are some isolated points in the set.
3. False. We do not have enough information to claim this. M can be either bounded or unbounded.
4. True. The boundary points that do not belong to the set are exactly such limit points.
8 Derivatives
Compute the derivatives:
1. y = sin(2x/(1 + x²))
2. y = ln(x + √(1 + x²))
3. y = √( (x−1)(x−2) / ((x−3)(x−4)) )
Answer:
1.
dy/dx = cos(2x/(1+x²)) · [2·(1 + x²) − 2x·2x]/(1 + x²)² = [2(1 − x²)/(1 + x²)²] · cos(2x/(1+x²))

2.
dy/dx = 1/(x + √(1 + x²)) · (1 + (1/2)·(1 + x²)^(−1/2)·2x) = 1/√(1 + x²)
3. The domain of the function y = √((x−1)(x−2)/((x−3)(x−4))) is D = (−∞, 1] ∪ [2, 3) ∪ (4, ∞).
For x ∈ (4, ∞) we can take ln of both sides:

ln y = (1/2)·[ln(x−1) + ln(x−2) − ln(x−3) − ln(x−4)]

Take the derivative of both sides:

y′/y = (1/2)·[1/(x−1) + 1/(x−2) − 1/(x−3) − 1/(x−4)]

y′ = (1/2)·√((x−1)(x−2)/((x−3)(x−4)))·[1/(x−1) + 1/(x−2) − 1/(x−3) − 1/(x−4)]

For x ∈ (−∞, 1] we can rewrite y = √((1−x)(2−x)/((3−x)(4−x))), then do the same as above.
For x ∈ [2, 3) we can rewrite y = √((x−1)(x−2)/((3−x)(4−x))), then do the same as above.
9 Taylor expansion
Consider the function f (x) = ln(x2 + x − 1). Compute the exact value of f at x = 1.1 and compare it to
the third-order Taylor approximation in the neighborhood of x = 1. What is the order of the approximation
error?
Answer:
Mathematics for Economics and Finance (Fall 2022)
Problem Set 7: Calculus
Professor: Roxana Mihet
TA: Madhushree
Due: Nov 4
1 Derivatives
Find the following derivatives:
1. y(x) = (1 + x − x²)/(1 − x + x²)

Answer:
1.
[(1 + x − x²)/(1 − x + x²)]′ = [(1 + x − x²)′·(1 − x + x²) − (1 + x − x²)·(1 − x + x²)′]/(1 − x + x²)²
= [(1 − 2x)(1 − x + x²) − (1 + x − x²)(−1 + 2x)]/(1 − x + x²)²
= (1 − 2x)(1 − x + x² + 1 + x − x²)/(1 − x + x²)²
= 2(1 − 2x)/(1 − x + x²)²
2 Taylor approximation
Consider the function f (x) = ln(x2 + x − 1). Compute the exact value of f at x = 1.1 and compare it to
the third-order Taylor approximation in the neighborhood of x = 1. What is the order of the approximation
error?
Answer:
The error equals 0.001, i.e. about 0.001/0.27 ≈ 0.37% of the solution. Generally, we define the order of the error as the order of the derivative in the Lagrange remainder of the Taylor series. In this example we use a third-order approximation, so the order of the error is four. However, a precise estimation of the error requires an analysis of the remainder term.
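A sketch of the comparison in Python. The derivative values f(1) = 0, f′(1) = 3, f″(1) = −7, f‴(1) = 36 are hand-derived here (they do not appear explicitly in the text above), so treat them as assumptions to re-check:

```python
import math

# f(x) = ln(x^2 + x - 1), expanded around x0 = 1, evaluated at x = 1.1
x0, dx = 1.0, 0.1
exact = math.log(1.1**2 + 1.1 - 1)                 # = ln(1.31) ~ 0.27003
# hand-derived: f(1) = 0, f'(1) = 3, f''(1) = -7, f'''(1) = 36
taylor3 = 0 + 3*dx + (-7/2)*dx**2 + (36/6)*dx**3   # = 0.271

assert abs(taylor3 - 0.271) < 1e-12
assert abs(taylor3 - exact) < 1.5e-3               # error ~ 0.001, as stated in the text
```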
3 Partial derivatives
Assume

f(x, y) = { (x² + y²)·sin(1/(x² + y²)),  x² + y² ≠ 0,
          { 0,                           x² + y² = 0.

1. Show that all first-order partials exist, and check whether they are continuous.

Answer:

f′x(0, 0) = lim_{x→0} [f(x, 0) − f(0, 0)]/x = lim_{x→0} x·sin(1/x²) = 0,
f′y(0, 0) = lim_{y→0} [f(0, y) − f(0, 0)]/y = lim_{y→0} y·sin(1/y²) = 0.

We need to examine lim_{(x,y)→(0,0)} f′x(x, y) and lim_{(x,y)→(0,0)} f′y(x, y). However, when (x, y) approaches (0, 0) along the x-axis,

f′x(x, 0) = 2x·sin(1/x²) − (2/x)·cos(1/x²),

whose limit does not exist. So lim_{(x,y)→(0,0)} f′x(x, y) does not exist ⟹ although all first-order partials exist, they are not continuous at (0, 0).
4 First and second order partial derivatives
Let f : R2 → R be
f(x, y) = { xy(x² − y²)/(x² + y²) if (x, y) ≠ (0, 0),
          { 0 if (x, y) = (0, 0).

1. Find the first and second order partial derivatives with respect to x and y.

Answer: For (x, y) ≠ (0, 0),

∂f/∂x = (yx⁴ + 4y³x² − y⁵)/(x² + y²)²,
∂f/∂y = (−xy⁴ − 4x³y² + x⁵)/(x² + y²)²,
∂²f/∂x² = (12xy⁵ − 4x³y³)/(x² + y²)³,
∂²f/∂y² = (4x³y³ − 12x⁵y)/(x² + y²)³,
∂²f/∂y∂x = ∂²f/∂x∂y = (x⁶ − y⁶ + 9x⁴y² − 9x²y⁴)/(x² + y²)³.

At the origin:

∂f/∂x(0,0) = lim_{h→0} [f(h, 0) − f(0, 0)]/h = 0,
∂f/∂y(0,0) = lim_{h→0} [f(0, h) − f(0, 0)]/h = 0,
∂²f/∂x²(0,0) = lim_{h→0} [∂f/∂x(h, 0) − ∂f/∂x(0, 0)]/h = 0,
∂²f/∂y²(0,0) = lim_{h→0} [∂f/∂y(0, h) − ∂f/∂y(0, 0)]/h = 0,
∂²f/∂y∂x(0,0) = lim_{h→0} [∂f/∂x(0, h) − ∂f/∂x(0, 0)]/h = lim_{h→0} [(−h⁵/h⁴) − 0]/h = −1,
∂²f/∂x∂y(0,0) = lim_{h→0} [∂f/∂y(h, 0) − ∂f/∂y(0, 0)]/h = lim_{h→0} [(h⁵/h⁴) − 0]/h = 1,

so the two mixed partials disagree at the origin.

2. The total differential is given by

df = (∂f/∂x)dx + (∂f/∂y)dy = [(yx⁴ + 4y³x² − y⁵)/(x² + y²)²]dx + [(−xy⁴ − 4x³y² + x⁵)/(x² + y²)²]dy,

and, dividing through by dy,

df/dy = (∂f/∂x)(dx/dy) + ∂f/∂y = [(yx⁴ + 4y³x² − y⁵)/(x² + y²)²](dx/dy) + (−xy⁴ − 4x³y² + x⁵)/(x² + y²)².
4. Let us see the values of the function ∂²f/∂x² along the line y = αx, α ∈ R.
2. What can you say about quasi-concavity and quasi-convexity of this function (without performing
further calculations)?
3. What can you say about strict quasi-concavity and strict quasi-convexity of this function (without
performing further calculations)?
Answer:
1. In order to assess (strict) concavity and convexity, let us compute the Hessian matrix. We have:

H(x, y) = [ −24x²   −10        ]
          [ −10     −60y⁴ − 10 ]

We can now compute the leading principal minors:

|H1| = −24x² < 0 (as x > √(5/12)),
|H2| = 24x²·(60y⁴ + 10) − 100 = 1440x²y⁴ + 10·(24x² − 10) ≥ 0.

We directly see that the second inequality turns into an equality when y = 0 and x = √(5/12), which is different from (0, 0). Thus, the Hessian H(x, y) is neither PD nor ND, and the corresponding function f(x, y) is neither strictly convex nor strictly concave.

Let us thus check arbitrary principal minors:
Order 1: Δ1¹ = −24x² < 0 and Δ1² = −(60y⁴ + 10) < 0;
Order 2: Δ2 = |H2| ≥ 0.
As we have (−1)^k·Δk ≥ 0, H(x, y) is negative semidefinite and the function f(x, y) is concave.
2. Since f(x, y) is concave, it is also quasi-concave (every concave function is quasi-concave).
3. However, we cannot say anything about strict quasi-concavity before performing further computation.
6 Quasi-concavity
Let
f (x) = x3 ,
g(y) = −y.
3. Show that a pseudo-concave function can only achieve its global maximum at a zero gradient point.
For the bordered Hessian we have (−1)·H̄1 = 9x⁴ ≥ 0 and (−1)²·H̄2 = −6x, which can take any sign. So it can not be quasi-concave/quasi-convex.
3. Suppose f is pseudo-concave and Df(x∗) = 0. If f(x∗) is not a global maximum, then ∃ x∗∗ ≠ x∗ s.t. f(x∗∗) > f(x∗). Pseudo-concavity would then require Df(x∗)(x∗∗ − x∗) > 0, but Df(x∗)(x∗∗ − x∗) = 0, a contradiction. Hence x∗ is a global maximum.
Mathematics for Economics and Finance (Fall 2020)
Problem Set 9: Calculus and Probability
Professor: Roxana Mihet
TAs: Oksana, Laura, Remy
Due Nov 13
1 Integration
Compute the following integrals:
1. ∫₁^e x²·ln x dx
2. ∫ cos(2x)·e^(3x) dx
3. ∫₁^e √(ln x) dx + ∫₀¹ e^(x²) dx
4. ∫₂³ ∫ₓ^(2x) (x + 2y) dy dx
5. ∫₀^π ∫_y^π (sin x / x) dx dy
6. ∫ (sin √x / √x) dx
7. ∫ e^x·sin x dx.
Answer:
1. To compute ∫₁^e x²·ln x dx, integrate by parts with u = ln x, du = dx/x, dw = x² dx, w = x³/3:

∫₁^e x²·ln x dx = (x³/3)·ln x |₁^e − ∫₁^e (x³/3)·(1/x) dx = e³/3 − (x³/9)|₁^e = e³/3 − e³/9 + 1/9 = 2e³/9 + 1/9.
2. Integrate by parts twice. First, with u = cos 2x, du = −2·sin(2x) dx, dw = e^(3x) dx, w = e^(3x)/3:

I = ∫ cos(2x)·e^(3x) dx = (e^(3x)/3)·cos(2x) + (2/3)·∫ sin(2x)·e^(3x) dx.

Then, with u = sin 2x, du = 2·cos(2x) dx, dw = e^(3x) dx, w = e^(3x)/3:

∫ sin(2x)·e^(3x) dx = (e^(3x)/3)·sin(2x) − (2/3)·∫ cos(2x)·e^(3x) dx = (e^(3x)/3)·sin(2x) − (2/3)·I.

Hence

I = (e^(3x)/3)·cos(2x) + (2/3)·[(e^(3x)/3)·sin(2x) − (2/3)·I]  ⇒  I = [(3/13)·cos(2x) + (2/13)·sin(2x)]·e^(3x) + C.
3. Introducing the substitution √(ln x) = y (so y² = ln x, x = e^(y²), dx = 2y·e^(y²) dy; y = 0 when x = 1 and y = 1 when x = e), one can compute the integral:

∫₁^e √(ln x) dx = ∫₀¹ y·2y·e^(y²) dy = ∫₀¹ y d(e^(y²)) = [y·e^(y²)]₀¹ − ∫₀¹ e^(y²) dy = e − ∫₀¹ e^(x²) dx.

Therefore

∫₁^e √(ln x) dx + ∫₀¹ e^(x²) dx = e.

4.
∫₂³ ∫ₓ^(2x) (x + 2y) dy dx = ∫₂³ [xy + y²]ₓ^(2x) dx = ∫₂³ 4x² dx = [4x³/3]₂³ = 76/3.

5.
∫₀^π ∫_y^π (sin x / x) dx dy = ∫₀^π ∫₀^x (sin x / x) dy dx = ∫₀^π (sin x / x)·x dx = ∫₀^π sin x dx = [−cos x]₀^π = 2.

6. Substitute u = √x, so du = dx/(2√x) ⇒ dx = 2√x du = 2u du:

∫ (sin √x / √x) dx = ∫ (sin u / u)·2u du = 2·∫ sin u du = −2·cos u + C = −2·cos √x + C.

7. Using integration by parts, ∫ f(x)g′(x) dx = f(x)g(x) − ∫ f′(x)g(x) dx, with f = sin x, g′ = e^x (so f′ = cos x, g = e^x):

∫ e^x·sin x dx = e^x·sin x − ∫ e^x·cos x dx = e^x·sin x − A,

where, with f = cos x, g′ = e^x (so f′ = −sin x, g = e^x),

A = ∫ e^x·cos x dx = e^x·cos x + ∫ e^x·sin x dx.

So ∫ e^x·sin x dx = e^x·sin x − e^x·cos x − ∫ e^x·sin x dx  ⇒  ∫ e^x·sin x dx = e^x·(sin x − cos x)/2 + C.
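Two of the results above can be sanity-checked numerically with a small composite Simpson rule (a sketch; Simpson is exact for the polynomial integrand of integral 4):

```python
import math

def simpson(f, a, b, n=20):
    """Composite Simpson rule on [a, b] with an even number of panels n."""
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, n))
    return s * h / 3

# Integral 4: (x + 2y), y from x to 2x, then x from 2 to 3 -> 76/3
inner = lambda x: simpson(lambda y: x + 2 * y, x, 2 * x)
assert abs(simpson(inner, 2, 3) - 76 / 3) < 1e-9

# Integral 7: the antiderivative e^x (sin x - cos x)/2 differentiates back to e^x sin x
F = lambda x: math.exp(x) * (math.sin(x) - math.cos(x)) / 2
x, h = 0.7, 1e-6
assert abs((F(x + h) - F(x - h)) / (2 * h) - math.exp(x) * math.sin(x)) < 1e-6
```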
2 Bounds of integration
Compute the following integrals and see what happens when you change the order of integration
1. ∫₀¹ ∫₀¹ (1 − x² − y²) dx dy;
2. ∫₀¹ ∫ₓ^(√x) (e^y / y) dy dx.

1. ∫_{y=0}^{1} ∫_{x=0}^{1} (1 − x² − y²) dx dy.
Here, the region over which we want to integrate depends neither on x nor on y. In other words, we want to integrate the function f(x, y) = 1 − x² − y² over the square of area 1, given by 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1. So, setting the bounds of integration is easy because, no matter what x I take, y still goes from 0 to 1.
Every double integral can be computed in two steps:
i) Solve the inner integral, treating the outer variable as a constant:

∫_{x=0}^{1} (1 − x² − y²) dx = [x − x³/3 − y²x]₀¹ = 2/3 − y².

ii) Take the result obtained in the point i) and plug it into the outer integral:

∫_{y=0}^{1} (2/3 − y²) dy = [2y/3 − y³/3]₀¹ = 2/3 − 1/3 = 1/3.
2. ∫₀¹ ∫ₓ^(√x) (e^y / y) dy dx.
Here the problem is more complicated. Indeed, the bounds of integration involve x. To understand and set the region of integration it is better to view some pictures.
i) First we draw the bounds of integration in the x–y plane. The region over which we integrate is the area between the two curves, y = x and y = √x. See Figure 1.

[Figure 1: the region between y = x and y = √x for 0 ≤ x ≤ 1.]

ii) We fix a value x0 of the outer variable x. See Figure 2.

[Figure 2: a vertical slice of the region at x = x0.]

iii) We check where the slice of the inner variable y goes: from y = x up to y = √x. See Figure 3.

[Figure 3: the slice of the inner variable y goes from y = x to y = √x.]

iv) We let the outer variable x vary from its first value, x = 0, to its last value, x = 1. In this way we sum up all the slices built in points ii) and iii). See Figure 4.

[Figure 4: sweeping the slices from the first one at x = 0 to the last one at x = 1.]
Once the bounds of integration are set, we integrate the function. We start with the inner integral.
Z √
y= x y
e
dy
y=x y
We notice that there is no closed form solution for this integral. Hence we must change the order of
integration. To do this, we do again the steps made before, but changing the role of x and y. See
Figure 5,6 and 7.
With the roles of x and y exchanged, the integral becomes

∫_{y=0}^{1} ∫_{x=y²}^{x=y} (e^y / y) dx dy.

[Figure 5: a horizontal slice of the region at y = y0.]

[Figure 6: the slice of the inner variable x goes from x = y² to x = y.]

Note: you may ask why we are going from x = y² to x = y and not the other way around. This is because, in the previous case, we were going from the smaller value of y (that is, y = x) to the bigger value of y (that is, y = √x). Here, we must keep the same order. So, we go from the smaller value of x (that is, x = y²) to the bigger value of x (that is, x = y).

Now the inner integral has a closed form:

∫_{y=0}^{1} ∫_{x=y²}^{y} (e^y / y) dx dy = ∫₀¹ (e^y / y)·(y − y²) dy = ∫₀¹ (e^y − y·e^y) dy
= [e^y]₀¹ − {[y·e^y]₀¹ − [e^y]₀¹} = [2e^y − y·e^y]₀¹ = (2e − e) − (2 − 0) = e − 2.

[Figure 7: sweeping the horizontal slices from the first one at y = 0 to the last one at y = 1.]
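The value e − 2 can be confirmed numerically: after swapping the order, the inner integral collapses to e^y(1 − y), leaving a one-dimensional integral (a sketch):

```python
import math

def simpson(f, a, b, n=2000):
    # composite Simpson rule with an even number of panels n
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, n))
    return s * h / 3

# Inner integral over x contributes (e^y / y) * (y - y^2) = e^y * (1 - y)
value = simpson(lambda y: math.exp(y) * (1 - y), 0, 1)
assert abs(value - (math.e - 2)) < 1e-9
```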
3 Probability
Let A and B be two independent events.
3) Compute P(A | B)
Answer
1)
P(Aᶜ ∩ Bᶜ) = 1 − P(A ∪ B)
= 1 − (P(A) + P(B) − P(A ∩ B))
= 1 − P(A) − P(B) + P(A)P(B)
= (1 − P(A))(1 − P(B))
= P(Aᶜ)P(Bᶜ)
2)
P(A ∩ B) P(A)P(B)
P(A | B) = = = P(A)
P(B) P(B)
Mathematics for Economics and Finance (Fall 2022)
Problem Set 10: Statistics and Probability
Professor: Roxana Mihet
TA: Madhushree
Due Nov 25
1 Random variable
Let's assume the random variable Xt is distributed as follows ∀t = 1...T:

P[Xt = x] = { 1/2 if x = 1,
            { 1/2 if x = −1.

Define

Wt = Σ_{i=1}^{t} Xi.
Answer:
E[Xt] = (1/2)·1 + (1/2)·(−1) = 0,
V[Xt] = E[Xt²] − E[Xt]²;  E[Xt²] = (1/2)·1² + (1/2)·(−1)² = 1  ⇒  V[Xt] = 1 − 0² = 1.

With this, we can compute the expectation and the variance of Wt (using independence of the Xi for the variance):

E[Wt] = E[Σ_{i=1}^{t} Xi] = Σ_{i=1}^{t} E[Xi] = 0,
V[Wt] = V[Σ_{i=1}^{t} Xi] = Σ_{i=1}^{t} V[Xi] = t.
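A quick Monte Carlo sketch of the random walk (seeded for reproducibility; the tolerances are loose statistical bounds, not exact values):

```python
import random

# W_t is a sum of t i.i.d. +/-1 steps, so E[W_t] = 0 and V[W_t] = t
random.seed(0)                    # fixed seed for reproducibility
t, reps = 10, 20000
walks = [sum(random.choice([-1, 1]) for _ in range(t)) for _ in range(reps)]

mean = sum(walks) / reps
var = sum((w - mean) ** 2 for w in walks) / reps

assert abs(mean) < 0.2            # close to E[W_t] = 0
assert abs(var - t) < 1.2         # close to V[W_t] = t
```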
2 Coin problem
There are exactly 64350 gold coins in a chest. One of them is drawn randomly and tossed 5 times. Each
time the outcome is "heads".
2. Now it appears that one of the coins was counterfeit : it had "heads" on both sides. What is the
probability that the coin that was drawn and tossed initially was actually the counterfeit ?
3. Later on you learn that there were another 29 counterfeits among those in the chest, all of which had
"tails" on both sides. Recalculate the probability that the coin tossed was the one with "heads" on
both sides.
Answer:
1. For each toss the probability of the outcome "heads" is P("heads") = 1/2. Outcomes of consecutive tosses are independent events ⟹ the probability of 5 consecutive "heads" is P(5 "heads") = 1/2⁵.

2. P(counterfeit) = 1/N = 1/64350 and P(5 "heads" | counterfeit) = 1 (obviously). Therefore, by Bayes' rule,

P = (1/64350)·1 / [ (1/64350)·1 + (1 − 1/64350)·(1/2⁵) ] = 1/2011.9 ≈ 1/2012.

3. Bayes's formula adjusted for part (c) is as follows:

P = (1/64350)·1 / [ (1/64350)·1 + (29/64350)·0 + (1 − 30/64350)·(1/2⁵) ] = 1/2011.
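The two posterior probabilities can be reproduced in a few lines (a sketch following the numbers above):

```python
# Bayes' rule for the counterfeit-coin posterior
N = 64350
p_heads5_fair = 1 / 2**5

# Part 2: one double-headed counterfeit in the chest
post2 = (1/N) / ((1/N) * 1 + (1 - 1/N) * p_heads5_fair)
assert abs(1/post2 - 2011.90625) < 1e-6      # ~ 1/2012

# Part 3: additionally 29 double-tailed counterfeits (they can never show 5 heads)
post3 = (1/N) / ((1/N) * 1 + (29/N) * 0 + (1 - 30/N) * p_heads5_fair)
assert abs(1/post3 - 2011) < 1e-6
```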
3 Probability events
For events A, B, C, C1 , C2 such that 0 < P (C) < 1 which of the following statements are true?
3. P (A |C ) ≥ P (B |C ) , P (A |C c ) ≥ P (B |C c ) =⇒ P (A) ≥ P (B)
Answer:
3. True. By the law of total probability,
P(A) = P(A|C)P(C) + P(A|Cᶜ)P(Cᶜ) ≥ P(B|C)P(C) + P(B|Cᶜ)P(Cᶜ) = P(B).
4 Probability measure
Given a complete probability space (Ω, F, P), prove that the conditional probability defined by

P(A|B) = P(A ∩ B)/P(B)

(for a fixed B ∈ F with P(B) > 0) is itself a probability measure.
Answer: From the lecture notes we know that a probability measure is defined by the three axioms of
Kolmogorov, i.e
1. P (∅) = 0,
2. P (Ω) = 1,
S∞ P∞
3. P ( i=1 Ai ) = i=1 P (Ai ) for disjoint sets A1 , A2 , . . . ∈ F.
Thus, we have just to check whether the conditional probability measure satisfies all three conditions. In
order to show that all conditions are in fact true we fix the event B and just write Q (A) := P ( A| B) as
short-hand notation. Assume that all axioms are fulfilled for unconditional probabilities, like P (A) for any
set A ∈ F.
Ad 1.
Q(∅) = P(∅ ∩ B)/P(B) = P(∅)/P(B) = 0.
Ad 2.
Q(Ω) = P(Ω ∩ B)/P(B) = P(B)/P(B) = 1.
Ad 3.
Q(∪_{i=1}^{∞} Ai) = P((∪_{i=1}^{∞} Ai) ∩ B)/P(B) = P(∪_{i=1}^{∞} (B ∩ Ai))/P(B) = Σ_{i=1}^{∞} P(B ∩ Ai)/P(B) = Σ_{i=1}^{∞} Q(Ai),

where the third equality uses that the sets B ∩ Ai are disjoint.
         X
         1      2      3
    2   1/12   1/6    1/12
Y   3   1/6    0      1/6
    4   0      1/3    0

Are X and Y dependent random variables? Explain your answer. What is the marginal distribution of X and Y, respectively?

P(X = 1) = 1/4,  P(X = 2) = 1/2,  P(X = 3) = 1/4,
P(Y = 2) = 1/3,  P(Y = 3) = 1/3,  P(Y = 4) = 1/3.

P(X = 1)·P(Y = 2) = (1/4)·(1/3) = 1/12 = P(X = 1, Y = 2), but
P(X = 1)·P(Y = 3) = (1/4)·(1/3) = 1/12 ≠ 1/6 = P(X = 1, Y = 3),
P(X = 1)·P(Y = 4) = (1/4)·(1/3) = 1/12 ≠ 0 = P(X = 1, Y = 4).

So, X and Y are dependent random variables.
Marginal distribution of X:
x < 1:       FX(x) = 0
1 ≤ x < 2:   FX(x) = 1/12 + 1/6 = 1/4
2 ≤ x < 3:   FX(x) = 1/4 + 1/6 + 1/3 = 3/4
x ≥ 3:       FX(x) = 3/4 + 1/12 + 1/6 = 1

Marginal distribution of Y:
y < 2:       FY(y) = 0
2 ≤ y < 3:   FY(y) = 1/12 + 1/6 + 1/12 = 1/3
3 ≤ y < 4:   FY(y) = 1/3 + 1/6 + 1/6 = 2/3
y ≥ 4:       FY(y) = 2/3 + 1/3 = 1
6 Statistics for finance
Standard quantitative models of the stock market assume that stock returns follow a log-normal distribution.
If log X ∼ N (µ, σ 2 ), 0 < X < ∞, −∞ < µ < ∞, σ 2 > 0.
Answer:
For pdf:
2. Writing Y = log X ∼ N(µ, σ²),

E(X) = E(e^(log X)) = E(e^Y) = ∫ e^y·(1/(√(2π)σ))·e^(−(y−µ)²/(2σ²)) dy = ∫ (1/(√(2π)σ))·e^(−[(y−µ)² − 2σ²y]/(2σ²)) dy.

Completing the square, (y − µ)² − 2σ²y = (y − µ − σ²)² − σ⁴ − 2σ²µ, so

E(X) = ∫ (1/(√(2π)σ))·e^(−(y−µ−σ²)²/(2σ²))·e^(σ²/2 + µ) dy = e^(µ + σ²/2)·∫ (1/(√(2π)σ))·e^(−(y−µ−σ²)²/(2σ²)) dy = e^(µ + σ²/2).

Var(X) = E(X²) − (EX)² = E(e^(2·log X)) − (EX)² = E(e^(2Y)) − (EX)² = e^(2µ + 2σ²) − e^(2µ + σ²).
7 MGF
1. Verify that the MGF of a N(µ, σ²) is ΦX(u) = e^(uµ + σ²u²/2).
Answer:
1.
ΦX(u) = E(e^(uX)) = ∫_{−∞}^{+∞} (1/(√(2π)σ))·e^(ux)·e^(−(x−µ)²/(2σ²)) dx = ∫_{−∞}^{+∞} (1/(√(2π)σ))·e^(−[x² + µ² − 2xµ − 2σ²ux]/(2σ²)) dx

= ∫_{−∞}^{+∞} (1/(√(2π)σ))·e^(−[x² − 2x(µ+σ²u) + (µ+σ²u)² − (µ+σ²u)² + µ²]/(2σ²)) dx = ∫_{−∞}^{+∞} (1/(√(2π)σ))·e^(−(x−(µ+σ²u))²/(2σ²) + µu + σ²u²/2) dx

= e^(µu + σ²u²/2)·∫_{−∞}^{+∞} (1/(√(2π)σ))·e^(−(x−(µ+σ²u))²/(2σ²)) dx = e^(µu + σ²u²/2),

since the last integral equals 1 (integral of the pdf of N(µ + σ²u, σ²)).

2. EX = ∂ΦX(u)/∂u |_{u=0} = ΦX(u)·(µ + σ²u)|_{u=0} = ΦX(0)·µ = µ, using ΦX(0) = 1.

E(X²) = ∂²ΦX(u)/∂u² |_{u=0} = [ΦX(u)·(µ + σ²u)² + σ²·ΦX(u)]|_{u=0} = µ² + σ²,

VX = E(X²) − (EX)² = µ² + σ² − µ² = σ².
Mathematics for Economics and Finance (Fall 2022)
Problem Set 11: Statistics and Probability
Professor: Roxana Mihet
TA: Madhushree
Due Dec 2nd
Answer:
1. ∫_a^b pdf(t) dt = ∫_{−∞}^0 0 dt + ∫_0^x λ·e^(−λt) dt = 1 − e^(−λx). Setting

1 − e^(−λx) = 0.05

gives x = −ln(0.95)/λ.
2. ∫_{−∞}^x (1/√(2πσ²))·e^(−(t−µ)²/(2σ²)) dt = Φ((x − µ)/σ). Setting

Φ((x − µ)/σ) = 0.05,

in the table of the standard normal distribution we find the closest corresponding argument (x − µ)/σ ≈ −1.64, so x ≈ −1.64σ + µ ≈ −1.64σ.
Pay attention that if you discuss a potential loss (a positive amount which you lose), then in α% worst
cases it will be greater than or equal to VaR. Thus, VaR is a minimum potential loss.
[Figure: distribution of losses; the α% worst cases lie to the right of VaR.]

If you consider a potential return (a positive or, more interesting for us, negative amount which you gain), then in α% worst cases it will be less than or equal to VaR. Thus, VaR will be a maximal potential return.

[Figure: distribution of returns; the α% worst cases lie to the left of VaR.]
3. ∫_a^b pdf(t) dt = ∫_x^∞ λ·e^(−λt) dt = 0 + e^(−λx). Setting e^(−λx) = 0.05 gives x = −ln(0.05)/λ.

4. ∫_x^∞ (1/√(2πσ²))·e^(−(t−µ)²/(2σ²)) dt = 1 − Φ((x − µ)/σ). Setting

1 − Φ((x − µ)/σ) = 0.05, i.e. Φ((x − µ)/σ) = 0.95,

in the table of the standard normal distribution we find the closest corresponding argument (x − µ)/σ ≈ 1.64, so x ≈ 1.64σ + µ ≈ 1.64σ.
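The quantiles used above can be checked with the Python standard library (a sketch; `lam` below is a hypothetical rate parameter chosen only for illustration):

```python
import math
from statistics import NormalDist

# Standard normal 95% quantile: the table value 1.64, more precisely ~1.6449
z = NormalDist().inv_cdf(0.95)
assert abs(z - 1.6449) < 1e-3

# Exponential tail: x solving P(X > x) = e^{-lam * x} = 0.05
lam = 2.0                            # hypothetical rate parameter
x = -math.log(0.05) / lam
assert abs(math.exp(-lam * x) - 0.05) < 1e-12
```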
Answer: Note that the cdf of the random variable Z is a mixture of a continuous and a step function; thus it is neither continuous nor discrete. Nevertheless, for the continuous part the pdf is given by

fX(x) = { 1/2, 0 < x ≤ 1;
        { 0, otherwise.

Additionally, we have an atom at x = 0 with probability P(X = 0) = 1/2. After we have found the pdf and identified the atom, it is not too difficult to obtain the expectation and the variance. We have

E(X) = (1/2)·0 + ∫₀¹ (1/2)·x dx = 1/4.
Var(X) = (1/2)·0² + ∫₀¹ (1/2)·x² dx − (1/4)² = 1/6 − 1/16 = 5/48.
Answer:
1. fX(x) ≥ 0 and ∫₁^∞ (2/x³) dx = 1.
2. E(X) = ∫₁^∞ x·(2/x³) dx = 2, Var(X) = ∞.
4 True or false?
You have two random variables, X and Y . Consider three statements:
A = ‘The expectation of the product of X and Y is equal to zero. That is, E[XY ] = 0.’
B = ‘The correlation between X and Y is zero.’
C = ‘X and Y are independent.’
For the following statements, state whether they are true or false and explain why:
1. ¬B ⟹ ¬A
2. A ⟹ B
3. C ⟹ B
4. C ⟹ A
Answer:
1. False. We have Cov(X, Y) = E(XY) − E(X)E(Y) and Corr(X, Y) = Cov(X, Y)/(σX·σY). If Corr(X, Y) ≠ 0 then Cov(X, Y) ≠ 0, but E(XY) may still be equal to 0.
2. False. If E(XY ) = 0, it may be that E(X)E(Y ) 6= 0 and thus both covariance and correlation between
X and Y are different from 0.
3. True. If X and Y are independent, we have that E(XY ) = E(X)E(Y ) and Cov(X; Y ) = 0, leading
to Corr(X; Y ) = 0.
4. False. If X and Y are independent, we have that E(XY ) = E(X)E(Y ), but it does not mean that
E(XY ) = 0.
f_{Y|X}(y|x) = (2y + 4x)/(1 + 4x).
Answer:
1. From Definition 4.18 in the lecture notes we know that the conditional density of a random variable Y given another random variable X = x is

f_{Y|X}(y|x) = { f_{X,Y}(x, y)/fX(x), if fX(x) > 0;
              { 0, otherwise.

∀x ∈ (0, 1), fX(x) = (1 + 4x)/3 > 0; therefore, the joint density of X, Y is

f_{XY}(x, y) = f_{Y|X}(y|x)·fX(x) = [(2y + 4x)/(1 + 4x)]·[(1 + 4x)/3] = (2y + 4x)/3.
2. The marginal density of Y is fY(y) = ∫₀¹ f_{XY}(x, y) dx = (2y + 2)/3, so the corresponding marginal distribution of Y is given by

FY(y) = P(Y ≤ y) = ∫₀^y fY(b) db = ∫₀^y (2b + 2)/3 db = (y² + 2y)/3.

3. Finally, as ∀y ∈ (0, 1), fY(y) = (2y + 2)/3 > 0, the conditional density of X given Y is

f_{X|Y}(x|y) = f_{X,Y}(x, y)/fY(y) = [(2y + 4x)/3]/[(2y + 2)/3] = (y + 2x)/(y + 1).
6 Conditional Moments
For random variables X, Y ∼ N(0, σ²) determine
1. E(X | X²)
2. E(X | XY)
3. E(X + Y | X² + Y²).
Answer:
1. Since the normal distribution is symmetric around zero, knowing X² = x one can immediately tell that X = √x with probability 1/2 or X = −√x with probability 1/2. Thus E(X | X² = x) = (1/2) · √x + (1/2) · (−√x) = 0.
Another way to show this is to notice that −X ∼ N(0, σ²), hence E(X | X²) = E(−X | (−X)²) = E(−X | X²) = −E(X | X²) ⟹ E(X | X²) = 0.
2. One may note that −X, −Y ∼ N(0, σ²), and (−X, −Y) has the same distribution as (X, Y) while (−X)(−Y) = XY. Thus E(X | XY) = E(−X | XY) = −E(X | XY) ⟺ E(X | XY) = 0.
3. Applying the same idea, E(X + Y | X² + Y²) = E(−X − Y | (−X)² + (−Y)²) = −E(X + Y | X² + Y²) ⟹ E(X + Y | X² + Y²) = 0.
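These symmetry arguments can be sanity-checked through the orthogonality property of conditional expectations: E(X | Z) = 0 implies E[X · g(Z)] = 0 for any bounded g. The Monte Carlo sketch below is illustrative only (it takes σ = 1 and X, Y independent, and checks one arbitrary g per case, which is a necessary condition rather than a proof):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
x = rng.normal(0.0, 1.0, n)  # sigma = 1, an illustrative choice
y = rng.normal(0.0, 1.0, n)

# E(X | X^2) = 0: check E[X * g(X^2)] = 0 with g = sign(. - 1)
m1 = np.mean(x * np.sign(x**2 - 1))
# E(X | XY) = 0: check E[X * g(XY)] = 0 with g = sign
m2 = np.mean(x * np.sign(x * y))
# E(X + Y | X^2 + Y^2) = 0: check with g = sign(. - 2)
m3 = np.mean((x + y) * np.sign(x**2 + y**2 - 2))

print(m1, m2, m3)  # all close to 0
```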
Mathematics for Economics and Finance (Fall 2022)
Problem Set 12: Statistics and Static Optimization
Professor: Roxana Mihet
TA: Samy
Due Dec 9
estimator of θ: θ̂ = X̄.
2. Show that it is not an efficient estimator of θ. Hint: compute the Cramér-Rao lower bound.
Answer:
1. Unbiased: E θ̂ = E X̄ = E(Σᵢ₌₁ⁿ Xᵢ / n) = nEX/n = EX = θ.
Consistent: lim_{n→∞} Var θ̂ = lim_{n→∞} Var X̄ = lim_{n→∞} Var(Σᵢ₌₁ⁿ Xᵢ)/n² = lim_{n→∞} Var X / n = lim_{n→∞} π²/(3n) = 0.

2. We have −∂²/∂θ² ln f(x; θ) = 2f(x; θ) (obtained by direct computation!). Thus

I(θ) = E[−∂²/∂θ² ln f(X; θ)] = 2E f(X; θ) = 2∫_{−∞}^{∞} f(x; θ) · f(x; θ) dx = 2∫_{−∞}^{∞} [e^{−x+θ}/(1 + e^{−x+θ})²]² dx.

Substituting t = e^{−x+θ} (so dt = −e^{−x+θ} dx, with t running from ∞ down to 0):

I(θ) = 2∫₀^{∞} t/(1 + t)⁴ dt.

Integrating by parts with u = t and dv = (1 + t)^{−4} dt (so v = −(1 + t)^{−3}/3):

I(θ) = 2[−t/(3(1 + t)³)]₀^∞ + (2/3)∫₀^∞ (1 + t)^{−3} dt = 0 + (2/3) · [−(1 + t)^{−2}/2]₀^∞ = 1/3.

The lower bound on the variance of all unbiased estimators of θ equals (nI(θ))^{−1} = 3/n. The estimator θ̂ = X̄ is efficient if its variance equals the Cramér-Rao lower bound. However, Var θ̂ = π²/(3n) > 3/n; thus θ̂ is not an efficient estimator of θ.

Remark: Note that the lower bound might not be achievable, i.e. the efficient estimator might not exist. Also note that, strictly speaking, from the above analysis we cannot conclude anything about the optimality of θ̂ within the class of all unbiased estimators.
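The density above is the logistic density with location θ and scale 1, so Var X = π²/3. A quick Monte Carlo sketch (θ, n, and the number of replications are illustrative choices) confirms that Var X̄ sits strictly above the Cramér-Rao bound 3/n:

```python
import numpy as np

# X_i i.i.d. logistic with location theta and scale 1, so Var(X) = pi^2/3
# and Var(X_bar) = pi^2/(3n) > 3/n, the Cramer-Rao lower bound.
rng = np.random.default_rng(42)
theta, n, reps = 2.0, 50, 20_000

samples = rng.logistic(loc=theta, scale=1.0, size=(reps, n))
theta_hat = samples.mean(axis=1)

print(theta_hat.mean())    # ~ theta: unbiased
print(theta_hat.var())     # ~ pi^2/(3n) ≈ 0.0658
print(np.pi**2 / (3 * n))  # theoretical variance of X_bar
print(3 / n)               # Cramer-Rao lower bound: 0.06
```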
2 Unbiased estimator
Let X₁, ..., Xₙ be i.i.d. N(µ, σ²).
4. Is σ̂² a biased estimator? What is the variance of σ̂²?
5. Show that σ̂² has a smaller MSE than S². Explain why.
Answer:
1.

E X̄ = E((1/n) Σᵢ₌₁ⁿ Xᵢ) = (1/n) · n · EX₁ = µ;

Var(X̄) = Var((1/n) Σᵢ₌₁ⁿ Xᵢ) = (1/n²) Var(Σᵢ₌₁ⁿ Xᵢ) = (1/n²) · n · Var(X₁) = σ²/n;

E(S²) = E{(1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)²}
= (1/(n−1)) E{Σᵢ₌₁ⁿ (Xᵢ² − 2Xᵢ X̄ + X̄²)}
= (1/(n−1)) {Σᵢ₌₁ⁿ E(Xᵢ²) − 2E(X̄ Σᵢ₌₁ⁿ Xᵢ) + Σᵢ₌₁ⁿ E(X̄²)}   (using Σᵢ₌₁ⁿ Xᵢ = nX̄)
= (1/(n−1)) {nE(X₁²) − 2nE(X̄²) + nE(X̄²)}
= (1/(n−1)) {n(σ² + µ²) − n(σ²/n + µ²)}
= σ².
2. Since (n − 1)S²/σ² ∼ χ²_{n−1}, we get

Var((n − 1)S²/σ²) = 2(n − 1) ⇒ ((n − 1)²/σ⁴) Var(S²) = 2(n − 1) ⇒ Var(S²) = 2σ⁴/(n − 1).
3. Compute the MLE σ̂²:

L(µ, σ² | X) = (2πσ²)^{−n/2} exp{−(1/2) Σᵢ₌₁ⁿ (Xᵢ − µ)²/σ²};

ln L = −(n/2) ln 2π − (n/2) ln σ² − (1/2) Σᵢ₌₁ⁿ (Xᵢ − µ)²/σ².

FOC:

∂ln L/∂µ = (1/σ²) Σᵢ₌₁ⁿ (Xᵢ − µ) = 0, (1)

∂ln L/∂σ² = −n/(2σ²) + (1/(2σ⁴)) Σᵢ₌₁ⁿ (Xᵢ − µ)² = 0. (2)

From (1), µ̂ = X̄; then from (2), σ̂² = (1/n) Σᵢ₌₁ⁿ (Xᵢ − X̄)² = ((n − 1)/n) S².
4. E σ̂² = E(((n − 1)/n) S²) = ((n − 1)/n) σ² (clearly, it is biased).
Bias: E(σ̂² − σ²) = ((n − 1)/n) σ² − σ² = −σ²/n. (Note: as n → ∞, the bias → 0.)
Var(σ̂²) = Var(((n − 1)/n) S²) = ((n − 1)/n)² Var(S²) = 2(n − 1)σ⁴/n².
5. Since MSE = Var + Bias², we get MSE(σ̂²) = 2(n − 1)σ⁴/n² + σ⁴/n² = ((2n − 1)/n²) σ⁴, while MSE(S²) = Var(S²) = 2σ⁴/(n − 1). Hence

((2n − 1)/n²) σ⁴ < (2/(n − 1)) σ⁴ ⇒ MSE(σ̂²) < MSE(S²).

The MLE accepts a small bias in exchange for a lower variance, and the variance reduction dominates the squared bias.
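The comparison can be checked by Monte Carlo. The sketch below uses illustrative parameter values (µ = 0, σ² = 1, n = 10) and matches the simulated MSEs against the closed forms derived above:

```python
import numpy as np

# Simulated MSEs vs. the closed forms (2n-1)/n^2 and 2/(n-1) (sigma^4 = 1).
rng = np.random.default_rng(7)
n, reps, sigma2 = 10, 200_000, 1.0

x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2 = x.var(axis=1, ddof=1)        # S^2 (unbiased)
sig2_hat = x.var(axis=1, ddof=0)  # MLE sigma_hat^2 = ((n-1)/n) S^2

mse_s2 = np.mean((s2 - sigma2)**2)
mse_hat = np.mean((sig2_hat - sigma2)**2)

print(mse_hat, (2*n - 1) / n**2)  # ~ 0.19
print(mse_s2, 2 / (n - 1))        # ~ 0.2222
```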
3 Markov Chains
Let πij = Pr(z′ = zj | z = zi) be the transition probability of an economic system to move to state zj if the previous state was zi. The stochastic process z forms a Markov chain. Consider the matrix:
" #
π11 π12
Π= .
π21 π22
2. Assume the probability distribution over the states today is pt = (pt1 , pt2 ). What is the probability
distribution over the state in period t + 1 and t + 2, respectively?
4. A stationary distribution is a vector (q, 1 − q) with 0 ≤ q ≤ 1 such that the probability distribution
over the state in period t + 1 equals the probability distribution over the state in period t. Show that
q solves (2 − π22 − π11 )q = (1 − π22 ).
2. p^{t+1} = p^t Π and p^{t+2} = p^t Π²; in general, p^{t+k} = p^t Π^k.
4. Using π12 = 1 − π11 and π21 = 1 − π22 (the rows of Π sum to 1):
(q, 1 − q) = (q, 1 − q)Π = (qπ11 + (1 − q)(1 − π22), q(1 − π11) + (1 − q)π22) ⇔ q = qπ11 + (1 − q)(1 − π22) and 1 − q = q(1 − π11) + (1 − q)π22 ⇔ q(2 − π22 − π11) = 1 − π22.
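As a numeric sanity check (the transition probabilities below are illustrative, not from the problem), the formula for q can be compared against iterating the chain:

```python
import numpy as np

# Illustrative probabilities pi11 = 0.9, pi22 = 0.8; rows of Pi sum to 1.
pi11, pi22 = 0.9, 0.8
Pi = np.array([[pi11, 1 - pi11],
               [1 - pi22, pi22]])

# Stationary q from q(2 - pi22 - pi11) = 1 - pi22:
q = (1 - pi22) / (2 - pi22 - pi11)
p_stat = np.array([q, 1 - q])

print(p_stat @ Pi)  # equals p_stat: the distribution is stationary
p0 = np.array([1.0, 0.0])
print(p0 @ np.linalg.matrix_power(Pi, 100))  # iterates converge to p_stat
```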
Answer:
The FOC:

∂ln L/∂θ₁ = (X̄ − θ₁) + (1/4)(Z̄ − θ₁ − θ₂) = 0 ⇒ 4X̄ + Z̄ = 5θ₁ + θ₂,

∂ln L/∂θ₂ = (1/2)(Ȳ − θ₂) + (1/4)(Z̄ − θ₁ − θ₂) = 0 ⇒ 2Ȳ + Z̄ = θ₁ + 3θ₂.

Solving this linear system:

θ̂₁ = (6X̄ − Ȳ + Z̄)/7,
θ̂₂ = (−2X̄ + 5Ȳ + 2Z̄)/7.
∂ln f(θ₁, θ₂)/∂θ = ∂/∂θ [ c − (1/2)(X − θ₁)² − (1/4)(Y − θ₂)² − (1/8)(Z − θ₁ − θ₂)² ]   (where c is a constant)
=
[ (X − θ₁) + (1/4)(Z − θ₁ − θ₂) ]
[ (1/2)(Y − θ₂) + (1/4)(Z − θ₁ − θ₂) ].
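The two first-order conditions form the linear system [5 1; 1 3](θ₁, θ₂)′ = (4X̄ + Z̄, 2Ȳ + Z̄)′. A quick check (with arbitrary illustrative sample means) that the closed-form solutions match a numeric solve:

```python
import numpy as np

# Illustrative sample means; any values work since the system is linear.
Xbar, Ybar, Zbar = 1.3, -0.4, 2.1

A = np.array([[5.0, 1.0], [1.0, 3.0]])
b = np.array([4*Xbar + Zbar, 2*Ybar + Zbar])
theta = np.linalg.solve(A, b)

theta1_hat = (6*Xbar - Ybar + Zbar) / 7
theta2_hat = (-2*Xbar + 5*Ybar + 2*Zbar) / 7
print(theta, (theta1_hat, theta2_hat))  # the two agree
```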
If DD can secure financing, the expected net present value to DD, N P VDD , of the exploration project is
V minus the underwriting fees paid to BB, X. BB, in turn, earns an expected net present value, N P VBB ,
equal to the fees. BB incurs no costs.
DD and BB now negotiate over how to split the surplus. They decide to play a Nash bargaining game.
The bargaining power of DD is α ∈ [0, 1] and the bargaining power of BB is 1 − α. The equilibrium in the
game is determined by maximizing the generalized Nash product S, where
1. State the optimization problem formally. What are the natural constraints on X implied by the
participation constraints?
2. What is the equilibrium fee X ∗ charged by BB? How does it depend on α? Interpret.
3. What are the expected net present values N P VDD and, respectively, N P VBB ? How do they depend
on α? Interpret.
You are an analyst at BB in the credit risk department. One day, your supervisor asks you to analyze
the profitabilities of all the projects that BB has financed during the last year. For simplicity, assume the
bargaining power of BB last year was 50%. For the purpose of your empirical study, you are given a sample
of past fees, X*_sample = (X*₁, X*₂, X*₃, ..., X*_N), on N deals. You decide to fit the fees to an exponential distribution.
4. If X* ∼ Exp(λ) has an exponential distribution with rate parameter λ (so pdf λe^{−λx} and mean 1/λ), what is the distribution of the project values V given your assumptions above and the Nash solution?
5. Compute, using direct computations, the first two central moments of the random variables X* and V.
6. Derive the maximum likelihood estimator, λ̂_ML, of λ.
7. What is the expected project value, E[V], and what is the variance of the project values, Var[V], given the sample X*_sample?
Answer:
1. max_X S = (V − X)^α X^{1−α} subject to 0 ≤ X ≤ V; the bounds follow from the participation constraints NPV_DD = V − X ≥ 0 and NPV_BB = X ≥ 0.
2. FOC: −α(V − X)^{α−1} X^{1−α} + (1 − α)(V − X)^α X^{−α} = 0 ⇒ X* = (1 − α)V.
The fee charged by the bank is increasing in the bank's bargaining power (1 − α): the more bargaining power the bank has, the larger the share of the surplus it extracts.
3. For the bank, see the previous answer; for the exploration company, the NPV at equilibrium is V − X* = αV, and so it is decreasing in the bargaining power of the bank.
4. The distribution of X* is an exponential distribution with parameter λ. V = X*/(1 − α) also follows an exponential distribution, with a scaled parameter: P(V ≤ x) = P(X* ≤ (1 − α)x) = 1 − e^{−λ(1−α)x}, so V ∼ Exp(λ(1 − α)).
5. First, for E[V]: at equilibrium V = X*/(1 − α). Integrating by parts with u = x and dv = λe^{−λx} dx (so du = dx and v = −e^{−λx}), it follows that:

E[V] = (1/(1 − α)) ∫₀^∞ xλe^{−λx} dx = (1/(1 − α)) { [−xe^{−λx}]₀^∞ + ∫₀^∞ e^{−λx} dx } = (1/(1 − α)) [−(1/λ)e^{−λx}]₀^∞ = 1/((1 − α)λ).

We now look for Var(V) = Var(X*)/(1 − α)², so we want to compute Var(X*). We need E[(X*)²]; this time we integrate by parts with u = x² and dv = λe^{−λx} dx (so du = 2x dx and v = −e^{−λx}). This leads to:

E[(X*)²] = [−x²e^{−λx}]₀^∞ + ∫₀^∞ 2xe^{−λx} dx = (2/λ) ∫₀^∞ xλe^{−λx} dx = (2/λ) m₁ = 2/λ²,

where m₁ is the first moment of X*. Now we eventually get Var(X*) = m₂ − (m₁)² = 2/λ² − 1/λ² = 1/λ², and Var(V) = 1/((1 − α)λ)².
6. Here the constant (1 − α) does not play any role: the MLE is defined by

λ̂_ML = arg max_λ L(λ | X*).

We then compute the log-likelihood function

ln L(λ | X*) = ln( λⁿ e^{−λ Σᵢ₌₁ⁿ Xᵢ*} ) = n ln λ − λ Σᵢ₌₁ⁿ Xᵢ*.

The FOC ∂/∂λ ln L(λ̂_ML | X*) = 0 gives

n/λ̂_ML − Σᵢ₌₁ⁿ Xᵢ* = 0 ⟹ λ̂_ML = n / (Σᵢ₌₁ⁿ Xᵢ*) = 1/X̄*.
7. Using the ML estimator for λ we can rewrite the first two central moments of X* as follows:

E[X*] = X̄*,
V[X*] = (X̄*)².

Then, with 1 − α = 1/2, we find E[V] = X̄*/(1 − α) = 2X̄* and Var(V) = (X̄*)²/(1 − α)² = 4(X̄*)².
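A simulation sketch tying parts 4-7 together (λ = 2 and the sample size are illustrative choices; note that numpy parameterizes the exponential by scale = 1/rate):

```python
import numpy as np

# Fees X* ~ Exp(rate lam), MLE lam_hat = 1/X_bar, and V = X*/(1-alpha)
# with alpha = 1/2 as in the problem.
rng = np.random.default_rng(3)
lam, alpha, n = 2.0, 0.5, 1_000_000

fees = rng.exponential(scale=1/lam, size=n)  # scale = 1/rate in numpy
lam_hat = 1 / fees.mean()                    # MLE from part 6

V = fees / (1 - alpha)
print(lam_hat)   # ~ 2.0
print(V.mean())  # ~ 1/((1-alpha)*lam) = 1.0
print(V.var())   # ~ 1/((1-alpha)*lam)^2 = 1.0
```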