Bellman Equation
Abstract
We study existence and uniqueness of a fixed point for the Bellman
operator in deterministic dynamic programming. Without any topo-
logical assumption, we show that the Bellman operator has a unique
fixed point in a restricted domain, that this fixed point is the value
function, and that the value function can be computed by value iter-
ation.
∗
This is an extensively revised version of the paper presented as “The Bellman Operator
as a Monotone Map” at the 11th SAET Conference in Faro, 2011. I would like to thank
all participants at this conference and the Workshop in Honor of Cuong Le Van in Exeter,
2011, for comments and discussions. In particular, I am grateful to Larry Blume for his
encouraging comments and suggestions, and to Juan Pablo Rincón-Zapatero, V. Filipe
Martins-da-Rocha, Yiannis Vailakis, and Cuong Le Van for their helpful comments and
discussions. Financial support from the Japan Society for the Promotion of Science is
gratefully acknowledged.
†
RIEB, Kobe University, Rokkodai, Nada, Kobe 657-8501 JAPAN. Email:
[email protected]. Tel/Fax: +81-78-803-7015.
1 Introduction
Dynamic programming is one of the most fundamental tools in economic
analysis. This has been particularly true since the publication of the influ-
ential book by Stokey and Lucas (1989). In this book and earlier studies,
however, models with unbounded returns were not fully covered, though such
models are extremely common in economics, especially in macroeconomics.
This problem has been treated in several important contributions, including
Alvarez and Stokey (1998), Le Van and Morhaim (2002), and Rincón-Zapatero
and Rodríguez-Palmero (2003, 2007, 2009).
Building on the work by the last pair of authors, Martins-da-Rocha and
Vailakis (2010) recently established one of the most general results on ex-
istence and uniqueness of a solution to the Bellman equation—or a fixed
point of the Bellman operator—applicable to models with unbounded re-
turns. Among the assumptions of their result are the following:
(i) The state space is $\mathbb{R}^n_+$.
Although the uniqueness part of this result can be shown by extending
some of Stokey and Lucas’s (1989) arguments,1 the existence and conver-
gence parts require an additional tool. In the case of Martins-da-Rocha and
Vailakis (2010), it is their fixed point theorem on local contractions for both
existence and convergence (as well as uniqueness). In our case, we exploit
the monotonicity of the Bellman operator, apply the Knaster-Tarski fixed
point theorem (e.g., Aliprantis and Border, 2006) for existence, and develop
additional monotonicity-based arguments for convergence.2
Unlike the previous contributions mentioned above, we establish no regu-
larity property of the value and policy functions. However, many properties
of these functions can be shown separately under additional assumptions.
For example, if the return function is upper semi-continuous, then the value
function is also upper semi-continuous; see Le Van and Morhaim (2002). If
the return function is concave, then the value function is also concave and
thus continuous except on the boundary of the state space.3 Such arguments
can easily be added to our analysis. Moreover, on a practical level, it is
useful to know that the value function can be computed by value iteration
regardless of its regularity properties.
The rest of the paper is organized as follows. In the next section, we
describe our framework and state our main result, which is proved in Section
4. In Section 3, we present two examples. The first one is trivial but has
a continuum of fixed points, illustrating the importance of restricting the
domain of the Bellman operator. The second example shows that value
iteration may fail to converge to the value function unless the initial function
is chosen appropriately.
1
We do not entirely follow their approach since we prove uniqueness along with existence
and convergence.
2
The monotonicity arguments of Bertsekas and Shreve (1978, Chapter 5) are not ap-
plicable to our setting since they require the return function to be everywhere positive or
everywhere negative. Le Van and Vailakis (2011) also use a monotonicity argument, but
they require the Bellman operator to be concave to ensure uniqueness of a fixed point.
Both Bertsekas and Shreve (1978) and Le Van and Vailakis (2011) deal with stochastic
models, which are beyond the scope of this paper.
3
For yet another example, if the return function is strictly concave, then the optimal
policy correspondence is single-valued and thus continuous (provided that it is upper hemi-
continuous).
2 The Main Result
Let X be a set. Let Γ be a nonempty-valued correspondence from X to X.
Let D be the graph of Γ:
$D = \{(x, y) \in X \times X : y \in \Gamma(x)\}$. (2.1)
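$\Pi = \{\{x_t\}_{t=0}^{\infty} \in X^{\infty} : \forall t \in \mathbb{Z}_+,\ x_{t+1} \in \Gamma(x_t)\}$. (2.2)
$\Pi(x_0) = \{\{x_t\}_{t=1}^{\infty} \in X^{\infty} : \{x_t\}_{t=0}^{\infty} \in \Pi\}, \quad x_0 \in X$. (2.3)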
where $L \in \{\underline{\lim}, \overline{\lim}\}$ with $\underline{\lim} = \liminf$ and $\overline{\lim} = \limsup$. Since $u(x, y) < \infty$
for all $(x, y) \in D$, the objective function is well-defined for any feasible path.
For $\{x_t\}_{t=0}^{\infty} \in \Pi$, we define
$S(\{x_t\}_{t=0}^{\infty}) = L_{T \uparrow \infty} \sum_{t=0}^{T} \beta^t u(x_t, x_{t+1})$. (2.5)
$\Pi^0 = \{\{x_t\}_{t=0}^{\infty} \in \Pi : S(\{x_t\}_{t=0}^{\infty}) > -\infty\}$, (2.7)
$\Pi^0(x_0) = \{\{x_t\}_{t=1}^{\infty} \in \Pi(x_0) : \{x_t\}_{t=0}^{\infty} \in \Pi^0\}, \quad x_0 \in X$. (2.8)
4
We follow the convention that sup ∅ = −∞.
Let V be the set of functions from X to [−∞, ∞). The Bellman operator
B on V is defined by
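$\underline{v} \le \overline{v}$, (2.13)
$B\underline{v} \ge \underline{v}$, (2.14)
$B\overline{v} \le \overline{v}$, (2.15)
$\forall \{x_t\}_{t=0}^{\infty} \in \Pi^0,\ \underline{\lim}_{t \uparrow \infty} \beta^t \underline{v}(x_t) \ge 0$, (2.16)
$\forall \{x_t\}_{t=0}^{\infty} \in \Pi,\ \overline{\lim}_{t \uparrow \infty} \beta^t \overline{v}(x_t) \le 0$. (2.17)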
and Rodríguez-Palmero (2003) offer several nontrivial, economically relevant
examples satisfying stronger versions of these conditions. A detailed com-
parison between Theorem 2.1 and Martins-da-Rocha and Vailakis’s (2010)
result is available in an earlier version of this paper (Kamihigashi, 2011).
If there exist $\underline{v}, \overline{v} \in V$ satisfying (2.13)–(2.15), then the Bellman operator
B has a fixed point in $[\underline{v}, \overline{v}]$ by the Knaster-Tarski fixed point theorem (see
Section 4 for a precise argument). But if (2.16) and (2.17) are violated, then
B can have multiple fixed points in $[\underline{v}, \overline{v}]$; see Section 3.1 for an example.
If (2.16) is strengthened by replacing $\Pi^0$ with $\Pi$, then it essentially follows
from Stokey and Lucas (1989, Theorem 4.3) that any fixed point of B in $[\underline{v}, \overline{v}]$
coincides with $v^*$. However, this strengthened version of (2.16) is almost
never satisfied if u is unbounded below. Conditions similar to (2.16) have
been used to solve this problem since Le Van and Morhaim (2002).
In conclusion (c), we have convergence to $v^*$ only from $\underline{v}$. Our argument
for (c) is based on the observation that the limit of the increasing sequence
$\{B^n \underline{v}\}_{n=1}^{\infty}$ is the supremum of the sequence. This allows us to interchange this
supremum and another supremum (see (4.16)–(4.18)) to show that the limit
is the value function $v^*$. The case of the decreasing sequence $\{B^n \overline{v}\}_{n=1}^{\infty}$, which
also converges pointwise, is not symmetric since the sup and inf operators
are in general not interchangeable. See Section 3.2 for an example satisfying
(2.13)–(2.17) in which $\lim_{n \uparrow \infty} B^n \overline{v} \ne v^*$.5
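As a rough illustration of conclusion (c), the following sketch runs value iteration from a lower bound on a small finite model. Everything in it is an assumption made for illustration: the three-state model (`gamma`, `u`, `beta`), the function names, and the standard form of the Bellman operator, (Bv)(x) = sup over y in Γ(x) of u(x, y) + βv(y). With bounded returns, the constant function min u / (1 − β) serves as a lower bound $\underline{v}$ satisfying $B\underline{v} \ge \underline{v}$, so the iterates should increase toward the fixed point.

```python
def bellman(v, gamma, u, beta):
    """Apply (Bv)(x) = max over y in gamma[x] of u[(x, y)] + beta * v[y]."""
    return {x: max(u[(x, y)] + beta * v[y] for y in gamma[x]) for x in gamma}


def iterate_from_below(gamma, u, beta, n_iter=200):
    """Return the list [v_lower, B v_lower, B^2 v_lower, ...]."""
    m = min(u.values())
    # Constant lower bound: B v_lower >= v_lower because every return is >= m.
    v = {x: m / (1 - beta) for x in gamma}
    history = [v]
    for _ in range(n_iter):
        v = bellman(v, gamma, u, beta)
        history.append(v)
    return history


# Hypothetical three-state model (not taken from the paper).
gamma = {0: [0, 1], 1: [2], 2: [0, 2]}          # feasible successors
u = {(0, 0): 1.0, (0, 1): 0.0, (1, 2): 3.0,     # returns on the graph of gamma
     (2, 0): 0.0, (2, 2): -1.0}
beta = 0.9

history = iterate_from_below(gamma, u, beta)
for x in gamma:
    print(x, history[0][x], history[-1][x])
```

The iterates in `history` are pointwise nondecreasing up to rounding, which is the property the argument for (c) relies on. The sketch says nothing about the decreasing sequence started from an upper bound.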
3 Counterexamples
3.1 Multiple Fixed Points
The Bellman operator B can have multiple fixed points in $[\underline{v}, \overline{v}]$ if (2.16) and
(2.17) are violated. To see this, suppose that β > 0, $X = \mathbb{Z}_+$, and
in $[\underline{v}, \overline{v}]$. In fact, for any $a \in [-\alpha, \alpha]$, the function v defined by $v(i) = a\beta^{-i}$
for all $i \in X$ is a fixed point of B. Therefore B has a continuum of fixed
points. Note that there is only one feasible path from state 0, which is given
by $\{x_t\}_{t=1}^{\infty} = \{t\}_{t=1}^{\infty}$. Then $\beta^t \underline{v}(x_t) = -\alpha$ and $\beta^t \overline{v}(x_t) = \alpha$ for all $t \in \mathbb{Z}_+$; i.e.,
(2.16) and (2.17) are violated in this example.
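The fixed-point claim can be checked numerically. The sketch below is written under assumptions, because the display defining the model is not reproduced above: it takes Γ(i) = {i + 1} and u ≡ 0, which is consistent with the single feasible path from state 0 and with $v(i) = a\beta^{-i}$ being a fixed point. The function names and the numerical values of `beta` and `alpha` are hypothetical.

```python
import math


def v_family(a, beta):
    """Candidate fixed point v(i) = a * beta**(-i)."""
    return lambda i: a * beta ** (-i)


def bellman_at(v, i, beta):
    # Assumed model: the only successor of i is i + 1 and the return is zero,
    # so (Bv)(i) = 0 + beta * v(i + 1).
    return 0.0 + beta * v(i + 1)


def is_fixed_point(a, beta, n_states=50):
    """Check (Bv)(i) == v(i) on the first n_states states, up to rounding."""
    v = v_family(a, beta)
    return all(
        math.isclose(bellman_at(v, i, beta), v(i), rel_tol=1e-12, abs_tol=1e-12)
        for i in range(n_states)
    )


def discounted_value(a, beta, t):
    """beta**t * v(x_t) along the unique feasible path x_t = t."""
    return beta ** t * v_family(a, beta)(t)


beta, alpha = 0.9, 2.0  # hypothetical parameter values
for a in (-alpha, -0.5, 0.0, 1.0, alpha):
    print(a, is_fixed_point(a, beta), discounted_value(a, beta, 10))
```

Along the path $x_t = t$ the discounted value stays at a for every t, so it does not vanish unless a = 0. This is the sense in which the boundary functions with a = ±α violate (2.16) and (2.17).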
3.2 Nonconvergence to $v^*$
Even under (2.13)–(2.17), the sequence $\{B^n \overline{v}\}_{n=1}^{\infty}$ may not converge to $v^*$.
To see this, let α > 0 and suppose that α < β < 1. Consider the example
depicted in Figure 1; more precisely, assume the following:
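$X = \{(i, j) : i, j \in \mathbb{Z}_+,\ j \le i\}$, (3.2)
$\Gamma((i, j)) = \begin{cases} \{(i', 0) : i' \in \mathbb{Z}_+\} & \text{if } (i, j) = (0, 0), \\ \{(i, j)\} & \text{if } i = j \ne 0, \\ \{(i, j + 1)\} & \text{if } j < i, \end{cases}$ (3.3)
$u((i, j), (i', j')) = \begin{cases} -\alpha & \text{if } (i, j) = (i', j') = (0, 0), \\ -\beta^{-i} & \text{if } (i, j) = (i', j') \ne (0, 0), \\ 0 & \text{otherwise.} \end{cases}$ (3.4)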
Figure 1: States (i, j) ∈ X (circles), feasible transitions (arrows), and associated
returns (values adjacent to arrows) under (3.2)–(3.4)
This formula works for (i, j) = (0, 0) as well; i.e., $\overline{v}_n((0, 0)) = 0$ for all $n \in \mathbb{N}$.7
Now letting $\overline{v}^* = \lim_{n \uparrow \infty} \overline{v}_n$,8 we see that $\overline{v}^*((i, j)) = v^*((i, j))$ for all
$(i, j) \in X \setminus \{(0, 0)\}$, but $\overline{v}^*((0, 0)) = 0 > v^*((0, 0))$; i.e., the sequence $\{\overline{v}_n\}_{n=1}^{\infty}$
fails to converge to $v^*$ at (0, 0).
Interestingly, the sequence $\{B^n \overline{v}^*\}_{n=1}^{\infty}$ restarted from $\overline{v}^*$ converges to $v^*$.
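The failure at (0, 0) can be reproduced with a short recursion over (3.2)–(3.4). This is a sketch under stated assumptions: it starts value iteration from $\overline{v} = 0$ (the upper bound for this example is not reproduced above, and zero is consistent with the iterates at (0, 0) being zero), it uses hypothetical parameter values, and the closed form for $v^*((0, 0))$ in the code is derived here rather than quoted from the text.

```python
from functools import lru_cache

ALPHA, BETA = 0.5, 0.9  # hypothetical values with 0 < ALPHA < BETA < 1


@lru_cache(maxsize=None)
def v_bar(n, i, j):
    """n-th iterate of value iteration at state (i, j), started from 0 (assumed)."""
    if n == 0:
        return 0.0
    if (i, j) == (0, 0):
        # Either stay at (0, 0) with return -ALPHA, or jump to (k, 0), k >= 1,
        # with return 0. Every state (k, 0) with k >= n - 1 still has value 0
        # after n - 1 steps, and no value exceeds 0, so scanning k = 1..n
        # attains the supremum over all k.
        stay = -ALPHA + BETA * v_bar(n - 1, 0, 0)
        jump = max(BETA * v_bar(n - 1, k, 0) for k in range(1, n + 1))
        return max(stay, jump)
    if i == j:
        # Absorbing diagonal state with per-period return -BETA**(-i).
        return -BETA ** (-i) + BETA * v_bar(n - 1, i, i)
    # Off-diagonal state: forced move to (i, j + 1) with return 0.
    return BETA * v_bar(n - 1, i, j + 1)


# Derived here: staying at (0, 0) forever yields -ALPHA / (1 - BETA), while any
# jump to (k, 0) yields -BETA / (1 - BETA), so staying is optimal when ALPHA < BETA.
V_STAR_00 = -ALPHA / (1 - BETA)

for n in (1, 5, 20, 40):
    print(n, v_bar(n, 0, 0), V_STAR_00)
```

At every step the iterate at (0, 0) can escape to a state far enough to the right that the penalty on the diagonal has not yet been felt, which is why it stays at zero while $v^*((0, 0))$ is strictly negative.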
of the terminal stock $x_T$ given by $v(x_T)$. This result extends the classical
idea of Bertsekas and Shreve (1978, Section 3.2) to our setting. The last
lemma is less trivial than the first two. The concluding argument applies the
Knaster-Tarski fixed point theorem and combines the first and last lemmas.
Since $v(x_0) > -\infty$, we have $\beta^T v(x_T) > -\infty$ for all $T \in \mathbb{N}$. It follows that
$v(x_0) - \epsilon - \beta^T v(x_T) \le \sum_{t=0}^{T-1} \beta^t u(x_t, x_{t+1})$. (4.6)
By (2.17) we have $v(x_0) - \epsilon \le v^*(x_0)$. Since this is true for any $\epsilon > 0$, we
have $v(x_0) \le v^*(x_0)$. Since $x_0$ was arbitrary, we obtain $v \le v^*$.
For any $v \in V$, define $v_1 = Bv$; for each $n \in \mathbb{N}$, provided that $v_n \in V$,
define $v_{n+1} = Bv_n$. The following remark follows from (2.11).
Remark 4.1. Let $v, w \in V$ satisfy $v \le w$ and $Bw \le w$. Then for all $n \in \mathbb{N}$,
we have $v_n \le w$ and thus $v_n \in V$.
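The induction behind Remark 4.1 can be spelled out as follows. This is a sketch that assumes (2.11), which is not displayed above, is the monotonicity property $v \le w \Rightarrow Bv \le Bw$ for $v, w \in V$.

```latex
\begin{align*}
v_1 &= Bv \le Bw \le w, \\
v_n \le w \;&\Longrightarrow\; v_{n+1} = Bv_n \le Bw \le w .
\end{align*}
```

Since $w \in V$ takes values in $[-\infty, \infty)$, the bound $v_n \le w$ rules out $v_n(x) = +\infty$ at any $x \in X$, which is why $v_n \in V$.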
Proof. Note from (2.15) and Remark 4.1 with $w = \overline{v}$ that $v_n \in V$ for all
$n \in \mathbb{N}$. For any $x_0 \in X$, we have
9
This step uses the assumption that Γ is nonempty-valued.
where (4.13) uses (4.8) for T = n, (4.14) holds since $u(x_0, x_1)$ is independent
of $\{x_{i+1}\}_{i=1}^{\infty}$, and (4.15) follows by combining the two suprema (see
Kamihigashi, 2008, Lemma 1). It follows that (4.8) holds for T = n + 1. By
induction, (4.8) holds for all $T \in \mathbb{N}$.
where (4.17) uses Lemma 4.2, (4.18) follows by interchanging the two suprema
(see Kamihigashi, 2008, Lemma 1), (4.19) holds since $\Pi^0(x_0) \subset \Pi(x_0)$ (recall
(2.8)) and $L_{T \uparrow \infty} a_T \le \sup_{T \in \mathbb{N}} a_T$ for any sequence $\{a_T\}$ in $[-\infty, \infty)$, (4.20)
follows from the properties of $\underline{\lim}$ and $\overline{\lim}$,11 and the inequality in (4.21) uses
(2.16). It follows that $\underline{v}^* \ge v^*$.
To complete the proof of Theorem 2.1, suppose that there exist $\underline{v}, \overline{v} \in V$
satisfying (2.13)–(2.17). The order interval $[\underline{v}, \overline{v}]$ is partially ordered by ≤
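10
Here $\underline{v}_T = B^T \underline{v}$ for all $T \in \mathbb{N}$ and $\underline{v}^*(x) = \lim_{T \uparrow \infty} \underline{v}_T(x)$ for all $x \in X$.
11
We have $\underline{\lim}(a_t + b_t) \ge \underline{\lim}\, a_t + \underline{\lim}\, b_t$ and $\overline{\lim}(a_t + b_t) \ge \overline{\lim}\, a_t + \underline{\lim}\, b_t$ for any sequences
$\{a_t\}$ and $\{b_t\}$ in $[-\infty, \infty)$ whenever both sides are well-defined (e.g., Michel, 1990, p. 706).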
(recall (2.10)). Given any $F \subset [\underline{v}, \overline{v}]$, we have $\sup F \in [\underline{v}, \overline{v}]$ because
References
Aliprantis, C.D., Border, K.C., 2006, Infinite Dimensional Analysis: A Hitch-
hiker’s Guide, 3rd Edition, Springer-Verlag, Berlin.
Alvarez, F., Stokey, N.L., 1998, Dynamic programming with homogeneous functions,
Journal of Economic Theory 82, 167–189.
Bertsekas, D.P., Shreve, S.E., 1978, Stochastic Optimal Control: The Dis-
crete Time Case, Academic Press, New York.
Kamihigashi, T., 2008, On the principle of optimality for nonstationary de-
terministic dynamic programming, International Journal of Economic
Theory 4, 519–525.
Kamihigashi, T., 2011, Existence and uniqueness of a fixed point for the
Bellman operator in deterministic dynamic programming, RIEB Discus-
sion Paper DP2011-23.
Le Van, C., Morhaim, L., 2002, Optimal growth models with bounded or unbounded
returns: a unifying approach, Journal of Economic Theory 105,
158–187.
Le Van, C., Vailakis, Y., 2011, Monotone concave (convex) operators: applications
to stochastic dynamic programming with unbounded returns,
mimeo, University of Paris 1 and University of Exeter Business School.
Martins-da-Rocha, V.F., Vailakis, Y., 2010, Existence and uniqueness of a
fixed point for local contractions, Econometrica 78, 1127–1141.
12
See footnote 10 for the definitions of v n and v ∗ .
Michel, P., 1990, Some clarifications on the transversality condition, Econo-
metrica 58, 705–723.
Rincón-Zapatero, J.P., Rodríguez-Palmero, C., 2003, Existence and uniqueness
of solutions to the Bellman equation in the unbounded case, Econometrica
71, 1519–1555.
Rincón-Zapatero, J.P., Rodríguez-Palmero, C., 2007, Recursive utility with
unbounded aggregators, Economic Theory 33, 381–391.
Rincón-Zapatero, J.P., Rodríguez-Palmero, C., 2009, Corrigendum to “Existence
and uniqueness of solutions to the Bellman equation in the unbounded
case” Econometrica, Vol. 71, No. 5 (September, 2003), 1519–1555,
Econometrica 77, 317–318.
Stokey, N., Lucas, R.E., Jr., 1989, Recursive Methods in Economic Dynam-
ics, Harvard University Press, Cambridge, MA.
Strauch, R.E., 1966, Negative dynamic programming, Annals of Mathemat-
ical Statistics 37, 871–890.