A Tutorial On Dynamic Programming
Michael A. Trick
Mini V, 1997
Contents
• First Example
• A second example
• Common Characteristics
• The Knapsack Problem.
• An Alternative Formulation
• Equipment Replacement
• The Traveling Salesperson Problem
• Nonadditive Recursions
• Stochastic Dynamic Programming
o Uncertain Payoffs
o Uncertain States
o ``Linear'' decision making
First Example
Let's begin with a simple capital budgeting problem. A corporation has $5 million to
allocate to its three plants for possible expansion. Each plant has submitted a number of
proposals on how it intends to spend the money. Each proposal gives the cost of the
expansion (c) and the total revenue expected (r). The following table gives the proposals
generated:

Table 1: Investment possibilities (costs and revenues in millions of dollars).

Proposal   Plant 1          Plant 2          Plant 3
           Cost  Revenue    Cost  Revenue    Cost  Revenue
1           0      0          0      0         0      0
2           1      5          2      8         1      4
3           2      6          3      9         -      -
4           -      -          4     12         -      -
Each plant will only be permitted to enact one of its proposals. The goal is to maximize
the firm's revenues resulting from the allocation of the $5 million. We will assume that
any of the $5 million we don't spend is lost (you can work out how a more reasonable
assumption will change the problem as an exercise).
A straightforward way to solve this is to try all possibilities and choose the best. In this
case, there are only 3 x 4 x 2 = 24 ways of allocating the money. Many of these are
infeasible (for instance, proposals 3, 4, and 1 for the three plants cost $6 million). Other
proposals are feasible, but very poor (like proposals 1, 1, and 2, which is feasible but
returns only $4 million).
Such enumeration schemes, however, have a number of drawbacks:
1. For larger problems the enumeration of all possible solutions may not be
computationally feasible.
2. Infeasible combinations cannot be detected a priori, leading to inefficiency.
3. Information about previously investigated combinations is not used to eliminate
inferior, or infeasible, combinations.
Note also that this problem cannot be formulated as a linear program, for the revenues
returned are not linear functions.
Let's break the problem into three stages: each stage represents the money allocated to a
single plant. So stage 1 represents the money allocated to plant 1, stage 2 the money to
plant 2, and stage 3 the money to plant 3. We will artificially place an ordering on the
stages, saying that we will first allocate to plant 1, then plant 2, then plant 3.
Each stage is divided into states. A state encompasses the information required to go
from one stage to the next. In this case, the state x_j for stage j is the total amount of
money spent on plants 1 through j. So x_1 is the amount spent on plant 1, x_2 the
amount spent on plants 1 and 2, and x_3 the amount spent on plants 1, 2, and 3.
Unlike linear programming, the x_j do not represent decision variables: they are simply
representations of a generic state in the stage.
Associated with each state is a revenue. Note that to make a decision at stage 3, it is only
necessary to know how much was spent on plants 1 and 2 (that is, x_2), not how it was
spent. Also notice that we will want x_3 to be 5: since unspent money is lost, there is
no reason to leave any unallocated.
Let's try to figure out the revenues associated with each state. The only easy possibility is
in stage 1, the states x_1. Table 2 gives the revenue associated with each value of x_1.

Table 2: Stage 1 computations.

x_1        0   1   2   3   4   5
Revenue    0   5   6   6   6   6
Proposal   1   2   3   3   3   3
We are now ready to tackle the computations for stage 2. In this case, we want to find the
best solution for both plants 1 and 2. If we want to calculate the best revenue for a given
x_2, we simply go through all the plant 2 proposals, allocate the given amount of funds to
plant 2, and use the above table to see how plant 1 will spend the remainder.
For instance, suppose we want to determine the best allocation for state x_2 = 4. In stage
2 we can do one of the following proposals:
1. Proposal 1 gives revenue 0, leaves 4 for stage 1, which returns 6. Total: 6.
2. Proposal 2 gives revenue 8, leaves 2 for stage 1, which returns 6. Total: 14.
3. Proposal 3 gives revenue 9, leaves 1 for stage 1, which returns 5. Total: 14.
4. Proposal 4 gives revenue 12, leaves 0 for stage 1, which returns 0. Total: 12.
The best thing to do with four units is proposal 2 for plant 2 and proposal 3 for plant 1,
returning 14, or proposal 3 for plant 2 and proposal 2 for plant 1, also returning 14. In
either case, the revenue for being in state x_2 = 4 is 14. The rest of Table 3 can be filled
out similarly:

Table 3: Stage 2 computations.

x_2        0   1   2   3   4   5
Revenue    0   5   8  13  14  17
We can now go on to stage 3. The only value we are interested in is x_3 = 5. Once again,
we go through all the proposals for this stage, determine the amount of money remaining
and use Table 3 to decide the value for the previous stages. So here we can do the
following at plant 3:
• Proposal 1 gives revenue 0, leaves 5. Previous stages give 17. Total: 17.
• Proposal 2 gives revenue 4, leaves 4. Previous stages give 14. Total: 18.
Proposal 2 is better, so the optimal solution is to implement proposal 2 at plant 3, for a
total revenue of $18 million.
If you study this procedure, you will find that the calculations are done recursively. Stage
2 calculations are based on stage 1, stage 3 only on stage 2. Indeed, given that you are at a
state, all future decisions are made independently of how you reached that state. This is
the principle of optimality, and all of dynamic programming rests on this assumption.
More formally, let r_j(k) be the revenue of proposal k at plant j, and c_j(k) its cost. Let
f_j(x_j) be the revenue of state x_j in stage j. Then we have the following calculations:

f_1(x_1) = max {r_1(k) : c_1(k) <= x_1}

and, for j = 2, 3,

f_j(x_j) = max {r_j(k) + f_{j-1}(x_j - c_j(k)) : c_j(k) <= x_j}.

The optimal value for the whole problem is f_3(5).
All we were doing with the above calculations was determining these functions.
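To make the recursion concrete, here is a minimal computational sketch (in Python; the
function and variable names are mine, and the data is Table 1's):

# Forward recursion for the capital budgeting example.
# f[j][x] = best revenue from plants 1..j when x million is spent.
proposals = {
    1: [(0, 0), (1, 5), (2, 6)],           # plant 1: (cost, revenue)
    2: [(0, 0), (2, 8), (3, 9), (4, 12)],  # plant 2
    3: [(0, 0), (1, 4)],                   # plant 3
}
budget = 5
f = {0: {x: 0 for x in range(budget + 1)}}  # stage 0: nothing spent, nothing earned
for j in (1, 2, 3):
    f[j] = {x: max(r + f[j - 1][x - c] for (c, r) in proposals[j] if c <= x)
            for x in range(budget + 1)}
print(f[2][4], f[3][5])  # 14 and 18, matching the calculations above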
The computations were carried out in a forward procedure. It was also possible to
calculate things from the ``last'' stage back to the first stage. We could define

g_3(x) = max {r_3(k) : c_3(k) <= x}

and, for j = 1, 2,

g_j(x) = max {r_j(k) + g_{j+1}(x - c_j(k)) : c_j(k) <= x},

where g_j(x) is the best revenue obtainable from plants j through 3 when x million
remains available. The overall optimal value is g_1(5).
If you carry out the calculations, you will come up with the same answer.
You may wonder why I have introduced backward recursion, particularly since the
forward recursion seems more natural. In this particular case, the ordering of the stages
made no difference. In other cases, though, there may be computational advantages to
choosing one over the other. In general, the backward recursion has been found to be more
effective in most applications. Therefore, in the future, I will be presenting only the
backward recursion, except in cases where I wish to contrast the two recursions.
A second example
Dynamic programming may look somewhat familiar. Both our shortest path algorithm
and our method for CPM project scheduling have a lot in common with it.
Let's look at a particular type of shortest path problem. Suppose we wish to get from A to
J in the road network of Figure 2.
The numbers on the arcs represent distances. Due to the special structure of this problem,
we can break it up into stages. Stage 1 contains node A; stage 2 contains nodes B, C, and
D; stage 3 contains nodes E, F, and G; stage 4 contains H and I; and stage 5 contains J.
The states in each stage correspond just to the node names. So stage 3 contains states E,
F, and G.
If we let S denote a node in stage j and let f_j(S) be the shortest distance from node S to
the destination J, we can write

f_j(S) = min over arcs (S,Z) of {c_SZ + f_{j+1}(Z)},

where c_SZ denotes the length of arc SZ. This gives the recursion needed to solve the
problem, starting from f_5(J) = 0 and working backward. In stage 4 there is only one
choice at each node:

Stage 4.
• f_4(H) = c_HJ, by going to J;
• f_4(I) = c_IJ, by going to J.
Stage 3.
Here there are more choices. Here's how to calculate f_3(F). From F you can
either go to H or I. The immediate cost of going to H is 6, to which we add the
following cost f_4(H). The immediate cost of going to I plus the following cost
f_4(I) comes to 7, which is less. Therefore, if you are ever at F, the best thing to
do is go to I, and f_3(F) = 7.
You now continue working back through the stages one by one, each time completely
computing a stage before continuing to the preceding one. The results are:
Stage 2.
Stage 1.
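The same backward recursion is easy to program. In the Python sketch below (a minimal
sketch, not part of the original), most arc lengths are placeholders, since Figure 2 is not
reproduced here; only c(F,H) = 6 and the stage 3 value f_3(F) = 7 are fixed by the text:

# Backward recursion for the stage-structured shortest path problem.
# Arc lengths marked "guess" are placeholders for the missing Figure 2.
arcs = {
    ('A', 'B'): 2, ('A', 'C'): 4, ('A', 'D'): 3,   # guesses
    ('B', 'E'): 7, ('B', 'F'): 4, ('B', 'G'): 6,   # guesses
    ('C', 'E'): 3, ('C', 'F'): 2, ('C', 'G'): 4,   # guesses
    ('D', 'E'): 4, ('D', 'F'): 1, ('D', 'G'): 5,   # guesses
    ('E', 'H'): 1, ('E', 'I'): 4,                  # guesses
    ('F', 'H'): 6, ('F', 'I'): 3,                  # c(F,H) = 6 is from the text
    ('G', 'H'): 3, ('G', 'I'): 3,                  # guesses
    ('H', 'J'): 3, ('I', 'J'): 4,                  # chosen so that f_3(F) = 7
}
f = {'J': 0}  # f[S] = shortest distance from node S to J
for stage in (['H', 'I'], ['E', 'F', 'G'], ['B', 'C', 'D'], ['A']):
    for node in stage:
        f[node] = min(length + f[to]
                      for (frm, to), length in arcs.items() if frm == node)
print(f['F'], f['A'])  # f['F'] is 7; f['A'] depends on the placeholder data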
Common Characteristics
There are a number of characteristics that are common to these two problems and to all
dynamic programming problems. These are:
1. The problem can be divided into stages with a decision required at each stage.
In the capital budgeting problem the stages were the allocations to a single plant.
The decision was how much to spend. In the shortest path problem, the stages were
defined by the structure of the graph. The decision was where to go next.
2. Each stage has a number of states associated with it.
The states for the capital budgeting problem corresponded to the amount spent at
that point in time. The state for the shortest path problem was the node reached.
3. The decision at one stage transforms one state into a state in the next stage.
The decision of how much to spend gave a total amount spent for the next stage.
The decision of where to go next defined where you arrived in the next stage.
4. Given the current state, the optimal decision for each of the remaining stages does
not depend on the previous states or decisions.
In the budgeting problem, it is not necessary to know how the money was spent in
previous stages, only how much was spent. In the path problem, it was not
necessary to know how you got to a node, only that you did.
5. There exists a recursive relationship that identifies the optimal decision for stage
j, given that stage j+1 has already been solved.
6. The final stage must be solvable by itself.
The last two properties are tied up in the recursive relationships given above.
The big skill in dynamic programming, and the art involved, is to take a problem and
determine stages and states so that all of the above hold. If you can, then the recursive
relationship makes finding the values relatively easy. Because of the difficulty in
identifying stages and states, we will do a fair number of examples.
The Knapsack Problem
The knapsack problem asks how many copies of each of several items to pack into a
knapsack of limited capacity, where each item has a weight and a benefit, so as to
maximize the total benefit. Suppose here that the knapsack holds 5 pounds and there are
three items: item 1 weighs 2 pounds and has benefit 65, item 2 weighs 3 pounds and has
benefit 80, and item 3 weighs 1 pound and has benefit 30.
The stages represent the items: we have three stages j=1,2,3. The state at stage j
represents the total weight of items j and all following items in the knapsack. The
decision at stage j is how many copies of item j to place in the knapsack. Call this value k_j.
This leads to the following recursive formulas. Let f_j(x) be the value of using x units of
capacity for items j and following, let w_j and b_j be the weight and benefit of item j,
and let floor(a) represent the largest integer less than or equal to a. Then

f_3(x) = b_3 floor(x/w_3)

and, for j = 1, 2,

f_j(x) = max {b_j k + f_{j+1}(x - w_j k) : k = 0, 1, ..., floor(x/w_j)}.

The value of the knapsack is f_1(5).
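As a check, the recursion can be run directly (a minimal Python sketch; the names are
mine, the data as above):

# f(j, x) = best value from items j..3 using x pounds of capacity.
weights = {1: 2, 2: 3, 3: 1}
benefits = {1: 65, 2: 80, 3: 30}

def f(j, x):
    if j > 3:
        return 0
    # decision: pack k copies of item j, k = 0, ..., floor(x / w_j)
    return max(benefits[j] * k + f(j + 1, x - weights[j] * k)
               for k in range(x // weights[j] + 1))

print(f(1, 5))  # 160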
An Alternative Formulation
There is another formulation for the knapsack problem. This illustrates how arbitrary our
definitions of stages, states, and decisions are. It also points out that there is some
flexibility on the rules for dynamic programming. Our definitions required a decision at a
stage to take us to the next stage (which we would already have calculated through
backwards recursion). In fact, it could take us to any stage we have already calculated.
This gives us a bit more flexibility in our calculations.
The recursion I am about to present is a forward recursion. For a knapsack problem, let
the stages be indexed by w, the weight filled. The decision is to determine the last item
added to bring the weight to w. There is just one state per stage. Let g(w) be the
maximum benefit that can be gained from a w pound knapsack. Continuing to use w_j and
b_j as the weight and benefit, respectively, for item j, the following relates g(w) to
previously calculated g values:

g(w) = max {b_j + g(w - w_j) : over items j with w_j <= w},

with g(0) = 0.
Intuitively, to fill a w pound knapsack, we must end off by adding some item. If we add
item j, we end up with a knapsack of size w - w_j to fill. To illustrate on the above
example:
• g(0) = 0.
• g(1) = 30; add item 3.
• g(2) = 65; add item 1.
• g(3) = 95; add item 1 or 3.
• g(4) = 130; add item 1.
• g(5) = 160; add item 1 or 3.
This gives a maximum of 160, which is gained by adding 2 of item 1 and 1 of item 3.
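The forward recursion is just as short to program (again a sketch; the layout is mine):

# g[w] = maximum benefit from a w pound knapsack.
items = [(2, 65), (3, 80), (1, 30)]  # (weight, benefit) for items 1, 2, 3
g = {0: 0}
for w in range(1, 6):
    # the last item added must be some item j with w_j <= w
    g[w] = max(b + g[w - wt] for (wt, b) in items if wt <= w)
print([g[w] for w in range(6)])  # [0, 30, 65, 95, 130, 160]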
Equipment Replacement
In the network homework, you already saw how to formulate and solve an equipment
replacement problem using a shortest path algorithm. Let's look at an alternative dynamic
programming formulation.
Suppose a shop needs to have a certain machine over the next five year period. Each new
machine costs $1000. The cost of maintaining the machine during its ith year of operation
is m_1 = $60, m_2 = $80, and m_3 = $120. A machine may be kept up to three years
before being traded in. The trade-in value after i years is s_1 = $800, s_2 = $600,
and s_3 = $500. How can the shop minimize costs over the five year period?
Let the stages correspond to each year. The state is the age of the machine for that year.
The decisions are whether to keep the machine or trade it in for a new one. Let f_t(x) be
the minimum cost incurred from time t to time 5, given the machine is x years old at time
t.
At time 5, the machine is simply traded in, so f_5(x) = -s_x. Now consider other time
periods. If you have a three year old machine in time t, you must trade in, so

f_t(3) = 1000 - s_3 + m_1 + f_{t+1}(1).

If you have a two year old machine, you can either trade or keep.

Trade: 1000 - s_2 + m_1 + f_{t+1}(1).
Keep: m_3 + f_{t+1}(3).

So the best thing to do with a two year old machine is the minimum of the two. Similarly,
for a one year old machine:

Trade: 1000 - s_1 + m_1 + f_{t+1}(1).
Keep: m_2 + f_{t+1}(2).

Finally, at time 0 we must buy, so f_0 = 1000 + m_1 + f_1(1). The calculations, stage by
stage, are:
Stage 5. f_5(1) = -800, f_5(2) = -600, f_5(3) = -500.
Stage 4. f_4(1) = -540 (trade), f_4(2) = -380 (keep), f_4(3) = -240 (trade).
Stage 3. f_3(1) = -300 (keep), f_3(2) = -120 (keep), f_3(3) = 20 (trade).
Stage 2. f_2(1) = -40 (trade or keep), f_2(2) = 140 (keep), f_2(3) = 260 (trade).
Stage 1. f_1(1) = 220 (trade or keep).
Stage 0. f_0 = 1000 + 60 + f_1(1) = 1280.
So the cost is 1280, and one solution is to trade in years 1 and 2. There are other optimal
solutions.
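A short Python sketch of this recursion (the names are mine; the data as above)
reproduces the 1280:

# f(t, x) = minimum cost from time t to 5 with an x year old machine.
PRICE = 1000
m = {1: 60, 2: 80, 3: 120}    # maintenance cost in year i of operation
s = {1: 800, 2: 600, 3: 500}  # trade-in value after i years

def f(t, x):
    if t == 5:
        return -s[x]  # at the end of the horizon, trade in the machine
    trade = PRICE - s[x] + m[1] + f(t + 1, 1)
    if x == 3:
        return trade  # a three year old machine must be traded
    return min(trade, m[x + 1] + f(t + 1, x + 1))  # trade or keep

print(PRICE + m[1] + f(1, 1))  # 1280: buy at time 0, then act optimally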
The Traveling Salesperson Problem
The traveling salesperson problem is to visit a number of cities in the minimum distance.
For instance, a politician begins in New York and has to visit Miami, Dallas, and Chicago
before returning to New York. How can she minimize the distance traveled? The
distances are as in Table 5.
The real problem in solving this is to define the stages, states, and decisions. One natural
choice is to let stage t represent visiting t cities, and let the decision be where to go next.
That leaves us with the states. Imagine we chose the city we are in to be the state. We could
not make the decision where to go next, for we do not know where we have gone before.
Instead, the state has to include information about all the cities visited, plus the city we
ended up in. So a state is represented by a pair (i,S), where S is the set of t cities already
visited and i is the last city visited (so i must be in S). This turns out to be enough to get a
recursion. Letting c_ij be the distance from city i to city j, and f_t(i,S) the minimum
distance needed to complete the tour given that the t cities in S have been visited, ending
at i, we get

f_t(i,S) = min {c_ij + f_{t+1}(j, S ∪ {j}) : j not in S},

and, once S contains all four cities, f_4(i,S) = c_i,NY, since the tour must return to New
York.
You can continue with these calculations. One important aspect of this problem is the so
called curse of dimensionality. The state space here is so large that it becomes impossible
to solve even moderate size problems. For instance, suppose there are 20 cities. The
number of states in the 10th stage is more than a million. For 30 cities, the number of
states in the 15th stage is more than a billion. And for 100 cities, the number of states at
the 50th stage is more than 5,000,000,000,000,000,000,000,000,000,000. This is not the
sort of problem that will go away as computers get better.
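Even so, the recursion itself takes only a few lines of code. Here is a minimal Python
sketch for the four-city example; since Table 5 is not reproduced here, the distances
below are placeholders:

# Held-Karp style recursion: the state is (i, S), with S the set of
# cities already visited and i the last of them. The distances are
# placeholders for the missing Table 5.
cities = ['NY', 'Miami', 'Dallas', 'Chicago']
dist = {('NY', 'Miami'): 1300, ('NY', 'Dallas'): 1550, ('NY', 'Chicago'): 800,
        ('Miami', 'Dallas'): 1350, ('Miami', 'Chicago'): 1380,
        ('Dallas', 'Chicago'): 930}
dist.update({(b, a): d for (a, b), d in dist.items()})  # make symmetric

def f(i, S):
    # minimum distance to finish the tour (back at NY), having visited S
    if S == frozenset(cities):
        return dist[(i, 'NY')]
    return min(dist[(i, j)] + f(j, S | {j}) for j in cities if j not in S)

print(f('NY', frozenset(['NY'])))  # best tour length under the placeholder data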
Nonadditive Recursions
Not every recursion must be additive. Here is one example where we multiply to get the
recursion.
A student is currently taking three courses. It is important that he not fail all of them. If
the probability of failing French is p_1, the probability of failing English is p_2, and the
probability of failing Statistics is p_3, then the probability of failing all of them is
p_1 p_2 p_3.
He has left himself with four hours to study. How should he divide his time to minimize
the probability of failing all his courses? Table 6 gives the probability of failing each
course, given that he studies a certain number of hours on that subject.
(What kind of student is this?) We let stage 1 correspond to studying French, stage 2 to
English, and stage 3 to Statistics. The state will correspond to the number of hours spent
studying for that stage and all following stages. Let f_t(x) be the probability of failing
course t and all following courses, assuming x hours are available. Denote the entries in
the above table as p_t(k), the probability of failing course t given k hours are spent on it.
Then

f_3(x) = p_3(x)

and, for t = 1, 2,

f_t(x) = min {p_t(k) f_{t+1}(x - k) : k = 0, 1, ..., x}.
Stage 3.
Stage 2.
So, the optimum way of dividing time between studying English and Statistics is
to spend it all on Statistics.
Stage 1.
The overall optimal strategy is to spend one hour on French, and three on
Statistics. The probability of failing all three courses is about 29%.
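Since Table 6 is not reproduced in this copy, a sketch of the multiplicative recursion has
to use placeholder probabilities; the structure, though, is exactly as above (Python; the
names and numbers below are mine):

# f(t, x) = minimum probability of failing courses t..3 with x hours left.
# p[t][k] = probability of failing course t after k hours of study
# (placeholder values standing in for the missing Table 6).
p = {
    1: [0.80, 0.70, 0.65, 0.62, 0.60],  # French
    2: [0.75, 0.70, 0.67, 0.65, 0.62],  # English
    3: [0.90, 0.70, 0.60, 0.55, 0.50],  # Statistics
}

def f(t, x):
    if t == 3:
        return p[3][x]  # spend all remaining hours on Statistics
    # multiply, rather than add, the value of the remaining stages
    return min(p[t][k] * f(t + 1, x - k) for k in range(x + 1))

print(f(1, 4))  # probability of failing everything (placeholder data)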
Stochastic Dynamic Programming
Uncertain Payoffs
Consider a supermarket chain that has purchased 6 gallons of milk from a local dairy.
The chain must allocate the 6 gallons to its three stores. If a store sells a gallon of milk,
then the chain receives revenue of $2. Any unsold milk is worth just $.50. Unfortunately,
the demand for milk is uncertain, and is given in the following table:
The goal of the chain is to maximize the expected revenue from these 6 gallons. (This is
not the only possible objective, but a reasonable one.)
Note that this is quite similar to some of our previous resource allocation problems: the
only difference is that the revenue is not known for certain. We can, however, determine
an expected revenue for each allocation of milk to a store: for instance, the value of
allocating 2 gallons to store 1 is the expected revenue, counting $2 for each gallon sold
and $.50 for each gallon unsold, weighted by the demand probabilities in the table above.
Let the state x be the number of gallons we have left to give to store i and all following
stores. If we let the above expected revenues be represented by r_i(k) (the value of giving
k gallons to store i), then the recursive formulae are

f_3(x) = r_3(x)

and, for i = 1, 2,

f_i(x) = max {r_i(k) + f_{i+1}(x - k) : k = 0, 1, ..., x}.
If you would like to work out the values, you should get a valuation of $9.75, with one
solution assigning 1 gallon to store 1, 3 gallons to store 2 and 2 gallons to store 3.
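The recursion is the same resource allocation pattern as before, just with expected
revenues. Here is a minimal Python sketch; the r values below are placeholders, since the
demand table is not reproduced here (with the real table, the answer is the $9.75 above):

# f(i, x) = best expected revenue from giving x gallons to stores i..3.
# r[i][k] = expected revenue of giving k gallons to store i (placeholders).
r = {
    1: [0.0, 2.0, 3.5, 4.5, 5.0, 5.2, 5.3],
    2: [0.0, 2.0, 3.8, 5.0, 5.6, 5.9, 6.0],
    3: [0.0, 2.0, 3.6, 4.6, 5.1, 5.3, 5.4],
}

def f(i, x):
    if i == 3:
        return r[3][x]  # the last store gets everything that is left
    return max(r[i][k] + f(i + 1, x - k) for k in range(x + 1))

print(f(1, 6))  # with the actual demand table, this evaluates to 9.75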
Uncertain States
A more interesting use of uncertainty occurs when the state that results from a decision is
uncertain. For example, consider the following coin tossing game: a coin will be tossed 4
times. Before each toss, you can wager $0, $1, or $2 (provided you have sufficient
funds). You begin with $1, and your objective is to maximize the probability that you have
$5 at the end of the coin tosses.
We can formulate this as a dynamic program as follows: create a stage for the decision
point before each flip of the coin, and a ``final'' stage, representing the result of the final
coin flip. There is a state in each stage for each possible amount you can have. For stage
1, the only state is ``1'', for each of the others, you can set it to ``0,1,2,3,4,5'' (of course,
some of these states are not possible, but there is no sense in worrying too much about
that). Now, if we are in stage i and bet k and we have x dollars, then with probability .5,
we will have x-k dollars, and with probability .5 we will have x+k dollars next period. Let
f_i(x) be the probability of ending up with at least $5 given we have $x before the ith coin
flip. Then

f_5(x) = 1 if x >= 5, and 0 otherwise,

and, for i = 1, 2, 3, 4,

f_i(x) = max {.5 f_{i+1}(x - k) + .5 f_{i+1}(x + k) : k = 0, 1, 2 and k <= x}.

Note that the next state is not known for certain, but is a probabilistic mixing of states.
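This problem is completely specified, so the recursion can be computed exactly (a
minimal Python sketch; the names are mine):

from functools import lru_cache

# f(i, x) = probability of finishing with at least $5, holding $x
# before toss i; bets of $0, $1, or $2, limited by current funds.
@lru_cache(maxsize=None)
def f(i, x):
    if i == 5:  # all four tosses are done
        return 1.0 if x >= 5 else 0.0
    return max(0.5 * f(i + 1, x - k) + 0.5 * f(i + 1, x + k)
               for k in range(min(2, x) + 1))

print(f(1, 1))  # 0.1875: the best probability of turning $1 into $5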
Another example comes from the pricing of stock options. Suppose we have the option to
buy Netscape stock at $150. We can exercise this option anytime in the next 10 days
(an American option, rather than a European option, which could only be exercised 10 days
from now). The current price of Netscape is $140. We have a model of Netscape stock
movement that predicts the following: on each day, the stock will go up by $2 with
probability .4, stay the same with probability .1, and go down by $2 with probability .5.
Note that the overall trend is downward (probably counterfactual, of course). The value of
the option if we exercise it at price x is x-150 (we will only exercise at prices above 150).
We can formulate this as a stochastic dynamic program as follows: we will have stage i
for each day i, just before the exercise or keep decision. The state for each stage will be
the stock price of Netscape on that day. Let f_i(x) be the expected value of the option on
day i, given that the stock price is x and the option has not yet been exercised. Then the
optimal decision is given by

f_10(x) = max(x - 150, 0)

and, for i = 1, 2, ..., 9,

f_i(x) = max {x - 150, .4 f_{i+1}(x + 2) + .1 f_{i+1}(x) + .5 f_{i+1}(x - 2)}.
Given the size of this problem, it is clear that we should use a spreadsheet to do the
calculations.
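A spreadsheet works, but so do a few lines of Python (a sketch under the price model
above; the names are mine):

from functools import lru_cache

# f(i, x) = expected value of the still-unexercised option on day i
# when the stock price is x.
@lru_cache(maxsize=None)
def f(i, x):
    exercise = max(x - 150, 0)
    if i == 10:  # last day: exercise now or let the option expire
        return exercise
    hold = 0.4 * f(i + 1, x + 2) + 0.1 * f(i + 1, x) + 0.5 * f(i + 1, x - 2)
    return max(exercise, hold)

print(f(1, 140))  # expected value of the option at today's price of $140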
There is one major difference between stochastic dynamic programs and deterministic
dynamic programs: in the latter, the complete decision path is known. In a stochastic
dynamic program, the actual decision path will depend on the way the random aspects
play out. Because of this, ``solving'' a stochastic dynamic program involves giving a
decision rule for every possible state, not just along an optimal path.
``Linear'' decision making
Suppose we are trying to find a parking space near a restaurant. This restaurant is on a
long stretch of road, and our goal is to park as close to the restaurant as possible. There
are T spaces leading up to the restaurant, one spot right in front of the restaurant, and T
after it. Number the spots -T, ..., -1, 0, 1, ..., T, with spot 0 directly in front of the
restaurant.
Each spot can either be full (with probability, say, .9) or empty (.1). As we pass a spot,
we need to make a decision to take the spot or try for another (hopefully better) spot. The
value for parking in spot t is . If we do not get a spot, then we slink away in
embarrasment at large cost M. What is our optimal decision rule?
We can have a stage for each spot t. The states in each stage are either e (for empty) or o
(for occupied). The decision is whether to park in the spot or not (cannot if state is o). If
In general, the optimal rule will look something like: take the first empty spot on or after
spot t (where t will be negative).
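A minimal Python sketch of this recursion (T and M below are placeholders; the
probabilities are as above):

from functools import lru_cache

T = 5    # spots run from -T to T; spot 0 is in front of the restaurant
M = 100  # cost of slinking away without a spot (placeholder)

# f(t, s) = minimum expected cost from spot t on, where s is 'e' or 'o'.
@lru_cache(maxsize=None)
def f(t, s):
    if t > T:
        return M  # drove past every spot without parking
    go_on = 0.1 * f(t + 1, 'e') + 0.9 * f(t + 1, 'o')
    if s == 'o':
        return go_on           # cannot park in an occupied spot
    return min(abs(t), go_on)  # park (cost |t|) or keep driving

for t in range(-T, T + 1):  # find the first spot worth taking if empty
    if abs(t) <= 0.1 * f(t + 1, 'e') + 0.9 * f(t + 1, 'o'):
        print('take the first empty spot at or after spot', t)
        break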