Metrics for Finite Markov Decision Processes
UAI, 2004
Markov decision processes (MDPs) offer a popular mathematical tool for planning and learning in the presence of uncertainty (Boutilier, Dean, & Hanks 1999). MDPs are a standard formalism for describing multi-stage decision making in probabilistic environments. The objective of the decision making is to maximize a cumulative measure of long-term performance, called the return. Dynamic programming algorithms, e.g., value iteration or policy iteration (Puterman 1994), allow us to compute the optimal expected return for any state, as well as the way of behaving (policy) that generates this return. However, in many practical applications the state space of an MDP is simply too large, possibly even continuous, for such standard algorithms to be applied.

A typical means of overcoming such circumstances is to partition the state space in the hope of obtaining an “essentially equivalent” reduced system. One defines a new MDP over the partition blocks, and if it is small enough, it can be solved by classical methods. The hope is that optimal values and policies for the reduced MDP can be extended to optimal values and policies for the original MDP. The notion of equivalence for stochastic processes is problematic because it requires that the transition probabilities agree exactly. This is not a robust concept, especially considering that the numbers used in probabilistic models usually come from experimentation or are approximate estimates; what is needed is a quantitative notion of equivalence.

In our work we provide such a notion via semimetrics: distance functions on the state space that quantify “how equivalent” states are. These semimetrics could potentially be used as a new theoretical tool to analyze current state compression algorithms for MDPs, or in practice to guide state aggregation directly. The ultimate goal of this research is to efficiently compress and analyze continuous state space MDPs. Here we focus on finite MDPs, but note that most of our results should hold, with slight modifications, in the context of continuous state spaces. Recent MDP research on defining equivalence relations on MDPs (Givan, Dean, & Greig 2003) has built on the notion of strong probabilistic bisimulation from concurrency theory. Bisimulation was introduced by Larsen and Skou (1991) based on ideas of Park (1981) and Milner (1980).
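As a concrete illustration of the dynamic programming background referred to above (not code from the paper itself), a minimal value-iteration sketch for a finite, tabular MDP might look as follows; the arrays P and R, the discount factor gamma, and the threshold theta are our own notation for this example.

```python
# Illustrative sketch only: textbook value iteration for a finite MDP.
import numpy as np

def value_iteration(P, R, gamma=0.95, theta=1e-8):
    """Approximate the optimal value function and a greedy policy.

    P: array of shape (A, S, S); P[a, s, s2] = transition probability
    R: array of shape (A, S);    R[a, s]     = expected immediate reward
    gamma: discount factor in [0, 1)
    theta: convergence threshold on the max-norm change of V
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    while True:
        # Bellman optimality backup: Q[a, s] = R[a, s] + gamma * sum_s2 P[a, s, s2] * V[s2]
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < theta:
            break
        V = V_new
    policy = Q.argmax(axis=0)  # greedy policy with respect to the final values
    return V_new, policy
```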
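For intuition about what such a semimetric looks like, one common way to set it up (our paraphrase of the general bisimulation-metric construction; the paper's exact constants and notation may differ) is as a fixed point that combines reward differences with a Kantorovich (earth mover's) distance between next-state distributions:

```latex
% Sketch of a bisimulation-style semimetric as a fixed point (paraphrase,
% not the paper's exact statement). T_K(d) denotes the Kantorovich metric
% induced by d; the weights c_R, c_T are chosen so the operator contracts.
d(s, s') \;=\; \max_{a \in A} \Big( c_R \,\big| R(s,a) - R(s',a) \big|
        \;+\; c_T \, T_K(d)\big( P(\cdot \mid s, a),\, P(\cdot \mid s', a) \big) \Big)
```

Under a definition of this shape, states at distance zero behave as exactly equivalent (bisimilar) states, while small positive distances indicate states that are nearly, but not exactly, interchangeable for aggregation purposes.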