Markov decision processes
L. Kallenberg, Lecture Notes, University of Leiden, 2011 (researchgate.net)
Branching out from operations research roots in the 1950s, Markov decision processes (MDPs) have gained recognition in such diverse fields as economics, telecommunication, engineering and ecology. These applications have been accompanied by many theoretical advances. Markov decision processes, also referred to as stochastic dynamic programming or stochastic control problems, are models for sequential decision making when outcomes are uncertain. The Markov decision process model consists of decision epochs, states, actions, transition probabilities and rewards. Choosing an action in a state generates a reward and determines the state at the next decision epoch through a transition probability function. Policies or strategies are prescriptions of which action to choose under any eventuality at every future decision epoch. Decision makers seek policies that are optimal in some sense.

These lecture notes aim to present a unified treatment of the theoretical and algorithmic aspects of Markov decision process models. They can serve as a text for an advanced undergraduate or graduate level course in operations research, econometrics or control engineering. As a prerequisite, the reader should have some background in linear algebra, real analysis, probability and linear programming. Throughout the text there are many examples. At the end of each chapter there is a section with bibliographic notes and a section with exercises. A solution manual is available on request (e-mail to kallenberg@math.leidenuniv.nl).

Chapter 1 introduces the Markov decision process model as a sequential decision model with actions, transitions, rewards and policies. We illustrate these concepts with nine different applications: red-black gambling, how-to-serve in tennis, optimal stopping, replacement problems, maintenance and repair, production control, optimal control of queues, stochastic scheduling and the multi-armed bandit problem.
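As a concrete point of reference for the model ingredients listed above (decision epochs, states, actions, transition probabilities, rewards and policies), the following minimal Python sketch encodes a toy replacement-type MDP and simulates a few decision epochs under a stationary policy. All labels, numbers and helper names here are assumptions made for this illustration, not data or notation from the lecture notes.

```python
import random

# Illustrative two-state replacement-type MDP; the labels, numbers and
# helper names are assumptions made for this sketch, not taken from the notes.
states = ["good", "bad"]
actions = {"good": ["keep", "replace"], "bad": ["keep", "replace"]}

# p[(i, a)] maps a next state j to the transition probability p(j | i, a);
# r[(i, a)] is the expected one-stage reward of choosing action a in state i.
p = {
    ("good", "keep"):    {"good": 0.8, "bad": 0.2},
    ("good", "replace"): {"good": 1.0},
    ("bad", "keep"):     {"bad": 1.0},
    ("bad", "replace"):  {"good": 1.0},
}
r = {
    ("good", "keep"): 5.0, ("good", "replace"): -2.0,
    ("bad", "keep"): 0.0,  ("bad", "replace"): -2.0,
}

# A deterministic stationary policy prescribes one action for every state.
policy = {"good": "keep", "bad": "replace"}

def step(i, a):
    """One decision epoch: collect the reward r(i, a) and draw the next
    state from the transition law p(. | i, a)."""
    dist = p[(i, a)]
    j = random.choices(list(dist), weights=list(dist.values()))[0]
    return r[(i, a)], j

# Simulate a few decision epochs under the stationary policy.
state, total = "good", 0.0
for _ in range(5):
    reward, state = step(state, policy[state])
    total += reward
```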
Chapter 2 deals with the finite horizon model with nonstationary transitions and rewards, and with the principle of dynamic programming, also called backward induction. We present an equivalent stationary infinite horizon model and study under which conditions optimal policies are monotone, i.e., nondecreasing or nonincreasing in the ordering of the state space. We also consider the problem of finding the K best policies. In the last section of this chapter we consider the relation between the finite horizon MDP and the shortest path problem in a directed graph.

In Chapter 3 the discounted rewards over an infinite horizon are studied. This leads to the optimality equation and to methods for solving it: policy iteration, linear programming, value iteration and modified value iteration. Furthermore, we study under which conditions monotone optimal policies exist. A central concept in discounted Markov decision processes is …
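To make the algorithmic side of Chapters 2 and 3 concrete, here is a minimal Python sketch of backward induction for the finite horizon model and of value iteration (successive approximation) for the discounted optimality equation v*(i) = max_a { r(i,a) + alpha * sum_j p(j|i,a) v*(j) }. The data layout (nested dicts, as in the sketch above) and all function names are assumptions of this example, not the notation of the notes.

```python
def backward_induction(states, actions, p, r, T):
    """Finite horizon dynamic programming (backward induction) with possibly
    nonstationary data: p[t][(i, a)] is a dict {j: p_t(j | i, a)},
    r[t][(i, a)] is the stage-t reward, and terminal values are zero."""
    v = {i: 0.0 for i in states}
    plan = {}
    for t in reversed(range(T)):                      # t = T-1, ..., 1, 0
        v_new, plan[t] = {}, {}
        for i in states:
            val, act = max(
                ((r[t][(i, a)]
                  + sum(q * v[j] for j, q in p[t][(i, a)].items()), a)
                 for a in actions[i]),
                key=lambda pair: pair[0],
            )
            v_new[i], plan[t][i] = val, act
        v = v_new
    return v, plan        # values at epoch 0 and an optimal action per epoch and state


def value_iteration(states, actions, p, r, alpha, eps=1e-8):
    """Successive approximation of the discounted optimality equation
    v(i) = max_a { r(i, a) + alpha * sum_j p(j | i, a) * v(j) }, 0 < alpha < 1."""
    v = {i: 0.0 for i in states}
    while True:
        v_new = {
            i: max(
                r[(i, a)] + alpha * sum(q * v[j] for j, q in p[(i, a)].items())
                for a in actions[i]
            )
            for i in states
        }
        # Standard stopping rule: once the sup-norm change drops below
        # eps*(1-alpha)/(2*alpha), the greedy policy w.r.t. v_new is eps-optimal.
        if max(abs(v_new[i] - v[i]) for i in states) < eps * (1 - alpha) / (2 * alpha):
            return v_new
        v = v_new
```

Policy iteration, linear programming and modified value iteration, also mentioned above, solve the same optimality equation by other means and are not shown in this sketch.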