The Variance-Penalized Stochastic Shortest Path Problem

Authors Jakob Piribauer , Ocan Sankur , Christel Baier



PDF
Thumbnail PDF

File

LIPIcs.ICALP.2022.129.pdf
  • Filesize: 0.77 MB
  • 19 pages

Document Identifiers

Author Details

Jakob Piribauer
  • Technische Universität Dresden, Germany
Ocan Sankur
  • Univ Rennes, Inria, CNRS, IRISA, France
Christel Baier
  • Technische Universität Dresden, Germany

Cite As Get BibTex

Jakob Piribauer, Ocan Sankur, and Christel Baier. The Variance-Penalized Stochastic Shortest Path Problem. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 229, pp. 129:1-129:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022) https://doi.org/10.4230/LIPIcs.ICALP.2022.129

Abstract

The stochastic shortest path problem (SSPP) asks to resolve the non-deterministic choices in a Markov decision process (MDP) such that the expected accumulated weight before reaching a target state is maximized. This paper addresses the optimization of the variance-penalized expectation (VPE) of the accumulated weight, which is a variant of the SSPP in which a multiple of the variance of accumulated weights is incurred as a penalty. It is shown that the optimal VPE in MDPs with non-negative weights as well as an optimal deterministic finite-memory scheduler can be computed in exponential space. The threshold problem whether the maximal VPE exceeds a given rational is shown to be EXPTIME-hard and to lie in NEXPTIME. Furthermore, a result of interest in its own right obtained on the way is that a variance-minimal scheduler among all expectation-optimal schedulers can be computed in polynomial time.

Subject Classification

ACM Subject Classification
  • Theory of computation → Verification by model checking
Keywords
  • Markov decision process
  • variance
  • stochastic shortest path problem

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Mohamadreza Ahmadi, Anushri Dixit, Joel W Burdick, and Aaron D Ames. Risk-averse stochastic shortest path planning. arXiv, 2021. URL: http://arxiv.org/abs/2103.14727.
  2. Kenneth J. Arrow. Essays in the Theory of Risk-Bearing. Amsterdam, North-Holland Pub. Co., 1970. Google Scholar
  3. Christel Baier, Nathalie Bertrand, Clemens Dubslaff, Daniel Gburek, and Ocan Sankur. Stochastic shortest paths and weight-bounded properties in Markov decision processes. In 33rd Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), pages 86-94. ACM, 2018. Google Scholar
  4. Christel Baier, Joachim Klein, Sascha Klüppelholz, and Sascha Wunderlich. Maximizing the conditional expected reward for reaching the goal. In Axel Legay and Tiziana Margaria, editors, 23rd International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), volume 10206 of Lecture Notes in Computer Science, pages 269-285. Springer, 2017. Google Scholar
  5. Dimitri P. Bertsekas and John N. Tsitsiklis. An analysis of stochastic shortest path problems. Mathematics of Operations Research, 16(3):580-595, 1991. Google Scholar
  6. Tomáš Brázdil, Krishnendu Chatterjee, Vojtěch Forejt, and Antonín Kučera. Trading performance for stability in Markov decision processes. Journal of Computer and System Sciences, 84:144-170, 2017. URL: https://doi.org/10.1016/j.jcss.2016.09.009.
  7. EJ Collins. Finite-horizon variance penalised Markov decision processes. Operations-Research-Spektrum, 19(1):35-39, 1997. Google Scholar
  8. Luca de Alfaro. Computing minimum and maximum reachability times in probabilistic systems. In 10th International Conference on Concurrency Theory (CONCUR), volume 1664 of Lecture Notes in Computer Science, pages 66-81, 1999. Google Scholar
  9. Jerzy A Filar, Lodewijk CM Kallenberg, and Huey-Miin Lee. Variance-penalized Markov decision processes. Mathematics of Operations Research, 14(1):147-161, 1989. Google Scholar
  10. William N Goetzmann, Stephen J Brown, Martin J Gruber, and Edwin J Elton. Modern portfolio theory and investment analysis. John Wiley & Sons, 2014. Google Scholar
  11. Christoph Haase and Stefan Kiefer. The odds of staying on budget. In 42nd International Colloquium on Automata, Languages, and Programming (ICALP), volume 9135 of Lecture Notes in Computer Science, pages 234-246. Springer, 2015. Google Scholar
  12. Stratton C. Jaquette. Markov Decision Processes with a New Optimality Criterion: Discrete Time. The Annals of Statistics, 1(3):496-505, 1973. URL: https://doi.org/10.1214/aos/1176342415.
  13. Lodewijk Kallenberg. Markov Decision Processes. Lecture Notes. University of Leiden, 2011. Google Scholar
  14. Jan Kretínský and Tobias Meggendorfer. Conditional value-at-risk for reachability and mean payoff in Markov decision processes. In 33rd Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), pages 609-618. ACM, 2018. URL: https://doi.org/10.1145/3209108.3209176.
  15. Masami Kurano. Markov decision processes with a minimum-variance criterion. Journal of mathematical analysis and applications, 123(2):572-583, 1987. Google Scholar
  16. Petr Mandl. On the variance in controlled Markov chains. Kybernetika, 7(1):1-12, 1971. Google Scholar
  17. Shie Mannor and John N. Tsitsiklis. Mean-variance optimization in Markov decision processes. In Proceedings of the 28th International Conference on Machine Learning, ICML'11, pages 177-184, Madison, WI, USA, 2011. Omnipress. Google Scholar
  18. Harry Markowitz. Portfolio selection. The Journal of Finance, 7(1):77-91, 1952. URL: http://www.jstor.org/stable/2975974.
  19. Jakob Piribauer. On Non-Classical Stochastic Shortest Path Problems. PhD thesis, Technische Universität Dresden, Germany, 2021. Google Scholar
  20. Jakob Piribauer and Christel Baier. Partial and conditional expectations in Markov decision processes with integer weights. In Mikolaj Bojanczyk and Alex Simpson, editors, 22nd International Conference on Foundations of Software Science and Computation Structures (FoSSaCS), volume 11425 of Lecture Notes in Computer Science, pages 436-452. Springer, 2019. Google Scholar
  21. Jakob Piribauer and Christel Baier. On Skolem-hardness and saturation points in Markov decision processes. In Artur Czumaj, Anuj Dawar, and Emanuela Merelli, editors, 47th International Colloquium on Automata, Languages, and Programming (ICALP), volume 168 of LIPIcs, pages 138:1-138:17. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020. URL: https://doi.org/10.4230/LIPIcs.ICALP.2020.138.
  22. Jakob Piribauer, Ocan Sankur, and Christel Baier. The variance-penalized stochastic shortest path problem. arXiv:2204.12280, 2022. URL: https://doi.org/10.48550/ARXIV.2204.12280.
  23. John W. Pratt. Risk aversion in the small and in the large. Econometrica, 32(1/2):122-136, 1964. URL: http://www.jstor.org/stable/1913738.
  24. Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 1994. Google Scholar
  25. Matthew J. Sobel. The variance of discounted markov decision processes. Journal of Applied Probability, 19(4):794-802, 1982. URL: https://doi.org/10.2307/3213832.
  26. Matthew J. Sobel. Mean-variance tradeoffs in an undiscounted mdp. Operations Research, 42(1):175-183, 1994. URL: https://doi.org/10.1287/opre.42.1.175.
  27. Michael Ummels and Christel Baier. Computing quantiles in Markov reward models. In Frank Pfenning, editor, 16th International Conference on Foundations of Software Science and Computation Structures (FoSSaCS), volume 7794 of Lecture Notes in Computer Science, pages 353-368. Springer, 2013. URL: https://doi.org/10.1007/978-3-642-37075-5_23.
  28. Tom Verhoeff. Reward variance in Markov chains: A calculational approach. In Proceedings of Eindhoven FASTAR Days. Citeseer, 2004. Google Scholar
  29. Xiao Wu and Xianping Guo. First passage optimality and variance minimisation of Markov decision processes with varying discount factors. Journal of Applied Probability, 52(2):441-456, 2015. URL: https://doi.org/10.1239/jap/1437658608.
  30. Li Xia. Optimization of Markov decision processes under the variance criterion. Automatica, 73:269-278, 2016. URL: https://doi.org/10.1016/j.automatica.2016.06.018.
  31. Li Xia. Mean-variance optimization of discrete time discounted Markov decision processes. Automatica, 88:76-82, 2018. Google Scholar
  32. Li Xia. Variance minimization of parameterized Markov decision processes. Discrete Event Dynamic Systems, 28(1):63-81, 2018. URL: https://doi.org/10.1007/s10626-017-0258-5.
  33. Li Xia. Risk-sensitive Markov decision processes with combined metrics of mean and variance. Production and Operations Management, 29(12):2808-2827, 2020. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail