Bisimulation for Markov Decision Processes through Families of Functional Expressions

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8464)

Abstract

We transfer a notion of quantitative bisimilarity for labelled Markov processes [1] to Markov decision processes with continuous state spaces. This notion takes the form of a pseudometric on the system states, cast in terms of the equivalence of a family of functional expressions evaluated on those states and interpreted as a real-valued modal logic. Our proof amounts to a slight modification of previous techniques [2,3] used to prove equivalence with a fixed-point pseudometric on the state space of a labelled Markov process and making heavy use of the Kantorovich probability metric. Indeed, we again demonstrate equivalence with a fixed-point pseudometric defined on Markov decision processes [4]; what is novel is that we recast this proof in terms of integral probability metrics [5] defined through the family of functional expressions, shifting emphasis back to properties of such families. The hope is that a judicious choice of family might lead to something more computationally tractable than bisimilarity whilst maintaining its pleasing theoretical guarantees. Moreover, we use a trick from descriptive set theory to extend our results to MDPs with bounded measurable reward functions, dropping a previous continuity constraint on rewards and Markov kernels.
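
As a rough illustration of the integral probability metrics mentioned above, the sketch below computes sup over f in F of |E_p[f] - E_q[f]| for two distributions on a finite set and a finite family F. It is not the chapter's construction: the finite state set, the function name integral_probability_metric, and the two indicator functions are all assumptions chosen for the example, whereas the chapter works with continuous state spaces and a family of functional expressions.

```python
from typing import Callable, Dict, Hashable, Iterable

# A finite distribution: maps each state to its probability (assumed to sum to 1).
Dist = Dict[Hashable, float]


def integral_probability_metric(
    p: Dist, q: Dist, family: Iterable[Callable[[Hashable], float]]
) -> float:
    """Return max over f in `family` of |E_p[f] - E_q[f]|.

    For a finite family the supremum is a maximum. An empty family gives 0.0.
    """
    support = set(p) | set(q)
    best = 0.0
    for f in family:
        # Difference of the two expectations of f, taken over the joint support.
        gap = abs(sum(f(x) * (p.get(x, 0.0) - q.get(x, 0.0)) for x in support))
        best = max(best, gap)
    return best


# Hypothetical example: three states and two indicator functions as the family.
p = {"s0": 0.5, "s1": 0.5}
q = {"s0": 0.25, "s1": 0.25, "s2": 0.5}
family = [
    lambda x: 1.0 if x == "s2" else 0.0,
    lambda x: 1.0 if x in ("s0", "s1") else 0.0,
]
distance = integral_probability_metric(p, q, family)
```

With a richer family the same quantity can only grow or stay equal. Taking all functions that are 1-Lipschitz with respect to a given metric on the states yields the Kantorovich metric, the case the abstract refers to. A smaller family is cheaper to evaluate but may fail to separate distributions that a larger one distinguishes.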

References

  1. Desharnais, J., Jagadeesan, R., Gupta, V., Panangaden, P.: The Metric Analogue of Weak Bisimulation for Probabilistic Processes. In: LICS 2002: Proceedings of the 17th Annual IEEE Symposium on Logic in Computer Science, July 22-25, pp. 413–422. IEEE Computer Society, Washington, DC (2002)

  2. van Breugel, F., Worrell, J.: Towards Quantitative Verification of Probabilistic Transition Systems. In: Orejas, F., Spirakis, P.G., van Leeuwen, J. (eds.) ICALP 2001. LNCS, vol. 2076, pp. 421–432. Springer, Heidelberg (2001a)

  3. van Breugel, F., Worrell, J.: An Algorithm for Quantitative Verification of Probabilistic Transition Systems. In: Larsen, K.G., Nielsen, M. (eds.) CONCUR 2001. LNCS, vol. 2154, pp. 336–350. Springer, Heidelberg (2001b)

  4. Ferns, N., Panangaden, P., Precup, D.: Bisimulation Metrics for Continuous Markov Decision Processes. SIAM Journal on Computing 40(6), 1662–1714 (2011)

  5. Müller, A.: Integral Probability Metrics and Their Generating Classes of Functions. Advances in Applied Probability 29, 429–443 (1997)

  6. Larsen, K.G., Skou, A.: Bisimulation Through Probabilistic Testing. Information and Computation 94(1), 1–28 (1991)

  7. Milner, R.: A Calculus of Communicating Systems. LNCS, vol. 92. Springer, New York (1980)

  8. Park, D.: Concurrency and Automata on Infinite Sequences. In: Proceedings of the 5th GI-Conference on Theoretical Computer Science, pp. 167–183. Springer, London (1981)

  9. Desharnais, J., Gupta, V., Jagadeesan, R., Panangaden, P.: Metrics for Labeled Markov Systems. In: Baeten, J.C.M., Mauw, S. (eds.) CONCUR 1999. LNCS, vol. 1664, pp. 258–273. Springer, Heidelberg (1999)

  10. Desharnais, J., Gupta, V., Jagadeesan, R., Panangaden, P.: Metrics for Labelled Markov Processes. Theor. Comput. Sci. 318(3), 323–354 (2004)

  11. van Breugel, F., Hermida, C., Makkai, M., Worrell, J.: Recursively Defined Metric Spaces Without Contraction. Theoretical Computer Science 380(1-2), 143–163 (2007)

  12. Kozen, D.: A Probabilistic PDL. In: STOC 1983: Proceedings of the Fifteenth Annual ACM Symposium on Theory of Computing, pp. 291–297. ACM, New York (1983)

  13. van Breugel, F., Sharma, B., Worrell, J.B.: Approximating a Behavioural Pseudometric Without Discount for Probabilistic Systems. In: Seidl, H. (ed.) FOSSACS 2007. LNCS, vol. 4423, pp. 123–137. Springer, Heidelberg (2007)

  14. Ferns, N., Panangaden, P., Precup, D.: Metrics for Finite Markov Decision Processes. In: AUAI 2004: Proceedings of the 20th Annual Conference on Uncertainty in Artificial Intelligence, Arlington, Virginia, United States, pp. 162–169. AUAI Press (2004)

  15. Ferns, N., Panangaden, P., Precup, D.: Metrics for Markov Decision Processes with Infinite State Spaces. In: Proceedings of the 21st Annual Conference on Uncertainty in Artificial Intelligence (UAI 2005), Arlington, Virginia, pp. 201–208. AUAI Press (2005)

  16. Ferns, N., Castro, P.S., Precup, D., Panangaden, P.: Methods for Computing State Similarity in Markov Decision Processes. In: Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI 2006), Arlington, Virginia. AUAI Press, Arlington (2006)

  17. Castronovo, M., Maes, F., Fonteneau, R., Ernst, D.: Learning Exploration/Exploitation Strategies for Single Trajectory Reinforcement Learning. In: Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL 2012), Edinburgh, Scotland, June 30-July 1, vol. 24, pp. 1–10 (2012)

  18. Panangaden, P.: Labelled Markov Processes. Imperial College Press (2009)

  19. Giry, M.: A Categorical Approach to Probability Theory. Categorical Aspects of Topology and Analysis, pp. 68–85 (1982)

  20. Billingsley, P.: Convergence of Probability Measures. Wiley (1968)

  21. Dudley, R.M.: Real Analysis and Probability. Cambridge University Press (2002)

  22. Desharnais, J.: Labelled Markov Processes. PhD thesis, McGill University (2000)

  23. Desharnais, J., Edalat, A., Panangaden, P.: Bisimulation for Labeled Markov Processes. Information and Computation 179(2), 163–193 (2002)

  24. Gibbs, A.L., Su, F.E.: On Choosing and Bounding Probability Metrics. International Statistical Review 70, 419–435 (2002)

  25. Villani, C.: Topics in Optimal Transportation. Graduate Studies in Mathematics, vol. 58. American Mathematical Society (2003)

  26. Hernández-Lerma, O., Lasserre, J.B.: Further Topics on Discrete-Time Markov Control Processes. Applications of Mathematics. Springer, New York (1999)

  27. Srivastava, S.M.: A Course on Borel Sets. Graduate Texts in Mathematics, vol. 180. Springer (2008)

  28. Bertsekas, D.P., Shreve, S.E.: Stochastic Optimal Control: The Discrete-Time Case. Athena Scientific (2007)

  29. Parthasarathy, K.R.: Probability Measures on Metric Spaces. Academic, New York (1967)

  30. Chen, D., van Breugel, F., Worrell, J.: On the Complexity of Computing Probabilistic Bisimilarity. In: Birkedal, L. (ed.) FOSSACS 2012. LNCS, vol. 7213, pp. 437–451. Springer, Heidelberg (2012)

  31. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., New York (1994)

  32. Sriperumbudur, B.K., Fukumizu, K., Gretton, A., Schölkopf, B., Lanckriet, G.R.G.: On the Empirical Estimation of Integral Probability Metrics. Electronic Journal of Statistics 6, 1550–1599 (2012)

  33. Bouchard-Côté, A., Ferns, N., Panangaden, P., Precup, D.: An Approximation Algorithm for Labelled Markov Processes: Towards Realistic Approximation. In: QEST 2005: Proceedings of the Second International Conference on the Quantitative Evaluation of Systems, pp. 54–61. IEEE Computer Society, Washington, DC (2005)

  34. Müller, A.: Stochastic Orders Generated by Integrals: A Unified Study. Advances in Applied Probability 29, 414–428 (1997)

  35. Chaput, P., Danos, V., Panangaden, P., Plotkin, G.: Approximating Markov Processes by Averaging. In: Albers, S., Marchetti-Spaccamela, A., Matias, Y., Nikoletseas, S., Thomas, W. (eds.) ICALP 2009, Part II. LNCS, vol. 5556, pp. 127–138. Springer, Heidelberg (2009)

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Ferns, N., Precup, D., Knight, S. (2014). Bisimulation for Markov Decision Processes through Families of Functional Expressions. In: van Breugel, F., Kashefi, E., Palamidessi, C., Rutten, J. (eds) Horizons of the Mind. A Tribute to Prakash Panangaden. Lecture Notes in Computer Science, vol 8464. Springer, Cham. https://doi.org/10.1007/978-3-319-06880-0_17

  • DOI: https://doi.org/10.1007/978-3-319-06880-0_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06879-4

  • Online ISBN: 978-3-319-06880-0

  • eBook Packages: Computer Science, Computer Science (R0)
