Bisimulation for Markov Decision Processes through Families of Functional Expressions

Ferns, Norm; Precup, Doina; Knight, Sophia

doi:10.1007/978-3-319-06880-0_17

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8464)

Abstract

We transfer a notion of quantitative bisimilarity for labelled Markov processes [1] to Markov decision processes (MDPs) with continuous state spaces. This notion takes the form of a pseudometric on the system states, cast in terms of the equivalence of a family of functional expressions evaluated on those states and interpreted as a real-valued modal logic. Our proof is a slight modification of earlier techniques [2,3] that were used to prove equivalence with a fixed-point pseudometric on the state space of a labelled Markov process and that make heavy use of the Kantorovich probability metric. Indeed, we again demonstrate equivalence with a fixed-point pseudometric defined on MDPs [4]; what is novel is that we recast this proof in terms of integral probability metrics [5] defined through the family of functional expressions, shifting emphasis back to properties of such families. The hope is that a judicious choice of family might lead to something more computationally tractable than bisimilarity whilst maintaining its pleasing theoretical guarantees. Moreover, we use a trick from descriptive set theory to extend our results to MDPs with bounded measurable reward functions, dropping a previous continuity constraint on rewards and Markov kernels.
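
The two objects the abstract relies on can be made concrete on a finite state space: the pseudometric d_F(s, t) = sup over f in F of |f(s) − f(t)| induced by a family F of real-valued functions, and the integral probability metric γ_F(μ, ν) = sup over f in F of |∫ f dμ − ∫ f dν| of [5]. The Python sketch below is a minimal illustration under stated assumptions: a finite MDP and a finite family generated by a simplified, hypothetical rule (f ↦ c_r·r(·, a) + c_t·E_{P(·, a)}[f], starting from the zero function). This rule is not the chapter's grammar of functional expressions, and the names `generate_family`, `ipm`, `state_distance`, `c_r` and `c_t`, as well as the example MDP, are illustrative only. Taking F to be all functions that are 1-Lipschitz for a metric d would instead yield the Kantorovich metric (Kantorovich–Rubinstein duality), an infinite family that this enumeration does not attempt to cover.

```python
from typing import List

Vec = List[float]  # a real-valued function, or a probability vector, on states 0..n-1


def expect(f: Vec, dist: Vec) -> float:
    """Expectation of f under a finite probability vector dist."""
    return sum(fx * px for fx, px in zip(f, dist))


def ipm(family: List[Vec], mu: Vec, nu: Vec) -> float:
    """Integral probability metric: max over f in the finite family of |E_mu f - E_nu f|."""
    return max(abs(expect(f, mu) - expect(f, nu)) for f in family)


def state_distance(family: List[Vec], s: int, t: int) -> float:
    """d_F(s, t) = max over f of |f(s) - f(t)|, i.e. the IPM between point masses at s and t."""
    return max(abs(f[s] - f[t]) for f in family)


def generate_family(rewards: List[Vec], trans: List[List[Vec]], depth: int,
                    c_r: float = 0.5, c_t: float = 0.5) -> List[Vec]:
    """Apply the HYPOTHETICAL rule f -> c_r * r(., a) + c_t * E_{P(., a)}[f]
    `depth` times, starting from the constant-zero function.
    (Illustrative only; not the chapter's grammar of functional expressions.)

    rewards[a][s]: reward for action a in state s.
    trans[a][s]:   next-state distribution P(s, a) as a probability vector.
    """
    n = len(rewards[0])
    family: List[Vec] = [[0.0] * n]
    for _ in range(depth):
        # Keep the old functions and add one successor per (function, action) pair.
        family = family + [
            [c_r * rewards[a][s] + c_t * expect(f, trans[a][s]) for s in range(n)]
            for f in family
            for a in range(len(rewards))
        ]
    return family


# Hypothetical 3-state, 2-action MDP: states 0 and 1 have identical rewards
# and next-state distributions, while state 2 behaves differently.
REWARDS = [[1.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]
TRANS = [[[0.0, 0.5, 0.5], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],
         [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]]

F = generate_family(REWARDS, TRANS, depth=3)
print(len(F))                    # 27
print(state_distance(F, 0, 1))   # 0.0
print(state_distance(F, 0, 2))   # positive: state 2 is distinguishable
```

Because every generated function depends on a state only through its rewards and next-state distributions, states 0 and 1 end up at distance zero, and `state_distance` coincides with `ipm` evaluated on point masses. Whether a given family is rich enough to characterize bisimilarity is the kind of property the chapter studies; this sketch makes no such claim.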

References

[1] Desharnais, J., Jagadeesan, R., Gupta, V., Panangaden, P.: The Metric Analogue of Weak Bisimulation for Probabilistic Processes. In: LICS 2002: Proceedings of the 17th Annual IEEE Symposium on Logic in Computer Science, July 22–25, pp. 413–422. IEEE Computer Society, Washington, DC (2002)
[2] van Breugel, F., Worrell, J.: Towards Quantitative Verification of Probabilistic Transition Systems. In: Orejas, F., Spirakis, P.G., van Leeuwen, J. (eds.) ICALP 2001. LNCS, vol. 2076, pp. 421–432. Springer, Heidelberg (2001a)
[3] van Breugel, F., Worrell, J.: An Algorithm for Quantitative Verification of Probabilistic Transition Systems. In: Larsen, K.G., Nielsen, M. (eds.) CONCUR 2001. LNCS, vol. 2154, pp. 336–350. Springer, Heidelberg (2001b)
[4] Ferns, N., Panangaden, P., Precup, D.: Bisimulation Metrics for Continuous Markov Decision Processes. SIAM Journal on Computing 40(6), 1662–1714 (2011)
[5] Müller, A.: Integral Probability Metrics and Their Generating Classes of Functions. Advances in Applied Probability 29, 429–443 (1997)
[6] Larsen, K.G., Skou, A.: Bisimulation Through Probabilistic Testing. Information and Computation 94(1), 1–28 (1991)
[7] Milner, R.: A Calculus of Communicating Systems. LNCS, vol. 92. Springer, New York (1980)
[8] Park, D.: Concurrency and Automata on Infinite Sequences. In: Proceedings of the 5th GI-Conference on Theoretical Computer Science, pp. 167–183. Springer, London (1981)
[9] Desharnais, J., Gupta, V., Jagadeesan, R., Panangaden, P.: Metrics for Labeled Markov Systems. In: Baeten, J.C.M., Mauw, S. (eds.) CONCUR 1999. LNCS, vol. 1664, pp. 258–273. Springer, Heidelberg (1999)
[10] Desharnais, J., Gupta, V., Jagadeesan, R., Panangaden, P.: Metrics for Labelled Markov Processes. Theoretical Computer Science 318(3), 323–354 (2004)
[11] van Breugel, F., Hermida, C., Makkai, M., Worrell, J.: Recursively Defined Metric Spaces Without Contraction. Theoretical Computer Science 380(1–2), 143–163 (2007)
[12] Kozen, D.: A Probabilistic PDL. In: STOC 1983: Proceedings of the Fifteenth Annual ACM Symposium on Theory of Computing, pp. 291–297. ACM, New York (1983)
[13] van Breugel, F., Sharma, B., Worrell, J.B.: Approximating a Behavioural Pseudometric Without Discount for Probabilistic Systems. In: Seidl, H. (ed.) FOSSACS 2007. LNCS, vol. 4423, pp. 123–137. Springer, Heidelberg (2007)
[14] Ferns, N., Panangaden, P., Precup, D.: Metrics for Finite Markov Decision Processes. In: UAI 2004: Proceedings of the 20th Annual Conference on Uncertainty in Artificial Intelligence, Arlington, Virginia, United States, pp. 162–169. AUAI Press (2004)
[15] Ferns, N., Panangaden, P., Precup, D.: Metrics for Markov Decision Processes with Infinite State Spaces. In: Proceedings of the 21st Annual Conference on Uncertainty in Artificial Intelligence (UAI 2005), Arlington, Virginia, pp. 201–208. AUAI Press (2005)
[16] Ferns, N., Castro, P.S., Precup, D., Panangaden, P.: Methods for Computing State Similarity in Markov Decision Processes. In: Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI 2006), Arlington, Virginia. AUAI Press, Arlington (2006)
[17] Castronovo, M., Maes, F., Fonteneau, R., Ernst, D.: Learning Exploration/Exploitation Strategies for Single Trajectory Reinforcement Learning. In: Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL 2012), Edinburgh, Scotland, June 30–July 1, vol. 24, pp. 1–10 (2012)
[18] Panangaden, P.: Labelled Markov Processes. Imperial College Press (2009)
[19] Giry, M.: A Categorical Approach to Probability Theory. Categorical Aspects of Topology and Analysis, pp. 68–85 (1982)
[20] Billingsley, P.: Convergence of Probability Measures. Wiley (1968)
[21] Dudley, R.M.: Real Analysis and Probability. Cambridge University Press (2002)
[22] Desharnais, J.: Labelled Markov Processes. PhD thesis, McGill University (2000)
[23] Desharnais, J., Edalat, A., Panangaden, P.: Bisimulation for Labeled Markov Processes. Information and Computation 179(2), 163–193 (2002)
[24] Gibbs, A.L., Su, F.E.: On Choosing and Bounding Probability Metrics. International Statistical Review 70, 419–435 (2002)
[25] Villani, C.: Topics in Optimal Transportation. Graduate Studies in Mathematics, vol. 58. American Mathematical Society (2003)
[26] Hernández-Lerma, O., Lasserre, J.B.: Further Topics on Discrete-Time Markov Control Processes. Applications of Mathematics. Springer, New York (1999)
[27] Srivastava, S.M.: A Course on Borel Sets. Graduate Texts in Mathematics, vol. 180. Springer (2008)
[28] Bertsekas, D.P., Shreve, S.E.: Stochastic Optimal Control: The Discrete-Time Case. Athena Scientific (2007)
[29] Parthasarathy, K.R.: Probability Measures on Metric Spaces. Academic Press, New York (1967)
[30] Chen, D., van Breugel, F., Worrell, J.: On the Complexity of Computing Probabilistic Bisimilarity. In: Birkedal, L. (ed.) FOSSACS 2012. LNCS, vol. 7213, pp. 437–451. Springer, Heidelberg (2012)
[31] Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., New York (1994)
[32] Sriperumbudur, B.K., Fukumizu, K., Gretton, A., Schölkopf, B., Lanckriet, G.R.G.: On the Empirical Estimation of Integral Probability Metrics. Electronic Journal of Statistics 6, 1550–1599 (2012)
[33] Bouchard-Côté, A., Ferns, N., Panangaden, P., Precup, D.: An Approximation Algorithm for Labelled Markov Processes: Towards Realistic Approximation. In: QEST 2005: Proceedings of the Second International Conference on the Quantitative Evaluation of Systems, pp. 54–61. IEEE Computer Society, Washington, DC (2005)
[34] Müller, A.: Stochastic Orders Generated by Integrals: A Unified Study. Advances in Applied Probability 29, 414–428 (1997)
[35] Chaput, P., Danos, V., Panangaden, P., Plotkin, G.: Approximating Markov Processes by Averaging. In: Albers, S., Marchetti-Spaccamela, A., Matias, Y., Nikoletseas, S., Thomas, W. (eds.) ICALP 2009, Part II. LNCS, vol. 5556, pp. 127–138. Springer, Heidelberg (2009)

Author information

Authors and Affiliations

Département d’Informatique, École Normale Supérieure, 45 rue d’Ulm, F-75230, Paris Cedex 05, France
Norm Ferns
School of Computer Science, McGill University, Montréal, Canada, H3A 2A7
Doina Precup
CNRS, LORIA, Université de Lorraine, Nancy, France
Sophia Knight

Editor information

Editors and Affiliations

Department of Electrical Engineering and Computer Science, York University, 4700 Keele Street, Toronto, ON M3J 1P3, Canada
Franck van Breugel
School of Informatics, Informatics Forum, University of Edinburgh, 10 Crichton Street, EH8 9LE, Edinburgh, UK
Elham Kashefi
Inria Saclay, Campus de l’École Polytechnique, Bâtiment Alan Turing, 1, rue Honoré d’Estienne d’Orves, 91120, Palaiseau, France
Catuscia Palamidessi
CWI, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands
Jan Rutten

About this chapter

Cite this chapter

Ferns, N., Precup, D., Knight, S. (2014). Bisimulation for Markov Decision Processes through Families of Functional Expressions. In: van Breugel, F., Kashefi, E., Palamidessi, C., Rutten, J. (eds) Horizons of the Mind. A Tribute to Prakash Panangaden. Lecture Notes in Computer Science, vol 8464. Springer, Cham. https://doi.org/10.1007/978-3-319-06880-0_17

DOI: https://doi.org/10.1007/978-3-319-06880-0_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06879-4
Online ISBN: 978-3-319-06880-0
eBook Packages: Computer Science, Computer Science (R0)
