Abstract
We transfer a notion of quantitative bisimilarity for labelled Markov processes [1] to Markov decision processes with continuous state spaces. This notion takes the form of a pseudometric on the system states, cast in terms of the equivalence of a family of functional expressions evaluated on those states and interpreted as a real-valued modal logic. Our proof amounts to a slight modification of previous techniques [2,3], which were used to prove equivalence with a fixed-point pseudometric on the state space of a labelled Markov process and which make heavy use of the Kantorovich probability metric. Indeed, we again demonstrate equivalence with a fixed-point pseudometric defined on Markov decision processes [4]; what is novel is that we recast this proof in terms of integral probability metrics [5] defined through the family of functional expressions, shifting emphasis back to properties of such families. The hope is that a judicious choice of family might lead to something more computationally tractable than bisimilarity whilst maintaining its pleasing theoretical guarantees. Moreover, we use a trick from descriptive set theory to extend our results to MDPs with bounded measurable reward functions, dropping a previous continuity constraint on rewards and Markov kernels.
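To make the two distances mentioned above concrete, the following is a minimal sketch on a finite toy example, not the chapter's construction, which works over continuous state spaces and a specific family of functional expressions. It assumes a finite state set and a small, hand-picked family of real-valued functions; the names `state_distance`, `ipm`, and the toy `family` are hypothetical and chosen only for illustration. The state distance takes the largest disagreement of the family on two states, and the integral probability metric takes the largest disagreement of the expectations of the family under two distributions.

```python
from typing import Callable, Dict, Sequence

State = str
Dist = Dict[State, float]          # finite distribution: state -> probability
Fn = Callable[[State], float]      # one real-valued function on states


def expectation(f: Fn, dist: Dist) -> float:
    """Integral of f against a finite distribution."""
    return sum(prob * f(state) for state, prob in dist.items())


def state_distance(family: Sequence[Fn], s: State, t: State) -> float:
    """Pseudometric on states: largest gap |f(s) - f(t)| over the family."""
    return max(abs(f(s) - f(t)) for f in family)


def ipm(family: Sequence[Fn], p: Dist, q: Dist) -> float:
    """Integral probability metric generated by the family:
    largest gap between expectations under p and q."""
    return max(abs(expectation(f, p) - expectation(f, q)) for f in family)


# Toy data (assumed for illustration only).
family = [
    lambda s: 1.0 if s == "a" else 0.0,   # indicator of state "a"
    lambda s: 1.0 if s == "b" else 0.0,   # indicator of state "b"
    lambda s: 0.5,                        # a constant function
]
p = {"a": 0.5, "b": 0.5}
q = {"a": 0.25, "b": 0.75}

distance_pq = ipm(family, p, q)
distance_ab = state_distance(family, "a", "b")
```

With a finite family the supremum reduces to a maximum, which is what keeps this sketch simple; for richer families, such as all 1-Lipschitz functions (which yields the Kantorovich metric), computing the supremum is an optimisation problem in its own right.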
References
Desharnais, J., Jagadeesan, R., Gupta, V., Panangaden, P.: The Metric Analogue of Weak Bisimulation for Probabilistic Processes. In: LICS 2002: Proceedings of the 17th Annual IEEE Symposium on Logic in Computer Science, July 22-25, pp. 413–422. IEEE Computer Society, Washington, DC (2002)
van Breugel, F., Worrell, J.: Towards Quantitative Verification of Probabilistic Transition Systems. In: Orejas, F., Spirakis, P.G., van Leeuwen, J. (eds.) ICALP 2001. LNCS, vol. 2076, pp. 421–432. Springer, Heidelberg (2001a)
van Breugel, F., Worrell, J.: An Algorithm for Quantitative Verification of Probabilistic Transition Systems. In: Larsen, K.G., Nielsen, M. (eds.) CONCUR 2001. LNCS, vol. 2154, pp. 336–350. Springer, Heidelberg (2001b)
Ferns, N., Panangaden, P., Precup, D.: Bisimulation Metrics for Continuous Markov Decision Processes. SIAM Journal on Computing 40(6), 1662–1714 (2011)
Müller, A.: Integral Probability Metrics and Their Generating Classes of Functions. Advances in Applied Probability 29, 429–443 (1997)
Larsen, K.G., Skou, A.: Bisimulation Through Probabilistic Testing. Information and Computation 94(1), 1–28 (1991)
Milner, R.: A Calculus of Communicating Systems. LNCS, vol. 92. Springer, New York (1980)
Park, D.: Concurrency and Automata on Infinite Sequences. In: Proceedings of the 5th GI-Conference on Theoretical Computer Science, pp. 167–183. Springer, London (1981)
Desharnais, J., Gupta, V., Jagadeesan, R., Panangaden, P.: Metrics for Labeled Markov Systems. In: Baeten, J.C.M., Mauw, S. (eds.) CONCUR 1999. LNCS, vol. 1664, pp. 258–273. Springer, Heidelberg (1999)
Desharnais, J., Gupta, V., Jagadeesan, R., Panangaden, P.: Metrics for Labelled Markov Processes. Theor. Comput. Sci. 318(3), 323–354 (2004)
van Breugel, F., Hermida, C., Makkai, M., Worrell, J.: Recursively Defined Metric Spaces Without Contraction. Theoretical Computer Science 380(1-2), 143–163 (2007)
Kozen, D.: A Probabilistic PDL. In: STOC 1983: Proceedings of the Fifteenth Annual ACM Symposium on Theory of Computing, pp. 291–297. ACM, New York (1983)
van Breugel, F., Sharma, B., Worrell, J.B.: Approximating a Behavioural Pseudometric Without Discount for Probabilistic Systems. In: Seidl, H. (ed.) FOSSACS 2007. LNCS, vol. 4423, pp. 123–137. Springer, Heidelberg (2007)
Ferns, N., Panangaden, P., Precup, D.: Metrics for Finite Markov Decision Processes. In: AUAI 2004: Proceedings of the 20th Annual Conference on Uncertainty in Artificial Intelligence, Arlington, Virginia, United States, pp. 162–169. AUAI Press (2004)
Ferns, N., Panangaden, P., Precup, D.: Metrics for Markov Decision Processes with Infinite State Spaces. In: Proceedings of the 21st Annual Conference on Uncertainty in Artificial Intelligence (UAI 2005), Arlington, Virginia, pp. 201–208. AUAI Press (2005)
Ferns, N., Castro, P.S., Precup, D., Panangaden, P.: Methods for Computing State Similarity in Markov Decision Processes. In: Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI 2006), Arlington, Virginia. AUAI Press, Arlington (2006)
Castronovo, M., Maes, F., Fonteneau, R., Ernst, D.: Learning Exploration/Exploitation Strategies for Single Trajectory Reinforcement Learning. In: Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL 2012), Edinburgh, Scotland, June 30-July 1, vol. 24, pp. 1–10 (2012)
Panangaden, P.: Labelled Markov Processes. Imperial College Press (2009)
Giry, M.: A Categorical Approach to Probability Theory. Categorical Aspects of Topology and Analysis, pp. 68–85 (1982)
Billingsley, P.: Convergence of Probability Measures. Wiley (1968)
Dudley, R.M.: Real Analysis and Probability. Cambridge University Press (2002)
Desharnais, J.: Labelled Markov Processes. PhD thesis, McGill University (2000)
Desharnais, J., Edalat, A., Panangaden, P.: Bisimulation for Labeled Markov Processes. Information and Computation 179(2), 163–193 (2002)
Gibbs, A.L., Su, F.E.: On Choosing and Bounding Probability Metrics. International Statistical Review 70, 419–435 (2002)
Villani, C.: Topics in Optimal Transportation. Graduate Studies in Mathematics, vol. 58. American Mathematical Society (2003)
Hernández-Lerma, O., Lasserre, J.B.: Further Topics on Discrete-Time Markov Control Processes. Applications of Mathematics. Springer, New York (1999)
Srivastava, S.M.: A Course on Borel Sets. Graduate Texts in Mathematics, vol. 180. Springer (2008)
Bertsekas, D.P., Shreve, S.E.: Stochastic Optimal Control: The Discrete-Time Case. Athena Scientific (2007)
Parthasarathy, K.R.: Probability Measures on Metric Spaces. Academic Press, New York (1967)
Chen, D., van Breugel, F., Worrell, J.: On the Complexity of Computing Probabilistic Bisimilarity. In: Birkedal, L. (ed.) FOSSACS 2012. LNCS, vol. 7213, pp. 437–451. Springer, Heidelberg (2012)
Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., New York (1994)
Sriperumbudur, B.K., Fukumizu, K., Gretton, A., Schölkopf, B., Lanckriet, G.R.G.: On the Empirical Estimation of Integral Probability Metrics. Electronic Journal of Statistics 6, 1550–1599 (2012)
Bouchard-Côté, A., Ferns, N., Panangaden, P., Precup, D.: An Approximation Algorithm for Labelled Markov Processes: Towards Realistic Approximation. In: QEST 2005: Proceedings of the Second International Conference on the Quantitative Evaluation of Systems, pp. 54–61. IEEE Computer Society, Washington, DC (2005)
Müller, A.: Stochastic Orders Generated by Integrals: A Unified Study. Advances in Applied Probability 29, 414–428 (1997)
Chaput, P., Danos, V., Panangaden, P., Plotkin, G.: Approximating Markov Processes by Averaging. In: Albers, S., Marchetti-Spaccamela, A., Matias, Y., Nikoletseas, S., Thomas, W. (eds.) ICALP 2009, Part II. LNCS, vol. 5556, pp. 127–138. Springer, Heidelberg (2009)
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Ferns, N., Precup, D., Knight, S. (2014). Bisimulation for Markov Decision Processes through Families of Functional Expressions. In: van Breugel, F., Kashefi, E., Palamidessi, C., Rutten, J. (eds) Horizons of the Mind. A Tribute to Prakash Panangaden. Lecture Notes in Computer Science, vol 8464. Springer, Cham. https://doi.org/10.1007/978-3-319-06880-0_17
DOI: https://doi.org/10.1007/978-3-319-06880-0_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06879-4
Online ISBN: 978-3-319-06880-0
eBook Packages: Computer Science (R0)