Abstract
In the context of probabilistic verification, we introduce a new notion of trace-equivalence divergence between pairs of labelled Markov processes. This divergence coincides with the optimal value of a particular derived Markov decision process, and can therefore be estimated by reinforcement learning methods. Moreover, we provide PAC guarantees on this estimation.
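The abstract's core idea is that a quantity of interest (here, a divergence between processes) can be recovered as the optimal value of a derived MDP, which reinforcement learning can then estimate from sampled episodes. The paper's actual construction of that MDP is not shown on this page; as a minimal illustrative sketch only, the following Q-learning loop estimates the optimal value of a hypothetical one-state episodic MDP (the action names and reward probabilities are invented for the example).

```python
import random

def q_learning_optimal_value(reward_probs, episodes=20000, eps=0.1, seed=0):
    """Estimate max_a E[reward(a)] for a toy one-state episodic MDP.

    reward_probs maps each action to its probability of yielding reward 1.
    This is a stand-in for the paper's derived MDP, whose optimal value
    equals the trace-equivalence divergence of the two processes.
    """
    rng = random.Random(seed)
    q = {a: 0.0 for a in reward_probs}       # Q-value estimates per action
    n = {a: 0 for a in reward_probs}         # visit counts per action
    for _ in range(episodes):
        # epsilon-greedy exploration: mostly exploit, occasionally explore
        if rng.random() < eps:
            a = rng.choice(sorted(reward_probs))
        else:
            a = max(q, key=q.get)
        r = 1.0 if rng.random() < reward_probs[a] else 0.0
        n[a] += 1
        q[a] += (r - q[a]) / n[a]            # sample-average update (alpha = 1/n)
    return max(q.values())                   # estimated optimal value

# True optimal value here is 0.8 (always pull action "a").
v = q_learning_optimal_value({"a": 0.8, "b": 0.3})
```

With a decaying (sample-average) learning rate the estimate converges to the true optimal value, which is the kind of convergence the PAC guarantees in the paper quantify with explicit sample bounds.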
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Desharnais, J., Laviolette, F., Moturu, K.P.D., Zhioua, S. (2006). Trace Equivalence Characterization Through Reinforcement Learning. In: Lamontagne, L., Marchand, M. (eds.) Advances in Artificial Intelligence. Canadian AI 2006. Lecture Notes in Computer Science, vol. 4013. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11766247_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34628-9
Online ISBN: 978-3-540-34630-2