A theoretical analysis of Model-Based Interval Estimation

ABSTRACT
Several algorithms for learning near-optimal policies in Markov Decision Processes have been analyzed and proven efficient. Empirical results have suggested that Model-based Interval Estimation (MBIE) learns efficiently in practice, effectively balancing exploration and exploitation. This paper presents the first theoretical analysis of MBIE, proving its efficiency even under worst-case conditions. The paper also introduces a new performance metric, average loss, and relates it to its less "online" cousins from the literature.
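The abstract describes MBIE only at a high level. The sketch below is a minimal, hedged illustration of the underlying interval-estimation idea: maintain an empirical model of rewards and transitions, widen it with confidence intervals, and plan optimistically within those intervals. The helper names, constants, toy problem size, and the exact interval shapes are illustrative assumptions, not the construction analyzed in the paper.

```python
# A minimal sketch of the interval-estimation idea behind MBIE, assuming a
# small finite MDP with rewards in [0, 1].  The helper names (record,
# l1_radius, optimistic_backup, optimistic_values) and the confidence-interval
# constants are illustrative assumptions, not the paper's construction.
import math
from collections import defaultdict

GAMMA = 0.95                 # discount factor (assumed)
DELTA = 0.05                 # confidence parameter (assumed)
N_STATES, N_ACTIONS = 5, 2   # toy problem size (assumed)

counts = defaultdict(int)                             # n(s, a)
trans_counts = defaultdict(lambda: defaultdict(int))  # n(s, a, s')
reward_sum = defaultdict(float)                       # total reward seen at (s, a)


def record(s, a, r, s_next):
    """Update the empirical model after observing one transition."""
    counts[(s, a)] += 1
    trans_counts[(s, a)][s_next] += 1
    reward_sum[(s, a)] += r


def reward_radius(n):
    """Hoeffding-style confidence radius for the empirical mean reward."""
    return math.sqrt(math.log(2.0 / DELTA) / (2.0 * n))


def l1_radius(n):
    """Radius of an L1 confidence ball around the empirical transition
    distribution (shaped like the Weissman et al. bound; constants assumed)."""
    return math.sqrt(2.0 * (math.log(2.0 ** N_STATES - 2.0) - math.log(DELTA)) / n)


def optimistic_backup(s, a, values):
    """One optimistic Bellman backup for (s, a): use the upper end of the
    reward interval and shift probability mass, within the L1 ball, toward
    the successor state with the largest current value."""
    n = counts[(s, a)]
    if n == 0:
        return 1.0 / (1.0 - GAMMA)  # unvisited pairs look maximally attractive

    p = {sp: c / n for sp, c in trans_counts[(s, a)].items()}
    for sp in range(N_STATES):
        p.setdefault(sp, 0.0)

    best = max(p, key=lambda sp: values[sp])
    budget = min(l1_radius(n) / 2.0, 1.0 - p[best])
    for sp in sorted(p, key=lambda sp: values[sp]):
        if sp == best or budget <= 0.0:
            continue
        moved = min(p[sp], budget)
        p[sp] -= moved
        p[best] += moved
        budget -= moved

    r_upper = reward_sum[(s, a)] / n + reward_radius(n)
    return r_upper + GAMMA * sum(p[sp] * values[sp] for sp in p)


def optimistic_values(iters=200):
    """Value iteration with optimistic backups; acting greedily with respect
    to these values is the exploration strategy the abstract refers to."""
    values = [0.0] * N_STATES
    for _ in range(iters):
        values = [max(optimistic_backup(s, a, values) for a in range(N_ACTIONS))
                  for s in range(N_STATES)]
    return values
```

In this style of algorithm, under-sampled state-action pairs keep wide intervals and therefore look attractive until visited, which is what drives directed exploration; as counts grow, the intervals shrink and the optimistic values approach the values of the learned model.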