Abstract
This work considers the problems of learning and planning in Markovian environments with constant observation and reward delays. We provide a hardness result for the general planning problem and positive results for several special cases with deterministic or otherwise constrained dynamics. We present an algorithm, Model Based Simulation, for planning in such environments, and use model-based reinforcement learning to extend this approach to the learning setting in both finite and continuous environments. Empirical comparisons show that this algorithm significantly outperforms alternative approaches to decision making in delayed-observation environments.
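The core planning idea behind Model Based Simulation can be illustrated with a minimal sketch: under a constant observation delay of k steps, the agent's latest observation is k steps stale, so it rolls that state forward through the k actions already in flight using its (here assumed deterministic) transition model, then acts greedily in the predicted current state. All names below (`model`, `q_values`, `choose_action`, the toy chain MDP) are illustrative assumptions, not the paper's API.

```python
from collections import deque

def choose_action(observed_state, pending_actions, model, q_values):
    """Simulate the delayed observation forward through the pending
    actions, then act greedily in the predicted current state."""
    state = observed_state
    for a in pending_actions:        # actions taken but not yet observed
        state = model(state, a)      # assumed deterministic model
    return max(q_values[state], key=q_values[state].get)

# Toy deterministic chain MDP: states 0..3, actions move left/right.
def model(s, a):
    return max(0, min(3, s + (1 if a == "right" else -1)))

# Hand-set Q-values that always prefer moving right, toward state 3.
q_values = {s: {"left": float(s - 1), "right": float(s + 1)} for s in range(4)}

pending = deque(["right", "right"])  # delay k = 2: two actions in flight
a = choose_action(0, pending, model, q_values)
print(a)  # the predicted current state is 2, where "right" is greedy
```

With stochastic dynamics the forward simulation would instead propagate a distribution over states (or the most likely state), which is where the paper's hardness and special-case results come into play.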
Walsh, T.J., Nouri, A., Li, L. et al. Learning and planning in environments with delayed feedback. Auton Agent Multi-Agent Syst 18, 83–105 (2009). https://doi.org/10.1007/s10458-008-9056-7