Abstract
Intelligence can be defined, informally, as knowing a lot and being able to use that knowledge flexibly to achieve one's goals. In this sense it is clear that knowledge is central to intelligence. However, it is less clear exactly what knowledge is, what gives it meaning, and how it can be efficiently acquired and used. In this talk we re-examine aspects of these age-old questions in light of modern experience (and particularly in light of recent work in reinforcement learning). Such questions are not just of philosophical or theoretical import; they directly affect the practicality of modern knowledge-based systems, which tend to become unwieldy and brittle (difficult to change) as the knowledge base becomes large and diverse.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Sutton, R.S. (2012). Beyond Reward: The Problem of Knowledge and Data. In: Muggleton, S.H., Tamaddoni-Nezhad, A., Lisi, F.A. (eds) Inductive Logic Programming. ILP 2011. Lecture Notes in Computer Science, vol 7207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31951-8_2
DOI: https://doi.org/10.1007/978-3-642-31951-8_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31950-1
Online ISBN: 978-3-642-31951-8
eBook Packages: Computer Science (R0)