Foundations of learning in autonomous agents

https://doi.org/10.1016/0921-8890(91)90018-G

Abstract

Autonomous agents must learn to act in complex, noisy domains. This paper provides a formal description of the problem of building autonomous agents that learn to act and describes a common framework for the specification of learning algorithms for autonomous agents. In addition, it provides correctness, convergence, and complexity metrics for learning algorithms for autonomous agents. These metrics allow algorithms based on a wide variety of programming paradigms to be objectively compared.
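
The abstract alludes to a common framework in which an agent repeatedly perceives its world, acts, and receives a reinforcement signal. The following is a rough, hedged sketch of that kind of interaction loop, not the paper's formal framework; Agent, World, and run are illustrative names introduced here.

```python
# Minimal sketch of an agent-environment interaction loop for learning to act,
# assuming a discrete-time setting with a scalar reinforcement signal.
# Agent, World, and run are illustrative names, not definitions from the paper.

class Agent:
    def act(self, observation):
        """Choose an action from the current input (the agent's behavior)."""
        raise NotImplementedError

    def learn(self, observation, action, reinforcement, next_observation):
        """Adjust the behavior using the reinforcement received."""
        raise NotImplementedError


class World:
    def observe(self):
        """Return the agent's current (possibly noisy) input."""
        raise NotImplementedError

    def execute(self, action):
        """Apply the action and return the resulting reinforcement signal."""
        raise NotImplementedError


def run(agent, world, steps):
    """Drive the perceive-act-reinforce loop for a fixed number of steps."""
    observation = world.observe()
    for _ in range(steps):
        action = agent.act(observation)
        reinforcement = world.execute(action)
        next_observation = world.observe()
        agent.learn(observation, action, reinforcement, next_observation)
        observation = next_observation
```

Correctness, convergence, and complexity criteria of the kind the abstract mentions could then be stated over the behavior such a loop settles on and the reinforcement it accumulates over time.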

Cited by (12)

  • Self-Organization and Autonomous Robots (2012), Neural Systems for Robotics

  • ARBIB: An autonomous robot based on inspirations from biology (2000), Robotics and Autonomous Systems

    Citation excerpt: "Learning then acts to maximize the reward or return associated with or predicted from the reinforcement signal over time. The many variants of this form of learning (especially the Q-learning paradigm of Watkins [79] and Watkins and Dayan [80]) have been very popular in animat studies (e.g. [37,39,44,48,52,70]). Reinforcement learning is similar to the S–R theory of conditioning [35, p. 350] which posits a direct link between the conditioned stimulus and response, in contrast to Pavlovian S–S theory which posits association of two stimuli — the CS and the US."

    (A sketch of the Q-learning update mentioned in this excerpt follows the list.)

  • Towards Incremental Autonomy Framework for On-Orbit Vision-Based Grasping (2021), Proceedings of the International Astronautical Congress, IAC
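
The ARBIB excerpt above refers to the Q-learning paradigm of Watkins. As a hedged illustration only, here is a minimal tabular Q-learning sketch; the environment interface (reset/step), the action set, and the parameter values are assumptions made for the example and are not taken from the cited papers.

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning with the Watkins-style one-step update.

    Assumes a hypothetical discrete environment exposing env.reset() -> state
    and env.step(action) -> (next_state, reward, done).
    """
    Q = defaultdict(float)  # Q[(state, action)] -> estimated return
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection over the current estimates.
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Move Q(s, a) toward the bootstrapped target r + gamma * max_a' Q(s', a').
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```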

This work was supported in part by a gift from the System Development Foundation and in part by the Air Force Office of Scientific Research under contract #F49620-89-C-0055.
