Neural Networks

Volume 21, Issue 5, June 2008, Pages 699-758

SOVEREIGN: An autonomous neural system for incrementally learning planned action sequences to navigate towards a rewarded goal

https://doi.org/10.1016/j.neunet.2007.09.016

Abstract

How do reactive and planned behaviors interact in real time? How are sequences of such behaviors released at appropriate times during autonomous navigation to realize valued goals? Controllers for both animals and mobile robots, or animats, need reactive mechanisms for exploration, and learned plans to reach goal objects once an environment becomes familiar. The SOVEREIGN (Self-Organizing, Vision, Expectation, Recognition, Emotion, Intelligent, Goal-oriented Navigation) animat model embodies these capabilities, and is tested in a 3D virtual reality environment. SOVEREIGN includes several interacting subsystems which model complementary properties of cortical What and Where processing streams and which clarify similarities between mechanisms for navigation and arm movement control. As the animat explores an environment, visual inputs are processed by networks that are sensitive to visual form and motion in the What and Where streams, respectively. Position-invariant and size-invariant recognition categories are learned by real-time incremental learning in the What stream. Estimates of target position relative to the animat are computed in the Where stream, and can activate approach movements toward the target. Motion cues from animat locomotion can elicit head-orienting movements to bring a new target into view. Approach and orienting movements are alternately performed during animat navigation. Cumulative estimates of each movement are derived from interacting proprioceptive and visual cues. Movement sequences are stored within a motor working memory. Sequences of visual categories are stored in a sensory working memory. These working memories trigger learning of sensory and motor sequence categories, or plans, which together control planned movements. Predictively effective chunk combinations are selectively enhanced via reinforcement learning when the animat is rewarded. Selected planning chunks effect a gradual transition from variable reactive exploratory movements to efficient goal-oriented planned movement sequences. Volitional signals gate interactions between model subsystems and the release of overt behaviors. The model can control different motor sequences under different motivational states and learns more efficient sequences to rewarded goals as exploration proceeds.

Introduction

This article describes the SOVEREIGN (Self-Organizing, Vision, Expectation, Recognition, Emotion, Intelligent, Goal-oriented Navigation) neural model to clarify how an animal, or animat, can learn to reach valued goal objects through planned sequences of navigational movements. The SOVEREIGN model embodies a self-organizing control system that attempts to learn and perform such behaviors autonomously. As the name SOVEREIGN indicates, this control system unifies visual, recognition, cognitive, cognitive-emotional, and motor competences. We believe that this is the first neural model that embodies and coordinates such a wide range of behavioral competences in an autonomous self-organizing control system that can operate in real time. These results have been briefly reported in Gnadt and Grossberg (2005a, 2005b, 2006).

SOVEREIGN contributes to three large themes about how the brain works. The first theme concerns how brains learn to balance between reactive and planned behaviors. During initial exploration of a novel environment, many reactive movements occur in response to unexpected and unfamiliar environmental cues (Leonard & McNaughton, 1990, chap. 13). These movements may initially appear to be locally random, as an animal orients toward and approaches many local stimuli. As such an animal becomes familiar with its surroundings, it learns to discriminate between objects likely to yield a reward and those that lead to punishment. Such approach-avoidance behavior is often learned via a perception–cognition–emotion–action cycle in which an action and its consequences elicit sensory cues that are associated with them. Rewards and punishments affect the likelihood that the same actions will be repeated in the future. When objects are out of direct sensory range, multiple reactive exploratory movements may be needed to reach them. Eventually, reactive exploratory behaviors are replaced by more efficient planned sequential trajectories within a familiar environment. One of the main goals of SOVEREIGN is to explain how erratic reactive exploratory behaviors can give rise to organized planned behaviors, and how both reactive and planned behaviors may remain balanced so that planned behaviors can be carried out where appropriate, without losing the ability to respond quickly to novel reactive challenges.

The second design theme illustrates the hypothesis that advanced brains are organized into parallel processing streams with complementary properties (Grossberg, 2000a). Each stream’s properties are related to those of a complementary stream much as a lock fits its key, or two pieces of a puzzle fit together. The mechanisms that enable each stream to compute one set of properties prevent it from computing a complementary set of properties. As a result, each of these streams exhibits complementary strengths and weaknesses. How, then, do these complementary properties get synthesized into a consistent behavioral experience? It is proposed how interactions between these processing streams overcome their complementary deficiencies and generate behavioral properties that realize the unity of conscious experiences. In this sense, pairs of complementary streams are the functional units because only through their interactions can key behavioral properties be competently computed. SOVEREIGN clarifies how these complementary properties interact to control goal-orienting sequences of navigational behaviors. For example, it is well-known that there are What and Where (or Where/How) cortical processing streams (Goodale and Milner, 1992, Mishkin et al., 1983, Ungerleider and Mishkin, 1982). In particular, key properties of the What and Where cortical processing streams seem to be complementary. Neural models have clarified that properties of invariant object learning and recognition in the What cortical stream are complementary to properties of goal-oriented movement control in the Where cortical processing stream.

A third design theme underlying the SOVEREIGN model is that brains use homologous circuits to compute navigational and hand/arm movements. In other words, movements of the body and of the hand/arms are controlled by circuits that share many properties. This proposed homology clarifies how navigational and arm movements can be coordinated when a body moves with respect to a goal object with the intention of grasping or otherwise manipulating it using the hand/arm system.

A considerable body of neural modeling of arm movement trajectory control (e.g., the VITE model: Bullock and Grossberg (1988) and Bullock, Cisek, and Grossberg (1998)) suggests that cortical arm movement control circuits compute a representation of where the arm wants to move (i.e., a target position) and compare this with an outflow representation of where the arm is now (i.e., the present position) by computing a difference vector between target position and present position representations. The difference vector represents the direction and distance that the arm needs to move to realize its goal position. Basal ganglia volitional signals of various kinds, such as a GO signal, translate the difference vector into a motor trajectory of variable speed. Additional cortical, spinal, and cerebellar circuitry is needed to ensure that the brain generates the forces that are needed to actually carry out such a commanded trajectory (e.g., the FLETE model: Bullock and Grossberg (1991) and Contreras-Vidal, Grossberg, and Bullock (1997)).
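
To make this strategy concrete, the following minimal sketch (illustrative Python, not the published VITE equations; the function name, rate constants, and signed arithmetic are simplifying assumptions) integrates a difference vector toward target-minus-present position while a GO signal gates movement speed:

```python
import numpy as np

def vite_step(target, present, dv, go, dt=0.01, alpha=25.0):
    """One Euler step of a simplified difference-vector controller.

    dv tracks (target - present); the volitional GO signal gates how fast
    the present-position command integrates along dv. The published VITE
    model uses rectified opponent (agonist/antagonist) channels; signed
    arithmetic is used here for brevity.
    """
    dv = dv + dt * alpha * (-dv + target - present)
    present = present + dt * go * dv
    return dv, present

# Drive a two-dimensional arm command from rest toward a target.
target, present, dv = np.array([0.8, -0.3]), np.zeros(2), np.zeros(2)
for _ in range(2000):
    dv, present = vite_step(target, present, dv, go=0.5)
print(present)  # approaches the target
```

Scaling go rescales movement speed without changing the commanded direction, mirroring the variable-speed volitional role attributed to basal ganglia GO signals.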

A key difference between navigation and hand/arm movement control concerns how present position is calculated. Because the arm is attached to the body, present position of the arm can be directly computed using outflow, or corollary discharge, movement commands that explicitly code the commanded arm position. In contrast, when a body moves with respect to the world, no such immediately available present position command is available. This difference requires more elaborate brain machinery to compute present position of the body in the world during navigational movements. The brain needs to use a variety of sensory cues, both proprioceptive and visual, to create a representation of present position that can be compared with representations of target position, so that a difference vector and volitional commands can move the body towards desired goal objects. In summary, both navigational movement in the world and movement of limbs with respect to the body use a difference vector computational strategy.
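
A minimal sketch of the same strategy applied to the body (illustrative Python; the Gaussian noise stands in for imperfect proprioceptive and visual estimates, and all names are assumptions) shows why present position must be accumulated movement by movement before a difference vector to a goal can be formed:

```python
import numpy as np

def integrate_movement(position, heading, turn, distance, noise=0.02):
    """Dead-reckoning update of estimated body position in the world.

    Unlike the arm, the body has no outflow command that codes its
    position, so each orienting (turn) and approach (distance) movement
    contributes a noisy incremental estimate.
    """
    heading = heading + turn + np.random.normal(0.0, noise)
    step = distance + np.random.normal(0.0, noise)
    position = position + step * np.array([np.cos(heading), np.sin(heading)])
    return position, heading

def navigation_vector(goal, position, heading):
    """Difference vector from estimated present position to the goal,
    expressed as the turn and approach distance of the next movement."""
    diff = goal - position
    return np.arctan2(diff[1], diff[0]) - heading, float(np.linalg.norm(diff))
```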

SOVEREIGN’s perceptual competences include on-line, albeit simplified, visual representations of a 3D virtual reality environment in which the model controls navigation. SOVEREIGN computes, in parallel, both visual form and motion information about the world. As in the brain, the visual form of objects is computed within the What cortical processing stream, whereas visual motion is computed within the Where cortical processing stream. In this way, the brain can process both what objects are and where and how to track and act upon them.

SOVEREIGN uses the visual form information to incrementally learn spatially-invariant and size-invariant object recognition categories with which to recognize visually perceived objects in the world. These recognition categories, in turn, learn to read out top-down attentive expectations of the visual objects that they code. Object categories in the What stream are spatially-invariant and size-invariant to prevent a combinatorial explosion from occurring in which each position and size of an object would need its own representation. The Where stream represents the spatial locations of these objects. In particular, visual motion information is used to guide reactive orienting movements and attention shifts to locations at which changes occur in SOVEREIGN’s visual world. What–Where inter-stream interactions are needed to enable both recognition and acquisition of desired goal objects. These parallel streams help SOVEREIGN to balance between reactive and planned behaviors, in a manner that is further discussed below.
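
Such category learning with top-down matching is the Adaptive Resonance Theory (ART) style of incremental learning; the toy sketch below (illustrative Python assuming binary input vectors and fast learning, with all names hypothetical, not the model's actual network) shows how a learned top-down template both tests and refines a category:

```python
import numpy as np

def art_categorize(inp, templates, vigilance=0.8, beta=0.5):
    """Toy ART-1-style choice, match, and fast learning on binary vectors.

    Categories are tried in order of bottom-up choice strength; the
    top-down template (learned expectation) must overlap the input above
    the vigilance level, otherwise the category is reset and the search
    continues. A sufficiently novel input recruits a new category.
    """
    order = sorted(range(len(templates)),
                   key=lambda j: -(templates[j] @ inp) / (beta + templates[j].sum()))
    for j in order:
        overlap = np.minimum(templates[j], inp)   # top-down matching
        if overlap.sum() / inp.sum() >= vigilance:
            templates[j] = overlap                # attentive fast learning
            return j
    templates.append(inp.astype(float).copy())    # recruit a new category
    return len(templates) - 1
```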

SOVEREIGN also includes cognitive processes, notably mechanisms to temporarily store sequences of events in working memory, and to learn sequential plans, or chunks, of these sequences with which to predict and control future planned behaviors. Parallel object and spatial working memories and sequential chunking networks are modeled. The object working memory and chunking network are in the model’s What stream, and the spatial working memory and chunking network are in its Where stream. SOVEREIGN clarifies how these parallel cognitive processes cooperate to acquire desired goal objects that can only be reached through a sequence of actions, and to disambiguate sequential navigational decisions in contexts where either process alone would be insufficient.
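
One concrete reading of such a working memory (a minimal sketch assuming a STORE-like primacy gradient, as in the cited Bradski, Carpenter, and Grossberg references; names and constants are illustrative) stores items so that earlier items retain larger activities, letting max-picking replay the sequence in order, while a chunk learns the whole stored pattern at once:

```python
import numpy as np

def store_item(wm, item, n_stored, shrink=0.7):
    """Store the next sequence item as a primacy gradient: each later
    item gets a geometrically smaller activity (assumes distinct items;
    STORE-style working memory, much simplified)."""
    wm[item] = shrink ** n_stored
    return wm

def rehearse(wm):
    """Read the sequence back out: repeatedly pick the most active item
    and self-inhibit it, which recovers the stored order."""
    g, order = wm.copy(), []
    while g.max() > 0.0:
        j = int(np.argmax(g))
        order.append(j)
        g[j] = 0.0
    return order

def learn_chunk(chunks, wm):
    """A sequence category, or chunk, codes the whole stored pattern."""
    chunks.append(wm / np.linalg.norm(wm))
    return len(chunks) - 1

wm = np.zeros(10)
for n, item in enumerate([3, 7, 2]):   # e.g. three visual category indices
    wm = store_item(wm, item, n)
assert rehearse(wm) == [3, 7, 2]
```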

Cognitive-emotional mechanisms include the role of rewards and punishments in shaping goal-oriented behaviors. In particular, reinforcement learning can influence which learned cognitive chunks will be attended and selected to elicit behaviors that acquire desired goals within a familiar environment. Learned interactions between cognitive and emotional representations, notably motivationally-mediated attention, play an important role in this context-sensitive selection process.
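
A minimal sketch of such motivated selection (illustrative Python; the drive names, learning rate, and multiplicative combination are assumptions, not the model's circuitry) gives each chunk an adaptive incentive value per drive, strengthens that value when the chunk's plan ends in reward, and weights each chunk's contextual match by its value under the currently active drive:

```python
import numpy as np

def reinforce(values, drive, chunk, reward, lr=0.2):
    """Move a chunk's incentive value under the active drive (e.g.
    'hunger' vs. 'thirst') toward the reward its plan produced."""
    values[drive][chunk] += lr * (reward - values[drive][chunk])
    return values

def select_plan(match, values, drive):
    """Choose the plan whose stored sequence best matches the current
    context, weighted by its learned value under the active drive."""
    return int(np.argmax(match * values[drive]))

# Two chunks, two drives: repeatedly reward chunk 1 under 'hunger'.
values = {"hunger": 0.1 * np.ones(2), "thirst": 0.1 * np.ones(2)}
for _ in range(10):
    values = reinforce(values, "hunger", 1, reward=1.0)
print(select_plan(np.array([0.6, 0.5]), values, "hunger"))  # -> 1
```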

The SOVEREIGN model thus contributes solutions to three key problems: (1) How an animal, or animat that embodies biologically-inspired designs, learns to balance between reactive and planned behaviors in a task-appropriate way. (2) How plans are learned during erratic reactive behaviors in such a way that, after learning, they can be read out fluently at the correct times and in the correct spatial contexts. (3) How, in particular, an animat coordinates its reactive and planned behaviors so that its perception–cognition–emotion–action cycles of exploration, real-time vision, learned recognition, sequential working memory storage, learning of sequential plans, reinforcement learning, and planned action sequences are activated when needed as the animat navigates novel and familiar environments.

A number of problems need to be solved to realize these competences:

  1. How are visual inputs used for both target identification and target position estimation?

  2. What coordinate system is used for representing target positions? How is it calibrated through learning?

  3. What coordinate system is used for representing body position in space? How is it calibrated through learning?

  4. What is the role of proprioceptive feedback in calibrating body position?

  5. How are target position and body position information combined to control the next planned movement? Are movement sequences whose units are sensory and motor items (cue-command pairs) sufficient for some types of spatial navigation?

  6. How are items within movement sequences stored in short-term working memory?

  7. How can a memory system be designed to store and recall sequential plans, or chunks? How can such learning solve the classical goal paradox (Grossberg, 1978), which notes that, on all sequence learning trials, the goal always occurs last, whereas during planned sequential performance, knowledge that the goal is being sought is activated first? That is, how can learning be order-preserving, yet also break the learned order to permit the desired goal to activate a plan capable of realizing it?

  8. How does reinforcement enable an animal to selectively attend to rewarded goal objects?

  9. How does reinforcement interact with the memory representation of experienced sequences in order to preferentially choose and perform them?

  10. How can latent learning, or learning without reward, be explained?

  11. How does competitive decision-making resolve conflicts between visually reactive and planned movement commands?

Approach-orienting navigation and learning in a 3D virtual reality environment

The SOVEREIGN model simulates these processes for an animat that experiences a 3D visual world in a virtual reality environment. This world has the relatively simple spatial format of a plus-maze (Munn, 1950). Although simple, this environment tests in a clear way many of the critical competences that the animat needs to achieve. Much of the problem’s difficulty arises because an animat may navigate the maze in different ways, including different speeds and directions of movement, on successive

End-to-end SOVEREIGN simulations

Three key SOVEREIGN simulations are demonstrated herein. In the Motivated Choice Experiment, the animat learns the route to two different goal locations under two different motivational states. In the Multiple Reward Experiment, the animat learns to repeatedly revisit two rewarded goal locations. Finally, in the Shortest Path Experiment, the animat learns the shortest path to the goal location, after repeated random exploration of the maze environment. Each of the simulation summaries contains

General discussion and conclusions

The SOVEREIGN architecture embodies a number of design principles whose mechanistic instantiation as neural circuits (Fig. 2) enables incremental learning of planned action sequences to carry out route-based navigation towards a rewarded goal. SOVEREIGN circuits are based on neural models that have elsewhere been used to explain and predict many behavioral and brain data. Here the emphasis is on designing a neuromorphic controller that realizes behavioral competence.

The model has several notable

Acknowledgements

William Gnadt was supported in part by Riverside Research Institute, the Defense Advanced Research Projects Agency (N00014-92-J-4015), the Air Force Office of Scientific Research (F49620-92-J-0225), the National Science Foundation (IRI 90-24877 and SBE-0354378), the Office of Naval Research (N00014-91-J-4100, N00014-92-J-1309, and N00014-95-1-0657), and Pacific Sierra Research (PSR 91-6075-2).

Stephen Grossberg was supported in part by the National Science Foundation (SBE-0354378), and the

References (94)

  • S. Grossberg (1972). A neural theory of punishment and avoidance, II: Quantitative theory. Mathematical Biosciences.
  • S. Grossberg. A theory of human memory: Self-organization and performance of sensory-motor codes, maps, and plans.
  • S. Grossberg (2000). The complementary brain: Unifying brain dynamics and modularity. Trends in Cognitive Sciences.
  • S. Grossberg (2000). The imbalanced brain: From normal behavior to schizophrenia. Biological Psychiatry.
  • S. Grossberg et al. (1993). Neural representations for sensory-motor control, II: Learning a head-centered visuomotor representation of 3-D target position. Neural Networks.
  • S. Grossberg et al. (1992). A neural network model of adaptively timed reinforcement learning and hippocampal dynamics. Cognitive Brain Research.
  • S. Grossberg et al. (2001). Neural dynamics of motion integration and segmentation within and across apertures. Vision Research.
  • S. Grossberg et al. (2005). Laminar cortical dynamics of 3D surface perception: Stratification, transparency, and neon color spreading. Vision Research.
  • M. Mishkin et al. (1983). Object vision and spatial vision: Two cortical pathways. Trends in Neurosciences.
  • J.M. O’Keefe et al. (1971). The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely-moving rat. Brain Research.
  • E. Saltzman (1979). Levels of sensorimotor representation. Journal of Mathematical Psychology.
  • N.A. Schmajuk (1990). Role of the hippocampus in temporal and spatial navigation: An adaptive neural network. Behavioral Brain Research.
  • R.F. Thompson (1988). The neural basis of basic associative learning of discrete behavioral responses. Trends in Neurosciences.
  • L. Zadeh (1965). Fuzzy sets. Information and Control.
  • A.G. Barto et al. (1981). Landmark learning: An illustration of associative search. Biological Cybernetics.
  • J. Berzhanskaya et al. (2007). Laminar cortical dynamics of visual form and motion interactions during coherent object motion perception. Spatial Vision.
  • H.C. Blodgett (1929). An analysis of the behavior units in maze learning. In Proceedings of the 9th international...
  • G. Bradski et al. (1992). Working memory networks for learning temporal order with application to three-dimensional visual object recognition. Neural Computation.
  • G. Bradski et al. (1994). STORE working memory networks for storage and recall of arbitrary temporal sequences. Biological Cybernetics.
  • D. Bullock et al. (1998). Cortical networks for control of voluntary arm movements under variable force conditions. Cerebral Cortex.
  • D. Bullock et al. (1988). Neural dynamics of planned arm movements: Emergent invariants and speed-accuracy properties during trajectory formation. Psychological Review.
  • D. Bullock et al. (1993). A self-organizing neural model of motor equivalent reaching and tool use by a multijoint arm. Journal of Cognitive Neuroscience.
  • N. Burgess et al. Hippocampus — Spatial models.
  • C.E. Buxton (1940). Latent learning and the goal gradient hypothesis. Contributions to Psychological Theory.
  • Y. Cao et al. (2005). A laminar cortical model of stereopsis and 3D surface perception: Closure and da Vinci stereopsis. Spatial Vision.
  • G.A. Carpenter et al. (1992). Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps. IEEE Transactions on Neural Networks.
  • G.A. Carpenter & W.D. Ross (1993). ART-EMAP: A neural network architecture for learning and prediction by evidence...
  • H. Carr et al. (1908). Orientation of the white rat. Journal of Comparative Neurology and Psychology.
  • M.A. Cohen et al. (1986). Neural dynamics of speech and language coding: Developmental programs, perceptual grouping, and competition for short-term memory. Human Neurobiology.
  • M.A. Cohen et al. (1987). Masking fields: A massively parallel neural architecture for learning, recognizing, and predicting multiple groupings of patterned data. Applied Optics.
  • J.L. Contreras-Vidal et al. (1997). A neural model of cerebellar learning for arm movement control: Cortico-spino-cerebellar dynamics. Learning and Memory.
  • N.L. Dallal et al. (1990). Hierarchical structures: Chunking by food type facilitates spatial memory. Journal of Experimental Psychology: Animal Behavior Processes.
  • C.T. Daub (1933). The effect of doors on latent learning. Journal of Comparative Psychology.
  • P. Dayan. Navigating through temporal difference.
  • W. Dennis (1932). Multiple visual discrimination in the block elevated maze. Journal of Comparative and Physiological Psychology.
  • L. Fang & S. Grossberg (2007). From stereogram to surface: How the brain sees the world in depth. Technical Report...
  • A. Fazl, S. Grossberg, & E. Mingolla (2007). View-invariant object category learning, recognition and search: How...