SOVEREIGN: An autonomous neural system for incrementally learning planned action sequences to navigate towards a rewarded goal
Introduction
This article describes the SOVEREIGN (Self-Organizing, Vision, Expectation, Recognition, Emotion, Intelligent, Goal-oriented Navigation) neural model to clarify how an animal, or animat, can learn to reach valued goal objects through planned sequences of navigational movements. The SOVEREIGN model embodies a self-organizing control system that attempts to learn and perform such behaviors autonomously. As the name SOVEREIGN indicates, this control system unifies visual, recognition, cognitive, cognitive-emotional, and motor competences. We believe that this is the first neural model that embodies and coordinates such a wide range of behavioral competences in an autonomous self-organizing control system that can operate in real time. These results have been briefly reported in Gnadt and Grossberg, 2005a, Gnadt and Grossberg, 2005b, Gnadt and Grossberg, 2006.
SOVEREIGN contributes to three large themes about how the brain works. The first theme concerns how brains learn to balance between reactive and planned behaviors. During initial exploration of a novel environment, many reactive movements occur in response to unexpected and unfamiliar environmental cues (Leonard & McNaughton, 1990, chap. 13). These movements may initially appear to be locally random, as an animal orients toward and approaches many local stimuli. As such an animal becomes familiar with its surroundings, it learns to discriminate between objects likely to yield a reward and those that lead to punishment. Such approach-avoidance behavior is often learned via a perception–cognition–emotion–action cycle in which an action and its consequences elicit sensory cues that are associated with them. Rewards and punishments affect the likelihood that the same actions will be repeated in the future. When objects are out of direct sensory range, multiple reactive exploratory movements may be needed to reach them. Eventually, reactive exploratory behaviors are replaced by more efficient planned sequential trajectories within a familiar environment. One of the main goals of SOVEREIGN is to explain how erratic reactive exploratory behaviors can give rise to organized planned behaviors, and how both reactive and planned behaviors may remain balanced so that planned behaviors can be carried out where appropriate, without losing the ability to respond quickly to novel reactive challenges.
The second design theme illustrates the hypothesis that advanced brains are organized into parallel processing streams with complementary properties (Grossberg, 2000a). Each stream’s properties are related to those of a complementary stream much as a lock fits its key, or two pieces of a puzzle fit together. The mechanisms that enable each stream to compute one set of properties prevent it from computing a complementary set of properties. As a result, each of these streams exhibits complementary strengths and weaknesses. How, then, do these complementary properties get synthesized into a consistent behavioral experience? The model proposes how interactions between these processing streams overcome their complementary deficiencies and generate behavioral properties that realize the unity of conscious experiences. In this sense, pairs of complementary streams are the functional units, because only through their interactions can key behavioral properties be competently computed. SOVEREIGN clarifies how these complementary properties interact to control goal-oriented sequences of navigational behaviors. For example, it is well-known that there are What and Where (or Where/How) cortical processing streams (Goodale and Milner, 1992, Mishkin et al., 1983, Ungerleider and Mishkin, 1982). In particular, key properties of the What and Where cortical processing streams seem to be complementary. Neural models have clarified that properties of invariant object learning and recognition in the What cortical stream are complementary to properties of goal-oriented movement control in the Where cortical processing stream.
A third design theme underlying the SOVEREIGN model is that brains use homologous circuits to compute navigational and hand/arm movements. In other words, movements of the body and of the hand/arms are controlled by circuits that share many properties. This proposed homology clarifies how navigational and arm movements can be coordinated when a body moves with respect to a goal object with the intention of grasping or otherwise manipulating it using the hand/arm system.
A considerable body of neural modeling of arm movement trajectory control (e.g., the VITE model: Bullock and Grossberg (1988) and Bullock, Cisek, and Grossberg (1998)) suggests that cortical arm movement control circuits compute a representation of where the arm wants to move (i.e., a target position) and compare this with an outflow representation of where the arm is now (i.e., the present position) by computing a difference vector between target position and present position representations. The difference vector represents the direction and distance that the arm needs to move to realize its goal position. Basal ganglia volitional signals of various kinds, such as a GO signal, translate the difference vector into a motor trajectory of variable speed. Additional cortical, spinal, and cerebellar circuitry is needed to ensure that the brain generates the forces that are needed to actually carry out such a commanded trajectory (e.g., the FLETE model: Bullock and Grossberg (1991) and Contreras-Vidal, Grossberg, and Bullock (1997)).
A key difference between navigation and hand/arm movement control concerns how present position is calculated. Because the arm is attached to the body, present position of the arm can be directly computed using outflow, or corollary discharge, movement commands that explicitly code the commanded arm position. In contrast, when a body moves with respect to the world, no such immediately available present position command is available. This difference requires more elaborate brain machinery to compute present position of the body in the world during navigational movements. The brain needs to use a variety of sensory cues, both proprioceptive and visual, to create a representation of present position that can be compared with representations of target position, so that a difference vector and volitional commands can move the body towards desired goal objects. In summary, both navigational movement in the world and movement of limbs with respect to the body use a difference vector computational strategy.
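The difference-vector strategy just described can be made concrete in a few lines of code. The sketch below is a minimal illustrative simulation, not the published VITE equations: the function name `vite_step`, the gain `alpha`, and the step size `dt` are assumptions chosen for clarity.

```python
import numpy as np

def vite_step(present, target, go, dt=0.01, alpha=4.0):
    """One step of a simplified difference-vector controller.

    The difference vector D = target - present codes the direction and
    distance the effector still needs to move; the volitional GO signal
    gates how fast D is integrated into the outflow (present-position)
    command. With go = 0, no movement occurs regardless of D.
    """
    d = target - present                  # difference vector
    return present + dt * go * alpha * d  # GO-gated integration

# Drive an outflow position command toward a target; a larger GO signal
# traverses the same trajectory at a higher speed.
target = np.array([1.0, -0.5])
pos = np.zeros(2)
for _ in range(1000):
    pos = vite_step(pos, target, go=1.0)
# pos has now converged close to target.
```

Because the GO signal multiplies the difference vector rather than adding to it, scaling GO rescales movement speed without changing the trajectory's direction, which is one of the speed-invariance properties the VITE family of models is designed to capture.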
SOVEREIGN’s perceptual competences include on-line, albeit simplified, visual representations of a 3D virtual reality environment in which the model controls navigation. SOVEREIGN computes, in parallel, both visual form and motion information about the world. As in the brain, the visual form of objects is computed within the What cortical processing stream, whereas visual motion is computed within the Where cortical processing stream. In this way, the brain can process both what objects are and where and how to track and act upon them.
SOVEREIGN uses the visual form information to incrementally learn spatially-invariant and size-invariant object recognition categories with which to recognize visually perceived objects in the world. These recognition categories, in turn, learn to read out top-down attentive expectations of the visual objects that they code. Object categories in the What stream are spatially-invariant and size-invariant to prevent a combinatorial explosion in which each position and size of an object would need its own representation. The Where stream represents the spatial locations of these objects. In particular, visual motion information is used to guide reactive orienting movements and attention shifts to locations at which changes occur in SOVEREIGN’s visual world. What–Where inter-stream interactions are needed to enable both recognition and acquisition of desired goal objects. These parallel streams help SOVEREIGN to balance between reactive and planned behaviors, in a manner that is further discussed below.
SOVEREIGN also includes cognitive processes, notably mechanisms to temporarily store sequences of events in working memory, and to learn sequential plans, or chunks, of these sequences with which to predict and control future planned behaviors. Parallel object and spatial working memories and sequential chunking networks are modeled. The object working memory and chunking network are in the model’s What stream, and the spatial working memory and chunking network are in its Where stream. SOVEREIGN clarifies how these parallel cognitive processes cooperate to acquire desired goal objects that can only be reached through a sequence of actions, and to disambiguate sequential navigational decisions in contexts where either working memory alone would be insufficient.
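As a concrete illustration of temporary sequence storage, here is a toy primacy-gradient working memory in the spirit of the STORE family of models. The function names, the decay ratio `rho`, and the dictionary encoding are illustrative assumptions, not SOVEREIGN's actual equations: earlier items are stored at higher activation, and recall reads items out in descending order of activation, which reproduces the stored order.

```python
def store_sequence(items, rho=0.8):
    """Store a list of distinct items as a primacy gradient: the first
    item gets the largest activation and each later item a fraction rho
    of the previous one, so relative order is encoded in activation
    levels rather than in explicit time stamps."""
    memory, level = {}, 1.0
    for item in items:
        memory[item] = level
        level *= rho
    return memory

def recall_in_order(memory):
    """Rehearse by repeatedly selecting the most active stored item;
    sorting by descending activation recovers the input order."""
    return sorted(memory, key=memory.get, reverse=True)

wm = store_sequence(["turn-left", "go-forward", "turn-right"])
plan = recall_in_order(wm)  # ["turn-left", "go-forward", "turn-right"]
```

The point of the gradient encoding is that the whole stored sequence is simultaneously available as a spatial pattern of activations, which is what allows a downstream chunking network to learn the sequence as a unit.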
Cognitive-emotional mechanisms include the role of rewards and punishments in shaping goal-oriented behaviors. In particular, reinforcement learning can influence which learned cognitive chunks will be attended and selected to elicit behaviors that acquire desired goals within a familiar environment. Learned interactions between cognitive and emotional representations, notably motivationally-mediated attention, play an important role in this context-sensitive selection process.
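One way to picture the motivationally-mediated selection just described is as a weighted match between learned chunk-to-drive associations and the current drive state. The sketch below is purely illustrative; the chunk names, the weights, and the dot-product scoring are assumptions standing in for SOVEREIGN's neural incentive-motivational gating.

```python
def choose_plan(chunks, drives):
    """Select the plan chunk whose learned associations with drive
    representations best match the current motivational state; a simple
    dot product stands in for incentive-motivational gating."""
    def support(weights):
        return sum(w * drives.get(d, 0.0) for d, w in weights.items())
    return max(chunks, key=lambda name: support(chunks[name]))

# Hypothetical learned associations: each chunk's reward history links
# it to the drives it has previously satisfied.
chunks = {
    "route-to-food":  {"hunger": 0.9, "thirst": 0.1},
    "route-to-water": {"hunger": 0.1, "thirst": 0.9},
}
plan = choose_plan(chunks, {"hunger": 1.0, "thirst": 0.2})  # "route-to-food"
```

Changing the drive state changes which chunk wins without any relearning, which is the behavioral signature of motivational gating: the same familiar environment supports different planned routes under hunger than under thirst.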
The SOVEREIGN model thus contributes solutions to three key problems: (1) How an animal, or animat that embodies biologically-inspired designs, learns to balance between reactive and planned behaviors in a task-appropriate way. (2) How plans are learned during erratic reactive behaviors in such a way that, after learning, they can be read out fluently at the correct times and in the correct spatial contexts. (3) How, in particular, an animat coordinates its reactive and planned behaviors so that its perception–cognition–emotion–action cycles of exploration, real-time vision, learned recognition, sequential working memory storage, learning of sequential plans, reinforcement learning, and planned action sequences are activated when needed as the animat navigates novel and familiar environments.
A number of problems need to be solved to realize these competences:
1. How are visual inputs used for both target identification and target position estimation?
2. What coordinate system is used for representing target positions? How is it calibrated through learning?
3. What coordinate system is used for representing body position in space? How is it calibrated through learning?
4. What is the role of proprioceptive feedback in calibrating body position?
5. How are target position and body position information combined to control the next planned movement? Are movement sequences whose units are sensory and motor items (cue-command pairs) sufficient for some types of spatial navigation?
6. How are items within movement sequences stored in short-term working memory?
7. How can a memory system be designed to store and recall sequential plans, or chunks? How can such learning solve the classical goal paradox (Grossberg, 1978), which notes that, on all sequence learning trials, the goal always occurs last, whereas during planned sequential performance, knowledge that the goal is being sought is activated first? That is, how can learning be order-preserving, yet also break the learned order to permit the desired goal to activate a plan capable of realizing it?
8. How does reinforcement enable an animal to selectively attend to rewarded goal objects?
9. How does reinforcement interact with the memory representation of experienced sequences in order to preferentially choose and perform them?
10. How can latent learning, or learning without reward, be explained?
11. How does competitive decision-making resolve conflicts between visually reactive and planned movement commands?
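The goal paradox in problem 7 above can be made concrete with a toy data structure. The names and the symbolic encoding below are entirely hypothetical; SOVEREIGN's actual chunking networks are neural, not symbolic. The chunk preserves the experienced order of moves, while a separately learned goal-to-chunk link lets the goal, activated first at performance time, read the whole plan out in its original order.

```python
class ChunkMemory:
    """Toy order-preserving sequence chunks with goal-driven recall."""

    def __init__(self):
        self.chunks = []      # each chunk stores moves in learned order
        self.goal_links = {}  # learned association: goal -> chunk index

    def learn(self, moves, goal):
        """On a rewarded trial the goal occurs last in experience, yet
        it is the goal that gets linked to the whole ordered chunk."""
        self.chunks.append(list(moves))
        self.goal_links[goal] = len(self.chunks) - 1

    def plan_for(self, goal):
        """At performance time the goal is activated first and reads
        out its chunk in the original learned order."""
        return self.chunks[self.goal_links[goal]]

mem = ChunkMemory()
mem.learn(["forward", "turn-left", "forward"], goal="food")
route = mem.plan_for("food")  # ["forward", "turn-left", "forward"]
```

Separating the order-preserving store (the chunk) from the order-breaking access path (the goal link) is one reading of how learning can be order-preserving yet still allow the desired goal to activate a plan capable of realizing it.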
Approach-orienting navigation and learning in a 3D virtual reality environment
The SOVEREIGN model simulates these processes for an animat that experiences a 3D visual world in a virtual reality environment. This world has the relatively simple spatial format of a plus-maze (Munn, 1950). Although simple, this environment tests in a clear way many of the critical competences that the animat needs to achieve. Much of the problem’s difficulty arises because an animat may navigate the maze in different ways, including different speeds and directions of movement, on successive
End-to-end SOVEREIGN simulations
Three key SOVEREIGN simulations are demonstrated herein. In the Motivated Choice Experiment, the animat learns the route to two different goal locations under two different motivational states. In the Multiple Reward Experiment, the animat learns to repeatedly revisit two rewarded goal locations. Finally, in the Shortest Path Experiment, the animat learns the shortest path to the goal location, after repeated random exploration of the maze environment. Each of the simulation summaries contains
General discussion and conclusions
The SOVEREIGN architecture embodies a number of design principles whose mechanistic instantiation as neural circuits (Fig. 2) enables incremental learning of planned action sequences to carry out route-based navigation towards a rewarded goal. SOVEREIGN circuits are based on neural models that have elsewhere been used to explain and predict many behavioral and brain data. Here the emphasis is on designing a neuromorphic controller that realizes behavioral competence.
The model has several notable
Acknowledgements
William Gnadt was supported in part by Riverside Research Institute, the Defense Advanced Research Projects Agency (N00014-92-J-4015), the Air Force Office of Scientific Research (F49620-92-J-0225), the National Science Foundation (IRI 90-24877 and SBE-0354378), the Office of Naval Research (N00014-91-J-4100, N00014-92-J-1309, and N00014-95-1-0657), and Pacific Sierra Research (PSR 91-6075-2).
Stephen Grossberg was supported in part by the National Science Foundation (SBE-0354378), and the
References (94)
- Visual learning, adaptive expectations, and behavioral conditioning of the mobile robot MAVIN. Neural Networks (1991)
- Fast learning VIEWNET architectures for recognizing 3-D objects from multiple 2-D views. Neural Networks (1995)
- How laminar frontal cortex and basal ganglia circuits interact to control planned and reactive saccades. Neural Networks (2004)
- Adaptive neural networks for control of movement trajectories invariant under speed and force rescaling. Human Movement Science (1991)
- A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing (1987)
- Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system. Neural Networks (1991)
- Vector associative maps: Unsupervised real-time error-based learning and control of movement trajectories. Neural Networks (1991)
- Purkinje cell activity during motor learning. Brain Research (1977)
- Separate visual pathways for perception and action. Trends in Neurosciences (1992)
- Neural representations for sensory-motor control, I: Head-centered 3-D target positions from opponent eye commands. Acta Psychologica (1993)
- A neural theory of punishment and avoidance, II: Quantitative theory. Mathematical Biosciences
- A theory of human memory: Self-organization and performance of sensory-motor codes, maps, and plans
- The complementary brain: Unifying brain dynamics and modularity. Trends in Cognitive Sciences
- The imbalanced brain: From normal behavior to schizophrenia. Biological Psychiatry
- Neural representations for sensory-motor control, II: Learning a head-centered visuomotor representation of 3-D target position. Neural Networks
- A neural network model of adaptively timed reinforcement learning and hippocampal dynamics. Cognitive Brain Research
- Neural dynamics of motion integration and segmentation within and across apertures. Vision Research
- Laminar cortical dynamics of 3D surface perception: Stratification, transparency, and neon color spreading. Vision Research
- Object vision and spatial vision: Two cortical pathways. Trends in Neurosciences
- The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely-moving rat. Brain Research
- Levels of sensorimotor representation. Journal of Mathematical Psychology
- Role of the hippocampus in temporal and spatial navigation: An adaptive neural network. Behavioral Brain Research
- The neural basis of basic associative learning of discrete behavioral responses. Trends in Neurosciences
- Fuzzy sets. Information and Control
- Landmark learning: An illustration of associative search. Biological Cybernetics
- Laminar cortical dynamics of visual form and motion interactions during coherent object motion perception. Spatial Vision
- Working memory networks for learning temporal order with application to three-dimensional visual object recognition. Neural Computation
- STORE working memory networks for storage and recall of arbitrary temporal sequences. Biological Cybernetics
- Cortical networks for control of voluntary arm movements under variable force conditions. Cerebral Cortex
- Neural dynamics of planned arm movements: Emergent invariants and speed-accuracy properties during trajectory formation. Psychological Review
- A self-organizing neural model of motor equivalent reaching and tool use by a multijoint arm. Journal of Cognitive Neuroscience
- Hippocampus — Spatial models
- Latent learning and the goal gradient hypothesis. Contributions to Psychological Theory
- A laminar cortical model of stereopsis and 3D surface perception: Closure and da Vinci stereopsis. Spatial Vision
- Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps. IEEE Transactions on Neural Networks
- Orientation of the white rat. Journal of Comparative Neurology and Psychology
- Neural dynamics of speech and language coding: Developmental programs, perceptual grouping, and competition for short-term memory. Human Neurobiology
- Masking fields: A massively parallel neural architecture for learning, recognizing, and predicting multiple groupings of patterned data. Applied Optics
- A neural model of cerebellar learning for arm movement control: Cortico-spino-cerebellar dynamics. Learning and Memory
- Hierarchical structures: Chunking by food type facilitates spatial memory. Journal of Experimental Psychology: Animal Behavior Processes
- The effect of doors on latent learning. Journal of Comparative Psychology
- Navigating through temporal difference
- Multiple visual discrimination in the block elevated maze. Journal of Comparative and Physiological Psychology