Attention to multiple local critics in decision making and control

https://doi.org/10.1016/j.eswa.2010.03.029

Abstract

When problems and situations involve uncertainty and incomplete knowledge, it is persistently difficult to evaluate situations and action values at every time step. Moreover, designing critics that can delicately guide the agent through reinforcement rewards and punishments in such complicated or ambiguous environments is laborious and cumbersome. In this study, we propose a framework for the concurrent learning of control of attention to the sensory space, attention to the various critics that evaluate the selected motor actions, and the motor actions themselves. Previous works have implemented attention control for selecting the most important parts of the data and/or reducing the dimensionality of the input space. However, decision making can depend not only on objective sensory data but also on mental states or subjective inputs. Specifically, we examine attention to the evaluations of the selected action by various local critics, as well as to sensory inputs. Each local critic evaluates the agent’s actions from its own standpoint on the task domain. Our agent tries to learn the degree of importance of each local critic’s assessment, using sparse punishing revisions from a super-critic. Thus the agent learns how to combine or even disregard some of the local critics while it concurrently learns to focus on the appropriate subset of features and learns the physical actions. By discovering an effective combination of local critics, the agent does not need any accurately designed critic in advance. The mathematical formulation of the proposed learning method is developed. To evaluate the proposed method, two benchmarks are discussed. The effect of attention control on robustness is analyzed via Monte Carlo analysis. The experimental results show the efficiency of the proposed formulation in the presence of uncertainties.

Introduction

To make reliable, fast and accurate decisions in a complicated dynamical environment, the use of several sensors to provide adequate perception is inevitable. The growth in the number and kinds of sensors results in a dramatic growth of the data collected from the environment or from the agent itself. Moreover, limited computational power and constraints on the response time to environmental stimuli and to the agent’s own reactions and intentions make it impossible to process all captured data. In addition, far-from-optimal behavior in subsequent processors can cause serious performance deterioration if too many non-informative signals are taken into consideration. Even when it yields only marginal improvements in performance, omitting signals of little relevance can be justified to avoid an excessive computational burden. These facts demonstrate the necessity of a dynamic mechanism for selecting the most important part of the information for processing and decision making. A suitable focus of attention on the most significant sources of information can reduce the agent’s confusion in decision making. Applying such attention control to the information available to the agent shapes the agent’s active perception when facing dynamic situations and environments.

Attention is the cognitive process of selectively concentrating on a subset of things while ignoring others. Of the many cognitive processes associated with the human mind (decision making, memory, emotion, etc.), attention is considered the most concrete because it is tied so closely to perception (Lopez, Fernandez-Caballero, Mira, Delgado, & Fernandez, 2006). Studies on attention in different fields provide a proper foundation for the design and implementation of this mechanism for artificial agents in different applications. Work on visual attention in particular has led to several achievements and to the construction of categories. For instance, spatial (space-based) attention involves focusing on a subset of a spatial array, allowing selective report of the information at the focus of attention (Eriksen & Hoffman, 1973; Sperling, 1960). Another recognized kind of attention is object-based attention: a player of a game such as tennis attends to the ball rather than to a particular part of the court. With respect to the cues that direct attention, there is bottom-up versus top-down attention. Top-down attention is driven by the mental state of the subject, that is, by information from higher brain areas such as knowledge, expectation and current goals (Corbetta & Shulman, 2002). The bottom-up influence, on the other hand, cannot be voluntarily suppressed: a highly salient region captures the focus of attention regardless of the task. Evidence from neurophysiological studies indicates that two independent but interacting brain areas are associated with these two attentional mechanisms (Corbetta & Shulman, 2002).

Although much work has been done to understand and model natural top-down and bottom-up attention mechanisms, most of it focuses on bottom-up attention. However, there is further evidence that top-down attention is more task driven and can be learned during the lifetime of an agent, whereas bottom-up attention has mostly been refined through the evolution of creatures. Lingyun et al. show that bottom-up attention is directed to the features with the most self-information, regardless of whether there is a target in the visual field or not (Lingyun, Tong, & Cottrell, 2007). In addition, some bio-inspired models have been designed to implement and extend the bottom-up mechanism in artificial systems, in which the model parameters can be tuned according to the agent’s task (Navalpakkam & Itti, 2005). Although there is a variety of studies on attention, only a few focus on the learning of top-down attention. One of the reasons is that decision and attention are not separable actions (Balkenius & Hulth, 1999). The obstacle resulting from this inseparability is the need to learn attention and decision mutually, in an associative way, with the difficulties that such mutual learning entails. This paper considers the learning of top-down attention.

Control of attention depends on the task, on the agent’s estimation and/or prediction, and on its decision-making mechanisms. From another point of view, the optimal solution of attention control is coupled to these elements and cannot be solved independently or offline. The solution to the attention problem can be achieved and learned through the agent’s interaction with the environment (Fatemi-Shariatpanahi & Nili-Ahmadabadi, 2008). Some work on learning attention has used reinforcement learning methods. For instance, a reinforcement learning model of selective visual attention is proposed in Minut and Mahadevan (2001). In this system, a fixed pan-tilt-zoom camera in a visually cluttered lab environment uses reinforcement learning to learn a policy over a set of regions in the room for reaching the target objects. Also, Paletta, Fritz, and Seifert (2005) use the Q-learning algorithm to learn the most informative attention shifts during object recognition in still images.

All the works mentioned above interpret attention as a focus on selected areas of the sensory space. There is another side of attention that is concealed behind this dominant one. Attention can also be applied to the rewards and punishments that different critics send to the agent as it interacts with the world. Just as a variety of sensory information is used to indicate the agent’s state in the environment, there should be many judgmental standpoints that prepare and send evaluation signals. From this point of view, there are several critics with different outlooks on the problem, each sending its rewards and punishments to the agent, and in each particular situation it is important to find which critics, or which combinations of their opinions, are most significant. Therefore, by applying the attention mechanism to the critic space, the designer can avoid the need to design an accurate critic in advance. Implementing attention to the critic space increases the independence of the design process and is a step toward fully automating it.

There are only a few studies on the effect of multiple goals or critics on attention. For instance, the impact of temporary goals on attention is discussed in Moskowitz (2002). That work studies the preconscious control of attentional focus by temporary goals that are activated through feelings of incompleteness, extending the effect of these goals from compensatory behavior to cognitive responses. From another standpoint, Thibodeau, Hart, Karuppiahy, Sweeney, and Brock (2004) propose and implement a cascade filter approach to select trajectories for a mobile robot based on different goals and objectives. This approach to choosing a controller resembles the concept of attention to goals. Moreover, the filter approach in Thibodeau et al. (2004) is similar to top-down attention, which acts like a filter that disregards sources of information irrelevant to current or future goals. In addition, although most approaches to multi-objective problems use a fixed ascendency of objectives, such as the conventional objectives in control engineering (Ogata, 1997), there are a few studies on nonlinear, fuzzy or linguistic methods of combining objectives or reinforcement signals, such as Arami et al. (2008) and Javan-Roshtkhari et al. (2009). Others have applied Pareto-based methods to deal with multiple objectives (Farina et al., 2004; Jin & Sendhoff, 2008). In this work, based on the above facts and suppositions, and given the features that selecting a combination of critics shares with attention to the sensory space (attention and observation costs, and the adverse effect of false or noisy data channels on existing non-optimal processors), we propose and implement attention mechanisms for choosing among multiple critics as well as sensors. These mechanisms provide a flexible tool that can be used in a wide range of multi-objective problems and that is capable of discovering complex nonlinear relations and ascendency among critics.

This paper is organized as follows. Section 2 presents the motivation for implementing attention to local critics. Section 3 gives the mathematical formulation of the proposed learning of attention control over local critics and sensory information. Section 4 shows simulation results for this attention control learning, carried out concurrently with semi-supervised learning of decision making and control, and also presents a Monte Carlo analysis of robustness. Finally, the concluding discussion is given in Section 5.

Section snippets

Why attention to different critics?

When infants learn interactively from their actions and from environmental responses in the form of rewards and punishments, they face a variety of sources of reward and punishment that are anonymous to them. These unidentified sources include parents’ or others’ emotional or physical reactions, inner excitement about intriguing events, and the direct reactions and repercussions of their actions. So the child has to learn how to attend to these different critics to improve the

Mathematical formulation of mutual learning of attention and decision

In this formulation of learning attention control, the agent tries to learn active attention control over critics and features concurrently with learning the physical actions in the problem space. The goal of the agent is to maximize the total expected reward while reducing the sensor-computation cost and the sparse super-critic punishments. In the general type of problem addressed, the main rewards and punishments result from the physical interaction of the agent with the environment. Also the perceptual actions
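
As a rough illustration of the objective described in this snippet, the following Python sketch combines the signals of several local critics using attention weights and then subtracts a sensing cost and a super-critic punishment. The function name combined_reward, the normalized weighted sum, the linear per-feature cost and all parameter names are assumptions made for illustration only; the paper's actual formulation is not reproduced in this excerpt.

```python
from typing import Sequence


def combined_reward(
    critic_signals: Sequence[float],
    critic_attention: Sequence[float],
    attended_features: Sequence[bool],
    feature_cost: float = 0.1,
    super_critic_penalty: float = 0.0,
) -> float:
    """Hypothetical scalar learning signal for one time step.

    critic_signals       -- reward/punishment sent by each local critic
    critic_attention     -- non-negative attention weight per local critic
                            (a weight of 0 disregards that critic)
    attended_features    -- True for every sensory feature that is observed
    feature_cost         -- assumed cost charged per attended feature
    super_critic_penalty -- sparse punishment from the super-critic
                            (0 when it stays silent)
    """
    if len(critic_signals) != len(critic_attention):
        raise ValueError("one attention weight is required per critic")
    if any(w < 0 for w in critic_attention):
        raise ValueError("attention weights must be non-negative")

    total_weight = sum(critic_attention)
    if total_weight > 0:
        # Attention-weighted average of the local critics' evaluations.
        weighted = sum(
            w * r for w, r in zip(critic_attention, critic_signals)
        ) / total_weight
    else:
        # No critic is attended, so no evaluation is received.
        weighted = 0.0

    # Assumed linear sensor-computation cost: one unit per attended feature.
    sensing_cost = feature_cost * sum(1 for f in attended_features if f)

    return weighted - sensing_cost - super_critic_penalty


if __name__ == "__main__":
    # Attend only to the first critic and to one of two features.
    print(combined_reward([1.0, -1.0], [1.0, 0.0], [True, False]))
```

In a sketch like this, a learning agent would adjust critic_attention and attended_features over time so that the super-critic penalty is rarely triggered; how that adjustment is done is exactly what the formulation in this section addresses.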

Experimental results on benchmark problems

To evaluate the performance of the proposed formulation on problems with cardinal uncertainty, two benchmark problems are used. The first is a stochastic POMDP–MDP grid-world game and the second is a multi-objective, model-free control problem.

Discussion and concluding remarks

A new approach to attention control, combining attention to the sensory space with attention to local critics, is introduced. The agent’s capability of learning both types of attention can lead to optimal or near-optimal solutions in problems with uncertainties and inaccurate critics. Also, a framework for the concurrent learning of physical actions and active attention control over features and local critics is presented as interactive learning. For demonstrating the postulated ideas these

References (25)

  • H. Fatemi-Shariatpanahi et al. (2008). Biologically inspired framework for learning and abstract representation of attention control. Lecture Notes in Computer Science.
  • Feedback Instrument Ltd. (2002). Digital pendulum control experiments. Manual:...