Emergence of emotional appraisal signals in reinforcement learning agents

Abstract

The positive impact of emotions on decision-making has long been established in both natural and artificial agents. From the perspective of appraisal theories, emotions complement perceptual information, coloring our sensations and guiding our decision-making. However, when designing autonomous agents, is emotional appraisal the best complement to perception? Mechanisms investigated in affective neuroscience provide support for this hypothesis in biological agents. In this paper, we look for similar support in artificial systems. We adopt the intrinsically motivated reinforcement learning framework to investigate different sources of information that can guide decision-making in learning agents, and an evolutionary approach based on genetic programming to identify a small set of such sources that have the largest impact on the performance of the agent in different tasks, as measured by an external evaluation signal. We then show that these sources of information (i) are applicable in a wider range of environments than those in which the agents evolved, and (ii) exhibit interesting correspondences to emotional appraisal-like signals previously proposed in the literature, supporting our starting hypothesis that the appraisal process might indeed provide essential information to complement perceptual capabilities and thus guide decision-making.

Notes

  1. Perceptual information can be of an internal nature, e.g., about goals, needs or beliefs, or external, e.g., about objects or events from the environment.

  2. We adopt a rather broad definition of signal. Specifically, we refer to as an appraisal signal any emotional appraisal-based information that is received and processed, in this case by the decision-making module.

  3. These complementary sources of information endow the agent with a richer repertoire of behaviors that may successfully overcome agent limitations [33, 40].

  4. In our approach, we use a fitness metric that directly measures the performance of the agent in the underlying task in different scenarios.

  5. Typical RL scenarios assume that \({\mathcal {Z}}={\mathcal {S}}\) and \(\mathsf {O}(z\mid s,a)=\delta (z,s)\), where \(\delta \) denotes the Kronecker delta [42]. When this is the case, parameters \({\mathcal {Z}}\) and \(\mathsf {O}\) can be safely discarded and the simplified model thus obtained, represented as a tuple \({\mathcal {M}}=({\mathcal {S}},{\mathcal {A}},\mathsf {P},r,\gamma )\), is referred to as a Markov decision process (MDP).
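
    For concreteness, and assuming the standard partially observable model notation for \({\mathcal {Z}}\) and \(\mathsf {O}\) (the exact tuple in Sect. 2 may differ slightly), the reduction can be written as

    \[ \mathsf {O}(z\mid s,a)=\delta (z,s) \quad \Longrightarrow \quad z_t=s_t \ \text {for all } t, \]

    i.e., observations carry exactly the same information as states, so \({\mathcal {Z}}\) and \(\mathsf {O}\) add nothing to the model \({\mathcal {M}}=({\mathcal {S}},{\mathcal {A}},\mathsf {P},r,\gamma )\).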

  6. Our RL agents all follow the prioritized sweeping algorithm and use the exploration policy detailed in Sect. 3.
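
    For reference, a minimal, self-contained sketch of the prioritized sweeping updates [24] is given below (in Python). The deterministic model, threshold and planning budget are illustrative assumptions and do not reproduce the exact agent configuration of Sect. 3; the exploration policy is likewise omitted.

    import heapq
    import itertools
    from collections import defaultdict

    class PrioritizedSweeping:
        """Minimal tabular prioritized sweeping (after Moore & Atkeson [24])."""

        def __init__(self, actions, gamma=0.95, theta=1e-4, n_sweeps=10):
            self.actions = actions                 # available actions
            self.gamma = gamma                     # discount factor
            self.theta = theta                     # priority threshold
            self.n_sweeps = n_sweeps               # planning updates per real step
            self.Q = defaultdict(float)            # Q[(s, a)] estimates
            self.model = {}                        # model[(s, a)] = (next state, reward)
            self.pred = defaultdict(set)           # (s, a) pairs known to lead into a state
            self.queue, self._tie = [], itertools.count()

        def _priority(self, s, a, r, s2):
            target = r + self.gamma * max(self.Q[(s2, b)] for b in self.actions)
            return abs(target - self.Q[(s, a)])

        def _push(self, s, a, p):
            heapq.heappush(self.queue, (-p, next(self._tie), (s, a)))

        def update(self, s, a, r, s2):
            # Learn a (deterministic) model from the observed transition.
            self.model[(s, a)] = (s2, r)
            self.pred[s2].add((s, a))
            p = self._priority(s, a, r, s2)
            if p > self.theta:
                self._push(s, a, p)
            # Planning: sweep the highest-priority pairs using the learned model.
            for _ in range(self.n_sweeps):
                if not self.queue:
                    break
                _, _, (si, ai) = heapq.heappop(self.queue)
                s2i, ri = self.model[(si, ai)]
                self.Q[(si, ai)] = ri + self.gamma * max(
                    self.Q[(s2i, b)] for b in self.actions)
                # Predecessors of si may now have stale values: re-prioritize them.
                for (sp, ap) in self.pred[si]:
                    s2p, rp = self.model[(sp, ap)]
                    pp = self._priority(sp, ap, rp, s2p)
                    if pp > self.theta:
                        self._push(sp, ap, pp)

    A learning loop would simply call update(s, a, r, s2) after every real transition, with actions selected by the exploration policy of Sect. 3.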

  7. We note that our choice of measuring fitness as the cumulative external evaluation signal is only one among many possible metrics. In the context of our study, we believe this to be a good metric, as it allows us to directly measure the agent’s fitness from its performance in the underlying task in the environment.
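
    Schematically (this rendering is ours; the precise averaging over trials and scenarios is specified in Sect. 3), the fitness of a candidate reward function \(r\) is thus of the form

    \[ {\mathcal {F}}(r) \;=\; \mathbb {E}\!\left[ \sum _{t} r^{\mathcal {F}}(s_t,a_t) \right], \]

    the expected accumulation of the external evaluation signal obtained by an agent that learns using \(r\).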

  8. More details on GP can be found in [15].

  9. Recall that \(r^{\mathcal {F}}(s,a)\) rewards the agent in accordance with the increase or decrease in fitness caused by executing action \(a\) in state \(s\).

  10. The first generation, corresponding to the population \({\mathcal {R}}_1\), is randomly generated.

    Fig. 5 The GP approach to the ORP, as proposed in [25]. In each generation \(j\), a population \({\mathcal {R}}_{j}\) contains a set of candidate reward functions \(r_k\), \(k=1,\ldots ,K\). All are evaluated according to a fitness function \({\mathcal {F}}(r_{k})\) and evolve according to crossover, mutation and selection
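
    The GP approach of [25] evolves full expression trees over the candidate terminals. Purely as an illustration of the loop depicted in Fig. 5, the Python sketch below evolves a simpler representation, namely weight vectors over a fixed set of candidate signals, using tournament selection, uniform crossover [43] and Gaussian mutation; all names, parameters and the placeholder fitness function are our own and hypothetical.

    import random

    def evolve_rewards(n_signals, evaluate_fitness, pop_size=50, generations=100,
                       tournament=3, p_mut=0.1, sigma=0.5, seed=None):
        """Evolutionary search over reward functions represented as weight
        vectors over n_signals candidate signals (a simplified stand-in for
        the GP expression trees of [25])."""
        rng = random.Random(seed)

        def random_candidate():
            return [rng.uniform(-1.0, 1.0) for _ in range(n_signals)]

        def select(pop, fits):
            # Tournament selection: best of `tournament` random individuals.
            best = max(rng.sample(range(len(pop)), tournament), key=lambda i: fits[i])
            return pop[best]

        def crossover(a, b):
            # Uniform crossover [43]: each weight comes from either parent.
            return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

        def mutate(w):
            return [x + rng.gauss(0.0, sigma) if rng.random() < p_mut else x
                    for x in w]

        population = [random_candidate() for _ in range(pop_size)]  # population R_1
        best_w, best_f = None, float("-inf")
        for _ in range(generations):
            # F(r_k): evaluate each candidate by running a learning agent with it.
            fits = [evaluate_fitness(w) for w in population]
            for w, f in zip(population, fits):
                if f > best_f:
                    best_w, best_f = list(w), f
            # Build the next generation by selection, crossover and mutation.
            population = [mutate(crossover(select(population, fits),
                                           select(population, fits)))
                          for _ in range(pop_size)]
        return best_w, best_f

    Here a weight vector \(w\) induces the candidate reward \(r(s,a)=\sum _i w_i\,\phi _i(s,a)\) over signals \(\phi _i\), and evaluate_fitness(w) stands for a full learning run returning the cumulative external evaluation signal.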

  11. We resorted to a simple unpaired \(t\) test to determine this statistical significance.
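
    Such an unpaired \(t\) test can be reproduced with standard tooling, e.g., using SciPy as sketched below; the two samples shown are made-up final-fitness values, not our experimental data.

    from scipy import stats

    # Hypothetical final-fitness samples from two independent groups of runs.
    fitness_a = [0.82, 0.79, 0.85, 0.81, 0.78, 0.84]
    fitness_b = [0.73, 0.70, 0.76, 0.69, 0.74, 0.72]

    # Two-sample (unpaired) Student t test.
    t_stat, p_value = stats.ttest_ind(fitness_a, fitness_b, equal_var=True)
    print(f"t = {t_stat:.3f}, p = {p_value:.4f}")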

  12. The set \({\mathcal {E}}\) is scenario-specific. For example, in the Hungry–Thirsty scenario, \({\mathcal {E}}\) includes all possible configurations of food and water.

  13. The fence is only an obstacle when the agent is moving upward from position (1:2).

  14. Denoting by \(n_t(\mathrm{fence})\) the number of times that the agent crossed the fence upwards up to time-step \(t\), \(N_t\) is given by \(N_t=\min \lbrace n_t(\mathrm{fence})+1;30\rbrace \).

  15. We again emphasize that our identification procedure is not guided by appraisal theories; the objective at this point is precisely to identify useful signals, regardless of their connection with emotional appraisal.

  16. In our distillation process we focus on extracting a minimal set of domain-independent, informative signals. As will become clearer in the next section, apart from additive constants (which have minimal impact on the policy and can therefore be safely discarded), it will be possible to reconstruct the reward functions in Table 1 (and attain comparable degrees of fitness) as linear combinations of these signals.
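
    Schematically, with \(\phi _i\) denoting the distilled signals and \(w_i\), \(c\) scenario-specific coefficients (this notation is ours, not that of Table 1), each evolved reward function can be rewritten as

    \[ r(s,a) \;\approx \; c + \sum _i w_i\,\phi _i(s,a), \]

    where the additive constant \(c\) can be dropped with negligible effect on the resulting policy.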

  17. Further experimental verification would be required to back up this claim; however, such verification is beyond the scope of this paper.

  18. We note that this section is more concerned with the computational applicability of the emerged signals. As such, we leave the analysis of their relation with emotional appraisal mechanisms for the discussion in Sect. 5.

  19. The keeper ghost, when present, makes it difficult for Pac-Man to reach the central cell, which is essential for the completion of most scenarios.

  20. Illustrative videos of the observed behaviors in different stages of the learning process in all the Pac-Man scenarios have been provided along with the submission (and are also available at http://gaips.inesc-id.pt/~psequeira/emot-emerg/).

  21. Recall that in the seasons environments the value of preys depends on the current season, but this is still an external factor that the agent cannot act upon or change.

  22. We refer to [34], where the reward functions included information about whether the agent was being considerate towards other agents of the same population in the context of resource-sharing scenarios.

  23. Also, many theorists distinguish between primary appraisal, providing a rather crude evaluation but a fast, almost automatic response to events, and secondary appraisal, allowing a deeper, more cognitive analysis of the situation and leading to more complex patterns of response over time by means of associative learning processes [6, 7, 16, 26]. In this regard, the IMRL framework itself supports this dual aspect of appraisal: on the one hand, RL provides a fast evaluation of the perceived stimuli (the state) and responses (the actions) based on a single signal, the learned \(Q\)-value function; on the other hand, as the agent’s lifetime progresses, the intrinsic rewards increasingly reflect what was learned in previous interactions, thus providing a more accurate evaluation of the environment. We note, however, that this aspect relates to properties of the RL framework itself and not of our approach in particular.

  24. We note that in the classical approach to RL, different problems will usually require distinct reward functions so that the agent is able to learn the intended task [42].

References

  1. Ahn, H., & Picard, R. (2006). Affective cognitive learning and decision making: The role of emotions. Proceedings of the 18th European Meeting on Cybernetics and Systems Research, pp. 1–6.

  2. Baird, L. (1993). Advantage updating. Technical Report WL-TR-93-1146, Wright Laboratory, Wright-Patterson Air Force Base.

  3. Bechara, A., Damasio, H., & Damasio, A. (2000). Emotion, decision making and the orbitofrontal cortex. Cerebral Cortex, 10(3), 295–307.

  4. Bratman, J., Singh, S., Lewis, R., & Sorg, J. (2012). Strong mitigation: Nesting search for good policies within search for good reward. Proceedings of the 11th International Joint Conference Autonomous Agents and Multiagent Systems.

  5. Broekens, D., Kosters, W., & Verbeek, F. (2007). On affect and self-adaptation: Potential benefits of valence-controlled action-selection. Bio-inspired Modeling of Cognitive Tasks: Proceedings of the 2nd International Conference Interplay Between Natural and Artificial Computation, pp. 357–366.

  6. Cardinal, R., Parkinson, J., Hall, J., & Everitt, B. (2002). Emotion and motivation: the role of the amygdala, ventral striatum, and prefrontal cortex. Neuroscience and Biobehavioral Reviews, 26(3), 321–352.

  7. Damasio, A. (1994). Descartes’ error: Emotion, reason, and the human brain. New York: Putnam Publishing.

  8. Dawkins, M. (2000). Animal minds and animal emotions. American Zoologist, 40(6), 883–888.

  9. Ellsworth, P., & Scherer, K. (2003). Appraisal processes in emotion. In R. Davidson, K. Scherer, & H. Goldsmith (Eds.), Handbook of the affective sciences. New York: Oxford University Press.

  10. Frijda, N., & Mesquita, B. (1998). The analysis of emotions: dimensions of variation. In M. Mascolo & S. Griffin (Eds.), What develops in emotional development? (Emotions, personality, and psychotherapy). New York: Springer.

  11. Gadanho, S., & Hallam, J. (2001). Robot learning driven by emotions. Adaptive Behavior, 9(1), 42–64.

  12. Hester, T., & Stone, P. (2012). Intrinsically motivated model learning for a developing curious agent. Proceedings of the IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL), pp. 1–6.

  13. Kaelbling, L., Littman, M., & Moore, A. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237–285.

  14. Kaelbling, L., Littman, M., & Cassandra, A. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101, 99–134.

  15. Koza, J. (1992). Genetic programming: On the programming of computers by means of natural selection. Cambridge: MIT Press.

  16. Lazarus, R. (2001). Relational meaning and discrete emotions. In K. Scherer, A. Schorr, & T. Johnstone (Eds.), Appraisal processes in emotion: Theory, methods, research. New York: Oxford University Press.

  17. LeDoux, J. (2000). Emotion circuits in the brain. Annual Review of Neuroscience, 23(1), 155–184.

  18. Leventhal, H., & Scherer, K. (1987). The relationship of emotion to cognition: A functional approach to a semantic controversy. Cognition and Emotion, 1(1), 3–28.

  19. Littman, M. (1994). Memoryless policies: Theoretical limitations and practical results. Proceedings of the 3rd International Conference Simulation of Adaptive Behavior (From Animals to Animats 3).

  20. Lopes, M., Lang, T., Toussaint, M., & Oudeyer, P.-Y. (2012). Exploration in model-based reinforcement learning by empirically estimating learning progress. Advances in Neural Information Processing Systems 25, pp. 206–214.

  21. Marinier, R. (2008). A computational unification of cognitive control, emotion, and learning. Phd thesis, University of Michigan, Ann Arbor, MI.

  22. Marsella, S., & Gratch, J. (2009). EMA: A process model of appraisal dynamics. Cognitive Systems Research, 10(1), 70–90.

  23. Marsella, S., Gratch, J., & Petta, P. (2010). Computational models of emotion. In K. Scherer, T. Bänziger, & E. Roesch (Eds.), Blueprint for affective computing. New York: Oxford University Press.

  24. Moore, A., & Atkeson, C. (1993). Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning, 13, 103–130.

  25. Niekum, S., Barto, A. G., & Spector, L. (2010). Genetic programming for reward function search. IEEE Transactions on Autonomous Mental Development, 2(2), 83–90.

  26. Oatley, K., & Jenkins, J. (2006). Understanding emotions. Oxford: Wiley-Blackwell.

  27. Phelps, E., & LeDoux, J. (2005). Contributions of the amygdala to emotion processing: From animal models to human behavior. Neuron, 48(2), 175–187.

  28. Roseman, I., & Smith, C. (2001). Appraisal theory: Overview, assumptions, varieties, controversies. In K. Scherer, A. Schorr, & T. Johnstone (Eds.), Appraisal processes in emotion: Theory, methods, research. Oxford: Oxford University Press.

  29. Rumbell, T., Barnden, J., Denham, S., & Wennekers, T. (2011). Emotions in autonomous agents: Comparative analysis of mechanisms and functions. Autonomous Agents and Multiagent Systems, 25(1), 1–45.

  30. Salichs, M., Malfaz, M. (2006). Using emotions on autonomous agents: The role of happiness, sadness and fear. In Proceedings of the Annual Conference on Ambient Intelligence and Simulated Behavior, pp. 157–164.

  31. Salichs, M., & Malfaz, M. (2012). A new approach to modeling emotions and their use on a decision-making system for artificial agents. IEEE Transactions on Affective Computing, 3(1), 56–68.

  32. Scherer, K. (2001). Appraisal considered as a process of multilevel sequential checking. In K. Scherer, A. Schorr, & T. Johnstone (Eds.), Appraisal processes in emotion: Theory, methods, research. Oxford: Oxford University Press.

  33. Sequeira, P., Melo, F.S., & Paiva, A. (2011a). Emotion-based intrinsic motivation for reinforcement learning agents. Proceedings of the 4th International Conference Affective Computing and Intelligent Interaction, pp. 326–336.

  34. Sequeira, P., Melo, F.S., Prada, R., & Paiva, A. (2011b). Emerging social awareness: Exploring intrinsic motivation in multiagent learning. Proceedings of the 1st IEEE International Joint Conference Development and Learning and Epigenetic Robotics, vol 2, pp. 1–6.

  35. Sequeira, P., Melo, F.S., & Paiva, A. (2012). Learning by appraising: An emotion-based approach for intrinsic reward design. Technical Report GAIPS-TR-001-12, INESC-ID.

  36. Si, M., Marsella, S., & Pynadath, D. (2010). Modeling appraisal in theory of mind reasoning. Autonomous Agents and Multi-Agent Systems, 20(1), 14–31.

  37. Singh, S., Jaakkola, T., & Jordan, M. (1994). Learning without state estimation in partially observable Markovian decision processes. Proceedings of the 11th International Conference Machine Learning, pp. 284–292.

  38. Singh, S., Lewis, R., & Barto, A. (2009). Where do rewards come from? Proceedings of the Annual Conference Cognitive Science Society, pp. 2601–2606.

  39. Singh, S., Lewis, R., Barto, A., & Sorg, J. (2010). Intrinsically motivated reinforcement learning: An evolutionary perspective. IEEE Transactions on Autonomous Mental Development, 2(2), 70–82.

  40. Sorg, J., Singh, S., & Lewis, R. (2010a). Internal rewards mitigate agent boundedness. Proceedings of the 27th International Conference Machine Learning, pp. 1007–1014.

  41. Sorg, J., Singh, S., & Lewis, R. (2010b). Reward design via online gradient ascent. Advances in Neural Information Processing Systems, 23, 1–9.

  42. Sutton, R., & Barto, A. (1998). Reinforcement learning: An introduction. Cambridge: The MIT Press.

  43. Syswerda, G. (1989). Uniform crossover in genetic algorithms. Proceedings of the 3rd International Conference Genetic Algorithms, pp. 2–9. San Francisco, CA, USA.

Acknowledgments

This work was partially supported by the Portuguese Fundação para a Ciência e a Tecnologia (FCT) under project PEst-OE/EEI/LA0021/2013 and the EU project SEMIRA through the grant ERA-Compl/0002/2009. The first author acknowledges Grant SFRH/BD/38681/2007 from the FCT.

Author information

Corresponding author

Correspondence to Pedro Sequeira.

Electronic supplementary material

Below are the links to the electronic supplementary material.

Supplementary material 1 (mp4 10873 KB)

Supplementary material 2 (mp4 10955 KB)

Supplementary material 3 (mp4 11009 KB)

Supplementary material 4 (mp4 10951 KB)

About this article

Cite this article

Sequeira, P., Melo, F.S. & Paiva, A. Emergence of emotional appraisal signals in reinforcement learning agents. Auton Agent Multi-Agent Syst 29, 537–568 (2015). https://doi.org/10.1007/s10458-014-9262-4
