
The eHRI database: a multimodal database of engagement in human–robot interactions

  • Original Paper
Language Resources and Evaluation

Abstract

We present the engagement in human–robot interaction (eHRI) database, which contains natural interactions between two human participants and a robot in a story-shaping game scenario. The audio-visual recordings provided with the database are fully annotated on a 5-level intensity scale for head nods and smiles, together with speech transcriptions and continuous engagement values. In addition, we present baseline results for smile and head nod detection, along with a real-time multimodal engagement monitoring system. We believe the eHRI database will serve as a novel asset for research in affective human–robot interaction by providing raw data, annotations, and baseline results.
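As a rough illustration of how per-frame annotations of this kind (discrete head-nod and smile intensities plus a continuous engagement value) could be consumed, the sketch below assumes a simple CSV layout with columns time_s, head_nod, smile, and engagement. The file format, column names, and helper functions are hypothetical and are not taken from the eHRI release.

# Illustrative sketch only: the actual eHRI file layout is not described here,
# so the CSV format and all field/column names below are assumptions.
import csv
from dataclasses import dataclass
from typing import List


@dataclass
class FrameAnnotation:
    """One annotated frame for a single participant (hypothetical schema)."""
    time_s: float      # timestamp in seconds
    head_nod: int      # head-nod intensity on a 5-level scale (0 = none, 4 = strongest)
    smile: int         # smile intensity on the same 5-level scale
    engagement: float  # continuous engagement value


def load_annotations(path: str) -> List[FrameAnnotation]:
    """Read a hypothetical per-frame annotation CSV into typed records."""
    records = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            records.append(FrameAnnotation(
                time_s=float(row["time_s"]),
                head_nod=int(row["head_nod"]),
                smile=int(row["smile"]),
                engagement=float(row["engagement"]),
            ))
    return records


def mean_engagement_during(records: List[FrameAnnotation], min_smile: int = 3) -> float:
    """Toy analysis: average engagement over frames with a strong smile."""
    selected = [r.engagement for r in records if r.smile >= min_smile]
    return sum(selected) / len(selected) if selected else float("nan")

A loader like this would make it straightforward to align the discrete gesture annotations with the continuous engagement signal for baseline experiments such as the smile and head nod detection reported in the paper.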


Notes

  1. https://www.softbankrobotics.com/emea/en/nao.

  2. https://www.softbankrobotics.com/emea/en/pepper.

  3. https://furhatrobotics.com/.

  4. The eHRI database will be publicly available at https://mvgl.ku.edu.tr/databases/.


Funding

This work is supported by Türkiye Bilimsel ve Teknolojik Araştırma Kurumu (TÜBİTAK, the Scientific and Technological Research Council of Türkiye) under Grant Number 217E040.

Author information


Corresponding author

Correspondence to Engin Erzin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Kesim, E., Numanoglu, T., Bayramoglu, O. et al. The eHRI database: a multimodal database of engagement in human–robot interactions. Lang Resources & Evaluation 57, 985–1009 (2023). https://doi.org/10.1007/s10579-022-09632-1

