Research Article · HAI '23
DOI: 10.1145/3623809.3623866

Identifying the Focus of Attention in Human-Robot Conversational Groups

Published: 4 December 2023

ABSTRACT

We propose a method for detecting a group's focus of attention: the visual point at which a majority of participants direct their gaze in a conversation. This information enables a robot to infer important conversational cues and adjust its behavior to support more natural conversational interactions. Our approach uses a Hidden Markov Model based on mimicry: the robot observes the head orientations of the participants and infers their gaze directions to identify the group's focus of attention. We demonstrate the method by replicating the gaze patterns of the group members, showing that the robot can accurately determine the focal point. We evaluated the algorithm on a combination of datasets and real-world scenarios with a Fetch robot, achieving an accuracy of 81% against a baseline of 54%. The proposed method has the potential to significantly improve group-oriented human-robot interaction.
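The abstract does not spell out implementation details, but the described setup maps naturally onto standard HMM decoding. The following is a minimal sketch, not the authors' implementation: the hidden state is which of K candidate targets is the group's current focus, the per-frame observation is the target each participant's head appears to point toward, and the emission probability p_look, the sticky transition prior, and all parameter values are our illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not the paper's code): hidden state = which of K candidate
# targets is the group's focus of attention; observation at each frame = the
# target each of N participants' heads points toward, as estimated from head
# orientation. p_look and the sticky transition prior are assumptions.

def frame_log_likelihood(head_targets, K, p_look=0.7):
    """log p(observed head targets | group focus = k) for each k.

    Assumes each participant looks at the true focus with probability
    p_look, and at any other target uniformly otherwise.
    """
    ll = np.empty(K)
    for k in range(K):
        hits = np.count_nonzero(head_targets == k)
        misses = len(head_targets) - hits
        ll[k] = hits * np.log(p_look) + misses * np.log((1 - p_look) / (K - 1))
    return ll

def viterbi(log_pi, log_A, log_liks):
    """Most likely focus sequence given per-frame log-likelihoods (T, K)."""
    T, K = log_liks.shape
    delta = log_pi + log_liks[0]          # best score ending in each state
    back = np.zeros((T, K), dtype=int)    # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_A   # scores[i, j]: state i -> state j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_liks[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path

# Toy usage: 4 candidate foci, 5 participants, 3 frames. Most heads point at
# target 2 throughout, so the decoded focus should stay at 2.
K = 4
obs = np.array([[2, 2, 1, 2, 2],
                [2, 2, 2, 3, 2],
                [0, 2, 2, 2, 2]])
log_pi = np.log(np.full(K, 1.0 / K))                       # uniform prior
A = np.where(np.eye(K, dtype=bool), 0.95, 0.05 / (K - 1))  # sticky focus
log_liks = np.stack([frame_log_likelihood(o, K) for o in obs])
print(viterbi(log_pi, np.log(A), log_liks))                # -> [2 2 2]
```

In an actual pipeline the observation model would come from the robot's perception of head pose and the HMM parameters would be fit to data; the sketch only illustrates the decoding structure such a model implies.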

Supplemental Material

hai23a-sub1035-i7.mp4 (MP4, 57.6 MB)


Published in

HAI '23: Proceedings of the 11th International Conference on Human-Agent Interaction
December 2023, 506 pages
ISBN: 9798400708244
DOI: 10.1145/3623809
Copyright © 2023 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 121 of 404 submissions, 30%