ABSTRACT
We propose a method for detecting a group’s focus of attention: the visual target toward which the majority of participants in a conversation direct their gaze. This information enables a robot to infer important conversational cues and adjust its behavior to support more natural conversational interactions. Our approach uses a Hidden Markov Model based on mimicry: the robot observes the head orientations of participants, infers their gaze directions, and from these identifies the group’s focus of attention. We demonstrate the method by having the robot replicate the gaze patterns of the group members, showing that it can accurately determine the focal point. We evaluated our algorithm on a combination of datasets and real-world scenarios with a Fetch robot, achieving 81% accuracy compared to a 54% baseline. The proposed method has the potential to significantly improve group-oriented human-robot interaction.
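To make the HMM idea above concrete, the following is a minimal sketch, not the paper’s implementation: it assumes the hidden state is the group’s current attention target, the per-frame observation is the target that most participants’ heads appear to point toward, and it decodes the most likely target sequence with the Viterbi algorithm. The state list, the transition and emission probabilities, and the example observation sequence are all illustrative assumptions, not parameters from the paper.

```python
# Illustrative sketch of an HMM over head-orientation observations.
# Hidden state: the group's focus of attention. Observation: the target
# most heads point toward in a frame. All numbers are assumed, not learned.
import numpy as np

STATES = ["person_A", "person_B", "robot"]  # assumed candidate attention targets
N = len(STATES)

# Assumed transition model: the group's attention tends to persist across frames.
TRANS = np.full((N, N), 0.1)
np.fill_diagonal(TRANS, 0.8)

# Assumed emission model: head orientation usually, but not always, agrees
# with the true gaze target, so the observed majority target matches the
# hidden state with high probability.
EMIT = np.full((N, N), 0.15)
np.fill_diagonal(EMIT, 0.7)

PRIOR = np.full(N, 1.0 / N)  # uniform prior over targets

def viterbi(obs):
    """Decode the most likely sequence of attention targets.

    obs: list of observation indices in range(N), one per frame
    (the target most participants' heads point toward).
    """
    T = len(obs)
    logp = np.log(PRIOR) + np.log(EMIT[:, obs[0]])
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = logp[:, None] + np.log(TRANS)  # scores[i, j]: from i to j
        back[t] = scores.argmax(axis=0)
        logp = scores.max(axis=0) + np.log(EMIT[:, obs[t]])
    path = [int(logp.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [STATES[s] for s in reversed(path)]

# Heads mostly point at person_A, flicker briefly, then shift to the robot;
# the decoder smooths the one-frame flicker into a stable focus estimate.
print(viterbi([0, 0, 1, 0, 2, 2, 2]))
# -> ['person_A', 'person_A', 'person_A', 'person_A', 'robot', 'robot', 'robot']
```

Because the assumed transition matrix favors self-transitions, momentary head movements that disagree with the group are absorbed rather than flipping the estimated focus, which is the kind of temporal smoothing a noisy head-orientation signal needs.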