DOI: 10.1145/3462244.3479882

An Automated Mutual Gaze Detection Framework for Social Behavior Assessment in Therapy for Children with Autism

Published: 18 October 2021


Mutual gaze is one of the most significant, reliable, and observable social cues for establishing and maintaining successful social interactions. It has been actively used to assess social behavior in the context of autism therapy. However, manually collecting and evaluating gaze data is challenging and demands considerable time and effort from therapy experts. To address this, we introduce an automated mutual gaze detection framework, grounded in prior work on automated gaze detection, as an effective predictive model for analyzing and assessing social visual behavior in autism therapy. To evaluate the proposed framework, we prepared an in-house video dataset capturing social interactions between children with autism and their therapy trainers (N = 10, 30 video recordings). We estimated each child's mutual gaze ratio with our prediction model and compared it with the social visual behavior scores that therapy experts annotated manually. The results showed that the mutual gaze ratio scores produced by our framework reliably represent (and could even replace) the experts' hand-coded social visual behavior scores across three analysis approaches: descriptive comparisons, correlation analysis, and regression prediction. We report our findings and discuss the implications of this work for visual behavior analysis of children with autism.
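The abstract compares a per-child mutual gaze ratio against the experts' hand-coded scores via correlation and regression. The sketch below illustrates those quantities under explicit assumptions that the abstract does not state: the detector is assumed to emit one mutual-gaze decision per video frame (True/False, or None when no decision is available), the ratio is assumed to be the fraction of valid frames flagged as mutual gaze, the correlation is assumed to be Pearson's r, and the regression is assumed to be a one-variable ordinary least-squares line. The function names `mutual_gaze_ratio`, `pearson_r`, and `fit_line` are hypothetical; this is not the authors' implementation, and the example values are synthetic, not data from the study.

```python
from math import sqrt
from typing import Optional, Sequence, Tuple


def mutual_gaze_ratio(frame_flags: Sequence[Optional[bool]]) -> float:
    """Fraction of valid frames predicted as mutual gaze.

    Hypothetical convention: each frame is True (mutual gaze), False
    (no mutual gaze), or None (no decision, e.g. a face was not found).
    None frames are excluded from the denominator.
    """
    valid = [flag for flag in frame_flags if flag is not None]
    if not valid:
        raise ValueError("no frames with a valid mutual-gaze decision")
    return sum(valid) / len(valid)  # True counts as 1, False as 0


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between predicted ratios and expert scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mx, my = _mean(xs), _mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit: expert_score ~ slope * ratio + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mx, my = _mean(xs), _mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line to a constant predictor")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


if __name__ == "__main__":
    # Synthetic, illustrative inputs only (not data from the study).
    ratio = mutual_gaze_ratio([True, False, None, True])  # 2 of 3 valid frames
    ratios = [0.25, 0.5, 0.75]
    expert_scores = [1.0, 2.0, 3.0]
    r = pearson_r(ratios, expert_scores)
    slope, intercept = fit_line(ratios, expert_scores)
    print(ratio, r, slope, intercept)
```

Excluding undecided frames from the denominator keeps missed face detections from being counted as "no mutual gaze"; whether the original framework handles such frames this way is an open assumption.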

Supplementary Material

MP4 File (ICMI21-fp11614.mp4)
Presentation video for ICMI'21 long paper: An Automated Mutual Gaze Detection Framework for Social Behavior Assessment in Therapy for Children with Autism


Michael Argyle, Roger Ingham, Florisse Alkema, and Margaret McCallin. 1973. The different functions of gaze. Semiotica 7, 1 (1973), 19–32.
American Psychiatric Association 2013. Diagnostic and Statistical Manual of Mental Disorders (DSM-5®). American Psychiatric Pub.
Jon Baio, Lisa Wiggins, Deborah L Christensen, Matthew J Maenner, Julie Daniels, Zachary Warren, Margaret Kurzius-Spencer, Walter Zahorodny, Cordelia Robinson Rosenberg, Tiffany White, 2018. Prevalence of autism spectrum disorder among children aged 8 years—autism and developmental disabilities monitoring network, 11 sites, United States, 2014. MMWR Surveillance Summaries 67, 6 (2018), 1.
Elgiz Bal, Emily Harden, Damon Lamb, Amy Vaughan Van Hecke, John W Denver, and Stephen W Porges. 2010. Emotion recognition in children with autism spectrum disorders: Relations to eye gaze and autonomic state. Journal of Autism and Developmental Disorders 40, 3 (2010), 358–370.
Roghayeh Barmaki and Charles E Hughes. 2018. Embodiment analytics of practicing teachers in a virtual immersive environment. Journal of Computer Assisted Learning 34, 4 (2018), 387–396.
Roghayeh Barmaki, Kevin Yu, Rebecca Pearlman, Richard Shingles, Felix Bork, Greg M Osgood, and Nassir Navab. 2019. Enhancement of anatomical education using augmented reality: An empirical study of body painting. Anatomical Sciences Education 12, 6 (2019), 599–609.
Anjana Narayan Bhat. 2020. Is motor impairment in autism spectrum disorder distinct from developmental coordination disorder? A report from the SPARK study. Physical Therapy 100, 4 (2020), 633–644.
Anjana N Bhat. 2021. Motor impairment increases in children with autism spectrum disorder as a function of social communication, cognitive and functional impairment, repetitive behavior severity, and comorbid diagnoses: A SPARK study report. Autism Research 14, 1 (2021), 202–219.
Anjana N Bhat, Rebecca J Landa, and James C Galloway. 2011. Current perspectives on motor functioning in infants, children, and adults with autism spectrum disorders. Physical Therapy 91, 7 (2011), 1116–1129.
Anjana Narayan Bhat and Sudha Srinivasan. 2013. A review of “music and movement” therapies for children with autism: embodied interventions for multisystem development. Frontiers in Integrative Neuroscience 7 (2013), 22.
Conner J Black, Abigail L Hogan, Kayla D Smith, and Jane E Roberts. 2021. Early behavioral and physiological markers of social anxiety in infants with fragile X syndrome. Journal of Neurodevelopmental Disorders 13, 1 (2021), 1–9.
Anna Bonnel, Laurent Mottron, Isabelle Peretz, Manon Trudel, Erick Gallun, and Anne-Marie Bonnel. 2003. Enhanced pitch sensitivity in individuals with autism: a signal detection analysis. Journal of Cognitive Neuroscience 15, 2 (2003), 226–235.
Geraldine Dawson, Karen Toth, Robert Abbott, Julie Osterling, Jeff Munson, Annette Estes, and Jane Liaw. 2004. Early social attention impairments in autism: social orienting, joint attention, and attention to distress.Developmental Psychology 40, 2 (2004), 271.
Mayada Elsabbagh, Janice Fernandes, Sara Jane Webb, Geraldine Dawson, Tony Charman, Mark H Johnson, British Autism Study of Infant Siblings Team, 2013. Disengagement of visual attention in infancy is associated with emerging autism in toddlerhood. Biological Psychiatry 74, 3 (2013), 189–194.
Loretta Gallo-Lopez and Lawrence C Rubin. 2012. Play-Based Interventions for Children and Adolescents with Autism Spectrum Disorders. Routledge.
Monika Geretsegger, Cochavit Elefant, Karin A Mössler, and Christian Gold. 2014. Music therapy for people with autism spectrum disorder. Cochrane Database of Systematic Reviews6 (2014).
Hatice Gunes and Massimo Piccardi. 2007. Bi-modal emotion recognition from expressive face and body gestures. Journal of Network and Computer Applications 30, 4(2007), 1334–1345.
Zhang Guo and Roghayeh Barmaki. 2019. Collaboration analysis using object detection. In Proceedings of the 12th International Conference on Educational Data Mining. 695–698.
Zhang Guo and Roghayeh Barmaki. 2020. Deep neural networks for collaborative learning analytics: Evaluating team collaborations using student gaze point prediction. Australasian Journal of Educational Technology 36, 6(2020), 53–71.
Juanpablo Andrew Heredia Parillo. 2021. An automatic emotion recognition system that uses the human body posture. (2021).
Kristen L Hess, Michael J Morrier, L Juane Heflin, and Michelle L Ivey. 2008. Autism treatment survey: Services received by children with autism spectrum disorders in public school classrooms. Journal of Autism and Developmental Disorders 38, 5 (2008), 961–971.
Heidi Hillman. 2018. Child-centered play therapy as an intervention for children with autism: A literature review.International Journal of Play Therapy 27, 4 (2018), 198.
Christina Yvonne Jones. 2021. The effects of music therapy frequency on children with autism spectrum disorder (ASD); The therapists point of view. Ph.D. Dissertation. Northcentral University.
Brandon Keehn, Alan J Lincoln, Ralph-Axel Müller, and Jeanne Townsend. 2010. Attentional networks in children and adolescents with autism spectrum disorder. Journal of Child Psychology and Psychiatry 51, 11 (2010), 1251–1259.
Kangsoo Kim, Arjun Nagendran, Jeremy Bailenson, and Greg Welch. 2015. Expectancy violations related to a virtual human’s joint gaze behavior in real-virtual human interactions. In Proceedings of the International Conference on Computer Animation and Social Agents. 5–8.
Martin Koestinger, Paul Wohlhart, Peter M Roth, and Horst Bischof. 2011. Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization. In Proceedings of 2011 IEEE International Conference on Computer Vision Workshops (ICCV workshops). IEEE, 2144–2151.
Yann LeCun, Bernhard Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne Hubbard, and Lawrence D Jackel. 1989. Backpropagation applied to handwritten zip code recognition. Neural Computation 1, 4 (1989), 541–551.
Jicheng Li and Roghayeh Barmaki. 2019. Trends in virtual and augmented reality research: A review of latest eye tracking research papers and beyond. Preprints (2019).
Jicheng Li, Anjana Bhat, and Roghayeh Barmaki. 2021. A two-stage multi-modal affect analysis framework for children with autism spectrum disorder. arXiv preprint arXiv:2106.09199(2021).
Sidrah Liaqat, Chongruo Wu, Prashanth Reddy Duggirala, Sen-ching Samson Cheung, Chen-Nee Chuah, Sally Ozonoff, and Gregory Young. 2021. Predicting ASD diagnosis in children with synthetic and image-based eye gaze data. Signal Processing: Image Communication 94 (2021), 116198.
Hayoung A Lim and Ellary Draper. 2011. The effects of music therapy incorporated with applied behavior analysis verbal behavior approach for children with autism spectrum disorders. Journal of Music Therapy 48, 4 (2011), 532–550.
Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. 2016. Ssd: Single shot multibox detector. In European Conference on Computer Vision. Springer, 21–37.
Manuel J Marin-Jimenez, Vicky Kalogeiton, Pablo Medina-Suarez, and Andrew Zisserman. 2019. Laeo-net: Revisiting people looking at each other in videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3477–3485.
Manuel Jesús Marin-Jimenez, Andrew Zisserman, Marcin Eichner, and Vittorio Ferrari. 2014. Detecting people looking at each other in videos. International Journal of Computer Vision 106, 3 (2014), 282–296.
S Mohammad Mavadati, Huanghao Feng, Anibal Gutierrez, and Mohammad H Mahoor. 2014. Comparing the gaze responses of children with autism and typically developed individuals in human-robot interaction. In Proceedings of 2014 IEEE-RAS International Conference on Humanoid Robots. IEEE, 1128–1133.
Shervin Minaee, Mehdi Minaei, and Amirali Abdolrashidi. 2021. Deep-emotion: Facial expression recognition using attentional convolutional network. Sensors 21, 9 (2021), 3046.
Peter C Mundy. 2016. Autism and Joint Attention: Development, Neuroscience, and Clinical Fundamentals. Guilford Publications.
Basilio Noris, Jacqueline Nadel, Mandy Barker, Nouchine Hadjikhani, and Aude Billard. 2012. Investigating gaze of children with ASD in naturalistic settings. PloS One 7, 9 (2012), e44144.
Julie A Osterling, Geraldine Dawson, and Jeffrey A Munson. 2002. Early recognition of 1-year-old infants with autism spectrum disorder versus mental retardation. Development and Psychopathology 14, 2 (2002), 239–251.
Sally Ozonoff, Ana-Maria Iosif, Fam Baguio, Ian C Cook, Monique Moore Hill, Ted Hutman, Sally J Rogers, Agata Rozga, Sarabjit Sangha, Marian Sigman, 2010. A prospective study of the emergence of early behavioral signs of autism. Journal of the American Academy of Child & Adolescent Psychiatry 49, 3(2010), 256–266.
Cristina Palmero, Elsbeth A van Dam, Sergio Escalera, Mike Kelia, Guido F Lichtert, Lucas PJJ Noldus, Andrew J Spink, and Astrid van Wieringen. 2018. Automatic mutual gaze detection in face-to-face dyadic interaction videos. In Proceedings of Measuring Behavior, Vol. 1. 2.
Alonso Patron-Perez, Marcin Marszalek, Andrew Zisserman, and Ian Reid. 2010. High five: Recognising human interactions in TV shows. In BMVC, Vol. 1. Citeseer, 33.
Dee C Ray and Sue C Bratton. 2010. What the research shows about play therapy: Twenty-first century update. Child-centered Play Therapy Research: The Evidence Base for Effective Practice (2010), 3–33.
Daniel C Richardson and Rick Dale. 2005. Looking to understand: The coupling between speakers’ and listeners’ eye movements and its relationship to discourse comprehension. Cognitive Science 29, 6 (2005), 1045–1060.
Michael Rutter, A Bailey, and Catherine Lord. 2003. SCQ. The Social Communication Questionnaire. Torrance, CA: Western Psychological Services (2003).
Gurkirt Singh, Suman Saha, Michael Sapienza, Philip HS Torr, and Fabio Cuzzolin. 2017. Online real-time multiple spatiotemporal action localisation and prediction. In Proceedings of the IEEE International Conference on Computer Vision. 3637–3646.
Sudha M Srinivasan, Inge-Marie Eigsti, Timothy Gifford, and Anjana N Bhat. 2016. The effects of embodied rhythm and robotic interventions on the spontaneous and responsive verbal communication skills of children with Autism Spectrum Disorder (ASD): A further outcome of a pilot randomized controlled trial. Research in Autism Spectrum Disorders 27 (2016), 73–87.
Sudha M Srinivasan, Inge-Marie Eigsti, Linda Neelly, and Anjana N Bhat. 2016. The effects of embodied rhythm and robotic interventions on the spontaneous and responsive social attention patterns of children with autism spectrum disorder (ASD): A pilot randomized controlled trial. Research in Autism Spectrum Disorders 27 (2016), 54–72.
Sudha M Srinivasan, Maninderjit Kaur, Isabel K Park, Timothy D Gifford, Kerry L Marsh, and Anjana N Bhat. 2015. The effects of rhythm and robotic interventions on the imitation/praxis, interpersonal synchrony, and motor performance of children with autism spectrum disorder (ASD): A pilot randomized controlled trial. Autism Research and Treatment 2015 (2015).
Sudha M Srinivasan, Isabel K Park, Linda B Neelly, and Anjana N Bhat. 2015. A comparison of the effects of rhythm and robotic interventions on repetitive behaviors and affective states of children with Autism Spectrum Disorder (ASD). Research in Autism Spectrum Disorders 18 (2015), 51–63.
Lindsey Sterling, Geraldine Dawson, Annette Estes, and Jessica Greenson. 2008. Characteristics associated with presence of depressive symptoms in adults with autism spectrum disorder. Journal of Autism and Developmental Disorders 38, 6 (2008), 1011–1018.
Chidchanok Thepsoonthorn, Takahiro Yokozuka, Jinhwan Kwon, Robin Miao Sin Yap, Shunsuke Miura, Ken-ichiro Ogawa, and Yoshihiro Miyake. 2015. Look at you, look at me: Detection and analysis of mutual gaze convergence in face-to-face interaction. In Proceedings of 2015 IEEE/SICE International Symposium on System Integration (SII). IEEE, 581–586.
Jenifer Ware Balch and Dee C Ray. 2015. Emotional assets of children with autism spectrum disorder: A single-case therapeutic outcome experiment. Journal of Counseling & Development 93, 4 (2015), 429–439.
Zhefan Ye, Yin Li, Alireza Fathi, Yi Han, Agata Rozga, Gregory D Abowd, and James M Rehg. 2012. Detecting eye contact using wearable eye-tracking glasses. In Proceedings of the 2012 ACM Conference on Ubiquitous Computing. 699–704.
Zhefan Ye, Yin Li, Yun Liu, Chanel Bridges, Agata Rozga, and James M Rehg. 2015. Detecting bids for eye contact using a wearable camera. In Proceedings of the 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), Vol. 1. IEEE, 1–8.
Paul Yoder, Wendy L Stone, Tedra Walden, and Elizabeth Malesa. 2009. Predicting social impairment and ASD diagnosis in younger siblings of children with autism spectrum disorder. Journal of Autism and Developmental Disorders 39, 10(2009), 1381–1391.
Benjamin Zablotsky, Lindsey I Black, Matthew J Maenner, Laura A Schieve, Melissa L Danielson, Rebecca H Bitsko, Stephen J Blumberg, Michael D Kogan, and Coleen A Boyle. 2019. Prevalence and trends of developmental disabilities among children in the United States: 2009–2017. Pediatrics 144, 4 (2019).

Cited By

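  • (2024) ARAIS-Activity Recognition of Autism Children During Intervention Sessions. 2024 7th International Conference on Machine Learning and Natural Language Processing (MLNLP), 1–6. DOI: 10.1109/MLNLP63328.2024.10800500. Online publication date: 18-Oct-2024.
  • (2024) Evaluating Gaze Detection for Children with Autism Using the ChildPlay-R Dataset. 2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition (FG), 1–5. DOI: 10.1109/FG59268.2024.10581976. Online publication date: 27-May-2024.
  • (2024) Linguistic summarization of visual attention and developmental functioning of young children with autism spectrum disorder. Health Information Science and Systems 12, 1. DOI: 10.1007/s13755-024-00297-4. Online publication date: 16-Jul-2024.
  • (2024) ALATT-network: automated LSTM-based framework for classification and monitoring of autism spectrum disorder therapy tasks. Signal, Image and Video Processing 18, 12, 9205–9221. DOI: 10.1007/s11760-024-03540-3. Online publication date: 20-Sep-2024.
  • (2023) Linguistic Comparison of Children with and without ASD through Eye-Tracking Data. Proceedings of the 2023 9th International Conference on Computer Technology Applications, 241–246. DOI: 10.1145/3605423.3605457. Online publication date: 10-May-2023.
  • (2023) Recognition of Human Relationships Using Interactions and Gazes through Video Analysis in Surveillance Footage. 2023 IEEE International Conference on Robotics and Biomimetics (ROBIO), 1–7. DOI: 10.1109/ROBIO58561.2023.10354720. Online publication date: 4-Dec-2023.
  • (2022) Pose Uncertainty Aware Movement Synchrony Estimation via Spatial-Temporal Graph Transformer. Proceedings of the 2022 International Conference on Multimodal Interaction, 73–82. DOI: 10.1145/3536221.3556627. Online publication date: 7-Nov-2022.
  • (2022) Dyadic Movement Synchrony Estimation Under Privacy-preserving Conditions. 2022 26th International Conference on Pattern Recognition (ICPR), 762–769. DOI: 10.1109/ICPR56361.2022.9956680. Online publication date: 21-Aug-2022.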



Information & Contributors


Published In

ICMI '21: Proceedings of the 2021 International Conference on Multimodal Interaction
October 2021
876 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.



Association for Computing Machinery

New York, NY, United States


Author Tags

  1. autism spectrum disorder
  2. automatic gaze detection
  3. deep learning
  4. mutual gaze
  5. play therapy
  6. social visual behavior
  7. visual behavior analysis


  • Research-article
  • Research
  • Refereed limited


ICMI '21
October 18–22, 2021
Montréal, QC, Canada

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%




Article Metrics

  • Downloads (last 12 months): 65
  • Downloads (last 6 weeks): 4
Reflects downloads up to 05 Mar 2025.
