research-article

Automatic Detection of Gaze and Smile in Children's Video Calls

Authors:
Dhia-Elhak Goumri, Aix Marseille Univ, CNRS, LIS, France (ORCID: 0009-0005-9906-0963)
Thomas Janssoone, Enchanted Tools, France (ORCID: 0000-0002-2316-4249)
Leonor Becerra-Bonache, Aix Marseille Univ, CNRS, LIS, France (ORCID: 0009-0009-0177-8360)
Abdellah Fourtassi, Aix Marseille Univ, CNRS, LIS, France (ORCID: 0000-0003-0279-7730)

ICMI '23 Companion: Companion Publication of the 25th International Conference on Multimodal Interaction, October 2023, pages 383–388. https://doi.org/10.1145/3610661.3616241

Published: 9 October 2023


ABSTRACT

With the increasing use of video chats by children, the need for tools that facilitate the scientific study of their communicative behavior becomes more pressing. This paper investigates the automatic detection, from video calls, of two major signals in children’s social coordination: smiles and gaze. While there has been significant progress in computer vision in modeling such signals, very little work has put these techniques to the test in the noisy, variable context of video calls, and even fewer studies (if any) have investigated children’s video calls specifically. In this paper, we provide a first exploration of this question, testing and comparing two modeling approaches: a) a feature-based approach that relies on state-of-the-art software such as OpenFace for feature extraction, and b) an end-to-end approach in which models are directly optimized to classify the behavior of interest from raw data. We found that features generated by OpenFace provide a better solution for smiles, whereas simple end-to-end architectures proved much more helpful for looking behavior. A broader goal of this preliminary work is to lay the basis for a public, comprehensive toolkit for the automatic processing of children’s communicative signals from video chats, facilitating research on children’s online multimodal interaction.
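The feature-based route described above can be illustrated with a small sketch. It assumes OpenFace 2.0's per-frame CSV output, using the columns `frame`, `success`, `AU12_r` (intensity of the lip corner puller, the action unit most associated with smiling, on a 0 to 5 scale) and `gaze_angle_x`/`gaze_angle_y` (gaze direction in radians). The fixed thresholds and the majority-vote smoothing are hypothetical stand-ins for a trained classifier; they are not the models or values used in the paper. Note also that gaze angles near zero mean looking toward the camera, which in a video call is not the same as looking at the partner on screen, so a fixed rule on extracted gaze features is only a rough proxy for looking behavior.

```python
import csv
import io
from typing import Dict, List, Optional

# Hypothetical thresholds for illustration only; not values from the paper.
SMILE_AU12_MIN = 1.5   # AU12_r intensity, OpenFace scale 0 (absent) to 5 (maximum)
GAZE_MAX_ANGLE = 0.25  # radians; both gaze angles near 0 means looking toward the camera


def read_openface_rows(text: str) -> List[Dict[str, str]]:
    """Parse OpenFace-style CSV text, stripping the padding spaces some versions put around fields."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [
        {k.strip(): (v or "").strip() for k, v in row.items() if k is not None}
        for row in reader
    ]


def label_frames(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Assign per-frame smile/looking labels; frames where face tracking failed get None."""
    labels: List[Dict[str, object]] = []
    for row in rows:
        frame = int(row["frame"])
        if row.get("success") != "1":
            labels.append({"frame": frame, "smile": None, "looking": None})
            continue
        au12 = float(row["AU12_r"])
        gx, gy = float(row["gaze_angle_x"]), float(row["gaze_angle_y"])
        labels.append({
            "frame": frame,
            "smile": au12 >= SMILE_AU12_MIN,
            "looking": abs(gx) < GAZE_MAX_ANGLE and abs(gy) < GAZE_MAX_ANGLE,
        })
    return labels


def majority_smooth(values: List[Optional[bool]], window: int = 5) -> List[Optional[bool]]:
    """Replace each label by the majority of valid (non-None) labels in a centered window.

    Ties resolve to False; a window containing only None stays None.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed: List[Optional[bool]] = []
    for i in range(len(values)):
        valid = [v for v in values[max(0, i - half): i + half + 1] if v is not None]
        smoothed.append(None if not valid else sum(valid) * 2 > len(valid))
    return smoothed


if __name__ == "__main__":
    # Toy rows in OpenFace's column layout (values invented for the demo).
    sample = (
        "frame, face_id, timestamp, confidence, success, gaze_angle_x, gaze_angle_y, AU12_r\n"
        "1, 0, 0.000, 0.98, 1, 0.05, 0.10, 2.3\n"
        "2, 0, 0.040, 0.97, 1, 0.40, 0.05, 0.2\n"
        "3, 0, 0.080, 0.20, 0, 0.00, 0.00, 0.0\n"
    )
    labels = label_frames(read_openface_rows(sample))
    for lab in labels:
        print(lab)
    print(majority_smooth([lab["smile"] for lab in labels], window=3))
```

Smoothing is included because per-frame labels tend to flicker; the window size is a free parameter that would need tuning against annotated data. An end-to-end model would replace both the thresholds and the hand-picked features with parameters learned directly from the video frames.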


Index Terms

1. Computing methodologies
  1. Machine learning
2. Human-centered computing
  1. Interaction design


Published in
ICMI '23 Companion: Companion Publication of the 25th International Conference on Multimodal Interaction
October 2023
434 pages
ISBN:9798400703218
DOI:10.1145/3610661
Editors:
Elisabeth André, University of Augsburg
Mohamed Chetouani, Sorbonne University
Dominique Vaufreydaz, Univ. Grenoble Alpes
Gale Lucas, USC Institute for Creative Technologies
Tanja Schultz, University of Bremen
Louis-Philippe Morency, Carnegie Mellon University
Alessandro Vinciarelli, University of Glasgow
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
child-caregiver interactions
gaze
machine learning
smile
video chats
Qualifiers
- research-article
- Research
- Refereed limited
Conference
Acceptance Rates
Overall acceptance rate: 453 of 1,080 submissions, 42%