DOI: 10.1145/3610661.3616241
research-article

Automatic Detection of Gaze and Smile in Children's Video Calls

Published: 09 October 2023

ABSTRACT

With the increasing use of video chats by children, the need for tools that facilitate the scientific study of their communicative behavior becomes more pressing. This paper investigates the automatic detection – from video calls – of two major signals in children’s social coordination: smiles and gaze. While there has been significant advancement in the field of computer vision to model such signals, very little work has been done to put these techniques to the test in the noisy, variable context of video calls, and even fewer studies (if any) have investigated children’s video calls specifically. In this paper, we provide a first exploration into this question, testing and comparing two modeling approaches: a) a feature-based approach that relies on state-of-the-art software like OpenFace for feature extraction, and b) an end-to-end approach where models are directly optimized to classify the behavior of interest from raw data. We found that using features generated by OpenFace provides a better solution in the case of smiles, whereas using simple end-to-end architectures proved to be much more helpful in the case of looking behavior. A broader goal of this preliminary work is to provide the basis for a public, comprehensive toolkit for the automatic processing of children’s communicative signals from video chat, facilitating research in children’s online multimodal interaction.
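The feature-based approach for smiles can be illustrated with a minimal sketch. The abstract does not say which OpenFace features or which classifier were used, so everything here is an assumption: the sample rows are invented, the column subset (`confidence`, `AU06_r`, `AU12_r`) is a small excerpt of what OpenFace writes per frame, and the threshold rule in the hypothetical `is_smile` function is a stand-in for whatever model is trained on the features. AU12 (lip corner puller) is the facial action unit most commonly associated with smiling.

```python
import csv
import io

# Hypothetical excerpt of an OpenFace-style per-frame CSV. Real OpenFace output
# has many more columns; the values below are made up for illustration.
SAMPLE_CSV = """frame, confidence, AU06_r, AU12_r
1, 0.98, 0.10, 0.20
2, 0.97, 1.40, 2.10
3, 0.35, 2.00, 2.50
4, 0.96, 0.00, 1.60
"""


def load_frames(text):
    """Parse the CSV, stripping the spaces that follow the commas in the header."""
    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    return [dict(zip(header, (float(v) for v in row))) for row in reader if row]


def is_smile(frame, au12_threshold=1.5, min_confidence=0.8):
    """Assumed stand-in rule: a confidently tracked face with a strong AU12."""
    if frame["confidence"] < min_confidence:
        return False  # face tracking too unreliable to trust the features
    return frame["AU12_r"] >= au12_threshold


frames = load_frames(SAMPLE_CSV)
labels = [is_smile(f) for f in frames]
print(labels)
```

In this sketch, frame 3 is rejected despite its high AU12 value because the tracking confidence is low, which is the kind of failure that noisy video-call footage makes frequent. An end-to-end model would skip the intermediate features and learn the label directly from the face crops.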

      • Published in
        ICMI '23 Companion: Companion Publication of the 25th International Conference on Multimodal Interaction
        October 2023
        434 pages
        ISBN: 9798400703218
        DOI: 10.1145/3610661

        Copyright © 2023 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        Overall Acceptance Rate: 453 of 1,080 submissions, 42%