ABSTRACT
With the increasing use of video chats by children, the need for tools that facilitate the scientific study of their communicative behavior becomes more pressing. This paper investigates the automatic detection, from video calls, of two major signals in children’s social coordination: smiles and gaze. While computer vision has advanced significantly in modeling such signals, very little work has tested these techniques in the noisy, variable context of video calls, and even fewer studies (if any) have investigated children’s video calls specifically. In this paper, we provide a first exploration of this question, testing and comparing two modeling approaches: (a) a feature-based approach that relies on state-of-the-art software such as OpenFace for feature extraction, and (b) an end-to-end approach in which models are directly optimized to classify the behavior of interest from raw data. We found that features generated by OpenFace provide a better solution for smile detection, whereas simple end-to-end architectures proved much more effective for looking behavior. A broader goal of this preliminary work is to provide the basis for a public, comprehensive toolkit for the automatic processing of children’s communicative signals from video chat, facilitating research on children’s online multimodal interaction.
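To make the feature-based approach more concrete, here is a minimal sketch of how per-frame OpenFace output could feed a smile decision. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not say which features or classifier were used. The column names `confidence`, `AU06_r`, and `AU12_r` follow the OpenFace CSV output format (cheek raiser and lip corner puller intensities), while the sample values, the `load_frames` and `is_smile` helpers, and all thresholds are hypothetical.

```python
import csv
import io

# Synthetic stand-in for an OpenFace per-frame CSV.
# All values are invented for illustration only.
SAMPLE_CSV = """frame, confidence, AU06_r, AU12_r
1, 0.98, 0.10, 0.20
2, 0.97, 1.40, 2.10
3, 0.40, 2.00, 2.50
4, 0.95, 0.00, 1.80
"""


def load_frames(text):
    """Parse CSV text into one dict per frame.

    OpenFace writes a space after each comma in the header,
    so column names are stripped before use.
    """
    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    return [
        dict(zip(header, (float(value) for value in row)))
        for row in reader
        if row
    ]


def is_smile(frame, au12_min=1.0, au06_min=0.5, conf_min=0.8):
    """Hypothetical rule-based smile decision for a single frame.

    A frame counts as a smile when face tracking is reliable and both
    AU12 (lip corner puller) and AU06 (cheek raiser) are active.
    The thresholds are assumptions, not values from the paper.
    """
    if frame["confidence"] < conf_min:
        return False  # skip frames where the face was poorly tracked
    return frame["AU12_r"] >= au12_min and frame["AU06_r"] >= au06_min


frames = load_frames(SAMPLE_CSV)
labels = [is_smile(frame) for frame in frames]
print(labels)
```

In practice a trained classifier over many such features would likely replace the fixed thresholds; the rule above only shows where OpenFace features enter the decision, in contrast with an end-to-end model that would consume the raw frames directly.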
Index Terms
- Automatic Detection of Gaze and Smile in Children's Video Calls