Spontaneous Facial Behavior Analysis Using Deep Transformer-based Framework for Child–computer Interaction

Published: 26 September 2023

Abstract

A fascinating challenge in human–robot interaction is endowing robots with the human capability of emotion recognition, so that interaction with them becomes natural, genuine, and intuitive. To achieve natural interaction in affective robots, human–machine interfaces, and autonomous vehicles, understanding our attitudes and opinions is very important and provides a practical, feasible path toward connecting machines and humans. A multimodal interface that combines voice with facial expression can manifest a far larger range of nuanced emotions than a purely textual interface and offers great value for more intelligent and effective communication. Interfaces that ignore or fail to reflect user emotions may suffer in performance and risk being perceived as cold, socially inept, untrustworthy, and incompetent. To equip children well for life, we need to help them identify their feelings, manage them well, and express their needs in healthy, respectful, and direct ways; early identification of emotional deficits can help prevent low social functioning in children. In this work, we analyze children's spontaneous behavior using multimodal facial expressions and voice signals. We present a multimodal transformer-based late feature fusion framework for facial behavior analysis in children that extracts contextualized representations from RGB video sequences and hematoxylin-and-eosin video sequences, and then combines pairwise concatenations of these contextualized representations through a cross-feature fusion technique to predict users' emotions. To validate the proposed framework, we performed experiments with different pairwise concatenations of contextualized representations, which showed significantly better performance than state-of-the-art methods. In addition, we use t-distributed stochastic neighbor embedding (t-SNE) visualization to inspect the discriminative features in a lower-dimensional space and probability density estimation to visualize the prediction capability of the proposed model.
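The abstract describes two stream-specific transformer encoders whose contextualized representations are fused late by pairwise concatenation before an emotion classifier. The sketch below illustrates that general idea in PyTorch; it is not the authors' implementation, and all module names, feature dimensions, the pooling choice, and the number of emotion classes are assumptions made for illustration only.

```python
# Illustrative sketch only: a two-stream transformer encoder with late fusion
# by pairwise concatenation of contextualized representations.
# Dimensions, class count, and pooling are assumptions, not the paper's settings.
import torch
import torch.nn as nn


class StreamEncoder(nn.Module):
    """Encodes one modality (e.g., per-frame features of one video stream)."""

    def __init__(self, feat_dim=512, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, x):                      # x: (batch, time, feat_dim)
        h = self.encoder(self.proj(x))         # contextualized token sequence
        return h.mean(dim=1)                   # pooled representation (batch, d_model)


class LateFusionClassifier(nn.Module):
    """Concatenates the two stream representations, then classifies emotions."""

    def __init__(self, d_model=256, n_classes=6):
        super().__init__()
        self.rgb_enc = StreamEncoder(d_model=d_model)
        self.aux_enc = StreamEncoder(d_model=d_model)
        self.head = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.ReLU(), nn.Linear(d_model, n_classes)
        )

    def forward(self, rgb_seq, aux_seq):
        z = torch.cat([self.rgb_enc(rgb_seq), self.aux_enc(aux_seq)], dim=-1)
        return self.head(z)                    # emotion logits


# Toy usage with random tensors standing in for two 16-frame feature sequences.
logits = LateFusionClassifier()(torch.randn(2, 16, 512), torch.randn(2, 16, 512))
print(logits.shape)                            # torch.Size([2, 6])
```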
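The abstract also mentions t-SNE visualization of the learned features and probability density estimation of the model's predictions. A minimal illustration of both analyses, assuming hypothetical `features`, `labels`, and `correct_probs` arrays extracted from a trained model, could look like the following; the placeholder arrays are random stand-ins, not the paper's data.

```python
# Illustrative sketch: t-SNE of fused representations and a kernel density
# estimate of per-sample predicted probabilities. All inputs are placeholders.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from scipy.stats import gaussian_kde

features = np.random.randn(300, 512)           # placeholder fused representations
labels = np.random.randint(0, 6, size=300)     # placeholder emotion labels

# 2-D embedding of the high-dimensional representations for visual inspection.
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab10", s=10)
plt.title("t-SNE of fused representations")
plt.show()

# Density of the probability assigned to the true class, a simple proxy for a
# "prediction capability" plot.
correct_probs = np.random.rand(300)            # placeholder per-sample probabilities
grid = np.linspace(0, 1, 200)
plt.plot(grid, gaussian_kde(correct_probs)(grid))
plt.xlabel("predicted probability of true class")
plt.ylabel("density")
plt.show()
```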



Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 2 (February 2024), 548 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3613570
Editor: Abdulmotaleb El Saddik

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

        Publication History

        • Published: 26 September 2023
        • Online AM: 26 May 2022
        • Accepted: 16 May 2022
        • Revised: 14 April 2022
        • Received: 16 October 2021
Published in TOMM Volume 20, Issue 2

        Qualifiers

        • research-article
