Abstract
A fascinating challenge in human–robot interaction is giving robots the human capability of emotion recognition, with the aim of making the interaction natural, genuine, and intuitive. To achieve natural interaction in affective robots, human–machine interfaces, and autonomous vehicles, understanding users' attitudes and opinions is essential, and it offers a practical, feasible path toward connecting machines and humans. A multimodal interface that combines voice with facial expression can convey a far wider range of nuanced emotions than a purely textual interface and is of great value for improving the intelligence of affective communication. Interfaces that ignore or fail to reflect user emotions can suffer degraded performance and risk being perceived as cold, socially inept, untrustworthy, and incompetent. To equip children well for life, we need to help them identify their feelings, manage them well, and express their needs in healthy, respectful, and direct ways; early identification of emotional deficits can help prevent low social functioning in children. In this work, we analyze children's spontaneous behavior using multimodal facial expression and voice signals. We present a multimodal transformer-based late feature fusion framework for facial behavior analysis in children that extracts contextualized representations from RGB video sequences and the accompanying voice signals, and then combines these representations through pairwise concatenation with a cross-feature fusion technique to predict users' emotions. To validate the performance of the proposed framework, we performed experiments with different pairwise concatenations of contextualized representations, which showed significantly better performance than state-of-the-art methods.
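To make the late-fusion idea concrete, the pipeline can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the per-modality encoders are stand-ins (a random linear projection with mean pooling in place of the transformers), and all dimensions, names, and the softmax classifier are assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(seq, d_model=32):
    """Placeholder per-modality encoder. In the paper each modality is
    passed through a transformer to obtain contextualized representations;
    here we mean-pool a random linear projection of the input frames."""
    w = rng.standard_normal((seq.shape[-1], d_model)) / np.sqrt(seq.shape[-1])
    return (seq @ w).mean(axis=0)            # (d_model,)

def fuse_and_classify(video, audio, n_classes=6):
    """Late fusion: encode each modality separately, concatenate the
    contextualized representations pairwise, then classify the result."""
    z = np.concatenate([encode(video), encode(audio)])   # pairwise concat
    w_out = rng.standard_normal((z.size, n_classes)) / np.sqrt(z.size)
    logits = z @ w_out
    p = np.exp(logits - logits.max())
    return p / p.sum()                       # emotion class probabilities

video = rng.standard_normal((40, 128))   # 40 frames of 128-d visual features
audio = rng.standard_normal((100, 64))   # 100 frames of 64-d acoustic features
probs = fuse_and_classify(video, audio)
print(probs.shape)
```

The key design point is that fusion happens after each modality has its own contextualized representation (late fusion), so each encoder can be tailored to its input before the concatenated features are classified jointly.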
In addition, we perform t-distributed stochastic neighbor embedding (t-SNE) visualization to inspect the discriminative features in a lower-dimensional space, and probability density estimation to visualize the prediction capability of the proposed model.
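The probability-density view mentioned above can be illustrated with a small sketch. This is an assumed setup, not the paper's analysis: a hand-rolled 1-D Gaussian kernel density estimate over hypothetical per-class prediction scores, split into correctly and incorrectly classified samples; the scores, bandwidth, and grouping are all invented for illustration.

```python
import numpy as np

def gaussian_kde(samples, grid, bandwidth=0.1):
    """Tiny 1-D Gaussian kernel density estimate: average of Gaussian
    kernels centred on the samples, evaluated on a grid."""
    diffs = (grid[:, None] - samples[None, :]) / bandwidth
    k = np.exp(-0.5 * diffs**2) / np.sqrt(2 * np.pi)
    return k.mean(axis=1) / bandwidth

# Hypothetical predicted scores for one emotion class, split by outcome:
correct = np.array([0.80, 0.85, 0.90, 0.75, 0.95])
wrong   = np.array([0.20, 0.30, 0.25, 0.40])

grid = np.linspace(0.0, 1.0, 201)
d_correct = gaussian_kde(correct, grid)
d_wrong = gaussian_kde(wrong, grid)
```

Well-separated density peaks for the two groups indicate that the model's scores are discriminative; overlapping densities would signal uncertain predictions. (Libraries such as `scipy.stats.gaussian_kde` provide the same estimate with automatic bandwidth selection.)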
Index Terms
- Spontaneous Facial Behavior Analysis Using Deep Transformer-based Framework for Child–computer Interaction