Abstract
A fascinating challenge in human–robot interaction is giving robots the human capability of emotion recognition, with the aim of making the interaction natural, genuine, and intuitive. To achieve natural interaction in affective robots, human–machine interfaces, and autonomous vehicles, understanding users' attitudes and opinions is essential, and it offers a practical, feasible path toward connecting machines and humans. A multimodal interface that combines voice with facial expression can convey a far wider range of nuanced emotions than a purely textual interface and is of great value for improving the intelligence of affective communication. Interfaces that ignore or fail to reflect user emotions can suffer degraded performance and risk being perceived as cold, socially inept, untrustworthy, and incompetent. To equip children well for life, we need to help them identify their feelings, manage them well, and express their needs in healthy, respectful, and direct ways; early identification of emotional deficits can help prevent low social functioning in children. In this work, we analyze children's spontaneous behavior using multimodal facial expression and voice signals. We present a multimodal transformer-based late feature fusion framework for facial behavior analysis in children that extracts contextualized representations from RGB video sequences and the accompanying voice signals, and then combines these representations through pairwise concatenation with a cross-feature fusion technique to predict users' emotions. To validate the performance of the proposed framework, we performed experiments with different pairwise concatenations of contextualized representations, which showed significantly better performance than state-of-the-art methods.
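To make the late-fusion idea concrete, the pipeline can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the per-modality encoders are stand-ins (a random linear projection with mean pooling in place of the transformers), and all dimensions, names, and the softmax classifier are assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(seq, d_model=32):
    """Placeholder per-modality encoder. In the paper each modality is
    passed through a transformer to obtain contextualized representations;
    here we mean-pool a random linear projection of the input frames."""
    w = rng.standard_normal((seq.shape[-1], d_model)) / np.sqrt(seq.shape[-1])
    return (seq @ w).mean(axis=0)            # (d_model,)

def fuse_and_classify(video, audio, n_classes=6):
    """Late fusion: encode each modality separately, concatenate the
    contextualized representations pairwise, then classify the result."""
    z = np.concatenate([encode(video), encode(audio)])   # pairwise concat
    w_out = rng.standard_normal((z.size, n_classes)) / np.sqrt(z.size)
    logits = z @ w_out
    p = np.exp(logits - logits.max())
    return p / p.sum()                       # emotion class probabilities

video = rng.standard_normal((40, 128))   # 40 frames of 128-d visual features
audio = rng.standard_normal((100, 64))   # 100 frames of 64-d acoustic features
probs = fuse_and_classify(video, audio)
print(probs.shape)
```

The key design point is that fusion happens after each modality has its own contextualized representation (late fusion), so each encoder can be tailored to its input before the concatenated features are classified jointly.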
In addition, we perform t-distributed stochastic neighbor embedding (t-SNE) visualization to inspect the discriminative features in a lower-dimensional space, and probability density estimation to visualize the prediction capability of the proposed model.
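The probability-density view mentioned above can be illustrated with a small sketch. This is an assumed setup, not the paper's analysis: a hand-rolled 1-D Gaussian kernel density estimate over hypothetical per-class prediction scores, split into correctly and incorrectly classified samples; the scores, bandwidth, and grouping are all invented for illustration.

```python
import numpy as np

def gaussian_kde(samples, grid, bandwidth=0.1):
    """Tiny 1-D Gaussian kernel density estimate: average of Gaussian
    kernels centred on the samples, evaluated on a grid."""
    diffs = (grid[:, None] - samples[None, :]) / bandwidth
    k = np.exp(-0.5 * diffs**2) / np.sqrt(2 * np.pi)
    return k.mean(axis=1) / bandwidth

# Hypothetical predicted scores for one emotion class, split by outcome:
correct = np.array([0.80, 0.85, 0.90, 0.75, 0.95])
wrong   = np.array([0.20, 0.30, 0.25, 0.40])

grid = np.linspace(0.0, 1.0, 201)
d_correct = gaussian_kde(correct, grid)
d_wrong = gaussian_kde(wrong, grid)
```

Well-separated density peaks for the two groups indicate that the model's scores are discriminative; overlapping densities would signal uncertain predictions. (Libraries such as `scipy.stats.gaussian_kde` provide the same estimate with automatic bandwidth selection.)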
Index Terms
- Spontaneous Facial Behavior Analysis Using Deep Transformer-based Framework for Child–computer Interaction