Abstract
Advances in machine learning and deep learning make it possible to detect and analyse emotion and sentiment from textual and audio-visual information with increasing effectiveness. Recently, interest has emerged in applying these techniques to the assessment of mental health as well, including the detection of stress and depression. In this paper, we introduce an approach that predicts stress (emotional valence and arousal) in a time-continuous manner from audio-visual recordings, testing the effectiveness of different deep learning techniques and various features. Specifically, apart from adopting popular features (e.g., BERT, BPM, ECG, and VGGFace), we explore the use of new features, both engineered and learned, across different modalities to improve the effectiveness of time-continuous stress prediction: for video, we study the use of ResNet-50 features and of body and pose features obtained through OpenPose, whereas for audio, we primarily investigate the use of Integrated Linear Prediction Residual (ILPR) features. Our best result was a combined CCC value of 0.7595 on the development set and 0.3379 on the test set of MuSe-Stress 2021.
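As context for the reported scores, the Concordance Correlation Coefficient (CCC) can be computed as in the minimal NumPy sketch below. The array names are placeholders, and the combined score is assumed here to be the mean of the valence and arousal CCCs, which is how MuSe-Stress aggregates the two dimensions.

```python
import numpy as np

def ccc(preds, labels):
    """Concordance Correlation Coefficient between two 1-D sequences."""
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    mean_p, mean_l = preds.mean(), labels.mean()
    var_p, var_l = preds.var(), labels.var()  # population variance
    cov = ((preds - mean_p) * (labels - mean_l)).mean()
    return 2.0 * cov / (var_p + var_l + (mean_p - mean_l) ** 2)

# Hypothetical usage: valence_pred, valence_gold, arousal_pred, arousal_gold
# are time-continuous prediction/label sequences for one partition.
# combined = 0.5 * (ccc(valence_pred, valence_gold) + ccc(arousal_pred, arousal_gold))
```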
This research was supported under the India-Korea Joint Programme of Cooperation in Science & Technology by the National Research Foundation (NRF) Korea (2020K1A3A1A68093469), the Ministry of Science and ICT (MSIT) Korea and by the Department of Biotechnology (India) (DBT/IC-12031(22)-ICD-DBT).
Notes
1. The 25 OBF keypoints are Nose, Neck, R/L Shoulders, R/L Elbows, R/L Wrists, MidHip, R/L Hips, R/L Knees, R/L Ankles, R/L Eyes, R/L Ears, R/L BigToes, R/L SmallToes, R/L Heels, and Background (R/L stands for Right/Left); a sketch of how these keypoints can be read is given below.
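For illustration, the following minimal Python sketch reads these keypoints from OpenPose's per-frame JSON output (produced with `--write_json`). The BODY_25 index order and the helper name `load_pose` are assumptions made for this sketch rather than details taken from the paper, and only the first detected person is kept.

```python
import json
import numpy as np

# Assumed index map for the OpenPose BODY_25 output format; verify the order
# against the OpenPose version actually used.
BODY_25 = {
    0: "Nose", 1: "Neck", 2: "RShoulder", 3: "RElbow", 4: "RWrist",
    5: "LShoulder", 6: "LElbow", 7: "LWrist", 8: "MidHip", 9: "RHip",
    10: "RKnee", 11: "RAnkle", 12: "LHip", 13: "LKnee", 14: "LAnkle",
    15: "REye", 16: "LEye", 17: "REar", 18: "LEar", 19: "LBigToe",
    20: "LSmallToe", 21: "LHeel", 22: "RBigToe", 23: "RSmallToe", 24: "RHeel",
}

def load_pose(json_path):
    """Return a (25, 3) array of (x, y, confidence) for the first person in
    one frame's OpenPose JSON file, or NaNs if nobody was detected."""
    with open(json_path) as f:
        frame = json.load(f)
    if not frame["people"]:
        return np.full((25, 3), np.nan)
    kp = np.asarray(frame["people"][0]["pose_keypoints_2d"], dtype=float)
    return kp.reshape(25, 3)
```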
References
Stappen, L., et al.: The MuSe 2021 multimodal sentiment analysis challenge: sentiment, emotion, physiological-emotion, and stress. In: Proceedings of the 2nd International on Multimodal Sentiment Analysis Challenge and Workshop. Association for Computing Machinery, New York (2021)
Stappen, L., Baird, A., Schumann, L., Schuller, B.: The multimodal sentiment analysis in car reviews (MuSe-car) dataset: collection, insights and improvements. IEEE Trans. Affect. Comput. (2021)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Cao, Z., Hidalgo, G., Simon, T., Wei, S.-E., Sheikh, Y.: OpenPose: realtime multi-person 2D pose estimation using part affinity fields. IEEE Trans. Pattern Anal. Mach. Intell. 43(1), 172–186 (2019)
Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
Baghel, S., Prasanna, S.R.M., Guha, P.: Classification of multi speaker shouted speech and single speaker normal speech. In: TENCON 2017–2017 IEEE Region 10 Conference, pp. 2388–2392. IEEE (2017)
Eyben, F., et al.: The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing. IEEE Trans. Affect. Comput. 7(2), 190–202 (2015)
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Degottex, G.: Glottal source and vocal-tract separation. Ph.D. thesis, Université Pierre et Marie Curie-Paris VI (2010)
Rothenberg, M.: Acoustic interaction between the glottal source and the vocal tract. Vocal Fold Physiol. 1, 305–323 (1981)
Loweimi, E., Barker, J., Saz-Torralba, O., Hain, T.: Robust source-filter separation of speech signal in the phase domain. In: Interspeech, pp. 414–418 (2017)
Prasanna, S.R.M., Gupta, C.S., Yegnanarayana, B.: Extraction of speaker-specific excitation information from linear prediction residual of speech. Speech Commun. 48(10), 1243–1261 (2006)
Baghel, S., Prasanna, S.R.M., Guha, P.: Exploration of excitation source information for shouted and normal speech classification. J. Acoust. Soc. Am. 147(2), 1250–1261 (2020)
Makhoul, J.: Linear prediction: a tutorial review. Proc. IEEE 63(4), 561–580 (1975)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2015)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017)
Zhang, K., Zhang, Z., Li, Z., Qiao, Y.: Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process. Lett. 23(10), 1499–1503 (2016)
Cao, Q., Shen, L., Xie, W., Parkhi, O.M., Zisserman, A.: VGGFace2: a dataset for recognising faces across pose and age. In: 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018) (2018)
Parkhi, O.M., Vedaldi, A., Zisserman, A.: Deep face recognition, pp. 1–12. British Machine Vision Association (2015)
Stappen, L., et al.: MuSe 2020 challenge and workshop: multimodal sentiment analysis, emotion-target engagement and trustworthiness detection in real-life media: emotional car reviews in-the-wild. In: Proceedings of the 1st International on Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop, pp. 35–44 (2020)
Cao, Z., Hidalgo, G., Simon, T., Wei, S.-E., Sheikh, Y.: OpenPose: realtime multi-person 2D pose estimation using part affinity fields. arXiv preprint arXiv:1812.08008 (2018)
Simon, T., Joo, H., Matthews, I., Sheikh, Y.: Hand keypoint detection in single images using multiview bootstrapping. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4645–4653 (2017)
Qin, S., Kim, S., Manduchi, R.: Automatic skin and hair masking using fully convolutional networks. In: 2017 IEEE International Conference on Multimedia and Expo (ICME) (2017)
Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084 (2019)
Atrey, P.K., Hossain, M.A., El Saddik, A., Kankanhalli, M.S.: Multimodal fusion for multimedia analysis: a survey. Multimed. Syst. 16(6), 345–379 (2010)
Zhang, Q., Xiao, T., Huang, N., Zhang, D., Han, J.: Revisiting feature fusion for RGB-T salient object detection. IEEE Trans. Circ. Syst. Video Technol. 31(5), 1804–1818 (2020)
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Kumar, A. et al. (2022). Exploring Multimodal Features and Fusion for Time-Continuous Prediction of Emotional Valence and Arousal. In: Kim, JH., Singh, M., Khan, J., Tiwary, U.S., Sur, M., Singh, D. (eds) Intelligent Human Computer Interaction. IHCI 2021. Lecture Notes in Computer Science, vol 13184. Springer, Cham. https://doi.org/10.1007/978-3-030-98404-5_65
DOI: https://doi.org/10.1007/978-3-030-98404-5_65
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98403-8
Online ISBN: 978-3-030-98404-5