
Multimodal Affective Computing Based on Weighted Linear Fusion

  • Conference paper
  • First Online:
Intelligent Systems and Applications (IntelliSys 2020)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1252)


Abstract

Affective computing is one of the most important research directions in human-computer interaction and has gained increasing popularity. However, traditional affective computing methods make decisions based on unimodal signals, which leads to low accuracy and poor feasibility. In this article, the final classification decision is made from multimodal fusion results: the decision outputs of a text emotion network and a visual emotion network are combined through a weighted linear fusion algorithm. A speaker's intention can be understood better by observing the speaker's expression, listening to the tone of voice, and analyzing the words. Combining auditory, visual, semantic, and other modalities therefore provides more information than a single modality. Video data typically contains several modal characteristics, and a single modality is rarely sufficient to describe all aspects of the overall video stream.
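The weighted linear fusion step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name, the three-class example, and the weights w_text and w_visual are hypothetical, assuming only that each unimodal network outputs a per-class probability vector.

```python
import numpy as np

def weighted_linear_fusion(text_probs, visual_probs, w_text=0.4, w_visual=0.6):
    """Combine per-class scores from two unimodal emotion networks.

    text_probs, visual_probs: 1-D arrays of class probabilities (same length).
    w_text, w_visual: fusion weights; illustrative values, not from the paper.
    Returns the fused probability vector and the predicted class index.
    """
    text_probs = np.asarray(text_probs, dtype=float)
    visual_probs = np.asarray(visual_probs, dtype=float)
    fused = w_text * text_probs + w_visual * visual_probs
    fused /= fused.sum()  # renormalise so the fused scores remain a distribution
    return fused, int(np.argmax(fused))

# Hypothetical example with three emotion classes (e.g. negative, neutral, positive)
text_scores = [0.2, 0.5, 0.3]    # output of the text emotion network
visual_scores = [0.1, 0.3, 0.6]  # output of the visual emotion network
fused, label = weighted_linear_fusion(text_scores, visual_scores)
print(fused, label)
```

In this kind of decision-level fusion, the weights reflect how much each modality is trusted; in practice they would be tuned on a validation set rather than fixed as above.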



Author information

Corresponding author

Correspondence to Ke Jin.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Jin, K., Wang, Y., Wu, C. (2021). Multimodal Affective Computing Based on Weighted Linear Fusion. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2020. Advances in Intelligent Systems and Computing, vol 1252. Springer, Cham. https://doi.org/10.1007/978-3-030-55190-2_1

