
Multimodal Affective Computing Based on Weighted Linear Fusion

  • Conference paper
  • First Online:
Intelligent Systems and Applications (IntelliSys 2020)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1252)


Abstract

Affective computing is one of the most important research directions in human-computer interaction and has gained increasing popularity. However, traditional affective computing methods make decisions based on unimodal signals, which leads to low accuracy and poor feasibility. In this article, the final classification decision is made from multimodal fusion results: the decision outputs of a text emotion network and a visual emotion network are combined through a weighted linear fusion algorithm. A speaker's intention can be understood better by observing the speaker's expression, listening to the tone of voice, and analyzing the words. Combining auditory, visual, semantic, and other modalities therefore provides more information than a single modality. Video data typically contains several modal characteristics, and a single modality is rarely sufficient to describe all aspects of the overall video stream.
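The weighted linear fusion step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name, the three-class example, and the weights w_text and w_visual are hypothetical, assuming only that each unimodal network outputs a per-class probability vector.

```python
import numpy as np

def weighted_linear_fusion(text_probs, visual_probs, w_text=0.4, w_visual=0.6):
    """Combine per-class scores from two unimodal emotion networks.

    text_probs, visual_probs: 1-D arrays of class probabilities (same length).
    w_text, w_visual: fusion weights; illustrative values, not from the paper.
    Returns the fused probability vector and the predicted class index.
    """
    text_probs = np.asarray(text_probs, dtype=float)
    visual_probs = np.asarray(visual_probs, dtype=float)
    fused = w_text * text_probs + w_visual * visual_probs
    fused /= fused.sum()  # renormalise so the fused scores remain a distribution
    return fused, int(np.argmax(fused))

# Hypothetical example with three emotion classes (e.g. negative, neutral, positive)
text_scores = [0.2, 0.5, 0.3]    # output of the text emotion network
visual_scores = [0.1, 0.3, 0.6]  # output of the visual emotion network
fused, label = weighted_linear_fusion(text_scores, visual_scores)
print(fused, label)
```

In this kind of decision-level fusion, the weights reflect how much each modality is trusted; in practice they would be tuned on a validation set rather than fixed as above.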



Author information

Corresponding author

Correspondence to Ke Jin.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Jin, K., Wang, Y., Wu, C. (2021). Multimodal Affective Computing Based on Weighted Linear Fusion. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2020. Advances in Intelligent Systems and Computing, vol 1252. Springer, Cham. https://doi.org/10.1007/978-3-030-55190-2_1

