Audio-visual emotion recognition using multi-directional regression and Ridgelet transform

Original Paper
Journal on Multimodal User Interfaces

Abstract

In this paper, we propose an audio-visual emotion recognition system that combines multi-directional regression (MDR) audio features with ridgelet-transform-based face image features. MDR features capture directional derivative information in the spectro-temporal domain of speech and are therefore well suited to encoding different degrees of rising or falling pitch and formant frequencies. For video input, interest points in each time frame are detected using spatio-temporal filters, and the ridgelet transform is applied to cuboids around these interest points. Two separate extreme learning machine (ELM) classifiers are used, one for the speech modality and one for the face modality, and their scores are fused with a Bayesian sum rule to make the final decision. Experimental results on the eNTERFACE database show that the proposed method achieves an accuracy of 85.06% using bimodal input, 64.04% using speech only, and 58.38% using face only; these accuracies exceed those reported by several other state-of-the-art systems on the same database.
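The pipeline above is easiest to see in miniature. The Python sketch below is not the authors' implementation: the directional-derivative pooling is only a stand-in for the paper's full MDR feature computation, the ELM classifiers are replaced by pre-computed per-class posterior vectors, and all function names, the angle set, and the toy spectrogram are illustrative assumptions. What it does show faithfully is (a) how a derivative along an arbitrary spectro-temporal direction can be formed from the time and frequency gradients, and (b) the Bayesian sum rule used to fuse the two modality scores.

```python
import numpy as np

def directional_derivative(spec, theta):
    """Derivative of a spectro-temporal representation `spec`
    (frequency x time) along angle `theta` (radians), formed as a
    linear combination of the frequency and time gradients:
        D_theta = cos(theta) * dS/dt + sin(theta) * dS/df
    """
    d_f, d_t = np.gradient(spec)  # axis 0 = frequency, axis 1 = time
    return np.cos(theta) * d_t + np.sin(theta) * d_f

def mdr_like_features(spec, angles=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Pool simple statistics of the derivative along each direction into
    one fixed-length vector (a hypothetical stand-in for MDR features)."""
    stats = []
    for theta in angles:
        d = directional_derivative(spec, theta)
        stats.extend([d.mean(), d.std()])
    return np.asarray(stats)

def bayesian_sum_rule(p_audio, p_face, priors=None):
    """Late fusion of two per-class posterior vectors with the Bayesian
    sum rule: choose the class maximizing
        (1 - R) * P(c) + sum over modalities of P(c | x_m),
    here with R = 2 modalities (speech and face)."""
    p_audio = np.asarray(p_audio, dtype=float)
    p_face = np.asarray(p_face, dtype=float)
    if priors is None:
        # Assume equiprobable emotion classes when priors are unknown.
        priors = np.full_like(p_audio, 1.0 / p_audio.size)
    scores = (1 - 2) * priors + p_audio + p_face
    return int(np.argmax(scores)), scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spec = rng.standard_normal((64, 100))  # toy 64-band, 100-frame spectrogram
    print("MDR-like feature vector:", mdr_like_features(spec))

    # Hypothetical posteriors over the six eNTERFACE emotions from the
    # audio ELM and the face ELM.
    p_audio = np.array([0.10, 0.05, 0.40, 0.15, 0.20, 0.10])
    p_face = np.array([0.05, 0.10, 0.30, 0.35, 0.10, 0.10])
    label, fused = bayesian_sum_rule(p_audio, p_face)
    print("Fused decision:", label, fused)
```

Note that with equal class priors the prior term is a constant shift, so the sum rule reduces to picking the class with the largest summed posterior; unequal priors would tilt the fused decision toward more frequent emotions.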



Acknowledgments

The authors extend their appreciation to the Deanship of Scientific Research at King Saud University, Riyadh, Saudi Arabia for funding this work through the research group Project No. RGP-1436-023.

Author information

Corresponding author

Correspondence to Ghulam Muhammad.


About this article


Cite this article

Hossain, M.S., Muhammad, G. Audio-visual emotion recognition using multi-directional regression and Ridgelet transform. J Multimodal User Interfaces 10, 325–333 (2016). https://doi.org/10.1007/s12193-015-0207-2

