
A Survey on the Semi Supervised Learning Paradigm in the Context of Speech Emotion Recognition

  • Conference paper
  • First Online:
Intelligent Systems and Applications (IntelliSys 2021)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 295)


Abstract

Automatic Speech Emotion Recognition has been a hot topic for researchers for quite some time. Recent technological breakthroughs in the field of Machine Learning have opened the door to approaches of many kinds. However, some concerns have persisted throughout the years, among which we highlight the design and collection of data. Proper annotation of data can be quite expensive and sometimes not even viable, as specialists are often needed for a task as complex as emotion recognition. The semi-supervised learning paradigm seeks to reduce this heavy dependency on labelled data, potentially facilitating the design of a proper pipeline of tasks, single or multi-modal, towards the final objective of recognising the human emotional state. In this paper, the current single-modal (audio) semi-supervised learning state of the art is reviewed as a possible solution to the bottlenecks mentioned above, with the aim of helping and guiding future researchers in the planning phase of such a task, where many positive aspects of each piece of work can be drawn upon and combined.
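To make the paradigm concrete, the core idea behind one common family of semi-supervised methods (self-training, also called pseudo-labelling) can be sketched in a few lines. The toy 1-D nearest-centroid classifier, the data, and the confidence threshold below are purely illustrative assumptions, not taken from the surveyed works; a real speech-emotion pipeline would replace them with acoustic features and a neural classifier.

```python
def fit_centroids(points, labels):
    """Per-class mean of labelled 1-D points (stand-in for model training)."""
    cents = {}
    for c in set(labels):
        vals = [p for p, l in zip(points, labels) if l == c]
        cents[c] = sum(vals) / len(vals)
    return cents

def predict(cents, x):
    """Nearest-centroid label plus the margin to the runner-up class."""
    dists = sorted((abs(x - m), c) for c, m in cents.items())
    label = dists[0][1]
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return label, margin

def self_train(lab_x, lab_y, unlab_x, threshold=1.0, rounds=3):
    """Iteratively adopt confident pseudo-labels into the labelled pool."""
    lab_x, lab_y, unlab_x = list(lab_x), list(lab_y), list(unlab_x)
    for _ in range(rounds):
        cents = fit_centroids(lab_x, lab_y)
        keep = []
        for x in unlab_x:
            label, margin = predict(cents, x)
            if margin >= threshold:   # confident -> pseudo-label and keep
                lab_x.append(x)
                lab_y.append(label)
            else:
                keep.append(x)        # uncertain -> retry in the next round
        unlab_x = keep
    return fit_centroids(lab_x, lab_y)

# One labelled point per class, plus unlabelled points near each cluster:
# the final centroids are refined by the pseudo-labelled data.
cents = self_train([0.0, 10.0], ["low", "high"], [1.0, 2.0, 8.5, 9.5])
print(cents)
```

The key property the abstract alludes to is visible here: only two points carry human labels, yet the model ends up trained on all six, which is exactly the reduction in annotation cost that motivates semi-supervised approaches.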




Corresponding author

Correspondence to Manuel Rodrigues.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Andrade, G., Rodrigues, M., Novais, P. (2022). A Survey on the Semi Supervised Learning Paradigm in the Context of Speech Emotion Recognition. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2021. Lecture Notes in Networks and Systems, vol 295. Springer, Cham. https://doi.org/10.1007/978-3-030-82196-8_57
