
A Survey on the Semi Supervised Learning Paradigm in the Context of Speech Emotion Recognition

  • Conference paper
  • First Online:
Intelligent Systems and Applications (IntelliSys 2021)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 295)


Abstract

Automatic Speech Emotion Recognition has been a hot topic for researchers for quite some time. Recent technological breakthroughs in the field of Machine Learning have opened the door to approaches of many kinds. However, some concerns have persisted throughout the years, among which we highlight the design and collection of data. Proper annotation of data can be quite expensive and sometimes not even viable, as specialists are often needed for a task as complex as emotion recognition. The semi-supervised learning paradigm seeks to reduce this heavy dependency on labelled data, potentially facilitating the design of a proper pipeline of tasks, single or multi-modal, towards the final objective of recognising the human emotional state. In this paper, the current single-modal (audio) semi-supervised learning state of the art is reviewed as a possible solution to the bottlenecks mentioned above, with the aim of helping and guiding future researchers in the planning phase of such a task, where many positive aspects of each piece of work can be drawn upon and combined.
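To make the paradigm concrete, the core idea behind one common family of semi-supervised methods (self-training, also called pseudo-labelling) can be sketched in a few lines. The toy 1-D nearest-centroid classifier, the data, and the confidence threshold below are purely illustrative assumptions, not taken from the surveyed works; a real speech-emotion pipeline would replace them with acoustic features and a neural classifier.

```python
def fit_centroids(points, labels):
    """Per-class mean of labelled 1-D points (stand-in for model training)."""
    cents = {}
    for c in set(labels):
        vals = [p for p, l in zip(points, labels) if l == c]
        cents[c] = sum(vals) / len(vals)
    return cents

def predict(cents, x):
    """Nearest-centroid label plus the margin to the runner-up class."""
    dists = sorted((abs(x - m), c) for c, m in cents.items())
    label = dists[0][1]
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return label, margin

def self_train(lab_x, lab_y, unlab_x, threshold=1.0, rounds=3):
    """Iteratively adopt confident pseudo-labels into the labelled pool."""
    lab_x, lab_y, unlab_x = list(lab_x), list(lab_y), list(unlab_x)
    for _ in range(rounds):
        cents = fit_centroids(lab_x, lab_y)
        keep = []
        for x in unlab_x:
            label, margin = predict(cents, x)
            if margin >= threshold:   # confident -> pseudo-label and keep
                lab_x.append(x)
                lab_y.append(label)
            else:
                keep.append(x)        # uncertain -> retry in the next round
        unlab_x = keep
    return fit_centroids(lab_x, lab_y)

# One labelled point per class, plus unlabelled points near each cluster:
# the final centroids are refined by the pseudo-labelled data.
cents = self_train([0.0, 10.0], ["low", "high"], [1.0, 2.0, 8.5, 9.5])
print(cents)
```

The key property the abstract alludes to is visible here: only two points carry human labels, yet the model ends up trained on all six, which is exactly the reduction in annotation cost that motivates semi-supervised approaches.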




Corresponding author

Correspondence to Manuel Rodrigues.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Andrade, G., Rodrigues, M., Novais, P. (2022). A Survey on the Semi Supervised Learning Paradigm in the Context of Speech Emotion Recognition. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2021. Lecture Notes in Networks and Systems, vol 295. Springer, Cham. https://doi.org/10.1007/978-3-030-82196-8_57
