Abstract
Automatic Speech Emotion Recognition has been an active research topic for some time. Recent breakthroughs in Machine Learning have opened the door to approaches of many kinds. However, some concerns have persisted over the years, chief among them the design and collection of data. Proper annotation can be expensive and sometimes not even viable, as specialists are often required for a task as complex as emotion recognition. The semi-supervised learning paradigm aims to reduce this heavy dependency on labelled data, potentially facilitating the design of a proper pipeline of tasks, single- or multi-modal, towards the final objective of recognising the human emotional state. This paper reviews the current single-modal (audio) semi-supervised learning state of the art as a possible solution to the bottlenecks mentioned, with the aim of helping and guiding future researchers in the planning phase of such a task, where many positive aspects of each piece of work can be drawn upon and combined.
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Andrade, G., Rodrigues, M., Novais, P. (2022). A Survey on the Semi Supervised Learning Paradigm in the Context of Speech Emotion Recognition. In: Arai, K. (ed.) Intelligent Systems and Applications. IntelliSys 2021. Lecture Notes in Networks and Systems, vol. 295. Springer, Cham. https://doi.org/10.1007/978-3-030-82196-8_57
DOI: https://doi.org/10.1007/978-3-030-82196-8_57
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-82195-1
Online ISBN: 978-3-030-82196-8
eBook Packages: Intelligent Technologies and Robotics (R0)