Abstract
Classifying emotions from audio or speech signals with high accuracy requires large quantities of data. Large datasets, however, are not always readily accessible. One solution is to augment the available data to construct a larger dataset for training the classifier. This paper proposes a unimodal approach that focuses on two main concepts: (1) augmenting speech signals to generate additional data samples; and (2) constructing classification models to identify emotion expressed through speech. Three classifiers (Convolutional Neural Network (CNN), Naïve Bayes (NB) and K-Nearest Neighbor (kNN)) were tested to determine which gave the best results. In the proposed method, augmented audio data from the SAVEE dataset was used for training (50%), and testing (50%) was carried out on the original data. The best performance, approximately 83%, was obtained with a mixture of augmentation strategies and the CNN classifier. The proposed augmentation approach, together with an appropriate classification model, enhances the efficiency of voice emotion recognition.
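The abstract does not name the individual augmentation strategies, so the following is only an illustrative sketch of how a waveform might be augmented to produce extra training samples. The three transforms (additive noise, circular time shift, speed change), their parameter values, the 16 kHz sample rate, the synthetic sine-wave "utterance", and all function names are assumptions made for illustration, not the authors' method.

```python
import numpy as np

def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def time_shift(signal, shift):
    """Circularly shift the waveform by `shift` samples."""
    return np.roll(signal, shift)

def change_speed(signal, rate):
    """Resample by linear interpolation; rate > 1 shortens the clip."""
    n_out = int(round(len(signal) / rate))
    old_idx = np.arange(len(signal))
    new_idx = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_idx, old_idx, signal)

def augment(signal, rng):
    """Return three augmented copies of one utterance (hypothetical settings)."""
    return [
        add_noise(signal, snr_db=20.0, rng=rng),
        time_shift(signal, shift=len(signal) // 10),
        change_speed(signal, rate=1.1),
    ]

# Synthetic stand-in for a one-second recording; a real pipeline would
# load waveforms from the dataset instead.
rng = np.random.default_rng(0)
sample_rate = 16000  # assumed value
t = np.arange(sample_rate) / sample_rate
utterance = np.sin(2 * np.pi * 220.0 * t)

augmented = augment(utterance, rng)
```

Each augmented copy would inherit the emotion label of the utterance it came from. Under the protocol described in the abstract, such copies would enter only the training set, with evaluation performed on original recordings.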
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Shoumy, N.J., Ang, LM., Rahaman, D.M.M., Zia, T., Seng, K.P., Khatun, S. (2021). Augmented Audio Data in Improving Speech Emotion Classification Tasks. In: Fujita, H., Selamat, A., Lin, J.CW., Ali, M. (eds) Advances and Trends in Artificial Intelligence. From Theory to Practice. IEA/AIE 2021. Lecture Notes in Computer Science, vol 12799. Springer, Cham. https://doi.org/10.1007/978-3-030-79463-7_30
DOI: https://doi.org/10.1007/978-3-030-79463-7_30
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79462-0
Online ISBN: 978-3-030-79463-7
eBook Packages: Computer Science, Computer Science (R0)