Augmented Audio Data in Improving Speech Emotion Classification Tasks

Shoumy, Nusrat J.; Ang, Li-Minn; Rahaman, D. M. Motiur; Zia, Tanveer; Seng, Kah Phooi; Khatun, Sabira

doi:10.1007/978-3-030-79463-7_30

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12799)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems

Abstract

Classifying emotions from audio or speech signals with high accuracy requires large quantities of training data. Large datasets, however, are not always readily available. One effective remedy is data augmentation: generating modified copies of existing samples to construct a larger dataset for training the classifier. This paper proposes a unimodal approach built on two main concepts: (1) augmenting speech signals to generate additional data samples; and (2) constructing classification models to identify the emotion expressed in speech. Three classifiers, a Convolutional Neural Network (CNN), Naïve Bayes (NB) and k-Nearest Neighbor (kNN), were compared to determine which performed best. In the proposed method, augmented audio data derived from the SAVEE dataset were used for training (50%), and testing (50%) was performed on the original data. The best performance, approximately 83%, was achieved by combining several augmentation strategies with the CNN classifier. The proposed augmentation approach, together with an appropriate classification model, improves the effectiveness of speech emotion recognition.
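The evaluation protocol described in the abstract (train on augmented audio, test on original recordings, 50/50 split, compare classifiers) can be illustrated with a minimal sketch using only the Python standard library. Everything specific here is an assumption, not the authors' method: the abstract does not name its augmentation strategies, so the sketch uses hypothetical ones (white-noise injection at 20 dB SNR, a circular time shift, and a speed change via linear-interpolation resampling); RMS energy and zero-crossing rate stand in for real speech features; synthetic sine "clips" stand in for SAVEE recordings; and only the kNN classifier is shown. The sketch also assumes the originals are split before augmentation, so no augmented copy of a test utterance leaks into training.

```python
import math
import random

# Hypothetical augmentation strategies (ASSUMED; the abstract only says
# "a mixture of augmentation strategies").
def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise at a target signal-to-noise ratio in dB."""
    power = sum(x * x for x in signal) / len(signal)
    sigma = math.sqrt(power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, sigma) for x in signal]

def time_shift(signal, shift):
    """Circularly shift the waveform right by `shift` samples."""
    shift %= len(signal)
    return signal[-shift:] + signal[:-shift] if shift else list(signal)

def change_speed(signal, rate):
    """Resample by linear interpolation; rate > 1 gives a shorter, faster clip.
    This naive resampling also raises or lowers the pitch."""
    n_out = max(1, int(len(signal) / rate))
    last = len(signal) - 1
    out = []
    for i in range(n_out):
        pos = i * rate
        j = min(int(pos), last)
        frac = pos - int(pos)
        nxt = signal[min(j + 1, last)]
        out.append(signal[j] + frac * (nxt - signal[j]))
    return out

def augment(signal, rng):
    """Return the original clip plus four augmented copies."""
    return [
        list(signal),
        add_noise(signal, snr_db=20, rng=rng),
        time_shift(signal, rng.randint(1, len(signal) - 1)),
        change_speed(signal, rate=1.1),
        change_speed(signal, rate=0.9),
    ]

def features(signal):
    """Toy features (RMS energy, zero-crossing rate) standing in for e.g. MFCCs."""
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))
    return (rms, crossings / (len(signal) - 1))

def knn_predict(train, query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

def split_half(dataset, rng):
    """Stratified 50/50 split of the ORIGINAL clips, done before any augmentation."""
    by_label = {}
    for clip, label in dataset:
        by_label.setdefault(label, []).append((clip, label))
    train, test = [], []
    for items in by_label.values():
        rng.shuffle(items)
        half = len(items) // 2
        train += items[:half]
        test += items[half:]
    return train, test

def evaluate(dataset, seed=0):
    """Train kNN on augmented training clips; test on untouched original clips."""
    rng = random.Random(seed)
    train_orig, test_orig = split_half(dataset, rng)
    train = [(features(aug), label)
             for clip, label in train_orig for aug in augment(clip, rng)]
    correct = sum(knn_predict(train, features(clip)) == label
                  for clip, label in test_orig)
    return correct / len(test_orig), len(train)

def toy_clip(freq, amp, rng, sr=8000, n=800):
    """Synthetic sine 'utterance' (hypothetical stand-in for a SAVEE recording)."""
    phase = rng.uniform(0, 2 * math.pi)
    return [amp * math.sin(2 * math.pi * freq * t / sr + phase) for t in range(n)]

rng = random.Random(42)
data = ([(toy_clip(200, 0.2, rng), "calm") for _ in range(10)]
        + [(toy_clip(1200, 0.8, rng), "angry") for _ in range(10)])
accuracy, n_train = evaluate(data)
print(f"accuracy on original test clips: {accuracy:.2f} ({n_train} training clips)")
```

Splitting before augmenting is the design choice that matters most in this sketch: if augmented copies of a test utterance end up in the training set, accuracy measured on the "original" test data is inflated. A fuller experiment would likely replace the toy features with spectral features and swap in NB or a CNN for the classifier comparison the abstract describes.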

Author information

Authors and Affiliations

School of Computing and Mathematics, Charles Sturt University, Bathurst, NSW, Australia
Nusrat J. Shoumy, D. M. Motiur Rahaman & Tanveer Zia
School of Science and Engineering, University of the Sunshine Coast, Sunshine Coast, QLD, Australia
Li-Minn Ang
School of Engineering and IT, University of New South Wales, Canberra, Australia
Kah Phooi Seng
Faculty of Electrical and Electronics Engineering, Universiti Malaysia Pahang, Pahang, Malaysia
Sabira Khatun

Corresponding author

Correspondence to Nusrat J. Shoumy.

Editor information

Editors and Affiliations

i-SOMET Incorporate Association, Morioka, Japan
Hamido Fujita
Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia
Ali Selamat
Western Norway University of Applied Sciences, Bergen, Norway
Jerry Chun-Wei Lin
Texas State University San Marcos, San Marcos, TX, USA
Moonis Ali

About this paper

Cite this paper

Shoumy, N.J., Ang, L.-M., Rahaman, D.M.M., Zia, T., Seng, K.P., Khatun, S. (2021). Augmented Audio Data in Improving Speech Emotion Classification Tasks. In: Fujita, H., Selamat, A., Lin, J.C.-W., Ali, M. (eds) Advances and Trends in Artificial Intelligence. From Theory to Practice. IEA/AIE 2021. Lecture Notes in Computer Science, vol 12799. Springer, Cham. https://doi.org/10.1007/978-3-030-79463-7_30

DOI: https://doi.org/10.1007/978-3-030-79463-7_30
Published: 19 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79462-0
Online ISBN: 978-3-030-79463-7
eBook Packages: Computer Science, Computer Science (R0)
