Time Distributed Multiview Representation for Speech Emotion Recognition

  • Conference paper
Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications (CIARP 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14469)

Abstract

In recent years, speech emotion recognition (SER) techniques have gained importance, mainly in human-computer interaction studies and applications. The area poses several challenges, including the development of new and efficient detection methods, efficient extraction of audio features, and time preprocessing strategies. This paper proposes a new multiview model to detect speech emotion from raw audio data. The proposed method uses optimized mel-spectrogram features derived from audio files and combines deep learning algorithms to improve detection performance. The combination relies on the following algorithms: CNN (Convolutional Neural Network), VGG (Visual Geometry Group), ResNet (Residual Neural Network), and LSTM (Long Short-Term Memory). The CNN extracts features from the mel-spectrogram images given as input to the method. These features are combined with the pre-trained VGG and ResNet networks. Finally, the LSTM receives the combined information and identifies the predefined emotions. The proposed method was developed using the RAVDESS database, considering eight emotions. The results show an increase of up to 12% in accuracy compared to strategies in the literature that use raw data processing.
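
The abstract does not spell out how the views are arranged in time, so the following is only a minimal sketch of one plausible reading of "time distributed multiview": the spectrogram is cut into time segments, every view is applied to every segment, and the per-segment features are concatenated into a sequence that a recurrent model such as an LSTM could consume. The three view functions are hypothetical stand-ins for the CNN, VGG, and ResNet branches, and the segment length, array shapes, and random input are illustrative assumptions rather than values from the paper.

```python
import numpy as np


def split_into_segments(spec, segment_len):
    """Split a (n_mels, n_frames) spectrogram into non-overlapping time segments."""
    n_mels, n_frames = spec.shape
    n_segments = n_frames // segment_len  # the incomplete tail is dropped
    trimmed = spec[:, : n_segments * segment_len]
    # Result shape: (n_segments, n_mels, segment_len)
    return trimmed.reshape(n_mels, n_segments, segment_len).transpose(1, 0, 2)


# Hypothetical stand-ins for the learned views (NOT the paper's networks).
def view_mean_per_band(segment):
    return segment.mean(axis=1)


def view_std_per_band(segment):
    return segment.std(axis=1)


def view_delta_energy(segment):
    # Mean absolute frame-to-frame change in each mel band.
    return np.abs(np.diff(segment, axis=1)).mean(axis=1)


VIEWS = (view_mean_per_band, view_std_per_band, view_delta_energy)


def time_distributed_multiview(spec, segment_len, views=VIEWS):
    """Apply every view to every time segment and concatenate the results.

    Returns an array of shape (n_segments, n_views * n_mels), i.e. one
    fused feature vector per time step.
    """
    segments = split_into_segments(spec, segment_len)
    return np.stack(
        [np.concatenate([view(seg) for view in views]) for seg in segments]
    )


# Assumed toy input: 64 mel bands, 130 frames of random values.
rng = np.random.default_rng(0)
spec = rng.random((64, 130))
sequence = time_distributed_multiview(spec, segment_len=32)
print(sequence.shape)  # (4, 192)
```

In a trained system the hand-written statistics would be replaced by learned feature extractors, and `sequence` would be passed to the recurrent classifier; the point of the sketch is only the shape of the data flow.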

Acknowledgements

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. This work was partially supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico - CNPq (Proc. 311065/2020-1).

Author information

Correspondence to Marcelo E. Pellenz.

Copyright information

© 2024 Springer Nature Switzerland AG

About this paper

Cite this paper

Letícia de Mattos, F., Pellenz, M.E., Britto, A.d.S. (2024). Time Distributed Multiview Representation for Speech Emotion Recognition. In: Vasconcelos, V., Domingues, I., Paredes, S. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2023. Lecture Notes in Computer Science, vol 14469. Springer, Cham. https://doi.org/10.1007/978-3-031-49018-7_11

  • DOI: https://doi.org/10.1007/978-3-031-49018-7_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-49017-0

  • Online ISBN: 978-3-031-49018-7

  • eBook Packages: Computer Science, Computer Science (R0)
