Telegram Bot for Emotion Recognition Using Acoustic Cues and Prosody

  • Conference paper
  • First Online:
Computational Intelligence in Communications and Business Analytics (CICBA 2022)

Abstract

In recent times, voice emotion recognition has been established as a significant field in Human-Computer Interaction (HCI). Human emotions are intrinsically expressed through linguistic information (verbal content) and paralinguistic information such as tone, emotional state, expressions and gestures, which together form the basis for emotional analysis. Accounting for the user's affective state in HCI systems is essential for detecting subtle changes in user behaviour, through which the computer can initiate interactions instead of simply responding to user commands. This paper aims to tackle existing problems in speech emotion recognition (SER) by taking acoustic cues and prosodic parameters into account to detect user emotion. The work mainly uses the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), extracting Mel Frequency Cepstral Coefficients (MFCC) from the speech signals to recognise emotion. The librosa library is used to process the audio files and extract features before testing and classification are carried out with four different classifiers for a comparative study. The resulting SER model is integrated into a Telegram Bot to provide an intuitive, user-friendly interface as an application for psychological therapy.




Corresponding author

Correspondence to Ishita Nag.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG


Cite this paper

Nag, I., Syed, S.A., Basu, S., Shaw, S., Banik, B.G. (2022). Telegram Bot for Emotion Recognition Using Acoustic Cues and Prosody. In: Mukhopadhyay, S., Sarkar, S., Dutta, P., Mandal, J.K., Roy, S. (eds) Computational Intelligence in Communications and Business Analytics. CICBA 2022. Communications in Computer and Information Science, vol 1579. Springer, Cham. https://doi.org/10.1007/978-3-031-10766-5_31


  • DOI: https://doi.org/10.1007/978-3-031-10766-5_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-10765-8

  • Online ISBN: 978-3-031-10766-5

  • eBook Packages: Computer Science, Computer Science (R0)
