Language dialect based speech emotion recognition through deep learning techniques

International Journal of Speech Technology

Abstract

Vocal signals are the primordial means of communication and underpin support between individuals in a social structure. Computer applications can combine Automatic Speech Recognition (ASR) with Speech Emotion Recognition (SER) to detect and identify emotions in speech signals. The semantic relatedness of words is more complicated to establish for abstract concepts than for concrete ideas. To tackle this issue, an ensemble of different clustering techniques is used to automatically separate sense distinctions across the various dialects of spoken sentences. The sense of a word may change over time and between groups of people. The proposed model maps characters to word senses using weights provided by SenticNet, refined through trial-and-error tuning. The proposed model uses stop words to distinguish word senses, achieving 72.78% accuracy for regional dialects.
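
The abstract does not say how the individual clusterings are combined, so the following is only a minimal sketch of one common way to build a clustering ensemble: co-association (evidence accumulation) consensus. The function names, the 0.5 threshold, and the toy label assignments are all hypothetical and stand in for the outputs of whatever base clustering techniques are used on the occurrences of a word.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of base clusterings in which each pair of items shares a cluster."""
    n = len(labelings[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(labelings)
    return [[c / m for c in row] for row in counts]


def consensus_clusters(labelings, threshold=0.5):
    """Group items whose co-association exceeds the threshold (union-find)."""
    n = len(labelings[0])
    coassoc = co_association(labelings)
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if coassoc[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel roots as consecutive integers in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


# Hypothetical sense labels for six occurrences of one word, as produced by
# three different base clusterers (cluster IDs are arbitrary per clusterer).
labelings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 2],
]
consensus = consensus_clusters(labelings)
print(consensus)
```

In this toy input the three clusterers disagree on individual occurrences, but pairs that are grouped together by a majority end up in the same consensus sense group. A majority threshold is just one possible design choice; a real system would need to tune it.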

Author information

Corresponding author

Correspondence to Sandeep Kumar Mathivanan.

About this article

Cite this article

Rajendran, S., Mathivanan, S., Jayagopal, P. et al. Language dialect based speech emotion recognition through deep learning techniques. Int J Speech Technol 24, 625–635 (2021). https://doi.org/10.1007/s10772-021-09838-8

  • DOI: https://doi.org/10.1007/s10772-021-09838-8
