Research Article
DOI: 10.1145/3456126.3456129

SeeSpeech: See Emotions in The Speech

Published: 29 June 2021

ABSTRACT

At present, machine understanding of speech focuses mostly on semantics, but speech also carries emotion, which can not only reinforce semantic content but even change it. This paper discusses how to realize speech emotion classification with a model called SeeSpeech. SeeSpeech selects MCEP (mel-cepstral coefficients) as the speech emotion feature and feeds it into a CNN and a Transformer in parallel. To obtain richer features, the CNN uses batch normalization while the Transformer uses layer normalization; the outputs of the two branches are then combined, and the emotion class is finally obtained through a softmax layer. SeeSpeech achieved a peak classification accuracy of 97% on the RAVDESS dataset and an accuracy of 85% in tests on an actual edge gateway. These results show that SeeSpeech delivers encouraging performance in speech emotion classification and has wide application prospects in human-computer interaction.
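To make the described architecture concrete, the following is a minimal PyTorch sketch of the dual-branch design, not the authors' implementation: the layer counts, kernel sizes, MCEP dimensionality (40), mean pooling, and fusion by concatenation are all assumptions, since the abstract specifies only the branch types, their normalization schemes, and the final softmax. The eight output classes match the emotion categories of RAVDESS.

# Hypothetical sketch of the dual-branch model described in the abstract.
import torch
import torch.nn as nn

class SeeSpeechSketch(nn.Module):
    def __init__(self, n_mcep=40, n_classes=8, d_model=128):
        super().__init__()
        # CNN branch with batch normalization over the feature channels.
        self.cnn = nn.Sequential(
            nn.Conv1d(n_mcep, d_model, kernel_size=5, padding=2),
            nn.BatchNorm1d(d_model),
            nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=5, padding=2),
            nn.BatchNorm1d(d_model),
            nn.ReLU(),
        )
        # Transformer branch; nn.TransformerEncoderLayer applies layer
        # normalization internally.
        self.proj = nn.Linear(n_mcep, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        # Fusion: concatenate the time-pooled branch outputs, then classify.
        self.classifier = nn.Linear(2 * d_model, n_classes)

    def forward(self, x):
        # x: (batch, frames, n_mcep) sequence of MCEP vectors.
        cnn_out = self.cnn(x.transpose(1, 2)).mean(dim=2)     # (batch, d_model)
        trf_out = self.transformer(self.proj(x)).mean(dim=1)  # (batch, d_model)
        fused = torch.cat([cnn_out, trf_out], dim=1)
        return self.classifier(fused)

model = SeeSpeechSketch()
probs = torch.softmax(model(torch.randn(2, 200, 40)), dim=1)  # emotion probabilities

Normalizing the convolutional branch per batch and the Transformer branch per layer gives the two paths differently normalized views of the same MCEP sequence, which is the stated motivation for combining them into richer features.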


Published in
ASSE '21: 2021 2nd Asia Service Sciences and Software Engineering Conference
February 2021, 143 pages
ISBN: 9781450389082
DOI: 10.1145/3456126
Copyright © 2021 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
