ABSTRACT
At present, machine understanding of speech focuses largely on semantics, yet speech also carries emotion, which can reinforce or even change the semantic content of an utterance. This paper presents SeeSpeech, a method for classifying emotion in speech. SeeSpeech uses mel-cepstral (MCEP) features as the speech emotion representation and feeds them into a CNN and a Transformer in parallel. To obtain richer features, the CNN branch applies batch normalization while the Transformer branch applies layer normalization; the outputs of the two branches are then combined, and a SoftMax layer produces the emotion class. SeeSpeech achieves a peak classification accuracy of 97% on the RAVDESS dataset and 85% in a real-world test on an edge gateway. These results suggest that SeeSpeech performs encouragingly on speech emotion classification and has broad application prospects in human-computer interaction.
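The abstract describes a two-branch architecture: MCEP features processed in parallel by a CNN (with batch normalization) and a Transformer (with layer normalization), with the branch outputs fused and classified via SoftMax. The sketch below illustrates one plausible reading of that design in PyTorch; all layer sizes, the number of layers, the time-pooling strategy, and the fusion-by-concatenation step are illustrative assumptions not specified in the abstract.

```python
# A minimal sketch of the two-branch SeeSpeech-style architecture, assuming
# concatenation for fusion and mean/average pooling over time. Hyperparameters
# (d_model, kernel sizes, head counts, layer counts) are hypothetical.
import torch
import torch.nn as nn

class SeeSpeechSketch(nn.Module):
    def __init__(self, n_mcep=40, n_emotions=8, d_model=128):
        super().__init__()
        # CNN branch: 1-D convolutions over time with batch normalization.
        self.cnn = nn.Sequential(
            nn.Conv1d(n_mcep, d_model, kernel_size=5, padding=2),
            nn.BatchNorm1d(d_model),
            nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=5, padding=2),
            nn.BatchNorm1d(d_model),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over time -> (batch, d_model, 1)
        )
        # Transformer branch: encoder layers use layer normalization internally.
        self.proj = nn.Linear(n_mcep, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        # Fusion by concatenation, then a linear classifier; SoftMax is applied
        # implicitly by nn.CrossEntropyLoss in training, explicitly at inference.
        self.classifier = nn.Linear(2 * d_model, n_emotions)

    def forward(self, x):
        # x: (batch, frames, n_mcep) MCEP feature sequence
        cnn_out = self.cnn(x.transpose(1, 2)).squeeze(-1)        # (batch, d_model)
        trans_out = self.transformer(self.proj(x)).mean(dim=1)   # (batch, d_model)
        fused = torch.cat([cnn_out, trans_out], dim=-1)
        return self.classifier(fused)  # logits over emotion classes

# Usage example: a batch of 16 utterances, 200 frames, 40 MCEP coefficients.
model = SeeSpeechSketch()
logits = model(torch.randn(16, 200, 40))
probs = torch.softmax(logits, dim=-1)  # per-emotion probabilities
```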