ABSTRACT
Speech emotion recognition (SER) has become an attractive topic owing to its broad range of applications. Segmentation is often used to enlarge the training data for SER, but each segment simply inherits the label of its whole utterance, and these noisy inherited labels can degrade performance. In this paper, we propose a robust noise-label-suppressed module that relabels segments to suppress the harmful effects of inherited labels. First, the log-Mel spectrogram of each speech segment is computed together with its deltas and delta-deltas. Then, speech features are extracted from this 3-D input by a feature extraction model. Finally, the label of each segment is corrected by the relabel model. Experimental results on the IEMOCAP dataset show that the proposed noise-label-suppressed module outperforms other advanced methods and achieves robust performance.
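The relabeling step described above can be sketched as a simple confidence-margin rule: a segment keeps its inherited utterance-level label unless the model assigns some other class a probability that exceeds the inherited class's probability by a margin. This is a minimal illustrative sketch, not the paper's exact relabel model; the function name, the margin threshold, and the decision rule are assumptions for illustration.

```python
import numpy as np

def relabel_segments(probs, inherited, margin=0.2):
    """Hypothetical segment-relabeling rule (illustrative sketch).

    probs:     (n_segments, n_classes) softmax outputs of the feature
               extraction model for each segment.
    inherited: (n_segments,) integer labels each segment inherited
               from its utterance.
    margin:    assumed confidence gap required to override the
               inherited label.
    """
    probs = np.asarray(probs)
    inherited = np.asarray(inherited)
    # Model's most confident class per segment.
    top = probs.argmax(axis=1)
    # Gap between the top class and the inherited class.
    gap = probs.max(axis=1) - probs[np.arange(len(probs)), inherited]
    # Relabel only when the model clearly disagrees with the inherited label.
    return np.where(gap > margin, top, inherited)

# Example: segment 0 is confidently class 0 despite inheriting label 1,
# so it is relabeled; the others keep their inherited labels.
probs = [[0.7, 0.2, 0.1],
         [0.4, 0.35, 0.25],
         [0.1, 0.8, 0.1]]
print(relabel_segments(probs, [1, 0, 1]))  # → [0 0 1]
```

A margin-based override like this trades off noise suppression against trust in the model: a larger margin keeps more inherited labels, a smaller one relabels more aggressively.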