DOI: 10.1145/3598151.3598176
research-article

Noise-label Suppressed Module for Speech Emotion Recognition

Published: 18 July 2023

ABSTRACT

Speech emotion recognition (SER) has become an attractive topic owing to its broad range of applications. Segmentation is often used to increase the amount of training data for SER, but each segment inherits the label of its source utterance, and an inherited label that does not fit the segment may degrade performance. In this paper, we propose a robust noise-label suppressed module that relabels segments to suppress the adverse effects of inherited labels. First, segments of the log Mel spectrogram of the speech, together with its deltas and delta-deltas, are computed. Then, speech features are extracted from this 3-D data by a feature extraction model. Finally, the label of each segment is corrected by the relabel model. Experimental results on the IEMOCAP dataset show that the proposed noise-label suppressed module outperforms other advanced methods and achieves robust performance.
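The pipeline outlined in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and everything here is an assumption of the sketch. The log Mel spectrogram is taken as a precomputed array. `np.gradient` stands in for the regression-based deltas that speech toolkits usually compute. The function names, segment length, hop, and `margin` are hypothetical. The relabeling rule (switch to the model's top class when it beats the inherited label by a margin) is one common noisy-label heuristic. The abstract does not specify how its relabel model works.

```python
import numpy as np


def stack_with_deltas(log_mel):
    """Stack static, delta and delta-delta features into (3, n_mels, n_frames).

    Assumption: np.gradient along time is a simple stand-in for the
    regression-based deltas usually used in speech front ends.
    """
    delta = np.gradient(log_mel, axis=1)
    delta2 = np.gradient(delta, axis=1)
    return np.stack([log_mel, delta, delta2], axis=0)


def segment_with_inherited_label(features, label, seg_len, hop):
    """Cut fixed-length segments along time; each inherits the utterance label.

    Requires features.shape[-1] >= seg_len; trailing frames that do not
    fill a whole segment are dropped.
    """
    n_frames = features.shape[-1]
    starts = range(0, n_frames - seg_len + 1, hop)
    segments = np.stack([features[..., s:s + seg_len] for s in starts])
    labels = np.full(len(segments), label, dtype=np.int64)
    return segments, labels


def relabel_by_margin(probs, labels, margin=0.2):
    """Hypothetical relabel rule: adopt the top predicted class when its
    probability exceeds that of the current label by more than `margin`."""
    idx = np.arange(len(labels))
    top = probs.argmax(axis=1)
    gap = probs[idx, top] - probs[idx, labels]
    return np.where(gap > margin, top, labels)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    log_mel = rng.standard_normal((40, 300))   # placeholder for a real log Mel spectrogram
    feats = stack_with_deltas(log_mel)
    segs, labs = segment_with_inherited_label(feats, label=2, seg_len=100, hop=50)
    print(segs.shape, labs)
```

In a full system the `probs` argument would come from a classifier trained on the segments. Here it is left as an input so that the relabeling step can be inspected in isolation.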


Published in

RobCE '23: Proceedings of the 2023 3rd International Conference on Robotics and Control Engineering
May 2023, 255 pages
ISBN: 9781450398107
DOI: 10.1145/3598151

Copyright © 2023 ACM

Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: research-article, Research, Refereed limited