DOI: 10.1145/3319921.3319963
Research article

End-to-End Speech Emotion Recognition Based on One-Dimensional Convolutional Neural Network

Published: 15 March 2019

Abstract

Real-time speech emotion recognition has long been a challenge. To address it, we propose an end-to-end speech emotion recognition model based on a one-dimensional convolutional neural network, containing only three convolutional layers, two pooling layers, and one fully connected layer. Trained with the Adam optimization algorithm via back-propagation, the network continuously extracts increasingly discriminative features. The model is structurally simple and completes the emotion classification task quickly. Unlike traditional methods, it requires no complex manual feature extraction; it learns emotional features automatically from raw speech signals. In emotion recognition experiments on four speech databases (EMODB, CASIA, IEMOCAP, and CHEAVD), relatively high recognition rates were obtained. The experiments show that the proposed algorithm is well suited to implementing real-time speech emotion recognition.
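The abstract describes the network only at a high level: three convolutional layers, two pooling layers, and one fully connected layer applied directly to raw speech. The NumPy sketch below illustrates how a forward pass of such a 1D CNN could be shaped; every kernel width, channel count, pooling size, and the four-class output are illustrative assumptions, not the paper's actual hyperparameters.

```python
import numpy as np

def conv1d(x, w, stride=1):
    """Valid 1D convolution with ReLU. x: (length, in_ch), w: (kernel, in_ch, out_ch)."""
    k, _, out_ch = w.shape
    out_len = (x.shape[0] - k) // stride + 1
    y = np.zeros((out_len, out_ch))
    for i in range(out_len):
        seg = x[i * stride : i * stride + k]                     # (k, in_ch) window
        y[i] = np.maximum(np.tensordot(seg, w, axes=([0, 1], [0, 1])), 0)
    return y

def maxpool1d(x, size):
    """Non-overlapping max pooling along the time axis."""
    n = x.shape[0] // size
    return x[: n * size].reshape(n, size, -1).max(axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((16000, 1))                      # 1 s of raw speech at 16 kHz, mono

x = conv1d(x, rng.standard_normal((64, 1, 8)) * 0.01)    # conv layer 1
x = maxpool1d(x, 4)                                      # pool layer 1
x = conv1d(x, rng.standard_normal((32, 8, 16)) * 0.01)   # conv layer 2
x = maxpool1d(x, 4)                                      # pool layer 2
x = conv1d(x, rng.standard_normal((16, 16, 32)) * 0.01)  # conv layer 3

flat = x.reshape(-1)                                     # flatten for the dense layer
logits = flat @ (rng.standard_normal((flat.size, 4)) * 0.001)  # fully connected layer
probs = np.exp(logits - logits.max())
probs /= probs.sum()                                     # softmax over 4 emotion classes
print(probs.shape)                                       # (4,)
```

In a trained version of this sketch, the convolution weights and the dense layer would be updated by back-propagation with Adam, which is the training setup the abstract names.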




    Published In

    ICIAI '19: Proceedings of the 2019 3rd International Conference on Innovation in Artificial Intelligence
    March 2019
    279 pages
    ISBN:9781450361286
    DOI:10.1145/3319921
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    In-Cooperation

    • Xi'an Jiaotong-Liverpool University
    • University of Texas-Dallas

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. Convolutional Neural Network
    2. End-to-End
    3. Speech Emotion Recognition

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • the National Natural Science Foundation of China
    • Program for Liaoning Distinguished Professor, the 13th Five-Year Plan of Education Science in Liaoning Province
    • Program for Changjiang Scholars and Innovative Research Team in University
    • Program for Dalian High-level Talent's Innovation
    • The Liaoning Province Doctor Startup Fund
    • Innovation Fund Plan for Dalian Science and Technology

    Conference

    ICIAI 2019


    Cited By

    • (2024) TLBT-Net: A Multi-scale Cross-fusion Model for Speech Emotion Recognition. Proceedings of the International Conference on Modeling, Natural Language Processing and Machine Learning, 245-250. DOI: 10.1145/3677779.3677819. Online publication date: 17 May 2024.
    • (2024) Emotion and Sentiment Analysis in Dialogue: A Multimodal Strategy Employing the BERT Model. 2024 Parul International Conference on Engineering and Technology (PICET), 1-7. DOI: 10.1109/PICET60765.2024.10716061. Online publication date: 3 May 2024.
    • (2024) Speech Emotion Classification Based on Dynamic Graph Attention Network. 2024 5th International Conference on Electronic Communication and Artificial Intelligence (ICECAI), 328-331. DOI: 10.1109/ICECAI62591.2024.10675234. Online publication date: 31 May 2024.
    • (2024) Self-supervised Learning for Speech Emotion Recognition Task Using Audio-visual Features and Distil Hubert Model on BAVED and RAVDESS Databases. Journal of Systems Science and Systems Engineering, 33(5), 576-606. DOI: 10.1007/s11518-024-5607-y. Online publication date: 29 May 2024.
    • (2023) GCF2-Net: global-aware cross-modal feature fusion network for speech emotion recognition. Frontiers in Neuroscience, 17. DOI: 10.3389/fnins.2023.1183132. Online publication date: 4 May 2023.
    • (2023) Speech Emotion Recognition Using Global-Aware Cross-Modal Feature Fusion Network. Advanced Intelligent Computing Technology and Applications, 211-221. DOI: 10.1007/978-981-99-4742-3_17. Online publication date: 30 Jul 2023.
    • (2023) Improvement of Speech Emotion Recognition by Deep Convolutional Neural Network and Speech Features. Third Congress on Intelligent Systems, 117-129. DOI: 10.1007/978-981-19-9225-4_10. Online publication date: 12 Mar 2023.
    • (2022) Feature-enhanced embedding learning for heterogeneous collaborative filtering. Neural Computing and Applications, 34(21), 18741-18756. DOI: 10.1007/s00521-022-07490-0. Online publication date: 1 Nov 2022.
    • (2021) Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation. Proceedings of the 2021 International Conference on Multimodal Interaction, 645-652. DOI: 10.1145/3462244.3481003. Online publication date: 18 Oct 2021.
    • (2021) Multimodal Emotion Recognition Fusion Analysis Adapting BERT With Heterogeneous Feature Unification. IEEE Access, 9, 94557-94572. DOI: 10.1109/ACCESS.2021.3092735. Online publication date: 2021.
