DOI: 10.1145/3606283.3606290
research-article

Sentiment Analysis from Speech Signals using Convolution Neural Network

Published: 11 August 2023

ABSTRACT

Sentiment analysis for emotion recognition from speech is among the most effective methods for human-machine interaction. It has gained considerable popularity in recent years, with applications in social media, the medical field, traffic, customer reviews, lie detection, car-board systems, and many more. Numerous methods, such as artificial neural networks (ANN), recurrent neural networks (RNN), and convolutional neural networks (CNN), have been suggested for recognizing sentiment from speech. In this paper, we introduce a model using a 1-dimensional CNN consisting of 7 sets of 1D convolution layers, 3 fully connected layers, and an output layer. Acoustic features are extracted from the audio files using different feature extraction techniques. The paper considers wave plot as well as spectrogram-related features. To increase the number of data points, data augmentation is used, which helped to improve the classification accuracy. The experimental results validate that the proposed model performs better than the existing methodologies.
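The abstract states that data augmentation was used to increase the number of data points but does not name the specific techniques. As a minimal sketch only, the following shows two augmentations commonly applied to raw speech waveforms, additive Gaussian noise and circular time shifting. Both the choice of techniques and every name and parameter here (`add_noise`, `time_shift`, `augment`, `noise_level`, the 10% shift bound) are assumptions for illustration, not the authors' method.

```python
import random
from typing import List, Optional, Sequence


def add_noise(signal: Sequence[float],
              noise_level: float = 0.005,
              rng: Optional[random.Random] = None) -> List[float]:
    """Return a copy of `signal` with zero-mean Gaussian noise added.

    The noise standard deviation is `noise_level` times the peak
    amplitude (a hypothetical default, not taken from the paper).
    """
    if rng is None:
        rng = random.Random()
    peak = max((abs(s) for s in signal), default=0.0)
    sigma = noise_level * peak
    return [s + rng.gauss(0.0, sigma) for s in signal]


def time_shift(signal: Sequence[float], shift: int) -> List[float]:
    """Circularly shift `signal` by `shift` samples (positive = right)."""
    n = len(signal)
    if n == 0:
        return []
    k = shift % n
    if k == 0:
        return list(signal)
    return list(signal[-k:]) + list(signal[:-k])


def augment(signal: Sequence[float], rng: random.Random) -> List[List[float]]:
    """Return the original waveform plus one noisy and one shifted copy."""
    # Assumed bound: shift by at most 10% of the clip length.
    max_shift = max(1, len(signal) // 10)
    shift = rng.randint(-max_shift, max_shift)
    return [
        list(signal),
        add_noise(signal, rng=rng),
        time_shift(signal, shift),
    ]


if __name__ == "__main__":
    waveform = [0.0, 0.5, -1.0, 0.25]
    for variant in augment(waveform, random.Random(0)):
        print(variant)
```

Each input clip yields three training examples that share one label, which is how augmentation of this kind enlarges a dataset. In practice the same operations would typically be vectorized with an array library rather than written over Python lists.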

    • Published in

      ICGSP '23: Proceedings of the 2023 7th International Conference on Graphics and Signal Processing
      June 2023
      83 pages
      ISBN: 9798400700460
      DOI: 10.1145/3606283

      Copyright © 2023 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited