research-article

Sentiment Analysis from Speech Signals using Convolution Neural Network

Authors:
Rahul Kumar Chaurasiya (ORCID: 0000-0002-0911-0869)
Nettem Sri Priya (ORCID: 0009-0000-2128-2156)
Kothapally Gnana Praneeth (ORCID: 0009-0004-8969-5757)
Gujjarlapudi Varun Kumar (ORCID: 0009-0001-7674-3615)
Matsa Jahnavi (ORCID: 0009-0003-1678-4882)
Tadigadapa Pranay Teja (ORCID: 0009-0003-8866-5137)

All authors: Department of ECE, Maulana Azad National Institute of Technology, India

ICGSP '23: Proceedings of the 2023 7th International Conference on Graphics and Signal Processing, June 2023, Pages 42–49. https://doi.org/10.1145/3606283.3606290

Published: 11 August 2023

ABSTRACT

Sentiment analysis for emotion recognition from speech is an effective means of human-machine interaction. It has gained considerable popularity in recent years, with applications in social media, medicine, traffic, customer reviews, lie detection, car-board systems, and many more. Numerous methods, such as artificial neural networks (ANN), recurrent neural networks (RNN), and convolutional neural networks (CNN), have been proposed to recognize sentiment from speech. In this paper, we introduce a model based on a 1-dimensional CNN consisting of 7 sets of 1D convolution layers, 3 fully connected layers, and an output layer. Acoustic features are extracted from the audio files using several feature extraction techniques. The paper considers wave-plot as well as spectrogram-related features. To increase the number of data points, data augmentation is used, which helped to improve the classification accuracy. The experimental results indicate that the proposed model performs better than the existing methodologies.
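
The abstract does not say which augmentation techniques or feature settings were used. As a rough illustration only, the sketch below assumes two common waveform augmentations (additive Gaussian noise and a circular time shift) and computes the zero-crossing rate, one of the features named in the author tags. The function names, the noise level, the shift range, and the synthetic test tone are all hypothetical choices for this example, not details taken from the paper.

```python
import math
import random


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def add_noise(signal, noise_level=0.005, rng=None):
    """Return a copy with Gaussian noise scaled to the peak amplitude.

    noise_level is an assumed value, not one reported in the paper.
    """
    rng = rng or random.Random()
    peak = max((abs(x) for x in signal), default=0.0)
    return [x + rng.gauss(0.0, noise_level * peak) for x in signal]


def time_shift(signal, shift):
    """Circularly shift the signal by `shift` samples (positive = later)."""
    if not signal:
        return []
    shift %= len(signal)
    return signal[-shift:] + signal[:-shift] if shift else list(signal)


def augment(signal, rng=None):
    """Return the original plus one noisy and one time-shifted variant."""
    rng = rng or random.Random()
    # Shift by up to 10% of the clip length (an assumed range).
    shift = rng.randint(1, max(1, len(signal) // 10))
    return [list(signal), add_noise(signal, rng=rng), time_shift(signal, shift)]


# Toy input: 100 ms of a 440 Hz tone at 8 kHz, standing in for a speech clip.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(800)]
variants = augment(tone, rng=random.Random(0))
features = [zero_crossing_rate(v) for v in variants]
```

Each original clip yields three training examples here, which is how augmentation increases the number of data points; in practice the per-clip features would be stacked into the input sequence of the 1D CNN.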

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks

Published in

ICGSP '23: Proceedings of the 2023 7th International Conference on Graphics and Signal Processing
June 2023
83 pages
ISBN:9798400700460
DOI:10.1145/3606283

Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
CNN 1D
Data augmentation
MFCC
Spectrogram
Zero-crossing rate
Qualifiers
- research-article
- Research
- Refereed limited