ABSTRACT
Sentiment analysis for emotion recognition from speech is among the most effective methods for human-machine interaction. It has gained considerable popularity in recent years, with applications in social media, medicine, traffic, customer reviews, lie detection, car board systems, and more. Numerous methods, such as artificial neural networks (ANN), recurrent neural networks (RNN), and convolutional neural networks (CNN), have been proposed to recognize sentiment from speech. In this paper, we introduce a model based on a 1-dimensional CNN consisting of 7 sets of 1D convolution layers, 3 fully connected layers, and an output layer. Acoustic features are extracted from the audio files using different feature extraction techniques. The paper considers wave plot as well as spectrogram-related features. To increase the number of data points, data augmentation is used, which helped improve classification accuracy. The experimental results indicate that the proposed model performs better than existing methodologies.
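The abstract does not say which augmentation techniques were used. As an illustration only, the sketch below assumes two common choices for raw audio, additive noise and a circular time shift, and shows how each clip could yield extra training samples. The function names and the `noise_level` parameter are hypothetical and are not taken from the paper.

```python
import random


def add_noise(signal, noise_level=0.005, rng=None):
    """Return a copy of the signal with uniform noise scaled to its peak amplitude.

    Assumption: the paper does not specify a noise model; uniform noise is
    used here only to keep the sketch dependency-free.
    """
    rng = rng or random.Random()
    peak = max((abs(s) for s in signal), default=0.0)
    return [s + noise_level * peak * rng.uniform(-1.0, 1.0) for s in signal]


def time_shift(signal, shift):
    """Circularly shift the signal by `shift` samples (positive values delay it)."""
    if not signal:
        return []
    k = shift % len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


def augment(signal, rng=None):
    """Return the original clip plus one noisy and one time-shifted variant."""
    rng = rng or random.Random()
    # Hypothetical choice: shift by at most 10% of the clip length.
    max_shift = max(1, len(signal) // 10)
    return [
        list(signal),
        add_noise(signal, rng=rng),
        time_shift(signal, rng.randint(-max_shift, max_shift)),
    ]
```

Under these assumptions, each input clip produces three training samples that share the same emotion label, which is one way to increase the number of data points before feature extraction.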
Index Terms
- Sentiment Analysis from Speech Signals using Convolution Neural Network