CNN-Based Models for Emotion and Sentiment Analysis Using Speech Data

Published: 23 October 2024


This study presents an in-depth Sentiment Analysis (SA) grounded in the emotions present in speech signals. Web-based applications of all kinds, from social media platforms and video-sharing sites to e-commerce applications, now provide support for Human–Computer Interfaces (HCIs). These applications let users share their experiences in many forms, such as text, audio, video, and GIFs. Speech is the most natural and fundamental form of self-expression. Speech-Based Sentiment Analysis (SBSA) is the task of extracting sentiment from speech signals: it classifies a statement as neutral, negative, or positive. Speech Emotion Recognition (SER), by contrast, categorizes speech signals into the following emotions: disgust, fear, sadness, anger, happiness, and neutral. It is necessary to recognize the sentiment together with the depth of the emotions in the speech signal. To this end, the proposed methodology defines a text-oriented SA model that combines CNN and Bi-LSTM techniques with an embedding layer and is applied to the text obtained from speech signals; it achieves an accuracy of 84.49%. The methodology also includes an Emotion Analysis (EA) model based on the CNN technique that identifies the type of emotion present in the speech signal with an accuracy of 95.12%. The presented architecture can also be applied to other domains, such as product review systems, video recommendation systems, education, health, and security.





    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 23, Issue 10
    October 2024
    189 pages
    • Editor:
    • Imed Zitouni


    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 23 October 2024
    Online AM: 08 August 2024
    Accepted: 26 July 2024
    Revised: 10 January 2024
    Received: 18 June 2023
    Published in TALLIP Volume 23, Issue 10

    Author Tags

    1. Speech recognition
    2. emotion detection
    3. deep learning
    4. sentiment analysis


    • Research-article

