Aspect-Based Sentiment Analysis of Customer Speech Data Using Deep Convolutional Neural Network and BiLSTM

Murugaiyan, Sivakumar; Uyyala, Srinivasulu Reddy

doi:10.1007/s12559-023-10127-6

Published: 06 March 2023

Volume 15, pages 914–931, (2023)

Cognitive Computation

Abstract

Humans naturally detect the sentiment of a particular context from the emotion in speech, whereas machines find this difficult. Although machines can readily process content-based information, accessing the real emotion behind it is hard. Aspect-based sentiment analysis built on a speech emotion recognition framework can bridge this gap. The proposed model helps people with autism spectrum disorder (ASD) understand others' sentiments, expressed through speech, about various aspects of a recently purchased product. The framework classifies speech recordings into emotions such as happy, sad, anger, and neutral, and labels each recording with aspect-wise sentiment polarity. This study proposes a hybrid model that uses a deep convolutional neural network (DCNN) for speech emotion recognition, bidirectional long short-term memory (BiLSTM) for speech aspect recognition, and a rule-based classifier for aspect-wise sentiment classification. Existing work has carried out sentiment analysis on speech data, but aspect-based sentiment analysis on speech data has not been carried out successfully. The proposed model extracts standard Mel frequency cepstral coefficient (MFCC) features from customer speech data about product reviews and generates aspect-wise sentiment labels. An enhanced cat swarm optimization (ECSO) algorithm selects features from the extracted feature set, which improved the overall sentiment classification accuracy. The proposed hybrid framework achieved sentiment classification accuracies of 93.28%, 91.45%, 92.12%, and 90.45% on four benchmark datasets. On these datasets, its sentiment classification accuracy was higher than that of other CNN variants and of state-of-the-art methods, and its aspect classification accuracy was also higher than that of state-of-the-art methods. The hybrid model using DCNN, BiLSTM, and a rule-based classifier outperformed the state-of-the-art models for aspect-based sentiment analysis by incorporating the ECSO algorithm in the feature selection process. The proposed model can support aspect-based sentiment analysis in any domain for which an aspect corpus is specified.
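The final stage of the pipeline, the rule-based classifier that turns recognized emotions and aspects into aspect-wise sentiment labels, can be illustrated with a minimal sketch. The abstract does not state the actual rules, so everything here is an assumption made for illustration: the emotion-to-polarity table, the vote-counting rule, the function name label_aspects, and the example aspect names are all hypothetical. The (aspect, emotion) pairs stand in for the outputs of the BiLSTM aspect recognizer and the DCNN emotion recognizer.

```python
# Illustrative sketch only: the paper's real rules are not given in the abstract.
# ASSUMPTION: each emotion maps to one sentiment polarity as below.
EMOTION_TO_POLARITY = {
    "happy": "positive",
    "sad": "negative",
    "anger": "negative",
    "neutral": "neutral",
}


def label_aspects(segments):
    """Combine per-segment (aspect, emotion) pairs into one polarity per aspect.

    `segments` is a list of (aspect, emotion) tuples, standing in for the
    outputs of the aspect recognizer and the emotion recognizer.
    ASSUMPTION: an aspect's polarity is decided by counting positive votes
    against negative votes; a tie yields "neutral".
    """
    votes = {}
    for aspect, emotion in segments:
        polarity = EMOTION_TO_POLARITY[emotion]
        votes.setdefault(aspect, []).append(polarity)

    labels = {}
    for aspect, polarities in votes.items():
        score = polarities.count("positive") - polarities.count("negative")
        if score > 0:
            labels[aspect] = "positive"
        elif score < 0:
            labels[aspect] = "negative"
        else:
            labels[aspect] = "neutral"
    return labels


if __name__ == "__main__":
    # Hypothetical recognizer outputs for one product review recording.
    review = [
        ("battery", "happy"),
        ("battery", "happy"),
        ("battery", "sad"),
        ("screen", "anger"),
        ("price", "neutral"),
    ]
    print(label_aspects(review))
```

A rule table of this kind keeps the last stage transparent: each aspect label can be traced back to the emotions detected for the segments that mention it.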

Data Availability

The RAVDESS dataset analyzed during the current study is available in the Zenodo repository, https://zenodo.org/record/1188976. The SAVEE dataset analyzed during the current study is available in the kahlan repository, http://kahlan.eps.surrey.ac.uk/savee/. The EmoDB dataset analyzed during the current study is available in the emodb repository, http://www.emodb.bilderbar.info/download/. The IEMOCAP dataset analyzed during the current study is available in the sail repository, https://sail.usc.edu/iemocap/.


Author information

Authors and Affiliations

Department of Networking and Communications, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Chennai, 603203, India
Sivakumar Murugaiyan & Srinivasulu Reddy Uyyala
Machine Learning and Data Analytics Lab, Centre of Excellence in Artificial Intelligence, Department of Computer Applications, National Institute of Technology, Tiruchirappalli, 620015, India
Sivakumar Murugaiyan

Corresponding author

Correspondence to Srinivasulu Reddy Uyyala.

Ethics declarations

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Murugaiyan, S., Uyyala, S.R. Aspect-Based Sentiment Analysis of Customer Speech Data Using Deep Convolutional Neural Network and BiLSTM. Cogn Comput 15, 914–931 (2023). https://doi.org/10.1007/s12559-023-10127-6

Received: 14 March 2022
Accepted: 19 February 2023
Published: 06 March 2023
Issue Date: May 2023
DOI: https://doi.org/10.1007/s12559-023-10127-6
