Skip to main content

Advertisement

Log in

Aspect-Based Sentiment Analysis of Customer Speech Data Using Deep Convolutional Neural Network and BiLSTM

  • Published:
Cognitive Computation Aims and scope Submit manuscript

Abstract

The process of detecting sentiments of particular context from human speech emotions is naturally in-built for humans unlike computers, where it is not possible to process human emotions by a machine for predicting sentiments of a particular context. Though machines can easily understand the content-based information, accessing the real emotion behind it is difficult. Aspect-based sentiment analysis based on speech emotion recognition framework can bridge the gap between these problems. The proposed model helps people with autism spectrum disorder (ASD) to understand other’s sentiments expressed through speech data about the recently purchased product based on various aspects of the product. It is a framework through which different sound discourse documents are characterized into various feelings like happy, sad, anger, and neutral and label the sound with aspect-wise sentiment polarity. This study proposed a hybrid model using deep convolutional neural networks (DCNN) for speech emotion recognition, bidirectional long short term memory (BiLSTM) for speech aspect recognition, and rule-based classifier for aspect-wise sentiment classification. In the existing work, sentiment analysis was carried out on speech data, but aspect-based sentiment analysis on speech data was not carried out successfully. The proposed model extracted standard Mel frequency cepstral coefficient (MFCC) features from customer speech data about product review and generated aspect-wise sentiment label. Enhanced cat swarm optimization (ECSO) algorithm was used for selection features from the extracted feature in the proposed model that improved the overall sentiment classification accuracy. The proposed hybrid framework obtained promising results on sentiment classification accuracy of 93.28%, 91.45%, 92.12%, and 90.45% on four benchmark datasets. The proposed hybrid framework sentiment classification accuracy on these benchmark datasets were compared with other CNN variants and shown better performance. Sentiment classification accuracy of the proposed model with state-of-art methods on the four benchmark datasets was compared and shown better performance. Aspect classification accuracy of the proposed with state-of-art methods on the benchmark datasets was compared and shown better performance. The developed hybrid model using DCNN, BiLSTM, and rule-based classifier outperformed the state-of-art models for aspect-based sentiment analysis by incorporating ECSO algorithm in feature selection process. The proposed model will help to perform aspect-based sentiment analysis on all domains with specified aspect corpus.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

Data Availability

The RAVDESS datasets analyzed during the current study are available in the zenodo repository, https://zenodo.org/record/1188976. The SAVEE datasets analyzed during the current study are available in the kahlan repository, http://kahlan.eps.surrey.ac.uk/savee/. The EmoDB datasets analyzed during the current study are available in the emodb repository, http://www.emodb.bilderbar.info/download/. The IEMOCAP datasets analyzed during the current study are available in the sail repository, https://sail.usc.edu/iemocap/.

References

  1. Khalid HM, Helander MG. Customer emotional needs in product design. Concurr Eng. 2006;14(3):197–206. https://doi.org/10.1177/1063293X06068387.

    Article  Google Scholar 

  2. Fu Y, Liao J, Li Y, Wang S, Li D, Li X. Multiple perspective attention based on double BiLSTM for aspect and sentiment pair extract. Neurocomputing. 2021;438:302–11. https://doi.org/10.1016/j.neucom.2021.01.079.

    Article  Google Scholar 

  3. Li G, Liu F, Wang Y, Guo Y, Xiao L, Zhu L. A convolutional neural network (CNN) based approach for the recognition and evaluation of classroom teaching behavior. Sci Program. 2021;2021:8. https://doi.org/10.1155/2021/6336773.

    Article  Google Scholar 

  4. Lu Z, Cao L, Zhang Y, Chiu CC, Fan J. Speech sentiment analysis via pre-trained features from end-to-end asr models. In: ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; 2020. p. 7149–53. https://doi.org/10.1109/ICASSP40776.2020.9052937.

    Chapter  Google Scholar 

  5. Capuano N, Greco L, Ritrovato P, Vento M. Sentiment analysis for customer relationship management: an incremental learning approach. Appl Intell. 2021;51(6):3339–52. https://doi.org/10.1007/s10489-020-01984-x.

    Article  Google Scholar 

  6. Yadav S, Ekbal A, Saha S, Bhattacharyya P. Medical sentiment analysis using social media: towards building a patient assisted system. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). ELRA; 2018. p. 2790–7.

    Google Scholar 

  7. Das RK, Panda M, Misra H. Decision support grievance redressal system using sentence sentiment analysis. In: Proceedings of the 13th International Conference on Theory and Practice of Electronic Governance. Association for Computing Machinery; 2020. p. 17–24. https://doi.org/10.1145/3428502.3428505.

    Chapter  Google Scholar 

  8. Maghilnan S, Kumar MR. Sentiment analysis on speaker specific speech data. In: 2017 international conference on intelligent computing and control (I2C2). IEEE; 2017. p. 1–5. https://doi.org/10.1109/I2C2.2017.8321795.

    Chapter  Google Scholar 

  9. Ezzat S, El Gayar N, Ghanem MM. Sentiment analysis of call centre audio conversations using text classification. Int J Comput Inf Syst Ind Manag Appl. 2012;4(1):619–27.

    Google Scholar 

  10. Lakomkin E, Zamani MA, Weber C, Magg S, Wermter S. Incorporating end-to-end speech recognition models for sentiment analysis. In: 2019 IEEE International Conference on Robotics and Automation (ICRA). IEEE; 2019. p. 7976–82. https://doi.org/10.1109/ICRA.2019.8794468.

    Chapter  Google Scholar 

  11. Huang Z, Dong M, Mao Q, Zhan Y. Speech emotion recognition using CNN. Proceedings of the 22nd ACM international conference on Multimedia; 2014. p. 801–4. https://doi.org/10.1145/2647868.2654984.

    Book  Google Scholar 

  12. Haq S, Jackson PJB. Speaker-dependent audio-visual emotion recognition. Proc. Int. Conf. on Auditory-Visual Speech Processing (AVSP’09); 2009. p. 1–6.

    Google Scholar 

  13. Berlin TU, Science C, Berlin LKA, Berlin HU. A database of German emotional speech. Proceedings Interspeech; 2005. https://doi.org/10.21437/Interspeech.2005-446.

    Book  MATH  Google Scholar 

  14. Ververidis D, Kotropoulos C, Pitas I. Automatic emotional speech classification. In: 2004 IEEE international conference on acoustics, speech, and signal processing, vol. 1. IEEE; 2004. p. 1–593. https://doi.org/10.1109/ICASSP.2004.1326055.

    Chapter  Google Scholar 

  15. Cui C, Ren Y, Liu J, Chen F, Huang R, Lei M, Zhao Z. EMOVIE: a Mandarin emotion speech dataset with a simple emotional text-to-speech model. Interspeech, pp. 1-5. 2021. https://doi.org/10.21437/Interspeech.2021-1148.

  16. Han K, Yu D, Tashev I. Speech emotion recognition using deep neural network and extreme learning machine. Interspeech; 2014. p. 223–7. https://doi.org/10.21437/Interspeech.2014-57.

    Book  Google Scholar 

  17. M. Xu, F. Zhang and W. Zhang, Head Fusion: Improving the Accuracy and Robustness of Speech Emotion Recognition on the IEMOCAP and RAVDESS Dataset, IEEE Access, 9, pp. 74539-74549, 2021, https://doi.org/10.1109/ACCESS.2021.3067460.

    Article  Google Scholar 

  18. Mirsamadi S, Barsoum E, Zhang C. Automatic speech emotion recognition using recurrent neural networks with local attention. In: 2017 IEEE International conference on acoustics, speech and signal processing (ICASSP). IEEE; 2017. p. 2227–31. https://doi.org/10.1109/ICASSP.2017.7952552.

    Chapter  Google Scholar 

  19. Chen M, He X, Yang J, Zhang H. 3-D convolutional recurrent neural networks with attention model for speech emotion recognition. IEEE Signal Process Lett. 2018;25(10):1440–4. https://doi.org/10.1109/LSP.2018.2860246.

    Article  Google Scholar 

  20. Xie Y, Liang R, Liang Z, Huang C, Zou C, Schuller B. Speech emotion classification using attention-based LSTM. IEEE/ACM Trans Audio Speech Lang Process. 2019;27(11):1675–85. https://doi.org/10.1109/TASLP.2019.2925934.

    Article  Google Scholar 

  21. Zhao J, Mao X, Chen L. Speech emotion recognition using deep 1D and 2D CNN LSTM networks. Biomed Signal Process Control. 2019;47:312–23. https://doi.org/10.1016/j.bspc.2018.08.035.

    Article  Google Scholar 

  22. Sajjad M, Kwon S. Clustering-based speech emotion recognition by incorporating learned features and deep BiLSTM. IEEE Access. 2020;8:79861–75. https://doi.org/10.1109/ACCESS.2020.2990405.

    Article  Google Scholar 

  23. Kwon S. A CNN-assisted enhanced audio signal processing for speech emotion recognition. Sensors. 2019;20(1):183. https://doi.org/10.3390/s20010183.

    Article  Google Scholar 

  24. Issa D, Demirci MF, Yazici A. Speech emotion recognition with deep convolutional neural networks. Biomed Signal Process Control. 2020;59:101894. https://doi.org/10.1016/j.bspc.2020.101894.

    Article  Google Scholar 

  25. Atila O, Şengür A. Attention guided 3D CNN-LSTM model for accurate speech based emotion recognition. Appl Acoust. 2021;182:108260. https://doi.org/10.1016/j.apacoust.2021.108260.

    Article  Google Scholar 

  26. Kwon S. CLSTM: deep feature-based speech emotion recognition using the hierarchical ConvLSTM network. Mathematics. 2020;8(12):2133. https://doi.org/10.3390/math8122133.

    Article  Google Scholar 

  27. Chiril P, Pamungkas EW, Benamara F, Moriceau V, Patti V. Emotionally informed hate speech detection: a multi-target perspective. Cogn Comput. 2022;14(1):322–52. https://doi.org/10.1007/s12559-021-09862-5.

    Article  Google Scholar 

  28. Chatziagapi A, Paraskevopoulos G, Sgouropoulos D, Pantazopoulos G, Nikandrou M, Giannakopoulos T, Narayanan S. Data augmentation using GANs for speech emotion recognition. Interspeech; 2019. p. 171–5. https://doi.org/10.21437/Interspeech.2019-2561.

    Book  Google Scholar 

  29. Wu JJ, Chang ST. Exploring customer sentiment regarding online retail services: a topic-based approach. J Retail Consum Serv. 2020;55:102145. https://doi.org/10.1016/j.jretconser.2020.102145.

    Article  Google Scholar 

  30. McFee B, Raffel C, Liang D, Ellis DPW, McVicar M, Battenberg E, Nieto O. librosa: audio and music signal analysis in python. Proceedings of the 14th python in science conference; 2015. p. 18–25. https://doi.org/10.25080/Majora-7b98e3ed-003.

    Book  Google Scholar 

  31. Alim SA, Rashid NKA. Some commonly used speech feature extraction algorithms. IntechOpen; 2018. p. 2–19. https://doi.org/10.5772/intechopen.80419.

    Book  Google Scholar 

  32. Shashidhar R, Patilkulkarni S. Audiovisual speech recognition for Kannada language using feed forward neural network. Neural Comput Appl. 2022;34:15603–15. https://doi.org/10.1007/s00521-022-07249-7.

    Article  Google Scholar 

  33. Pawar MD, Kokate RD. Convolution neural network based automatic speech emotion recognition using Mel-frequency Cepstrum coefficients. Multimed Tools Appl. 2021;80(10):15563–87. https://doi.org/10.1007/s11042-020-10329-2.

    Article  Google Scholar 

  34. Gomathy M. Optimal feature selection for speech emotion recognition using enhanced cat swarm optimization algorithm. Int J Speech Technol. 2021;24(1):155–63. https://doi.org/10.1007/s10772-020-09776-x.

    Article  Google Scholar 

  35. Sainath TN, Kingsbury B, Saon G, Soltau H, Mohamed AR, Dahl G, Ramabhadran B. Deep convolutional neural networks for large-scale speech tasks. Neural Netw. 2015;64:39–48. https://doi.org/10.1016/j.neunet.2014.08.005.

    Article  Google Scholar 

  36. Kingma DP, Ba J. Adam: a method for stochastic optimization. International Conference for Learning Representations, pp. 1-15. 2015. arXiv:1412.6980. https://doi.org/10.48550/arXiv.1412.6980.

  37. Sutskever I, Martens J, Dahl G, Hinton G. On the importance of initialization and momentum in deep learning. In: Proc. Int. Conf. Mach. Learn. PMLR; 2013. p. 1139–47.

    Google Scholar 

  38. Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res. 2011;12(61):2121–59.

    MathSciNet  MATH  Google Scholar 

  39. Zeiler M. ADADELTA: An Adaptive Learning Rate Method. ArXiv, abs/1212.5701. 2012. https://doi.org/10.48550/arXiv.1212.5701.

  40. Xu D, Zhang S, Zhang H, Mandic DP. Convergence of the RMSProp deep learning method with penalty for nonconvex optimization. Neural Netw. 2021;139:17–23. https://doi.org/10.1016/j.neunet.2021.02.011.

    Article  Google Scholar 

  41. Kimura T, Nose T, Hirooka S, Chiba Y, Ito A. Comparison of speech recognition performance between Kaldi and Google Cloud Speech API. In: Pan JS, Ito A, Tsai PW, Jain L, editors. Recent advances in intelligent information hiding and multimedia signal processing. IIH-MSP 2018. Smart Innovation, Systems and Technologies, vol. 110. Cham: Springer; 2019. https://doi.org/10.1007/978-3-030-03748-2_13.

    Chapter  Google Scholar 

  42. Iancu B. Evaluating google speech-to-text API’s performance for Romanian e-learning resources. Inf Econ. 2019;23(1):17–25. https://doi.org/10.12948/ISSN14531305/23.1.2019.02.

    Article  Google Scholar 

  43. Wang X, Liu Y, Sun C, Liu M, Wang X. Extended dependency-based word embeddings for aspect extraction. In: International Conference on Neural Information Processing. Springer; 2016. p. 104–11. https://doi.org/10.1007/978-3-319-46681-1_13.

    Chapter  Google Scholar 

  44. Sharma AK, Chaurasia S, Srivastava DK. Sentimental short sentences classification by using CNN deep learning model with fine tuned Word2Vec. Procedia Comput Sci. 2020;167:1139–47. https://doi.org/10.1016/j.procs.2020.03.416.

    Article  Google Scholar 

  45. Patilkulkarni S. Visual speech recognition for small scale dataset using VGG16 convolution neural network. Multimed Tools Appl. 2021;80(19):28941–52. https://doi.org/10.1007/s11042-021-11119-0.

    Article  Google Scholar 

  46. Livingstone SR, Russo FA. The Ryerson audio-visual database of emotional speech and song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLOS ONE. 2018;13(5). https://doi.org/10.1371/journal.pone.0196391.

  47. Haq S, Jackson PJB. Speaker-dependent audio-visual emotion recognition. Proc. Int’l Conf. on Auditory-Visual Speech Processing; 2009. p. 53–8.

    Google Scholar 

  48. Berlin TU, Science C, Berlin LKA, Berlin HU. A database of German emotional speech. Interspeech. 2005;5:1517–20.

    Google Scholar 

  49. Busso C, Bulut ÆM, Abe ÆCLÆ, Mower E, Kim ÆS, Chang ÆJN, et al. IEMOCAP: interactive emotional dyadic motion capture database. Lang Resour Eval. 2018;42:335–59. https://doi.org/10.1007/s10579-008-9076-6.

    Article  Google Scholar 

  50. Shashidhar R, Patilkulkarni S, Puneeth SB. Combining audio and visual speech recognition using LSTM and deep convolutional neural network. Int J Inf Technol. 2022;14(7):3425–36. https://doi.org/10.1007/s41870-022-00907-y.

    Article  Google Scholar 

  51. Srividya K, Sowjanya AM. Aspect based sentiment analysis using RNN-LSTM. Int J Adv Sci Technol. 2020;29(4):5875–80.

    Google Scholar 

  52. Al-Smadi M, Talafha B, Al-Ayyoub M, Jararweh Y. Using long short-term memory deep neural networks for aspect-based sentiment analysis of Arabic reviews. Int J Mach Learn Cybern. 2019;10(8):2163–75. https://doi.org/10.1007/s13042-018-0799-4.

    Article  Google Scholar 

  53. Xu L, Lin J, Wang L, Yin C, Wang J. Deep convolutional neural network based approach for aspect-based sentiment analysis. Adv Sci Technol Lett. 2017;143:199–204. https://doi.org/10.14257/ASTL.2017.143.41.

    Article  Google Scholar 

  54. Kumar R, Pannu HS, Malhi AK. Aspect-based sentiment analysis using deep networks and stochastic optimization. Neural Comput Appl. 2020;32(8):3221–35. https://doi.org/10.1007/s00521-019-04105-z.

    Article  Google Scholar 

  55. Ombabi AH, Ouarda W, Alimi AM. Deep learning CNN–LSTM framework for Arabic sentiment analysis using textual information shared in social networks. Soc Netw Anal Min. 2020;10(1):1–13. https://doi.org/10.1007/s13278-020-00668-1.

    Article  Google Scholar 

  56. Wang Y, Huang M, Zhu X, Zhao L. Attention-based LSTM for aspect-level sentiment classification. In: Proceedings of the 2016 conference on empirical methods in natural language processing. Association for Computational Linguistics; 2016. p. 606–15. https://doi.org/10.18653/v1/D16-1058.

    Chapter  Google Scholar 

  57. Kumar JA, Abirami S. Ensemble application of bidirectional LSTM and GRU for aspect category detection with imbalanced data. Neural Comput Appl. 2021;33(21):14603–21. https://doi.org/10.1007/s00521-021-06100-9.

    Article  Google Scholar 

  58. Setiawan EI, Ferry F, Santoso J, Sumpeno S, Fujisawa K, Purnomo MH. Bidirectional GRU for targeted aspect-based sentiment analysis based on character-enhanced token-embedding and multi-level attention. Int J Intell Eng Syst. 2020;13(5):392–407. https://doi.org/10.22266/ijies2020.1031.35.

    Article  Google Scholar 

  59. Granholm V, Noble WS, Käll L. A cross-validation scheme for machine learning algorithms in shotgun proteomics. BMC Bioinform. 2012;13(16):1–8. https://doi.org/10.1186/1471-2105-13-S16-S3.

    Article  Google Scholar 

  60. Sugan N, Srinivas NS, Kar N, Kumar LS, Nath MK, Kanhe A. Performance comparison of different cepstral features for speech emotion recognition. In: 2018 International CET conference on control, communication, and computing (IC4). IEEE; 2018. p. 266–71.

    Chapter  Google Scholar 

  61. Tuncer T, Dogan S, Acharya UR. Automated accurate speech emotion recognition system using twine shuffle pattern and iterative neighborhood component analysis techniques. Knowl Based Syst. 2021;211:106547. https://doi.org/10.1016/j.knosys.2020.106547.

    Article  Google Scholar 

  62. Kwon S. Optimal feature selection based speech emotion recognition using two-stream deep convolutional neural network. Int J Intell Syst. 2021;36(9):5116–35. https://doi.org/10.1002/int.22505.

    Article  Google Scholar 

  63. Kwon S. MLT-DNet: speech emotion recognition using 1D dilated CNN based on multi-learning trick approach. Expert Syst Appl. 2021;167:114177. https://doi.org/10.1016/j.eswa.2020.114177.

    Article  Google Scholar 

  64. Yogesh CK, Hariharan M, Ngadiran R, Adom AH, Yaacob S, Berkai C, Polat K. A new hybrid PSO assisted biogeography-based optimization for emotion and stress recognition from speech signal. Expert Syst Appl. 2017;69:149–58. https://doi.org/10.1016/j.eswa.2016.10.035.

    Article  Google Scholar 

  65. Assunção G, Menezes P, Perdigão F. Speaker awareness for speech emotion recognition. Int J Online Biomed Eng. 2020;16(4):15–22. https://doi.org/10.3991/ijoe.v16i04.11870.

    Article  Google Scholar 

  66. Badshah AM, Rahim N, Ullah N, Ahmad J, Muhammad K, Lee MY, Baik SW. Deep features-based speech emotion recognition for smart affective services. Multimed Tools Appl. 2019;78(5):5571–89. https://doi.org/10.1007/s11042-017-5292-7.

    Article  Google Scholar 

  67. Jiang P, Fu H, Tao H, Lei P, Zhao L. Parallelized convolutional recurrent neural network with spectral features for speech emotion recognition. IEEE Access. 2019;7:90368–77. https://doi.org/10.1109/ACCESS.2019.2927384.

    Article  Google Scholar 

  68. Anvarjon T, Kwon S. Deep-net: a lightweight CNN-based speech emotion recognition system using deep frequency features. Sensors. 2020;20(18):5212. https://doi.org/10.3390/s20185212.

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Srinivasulu Reddy Uyyala.

Ethics declarations

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Murugaiyan, S., Uyyala, S.R. Aspect-Based Sentiment Analysis of Customer Speech Data Using Deep Convolutional Neural Network and BiLSTM. Cogn Comput 15, 914–931 (2023). https://doi.org/10.1007/s12559-023-10127-6

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s12559-023-10127-6

Keywords

Navigation