
Study on emotion recognition and companion Chatbot using deep neural network

Published in: Multimedia Tools and Applications

Abstract

As technology advances, research on speech emotion recognition and semantic analysis has grown in importance, with applications chiefly in companion robots, consumer technology products, and medical care. This paper proposes a communication system with speech emotion recognition. The system preprocesses the speech with a sound-data enhancement method and converts the audio into a spectrogram using MFCC (Mel-Frequency Cepstral Coefficients). A GoogLeNet CNN (Convolutional Neural Network) then classifies five emotions (peace, happiness, sadness, anger, and fear), with a top recognition accuracy of 79.81%. For semantic analysis, the training texts are divided into two categories, positive and negative, and chat responses are generated with an RNN-based (Recurrent Neural Network) Seq2Seq framework. The overall system has two parts, a client and a server. The client is an application developed for the Android system, and the server runs on Ubuntu Linux in combination with a web server. With this bi-terminal framework, users record their voice in the app on their cellphone and upload the voice file to the server, where the recording undergoes speech emotion recognition by the CNN and semantic analysis by the RNN; the system then acts as a chatbot that responds positively or negatively according to the detected emotion and displays the results in the app on the user's phone. The main contributions of this research are: 1) the study introduces Chinese word vectors into the robot dialogue system, effectively improving dialogue tolerance and semantic interpretation; 2) traditional emotion-identification methods must first tokenize the Chinese words, analyze the clauses and parts of speech, and extract emotional keywords before interpretation by an expert system, whereas this study classifies the input directly with a convolutional neural network after the utterance is converted into a spectrogram by MFCC; and 3) in addition to implementing the companion robot, the user's emotional index can be collected for analysis by a back-end care organization. Moreover, compared with other commercial humanoid companion robots, this system is delivered as an app, which is easier to use and more economical.
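To make the recognition pipeline concrete, the following is a minimal sketch of the MFCC-to-CNN step described above. It is not the authors' implementation: the librosa and torchvision libraries, the 16 kHz sampling rate, the 40 MFCC coefficients, and the use of an untrained torchvision GoogLeNet are all assumptions made for illustration; only the five emotion classes come from the paper.

import librosa
import torch
from torchvision.models import googlenet

EMOTIONS = ["peace", "happy", "sad", "angry", "fear"]      # classes used in the paper

def mfcc_image(wav_path, n_mfcc=40):
    # Load the recording and compute its MFCC feature map (n_mfcc x frames).
    y, sr = librosa.load(wav_path, sr=16000)                # 16 kHz is an assumed rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    mfcc = (mfcc - mfcc.mean()) / (mfcc.std() + 1e-8)       # simple per-utterance normalization
    return torch.tensor(mfcc, dtype=torch.float32).unsqueeze(0)

# GoogLeNet expects 3-channel images, so the single MFCC channel is repeated.
model = googlenet(num_classes=len(EMOTIONS), aux_logits=False, init_weights=True)
model.eval()

def predict_emotion(wav_path):
    x = mfcc_image(wav_path).repeat(3, 1, 1).unsqueeze(0)   # shape (1, 3, n_mfcc, frames)
    with torch.no_grad():
        logits = model(x)
    return EMOTIONS[int(logits.argmax(dim=1))]

In practice the network would first be trained on the labeled speech corpus; the sketch only shows the inference path that yields one of the five emotion labels per recording.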
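The semantic-analysis branch can likewise be sketched as a small encoder-decoder. The GRU cells, the 300-dimensional embeddings, and the toy vocabulary size below are illustrative assumptions; the paper only states that an RNN Seq2Seq framework with Chinese word vectors is used.

import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden=512):
        super().__init__()
        # The embedding table could be initialized from pretrained Chinese word vectors.
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # Encode the user's utterance; the final hidden state seeds the decoder.
        _, h = self.encoder(self.embed(src_ids))
        dec_out, _ = self.decoder(self.embed(tgt_ids), h)
        return self.out(dec_out)                            # (batch, tgt_len, vocab_size) logits

# Toy usage: a batch of 2 tokenized utterances with a 5000-token vocabulary.
model = Seq2Seq(vocab_size=5000)
src = torch.randint(0, 5000, (2, 12))
tgt = torch.randint(0, 5000, (2, 10))
loss = nn.CrossEntropyLoss()(model(src, tgt).reshape(-1, 5000), tgt.reshape(-1))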
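Finally, the client/server flow can be illustrated with a small HTTP endpoint. Flask, the /analyze route, the "voice" form field, and the placeholder pipeline function are assumptions made for illustration; the paper only specifies an Android client that uploads recordings to a web server on Ubuntu Linux and receives the results back in the app.

from flask import Flask, request, jsonify

app = Flask(__name__)

def analyze_recording(wav_path):
    # Placeholder for the full pipeline: CNN emotion recognition on the MFCC
    # spectrogram plus the Seq2Seq reply (see the sketches above).
    return {"emotion": "happy", "reply": "..."}

@app.route("/analyze", methods=["POST"])
def analyze():
    audio = request.files["voice"]                          # field name "voice" is an assumption
    wav_path = "/tmp/upload.wav"
    audio.save(wav_path)
    return jsonify(analyze_recording(wav_path))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)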


Author information

Corresponding author

Correspondence to Ming-Che Lee.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Lee, MC., Chiang, SY., Yeh, SC. et al. Study on emotion recognition and companion Chatbot using deep neural network. Multimed Tools Appl 79, 19629–19657 (2020). https://doi.org/10.1007/s11042-020-08841-6

