
Study on emotion recognition and companion Chatbot using deep neural network

Published in: Multimedia Tools and Applications

Abstract

As technology advances, research on speech emotion recognition and semantic analysis has grown in importance, with applications chiefly in companion robots, consumer technology products, and medical care. This paper proposes a communication system with speech emotion recognition. The system preprocesses the speech with a sound-data enhancement method and converts the audio into a spectrogram using MFCC (Mel-Frequency Cepstral Coefficients). A GoogLeNet CNN (Convolutional Neural Network) then classifies five emotions (peace, happiness, sadness, anger, and fear), with a top recognition accuracy of 79.81%. For semantic analysis, the training texts are divided into two categories, positive and negative, and chat responses are generated with an RNN-based (Recurrent Neural Network) Seq2Seq framework. The overall system has two parts, a client and a server. The client is an application developed for the Android system, and the server runs on Ubuntu Linux in combination with a web server. With this bi-terminal framework, users record their voice in the app on their cellphone and upload the voice file to the server, where the recording undergoes speech emotion recognition by the CNN and semantic analysis by the RNN; the system then acts as a chatbot that responds positively or negatively according to the detected emotion and displays the results in the app on the user's phone. The main contributions of this research are: 1) the study introduces Chinese word vectors into the robot dialogue system, effectively improving dialogue tolerance and semantic interpretation; 2) traditional emotion-identification methods must first tokenize the Chinese words, analyze the clauses and parts of speech, and extract emotional keywords before interpretation by an expert system, whereas this study classifies the input directly with a convolutional neural network after the utterance is converted into a spectrogram by MFCC; and 3) in addition to implementing the companion robot, the user's emotional index can be collected for analysis by a back-end care organization. Moreover, compared with other commercial humanoid companion robots, this system is delivered as an app, which is easier to use and more economical.
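To make the recognition pipeline concrete, the following is a minimal sketch of the MFCC-to-CNN step described above. It is not the authors' implementation: the librosa and torchvision libraries, the 16 kHz sampling rate, the 40 MFCC coefficients, and the use of an untrained torchvision GoogLeNet are all assumptions made for illustration; only the five emotion classes come from the paper.

import librosa
import torch
from torchvision.models import googlenet

EMOTIONS = ["peace", "happy", "sad", "angry", "fear"]      # classes used in the paper

def mfcc_image(wav_path, n_mfcc=40):
    # Load the recording and compute its MFCC feature map (n_mfcc x frames).
    y, sr = librosa.load(wav_path, sr=16000)                # 16 kHz is an assumed rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    mfcc = (mfcc - mfcc.mean()) / (mfcc.std() + 1e-8)       # simple per-utterance normalization
    return torch.tensor(mfcc, dtype=torch.float32).unsqueeze(0)

# GoogLeNet expects 3-channel images, so the single MFCC channel is repeated.
model = googlenet(num_classes=len(EMOTIONS), aux_logits=False, init_weights=True)
model.eval()

def predict_emotion(wav_path):
    x = mfcc_image(wav_path).repeat(3, 1, 1).unsqueeze(0)   # shape (1, 3, n_mfcc, frames)
    with torch.no_grad():
        logits = model(x)
    return EMOTIONS[int(logits.argmax(dim=1))]

In practice the network would first be trained on the labeled speech corpus; the sketch only shows the inference path that yields one of the five emotion labels per recording.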
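The semantic-analysis branch can likewise be sketched as a small encoder-decoder. The GRU cells, the 300-dimensional embeddings, and the toy vocabulary size below are illustrative assumptions; the paper only states that an RNN Seq2Seq framework with Chinese word vectors is used.

import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden=512):
        super().__init__()
        # The embedding table could be initialized from pretrained Chinese word vectors.
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # Encode the user's utterance; the final hidden state seeds the decoder.
        _, h = self.encoder(self.embed(src_ids))
        dec_out, _ = self.decoder(self.embed(tgt_ids), h)
        return self.out(dec_out)                            # (batch, tgt_len, vocab_size) logits

# Toy usage: a batch of 2 tokenized utterances with a 5000-token vocabulary.
model = Seq2Seq(vocab_size=5000)
src = torch.randint(0, 5000, (2, 12))
tgt = torch.randint(0, 5000, (2, 10))
loss = nn.CrossEntropyLoss()(model(src, tgt).reshape(-1, 5000), tgt.reshape(-1))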
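Finally, the client/server flow can be illustrated with a small HTTP endpoint. Flask, the /analyze route, the "voice" form field, and the placeholder pipeline function are assumptions made for illustration; the paper only specifies an Android client that uploads recordings to a web server on Ubuntu Linux and receives the results back in the app.

from flask import Flask, request, jsonify

app = Flask(__name__)

def analyze_recording(wav_path):
    # Placeholder for the full pipeline: CNN emotion recognition on the MFCC
    # spectrogram plus the Seq2Seq reply (see the sketches above).
    return {"emotion": "happy", "reply": "..."}

@app.route("/analyze", methods=["POST"])
def analyze():
    audio = request.files["voice"]                          # field name "voice" is an assumption
    wav_path = "/tmp/upload.wav"
    audio.save(wav_path)
    return jsonify(analyze_recording(wav_path))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)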


Author information

Corresponding author

Correspondence to Ming-Che Lee.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Lee, MC., Chiang, SY., Yeh, SC. et al. Study on emotion recognition and companion Chatbot using deep neural network. Multimed Tools Appl 79, 19629–19657 (2020). https://doi.org/10.1007/s11042-020-08841-6

