ABSTRACT
Emotion recognition plays an important role in artificial intelligence and is a key technology for human-machine interaction. To address the cross-modal fusion of two nonlinear feature types, facial expression images and speech emotion, a bimodal emotion recognition model based on convolutional neural networks (D-CNN) is proposed. First, a fine-grained feature extraction method based on convolutional neural networks is introduced. Second, a feature fusion method over the fine-grained features of both modalities is proposed to obtain a joint feature representation. Finally, to verify the performance of the D-CNN model, experiments were conducted on the open-source eNTERFACE'05 dataset. The results show that the recognition rate of the multimodal D-CNN model is more than 10% higher than that of the single-modality speech and facial expression models, respectively. In addition, compared with other commonly used bimodal emotion recognition methods (such as the universal background model), the recognition rate of D-CNN is increased by 5%.
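The fusion scheme the abstract describes can be sketched as feature-level (early) fusion: two modality-specific extractors each produce a fine-grained feature vector, the vectors are concatenated into a joint representation, and a shared classifier predicts the emotion. The sketch below is illustrative only; the extractor functions are toy stand-ins for the paper's CNN branches, and all names (`face_branch`, `speech_branch`, etc.) are assumptions, not from the paper.

```python
def face_branch(image):
    # Stand-in for the fine-grained facial-feature CNN:
    # maps an image (list of pixel rows) to a fixed-length feature vector.
    return [sum(row) / len(row) for row in image]

def speech_branch(frames):
    # Stand-in for the speech-feature CNN:
    # maps spectrogram frames to a fixed-length feature vector.
    return [max(frame) for frame in frames]

def fuse(face_feats, speech_feats):
    # Joint feature representation: concatenation of the two
    # fine-grained feature vectors.
    return face_feats + speech_feats

def classify(joint, weights, labels):
    # Toy linear scorer over the joint features; argmax picks the emotion.
    scores = [sum(w * x for w, x in zip(ws, joint)) for ws in weights]
    return labels[scores.index(max(scores))]
```

In the actual model both branches would be trained CNNs and the classifier a learned layer; the point here is only the data flow: extract per-modality features, concatenate, then classify jointly.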