Abstract
Human–robot interaction has long relied on estimating human emotions from facial expressions, voice, and gestures. Human emotions have conventionally been categorized into discrete classes, whereas in this study we estimate continuous emotions from facial images in common datasets. Linear regression is used to quantify emotions numerically as valence and arousal, placing each raw image on the two corresponding coordinate axes. Face images from the Japanese Female Facial Expression (JAFFE) dataset and the extended Cohn–Kanade (CK+) dataset were used in the experiments, and the emotions conveyed by these images were rated by 85 participants to provide continuous ground truth. The best result from a series of experiments shows a minimum root mean square error on the JAFFE dataset of 0.1661 for valence and 0.1379 for arousal. Compared with previous methods that estimate continuous emotions from other media, such as songs and sentences, the proposed method shows outstanding emotion estimation performance on these common facial datasets.
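To make the pipeline concrete, the following is a minimal sketch of the valence–arousal regression the abstract describes, assuming flattened grayscale pixel intensities as features and scikit-learn's LinearRegression. The arrays images, valence, and arousal are hypothetical stand-ins for the 213 JAFFE images and the participants' continuous annotations; the paper's actual feature extraction (e.g., face cropping) and evaluation protocol may differ.

# Minimal sketch: continuous valence/arousal estimation via linear
# regression on flattened face images, with per-dimension RMSE.
# All data below is synthetic placeholder data, not the real JAFFE set.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
images = rng.random((213, 64, 64))      # stand-in for 213 JAFFE face images
valence = rng.uniform(-1, 1, 213)       # stand-in participant valence ratings
arousal = rng.uniform(-1, 1, 213)       # stand-in participant arousal ratings

X = images.reshape(len(images), -1)     # flatten pixels into feature vectors
y = np.column_stack([valence, arousal]) # one target column per affect axis

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_tr, y_tr)  # one linear map per affect axis
pred = model.predict(X_te)

# Root mean square error per dimension, the metric the abstract reports
rmse = np.sqrt(mean_squared_error(y_te, pred, multioutput="raw_values"))
print(f"valence RMSE: {rmse[0]:.4f}, arousal RMSE: {rmse[1]:.4f}")

Run on the real annotated data, the two printed values would correspond to the per-dimension errors the abstract reports (0.1661 for valence, 0.1379 for arousal on JAFFE).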
References
Ekman P, Friesen WV (1976) Measuring facial movement. Environ Psychol Nonverbal Behav 1(1):56–75. https://doi.org/10.1007/bf01115465
Ekman P (1992) An argument for basic emotions. Cogn Emot 6(3–4):169–200. https://doi.org/10.1080/02699939208411068
Ekman P (1994) Strong evidence for universals in facial expressions: a reply to Russell’s mistaken critique. Psychol Bull 115(2):268–287. https://doi.org/10.1037/0033-2909.115.2.268
Picard RW (1995) Affective computing. MIT Technical report No. 321
Ma C, Prendinger H, Ishizuka M (2005) Emotion estimation and reasoning based on affective textual interaction. In: International conference on affective computing and intelligent interaction, California, USA, pp 622–628
Matsumoto K, Kita K, Ren F (2012) Emotion estimation from sentence using relation between Japanese slangs and emotion expressions. In: Proceedings of the 26th Pacific Asia conference on language, information, and computation, Bali, Indonesia, pp 343–350
Cohen I, Garg A, Huang TS (2000) Emotion recognition from facial expressions using multilevel HMM. In: Neural information processing systems, vol 2. Springer, Berlin
Miyakoshi Y, Kato S (2011) Facial emotion detection considering partial occlusion of face using Bayesian network. In: 2011 IEEE symposium on computers and informatics (ISCI), Kuala Lumpur, Malaysia, pp 96–101
Ghimire D, Lee J (2013) Geometric feature-based facial expression recognition in image sequences using multi-class adaboost and support vector machines. Sensors 13(6):7714–7734. https://doi.org/10.3390/s130607714
Pitaloka DA, Wulandari A, Basaruddin T, Liliana DY (2017) Enhancing CNN with preprocessing stage in automatic emotion recognition. Proc Comput Sci 116:523–529. https://doi.org/10.1016/j.procs.2017.10.038
Lin DT (2006) Facial expression classification using PCA and hierarchical radial basis function network. J Inf Sci Eng 22(5):1033–1046
Shih FY, Chuang CF, Wang PS (2008) Performance comparisons of facial expression recognition in JAFFE database. Int J Pattern Recognit Artif Intell 22(3):445–459. https://doi.org/10.1142/S0218001408006284
Jain N, Kumar S, Kumar A, Shamsolmoali P, Zareapoor M (2018) Hybrid deep neural networks for face emotion recognition. Pattern Recognit Lett. https://doi.org/10.1016/j.patrec.2018.04.010
Gunawan AAS (2015) Face expression detection on Kinect using active appearance model and fuzzy logic. Proc Comput Sci 59:268–274. https://doi.org/10.1016/j.procs.2015.07.558
Liu P, Han S, Meng Z, Tong Y (2014) Facial expression recognition via a boosted deep belief network. In: IEEE conference on computer vision and pattern recognition, Ohio, USA, pp 1805–1812
Breuer R, Kimmel R (2017) A deep learning perspective on the origin of facial expressions. ArXiv preprint arXiv:1705.01842
Schuller B, Rigoll G, Lang M (2003) Hidden Markov model-based speech emotion recognition. In: IEEE international conference on acoustics, speech, and signal processing, Hong Kong, China, pp I-401–I-404
Busso C, Deng Z, Yildirim S, Bulut M, Lee CM, Kazemzadeh A, Lee S, Neumann U, Narayanan S (2004) Analysis of emotion recognition using facial expressions, speech and multimodal information. In: Proceedings of the 6th international conference on multimodal interfaces, PA, USA, pp 205–211
Kim SM, Valitutti A, Calvo RA (2010) Evaluation of unsupervised emotion models to textual affect recognition. In: Proceedings of the NAACL HLT 2010 workshop on computational approaches to analysis and generation of emotion in text, California, USA, pp 62–70
Russell JA (1980) A circumplex model of affect. J Pers Soc Psychol 39(6):1161–1178. https://doi.org/10.1037/h0077714
Russell JA, Barrett LF (1999) Core affect, prototypical emotional episodes, and other things called emotion: dissecting the elephant. J Pers Soc Psychol 76(5):805–819. https://doi.org/10.1037/0022-3514.76.5.805
Yang YH, Lin YC, Su YF, Chen HH (2007) Music emotion classification: a regression approach. In: 2007 IEEE international conference on multimedia and expo, Beijing, China, pp 208–211
Yang YH, Lin YC, Su YF, Chen HH (2008) A regression approach to music emotion recognition. IEEE Trans Audio Speech Lang Process 16(2):448–457. https://doi.org/10.1109/TASL.2007.911513
Chen YA, Wang JC, Yang YH, Chen H (2014) Linear regression-based adaptation of music emotion recognition models for personalization. In: 2014 IEEE international conference on acoustics, speech and signal processing (ICASSP), Florence, Italy, pp 2149–2153
Sun K, Yu J, Huang Y, Hu X (2009) An improved valence-arousal emotion space for video affective content representation and recognition. In: 2009 IEEE international conference on multimedia and expo, New York, USA, pp 566–569
Mehrabian A (1996) Pleasure-arousal-dominance: a general framework for describing and measuring individual differences in temperament. Curr Psychol 14(4):261–292. https://doi.org/10.1007/bf02686918
Grimm M, Kroschel K (2007) Emotion estimation in speech using a 3D emotion space concept. Robust speech recognition and understanding. InTech, Rijeka
Lyons M, Akamatsu S, Kamachi M, Gyoba J (1998) Coding facial expressions with Gabor wavelets. In: Proceedings of the third IEEE international conference on automatic face and gesture recognition, Nara, Japan, pp 200–205
Lucey P, Cohn JF, Kanade T, Saragih J (2010) The extended Cohn–Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression. In: 2010 IEEE computer society conference on computer vision and pattern recognition workshops, San Francisco, CA, USA, pp 94–101
Bargiela A, Pedrycz W, Nakashima T (2007) Multiple regression with fuzzy data. Fuzzy Sets Syst 158(19):2169–2188. https://doi.org/10.1016/j.fss.2007.04.011
Viola P, Jones MJ (2004) Robust real-time face detection. Int J Comput Vis 57(2):137–154. https://doi.org/10.1023/b:visi.0000013087.49260.fb
Lienhart R, Maydt J (2002) An extended set of Haar-like features for rapid object detection. In: 2002 international conference on image processing, New York, USA, vol 1, pp I-900–I-903
Schwegmann CP, Kleynhans W, Salmon BP (2016) Synthetic aperture radar ship detection using Haar-like features. IEEE Geosci Remote Sens Lett 14(2):154–158. https://doi.org/10.1109/LGRS.2016.2631638
Acknowledgements
This work was supported by the National Research Foundation of Korea funded by the Korean Government under Grant NRF-2019R1A2C1011270.
Cite this article
Lee, HS., Kang, BY. Continuous emotion estimation of facial expressions on JAFFE and CK+ datasets for human–robot interaction. Intel Serv Robotics 13, 15–27 (2020). https://doi.org/10.1007/s11370-019-00301-x