Abstract
Emotion recognition is essential for effective interaction between humans and artificial intelligence (AI) systems. Facial expression-based methods have long been studied actively, and they have recently achieved high recognition performance thanks to powerful deep learning. However, the image sequences in the datasets used in conventional emotion recognition studies are usually short and are often produced through intentionally posed expressions. Moreover, annotating emotional labels in the continuous domain when constructing a dataset is costly. To overcome these problems, this paper proposes a semi-supervised emotion recognition method that exploits a suitable amount of unlabeled data in parallel while minimizing the use of labeled data, which is expensive to obtain. The proposed method employs a CNN-LSTM-based regressor that estimates arousal and valence in the continuous domain. In addition, through experiments on the well-known MAHNOB-HCI and AFEW-VA datasets, we present scenarios and design criteria under which semi-supervised learning can be effectively applied to emotion recognition tasks.
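To make the architecture concrete, the following is a minimal PyTorch sketch of a CNN-LSTM regressor that maps a sequence of face images to continuous arousal/valence values. The layer sizes, backbone depth, and input resolution here are illustrative assumptions, not the paper's exact configuration; a `tanh` output keeps predictions in the continuous range [-1, 1] commonly used for valence/arousal.

```python
import torch
import torch.nn as nn

class CnnLstmRegressor(nn.Module):
    """Sketch of a CNN-LSTM regressor for continuous arousal/valence.
    Architecture details (channel counts, hidden size) are illustrative,
    not the paper's exact network."""
    def __init__(self, hidden_size=64):
        super().__init__()
        # Small per-frame CNN backbone; a real system would use a deeper
        # pretrained network as the frame-level feature extractor.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (B*T, 32)
        )
        self.lstm = nn.LSTM(32, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 2)  # (arousal, valence)

    def forward(self, x):
        # x: (B, T, 3, H, W) sequence of face crops
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)          # temporal modeling over frames
        # Regress from the last time step; tanh bounds output to [-1, 1]
        return torch.tanh(self.head(out[:, -1]))

x = torch.randn(2, 8, 3, 64, 64)  # 2 clips of 8 frames each
pred = CnnLstmRegressor()(x)
print(pred.shape)  # torch.Size([2, 2])
```

In a semi-supervised setup such as the one the paper describes, labeled clips would contribute a regression loss on these outputs while unlabeled clips would contribute an additional consistency term (e.g., between a student and a teacher copy of the network, as in Mean Teacher).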
References
Dai A M, Le Q V (2015) Semi-supervised sequence learning. In: advances in neural information processing systems (NIPS), pp. 3079-3087
Deng J, Dong W, Socher R, Li L J, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: IEEE conference on computer vision and pattern recognition (CVPR), pp. 248-255
Dhall A, Goecke R, Ghosh S, Joshi J, Hoey J, Gedeon T (2017) From individual to group-level emotion recognition: EmotiW 5.0. In: 19th ACM international conference on multimodal interaction (ICMI), pp. 524-528
Ghimire D, Lee J (2013) Geometric feature-based facial expression recognition in image sequences using multi-class adaboost and support vector machines. Sensors 13(6):7714–7734
Goodfellow I J, Erhan D, Carrier P L, Courville A, Mirza M, Hamner B, et al (2013) Challenges in representation learning: a report on three machine learning contests. In: International Conference on Neural Information Processing. Springer, pp. 117–124
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Huang G, Liu Z, Van Der Maaten L, Weinberger K Q (2017) Densely connected convolutional networks. In: IEEE conference on computer vision and pattern recognition (CVPR), pp. 4700-4708
Jung H, Lee S, Yim J, Park S, Kim J (2015) Joint fine-tuning in deep neural networks for facial expression recognition. In: IEEE international conference on computer vision (ICCV), pp. 2983-2991
Ketkar N (2017) Introduction to PyTorch. In: Deep learning with Python. Apress, pp. 195–208
Kim D H, Lee M K, Choi D Y, Song B C (2017) Multi-modal emotion recognition using semi-supervised learning and multiple neural networks in the wild. In: 19th ACM international conference on multimodal interaction (ICMI), pp. 529-535
Koelstra S, Muhl C, Soleymani M, Lee JS, Yazdani A, Ebrahimi T, Pun T, Nijholt A, Patras I (2011) Deap: a database for emotion analysis; using physiological signals. IEEE Trans Affect Comput 3(1):18–31
Kollias D, Tzirakis P, Nicolaou MA, Papaioannou A, Zhao G, Schuller B, Kotsia I, Zafeiriou S (2019) Deep affect prediction in-the-wild: Aff-wild database and challenge, deep architectures, and beyond. Int J Comput Vis 127(6–7):907–929
Kossaifi J, Tzimiropoulos G, Todorovic S, Pantic M (2017) AFEW-VA database for valence and arousal estimation in-the-wild. Image Vis Comput 65:23–36
Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images, technical report. Univ Toronto 1(4):7
Laine S, Aila T (2016) Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242.
Lucey P, Cohn J F, Kanade T, Saragih J, Ambadar Z, Matthews I (2010) The extended cohn-kanade dataset (ck+): a complete dataset for action unit and emotion-specified expression. In: 2010 IEEE computer society conference on computer vision and pattern recognition-workshops (CVPRW), pp. 94-101
Ma Y, Chen W, Ma X, Xu J, Huang X, Maciejewski R, Tung A K (2017) EasySVM: a visual analysis approach for open-box support vector machines. Comput Vis Media 3(2):161–175
Mehrkanoon S, Alzate C, Mall R, Langone R, Suykens JA (2014) Multiclass semisupervised learning based upon kernel spectral clustering. IEEE Trans Neural Netw Learn Syst 26(4):720–733
Mehrkanoon S, Agudelo OM, Suykens JA (2015) Incremental multi-class semi-supervised clustering regularized by Kalman filtering. Neural Netw 71:88–104
Netzer Y, Wang T, Coates A, Bissacco A, Wu B, Ng A Y (2011) Reading digits in natural images with unsupervised feature learning. In: NIPS workshop on deep learning and unsupervised feature learning
Pantic M, Valstar M, Rademaker R, Maat L (2005) Web-based database for facial expression analysis. In: IEEE international conference on multimedia and expo (ICME), p. 5
Qiao S, Shen W, Zhang Z, Wang B, Yuille A (2018) Deep co-training for semi-supervised image recognition. In: European conference on computer vision (ECCV), pp. 135-152
Rezagholiradeh M, Haidar M A (2018) Reg-Gan: semi-supervised learning based on generative adversarial networks for regression. In: international conference on acoustics, speech and signal processing (ICASSP), pp. 2806-2810
Robert T, Thome N, Cord M (2018) Hybridnet: classification and reconstruction cooperation for semi-supervised learning. In: European conference on computer vision (ECCV), pp. 153-169
Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X (2016) Improved techniques for training gans. In: advances in neural information processing systems (NIPS), pp. 2234-2242
Soleymani M, Lichtenauer J, Pun T, Pantic M (2011) A multimodal database for affect recognition and implicit tagging. IEEE Trans Affect Comput 3(1):42–55
Soleymani M, Asghari-Esfeden S, Fu Y, Pantic M (2015) Analysis of EEG signals and facial expressions for continuous emotion detection. IEEE Trans Affect Comput 7(1):17–28
Tarvainen A, Valpola H (2017) Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results. In: advances in neural information processing systems (NIPS), pp. 1195-1204
Tong Y, Liao W, Ji Q (2007) Facial action unit recognition by exploiting their dynamic and semantic relationships. IEEE Trans Pattern Anal Mach Intell 29(10):1683–1699
Yang H, Ciftci U, Yin L (2018) Facial expression recognition by de-expression residue learning. In: IEEE conference on computer vision and pattern recognition (CVPR), pp. 2168-2177
Zhang S, Zhu X, Lei Z, Shi H, Wang X, Li S Z (2017) S3fd: single shot scale-invariant face detector. In: IEEE international conference on computer vision (ICCV), pp. 192-201
Zhao G, Huang X, Taini M, Li SZ, Pietikäinen M (2011) Facial expression recognition from near-infrared videos. Image Vis Comput 29(9):607–619
Acknowledgements
This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [2020-0-01389, Artificial Intelligence Convergence Research Center (Inha University)] and by the Industrial Technology Innovation Program through the Ministry of Trade, Industry, and Energy (MI, Korea) [Development of Human-Friendly Human-Robot Interaction Technologies Using Human Internal Emotional States] under Grant 10073154.
Cite this article
Choi, D.Y., Song, B.C. Semi-supervised learning for facial expression-based emotion recognition in the continuous domain. Multimed Tools Appl 79, 28169–28187 (2020). https://doi.org/10.1007/s11042-020-09412-5