ABSTRACT
Existing facial expression recognition (FER) methods are mainly trained in a setting where all expression classes are fixed in advance. In real applications, however, expression classes become increasingly fine-grained and arrive incrementally. To handle sequential expression classes, these models can be fine-tuned or re-trained, but doing so often leads to poor performance or heavy consumption of computing resources. To address these problems, we develop an Incremental Facial Expression Recognition Network (IExpressNet), which can learn a competitive multi-class classifier at any time while requiring fewer computing resources. Specifically, IExpressNet consists of two novel components. First, we construct an exemplar set by dynamically selecting representative samples from old expression classes; the exemplar set and the samples of the new expression classes together constitute the training set. Second, we design a novel center-expression-distilled loss. For facial expressions in the wild, this loss enhances the discriminative power of the deeply learned features and prevents catastrophic forgetting. Extensive experiments are conducted on two large-scale in-the-wild FER datasets, RAF-DB and AffectNet. The results demonstrate the superiority of the proposed method over state-of-the-art incremental learning approaches.
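The abstract does not include code, so the following is a minimal NumPy sketch of how the two components are commonly realized in class-incremental learning: herding-style exemplar selection (in the spirit of iCaRL) and a total loss combining a center-loss term with a knowledge-distillation term. All function names and the weighting hyperparameters (`lam_center`, `lam_distill`) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def select_exemplars(features, m):
    # Herding-style selection: greedily pick samples whose running mean
    # best approximates the class mean in feature space.
    mu = features.mean(axis=0)
    selected, chosen_sum = [], np.zeros_like(mu)
    for k in range(1, m + 1):
        # distance of each candidate running mean to the class mean
        dists = np.linalg.norm(mu - (chosen_sum + features) / k, axis=1)
        dists[selected] = np.inf  # never pick the same sample twice
        idx = int(np.argmin(dists))
        selected.append(idx)
        chosen_sum += features[idx]
    return selected

def center_loss(features, labels, centers):
    # Center-loss term (Wen et al., 2016): pull each deep feature
    # toward the center of its expression class.
    diffs = features - centers[labels]
    return 0.5 * np.mean(np.sum(diffs ** 2, axis=1))

def distillation_loss(new_logits, old_logits, T=2.0):
    # Distillation term: keep the new model's outputs on old classes
    # close to the old model's soft targets at temperature T,
    # which is what prevents catastrophic forgetting.
    def softmax(z):
        e = np.exp(z / T - np.max(z / T, axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)
    p_old, p_new = softmax(old_logits), softmax(new_logits)
    return -np.mean(np.sum(p_old * np.log(p_new + 1e-12), axis=1))

def total_loss(ce, features, labels, centers, new_logits, old_logits,
               lam_center=0.01, lam_distill=1.0):
    # Combined objective: cross-entropy on the current training set
    # plus the center and distillation terms (weights are assumptions).
    return (ce
            + lam_center * center_loss(features, labels, centers)
            + lam_distill * distillation_loss(new_logits, old_logits))
```

During each incremental step, the exemplar set returned by `select_exemplars` would be merged with the new-class samples to form the training set, and the network would be optimized with `total_loss`.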
Index Terms
- IExpressNet: Facial Expression Recognition with Incremental Classes