Abstract
Facial action unit (AU) recognition has attracted great attention because of its applications in a wide range of fields. Missing labels and class imbalance (CIB) are both challenges for AU recognition. Missing labels means that only partial label assignments are available for the training samples. CIB appears in two forms: first, the number of positive AUs is much smaller than the number of negative AUs in each expression image; second, the rate of positive samples differs significantly across AUs. Both missing labels and CIB degrade AU recognition performance. In this work, we propose to handle these two challenges simultaneously. Specifically, we formulate AU recognition with missing labels as a multi-label learning with missing labels (MLML) problem, which handles the missing label challenge naturally. However, unlike most existing MLML approaches, which usually employ the same whole-image features for all classes, we select the most relevant features for each AU. To handle the CIB challenge, we further introduce class cardinality bounds, which constrain the number of positive AUs for each data instance as well as the number of positive labels for each AU over the whole dataset. The class cardinality bounds serve as linear constraints on the objective function, which makes the optimization problem NP-hard. We therefore present a convex approximation based on the Lovász extension, which leads to a linear program that can be solved efficiently by the alternating direction method of multipliers (ADMM). Experimental results on both posed and spontaneous facial expression datasets demonstrate the superiority of the proposed method over state-of-the-art methods.
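As a rough illustration of the Lovász extension mentioned above, the following minimal sketch evaluates the extension of a generic set function at a real-valued vector. It is not the objective used in this work: the function names `lovasz_extension` and `cut_value`, the toy graph `EDGES`, and the choice of a graph-cut function as the submodular example are all assumptions made for illustration.

```python
def lovasz_extension(set_fn, w):
    """Evaluate the Lovász extension of set_fn at the vector w.

    set_fn maps a frozenset of indices to a number, with set_fn(frozenset()) == 0.
    """
    # Visit coordinates in order of decreasing value.
    order = sorted(range(len(w)), key=lambda i: -w[i])
    value = 0.0
    prefix = set()
    prev = set_fn(frozenset())
    for i in order:
        prefix.add(i)
        cur = set_fn(frozenset(prefix))
        # Weight each marginal gain by the coordinate that triggers it.
        value += w[i] * (cur - prev)
        prev = cur
    return value


# Hypothetical toy graph: edge (i, j) -> weight. Illustrative only.
EDGES = {(0, 1): 1.0, (1, 2): 2.0, (0, 2): 0.5}


def cut_value(subset):
    """Total weight of edges with exactly one endpoint in subset (a submodular function)."""
    return sum(wt for (i, j), wt in EDGES.items() if (i in subset) != (j in subset))


# On a 0/1 indicator vector the extension reproduces the set function itself.
indicator_value = lovasz_extension(cut_value, [1, 0, 1])

# On a fractional vector it gives a continuous relaxation of the set function.
relaxed_value = lovasz_extension(cut_value, [0.2, 0.9, 0.5])
```

For a cut function, the extension coincides with the weighted sum of absolute differences between connected coordinates, which is convex in `w`. This convexity (which holds whenever the set function is submodular) is what makes such a relaxation usable inside a convex solver.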
Notes
The details of the ADMM-based optimization process are presented in the Supplementary Material.
The evaluation data from the two databases are available at: https://pan.baidu.com/s/1Hz72YNVBvQt-LFCAY4c43w.
Acknowledgments
Yongqiang Li is supported by the National Natural Science Foundation of China (No. 61402129) and by Postdoctoral Foundation Projects (No. LBH-Z14090, No. 2015M571417 and No. 2017T100243). Baoyuan Wu is supported by the Tencent AI Lab Foundation. Hongxun Yao is partially supported by the National Natural Science Foundation of China (No. 61472103) and Key Program (No. 61133003).
Cite this article
Li, Y., Wu, B., Zhao, Y. et al. Handling missing labels and class imbalance challenges simultaneously for facial action unit recognition. Multimed Tools Appl 78, 20309–20332 (2019). https://doi.org/10.1007/s11042-018-6836-1
DOI: https://doi.org/10.1007/s11042-018-6836-1