Abstract
We address the task of simultaneous feature fusion and modeling of discrete ordinal outputs. We propose a novel Gaussian process (GP) auto-encoder modeling approach. In particular, we introduce GP encoders to project multiple observed features onto a latent space, while GP decoders are responsible for reconstructing the original features. Inference is performed in a novel variational framework, where the recovered latent representations are further constrained by the ordinal output labels. In this way, we seamlessly integrate the ordinal structure in the learned manifold, while attaining robust fusion of the input features. We demonstrate the representation abilities of our model on benchmark datasets from machine learning and affect analysis. We further evaluate the model on the tasks of feature fusion and joint ordinal prediction of facial action units. Our experiments demonstrate the benefits of the proposed approach compared to the state of the art.
Notes
- 1. The subscript r indicates that the process facilitates the recognition model.
- 2. For simplicity we assume an isotropic (diagonal) covariance across the dimensions.
- 3. Note that we adopt here a linear model for \(g_c(\cdot)\) as it operates on a low-dimensional non-linear manifold \(\boldsymbol{X}\), already obtained by the GP auto-encoder.
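To make the last note concrete, the following is a minimal sketch of how a linear function on a latent point can drive an ordinal output. The excerpt does not state the likelihood, so the ordered-probit form with fixed cut points is an assumption, and the names `ordinal_probs`, `w`, `thresholds`, and `sigma` are hypothetical and not taken from the paper.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordinal_probs(x, w, thresholds, sigma=1.0):
    """Class probabilities under an assumed ordered-probit likelihood.

    x          : one latent point (sequence of floats)
    w          : weights of the linear function g(x) = w^T x (hypothetical)
    thresholds : increasing interior cut points b_1 < ... < b_{C-1}
    sigma      : noise scale of the probit link (assumed)
    """
    # Linear model on the latent coordinates, as in the note.
    g = sum(wi * xi for wi, xi in zip(w, x))
    # Pad the cut points so the first and last classes are open-ended.
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    cdf = [normal_cdf((b - g) / sigma) for b in cuts]
    # p(y = c | x) is the probability mass between consecutive cut points.
    return [cdf[c + 1] - cdf[c] for c in range(len(cuts) - 1)]


if __name__ == "__main__":
    # Three ordinal levels separated by two cut points.
    print(ordinal_probs([0.0, 0.0], [1.0, 2.0], [-1.0, 1.0]))
```

Because the cut points are ordered, increasing \(g\) moves probability mass monotonically toward higher levels, which is the ordinal structure the latent space is meant to reflect.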
Acknowledgement
This work has been funded by the European Community Horizon 2020 under grant agreement No. 645094 (SEWA), and No. 688835 (DE-ENIGMA). MPD has been supported by a Google Faculty Research Award.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Eleftheriadis, S., Rudovic, O., Deisenroth, M.P., Pantic, M. (2017). Variational Gaussian Process Auto-Encoder for Ordinal Prediction of Facial Action Units. In: Lai, SH., Lepetit, V., Nishino, K., Sato, Y. (eds) Computer Vision – ACCV 2016. ACCV 2016. Lecture Notes in Computer Science, vol 10112. Springer, Cham. https://doi.org/10.1007/978-3-319-54184-6_10
DOI: https://doi.org/10.1007/978-3-319-54184-6_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54183-9
Online ISBN: 978-3-319-54184-6
eBook Packages: Computer Science, Computer Science (R0)