Abstract
A novel bag-of-words based approach is proposed for recognizing facial expressions corresponding to each of the six basic prototypic emotions from a video sequence. Each video sequence is represented as a specific combination of local (in spatio-temporal scale) motion patterns. These local motion patterns are captured in motion descriptors (MDs), which are unique combinations of optical flow and image gradient. These MDs are analogous to the words in the bag-of-words setting. Generally, the key-words in the wordbook, as reported in the literature, are rigid, i.e., they are taken as-is from the training data and do not generalize well. We propose a novel adaptive learning technique for the key-words. The adapted key-MDs better represent the local motion patterns of the videos, generalize well to unseen data, and thus give better expression recognition accuracy. To test the efficiency of the proposed approach, we have experimented extensively on three well-known datasets and compared the results with existing state-of-the-art expression descriptors. Our method gives better accuracy. Compared to the current state-of-the-art descriptor, the proposed approach reduces the training time (including the time for feature extraction) by more than a factor of nine and the test time by more than a factor of two.
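As a rough illustrative sketch (not the authors' implementation; descriptor extraction is replaced by synthetic data, and all names and parameter values are hypothetical), the bag-of-words pipeline with adaptively learned key-words can be outlined as follows: key-MDs are initialized from training MDs, moved a fraction of the distance towards the MDs assigned to them, and each video is then represented by a histogram over the adapted key-MDs.

```python
import numpy as np

def adapt_keywords(mds, keys, delta=0.1, iters=20):
    """Move each key-MD a fraction delta towards every MD assigned to it."""
    keys = keys.copy()
    for _ in range(iters):
        # assign each MD to its nearest key-MD (Euclidean distance)
        dist = np.linalg.norm(mds[:, None, :] - keys[None, :, :], axis=2)
        nearest = dist.argmin(axis=1)
        for l in range(len(keys)):
            for q in mds[nearest == l]:
                keys[l] += delta * (q - keys[l])  # adaptive update step
    return keys

def expression_descriptor(mds, keys):
    """Normalized histogram of nearest key-MD counts for one video."""
    dist = np.linalg.norm(mds[:, None, :] - keys[None, :, :], axis=2)
    hist = np.bincount(dist.argmin(axis=1), minlength=len(keys))
    return hist / hist.sum()

rng = np.random.default_rng(0)
mds = rng.normal(size=(200, 16))               # stand-in motion descriptors
keys = mds[rng.choice(200, 8, replace=False)]  # initial key-MDs from training data
keys = adapt_keywords(mds, keys)
ed = expression_descriptor(mds, keys)          # ed sums to 1
```

The resulting histogram (the expression descriptor) would then be fed to a standard classifier; the adaptation step is what distinguishes this from a fixed wordbook.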
Acknowledgments
This work was supported by DST, Govt. of India project no. SR/WOS-A/ET-53/2012(G).
Appendix A: Convergence of the Adaptive Learning
Suppose that in the ith iteration of the while loop of Algorithm 1, ξ number of MDs \((Q_{1},Q_{2},\ldots,Q_{\xi })\) find key-MD \(P_{l}\) as their nearest key-MD. For illustration purposes, a typical positioning of the ξ MDs and the key-MD \(P_{l}\) in 2D space is shown in Fig. 14. Let the distance between the key-MD \(P_{l}\) and MD \(Q_{k}\), k=1,2,...,ξ, be represented by \(d_{k}\), and let \(S={\sum }_{k=1}^{\xi }{d_{k}}\), i.e., S represents the summation of the distances from the key-MD \(P_{l}\) to the MDs \(Q_{k}\), k=1,2,...,ξ. Without loss of generality, let us consider the MD \(Q_{1}\). After moving \(P_{l}\) by a fraction (δ) of the distance \(d_{1}\) towards \(Q_{1}\), let the new position of the key-MD be \({P^{1}_{l}}\). Let the new distance between the key-MD \({P^{1}_{l}}\) and MD \(Q_{k}\) be represented by \({d^{1}_{k}}\), and let \(S^{1}={\sum }_{k=1}^{\xi }{{d^{1}_{k}}}\). We want S to decrease with each iteration. Let the angle between the straight lines joining \(Q_{k}\) to \(P_{l}\) and \(P_{l}\) to \({P^{1}_{l}}\) be \(\theta _{k}\), and let \(\lambda ^{1}=d_{1} - {d^{1}_{1}}\). Therefore, by the cosine rule, \({d^{1}_{k}}-d_{k}=\sqrt {(d_{k})^{2}+(\lambda ^{1})^{2}-2d_{k}\lambda ^{1}\cos \theta _{k}}-d_{k}\). It can be shown that,
From (7) it can be said that the rate of change of \(S^{1}\) with respect to \(\lambda ^{1}\) is negative, i.e., the key-MD converges towards the corresponding MDs, when \({\sum }_{k=2}^{\xi }\cos \theta _{k}>-1\text {.}\)
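The single-step argument above can be sanity-checked numerically. The following sketch (with illustrative coordinates, not values from the paper) verifies that when the angle at \(P_{l}\) is acute, moving the key-MD a small fraction of \(d_{1}\) towards \(Q_{1}\) decreases the distances to both MDs, and hence decreases S:

```python
import numpy as np

# Illustrative configuration: two MDs and a key-MD P well outside
# the segment joining them (hypothetical values).
Q1 = np.array([0.0, 0.0])
Qk = np.array([1.0, 0.0])
P = np.array([0.5, 2.0])

d1 = np.linalg.norm(Q1 - P)
dk = np.linalg.norm(Qk - P)
zeta = np.linalg.norm(Q1 - Qk)

# The angle at P between P->Q1 and P->Qk is acute in this setup
assert d1**2 + dk**2 > zeta**2

# Move P by a small fraction delta of d1 towards Q1 (lambda^1 = delta * d1)
delta = 0.1
P1 = P + delta * (Q1 - P)

# Both distances shrink, so S = d1 + dk decreases after the step
assert np.linalg.norm(Q1 - P1) < d1
assert np.linalg.norm(Qk - P1) < dk
```

With an obtuse angle at \(P_{l}\) (e.g., P between the two MDs), the distance to \(Q_{k}\) could instead grow, which is why the convergence condition below matters.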
Let \(\zeta _{k,j}\), k,j=1,2,...,ξ, represent the distance between the two MDs \(Q_{j}\) and \(Q_{k}\). Therefore, from (7) we can write \({d^{1}_{k}} < d_{k}\) when \((d_{k})^{2}+(d_{1})^{2}>(\zeta _{k,1})^{2}\). Similar inequalities can be inferred when we consider all the other MDs \(Q_{k}\), k=2,...,ξ. Therefore, it can be said that the key-MD \(P_{l}\) converges towards the cluster of MDs \(Q_{k}\), k=1,2,...,ξ, as long as \(P_{l}\) remains outside the hypersphere whose diameter is the distance between the two most distant MDs among all \(Q_{k}\), k=1,2,...,ξ. We now find a lower bound on \(S-S^{\xi }\), where \(S^{\xi }\) is the summation of distances from the key-MD \(P_{l}\) to all the MDs \(Q_{k}\), k=1,2,...,ξ, after one full iteration of the while loop of Algorithm 1. From Fig. 14 we get (from the triangle inequality),
Summing up the above ξ inequalities we get,
Since \(\lambda ^{k}\), k=1,2,...,ξ, are assumed to be very small, the magnitude of the right-hand side of (9) is very small. We now find the value of \(P_{l}\) for which \(S^{\xi }\) attains its minimum. Since \(S^{\xi }\) is a summation of distances, it is a positive number. Therefore, \(S^{\xi }\) attains its minimum for the value of \(P_{l}\) that minimizes \({\sum }_{k=1}^{\xi }{(Q_{k}-P_{l})^{2}}\). Let us define \(S^{\prime }={\sum }_{k=1}^{\xi }{(Q_{k}-P_{l})^{2}}\). Therefore,
In (10), \(\bar {Q}\) refers to the arithmetic mean of the MDs \(Q_{k}\), k=1,2,...,ξ. From (10) it can be said that \(dS^{\prime }\), and therefore \(S^{\prime }\), attains its minimum when \(dP_{l} =\bar {Q}-P_{l}\), i.e., \(P_{l}=\bar {Q}\). The characteristic and biggest advantage of the adaptive learning technique is that a system under adaptive learning changes gracefully according to changes in its environment; that is, the key-MD under adaptive learning retains the experience from past learning. This is not possible if we simply take \(\bar {Q}\) to be the key-MD. Therefore, we represent the video sequences (training and test) in terms of the ED constructed using the adapted key-MDs.
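The claim that the arithmetic mean minimizes the sum of squared distances can be checked with a short sketch (synthetic data standing in for the MDs, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.normal(size=(50, 3))   # stand-in for the MDs Q_k
Q_bar = Q.mean(axis=0)         # arithmetic mean of the MDs

def s_prime(P):
    """S' = sum_k ||Q_k - P||^2 for a candidate key-MD position P."""
    return float(np.sum((Q - P) ** 2))

# S' at the mean never exceeds S' at randomly perturbed positions
assert all(s_prime(Q_bar) <= s_prime(Q_bar + rng.normal(size=3))
           for _ in range(100))
```

This confirms the stationary point of \(S^{\prime }\) is the centroid; the adaptive update is preferred precisely because, unlike the centroid, it carries over experience from past iterations.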
Agarwal, S., Mukherjee, D.P. Facial expression recognition through adaptive learning of local motion descriptor. Multimed Tools Appl 76, 1073–1099 (2017). https://doi.org/10.1007/s11042-015-3103-6