
Facial expression recognition through adaptive learning of local motion descriptor

Published in: Multimedia Tools and Applications

Abstract

A novel bag-of-words based approach is proposed for recognizing facial expressions corresponding to each of the six basic prototypic emotions from a video sequence. Each video sequence is represented as a specific combination of local (in spatio-temporal scale) motion patterns. These local motion patterns are captured in motion descriptors (MDs), which are unique combinations of optical flow and image gradient. These MDs play the role of words in the bag-of-words setting. In the literature, the key-words in the wordbook are generally rigid, i.e., they are taken as-is from the training data and do not generalize well. We propose a novel adaptive learning technique for the key-words. The adapted key-MDs better represent the local motion patterns of the videos, generalize well to unseen data, and thus give better expression recognition accuracy. To test the efficiency of the proposed approach, we have experimented extensively on three well-known datasets and compared the results with existing state-of-the-art expression descriptors; our method gives better accuracy. The proposed approach reduces training time (including feature extraction) by more than a factor of nine, and test time by more than a factor of two, compared to the current state-of-the-art descriptor.
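As a rough illustration of this bag-of-words setting, the sketch below encodes one video's MDs as a normalized histogram over their nearest key-MDs. The function name, the hard nearest-key-MD assignment, and the normalization are our assumptions for illustration, not the paper's exact expression descriptor (ED).

```python
import numpy as np

def encode_video(mds, key_mds):
    """Encode one video's motion descriptors (MDs) as a normalized
    histogram over their nearest key-MDs (bag-of-words encoding).

    mds:     (n, d) array, one local motion descriptor per row
    key_mds: (K, d) array, the learned key-MDs (the wordbook)
    """
    # Pairwise distances: rows = this video's MDs, cols = key-MDs
    dists = np.linalg.norm(mds[:, None, :] - key_mds[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)                    # nearest key-MD per MD
    hist = np.bincount(nearest, minlength=len(key_mds)).astype(float)
    return hist / hist.sum()                          # normalized descriptor
```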




Acknowledgments

This work was supported by DST, Govt. of India project no. SR/WOS-A/ET-53/2012(G).

Author information

Corresponding author

Correspondence to Swapna Agarwal.

Appendix A: Convergence of the Adaptive Learning

Suppose that in the $i$th iteration of the while loop of Algorithm 1, $\xi$ MDs $(Q_1, Q_2, \ldots, Q_\xi)$ find key-MD $P_l$ as their nearest key-MD. For illustration, a typical positioning of the $\xi$ MDs and the key-MD $P_l$ in 2D space is shown in Fig. 14. Let the distance between the key-MD $P_l$ and MD $Q_k$, $k=1,2,\ldots,\xi$, be $d_k$, and let $S=\sum_{k=1}^{\xi} d_k$, i.e., $S$ is the sum of the distances from the key-MD $P_l$ to the MDs $Q_k$, $k=1,2,\ldots,\xi$. Without loss of generality, consider the MD $Q_1$. After moving $P_l$ by a fraction ($\delta$) of the distance $d_1$ towards $Q_1$, let the new position of the key-MD be $P^1_l$. Let the new distance between the key-MD $P^1_l$ and MD $Q_k$ be $d^1_k$, and let $S^1=\sum_{k=1}^{\xi} d^1_k$. We want $S$ to decrease with each iteration. Let the angle between the straight lines joining $Q_k$ to $P_l$ and $P_l$ to $P^1_l$ be $\theta_k$, and let $\lambda^1 = d_1 - d^1_1$. Then, by the law of cosines, $d^1_k - d_k = \sqrt{(d_k)^2 + (\lambda^1)^2 - 2 d_k \lambda^1 \cos\theta_k} - d_k$. It can be shown that,

$$ \frac{dS^{1}}{d\lambda^{1}}=\frac{d}{d\lambda^{1}}\left\{\sum_{k=1}^{\xi} d^{1}_{k}\right\} =-\sum_{k=1}^{\xi}\cos\theta_{k} =-1-\sum_{k=2}^{\xi}\cos\theta_{k}. $$
(7)
Fig. 14 A typical positioning of the $\xi$ MDs, the key-MD $P_{l}$ and its new position $P^{1}_{l}$ in 2D space

From (7), the rate of change of $S^{1}$ with respect to $\lambda^{1}$ is negative, i.e., the key-MD converges towards the corresponding MDs, whenever $\sum_{k=2}^{\xi}\cos\theta_{k} > -1$.
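As a minimal sketch of the update analysed here, the following assumes each MD pulls its nearest key-MD a fraction $\delta$ of the way towards itself; the value of $\delta$, the fixed iteration count, and the function name are illustrative assumptions, not a verbatim transcription of Algorithm 1 (whose stopping rule is not reproduced).

```python
import numpy as np

def adapt_key_mds(mds, key_mds, delta=0.05, n_iters=50):
    """Adaptive key-MD update: for each MD Q, move its nearest key-MD
    P_l a fraction delta of the distance d towards Q, i.e.
    P_l <- P_l + delta * (Q - P_l)."""
    key_mds = key_mds.copy()
    for _ in range(n_iters):                 # stand-in for the while loop
        for q in mds:
            # index l of the nearest key-MD P_l for this MD Q
            l = np.linalg.norm(key_mds - q, axis=1).argmin()
            # move P_l by a fraction delta of d towards Q
            key_mds[l] += delta * (q - key_mds[l])
    return key_mds
```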

Let $\zeta_{k,j}$, $k,j=1,2,\ldots,\xi$, denote the distance between the two MDs $Q_j$ and $Q_k$. Then, from the expression for $d^1_k - d_k$ above, $d^1_k < d_k$ when $(d_k)^2 + (d_1)^2 > (\zeta_{k,1})^2$, i.e., when the angle $\theta_k$ is acute. Similar inequalities follow when we consider each of the other MDs $Q_k$, $k=2,\ldots,\xi$. Therefore, the key-MD $P_l$ converges towards the cluster of MDs $Q_k$, $k=1,2,\ldots,\xi$, as long as $P_l$ remains outside the hypersphere whose diameter is the line segment joining the two most distant MDs among $Q_k$, $k=1,2,\ldots,\xi$. We next find a lower bound on $S - S^\xi$, where $S^\xi$ is the sum of the distances from the key-MD $P_l$ to all the MDs $Q_k$, $k=1,2,\ldots,\xi$, after one full iteration of the while loop of Algorithm 1. From Fig. 14 we get (by the triangle inequality),

$$\begin{aligned}
&\sum_{k=2}^{\xi} d^{1}_{k} + (\xi-1)\,d^{1}_{1} \leq \sum_{k=2}^{\xi} d_{k} + (\xi-1)\,d_{1} \\
\Rightarrow\ &\sum_{k=1}^{\xi} d_{k} - \sum_{k=1}^{\xi} d^{1}_{k} \geq (\xi-2)\left(d^{1}_{1}-d_{1}\right). \\
\text{Similarly,}\ &\sum_{k=1}^{\xi} d^{1}_{k} - \sum_{k=1}^{\xi} d^{2}_{k} \geq (\xi-2)\left(d^{2}_{2}-d^{1}_{2}\right), \\
&\qquad\vdots \\
&\sum_{k=1}^{\xi} d^{\xi-1}_{k} - \sum_{k=1}^{\xi} d^{\xi}_{k} \geq (\xi-2)\left(d^{\xi}_{\xi}-d^{\xi-1}_{\xi}\right).
\end{aligned}$$
(8)

Summing the above $\xi$ inequalities, we get,

$$\begin{aligned}
&\sum_{k=1}^{\xi} d_{k} - \sum_{k=1}^{\xi} d^{\xi}_{k} \geq (\xi-2)\left(d^{1}_{1}-d_{1}+d^{2}_{2}-d^{1}_{2}+ \cdots + d^{\xi}_{\xi}-d^{\xi-1}_{\xi}\right) \\
\Rightarrow\ &S-S^{\xi} \geq -(\xi-2)\left(\lambda^{1} + \lambda^{2} + \cdots + \lambda^{\xi}\right).
\end{aligned}$$
(9)

Since the $\lambda^{k}$, $k=1,2,\ldots,\xi$, are assumed to be very small, the magnitude of the right-hand side of (9) is also very small. We next find the value of $P_{l}$ for which $S^{\xi}$ attains its minimum. Since $S^{\xi}$ is a sum of distances it is non-negative, and it is minimized by the value of $P_{l}$ that minimizes $\sum_{k=1}^{\xi}(Q_{k}-P_{l})^{2}$. Define $S'=\sum_{k=1}^{\xi}(Q_{k}-P_{l})^{2}$. Therefore,

$$\begin{aligned}
dS' &= -2\sum_{k=1}^{\xi} (Q_{k}-P_{l})\,dP_{l} = -2\xi\left\{\left(\sum_{k=1}^{\xi} Q_{k}/\xi\right) - P_{l}\right\} dP_{l} \\
&= -2\xi\,(\bar{Q}-P_{l})\,dP_{l}.
\end{aligned}$$
(10)

In (10), $\bar{Q}$ denotes the arithmetic mean of the MDs $Q_{k}$, $k=1,2,\ldots,\xi$. From (10), $dS'$ vanishes, and therefore $S'$ attains its minimum, when $P_{l}=\bar{Q}$. The characteristic and biggest advantage of the adaptive learning technique is that the system changes gracefully with changes in its environment: the key-MD under adaptive learning retains the experience of past learning. This would not be possible if we simply took $\bar{Q}$ as the key-MD. Therefore, we represent the video sequences (training and test) in terms of the ED constructed using the adapted key-MDs.
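A toy numerical check of this closing argument, with synthetic data and assumed constants: the mean $\bar{Q}$ minimizes $S'$, while repeated adaptive updates drive the key-MD into a small neighbourhood of $\bar{Q}$ rather than jumping onto it.

```python
import numpy as np

# S' = sum_k ||Q_k - P||^2 is minimized at the arithmetic mean Q-bar;
# the adaptive update pulls the key-MD towards Q-bar gradually.
rng = np.random.default_rng(0)
Q = rng.normal(size=(100, 8))            # a toy cluster of MDs
q_bar = Q.mean(axis=0)                   # arithmetic mean Q-bar

def s_prime(p):
    return ((Q - p) ** 2).sum()          # sum of squared distances S'

p = rng.normal(size=8) + 5.0             # key-MD started off the cluster
for _ in range(200):                     # repeated adaptive updates
    for q in Q:
        p += 0.01 * (q - p)              # P_l <- P_l + delta * (Q - P_l)

assert s_prime(q_bar) <= s_prime(p)      # the mean is the true minimizer
print(np.linalg.norm(p - q_bar))         # small: adapted key-MD near Q-bar
```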


Cite this article

Agarwal, S., Mukherjee, D.P. Facial expression recognition through adaptive learning of local motion descriptor. Multimed Tools Appl 76, 1073–1099 (2017). https://doi.org/10.1007/s11042-015-3103-6
