Multi-modal Conditional Feature Enhancement for Facial Action Unit Recognition

Abstract

Current state-of-the-art methods in multi-modal fusion typically map the features of the individual modalities onto a newly generated shared representation space, with the goal of improving performance by combining modalities. However, these heavily fine-tuned feature representations are often strongly discriminative in their own spaces, and that discriminability may not survive in the fused subspace owing to the compression of information arriving from multiple sources. To address this, we propose a new approach to fusion that enhances the individual feature spaces through information exchange between the modalities. In essence, domain adaptation is learnt by building a shared representation that is used to mutually enhance each domain's knowledge. In particular, the learning objective modifies the features with the overarching goal of improving the performance of the combined system. We apply our fusion method to the task of facial action unit (AU) recognition by learning to enhance the thermal and visible feature representations. We compare our approach to other recent fusion schemes and demonstrate its effectiveness on the MMSE dataset, outperforming previous techniques.
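Only the abstract of the chapter is reproduced here, but it pins down the core mechanism: build a shared representation from both modalities, then use it to modify each modality's features in place rather than replacing them with a single fused vector. The PyTorch sketch below is a hypothetical rendering of that idea; the module structure, feature dimensions, and the choice of an additive residual update are illustrative assumptions, not details taken from the chapter.

```python
import torch
import torch.nn as nn

class ConditionalFeatureEnhancement(nn.Module):
    """Hypothetical sketch: enhance each modality's features using a
    shared representation built from both modalities, instead of
    replacing them with a single fused vector."""

    def __init__(self, dim_visible: int, dim_thermal: int,
                 dim_shared: int, num_aus: int):
        super().__init__()
        # Shared representation computed from both modalities.
        self.to_shared = nn.Sequential(
            nn.Linear(dim_visible + dim_thermal, dim_shared),
            nn.ReLU(),
        )
        # Per-modality enhancement: predict an additive residual
        # conditioned on the shared code, so each feature space is
        # modified in place (an assumed design choice).
        self.enhance_visible = nn.Linear(dim_shared, dim_visible)
        self.enhance_thermal = nn.Linear(dim_shared, dim_thermal)
        # Multi-label AU classifier over the two enhanced feature spaces.
        self.classifier = nn.Linear(dim_visible + dim_thermal, num_aus)

    def forward(self, f_visible: torch.Tensor,
                f_thermal: torch.Tensor) -> torch.Tensor:
        shared = self.to_shared(torch.cat([f_visible, f_thermal], dim=-1))
        f_visible = f_visible + self.enhance_visible(shared)
        f_thermal = f_thermal + self.enhance_thermal(shared)
        # One logit per action unit; train with BCEWithLogitsLoss.
        return self.classifier(torch.cat([f_visible, f_thermal], dim=-1))

# Example: a batch of 8 samples with illustrative feature sizes.
model = ConditionalFeatureEnhancement(dim_visible=512, dim_thermal=256,
                                      dim_shared=128, num_aus=12)
logits = model(torch.randn(8, 512), torch.randn(8, 256))
print(logits.shape)  # torch.Size([8, 12])
```

Because AU recognition is multi-label (several AUs can be active at once), a per-AU sigmoid with binary cross-entropy is the natural training objective for such a sketch.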

N. N. Lakshminarayana, D. D. Mohan, and N. Sankaran contributed equally; authors are listed in alphabetical order.

Acknowledgements

This material is based upon work partially supported by the National Science Foundation under Grant IIP #1266183.

Author information

Corresponding author

Correspondence to Deen Dayal Mohan.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Lakshminarayana, N.N., Mohan, D.D., Sankaran, N., Setlur, S., Govindaraju, V. (2020). Multi-modal Conditional Feature Enhancement for Facial Action Unit Recognition. In: Singh, R., Vatsa, M., Patel, V., Ratha, N. (eds) Domain Adaptation for Visual Understanding. Springer, Cham. https://doi.org/10.1007/978-3-030-30671-7_7

  • DOI: https://doi.org/10.1007/978-3-030-30671-7_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30670-0

  • Online ISBN: 978-3-030-30671-7

  • eBook Packages: Computer Science, Computer Science (R0)
