Multi-modal Conditional Feature Enhancement for Facial Action Unit Recognition

Lakshminarayana, Nagashri N.; Mohan, Deen Dayal; Sankaran, Nishant; Setlur, Srirangaraj; Govindaraju, Venu

doi:10.1007/978-3-030-30671-7_7

Chapter
First Online: 09 January 2020

Abstract

Current state-of-the-art methods in multi-modal fusion typically map the features of each modality onto a new shared representation space, aiming to improve performance by combining the individual modalities. However, heavily fine-tuned feature representations often have strong discriminability in their own spaces, and this may be lost in the fused subspace because information from multiple sources is compressed. To address this, we propose a new approach to fusion that enhances the individual feature spaces through information exchange between the modalities. Essentially, domain adaptation is learnt by building a shared representation that is used to mutually enhance each domain’s knowledge. In particular, the learning objective is modeled to modify the features with the overarching goal of improving the combined system performance. We apply our fusion method to the task of facial action unit (AU) recognition by learning to enhance the thermal and visible feature representations. We compare our approach with other recent fusion schemes and demonstrate its effectiveness on the MMSE dataset, where it outperforms previous techniques.
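The abstract does not give architectural details, so the following is only an illustrative sketch of the general idea it describes: a shared representation is built from both modalities and then used to adjust each modality's features inside its own space, instead of replacing them with a single fused vector. The gated residual form, the dimensions, the random weights, and every name here (`exchange_and_enhance`, `enhance`, `w_shared`, and so on) are assumptions made for illustration, not the authors' model.

```python
import math
import random


def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights standing in for learned parameters (assumption)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def enhance(features, shared, w_gate, w_delta):
    """Gated residual update of one modality, conditioned on the shared code.

    The output has the same dimensionality as the input, so the modality
    stays in its own feature space (hypothetical formulation).
    """
    gate = [sigmoid(g) for g in matvec(w_gate, shared)]
    delta = [math.tanh(d) for d in matvec(w_delta, shared)]
    return [f + g * d for f, g, d in zip(features, gate, delta)]


def exchange_and_enhance(thermal, visible, params):
    """Build a shared representation, then enhance each modality with it."""
    # `thermal + visible` is list concatenation of the two feature vectors.
    shared = [math.tanh(s) for s in matvec(params["w_shared"], thermal + visible)]
    thermal_enh = enhance(thermal, shared, params["w_gate_t"], params["w_delta_t"])
    visible_enh = enhance(visible, shared, params["w_gate_v"], params["w_delta_v"])
    return thermal_enh, visible_enh


# Toy dimensions and inputs, chosen only for the demonstration.
rng = random.Random(0)
DIM_T, DIM_V, DIM_S = 4, 6, 3
params = {
    "w_shared": random_matrix(DIM_S, DIM_T + DIM_V, rng),
    "w_gate_t": random_matrix(DIM_T, DIM_S, rng),
    "w_delta_t": random_matrix(DIM_T, DIM_S, rng),
    "w_gate_v": random_matrix(DIM_V, DIM_S, rng),
    "w_delta_v": random_matrix(DIM_V, DIM_S, rng),
}
thermal = [rng.gauss(0.0, 1.0) for _ in range(DIM_T)]
visible = [rng.gauss(0.0, 1.0) for _ in range(DIM_V)]
thermal_enh, visible_enh = exchange_and_enhance(thermal, visible, params)
```

In this sketch the enhanced thermal and visible vectors keep their original sizes, and each would still feed its own AU classifier. In a trained system the weights would be learned against the combined recognition objective; here they are random placeholders.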

N. N. Lakshminarayana, D. D. Mohan and N. Sankaran contributed equally; these authors are listed in alphabetical order.

Acknowledgements

This material is based upon work partially supported by the National Science Foundation under Grant IIP #1266183.

Author information

Authors and Affiliations

University at Buffalo, Buffalo, NY, 14226, USA
Nagashri N. Lakshminarayana, Deen Dayal Mohan, Nishant Sankaran, Srirangaraj Setlur & Venu Govindaraju

Corresponding author

Correspondence to Deen Dayal Mohan.

Editor information

Editors and Affiliations

Indraprastha Institute of Information Technology Delhi, New Delhi, India
Richa Singh
Indraprastha Institute of Information Technology Delhi, New Delhi, India
Mayank Vatsa
Johns Hopkins University, Baltimore, MD, USA
Vishal M. Patel
IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA
Nalini Ratha

About this chapter

Cite this chapter

Lakshminarayana, N.N., Mohan, D.D., Sankaran, N., Setlur, S., Govindaraju, V. (2020). Multi-modal Conditional Feature Enhancement for Facial Action Unit Recognition. In: Singh, R., Vatsa, M., Patel, V., Ratha, N. (eds) Domain Adaptation for Visual Understanding. Springer, Cham. https://doi.org/10.1007/978-3-030-30671-7_7

DOI: https://doi.org/10.1007/978-3-030-30671-7_7
Published: 09 January 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30670-0
Online ISBN: 978-3-030-30671-7
eBook Packages: Computer Science, Computer Science (R0)
