research-article

MF²ShrT: Multimodal Feature Fusion Using Shared Layered Transformer for Face Anti-spoofing

Authors:

Aashania Antil,

Chhavi Dhiman

ACM Transactions on Multimedia Computing, Communications and Applications, Volume 20, Issue 6

Article No.: 172, Pages 1 - 21

https://doi.org/10.1145/3640817

Published: 08 March 2024

Abstract

Face anti-spoofing (FAS) has recently gained significant attention in both academia and industry. Although various convolutional neural network (CNN)-based solutions have emerged, multimodal approaches incorporating RGB, depth, and infrared (IR) data have performed better than unimodal classifiers. The increasing realism of modern presentation attack instruments creates a persistent need to improve such models. Self-attention-based vision transformers (ViT) have recently become a popular choice in this field, but their fundamental aspects for multimodal FAS have not yet been thoroughly explored. We therefore propose MF²ShrT, a novel FAS framework based on a pretrained vision transformer. The framework uses overlapping patches and parameter sharing in the ViT network, allowing it to exploit multiple modalities in a computationally efficient manner. Furthermore, to effectively fuse intermediate features from the different encoders of each ViT, we explore a T-encoder-based hybrid feature block that enables the system to identify correlations and dependencies across modalities. MF²ShrT outperforms conventional vision transformers and achieves state-of-the-art performance on the CASIA-SURF and WMCA benchmarks, demonstrating the efficiency of transformer-based models for presentation attack detection (PAD).
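
The two efficiency ideas named in the abstract, overlapping patches and parameter sharing, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how MF²ShrT shares parameters, so the sketch assumes the simplest case of one projection matrix reused for every modality. The image size, patch size, stride, embedding width, single-channel toy images, and all function names are hypothetical.

```python
# Illustrative sketch only. All sizes and names are assumptions, not values
# taken from the paper, and every modality is reduced to one channel.

def extract_patches(image, patch, stride):
    """Slide a patch x patch window over a 2D image (a list of rows).

    stride < patch yields overlapping patches; stride == patch yields the
    non-overlapping split of a standard ViT.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            window = [row[left:left + patch] for row in image[top:top + patch]]
            # Flatten each window into a single token vector.
            patches.append([value for row in window for value in row])
    return patches


def project(tokens, weights):
    """Apply one linear projection (in_dim x out_dim) to every token."""
    return [
        [sum(t * w for t, w in zip(token, column)) for column in zip(*weights)]
        for token in tokens
    ]


def toy_image(size, offset):
    """Build a deterministic size x size stand-in for one modality."""
    return [[(r * size + c + offset) % 7 for c in range(size)] for r in range(size)]


SIZE, PATCH, STRIDE, DIM = 8, 4, 2, 3
modalities = {
    "rgb": toy_image(SIZE, 0),
    "depth": toy_image(SIZE, 1),
    "ir": toy_image(SIZE, 2),
}

# One weight matrix reused by all three modalities (assumed sharing scheme).
shared_weights = [
    [((i + j) % 5) / 10 for j in range(DIM)] for i in range(PATCH * PATCH)
]

embedded = {
    name: project(extract_patches(image, PATCH, STRIDE), shared_weights)
    for name, image in modalities.items()
}

tokens_overlap = len(extract_patches(modalities["rgb"], PATCH, STRIDE))
tokens_plain = len(extract_patches(modalities["rgb"], PATCH, PATCH))
shared_params = PATCH * PATCH * DIM
separate_params = shared_params * len(modalities)

print(tokens_overlap, tokens_plain)    # tokens per image: overlapping vs. plain
print(shared_params, separate_params)  # projection weights: shared vs. per modality
```

In this toy setting, overlapping windows produce more tokens per image than a non-overlapping split, while sharing the projection keeps the weight count independent of the number of modalities.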

Index Terms

MF²ShrT: Multimodal Feature Fusion Using Shared Layered Transformer for Face Anti-spoofing
1. Applied computing
  1. Life and medical sciences
    1. Bioinformatics
2. Computing methodologies

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 20, Issue 6

June 2024

715 pages

EISSN:1551-6865

DOI:10.1145/3613638

Editor:
Abdulmotaleb El Saddik
Mohamed Bin Zayed University of Artificial Intelligence, UAE and University of Ottawa, Canada

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 March 2024

Online AM: 25 January 2024

Accepted: 08 January 2024

Revised: 04 December 2023

Received: 09 August 2023

Published in TOMM Volume 20, Issue 6
