
MF2ShrT: Multimodal Feature Fusion Using Shared Layered Transformer for Face Anti-spoofing

Published: 08 March 2024

Abstract

In recent years, face anti-spoofing (FAS) has gained significant attention in both academia and industry. Although various convolutional neural network (CNN)-based solutions have emerged, multimodal approaches that combine RGB, depth, and infrared (IR) images have outperformed unimodal classifiers. Because modern presentation attack instruments are increasingly realistic, the performance of such models must be improved continually. Recently, self-attention-based vision transformers (ViTs) have become a popular choice in this field, but their fundamental aspects for multimodal FAS have not yet been thoroughly explored. We therefore propose MF2ShrT, a novel FAS framework based on a pretrained vision transformer. The framework uses overlapping patches and parameter sharing in the ViT network, allowing it to exploit multiple modalities in a computationally efficient manner. Furthermore, to fuse intermediate features from the different encoders of each ViT effectively, we explore a T-encoder-based hybrid feature block that enables the system to identify correlations and dependencies across modalities. MF2ShrT outperforms conventional vision transformers and achieves state-of-the-art performance on the CASIA-SURF and WMCA benchmarks, demonstrating the effectiveness of transformer-based models for presentation attack detection (PAD).
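The abstract names three mechanisms without giving their details: overlapping patches, parameter sharing, and cross-modal fusion of intermediate encoder features. The Python sketch below (standard library only) illustrates what each one costs or does under stated assumptions; it is not the authors' implementation. The helper names (`num_patches`, `block_params`, `encoder_params`, `fuse_tokens`, `self_attention`) are hypothetical. The sizes are standard ViT-Base values (224-pixel input, 16-pixel patches, width 768, 12 layers) used only for illustration, and the overlap stride of 12 is an arbitrary example. The abstract does not say whether weights are shared across layers, across modalities, or both, so the sharing switches cover each case. Fusion is shown as plain token concatenation followed by single-head attention with identity projections, a simplified stand-in for the paper's T-encoder-based hybrid feature block.

```python
import math
from typing import Dict, List

Vector = List[float]


def num_patches(img_size: int, patch_size: int, stride: int) -> int:
    """Tokens produced by sliding a patch_size window with the given stride over
    a square image without padding. stride == patch_size is the standard
    non-overlapping ViT split; stride < patch_size yields overlapping patches."""
    if stride <= 0 or not 0 < patch_size <= img_size:
        raise ValueError("invalid patch configuration")
    per_side = (img_size - patch_size) // stride + 1
    return per_side * per_side


def block_params(dim: int, mlp_ratio: int = 4) -> int:
    """Weights plus biases of one standard pre-norm ViT encoder block."""
    hidden = dim * mlp_ratio
    attention = 4 * dim * dim + 4 * dim    # QKV (3d x d) + output projection, with biases
    mlp = 2 * dim * hidden + hidden + dim  # fc1 (d -> hidden) + fc2 (hidden -> d), with biases
    norms = 2 * 2 * dim                    # two LayerNorms, scale and shift each
    return attention + mlp + norms


def encoder_params(dim: int, depth: int, n_modalities: int,
                   share_layers: bool, share_modalities: bool) -> int:
    """Encoder-block parameters for n_modalities streams of `depth` layers.
    The sharing switches are hypothetical: each True collapses that axis to a
    single reused set of weights."""
    distinct_layers = 1 if share_layers else depth
    distinct_streams = 1 if share_modalities else n_modalities
    return block_params(dim) * distinct_layers * distinct_streams


def fuse_tokens(features: Dict[str, List[Vector]]) -> List[Vector]:
    """Concatenate per-modality token sequences (e.g. one intermediate layer
    from each modality encoder) into one sequence so that a further encoder
    can attend across modalities. Keys are sorted for a reproducible order."""
    fused: List[Vector] = []
    for name in sorted(features):
        fused.extend(features[name])
    return fused


def self_attention(tokens: List[Vector]) -> List[List[float]]:
    """Single-head scaled dot-product attention weights, softmax(Q K^T / sqrt(d)),
    with identity projections (Q = K = tokens) for illustration. Row i shows how
    strongly token i attends to every token of the fused sequence."""
    d = len(tokens[0])
    weights: List[List[float]] = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


if __name__ == "__main__":
    print(num_patches(224, 16, 16))  # 196: standard non-overlapping split
    print(num_patches(224, 16, 12))  # 324: overlapping patches, stride 12
    print(block_params(768))         # 7087872: one ViT-Base-sized block
    print(encoder_params(768, 12, 3, share_layers=False, share_modalities=False))  # 255163392
    print(encoder_params(768, 12, 3, share_layers=False, share_modalities=True))   # 85054464
    fused = fuse_tokens({"rgb": [[1.0, 0.0]], "depth": [[0.9, 0.1]], "ir": [[0.0, 1.0]]})
    for row in self_attention(fused):
        print([round(w, 3) for w in row])
```

Under these assumptions, overlapping patches raise the token count per modality from 196 to 324, which increases attention cost quadratically in sequence length, while sharing encoder weights across three modality streams cuts block parameters from about 255M to about 85M. Balancing the two is one plausible reading of the abstract's claim of computational efficiency.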

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 6
June 2024
715 pages
EISSN: 1551-6865
DOI: 10.1145/3613638
Editor: Abdulmotaleb El Saddik

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

Published: 08 March 2024
Online AM: 25 January 2024
Accepted: 08 January 2024
Revised: 04 December 2023
Received: 09 August 2023
Published in TOMM Volume 20, Issue 6


Author Tags

1. Face anti-spoofing
2. Presentation attack detection
3. Multimodal
4. Vision transformer

Qualifiers

• Research-article

Bibliometrics

Article Metrics

• Downloads (last 12 months): 432
• Downloads (last 6 weeks): 46
Reflects downloads up to 16 Feb 2025

Cited By

• (2025) MuST-GAN MFAS: Multi-semantic spoof tracer GAN with transformer layers for multi-modal face anti-spoofing. The Computer Journal. DOI: 10.1093/comjnl/bxaf011. Online publication date: 12-Feb-2025.
• (2025) Unmasking Deception: A Comprehensive Survey on the Evolution of Face Anti-spoofing Methods. Neurocomputing 617, 128992. DOI: 10.1016/j.neucom.2024.128992. Online publication date: Feb-2025.
• (2024) Algorithm of face anti-spoofing based on pseudo-negative features generation. Frontiers in Neuroscience 18. DOI: 10.3389/fnins.2024.1362286. Online publication date: 12-Apr-2024.
• (2024) A deep face spoof detection framework using multi-level ELBPs and stacked LSTMs. Signal, Image and Video Processing 18, S1, 499–512. DOI: 10.1007/s11760-024-03169-2. Online publication date: 28-Apr-2024.
• (2024) Domain Generalization via Ensemble Stacking for Face Presentation Attack Detection. International Journal of Computer Vision 132, 12, 5759–5782. DOI: 10.1007/s11263-024-02152-1. Online publication date: 1-Dec-2024.
• (2024) Securing Faces: A GAN-Powered Defense Against Spoofing with MSRCR and CBAM. Pattern Recognition, 430–449. DOI: 10.1007/978-3-031-78201-5_28. Online publication date: 1-Dec-2024.
