Abstract
As manipulated faces become more realistic and harder to distinguish, there is a growing demand for efficient and accurate deepfake detection. Existing CNN-based deepfake detection methods either learn a global feature representation of the whole face or learn multiple local features. However, these methods learn the global and local features independently, thus neglecting the spatial correlations between local features and global context, which are vital for identifying different forgery patterns. Therefore, in this paper, we propose the Spatial Interaction Network (SI-Net), a deepfake detection method that concurrently mines potential complementary and co-occurrent features between local texture and global context. Specifically, we first utilize a region feature extractor that distills local features from the global features, simplifying the procedure of local feature extraction. We then propose a spatial-aware transformer to learn co-occurrence features from local texture and global context concurrently, and we capture attended features from the local regions according to their importance. The final prediction is made through the composite consideration of the aforementioned modules. Experimental results on two public datasets, FaceForensics++ and WildDeepfake, demonstrate the superior performance of SI-Net compared with state-of-the-art methods.
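The interaction the abstract describes can be illustrated with a minimal, dependency-light sketch. This is not the authors' implementation: the region layout (fixed quadrants), the single unlearned attention head, and all variable names (`feat`, `local`, `glob`, `cooccur`, `importance`) are assumptions made purely to show the flow of distilling local features from a global feature map, letting them attend to the global context, and aggregating them by importance.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical shapes: a CNN backbone yields an H x W x C global feature map.
rng = np.random.default_rng(0)
H = W = 8
C = 16
feat = rng.normal(size=(H, W, C))  # global feature map

# Region feature extractor: distill local features from the global map
# by pooling over regions (here, fixed quadrants for simplicity).
regions = [feat[:4, :4], feat[:4, 4:], feat[4:, :4], feat[4:, 4:]]
local = np.stack([r.mean(axis=(0, 1)) for r in regions])  # (4, C) local tokens
glob = feat.mean(axis=(0, 1))                             # (C,) global context

# Spatial-aware interaction (one unlearned attention head): each local
# token attends over all local tokens plus the global context token,
# producing a co-occurrence feature per region.
tokens = np.vstack([local, glob])                # (5, C)
attn = softmax(local @ tokens.T / np.sqrt(C))    # (4, 5) attention weights
cooccur = attn @ tokens                          # (4, C) co-occurrence features

# Importance-weighted aggregation of the attended local regions.
importance = softmax(cooccur.sum(axis=1))        # (4,) region importance
fused = importance @ cooccur                     # (C,) fused representation
print(fused.shape)                               # (16,)
```

A real model would learn the query/key/value projections and the pooling regions end to end, and feed `fused` into a classifier head; this sketch only shows how local and global features can interact in one pass rather than being computed independently.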
Acknowledgements
This work is supported by the Open Funding Project of the State Key Laboratory of Communication Content Cognition (No. 20K03) and the National Natural Science Foundation of China under Grant 62076131.
Cite this article
Wang, J., Du, X., Cheng, Y. et al. SI-Net: spatial interaction network for deepfake detection. Multimedia Systems 29, 3139–3150 (2023). https://doi.org/10.1007/s00530-023-01114-w