Abstract
Video inpainting techniques based on deep learning have shown promise in removing unwanted objects from videos, but their misuse can lead to harmful outcomes. Current detection methods excel at identifying known forgeries yet struggle with unfamiliar ones, so a video inpainting localization method with better generalization is needed. The key hurdle lies in designing a network that can extract more generalized forgery features. A notable observation is that forged regions often differ from the original areas in their forgery traces, such as boundaries, pixel distributions, and region characteristics. These traces are prevalent across diverse inpainted videos, and harnessing them can improve a detector's generality. Based on these multi-view traces, we introduce a three-stage solution termed VIFST: 1) spatial and frequency branches that capture diverse traces (edges, pixels, and regions) from different viewpoints; 2) local feature learning via a CNN-based MaxPoolFormer; and 3) global context learning via a Transformer-based InterlacedFormer. By integrating local and global feature learning, VIFST improves fine-grained pixel-level detection. Extensive experiments demonstrate the effectiveness of our method and its superior generalization compared to state-of-the-art approaches. The source code is available on GitHub: https://github.com/lajlksdf/UVL.
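The multi-view trace idea can be illustrated with a minimal toy sketch (not the authors' implementation): from a single frame, extract a spatial edge trace and a frequency-domain high-pass residual, the two kinds of cues the spatial and frequency branches are described as capturing. The function names and the cutoff radius here are illustrative assumptions.

```python
import numpy as np

def edge_trace(frame):
    """Spatial-view trace: gradient magnitude, which highlights the
    sharp boundary an inpainted region often leaves behind."""
    gx = np.diff(frame, axis=1, prepend=frame[:, :1])
    gy = np.diff(frame, axis=0, prepend=frame[:1, :])
    return np.hypot(gx, gy)

def highpass_residual(frame, cutoff_frac=8):
    """Frequency-view trace: zero out the low-frequency center of the
    2-D spectrum and keep the high-frequency residual."""
    spec = np.fft.fftshift(np.fft.fft2(frame))
    h, w = frame.shape
    r = min(h, w) // cutoff_frac  # hypothetical cutoff choice
    mask = np.ones((h, w))
    mask[h // 2 - r:h // 2 + r, w // 2 - r:w // 2 + r] = 0
    return np.real(np.fft.ifft2(np.fft.ifftshift(spec * mask)))

# Toy frame: a flat background with a uniform "inpainted" square.
frame = np.zeros((32, 32))
frame[8:24, 8:24] = 1.0

views = np.stack([edge_trace(frame), highpass_residual(frame)])
print(views.shape)  # → (2, 32, 32)
```

On this toy frame the edge trace is large exactly on the square's boundary and zero in its interior, which is the disparity a downstream localization network could learn from.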
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Pei, P., Zhao, X., Li, J., Cao, Y. (2024). VIFST: Video Inpainting Localization Using Multi-view Spatial-Frequency Traces. In: Liu, F., Sadanandan, A.A., Pham, D.N., Mursanto, P., Lukose, D. (eds) PRICAI 2023: Trends in Artificial Intelligence. PRICAI 2023. Lecture Notes in Computer Science, vol 14327. Springer, Singapore. https://doi.org/10.1007/978-981-99-7025-4_37
DOI: https://doi.org/10.1007/978-981-99-7025-4_37
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7024-7
Online ISBN: 978-981-99-7025-4