VIFST: Video Inpainting Localization Using Multi-view Spatial-Frequency Traces

  • Conference paper
  • In: PRICAI 2023: Trends in Artificial Intelligence (PRICAI 2023)

Abstract

Video inpainting techniques based on deep learning have shown promise in removing unwanted objects from videos. However, their misuse can lead to harmful outcomes. While current detection methods excel at identifying known forgeries, they struggle with unfamiliar ones, so it is crucial to design a video inpainting localization method with better generalization. The key hurdle lies in devising a network that can extract more generalized forgery features. A notable observation is that forgery regions often differ from the original areas in their forgery traces, such as boundaries, pixel distributions, and region characteristics. These traces are prevalent across various inpainted videos, and harnessing them makes detection more versatile. Based on these multi-view traces, we introduce a three-stage solution termed VIFST: 1) spatial and frequency branches capture diverse traces, including edges, pixels, and regions, from different viewpoints; 2) a CNN-based MaxPoolFormer learns local features; and 3) a Transformer-based InterlacedFormer learns global context features. By integrating local and global feature learning networks, VIFST improves fine-grained pixel-level detection. Extensive experiments demonstrate the effectiveness of our method and its superior generalization compared to state-of-the-art approaches. The source code has been published on GitHub: https://github.com/lajlksdf/UVL.
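
To make the three-stage pipeline concrete, the following is a minimal PyTorch sketch, not the authors' implementation (see their GitHub repository for that). Everything here is an assumption made for illustration: the Laplacian high-pass residual standing in for the frequency branch, the PoolFormer-style pooling mixer inside `MaxPoolFormerBlock`, the plain multi-head self-attention standing in for the interlaced attention in `InterlacedFormerBlock`, and all channel/head dimensions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialFrequencyBranches(nn.Module):
    """Stage 1 (sketch): multi-view trace extraction. A fixed Laplacian
    high-pass residual stands in for the paper's frequency-domain view."""
    def __init__(self, dim=64):
        super().__init__()
        hp = torch.tensor([[-1., -1., -1.], [-1., 8., -1.], [-1., -1., -1.]]) / 8.0
        self.register_buffer("hp", hp.expand(3, 1, 3, 3).clone())  # per-channel kernel
        self.spatial = nn.Conv2d(3, dim // 2, 3, padding=1)
        self.frequency = nn.Conv2d(3, dim // 2, 3, padding=1)

    def forward(self, x):                                # x: (B, 3, H, W) frames
        res = F.conv2d(x, self.hp, padding=1, groups=3)  # high-pass residual view
        return torch.cat([self.spatial(x), self.frequency(res)], dim=1)

class MaxPoolFormerBlock(nn.Module):
    """Stage 2 (sketch): local feature learning with max pooling as the
    token mixer, in the PoolFormer style the block's name suggests."""
    def __init__(self, dim):
        super().__init__()
        self.n1, self.n2 = nn.BatchNorm2d(dim), nn.BatchNorm2d(dim)
        self.pool = nn.MaxPool2d(3, stride=1, padding=1)
        self.mlp = nn.Sequential(nn.Conv2d(dim, 4 * dim, 1), nn.GELU(),
                                 nn.Conv2d(4 * dim, dim, 1))

    def forward(self, x):
        y = self.n1(x)
        x = x + self.pool(y) - y            # local, convolution-free token mixing
        return x + self.mlp(self.n2(x))     # channel MLP

class InterlacedFormerBlock(nn.Module):
    """Stage 3 (sketch): global context learning; plain self-attention
    stands in for the paper's interlaced attention pattern."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.n1, self.n2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x):                   # x: (B, N, C) tokens
        y = self.n1(x)
        x = x + self.attn(y, y, y, need_weights=False)[0]
        return x + self.mlp(self.n2(x))

class VIFSTSketch(nn.Module):
    """End-to-end sketch: traces -> local features -> global context -> mask."""
    def __init__(self, dim=64):
        super().__init__()
        self.branches = SpatialFrequencyBranches(dim)
        self.local = MaxPoolFormerBlock(dim)
        self.glob = InterlacedFormerBlock(dim)
        self.head = nn.Conv2d(dim, 1, 1)    # per-pixel inpainting logit

    def forward(self, x):
        f = self.local(self.branches(x))                 # (B, C, H, W)
        B, C, H, W = f.shape
        t = self.glob(f.flatten(2).transpose(1, 2))      # (B, H*W, C) tokens
        f = t.transpose(1, 2).reshape(B, C, H, W)
        return self.head(f)                              # (B, 1, H, W) mask logits

mask_logits = VIFSTSketch()(torch.randn(2, 3, 64, 64))
print(mask_logits.shape)  # torch.Size([2, 1, 64, 64])
```

The design point the sketch preserves is the division of labor the abstract describes: pooling-based mixing is cheap and strictly local, while self-attention over the flattened feature map supplies the global context needed to segment an entire inpainted region rather than only its boundary traces.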

Author information

Correspondence to Xianfeng Zhao.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Pei, P., Zhao, X., Li, J., Cao, Y. (2024). VIFST: Video Inpainting Localization Using Multi-view Spatial-Frequency Traces. In: Liu, F., Sadanandan, A.A., Pham, D.N., Mursanto, P., Lukose, D. (eds.) PRICAI 2023: Trends in Artificial Intelligence. PRICAI 2023. Lecture Notes in Computer Science, vol. 14327. Springer, Singapore. https://doi.org/10.1007/978-981-99-7025-4_37

  • DOI: https://doi.org/10.1007/978-981-99-7025-4_37

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-7024-7

  • Online ISBN: 978-981-99-7025-4

  • eBook Packages: Computer Science, Computer Science (R0)
