Abstract
Video inpainting techniques based on deep learning have shown promise in removing unwanted objects from videos, but their misuse can lead to harmful outcomes. Current detection methods excel at identifying known forgeries yet struggle with unfamiliar ones, so a video inpainting localization method with better generalization is needed. The key hurdle lies in designing a network that can extract more generalized forgery features. A notable observation is that forged regions often differ from the original areas in their forgery traces, such as boundaries, pixel distributions, and region characteristics. These traces are prevalent across diverse inpainted videos, and harnessing them can improve a detector's generality. Based on these multi-view traces, we introduce a three-stage solution termed VIFST: 1) spatial and frequency branches that capture diverse traces (edges, pixels, and regions) from different viewpoints; 2) local feature learning via a CNN-based MaxPoolFormer; and 3) global context learning via a Transformer-based InterlacedFormer. By integrating local and global feature learning, VIFST improves fine-grained pixel-level detection. Extensive experiments demonstrate the effectiveness of our method and its superior generalization compared to state-of-the-art approaches. The source code is available on GitHub: https://github.com/lajlksdf/UVL.
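The multi-view trace idea can be illustrated with a minimal toy sketch (not the authors' implementation): from a single frame, extract a spatial edge trace and a frequency-domain high-pass residual, the two kinds of cues the spatial and frequency branches are described as capturing. The function names and the cutoff radius here are illustrative assumptions.

```python
import numpy as np

def edge_trace(frame):
    """Spatial-view trace: gradient magnitude, which highlights the
    sharp boundary an inpainted region often leaves behind."""
    gx = np.diff(frame, axis=1, prepend=frame[:, :1])
    gy = np.diff(frame, axis=0, prepend=frame[:1, :])
    return np.hypot(gx, gy)

def highpass_residual(frame, cutoff_frac=8):
    """Frequency-view trace: zero out the low-frequency center of the
    2-D spectrum and keep the high-frequency residual."""
    spec = np.fft.fftshift(np.fft.fft2(frame))
    h, w = frame.shape
    r = min(h, w) // cutoff_frac  # hypothetical cutoff choice
    mask = np.ones((h, w))
    mask[h // 2 - r:h // 2 + r, w // 2 - r:w // 2 + r] = 0
    return np.real(np.fft.ifft2(np.fft.ifftshift(spec * mask)))

# Toy frame: a flat background with a uniform "inpainted" square.
frame = np.zeros((32, 32))
frame[8:24, 8:24] = 1.0

views = np.stack([edge_trace(frame), highpass_residual(frame)])
print(views.shape)  # → (2, 32, 32)
```

On this toy frame the edge trace is large exactly on the square's boundary and zero in its interior, which is the disparity a downstream localization network could learn from.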
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Pei, P., Zhao, X., Li, J., Cao, Y. (2024). VIFST: Video Inpainting Localization Using Multi-view Spatial-Frequency Traces. In: Liu, F., Sadanandan, A.A., Pham, D.N., Mursanto, P., Lukose, D. (eds) PRICAI 2023: Trends in Artificial Intelligence. PRICAI 2023. Lecture Notes in Computer Science, vol 14327. Springer, Singapore. https://doi.org/10.1007/978-981-99-7025-4_37
DOI: https://doi.org/10.1007/978-981-99-7025-4_37
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7024-7
Online ISBN: 978-981-99-7025-4