Semantic-Guided Completion Network for Video Inpainting in Complex Urban Scene

  • Conference paper

Pattern Recognition and Computer Vision (PRCV 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14435)

Abstract

Video inpainting aims to fill damaged regions of video frames with plausible content. Complex scenes contain cluttered or ambiguous semantics and objects, making video inpainting in such scenarios a challenging yet meaningful task. Current methods are limited by the lack of sufficient video information, resulting in blurred results and temporal artifacts. In this paper, we design a novel semantic-guided completion network that uses the semantic information of the video to complete missing regions in complex urban scenes. Specifically, we first leverage the semantic information to model the structure and content of the video, and improve the U-Net architecture to complete the broken semantic map. Then, we propose a module based on spatially-adaptive normalization that guides the generation of the damaged video pixels by incorporating semantic information. Our model’s ability to generate reasonable and accurate content is demonstrated through both quantitative and qualitative results on two publicly available urban scene datasets.
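The spatially-adaptive normalization the abstract refers to can be sketched as follows. This is a minimal NumPy illustration of the general SPADE mechanism (Park et al., CVPR 2019), not the paper's actual implementation; the function name, tensor shapes, and the instance-norm-style statistics are assumptions made for exposition. Generator features are normalized per channel, then modulated by per-pixel scale (`gamma`) and shift (`beta`) maps that, in this paper's setting, would be predicted by small convolutional layers from the completed semantic map.

```python
import numpy as np

def spade_modulate(features, gamma, beta, eps=1e-5):
    """Spatially-adaptive normalization in the spirit of SPADE.

    features: (C, H, W) generator activations for one frame
    gamma, beta: (C, H, W) per-pixel modulation maps, assumed here to be
        predicted from the completed semantic segmentation map
    """
    # Normalize each channel over its spatial extent (instance-norm style)
    mean = features.mean(axis=(1, 2), keepdims=True)
    std = features.std(axis=(1, 2), keepdims=True)
    normalized = (features - mean) / (std + eps)
    # Re-introduce the semantic layout via spatially varying scale and shift,
    # so pixels inside the hole are synthesized consistently with their labels
    return gamma * normalized + beta
```

Because `gamma` and `beta` vary per pixel rather than per channel, the semantic layout survives normalization, which is what lets a segmentation map steer the content generated inside the damaged region.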



Author information

Correspondence to Zhiliang Wu.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wang, J., Xuan, H., Wu, Z. (2024). Semantic-Guided Completion Network for Video Inpainting in Complex Urban Scene. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14435. Springer, Singapore. https://doi.org/10.1007/978-981-99-8552-4_18

  • DOI: https://doi.org/10.1007/978-981-99-8552-4_18

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8551-7

  • Online ISBN: 978-981-99-8552-4
