Abstract
Video inpainting aims to fill damaged regions of video frames with plausible content. Complex scenes contain cluttered or ambiguous semantics and objects, making video inpainting in such scenarios a challenging yet meaningful task. Existing methods are limited by insufficient video information, which leads to blurred results and temporal artifacts. In this paper, we design a novel semantic-guided completion network that exploits the semantic information of videos to complete missing regions in complex urban scenes. Specifically, we first leverage semantic information to model the structure and content of the video, and improve the U-Net architecture to complete the corrupted semantic map. We then propose a module based on spatially-adaptive normalization that combines semantic information to guide the generation of pixels in the damaged regions of the video. Quantitative and qualitative results on two publicly available urban scene datasets demonstrate our model's ability to generate reasonable and accurate content.
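The spatially-adaptive normalization the abstract refers to can be illustrated with a minimal sketch: features are normalized without learned affine parameters, and a per-pixel scale and shift are then predicted from the semantic map. The sketch below is a simplified NumPy illustration, not the paper's implementation; the linear projections `gamma_w` and `beta_w` are hypothetical stand-ins for the learned convolutions of the real module.

```python
import numpy as np

def spade_normalize(x, segmap, gamma_w, beta_w, eps=1e-5):
    """Minimal sketch of spatially-adaptive normalization.

    x:       feature map of shape (C, H, W)
    segmap:  one-hot semantic map of shape (S, H, W)
    gamma_w, beta_w: (C, S) projection weights standing in for the
    learned conv layers that predict modulation parameters.
    """
    # Parameter-free per-channel normalization of the feature map.
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True)
    x_norm = (x - mu) / (sigma + eps)

    # Spatially-varying scale and shift predicted from the semantic map,
    # so modulation differs per pixel according to the semantic class.
    gamma = np.einsum('cs,shw->chw', gamma_w, segmap)
    beta = np.einsum('cs,shw->chw', beta_w, segmap)
    return x_norm * (1 + gamma) + beta
```

Because the scale and shift vary spatially with the semantic layout, completed pixels can follow the class boundaries of the repaired semantic map rather than a single global statistic.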
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Wang, J., Xuan, H., Wu, Z. (2024). Semantic-Guided Completion Network for Video Inpainting in Complex Urban Scene. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14435. Springer, Singapore. https://doi.org/10.1007/978-981-99-8552-4_18
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8551-7
Online ISBN: 978-981-99-8552-4