Abstract
Video inpainting aims to fill damaged regions of video frames with plausible content. Complex scenes contain cluttered or ambiguous semantics and objects, making video inpainting in such scenarios a challenging yet meaningful task. Existing methods are limited by insufficient video information, which leads to blurred results and temporal artifacts. In this paper, we design a novel semantic-guided completion network that exploits the semantic information of videos to complete missing regions in complex urban scenes. Specifically, we first leverage semantic information to model the structure and content of the video, and improve the U-Net architecture to complete the corrupted semantic map. We then propose a module based on spatially-adaptive normalization that combines semantic information to guide the generation of pixels in the damaged regions of the video. Quantitative and qualitative results on two publicly available urban scene datasets demonstrate our model's ability to generate reasonable and accurate content.
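The spatially-adaptive normalization the abstract refers to can be illustrated with a minimal sketch: features are normalized without learned affine parameters, and a per-pixel scale and shift are then predicted from the semantic map. The sketch below is a simplified NumPy illustration, not the paper's implementation; the linear projections `gamma_w` and `beta_w` are hypothetical stand-ins for the learned convolutions of the real module.

```python
import numpy as np

def spade_normalize(x, segmap, gamma_w, beta_w, eps=1e-5):
    """Minimal sketch of spatially-adaptive normalization.

    x:       feature map of shape (C, H, W)
    segmap:  one-hot semantic map of shape (S, H, W)
    gamma_w, beta_w: (C, S) projection weights standing in for the
    learned conv layers that predict modulation parameters.
    """
    # Parameter-free per-channel normalization of the feature map.
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True)
    x_norm = (x - mu) / (sigma + eps)

    # Spatially-varying scale and shift predicted from the semantic map,
    # so modulation differs per pixel according to the semantic class.
    gamma = np.einsum('cs,shw->chw', gamma_w, segmap)
    beta = np.einsum('cs,shw->chw', beta_w, segmap)
    return x_norm * (1 + gamma) + beta
```

Because the scale and shift vary spatially with the semantic layout, completed pixels can follow the class boundaries of the repaired semantic map rather than a single global statistic.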
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Wang, J., Xuan, H., Wu, Z. (2024). Semantic-Guided Completion Network for Video Inpainting in Complex Urban Scene. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14435. Springer, Singapore. https://doi.org/10.1007/978-981-99-8552-4_18
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8551-7
Online ISBN: 978-981-99-8552-4