Abstract
The purpose of video inpainting is to synthesize plausible content to fill in the missing regions of a video. A video can be viewed as a four-dimensional signal that extends continuously along the temporal dimension, so inpainting each frame independently makes it difficult to ensure temporal continuity. Video inpainting has progressed from traditional algorithms to learning-based methods and can now handle a variety of scenes. Nevertheless, open problems remain, and video inpainting is still a challenging task. Existing work focuses on object removal and neglects the inpainting of occlusions in the middle region of the frame. For this middle-region occlusion problem, we propose a local and nonlocal optical flow video inpainting framework. First, according to the forward and backward directions of the reference frame and a sampling window, we divide the video into local and nonlocal frames, extract the local and nonlocal optical flow, and feed them to a residual network for coarse completion. Next, our approach extracts and completes the edges of the predicted flow. Finally, the composed optical flow field guides the propagation of pixels to fill in the video content. Experimental results on the DAVIS and YouTube-VOS datasets show that our method significantly improves image quality and optical flow quality compared with the state of the art. Code is available at https://github.com/lengfengio/LNFVI.git.
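To make the sampling scheme concrete, below is a minimal Python sketch (not the authors' released code) of how a video might be partitioned into local and nonlocal frames around a reference frame. The function name and the parameters local_radius and nonlocal_stride are illustrative assumptions, not values taken from the paper.

def partition_frames(num_frames, ref_idx, local_radius=2, nonlocal_stride=10):
    """Return (local, nonlocal) frame indices for a given reference frame.

    Local frames lie in a sampling window around the reference frame
    (its forward and backward neighbors); nonlocal frames are sampled
    from the rest of the video at a fixed temporal stride.
    """
    local = [t for t in range(max(0, ref_idx - local_radius),
                              min(num_frames, ref_idx + local_radius + 1))
             if t != ref_idx]
    nonlocal_frames = [t for t in range(0, num_frames, nonlocal_stride)
                       if t != ref_idx and t not in local]
    return local, nonlocal_frames

# Example: a 50-frame clip with reference frame 25.
local, nonloc = partition_frames(50, 25)
print(local)   # [23, 24, 26, 27]
print(nonloc)  # [0, 10, 20, 30, 40]

Under this reading, the local window supplies temporally adjacent motion cues for the forward and backward flow, while the sparsely sampled nonlocal frames expose content that stays occluded throughout the local window.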
Availability of data and materials
We evaluated our method on publicly available datasets. The DAVIS dataset is available at https://davischallenge.org/davis2017/code.html and the YouTube-VOS dataset at https://youtube-vos.org/dataset
Code availability
We will upload the project to GitHub in the next few months.
Funding
This work was supported by the Fundamental Research Funds for the Universities of Henan Province (NSFRF220414) and the Excellent Young Teachers Program of Henan Polytechnic University (No. 2019XQG-02).
Author information
Contributions
All authors took part in the discussion of the work described in this paper. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Conflict of interest
The authors declare that they have no conflict of interest.