Abstract
Online video stabilization is important for hand-held camera shooting and remote robot control. Existing methods either need the whole video to perform offline stabilization, which results in long latency, or ignore the non-uniform motion field in each frame, which leads to large distortion. The non-uniform motion includes dynamic foreground motion and non-planar background motion. To better describe the shaky motion field online, we propose a novel attentive, multi-scale regression and refinement framework called ACP-Net. It models camera motion at progressive levels and consists of a flow-guided quiescent attention (FQA) module and a cascaded pyramid prediction (CPP) module. The FQA module takes optical flow as an extra input and generates a soft mask to suppress the disturbance from dynamic foreground objects. Based on the attentive feature, the CPP module uses a multi-scale residual pyramid structure to perform coarse-to-fine stabilization. Experimental results on public benchmarks show that the proposed method achieves state-of-the-art performance both qualitatively and quantitatively compared with both online and offline methods.
Y. Xu and Q. Zhang: Equal contribution.
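The two ideas named in the abstract, a soft mask that down-weights dynamic foreground motion and a coarse-to-fine residual pyramid, can be illustrated with a small NumPy sketch. This is not the ACP-Net implementation: the paper's modules are learned networks, whereas the sketch below uses hand-crafted stand-ins. The function names (`soft_quiescent_mask`, `weighted_block_mean`, `coarse_to_fine_motion`), the exponential weighting, the `tau` parameter, and the synthetic flow field are all assumptions made for illustration only.

```python
import numpy as np

def soft_quiescent_mask(flow, tau=1.0):
    """Hypothetical stand-in for a soft attention mask.

    flow: (H, W, 2) optical flow. Pixels whose flow deviates from the
    median flow (assumed here to be background/camera motion) receive
    weights close to 0; pixels that agree with it receive weights near 1.
    """
    median = np.median(flow.reshape(-1, 2), axis=0)
    deviation = np.linalg.norm(flow - median, axis=-1)
    return np.exp(-deviation / tau)

def weighted_block_mean(field, weight, grid):
    """Mask-weighted mean of `field` over a grid x grid partition.

    Assumes H and W are divisible by `grid`. Returns (grid, grid, 2).
    """
    h, w, _ = field.shape
    bh, bw = h // grid, w // grid
    f = field.reshape(grid, bh, grid, bw, 2)
    m = weight.reshape(grid, bh, grid, bw, 1)
    return (f * m).sum(axis=(1, 3)) / (m.sum(axis=(1, 3)) + 1e-8)

def coarse_to_fine_motion(flow, mask, levels=3):
    """Residual pyramid: each level corrects what coarser levels missed.

    Level k estimates a (2**k x 2**k) grid of residual motions and adds
    the upsampled residual to the running estimate.
    """
    h, w, _ = flow.shape
    estimate = np.zeros_like(flow)
    for level in range(levels):
        grid = 2 ** level
        residual = weighted_block_mean(flow - estimate, mask, grid)
        # Nearest-neighbour upsampling of the grid back to full resolution.
        upsampled = np.repeat(np.repeat(residual, h // grid, axis=0),
                              w // grid, axis=1)
        estimate = estimate + upsampled
    return estimate

# Synthetic example: uniform background motion plus a moving foreground patch.
flow = np.zeros((32, 32, 2))
flow[...] = (2.0, -1.0)             # camera-induced background motion
flow[10:18, 10:18] = (10.0, 10.0)   # dynamic foreground object

mask = soft_quiescent_mask(flow)
camera = coarse_to_fine_motion(flow, mask, levels=3)
```

In this toy setting the plain average of `flow` is pulled toward the foreground patch, while the masked, coarse-to-fine estimate in `camera` stays close to the background motion everywhere, which is the behaviour the soft mask is meant to encourage.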
Acknowledgement
Mr Yufei Xu, Mr Qiming Zhang, and Dr Jing Zhang are supported in part by ARC FL-170100117 and IH-180100002.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Xu, Y., Zhang, Q., Zhang, J., Tao, D. (2022). Attentive Cascaded Pyramid Network for Online Video Stabilization. In: Fang, L., Povey, D., Zhai, G., Mei, T., Wang, R. (eds) Artificial Intelligence. CICAI 2022. Lecture Notes in Computer Science, vol 13604. Springer, Cham. https://doi.org/10.1007/978-3-031-20497-5_2
DOI: https://doi.org/10.1007/978-3-031-20497-5_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20496-8
Online ISBN: 978-3-031-20497-5
eBook Packages: Computer Science, Computer Science (R0)