Attentive Cascaded Pyramid Network for Online Video Stabilization

Xu, Yufei; Zhang, Qiming; Zhang, Jing; Tao, Dacheng

doi:10.1007/978-3-031-20497-5_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13604)

Included in the following conference series:

CAAI International Conference on Artificial Intelligence

Abstract

Online video stabilization is important for hand-held camera shooting and remote robot control. Existing methods either need the whole video to perform offline stabilization, which results in long latency, or ignore the non-uniform motion field in each frame, which leads to large distortion. The non-uniform motion includes dynamic foreground motion and non-planar background motion. To better describe the shaky motion field online, we propose a novel attentive, multi-scale regression and refinement framework called ACP-Net. It models camera motion at progressive levels and consists of a flow-guided quiescent attention (FQA) module and a cascaded pyramid prediction (CPP) module. The FQA module takes optical flow as an extra input and generates a soft mask to suppress the disturbance from dynamic foreground objects. Based on the attentive feature, the CPP module uses a multi-scale residual pyramid structure to perform coarse-to-fine stabilization. Experimental results on public benchmarks show that the proposed method achieves state-of-the-art performance, both qualitatively and quantitatively, compared with both online and offline methods.
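
The abstract does not give implementation details, so the following is only a toy NumPy sketch of the two ideas it names: a flow-derived soft mask that down-weights moving foreground, and a coarse-to-fine estimate built from residual corrections. It is not ACP-Net itself. The function names, the median as a stand-in for the dominant motion, the exponential weighting, the `temperature` parameter, and the global-translation motion model are all assumptions made for illustration.

```python
import numpy as np

def soft_quiescent_mask(flow, temperature=1.0):
    """Soft mask in (0, 1] from a dense flow field of shape (H, W, 2).

    Assumption: the per-channel median approximates the dominant (camera)
    motion; pixels that deviate from it get exponentially smaller weights.
    """
    dominant = np.median(flow.reshape(-1, 2), axis=0)
    deviation = np.linalg.norm(flow - dominant, axis=-1)
    return np.exp(-deviation / temperature)

def masked_mean_motion(flow, mask):
    """Weighted average of the flow vectors, using the soft mask as weights."""
    weights = mask[..., None]
    return (flow * weights).sum(axis=(0, 1)) / weights.sum()

def downsample(flow, factor):
    """Average-pool the flow field by an integer factor (values stay in
    full-resolution pixel units; edges that do not divide evenly are cropped)."""
    h, w = flow.shape[:2]
    h, w = h - h % factor, w - w % factor
    blocks = flow[:h, :w].reshape(h // factor, factor, w // factor, factor, -1)
    return blocks.mean(axis=(1, 3))

def coarse_to_fine_translation(flow, factors=(4, 2, 1)):
    """Estimate a global translation from coarse to fine pyramid levels.

    Each level predicts only a residual on top of the running estimate,
    which is the sense of "residual pyramid" used in this sketch.
    """
    estimate = np.zeros(2)
    for factor in factors:
        residual = downsample(flow, factor) - estimate
        mask = soft_quiescent_mask(residual)
        estimate = estimate + masked_mean_motion(residual, mask)
    return estimate

# Synthetic example: uniform background motion plus a small moving object.
flow = np.tile(np.array([2.0, -1.0]), (16, 16, 1))
flow[4:8, 4:8] = np.array([10.0, 10.0])
print("plain mean :", flow.mean(axis=(0, 1)))
print("masked c2f :", coarse_to_fine_translation(flow))
```

In this synthetic case the plain mean is pulled toward the foreground object, while the masked coarse-to-fine estimate stays close to the background motion. A learned model would replace the hand-crafted mask and the translation model with network predictions.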

Y. Xu and Q. Zhang contributed equally.

Acknowledgement

Mr Yufei Xu, Mr Qiming Zhang, and Dr Jing Zhang are supported in part by ARC FL-170100117 and IH-180100002.

Author information

Authors and Affiliations

University of Sydney, Camperdown, Australia
Yufei Xu, Qiming Zhang & Jing Zhang
JD Explore Academy, Beijing, China
Dacheng Tao

Corresponding author

Correspondence to Dacheng Tao.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Lu Fang
Xiaomi Inc., Beijing, China
Daniel Povey
Shanghai Jiao Tong University, Shanghai, China
Guangtao Zhai
JD Explore Academy, Beijing, China
Tao Mei
Chinese Academy of Sciences, Beijing, China
Ruiping Wang

About this paper

Cite this paper

Xu, Y., Zhang, Q., Zhang, J., Tao, D. (2022). Attentive Cascaded Pyramid Network for Online Video Stabilization. In: Fang, L., Povey, D., Zhai, G., Mei, T., Wang, R. (eds) Artificial Intelligence. CICAI 2022. Lecture Notes in Computer Science, vol 13604. Springer, Cham. https://doi.org/10.1007/978-3-031-20497-5_2

DOI: https://doi.org/10.1007/978-3-031-20497-5_2
Published: 17 December 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20496-8
Online ISBN: 978-3-031-20497-5
eBook Packages: Computer Science, Computer Science (R0)
