Attentive Cascaded Pyramid Network for Online Video Stabilization

  • Conference paper
  • First Online:
Artificial Intelligence (CICAI 2022)

Abstract

Online video stabilization is important for hand-held camera shooting and remote robot control. Existing methods either need the whole video to perform offline stabilization, which results in long latency, or ignore the non-uniform motion field within each frame, which leads to large distortion. The non-uniform motion includes dynamic foreground motion and non-planar background motion. To better describe the shaky motion field online, we propose a novel attentive, multi-scale regression and refinement framework called ACP-Net. It models camera motion at progressive levels and consists of a flow-guided quiescent attention (FQA) module and a cascaded pyramid prediction (CPP) module. The FQA module takes optical flow as an extra input and generates a soft mask that mitigates the disturbance from dynamic foreground objects. Based on the attentive features, the CPP module uses a multi-scale residual pyramid structure to perform coarse-to-fine stabilization. Experimental results on public benchmarks show that our method achieves state-of-the-art performance both qualitatively and quantitatively, compared with both online and offline methods.
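The abstract combines two ideas: a soft mask derived from optical flow that down-weights dynamic foreground, and a cascade that estimates motion coarse to fine by predicting residuals at each pyramid level. The NumPy sketch below is a toy, non-learned illustration of those two ideas only; it is not the ACP-Net implementation. In ACP-Net both parts are network modules operating on features, whereas here they are replaced by hand-crafted operations. The mask formula (a Gaussian of each pixel's deviation from the median flow), the `sigma` parameter, the per-grid-cell translation motion model and all function names are assumptions made for illustration.

```python
import numpy as np


def soft_quiescent_mask(flow, sigma=1.0):
    """Toy, non-learned stand-in for a flow-guided soft attention mask.

    flow: (H, W, 2) per-pixel optical flow between consecutive frames.
    Pixels whose flow deviates from the dominant (median) motion are
    treated as likely dynamic foreground and get weights close to 0;
    pixels that follow the dominant motion get weights close to 1.
    """
    dominant = np.median(flow.reshape(-1, 2), axis=0)
    deviation = np.linalg.norm(flow - dominant, axis=-1)
    return np.exp(-(deviation ** 2) / (2.0 * sigma ** 2))


def _cell_edges(size, cells):
    # Integer boundaries that split `size` pixels into `cells` bins.
    return np.linspace(0, size, cells + 1).astype(int)


def grid_to_dense(grid, h, w):
    """Expand a (cells, cells, 2) motion grid to a dense (h, w, 2) field."""
    cells = grid.shape[0]
    ys, xs = _cell_edges(h, cells), _cell_edges(w, cells)
    dense = np.zeros((h, w, 2))
    for i in range(cells):
        for j in range(cells):
            dense[ys[i]:ys[i + 1], xs[j]:xs[j + 1]] = grid[i, j]
    return dense


def cell_weighted_mean(values, weights, cells, eps=1e-8):
    """Mask-weighted mean of `values` (h, w, 2) inside each grid cell."""
    h, w, _ = values.shape
    ys, xs = _cell_edges(h, cells), _cell_edges(w, cells)
    out = np.zeros((cells, cells, 2))
    for i in range(cells):
        for j in range(cells):
            v = values[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            wt = weights[ys[i]:ys[i + 1], xs[j]:xs[j + 1], None]
            out[i, j] = (v * wt).sum(axis=(0, 1)) / (wt.sum() + eps)
    return out


def coarse_to_fine_motion(flow, mask, levels=3):
    """Cascaded estimate: global motion first, then per-cell residuals.

    Level k uses a 2**k x 2**k grid. Each level upsamples the coarser
    estimate and only adds the residual it could not explain.
    """
    h, w, _ = flow.shape
    estimate = np.zeros((1, 1, 2))
    for k in range(levels):
        cells = 2 ** k
        # Nearest-neighbour upsampling of the previous (coarser) grid.
        factor = cells // estimate.shape[0]
        estimate = estimate.repeat(factor, axis=0).repeat(factor, axis=1)
        # Residual: the mask-weighted motion the current estimate fails to explain.
        residual = cell_weighted_mean(flow - grid_to_dense(estimate, h, w), mask, cells)
        estimate = estimate + residual
    return estimate


if __name__ == "__main__":
    # Toy 8x8 flow: background moves by (1.0, 0.5); a 2x2 "foreground" block moves by (10, -10).
    flow = np.tile(np.array([1.0, 0.5]), (8, 8, 1))
    flow[0:2, 0:2] = (10.0, -10.0)
    masked = coarse_to_fine_motion(flow, soft_quiescent_mask(flow))
    unmasked = coarse_to_fine_motion(flow, np.ones(flow.shape[:2]))
    print(np.round(masked[0, 0], 3), np.round(unmasked[0, 0], 3))
```

On this toy field, the masked cascade reports the background motion (1.0, 0.5) in every grid cell, while with a uniform mask the top-left cell reports the foreground motion (10, -10), which is the kind of disturbance a foreground-suppressing mask is meant to remove. Note that this median-based heuristic would also down-weight non-planar background motion (parallax), a limitation of the toy mask rather than of the method described in the abstract.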

Y. Xu and Q. Zhang contributed equally.

Acknowledgement

Mr Yufei Xu, Mr Qiming Zhang, and Dr Jing Zhang are supported in part by ARC FL-170100117 and IH-180100002.

Author information

Corresponding author

Correspondence to Dacheng Tao.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Xu, Y., Zhang, Q., Zhang, J., Tao, D. (2022). Attentive Cascaded Pyramid Network for Online Video Stabilization. In: Fang, L., Povey, D., Zhai, G., Mei, T., Wang, R. (eds) Artificial Intelligence. CICAI 2022. Lecture Notes in Computer Science, vol 13604. Springer, Cham. https://doi.org/10.1007/978-3-031-20497-5_2

  • DOI: https://doi.org/10.1007/978-3-031-20497-5_2

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20496-8

  • Online ISBN: 978-3-031-20497-5

  • eBook Packages: Computer Science, Computer Science (R0)