Efficient Semantic-Guidance High-Resolution Video Matting

  • Conference paper
  • First Online:
Advances in Computer Graphics (CGI 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14495))

Abstract

Video matting has made significant progress in the trimap-based setting. However, auxiliary-free matting is attracting growing interest because it is more practical in real-world applications. We propose a new efficient semantic-guidance high-resolution video matting network for the human body. We use a convolutional network as the backbone and additionally employ a transformer in the encoder to exploit semantic features while keeping the network from becoming overly bloated. In addition, a channel-wise attention mechanism is introduced in the decoder to strengthen the representation of semantic features. Compared with current state-of-the-art methods, the method proposed in this paper achieves better results while maintaining prediction speed and efficiency, enabling real-time auxiliary-free matting of high-resolution (4K or HD) video.
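The abstract does not detail the decoder's channel-wise attention mechanism. As a rough illustration only, one common form of such a mechanism (squeeze-and-excitation-style gating, shown here with made-up NumPy weight matrices `w1` and `w2` that are not taken from the paper) can be sketched as:

```python
import numpy as np

def channel_attention(features, w1, w2):
    """SE-style channel-wise gating over a (C, H, W) feature map.

    w1: (hidden, C) reduction weights; w2: (C, hidden) expansion weights.
    Both are hypothetical parameters, not the paper's actual design.
    """
    squeezed = features.mean(axis=(1, 2))        # global average pool -> (C,)
    hidden = np.maximum(0.0, w1 @ squeezed)      # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid -> per-channel weight in (0, 1)
    return features * gate[:, None, None]        # reweight each channel

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8, 8))           # toy 4-channel feature map
w1 = rng.standard_normal((2, 4))
w2 = rng.standard_normal((4, 2))
out = channel_attention(feats, w1, w2)
```

Because the gate is a sigmoid, each channel is scaled by a factor in (0, 1): channels the gate deems semantically informative are suppressed less than the others, without changing the spatial layout of the features.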

Supported by National Natural Science Foundation of China (61807002).


Author information

Correspondence to Yue Yu.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Yu, Y., Li, D., Yang, Y. (2024). Efficient Semantic-Guidance High-Resolution Video Matting. In: Sheng, B., Bi, L., Kim, J., Magnenat-Thalmann, N., Thalmann, D. (eds) Advances in Computer Graphics. CGI 2023. Lecture Notes in Computer Science, vol 14495. Springer, Cham. https://doi.org/10.1007/978-3-031-50069-5_13

  • DOI: https://doi.org/10.1007/978-3-031-50069-5_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-50068-8

  • Online ISBN: 978-3-031-50069-5

  • eBook Packages: Computer Science, Computer Science (R0)
