
End-to-end video compression for surveillance and conference videos

  • 1221: Deep Learning for Image/Video Compression and Visual Quality Assessment
Multimedia Tools and Applications

Abstract

Storing and transmitting surveillance and conference videos is an important application of video compression. Because these videos have strong inter-frame correlation, consecutive frames show considerable continuity at both the image level and the motion level. However, traditional video codec networks cannot fully exploit these characteristics during compression. Therefore, building on the DVC video codec framework, we propose an “MV residual + MV optimization” coding strategy for surveillance and conference videos that further reduces the bit rate and improves the quality of the compressed frames. During testing, we adopt an online update strategy that adapts the network’s parameters to each surveillance or conference video. Our contributions are an optical flow residual coding method for videos with strong inter-frame correlation, optical flow optimization at the decoding end, and an online update strategy at the encoding end. Experiments show that our method outperforms the DVC framework, most notably on the CUHK Square surveillance video, with a 1.2 dB improvement.
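The idea behind “MV residual” coding is that, when motion is strongly correlated across frames, the difference between the current motion-vector (MV) field and the previous one is mostly near zero and therefore cheaper to entropy-code than the MV field itself. The sketch below illustrates only that intuition, under loud assumptions: the predictor is simply the previously decoded MV field, quantization is uniform with a hypothetical step of 0.25, and the rate is approximated by the empirical entropy of the symbols. All function names are hypothetical. This is not the authors' network, which (following DVC) uses learned auto-encoders and entropy models and adds decoder-side MV optimization that is not modeled here.

```python
import math
from collections import Counter


def quantize(values, step):
    """Uniform scalar quantization: map each MV component to an integer symbol."""
    return [round(v / step) for v in values]


def dequantize(symbols, step):
    """Map integer symbols back to MV component values."""
    return [s * step for s in symbols]


def empirical_entropy_bits(symbols):
    """Total bits an ideal entropy coder would spend under the symbols' own
    empirical distribution (a crude stand-in for a learned rate model)."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())


def encode_mv_residual(curr_mv, prev_mv_hat, step=0.25):
    """Encoder side: quantize the difference between the current MV field and
    the previously decoded MV field (hypothetical predictor)."""
    residual = [c - p for c, p in zip(curr_mv, prev_mv_hat)]
    return quantize(residual, step)


def decode_mv_residual(symbols, prev_mv_hat, step=0.25):
    """Decoder side: add the dequantized residual back onto the prediction."""
    residual_hat = dequantize(symbols, step)
    return [p + r for p, r in zip(prev_mv_hat, residual_hat)]


if __name__ == "__main__":
    # Flattened MV components for two consecutive frames with highly correlated motion.
    prev_mv_hat = [1.0, 1.25, 0.5, -0.75, 2.0, 2.0, 0.0, 0.25]
    curr_mv = [1.0, 1.5, 0.5, -0.75, 2.0, 2.25, 0.0, 0.25]

    direct = quantize(curr_mv, 0.25)
    residual_symbols = encode_mv_residual(curr_mv, prev_mv_hat)
    print("direct bits:  ", empirical_entropy_bits(direct))
    print("residual bits:", empirical_entropy_bits(residual_symbols))
    print("reconstructed:", decode_mv_residual(residual_symbols, prev_mv_hat) == curr_mv)
```

On this toy input the residual symbols are mostly zero, so the rate proxy drops from 24 bits to about 6.5 bits; actual savings depend on the video content and on the learned entropy model.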


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8


References

  1. Agustsson E, Mentzer F, Tschannen M, Cavigelli L, Timofte R, Benini L, Gool LV (2017) Soft-to-hard vector quantization for end-to-end learning compressible representations. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in neural information processing systems, vol 30. Curran Associates, Inc

  2. Alam MM, Nguyen TD, Hagan MT, Chandler DM (2015) A perceptual quantization strategy for HEVC based on a convolutional neural network trained on natural images. In: Applications of digital image processing XXXVIII, vol 9599, p 959918. International Society for Optics and Photonics

  3. Alexandre D, Hang HM (2020) Learned video codec with enriched reconstruction for CLIC P-frame coding. arXiv:2012.07462

  4. Ballé J, Laparra V, Simoncelli EP (2017) End-to-end optimized image compression. In: International conference on learning representations

  5. Ballé J, Minnen D, Singh S, Hwang SJ, Johnston N (2018) Variational image compression with a scale hyperprior. In: International conference on learning representations

  6. Bellard F BPG image format (http://bellard.org/bpg/), Accessed 30 Jan 2017

  7. Cui W, Zhang T, Zhang S, Jiang F, Zuo W, Zhao D (2018) Convolutional neural networks based intra prediction for HEVC, pp 436–436

  8. Djelouah A, Campos J, Schaub-Meyer S, Schroers C (2019) Neural inter-frame compression for video coding. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 6421–6429

  9. Hu Z, Lu G, Xu D (2021) FVC: a new framework towards deep video compression in feature space. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1502–1511

  10. Huo S, Liu D, Wu F, Li H (2018) Convolutional neural network-based motion compensation refinement for video coding. In: 2018 IEEE International symposium on circuits and systems (ISCAS), pp 1–4

  11. Cisco (2016) Cisco visual networking index: forecast and methodology, 2015–2020. White paper, 1–41

  12. Johnston N, Vincent D, Minnen D, Covell M, Singh S, Chinen T, Hwang SJ, Shor J, Toderici G (2018) Improved lossy image compression with priming and spatially adaptive bit rates for recurrent networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4385–4393

  13. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neur Inform Process Syst 25:1097–1105

  14. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324

  15. Li J, Li B, Xu J, Xiong R, Gao W (2018) Fully connected network-based intra prediction for image coding. IEEE Trans Image Process 27(7):3236–3247

  16. Lin J, Liu D, Li H, Wu F (2020) M-LVC: multiple frames prediction for learned video compression. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3546–3554

  17. Lu G, Ouyang W, Xu D, Zhang X, Cai C, Gao Z (2019) DVC: an end-to-end deep video compression framework. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11006–11015

  18. Lu G, Cai C, Zhang X, Chen L, Ouyang W, Xu D, Gao Z (2020) Content adaptive and error propagation aware deep video compression. In: European conference on computer vision, pp 456–472. Springer

  19. Marpe D, Schwarz H, Wiegand T (2003) Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard. IEEE Trans Circ Syst Video Technol 13(7):620–636

  20. Minnen D, Ballé J, Toderici G (2018) Joint autoregressive and hierarchical priors for learned image compression. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in neural information processing systems, vol 31. Curran Associates, Inc

  21. Pellegrini S, Ess A, Schindler K, Van Gool L (2009) You’ll never walk alone: modeling social behavior for multi-target tracking. In: 2009 IEEE 12th International conference on computer vision, pp 261–268

  22. Ranjan A, Black MJ (2017) Optical flow estimation using a spatial pyramid network. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, pp 4161–4170

  23. Reda FA, Liu G, Shih KJ, Kirby R, Barker J, Tarjan D, Tao A, Catanzaro B (2018) SDC-Net: video prediction using spatially-displaced convolution. In: Proceedings of the European conference on computer vision (ECCV), pp 718–733

  24. Sengar SS, Mukhopadhyay S (2020) Motion segmentation-based surveillance video compression using adaptive particle swarm optimization. Neural Comput Applic 32(15):11443–11457

  25. Skodras A, Christopoulos C, Ebrahimi T (2001) The JPEG 2000 still image compression standard. IEEE Signal Process Mag 18(5):36–58

  26. Song R, Liu D, Li H, Wu F (2017) Neural network-based arithmetic coding of intra prediction modes in HEVC. In: 2017 IEEE Visual communications and image processing (VCIP), pp 1–4

  27. Song X, Chen Y, Feng ZH, Hu G, Yu DJ, Wu XJ (2020) SP-GAN: self-growing and pruning generative adversarial networks. IEEE Trans Neural Netw Learn Syst 32(6):2458–2469

  28. Sullivan GJ, Ohm JR, Han WJ, Wiegand T (2012) Overview of the high efficiency video coding (HEVC) standard. IEEE Trans Circ Syst Video Technol 22(12):1649–1668

  29. Theis L, Shi W, Cunningham A, Huszár F (2017) Lossy image compression with compressive autoencoders. In: International conference on learning representations

  30. Toderici G, O’Malley SM, Hwang SJ, Vincent D, Minnen D, Baluja S, Covell M, Sukthankar R (2016) Variable rate image compression with recurrent neural networks. In: International conference on learning representations

  31. Toderici G, Vincent D, Johnston N, Jin Hwang S, Minnen D, Shor J, Covell M (2017) Full resolution image compression with recurrent neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5306–5314

  32. Wallace GK (1992) The JPEG still picture compression standard. IEEE Trans Consum Electron 38(1):xviii–xxxiv

  33. Wang M, Li W, Wang X (2012) Transferring a generic pedestrian detector towards specific scenes. In: 2012 IEEE Conference on computer vision and pattern recognition, pp 3274–3281

  34. Wu CY, Singhal N, Krahenbuhl P (2018) Video compression through image interpolation. In: Proceedings of the European conference on computer vision (ECCV), pp 416–431

  35. Wu Y, He T, Chen Z (2020) Memorize, then recall: a generative framework for low bit-rate surveillance video compression. In: 2020 IEEE International symposium on circuits and systems (ISCAS), pp 1–5

  36. Wu L, Huang K, Shen H, Gao L (2021) Foreground-background parallel compression with residual encoding for surveillance video. IEEE Trans Circuits Syst Video Technol 31(7):2711–2724

  37. Xue T, Chen B, Wu J, Wei D, Freeman WT (2019) Video enhancement with task-oriented flow. Int J Comput Vis 127(8):1106–1125

  38. Yan N, Liu D, Li H, Li B, Li L, Wu F (2018) Convolutional neural network-based fractional-pixel motion compensation. IEEE Trans Circuits Syst Video Technol 29(3):840–853

  39. Yang R, Xu M, Wang Z, Li T (2018) Multi-frame quality enhancement for compressed video. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6664–6673

  40. Zhao L, Wang S, Wang S, Ye Y, Ma S, Gao W (2021) Enhanced surveillance video compression with dual reference frames generation. IEEE Trans Circuits Syst Video Technol, 1–1


Acknowledgements

This work was supported in part by National Key R&D Program of China (2018YFE0203900), National Natural Science Foundation of China (61773093), Sichuan Science and Technology Program (2020YFG0476) and Important Science and Technology Innovation Projects in Chengdu (2018-YF08-00039-GX).

Author information

Corresponding author

Correspondence to Yu Zhao.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, S., Zhao, Y., Gao, H. et al. End-to-end video compression for surveillance and conference videos. Multimed Tools Appl 81, 42713–42730 (2022). https://doi.org/10.1007/s11042-022-13484-w

  • DOI: https://doi.org/10.1007/s11042-022-13484-w
