Abstract
Storing and transmitting surveillance and conference videos is an important branch of video compression. These videos have strong inter-frame correlation, so consecutive frames show considerable continuity at both the image level and the motion level. However, conventional video codec networks do not fully exploit these characteristics during compression. Building on the DVC video codec framework, we therefore propose a “MV residual + MV optimization” coding strategy (MV: motion vector) for surveillance and conference videos that further reduces the bit rate and improves the quality of the compressed frames. At test time, we apply an online update strategy that adapts the network’s parameters to each surveillance or conference video. Our contributions are an optical flow residual coding method for videos with strong inter-frame correlation, optical flow optimization at the decoder, and an online update strategy at the encoder. Experiments show that our method outperforms the DVC framework, most notably on the CUHK Square surveillance video, where it achieves a 1.2 dB improvement.
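The idea behind optical flow (MV) residual coding can be illustrated with a toy sketch. This is not the network described in the abstract: it assumes a plain uniform quantizer in place of learned encoders, synthetic flow fields in place of estimated optical flow, and hypothetical helper names (`encode_mv_residual`, `decode_mv_residual`, `step`). It only shows why coding the difference between consecutive flow fields yields much smaller symbols than coding the flow directly when inter-frame motion is strongly correlated.

```python
import numpy as np

def encode_mv_residual(mv_cur, mv_prev_rec, step=0.25):
    """Quantize the difference between the current flow and the
    previously reconstructed flow (hypothetical uniform quantizer)."""
    residual = mv_cur - mv_prev_rec
    return np.round(residual / step).astype(np.int32)

def decode_mv_residual(symbols, mv_prev_rec, step=0.25):
    """Rebuild the current flow from the previous flow plus the
    dequantized residual."""
    return mv_prev_rec + symbols.astype(np.float64) * step

# Synthetic flow fields of shape (2, H, W): x and y displacement per pixel.
# The current flow is the previous flow plus a small perturbation, which
# mimics the strong motion continuity assumed for surveillance video.
rng = np.random.default_rng(0)
mv_prev = rng.normal(0.0, 2.0, size=(2, 8, 8))
mv_cur = mv_prev + rng.normal(0.0, 0.1, size=(2, 8, 8))

step = 0.25
symbols = encode_mv_residual(mv_cur, mv_prev, step)
mv_rec = decode_mv_residual(symbols, mv_prev, step)

# Baseline for comparison: quantize the flow itself, with no prediction.
direct_symbols = np.round(mv_cur / step).astype(np.int32)

print("sum |residual symbols|:", int(np.abs(symbols).sum()))
print("sum |direct symbols|:  ", int(np.abs(direct_symbols).sum()))
print("max reconstruction error:", float(np.abs(mv_rec - mv_cur).max()))
```

In this sketch the residual symbols cluster around zero, so an entropy coder would spend fewer bits on them, while the reconstruction error stays within half a quantization step. In a learned codec the quantizer and entropy model would be trained jointly rather than fixed as here.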
Acknowledgements
This work was supported in part by the National Key R&D Program of China (2018YFE0203900), the National Natural Science Foundation of China (61773093), the Sichuan Science and Technology Program (2020YFG0476), and the Important Science and Technology Innovation Projects in Chengdu (2018-YF08-00039-GX).
Cite this article
Wang, S., Zhao, Y., Gao, H. et al. End-to-end video compression for surveillance and conference videos. Multimed Tools Appl 81, 42713–42730 (2022). https://doi.org/10.1007/s11042-022-13484-w