Video saliency aware intelligent HD video compression with the improvement of visual quality and the reduction of coding complexity

  • Original Article
Neural Computing and Applications

Abstract

The rapid development of video technology and hardware has led to a large number of industrial video applications, such as video conferencing, video surveillance, and live video streaming. Most of these applications face the problem of limited network bandwidth while trying to maintain high video resolution. Perceptual video compression addresses the problem by introducing saliency information to reduce perceptual redundancy: it retains more information in salient regions while compressing non-salient regions as much as possible. In this paper, to combine the multi-scale information in video saliency, we propose an advanced video saliency detection model, SALDPC, built on a deformable pyramid convolution decoder and multi-scale temporal recurrence. To better guide video coding with saliency, we propose a saliency-aware rate-distortion optimization algorithm, SRDO, based on the HEVC video coding standard; it uses block-based saliency information to change the rate-distortion balance and guide more reasonable bit allocation. Furthermore, a more flexible QP selection method, SAQP, is developed, which uses each CU's saliency information to adaptively change the CU's QP and thereby ensure high quality in highly salient areas. The final results are available in three configurations: SRDO, SRDO + SQP, and SRDO + SAQP. Experimental results show that our method substantially improves video quality while significantly reducing video encoding time. Compared to HEVC, SRDO improves BD-EWPSNR by 0.703 dB and saves 20.822% in BD-Rate based on EWPSNR; SRDO + SAQP improves BD-EWPSNR by up to 1.217 dB and saves up to 32.41% in BD-Rate based on EWPSNR. In terms of compression time, the proposed method saves up to 29.06% compared to HEVC. The experimental results show the superiority of the proposed method over state-of-the-art methods.
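
To make the idea of saliency-adaptive QP selection concrete, the following is a minimal sketch and not the SAQP formula from the paper, which the abstract does not give. It assumes a saliency map with values in [0, 1], averages it over square blocks standing in for CUs, and applies a hypothetical linear rule: blocks more salient than the frame average get a lower QP (finer quantization), less salient blocks get a higher one. The function names, the `max_offset` parameter, and the linear mapping are all illustrative assumptions; only the 0 to 51 QP range for 8-bit HEVC is taken as given.

```python
def block_mean_saliency(saliency, block):
    """Average a 2-D saliency map (list of rows) over block x block tiles.

    Each tile stands in for one coding unit (CU); edge tiles may be smaller.
    """
    rows, cols = len(saliency), len(saliency[0])
    means = []
    for top in range(0, rows, block):
        row_means = []
        for left in range(0, cols, block):
            vals = [saliency[y][x]
                    for y in range(top, min(top + block, rows))
                    for x in range(left, min(left + block, cols))]
            row_means.append(sum(vals) / len(vals))
        means.append(row_means)
    return means


def saliency_qp(base_qp, block_saliency, frame_saliency, max_offset=6):
    """Hypothetical mapping from block saliency to a per-block QP.

    ratio > 1 means the block is more salient than the frame average,
    so the offset is negative and the block is quantized more finely.
    """
    if frame_saliency <= 0:
        return base_qp  # no saliency information: keep the base QP
    ratio = block_saliency / frame_saliency
    offset = round(max_offset * (1.0 - ratio))
    offset = max(-max_offset, min(max_offset, offset))
    # Clip to the QP range of 8-bit HEVC (0 to 51).
    return max(0, min(51, base_qp + offset))


# Toy 4x4 map: only the top-left 2x2 block is salient.
saliency_map = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
blocks = block_mean_saliency(saliency_map, 2)
frame_mean = sum(sum(r) for r in blocks) / (len(blocks) * len(blocks[0]))
qp_map = [[saliency_qp(32, s, frame_mean) for s in r] for r in blocks]
```

With a base QP of 32, the salient block in this toy map ends up at the lowest allowed QP and the remaining blocks at the highest, which shifts bits toward the salient region. A real encoder would bound the offset more carefully and couple it with the rate-distortion trade-off.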


Acknowledgements

This work is supported by the National Key Technologies R&D Program of China (Grant No. 2016YFB0500505), National Natural Science Foundation of China (NSFC) under grants No. 61375025, 61075011 and 60675018, and also the Scientific Research Foundation for the Returned Overseas Chinese Scholars from the State Education Ministry of China.

Author information

Corresponding author

Correspondence to Shiping Zhu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhu, S., Chang, Q. & Li, Q. Video saliency aware intelligent HD video compression with the improvement of visual quality and the reduction of coding complexity. Neural Comput & Applic 34, 7955–7974 (2022). https://doi.org/10.1007/s00521-022-06895-1

  • DOI: https://doi.org/10.1007/s00521-022-06895-1
