An intelligent framework of VVC-based video compression and bit rate reduction using vision transformer-based adaptive residual attention densenet

  • Original Paper
Signal, Image and Video Processing

Abstract

Versatile Video Coding (VVC) combines a range of coding tools to support high dynamic range and higher spatial resolution content, and it achieves better bit rate savings than existing video coding models. Although these additional encoding tools let VVC maintain high quality in the compressed video, the standard is still being actively enhanced, and researchers continue to propose new techniques to improve its coding performance. In addition, traditional mechanisms consume considerable resources and encoding time. Because conventional models are typically complex and have a very large number of parameters, a lightweight model for VVC is needed. Hence, this work proposes a new deep learning-based mechanism for VVC-based video compression and bit rate reduction. First, motion is estimated using a Motion Vector (MV) encoder-decoder. The resulting MVs are then used to reconstruct the frame, which carries out the motion compensation. The residual images are obtained and compressed with the Vision Transformer-based Adaptive Residual Attention DenseNet (ViT-ARADNet), whose parameters are optimally tuned by Random Value Enhanced Pelican Optimization (RVEPO). Further, the bit rate of the residual image is determined by entropy coding during the training phase of the presented work. Finally, video quality assessment metrics such as Visual Information Fidelity (VIF) and the predicted Differential Mean Opinion Score (DMOSp) are measured to assess the model's performance.
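The pipeline outlined in the abstract (motion estimation, motion compensation, residual formation, and a bit rate estimate from entropy coding) can be illustrated with a small classical stand-in. The sketch below is not the authors' method: it replaces the learned MV encoder-decoder with full-search block matching, omits ViT-ARADNet and RVEPO entirely, and approximates the bit rate with a zeroth-order Shannon entropy. All function names, the 4x4 block size, the search range, and the synthetic 8x8 frames are assumptions chosen for illustration.

```python
import math
from collections import Counter

def block_sad(cur, ref, by, bx, dy, dx, bs):
    """Sum of absolute differences between a block of `cur` and a shifted block of `ref`."""
    return sum(abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
               for y in range(bs) for x in range(bs))

def estimate_motion(cur, ref, bs=4, search=2):
    """Full-search block matching (a classical stand-in for a learned MV network)."""
    h, w = len(cur), len(cur[0])
    vectors = {}
    for by in range(0, h, bs):
        for bx in range(0, w, bs):
            best, best_cost = (0, 0), block_sad(cur, ref, by, bx, 0, 0, bs)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    # Skip candidate blocks that would fall outside the reference frame.
                    if not (0 <= by + dy and by + dy + bs <= h
                            and 0 <= bx + dx and bx + dx + bs <= w):
                        continue
                    cost = block_sad(cur, ref, by, bx, dy, dx, bs)
                    if cost < best_cost:
                        best, best_cost = (dy, dx), cost
            vectors[(by, bx)] = best
    return vectors

def compensate(ref, vectors, bs=4):
    """Build the predicted frame by copying the matched reference blocks."""
    pred = [[0] * len(ref[0]) for _ in ref]
    for (by, bx), (dy, dx) in vectors.items():
        for y in range(bs):
            for x in range(bs):
                pred[by + y][bx + x] = ref[by + y + dy][bx + x + dx]
    return pred

def residual(cur, pred):
    """Pixel-wise difference between the current frame and its prediction."""
    return [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur, pred)]

def entropy_bits(res):
    """Zeroth-order Shannon entropy of the residual, in total bits (a rough bit rate proxy)."""
    symbols = [v for row in res for v in row]
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in Counter(symbols).values())

# Synthetic 8x8 frames (hypothetical data): the current frame is the
# reference shifted one pixel to the right, with the left column repeated.
ref = [[x * x + 10 * y for x in range(8)] for y in range(8)]
cur = [[ref[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]

vectors = estimate_motion(cur, ref)
res_mc = residual(cur, compensate(ref, vectors))   # motion-compensated residual
res_plain = residual(cur, ref)                     # plain frame difference

print("bits without motion compensation:", entropy_bits(res_plain))
print("bits with motion compensation:   ", entropy_bits(res_mc))
```

On these toy frames the motion-compensated residual needs fewer bits under the entropy estimate than the plain frame difference, which is the effect that motion compensation is meant to deliver before the residual is passed to a compression network.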

Data availability

The data underlying this article are available as Dataset 1 (https://ultravideo.fi/#testsequences) and Dataset 2 (https://github.com/HEVC-Projects/CPH).

Acknowledgements

I would like to express my very great appreciation to the co-authors of this manuscript for their valuable and constructive suggestions during the planning and development of this research work.

Funding

This research did not receive any specific funding.

Author information

Contributions

All authors have made substantial contributions to conception and design, revising the manuscript, and the final approval of the version to be published. Also, all authors agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Corresponding author

Correspondence to D. Padmapriya.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

Not applicable.

Informed consent

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Padmapriya, D., Ameelia Roseline, A. An intelligent framework of VVC-based video compression and bit rate reduction using vision transformer-based adaptive residual attention densenet. SIViP 19, 206 (2025). https://doi.org/10.1007/s11760-024-03764-3

  • DOI: https://doi.org/10.1007/s11760-024-03764-3

Keywords