Abstract
Versatile Video Coding (VVC) combines a range of coding tools, including support for high dynamic range and higher spatial resolution, and achieves greater bit rate savings than earlier video coding standards. Although these extra encoding tools let VVC maintain higher-quality compressed video, the standard is still being enhanced, and experts continue to propose new techniques to improve its coding performance. Traditional mechanisms also consume considerable resources and encoding time. Because conventional models are complex and have a very large number of parameters, a lightweight model for VVC is needed. This work therefore proposes a new VVC-based mechanism for video compression and bit rate minimization that leverages deep learning models. First, motion is estimated using a Motion Vector (MV) encoder-decoder. The resulting MVs are then used to reconstruct the frame, which performs the motion compensation. Compression is carried out and the residual images are obtained with the Vision Transformer-based Adaptive Residual Attention DenseNet (ViT-ARADNet), whose parameters are optimally tuned by Random Value Enhanced Pelican Optimization (RVEPO). In the training phase, the bit rate of the residual image is determined by entropy coding. Finally, video quality assessment metrics such as Visual Information Fidelity (VIF) and predicted Differential Mean Opinion Score (DMOSp) are measured to assess and improve the model's performance.
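The stages named in the abstract (motion estimation, motion compensation, residual computation, and a bit rate estimate from entropy) can be illustrated with a minimal classical sketch. This is not the authors' method: exhaustive block matching stands in for the learned MV encoder-decoder, a plain subtraction stands in for ViT-ARADNet, and the empirical Shannon entropy stands in for a trained entropy coder. All function names and parameters (`estimate_motion`, `motion_compensate`, `entropy_bits`, `block`, `search`) are hypothetical, and frame dimensions are assumed to be multiples of the block size.

```python
import numpy as np

def estimate_motion(ref, cur, block=8, search=2):
    """Exhaustive block matching: one (dy, dx) vector per block of `cur`."""
    ref = ref.astype(np.int64)
    cur = cur.astype(np.int64)
    h, w = cur.shape
    mvs = np.zeros((h // block, w // block, 2), dtype=np.int64)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            target = cur[by:by + block, bx:bx + block]
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    # Skip candidates that fall outside the reference frame.
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue
                    sad = np.abs(target - ref[y:y + block, x:x + block]).sum()
                    if best is None or sad < best:
                        best = sad
                        mvs[by // block, bx // block] = (dy, dx)
    return mvs

def motion_compensate(ref, mvs, block=8):
    """Build the predicted frame by copying the matched reference blocks."""
    ref = ref.astype(np.int64)
    pred = np.zeros_like(ref)
    for i in range(mvs.shape[0]):
        for j in range(mvs.shape[1]):
            dy, dx = mvs[i, j]
            y, x = i * block + dy, j * block + dx
            pred[i * block:(i + 1) * block, j * block:(j + 1) * block] = \
                ref[y:y + block, x:x + block]
    return pred

def entropy_bits(residual):
    """Empirical Shannon entropy (bits/sample) times the sample count.

    This is only a rough bit-count estimate under an i.i.d. model.
    """
    _, counts = np.unique(residual, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum() * residual.size)

# Toy usage: the current frame is the reference shifted by (1, 2) pixels.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(32, 32))
cur = np.roll(ref, shift=(1, 2), axis=(0, 1))
mvs = estimate_motion(ref, cur)
residual = cur.astype(np.int64) - motion_compensate(ref, mvs)
print(entropy_bits(residual), entropy_bits(cur - ref))
```

In this toy setting the motion-compensated residual is zero for every block whose match lies inside the frame, so its estimated bit count is lower than that of the plain frame difference. In the proposed work, learned networks take the place of each of these hand-written steps.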

Data availability
The data underlying this article are available in Dataset 1 (https://ultravideo.fi/#testsequences) and Dataset 2 (https://github.com/HEVC-Projects/CPH).
Acknowledgements
I would like to express my sincere appreciation to the co-authors of this manuscript for their valuable and constructive suggestions during the planning and development of this research work.
Funding
This research did not receive any specific funding.
Author information
Contributions
All authors have made substantial contributions to conception and design, revising the manuscript, and the final approval of the version to be published. Also, all authors agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Ethical approval
Not applicable.
Informed consent
Not applicable.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Padmapriya, D., Ameelia Roseline, A. An intelligent framework of VVC-based video compression and bit rate reduction using vision transformer-based adaptive residual attention densenet. SIViP 19, 206 (2025). https://doi.org/10.1007/s11760-024-03764-3
DOI: https://doi.org/10.1007/s11760-024-03764-3