An intelligent framework of VVC-based video compression and bit rate reduction using vision transformer-based adaptive residual attention densenet

Padmapriya, D.; Ameelia Roseline, A.

doi:10.1007/s11760-024-03764-3


Original Paper
Published: 17 January 2025

Volume 19, article number 206, (2025)

Signal, Image and Video Processing

D. Padmapriya^1,2 & A. Ameelia Roseline^2


Abstract

Versatile Video Coding (VVC) combines a range of tools for high dynamic range and higher spatial resolution content, and it achieves better bit rate savings than existing video coding models. Although VVC's additional encoding tools preserve the quality of the compressed video, the standard is still under continuous enhancement, and experts keep proposing new technologies to improve its coding performance. Traditional mechanisms also consume more resources and encoding time. Because conventional models are complex and have a very large number of parameters, a lightweight model for VVC is needed. This work therefore proposes a new VVC-based mechanism for video compression and bit rate minimization that leverages deep learning models. First, motion is estimated with a Motion Vector (MV) encoder-decoder. The estimated MV is then used to reconstruct the frame, which carries out motion compensation. The residual images are obtained and compressed with the Vision Transformer-based Adaptive Residual Attention DenseNet (ViT-ARADNet), whose parameters are optimally tuned by Random Value Enhanced Pelican Optimization (RVEPO). During the training phase, the bit rate of the residual image is determined by entropy coding. Finally, video quality assessment metrics such as Visual Information Fidelity (VIF) and predicted Differential Mean Opinion Score (DMOSp) are measured to assess and refine the model.
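
The pipeline described in the abstract (motion estimation, motion compensation, residual computation, and an entropy-based bit estimate) can be illustrated with a minimal sketch. This is not the authors' method: the learned MV encoder-decoder is replaced here by a classical exhaustive search for a single global motion vector, ViT-ARADNet and RVEPO are not modelled at all, and the bit rate is approximated by the empirical Shannon entropy of the residual values. All function names (`estimate_motion`, `compensate`, `residual`, `entropy_bits`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def _clamp(v, lo, hi):
    """Clamp an index into [lo, hi] so shifted reads stay inside the frame."""
    return min(max(v, lo), hi)


def compensate(prev, mv):
    """Motion compensation: predict the current frame by shifting the
    previous frame by the motion vector mv = (dy, dx), replicating edges."""
    dy, dx = mv
    h, w = len(prev), len(prev[0])
    return [
        [prev[_clamp(y - dy, 0, h - 1)][_clamp(x - dx, 0, w - 1)] for x in range(w)]
        for y in range(h)
    ]


def estimate_motion(prev, curr, max_shift=2):
    """Assumed stand-in for the learned MV encoder-decoder: exhaustive search
    for the global integer shift with the smallest sum of absolute differences."""
    best_sad, best_mv = None, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            pred = compensate(prev, (dy, dx))
            sad = sum(
                abs(c - p)
                for c_row, p_row in zip(curr, pred)
                for c, p in zip(c_row, p_row)
            )
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv


def residual(curr, pred):
    """Residual image: what remains to be coded after prediction."""
    return [[c - p for c, p in zip(c_row, p_row)] for c_row, p_row in zip(curr, pred)]


def entropy_bits(res):
    """Rough bit estimate: empirical Shannon entropy of the residual values,
    summed over all pixels (a proxy for what an entropy coder would spend)."""
    values = [v for row in res for v in row]
    n = len(values)
    return sum(c * math.log2(n / c) for c in Counter(values).values())
```

Given two frames `prev` and `curr` as lists of integer rows, one would call `mv = estimate_motion(prev, curr)`, then `res = residual(curr, compensate(prev, mv))`, and finally `entropy_bits(res)`. When the prediction is good, the residual is concentrated around zero and the estimate is lower than for the uncompensated difference `residual(curr, prev)`, which is the reason motion compensation precedes residual coding.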


Data availability

The data underlying this article are available in Dataset 1 (https://ultravideo.fi/#testsequences) and Dataset 2 (https://github.com/HEVC-Projects/CPH).


Acknowledgements

I would like to express my very great appreciation to the co-authors of this manuscript for their valuable and constructive suggestions during the planning and development of this research work.

Funding

This research did not receive any specific funding.

Author information

Authors and Affiliations

Department of Information and Communication Engineering, Anna University, Guindy, Chennai, 600025, Tamil Nadu, India
D. Padmapriya
Department of Electronics and Communication Engineering, Panimalar Engineering College, Poonamallee, Chennai, 600123, Tamil Nadu, India
D. Padmapriya & A. Ameelia Roseline


Contributions

All authors have made substantial contributions to conception and design, revising the manuscript, and the final approval of the version to be published. Also, all authors agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Corresponding author

Correspondence to D. Padmapriya.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

Not applicable.

Informed consent

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Padmapriya, D., Ameelia Roseline, A. An intelligent framework of VVC-based video compression and bit rate reduction using vision transformer-based adaptive residual attention densenet. SIViP 19, 206 (2025). https://doi.org/10.1007/s11760-024-03764-3


Received: 26 December 2023
Revised: 05 December 2024
Accepted: 06 December 2024
Published: 17 January 2025
DOI: https://doi.org/10.1007/s11760-024-03764-3
