Efficient feature coding based on performance analysis of Versatile Video Coding (VVC) in Video Coding for Machines (VCM)

Lee, Jin Young; Choi, Yongho; Van Le, The; Choi, Kiho

doi:10.1007/s11042-023-15409-7

Published: 24 April 2023

Multimedia Tools and Applications, Volume 82, pages 42803–42816 (2023)

Jin Young Lee¹, Yongho Choi¹, The Van Le¹ & Kiho Choi²,³ (ORCID: orcid.org/0000-0002-2869-0440)

Abstract

Conventional video coding standards compress traditional 2D images efficiently. In particular, Versatile Video Coding (VVC), the latest video coding standard, achieves very high compression efficiency while maintaining high visual quality for human viewers. Video Coding for Machines (VCM), by contrast, is being developed as a new kind of video coding standard that mainly targets efficient compression of features extracted from deep neural networks, and it generally employs VVC for feature coding. However, because VVC was developed for traditional images, the influence of VVC-based feature coding on VCM is not clear. This paper therefore analyzes the performance of the VVC coding tools for VCM feature coding, proposes an efficient tool combination based on that analysis, and applies it to video captioning, which automatically generates natural language descriptions from videos. Experimental results show that the proposed tool combination is very efficient in terms of both coding performance and encoding complexity.

Data availability

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

Acknowledgements

This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (IITP-2021-0-02067, IITP-2022-RS-2022-00156345) and the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (NRF-2021R1F1A1060816).

Author information

Authors and Affiliations

Department of Intelligent Mechatronics Engineering, Sejong University, Seoul, Korea
Jin Young Lee, Yongho Choi & The Van Le
Department of Electronics Engineering, Kyung Hee University, Yongin-si, Korea
Kiho Choi
Department of Electronics and Information Convergence Engineering, Kyung Hee University, Yongin-si, Korea
Kiho Choi

Corresponding author

Correspondence to Kiho Choi.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article has not been published elsewhere and has not been submitted simultaneously for publication elsewhere.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lee, J.Y., Choi, Y., Van Le, T. et al. Efficient feature coding based on performance analysis of Versatile Video Coding (VVC) in Video Coding for Machines (VCM). Multimed Tools Appl 82, 42803–42816 (2023). https://doi.org/10.1007/s11042-023-15409-7

Received: 06 June 2022
Revised: 01 December 2022
Accepted: 18 April 2023
Published: 24 April 2023
Issue Date: November 2023
DOI: https://doi.org/10.1007/s11042-023-15409-7
