Efficient feature coding based on performance analysis of Versatile Video Coding (VVC) in Video Coding for Machines (VCM)

Multimedia Tools and Applications

Abstract

Conventional video coding standards offer efficient compression of traditional 2D images. In particular, Versatile Video Coding (VVC), the latest video coding standard, achieves very high compression efficiency while maintaining high visual quality for human viewers. Video Coding for Machines (VCM), on the other hand, is being developed as a new type of video coding standard that mainly targets efficient compression of features extracted from deep neural networks, and it generally employs VVC for feature coding. However, because VVC was designed for traditional images, the influence of VVC-based feature coding on VCM is not clear. This paper therefore analyzes the performance of individual VVC coding tools for VCM feature coding, proposes an efficient tool combination based on that analysis, and applies it to video captioning, which automatically generates natural-language descriptions of videos. Experimental results show that the proposed tool combination is highly efficient in terms of both coding performance and encoding complexity.
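The abstract does not name the metric behind "coding performance". In JVET-style tool analyses it is commonly summarized with the Bjøntegaard delta rate (BD-rate): the average bitrate difference, at equal quality, between a test configuration (for example, VVC with some tools disabled) and an anchor. The following is a minimal, stdlib-only Python sketch of the classic cubic-fit BD-rate, assuming that for VCM the quality axis is a machine-task score rather than PSNR. Which captioning metric the paper actually uses, and all function and variable names here, are assumptions for illustration; this is not the authors' code.

```python
import math
from typing import List, Sequence


def _polyfit(x: Sequence[float], y: Sequence[float], degree: int) -> List[float]:
    """Least-squares polynomial fit; returns coefficients in ascending order (c0, c1, ...)."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)) and b[i] = sum(y * x^i).
    a = [[sum(xk ** (i + j) for xk in x) for j in range(n)] for i in range(n)]
    b = [sum(yk * xk ** i for xk, yk in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        rest = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - rest) / a[row][row]
    return coeffs


def _integral(coeffs: Sequence[float], lo: float, hi: float) -> float:
    """Definite integral from lo to hi of the polynomial with ascending coefficients."""
    def antiderivative(t: float) -> float:
        return sum(c * t ** (i + 1) / (i + 1) for i, c in enumerate(coeffs))
    return antiderivative(hi) - antiderivative(lo)


def bd_rate(anchor_rate: Sequence[float], anchor_quality: Sequence[float],
            test_rate: Sequence[float], test_quality: Sequence[float]) -> float:
    """Bjøntegaard delta rate in percent; negative means bitrate savings at equal quality."""
    if min(len(anchor_rate), len(test_rate)) < 4:
        raise ValueError("a cubic fit needs at least 4 rate-quality points per curve")
    # Map all quality values to [-1, 1] to keep the normal equations well conditioned;
    # the average over an interval is unchanged by this affine change of variable.
    all_q = list(anchor_quality) + list(test_quality)
    center = (max(all_q) + min(all_q)) / 2.0
    half_span = (max(all_q) - min(all_q)) / 2.0 or 1.0
    qa = [(q - center) / half_span for q in anchor_quality]
    qt = [(q - center) / half_span for q in test_quality]
    # Fit log10(bitrate) as a cubic function of quality for each curve.
    p_anchor = _polyfit(qa, [math.log10(r) for r in anchor_rate], 3)
    p_test = _polyfit(qt, [math.log10(r) for r in test_rate], 3)
    # Average the log-rate gap over the overlapping quality interval.
    lo, hi = max(min(qa), min(qt)), min(max(qa), max(qt))
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")
    avg_log_gap = (_integral(p_test, lo, hi) - _integral(p_anchor, lo, hi)) / (hi - lo)
    return (10.0 ** avg_log_gap - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical rate points (kbps) and task scores; not results from the paper.
    anchor_kbps = [1000.0, 2000.0, 4000.0, 8000.0]
    task_score = [30.0, 33.0, 36.0, 39.0]
    test_kbps = [0.9 * r for r in anchor_kbps]  # 10% fewer bits at every point
    print(round(bd_rate(anchor_kbps, task_score, test_kbps, task_score), 3))  # -10.0
```

A negative BD-rate means the test configuration needs fewer bits than the anchor for the same task score. The least-squares fit is written out by hand only to keep the sketch dependency-free; a NumPy polynomial fit would serve equally well. Encoding complexity in such comparisons is typically reported separately, as the test encoder's run time relative to the anchor's.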

Figs. 1–6 (images not included in this version)

Data availability

Data sharing is not applicable to this article, as no datasets were generated or analysed during the current study.

Acknowledgements

This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (IITP-2021-0-02067, IITP-2022-RS-2022-00156345) and the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (NRF-2021R1F1A1060816).

Author information

Corresponding author

Correspondence to Kiho Choi.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article has not been published elsewhere and has not been submitted simultaneously for publication elsewhere.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lee, J.Y., Choi, Y., Van Le, T. et al. Efficient feature coding based on performance analysis of Versatile Video Coding (VVC) in Video Coding for Machines (VCM). Multimed Tools Appl 82, 42803–42816 (2023). https://doi.org/10.1007/s11042-023-15409-7

  • DOI: https://doi.org/10.1007/s11042-023-15409-7
