Deep Learning Based Video Compression Techniques with Future Research Issues


Abstract

Video coding technology has advanced rapidly in recent years. As the public became accustomed to creating and consuming video through the growth of the internet and of acquisition devices such as mobile phones and cameras, video compression became essential. Rising resolutions (4K, 2K, etc.), frame rates, and display capabilities further underline its importance. Improving the compression ratio while maintaining efficiency and quality has been the central goal, and it faces many obstacles. The era of artificial intelligence, neural networks, and especially deep learning has opened new paths in video processing, particularly in compression. This paper presents a precise, organized, and meticulous review of the impact of deep learning on video compression. The content adaptivity of deep learning distinguishes it from traditional signal-processing approaches to compression. The development of intelligent, self-trained stages of video compression with deep learning is reviewed in detail, and the relevant and noteworthy work that has emerged at each stage is included. A detailed survey of deep learning techniques for intra-prediction, inter-prediction, in-loop filtering, quantization, and entropy coding is presented, along with prospective ideas in each area. Finally, the scope for future enhancement at the various stages of compression and the related research directions for deep learning are emphasized.
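To make one of the surveyed stages concrete, the sketch below illustrates the general idea behind CNN-based in-loop filtering (in the spirit of the RHCNN-style filters listed in the abbreviations): a small residual network estimates the coding distortion of a reconstructed frame and adds the correction back before the frame re-enters the coding loop. This is a minimal illustrative sketch in PyTorch, not the method of any specific paper reviewed here; the class names, layer widths, and block count are assumptions chosen for readability.

```python
import torch
import torch.nn as nn


class ResBlock(nn.Module):
    """Two 3x3 convolutions with a local skip connection."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.conv2(torch.relu(self.conv1(x)))


class ResidualInLoopFilter(nn.Module):
    """Illustrative in-loop filter: predicts the coding distortion of a
    reconstructed luma frame and adds the correction back.
    Layer widths and depth are hypothetical."""

    def __init__(self, channels: int = 64, num_blocks: int = 4):
        super().__init__()
        self.head = nn.Conv2d(1, channels, kernel_size=3, padding=1)
        self.body = nn.Sequential(*[ResBlock(channels) for _ in range(num_blocks)])
        self.tail = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, recon: torch.Tensor) -> torch.Tensor:
        # recon: decoded (compressed) luma frame, shape (N, 1, H, W), values in [0, 1]
        feat = torch.relu(self.head(recon))
        correction = self.tail(self.body(feat))  # estimated compression artefact
        return recon + correction                # enhanced frame fed back into the loop


if __name__ == "__main__":
    model = ResidualInLoopFilter()
    decoded = torch.rand(1, 1, 64, 64)  # stand-in for a decoded CTU or frame patch
    enhanced = model(decoded)
    # Training would minimise an L1/L2 loss between `enhanced` and the uncompressed original.
    print(enhanced.shape)               # torch.Size([1, 1, 64, 64])
```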


Data Availability

Enquiries about data availability should be directed to the authors.

Abbreviations

VVC: Versatile video coding
HEVC: High efficiency video coding
CNN: Convolutional neural network
SRCNN: Super-resolution convolutional neural network
GOB: Group of blocks
DCT: Discrete cosine transform
AVC: Advanced video coding
UHD: Ultra-high definition
DST: Discrete sine transform
DWT: Discrete wavelet transform
HT: Hilbert transform
CABAC: Context-adaptive binary arithmetic coding
NN: Neural network
DL: Deep learning
CTU: Coding tree unit
FRCNN: Faster region-based convolutional neural network; fractional-pixel reference generation CNN
SVM: Support vector machine
CNNMCR: Convolutional neural network-based motion compensation refinement
VECNN: Virtual reference frame enhancement CNN
VRF: Virtual reference frame
DVRF: Direct virtual reference frame
FCNN: Fully convolutional neural network
RHCNN: Residual highway convolutional neural network
RDO: Rate-distortion optimization
SAO: Sample adaptive offset
MIF: Multi-frame in-loop filter
DNN: Deep neural network
SSIM: Structural similarity index
VMAF: Video multimethod assessment fusion


Acknowledgements

The authors would like to thank the anonymous referees for providing valuable suggestions that helped clarify the exposition of the material. They are greatly indebted to the anonymous reviewers whose thought-provoking and encouraging comments motivated them to significantly modify and update the paper. They would also like to express their gratitude to REVA University for extending the research facilities needed to carry out this work.

Funding

The authors have not disclosed any funding.

Author information

Authors and Affiliations

Authors

Contributions

HKJ: Analysis of existing video coding techniques and their evolution; extensive research on multiple intra- and inter-prediction techniques with deep learning methodology. MRK: Content on deep learning-based quantization and entropy coding; identification of future trends and open research scope. AC: Overall review of the section-wise contents of the paper; contribution to the identification of open research scope. MP: Overall review of the section-wise contents of the paper; contribution on recent techniques and reviews of in-loop filtering and its research white spaces.

Corresponding author

Correspondence to Arunkumar Chandrasekhar.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Joy, H.K., Kounte, M.R., Chandrasekhar, A. et al. Deep Learning Based Video Compression Techniques with Future Research Issues. Wireless Pers Commun 131, 2599–2625 (2023). https://doi.org/10.1007/s11277-023-10558-2

