Abstract
Video coding technology has advanced rapidly in recent years. The internet boom and capture devices such as mobile phones and cameras made creating and accessing video commonplace, so video compression became crucial. Varying resolutions (4K, 2K, etc.), frame rates, and display types further underline its importance. The goal has been to improve the compression ratio while maintaining efficiency and quality, and that goal faces many obstacles. Artificial intelligence, neural networks, and especially deep learning have opened new paths in video processing, particularly in compression. This paper presents a structured and detailed review of the impact of deep learning on video compression. The content adaptivity of deep learning distinguishes it from traditional signal processing for video compression. The development of intelligent, self-trained compression stages based on deep learning is reviewed in detail, and noteworthy work in each stage of compression is covered. A detailed survey of deep learning techniques for intra-prediction, inter-prediction, in-loop filtering, quantization, and entropy coding is presented, together with envisaged ideas in each field. Finally, future directions for improving the various stages of compression and open research problems to explore with deep learning are highlighted.
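The abstract lists the codec stages that deep learning is being applied to. As a point of reference, the following minimal sketch shows hand-designed versions of three of them for a single 4×4 block: DC intra prediction, a separable orthonormal DCT-II of the residual, and uniform scalar quantization, followed by decoder-side reconstruction. It omits inter-prediction, in-loop filtering, and entropy coding. The block size, the simplified DC rounding, the quantization step `qstep`, and all function names are illustrative assumptions, not the HEVC or VVC specification.

```python
import math

N = 4  # illustrative block size (assumption); real codecs use many sizes


def dct_matrix(n):
    """Orthonormal DCT-II basis; row k holds the k-th basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def dc_prediction(top, left):
    """Simplified DC intra mode: every sample is the rounded mean of the
    reconstructed neighbours above and to the left of the block."""
    dc = round(sum(top + left) / (len(top) + len(left)))
    return [[dc] * len(top) for _ in range(len(left))]


def encode_block(block, top, left, qstep):
    """Prediction -> residual -> 2-D DCT -> uniform scalar quantization."""
    pred = dc_prediction(top, left)
    residual = [[block[i][j] - pred[i][j] for j in range(N)] for i in range(N)]
    c = dct_matrix(N)
    coeffs = matmul(matmul(c, residual), transpose(c))  # C * R * C^T
    return [[round(x / qstep) for x in row] for row in coeffs]  # integer levels


def decode_block(levels, top, left, qstep):
    """Decoder mirrors the encoder: same prediction, dequantize, inverse DCT."""
    pred = dc_prediction(top, left)
    c = dct_matrix(N)
    coeffs = [[lv * qstep for lv in row] for row in levels]
    residual = matmul(matmul(transpose(c), coeffs), c)  # C^T * Y * C
    return [[pred[i][j] + residual[i][j] for j in range(N)] for i in range(N)]


if __name__ == "__main__":
    block = [[52, 55, 61, 66], [63, 59, 55, 90],
             [62, 59, 68, 113], [63, 58, 71, 122]]
    top, left = [50, 54, 60, 65], [60, 61, 62, 62]
    for q in (2, 8, 32):
        levels = encode_block(block, top, left, q)
        recon = decode_block(levels, top, left, q)
        nonzero = sum(v != 0 for row in levels for v in row)
        max_err = max(abs(block[i][j] - recon[i][j])
                      for i in range(N) for j in range(N))
        print(f"qstep={q}: {nonzero} nonzero levels, max error {max_err:.2f}")
```

Raising `qstep` drives more levels to zero, which leaves fewer symbols for an entropy coder to spend bits on, at the cost of larger reconstruction error. Balancing this rate-distortion trade-off is the job of RDO. The learned approaches surveyed here target stages like these, for example by replacing the fixed predictor or adding a trained filter after reconstruction.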
Data Availability
Enquiries about data availability should be directed to the authors.
Abbreviations
- VVC: Versatile video coding
- HEVC: High efficiency video coding
- CNN: Convolutional neural network
- SRCNN: Super-resolution convolutional neural network
- GOB: Group of blocks
- DCT: Discrete cosine transform
- AVC: Advanced video coding
- UHD: Ultra-high-definition
- DST: Discrete sine transform
- DWT: Discrete wavelet transform
- HT: Hilbert transform
- CABAC: Context-adaptive binary arithmetic coding
- NN: Neural network
- DL: Deep learning
- CTU: Coding tree unit
- FRCNN: Faster region-based convolutional neural network; fractional-pixel reference generation CNN
- SVM: Support vector machine
- CNNMCR: Convolutional neural network-based motion compensation refinement
- VECNN: Virtual reference frame enhancement CNN
- VRF: Virtual reference frame
- DVRF: Direct virtual reference frame
- FCNN: Fully convolutional neural network
- RHCNN: Residual highway convolutional neural network
- RDO: Rate-distortion optimization
- SAO: Sample adaptive offset
- MIF: Multi-frame in-loop filter
- DNN: Deep neural network
- SSIM: Structural similarity index
- VMAF: Video multimethod assessment fusion
Acknowledgements
The authors would like to thank the anonymous referees for valuable suggestions that helped clarify the exposition of the material. The authors are greatly indebted to the anonymous reviewers, whose thought-provoking and encouraging comments motivated them to significantly revise and update the paper. They would also like to thank REVA University for providing the research facilities used to carry out this work.
Funding
The authors have not disclosed any funding.
Author information
Contributions
HKJ: analysis of existing video coding techniques and their evolution; extensive research on deep learning-based intra- and inter-prediction techniques. MRK: content on deep learning-based quantization and entropy coding, future trends, and identification of open research problems. AC: overall review of the section-wise content of the paper; content contribution on identifying open research problems. MP: overall review of the section-wise content of the paper; content contribution on recent techniques and reviews of loop filtering and its research gaps.
Ethics declarations
Competing interest
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Joy, H.K., Kounte, M.R., Chandrasekhar, A. et al. Deep Learning Based Video Compression Techniques with Future Research Issues. Wireless Pers Commun 131, 2599–2625 (2023). https://doi.org/10.1007/s11277-023-10558-2
DOI: https://doi.org/10.1007/s11277-023-10558-2