Deep Learning in Video Compression Algorithms

Hadar, Ofer; Birman, Raz

doi:10.1007/978-3-030-74478-6_8

Abstract

Deep Neural Networks (DNNs) have emerged in recent years as a best-of-breed alternative for performing various classification, prediction, and identification tasks in images and other fields of study. In the last few years, various research groups have been exploring how to harness them to improve video coding, with the primary purpose of improving video compression rates while retaining the same video quality. Research on neural-network-based video coding is focused on two directions: (1) improving existing video codecs by performing better predictions that are incorporated within the same codec framework, and (2) holistic, end-to-end image/video compression schemes. While some of the results are promising and the prospects are good, no breakthrough has been reported yet. This chapter provides an overview of state-of-the-art research, with examples from a few prominent published papers that illustrate and further explain the highlighted topics in the use of DNNs for video compression.

Notes

1. A heat map image that reflects the movement magnitude and direction of individual pixels between consecutive video frames.
2. A basic processing unit of HEVC that is equivalent to a block in previous standards (such as H.264).

Author information

Authors and Affiliations

Ben Gurion University of the Negev, School of Electrical and Computer Engineering, Beersheba, Israel
Ofer Hadar & Raz Birman

Corresponding author

Correspondence to Ofer Hadar.

Editor information

Editors and Affiliations

LaBRI UMR 5800, University of Bordeaux, Talence Cedex, France
Jenny Benois-Pineau
LaBRI UMR 5800, University of Bordeaux, Talence Cedex, France
Akka Zemmari

About this chapter

Cite this chapter

Hadar, O., Birman, R. (2021). Deep Learning in Video Compression Algorithms. In: Benois-Pineau, J., Zemmari, A. (eds) Multi-faceted Deep Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-74478-6_8

DOI: https://doi.org/10.1007/978-3-030-74478-6_8
Published: 24 February 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74477-9
Online ISBN: 978-3-030-74478-6
eBook Packages: Computer Science, Computer Science (R0)
