Abstract
Deep Neural Networks (DNNs) have emerged in recent years as a best-of-breed alternative for various classification, prediction, and identification tasks in images and other fields of study. Over the last few years, research groups have been exploring ways to harness them for video coding, with the primary purpose of improving compression rates while retaining the same video quality. Research on neural-network-based video coding is focused on two directions: (1) improving existing video codecs by making better predictions that are incorporated within the same codec framework, and (2) holistic, end-to-end image/video compression schemes. While some of the results are promising and the prospects are good, no breakthrough has been reported as yet. This chapter provides an overview of state-of-the-art research, with examples from a few prominent published papers that illustrate and further explain the highlighted topics in the field of using DNNs for video compression.
Notes
1. A heat map image that reflects the magnitude and direction of movement of individual pixels between consecutive video frames.
2. A basic processing unit of HEVC that is equivalent to a block in previous standards (such as H.264).
Copyright information
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Hadar, O., Birman, R. (2021). Deep Learning in Video Compression Algorithms. In: Benois-Pineau, J., Zemmari, A. (eds) Multi-faceted Deep Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-74478-6_8
DOI: https://doi.org/10.1007/978-3-030-74478-6_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74477-9
Online ISBN: 978-3-030-74478-6
eBook Packages: Computer Science, Computer Science (R0)