Deep Learning in Video Compression Algorithms

Multi-faceted Deep Learning

Abstract

Deep Neural Networks (DNNs) have emerged in recent years as a best-of-breed alternative for performing various classification, prediction and identification tasks in images and other fields of study. In the last few years, various research groups have been exploring ways to harness them for video coding, with the primary purpose of improving video compression rates while retaining the same video quality. Neural-network-based video coding research is evolving in two directions: (1) improving existing video codecs through better predictions that are incorporated within the same codec framework, and (2) holistic, end-to-end image/video compression schemes. While some of the results are promising and the prospects are good, no breakthrough has been reported yet. This chapter provides an overview of state-of-the-art research work, with examples from a few prominent published papers that illustrate and further explain the highlighted topics in the field of using DNNs for video compression.

Notes

  1. A heat map image that reflects the movement magnitude and direction of individual pixels between consecutive video frames.

  2. A basic processing unit of HEVC that is equivalent to a block in previous standards (such as H.264).

Author information

Corresponding author

Correspondence to Ofer Hadar.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Hadar, O., Birman, R. (2021). Deep Learning in Video Compression Algorithms. In: Benois-Pineau, J., Zemmari, A. (eds) Multi-faceted Deep Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-74478-6_8

  • DOI: https://doi.org/10.1007/978-3-030-74478-6_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-74477-9

  • Online ISBN: 978-3-030-74478-6

  • eBook Packages: Computer Science (R0)
