Improving Video Quality with Generative Adversarial Networks

  • Chapter in Multi-faceted Deep Learning

Abstract

This chapter covers deep learning methodologies that can be employed to recover image and video quality. Most of the covered approaches are based on conditional Generative Adversarial Networks (GANs), which have the benefit of producing images that look more natural. For the inference phase, we show how to perform such operations with a low computational footprint. For the training phase, we address in depth architectural choices, loss functions, and training strategies in general. Finally, we also deal with settings in which it is possible to control both ends of the image transmission pipeline.
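To make the conditional-GAN setup concrete, the sketch below shows one training step of a GAN for artifact removal in PyTorch. It is a minimal illustration under stated assumptions, not the chapter's actual method: the residual generator, the patch discriminator, the L1 reconstruction weight, and the random stand-in frames are all placeholders; the chapter itself discusses the real architectural choices, loss functions, and training strategies.

```python
# Minimal conditional-GAN sketch for compression-artifact removal (illustrative only).
# The generator maps a degraded frame to a restored frame; a patch discriminator
# judges (degraded, candidate) pairs. Architectures and weights are placeholders.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, x):
        # Predict a residual and add it to the degraded input.
        return x + self.net(x)

class PatchDiscriminator(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        # Input: degraded frame concatenated with the candidate frame -> 6 channels.
        self.net = nn.Sequential(
            nn.Conv2d(6, ch, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch * 2, 1, 4, padding=1),  # per-patch real/fake logits
        )

    def forward(self, degraded, candidate):
        return self.net(torch.cat([degraded, candidate], dim=1))

G, D = Generator(), PatchDiscriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
adv_loss, rec_loss = nn.BCEWithLogitsLoss(), nn.L1Loss()

# Stand-in batch: 'degraded' would be decoded compressed frames, 'clean' the originals.
degraded, clean = torch.randn(4, 3, 128, 128), torch.randn(4, 3, 128, 128)

# Discriminator step: real pairs vs. generated pairs.
with torch.no_grad():
    fake = G(degraded)
d_real, d_fake = D(degraded, clean), D(degraded, fake)
loss_d = adv_loss(d_real, torch.ones_like(d_real)) + adv_loss(d_fake, torch.zeros_like(d_fake))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: fool the discriminator while staying close to the clean frame.
fake = G(degraded)
d_fake = D(degraded, fake)
loss_g = adv_loss(d_fake, torch.ones_like(d_fake)) + 10.0 * rec_loss(fake, clean)
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

In practice the reconstruction term is often a perceptual loss computed on deep features rather than plain L1, which is one of the loss-function choices the chapter examines.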



Author information

Corresponding author

Correspondence to Lorenzo Seidenari.


Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Galteri, L., Seidenari, L., Uricchio, T., Bertini, M., del Bimbo, A. (2021). Improving Video Quality with Generative Adversarial Networks. In: Benois-Pineau, J., Zemmari, A. (eds) Multi-faceted Deep Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-74478-6_12

  • DOI: https://doi.org/10.1007/978-3-030-74478-6_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-74477-9

  • Online ISBN: 978-3-030-74478-6

  • eBook Packages: Computer Science, Computer Science (R0)
