Improving Video Quality with Generative Adversarial Networks

Galteri, Leonardo; Seidenari, Lorenzo; Uricchio, Tiberio; Bertini, Marco; del Bimbo, Alberto

doi:10.1007/978-3-030-74478-6_12

Abstract

This chapter covers deep learning methodologies that can be employed to recover image and video quality. Most of the approaches covered are based on conditional Generative Adversarial Networks (GANs), which have the benefit of producing images that look more natural. For the inference phase, we show how to perform such operations with a low computational footprint. For the training phase, we address in depth the architectural choices, loss functions, and training strategies in general. Finally, we also deal with settings in which it is possible to control both ends of the image transmission pipeline.

Author information

Authors and Affiliations

University of Florence, Firenze, Italy
Leonardo Galteri, Lorenzo Seidenari, Tiberio Uricchio, Marco Bertini & Alberto del Bimbo

Corresponding author

Correspondence to Lorenzo Seidenari.

Editor information

Editors and Affiliations

LaBRI UMR 5800, University of Bordeaux, Talence Cedex, France
Jenny Benois-Pineau
LaBRI UMR 5800, University of Bordeaux, Talence Cedex, France
Akka Zemmari

About this chapter

Cite this chapter

Galteri, L., Seidenari, L., Uricchio, T., Bertini, M., del Bimbo, A. (2021). Improving Video Quality with Generative Adversarial Networks. In: Benois-Pineau, J., Zemmari, A. (eds) Multi-faceted Deep Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-74478-6_12

DOI: https://doi.org/10.1007/978-3-030-74478-6_12
Published: 24 February 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74477-9
Online ISBN: 978-3-030-74478-6
eBook Packages: Computer Science, Computer Science (R0)
