Abstract
This chapter covers deep learning methods for restoring image and video quality. Most of the approaches covered are based on conditional Generative Adversarial Networks (GANs), which have the benefit of producing more natural-looking images. For the inference phase, we show how to perform these operations with a low computational footprint. For the training phase, we discuss architectural choices, loss functions, and training strategies in depth. Finally, we address settings in which both ends of the image transmission pipeline can be controlled.
Copyright information
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Galteri, L., Seidenari, L., Uricchio, T., Bertini, M., del Bimbo, A. (2021). Improving Video Quality with Generative Adversarial Networks. In: Benois-Pineau, J., Zemmari, A. (eds) Multi-faceted Deep Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-74478-6_12
DOI: https://doi.org/10.1007/978-3-030-74478-6_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74477-9
Online ISBN: 978-3-030-74478-6
eBook Packages: Computer Science, Computer Science (R0)