Abstract
In recent years, deep-learning-based image compression has received extensive attention. Most methods minimize the mean squared error (MSE) to obtain reconstructed images with a higher peak signal-to-noise ratio (PSNR). However, pixel-wise distortion has a fairly limited ability to capture perceptual differences between images, so reconstructions optimized for it may have poor visual quality. To address this problem, we propose a novel rate-distortion loss based on a perception metric for learned image compression. Introducing the perception metric into the rate-distortion loss enhances the compression model's capacity to capture perceptual differences and semantic information in images. As a result, both the rate-distortion performance of the proposed model in terms of multi-scale structural similarity (MS-SSIM) and the classification accuracy on reconstructed images improve. Comprehensive experiments demonstrate that the proposed method achieves comparable PSNR, while its MS-SSIM performance exceeds that of traditional image codecs, such as JPEG and BPG, as well as previous end-to-end compression methods. More significantly, the visual quality of the reconstructed images is dramatically improved.
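The abstract does not give the exact form of the loss, so the following is only a minimal sketch of how a perception term might enter a rate-distortion objective. It assumes a weighted sum, rate + lam * (MSE + alpha * perceptual distance). The names `feature_map`, `lam`, and `alpha` are hypothetical, and the finite-difference features are a stand-in for whatever pretrained network or metric the method actually uses.

```python
import numpy as np


def feature_map(img):
    """Hypothetical stand-in for a pretrained feature extractor:
    horizontal and vertical finite differences of a grayscale image."""
    dx = img[:, 1:] - img[:, :-1]
    dy = img[1:, :] - img[:-1, :]
    return dx, dy


def perceptual_distance(x, x_hat):
    """Mean squared distance between the stand-in features of two images."""
    fx, fy = feature_map(x)
    gx, gy = feature_map(x_hat)
    return float(np.mean((fx - gx) ** 2) + np.mean((fy - gy) ** 2))


def rate_bpp(likelihoods, num_pixels):
    """Estimated rate in bits per pixel from entropy-model likelihoods."""
    return float(-np.sum(np.log2(likelihoods)) / num_pixels)


def rd_loss(x, x_hat, likelihoods, lam=0.01, alpha=1.0):
    """Assumed objective: rate + lam * (MSE + alpha * perceptual distance).
    x and x_hat are H x W grayscale arrays, so x.size is the pixel count."""
    mse = float(np.mean((x - x_hat) ** 2))
    perc = perceptual_distance(x, x_hat)
    rate = rate_bpp(likelihoods, x.size)
    return rate + lam * (mse + alpha * perc)
```

In this toy version, a constant brightness shift raises the MSE term but leaves the feature term at zero, which illustrates in a simplified way why a pixel-wise distortion and a feature-based distortion can rank reconstructions differently.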







Acknowledgements
This research was supported by the National Natural Science Foundation of China (Grant Nos. 61871154 and 62031013), the Youth Program of the National Natural Science Foundation of China (Grant Nos. 61906103 and 61906124), and the Basic and Applied Basic Research Fund of Guangdong Province (Grant No. 2019A1515011307).
Cite this article
Liu, S., Huang, Y., Yang, H. et al. End-to-end image compression method based on perception metric. SIViP 16, 1803–1810 (2022). https://doi.org/10.1007/s11760-022-02137-y
DOI: https://doi.org/10.1007/s11760-022-02137-y