Skip to main content
Log in

End-to-end image compression method based on perception metric

  • Original Paper
  • Published:
Signal, Image and Video Processing Aims and scope Submit manuscript

Abstract

In recent years, image compression methods based on deep learning have received extensive attention and research. Most methods focus on minimizing the mean squared error (MSE) to obtain reconstructed images with higher peak signal-to-noise ratio (PSNR). However, the ability of pixel-wise distortion to capture the perceptual differences between images is fairly limited, which may suffer from undesirable visual perception quality of the reconstructed images. To address this problem, we propose a novel rate-distortion loss based on perception metric in learned image compression. In this work, we introduce the perception metric into the rate-distortion loss, which can enhance the capacity of compression model to capture perceptual differences and semantic information in images. By performing that, the rate-distortion performance of our proposed model on multi-scale structural similarity (MS-SSIM) and the classification accuracy of reconstructed images have been improved. Comprehensive experimental results demonstrate that the proposed method has comparable performance in terms of PSNR, and the performance on MS-SSIM outperforms traditional image codecs, such as JPEG and BPG, as well as other previous end-to-end compression methods. More significantly, the visual quality of the reconstructed images is dramatically improved.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Similar content being viewed by others

References

  1. Versatile video coding reference software version 9.1 (vtm-9.1) (2020). https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM/-/tags/VTM-9.1

  2. Workshop and challenge on learned image compression (2020). http://www.compression.cc/challenge/

  3. Agustsson, E., Mentzer, F., Tschannen, M., Cavigelli, L., Timofte, R., Benini, L., Van Gool, L.: Soft-to-hard vector quantization for end-to-end learning compressible representations. In: Advances in Neural Information Processing Systems (NIPS), pp. 1141–1151 (2017)

  4. Ballé, J., Laparra, V., Simoncelli, E.P.: End-to-end optimized image compression. In: International Conference on Learning Representations (ICLR), pp. 1–27 (2017)

  5. Ballé, J., Minnen, D., Singh, S., Hwang, S.J., Johnston, N.: Variational image compression with a scale hyperprior. In: International Conference on Learning Representations (ICLR), pp. 1–23 (2018)

  6. Bégaint, J., Racapé, F., Feltman, S., Pushparaja, A.: Compressai: a pytorch library and evaluation platform for end-to-end compression research. arXiv preprint arXiv:2011.03029 (2020)

  7. Bellard, F.: BPG image format (2014). https://bellard.org/bpg/

  8. Bruna, J., Sprechmann, P., LeCun, Y.: Super-resolution with deep convolutional sufficient statistics. arXiv preprint arXiv:1511.05666 (2015)

  9. Cheng, Z., Sun, H., Takeuchi, M., Katto, J.: Deep residual learning for image compression. In: CVPR Workshops, pp. 1–4 (2019)

  10. Cheng, Z., Sun, H., Takeuchi, M., Katto, J.: Learned image compression with discretized gaussian mixture likelihoods and attention modules. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7939–7948 (2020)

  11. Dosovitskiy, A., Brox, T.: Generating images with perceptual similarity metrics based on deep networks. arXiv preprint arXiv:1602.02644 (2016)

  12. Gupta, P., Srivastava, P., Bhardwaj, S., Bhateja, V.: A modified PSNR metric based on HVS for quality assessment of color images. In: 2011 International Conference on Communication and Industrial Application (ICCIA), pp. 1–4. IEEE (2011)

  13. Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-resolution. In: European Conference on Computer Vision (ECCV), pp. 694–711. Springer (2016)

  14. Johnston, N., Vincent, D., Minnen, D., Covell, M., Singh, S., Chinen, T., Hwang, S.J., Shor, J., Toderici, G.: Improved lossy image compression with priming and spatially adaptive bit rates for recurrent networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4385–4393 (2018)

  15. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

  16. Kodak, E.: Kodak lossless true color image suite (photocd pcd0992) (1993). http://r0k.us/graphics/kodak/

  17. Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., et al.: Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4681–4690 (2017)

  18. Lee, J., Cho, S., Beack, S.K.: Context-adaptive entropy model for end-to-end optimized image compression. arXiv preprint arXiv:1809.10452 (2018)

  19. Li, M., Zuo, W., Gu, S., Zhao, D., Zhang, D.: Learning convolutional networks for content-weighted image compression. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3214–3223 (2018)

  20. Minnen, D., Ballé, J., Toderici, G.: Joint autoregressive and hierarchical priors for learned image compression. In: Advances in Neural Information Processing Systems (NIPS), pp. 10771–10780 (2018)

  21. Ohm, J.R., Sullivan, G.J.: Versatile video coding—towards the next generation of video compression. In: Picture Coding Symposium (2018)

  22. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al.: Pytorch: An imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems (NIPS), pp. 8026–8037 (2019)

  23. Rabbani, M., Joshi, R.: An overview of the jpeg2000 still image compression standard. Signal Proc. Image Commun. 17(1), 3–48 (2002)

    Article  Google Scholar 

  24. Rippel, O., Bourdev, L.: Real-time adaptive image compression. In: International Conference on Machine Learning (ICML), pp. 2922–2930. PMLR (2017)

  25. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)

  26. Sullivan, G.J., Wiegand, T.: Rate-distortion optimization for video compression. IEEE Signal Process. Mag. 15(6), 74–90 (1998)

    Article  Google Scholar 

  27. Theis, L., Shi, W., Cunningham, A., Huszár, F.: Lossy image compression with compressive autoencoders. arXiv preprint arXiv:1703.00395 (2017)

  28. Toderici, G., O’Malley, S.M., Hwang, S.J., Vincent, D., Minnen, D., Baluja, S., Covell, M., Sukthankar, R.: Variable rate image compression with recurrent neural networks. arXiv preprint arXiv:1511.06085 (2015)

  29. Toderici, G., Vincent, D., Johnston, N., Jin Hwang, S., Minnen, D., Shor, J., Covell, M.: Full resolution image compression with recurrent neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5306–5314 (2017)

  30. Van Oord, A., Kalchbrenner, N., Kavukcuoglu, K.: Pixel recurrent neural networks. In: International Conference on Machine Learning (ICML), pp. 1747–1756. PMLR (2016)

  31. Wallace, G.K.: The jpeg still picture compression standard. IEEE Trans. Consum. Electron. 38(1), 43–59 (1992)

    Article  Google Scholar 

  32. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)

    Article  Google Scholar 

  33. Wang, Z., Simoncelli, E.P., Bovik, A.C.: Multiscale structural similarity for image quality assessment. In: The Thrity-Seventh Asilomar Conference on Signals, Systems and Computers, pp. 1398–1402. IEEE (2003)

Download references

Acknowledgements

This research was supported by the National Natural Science Foundation of China (Grant No. 61871154, No. 62031013), by the Youth Program of National Natural Science Foundation of China (61906103, 61906124), by the Basic and applied basic research fund of Guangdong Province (2019A1515011307).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yongsheng Liang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Liu, S., Huang, Y., Yang, H. et al. End-to-end image compression method based on perception metric. SIViP 16, 1803–1810 (2022). https://doi.org/10.1007/s11760-022-02137-y

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11760-022-02137-y

Keywords

Navigation