Rate-Distortion-Cognition Controllable Versatile Neural Image Compression

Liu, Jinming; Feng, Ruoyu; Qi, Yunpeng; Chen, Qiuyu; Chen, Zhibo; Zeng, Wenjun; Jin, Xin

doi:10.1007/978-3-031-72992-8_19


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15114)

Included in the following conference series:

European Conference on Computer Vision


Abstract

Recently, the field of Image Coding for Machines (ICM) has attracted heightened interest and seen significant advances, thanks to the rapid progress of learning-based techniques for image compression and analysis. Previous studies often require training separate codecs to support different bitrate levels, machine tasks, and networks, and thus lack both flexibility and practicality. To address these challenges, we propose a rate-distortion-cognition controllable versatile image compression method that allows users to adjust the bitrate (i.e., Rate), image reconstruction quality (i.e., Distortion), and machine task accuracy (i.e., Cognition) with a single neural model, achieving ultra-controllability. Specifically, we first introduce a cognition-oriented loss in the primary compression branch to train a codec for diverse machine tasks. This branch attains variable bitrate by regulating the degree of quantization across the latent code channels. To further enhance the quality of the reconstructed images, we employ an auxiliary branch that supplements residual information with a scalable bitstream. Finally, the two branches are combined using a '$\beta x + (1 - \beta) y$' interpolation strategy to achieve a balanced cognition-distortion trade-off. Extensive experiments demonstrate that our method yields satisfactory ICM performance and flexible Rate-Distortion-Cognition control.
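The two control mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the list-based data layout, and the use of a per-channel step size as the quantization control are all assumptions made for illustration, and the abstract does not specify which branch output plays the role of x and which plays y.

```python
def quantize_channels(latent, steps):
    """Quantize each channel of `latent` with its own step size.

    `latent` is a list of channels (each a list of floats) and `steps`
    holds one positive step per channel. A larger step gives coarser
    values and therefore, in a real codec, a lower bitrate.
    Hypothetical helper, not taken from the paper.
    """
    if len(latent) != len(steps):
        raise ValueError("need exactly one step size per channel")
    return [[round(v / s) * s for v in channel]
            for channel, s in zip(latent, steps)]


def interpolate(x, y, beta):
    """Blend two equally sized sequences as beta * x + (1 - beta) * y."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [beta * a + (1.0 - beta) * b for a, b in zip(x, y)]


# Toy values only: one latent channel quantized coarsely, then finely.
coarse = quantize_channels([[0.26, 0.74]], [0.5])
fine = quantize_channels([[0.26, 0.74]], [0.25])

# Toy stand-ins for the outputs of the two branches.
x = [0.0, 0.5, 1.0]
y = [1.0, 0.5, 0.0]
blended = interpolate(x, y, 0.25)
```

In this sketch, setting beta to 1 returns x unchanged and setting it to 0 returns y, so a single scalar moves the output continuously between the two branch results.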



Acknowledgments

This work was supported in part by NSFC 62302246 and by ZJNSFC under Grant LQ23F010008, and by the High Performance Computing Center at Eastern Institute of Technology, Ningbo, and Ningbo Institute of Digital Twin.

Author information

Authors and Affiliations

Shanghai Jiao Tong University, Shanghai, China
Jinming Liu
Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, China
Jinming Liu, Qiuyu Chen, Wenjun Zeng & Xin Jin
University of Science and Technology of China, Hefei, China
Ruoyu Feng, Yunpeng Qi & Zhibo Chen


Corresponding author

Correspondence to Xin Jin.

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Hessen, Germany
Stefan Roth
Princeton University, Palo Alto, CA, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol


About this paper

Cite this paper

Liu, J. et al. (2025). Rate-Distortion-Cognition Controllable Versatile Neural Image Compression. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15114. Springer, Cham. https://doi.org/10.1007/978-3-031-72992-8_19


DOI: https://doi.org/10.1007/978-3-031-72992-8_19
Published: 30 October 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72991-1
Online ISBN: 978-3-031-72992-8
eBook Packages: Computer Science, Computer Science (R0)
