Abstract
With recent advances in computer vision, an increasing amount of image and video content is consumed by machines rather than humans. However, existing image and video compression schemes are designed primarily for human vision and are not optimized for machine vision. In this paper, we propose a saliency-guided learned image compression scheme for machines, taking object detection as an example task. To identify regions that are salient for machine vision, a saliency map is computed for each detected object using an existing black-box explanation method for neural networks, and the maps for multiple objects are carefully merged into one. On top of a neural-network-based image codec, we design a bitrate allocation scheme that prunes the latent representation of the image according to the saliency map. During end-to-end training of the codec, both pixel fidelity and machine vision fidelity are used for performance evaluation, where the degradation in detection accuracy is measured without ground-truth annotations. Experimental results demonstrate that the proposed scheme achieves up to a 14.1% bitrate reduction at the same detection accuracy compared with the baseline learned image codec.
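The two core ideas of the abstract, merging per-object saliency maps and pruning the latent representation by saliency, can be sketched as follows. This is a minimal illustration, not the paper's actual method: the max-based merge, the `keep_ratio` parameter, and the hard zeroing of low-saliency latents are all simplifying assumptions, and the saliency map is assumed to be already resized to the latent resolution.

```python
import numpy as np

def merge_saliency_maps(maps):
    """Merge per-object saliency maps with an element-wise maximum,
    so a region is kept if it is salient for any detected object.
    (A simple stand-in for the merging strategy described in the paper.)"""
    return np.maximum.reduce(maps)

def prune_latents(latents, saliency, keep_ratio=0.5):
    """Zero out latent elements at the least salient spatial positions.

    latents:    (C, H, W) latent representation from the learned encoder
    saliency:   (H, W) merged saliency map at the latent resolution
    keep_ratio: fraction of spatial positions whose latents survive
    """
    # Threshold chosen so that roughly keep_ratio of positions are retained.
    thresh = np.quantile(saliency, 1.0 - keep_ratio)
    mask = (saliency >= thresh).astype(latents.dtype)   # (H, W) binary mask
    return latents * mask[None, :, :]                    # broadcast over channels
```

In an actual codec the pruned latents would then be entropy-coded, so zeroed positions cost very few bits, which is where the bitrate saving comes from.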
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Xiong, H., Xu, Y. (2023). Saliency-Guided Learned Image Compression for Object Detection. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Communications in Computer and Information Science, vol 1791. Springer, Singapore. https://doi.org/10.1007/978-981-99-1639-9_27
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-1638-2
Online ISBN: 978-981-99-1639-9
eBook Packages: Computer Science (R0)