DOI:10.1145/3696409.3700197
research-article

A Multi-scale Framework towards Human-Machine Friendly Remote Sensing Image Coding

Published: 28 December 2024

Abstract

With the increasing availability of remote sensing data and the development of machine learning-based collaborative interpretation techniques, remote sensing image transmission needs to serve both human and machine vision, in addition to achieving high-efficiency compression. However, existing compression methods that are optimized for pixel-level representation often suffer from performance degradation when confronted with downstream machine analysis tasks. To address this issue, we propose a feature-domain-optimized, human-machine friendly remote sensing image compression framework. We design a compact multi-scale feature extractor to transform images into a feature domain representation, and then compress the compact feature for efficient transmission. During the decoding process, the compressed feature is directly used for machine analysis, which avoids the trade-off between human and machine vision requirements. To reconstruct visually pleasing images, we exploit the synergy between tasks and enhance the image reconstruction process with features from high-level tasks. Experimental results on the AID and NWPU-RESISC45 datasets demonstrate that our proposed method outperforms existing compression techniques in terms of analytical performance, while maintaining equivalent visual effects. This work highlights the importance of designing compression methods that are tailored to the needs of both human interpreters and machine analysis tasks in the context of remote sensing image processing.
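The data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' network: the learned multi-scale extractor is replaced here by plain average pooling, the feature codec by uniform quantization, the analysis task by a brightness threshold, and the reconstruction branch by nearest-neighbour upsampling. All function names, the `scales` tuple, and the `step` parameter are assumptions made for illustration only. The one point it is meant to show is that the machine task reads the decoded feature directly, while image reconstruction is a separate branch fed by the same feature.

```python
from typing import List

Grid = List[List[float]]

def avg_pool(img: Grid, k: int) -> Grid:
    """Average non-overlapping k x k blocks (height and width must be multiples of k)."""
    h, w = len(img), len(img[0])
    return [
        [sum(img[y + dy][x + dx] for dy in range(k) for dx in range(k)) / (k * k)
         for x in range(0, w, k)]
        for y in range(0, h, k)
    ]

def extract_features(img: Grid, scales=(2, 4)) -> List[Grid]:
    """Hypothetical stand-in for a multi-scale extractor: one pooled map per scale."""
    return [avg_pool(img, k) for k in scales]

def quantize(feats: List[Grid], step: float) -> List[List[List[int]]]:
    """Uniform quantization of every feature value to an integer code."""
    return [[[round(v / step) for v in row] for row in f] for f in feats]

def dequantize(codes: List[List[List[int]]], step: float) -> List[Grid]:
    """Map integer codes back to approximate feature values."""
    return [[[q * step for q in row] for row in f] for f in codes]

def analyze(feats: List[Grid]) -> str:
    """Toy machine task: classify from the coarsest decoded feature, no image needed."""
    coarse = feats[-1]
    mean = sum(map(sum, coarse)) / (len(coarse) * len(coarse[0]))
    return "bright" if mean >= 0.5 else "dark"

def reconstruct(feats: List[Grid], k: int = 2) -> Grid:
    """Toy human-vision branch: nearest-neighbour upsampling of the finest feature map."""
    fine = feats[0]
    return [[fine[y // k][x // k] for x in range(len(fine[0]) * k)]
            for y in range(len(fine) * k)]

# A 4 x 4 gradient image stands in for a remote sensing tile.
image = [[(x + y) / 10.0 for x in range(4)] for y in range(4)]
codes = quantize(extract_features(image), step=0.1)   # what would be transmitted
decoded = dequantize(codes, step=0.1)                 # decoder-side feature
label = analyze(decoded)                              # machine vision path
recon = reconstruct(decoded)                          # human vision path
```

In this sketch both paths share one transmitted representation (`codes`), so the analysis result does not depend on how faithfully the image is reconstructed.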

Published In

MMAsia '24: Proceedings of the 6th ACM International Conference on Multimedia in Asia
December 2024
939 pages
ISBN:9798400712739
DOI:10.1145/3696409

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Image Compression
  2. Semantic Compression
  3. Remote Sensing Data
  4. Machine Vision
  5. Deep Learning

Conference

MMAsia '24: ACM Multimedia Asia
December 3 - 6, 2024
Auckland, New Zealand

Acceptance Rates

Overall Acceptance Rate 59 of 204 submissions, 29%
