research-article

A Multi-scale Framework towards Human-Machine Friendly Remote Sensing Image Coding

Authors:

Jing Xiao

MMAsia '24: Proceedings of the 6th ACM International Conference on Multimedia in Asia

Article No.: 36, Pages 1-6

https://doi.org/10.1145/3696409.3700197

Published: 28 December 2024

Abstract

With the increasing availability of remote sensing data and the development of machine-learning-based collaborative interpretation techniques, remote sensing image transmission must serve both human and machine vision while still achieving high compression efficiency. However, existing compression methods optimized for pixel-level reconstruction often degrade the performance of downstream machine analysis tasks. To address this issue, we propose a human-machine friendly remote sensing image compression framework that is optimized in the feature domain. We design a compact multi-scale feature extractor that transforms images into a feature-domain representation, and we then compress these compact features for efficient transmission. At the decoder, the decompressed features are used directly for machine analysis, which avoids trading off human vision requirements against machine vision requirements. To reconstruct visually pleasing images, we exploit the synergy between tasks and enhance the image reconstruction process with features from high-level tasks. Experimental results on the AID and NWPU-RESISC45 datasets demonstrate that the proposed method outperforms existing compression techniques in analytical performance while maintaining equivalent visual quality. This work highlights the importance of designing compression methods tailored to the needs of both human interpreters and machine analysis tasks in remote sensing image processing.
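The abstract does not specify the network layers, so the following is only a toy sketch of the data flow it describes, not the authors' model: an encoder that extracts multi-scale features and quantizes them, a machine branch that consumes the decoded features directly, and a pixel reconstruction that reuses those same features. All function names and parameters (`encode`, `machine_task`, `reconstruct`, `SCALES`, `step`) are hypothetical. Fixed average pooling stands in for the learned multi-scale extractor, uniform scalar quantization stands in for the learned bottleneck (entropy coding is omitted), and nearest-neighbour upsampling stands in for the learned reconstruction network.

```python
# Toy sketch of the data flow described in the abstract; NOT the authors' model.
# Hypothetical stand-ins: fixed average pooling replaces the learned multi-scale
# extractor, uniform scalar quantization replaces the learned bottleneck
# (entropy coding omitted), nearest-neighbour upsampling replaces the learned
# reconstruction network.
from typing import List, Sequence

Image = List[List[float]]
SCALES = (2, 4)  # hypothetical pooling factors, finest scale first


def avg_pool(img: Image, k: int) -> Image:
    """Downsample by averaging non-overlapping k x k blocks."""
    h, w = len(img) // k, len(img[0]) // k
    return [[sum(img[i * k + di][j * k + dj]
                 for di in range(k) for dj in range(k)) / (k * k)
             for j in range(w)]
            for i in range(h)]


def upsample(img: Image, k: int) -> Image:
    """Nearest-neighbour upsampling by an integer factor k."""
    return [[img[i // k][j // k] for j in range(len(img[0]) * k)]
            for i in range(len(img) * k)]


def encode(img: Image, step: float,
           scales: Sequence[int] = SCALES) -> List[List[List[int]]]:
    """Encoder: extract one feature map per scale, then quantize to integers.
    In a real codec these integer symbols would be entropy coded."""
    feats = [avg_pool(img, s) for s in scales]
    return [[[round(v / step) for v in row] for row in f] for f in feats]


def decode_features(symbols: List[List[List[int]]], step: float) -> List[Image]:
    """Decoder: dequantize the integer symbols back into feature maps."""
    return [[[q * step for q in row] for row in f] for f in symbols]


def machine_task(feats: List[Image]) -> float:
    """Machine branch: runs on decoded features, with no pixel reconstruction.
    Trivial hypothetical task head: the mean of the coarsest feature map."""
    coarse = feats[-1]
    return sum(map(sum, coarse)) / (len(coarse) * len(coarse[0]))


def reconstruct(feats: List[Image], scales: Sequence[int] = SCALES) -> Image:
    """Human branch: upsample every scale to full resolution and average them,
    so the coarse map shared with the machine branch also shapes the pixels."""
    ups = [upsample(f, s) for f, s in zip(feats, scales)]
    h, w = len(ups[0]), len(ups[0][0])
    return [[sum(u[i][j] for u in ups) / len(ups) for j in range(w)]
            for i in range(h)]


if __name__ == "__main__":
    # 8x8 toy input; side length must be divisible by the largest scale.
    image = [[float((i + j) % 8) for j in range(8)] for i in range(8)]
    symbols = encode(image, step=0.5)
    feats = decode_features(symbols, step=0.5)
    recon = reconstruct(feats)
    print("task output:", machine_task(feats))
    print("reconstruction size:", len(recon), "x", len(recon[0]))
```

Running the file prints `task output: 3.5` and `reconstruction size: 8 x 8`. The point the sketch tries to convey is structural: the machine branch reads the decoded features without waiting for pixel reconstruction, and the reconstruction reuses the same features rather than a separately coded image.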

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Reconstruction
      2. Computer vision representations
        Image representations

Published In

MMAsia '24: Proceedings of the 6th ACM International Conference on Multimedia in Asia

December 2024

939 pages

ISBN:9798400712739

DOI:10.1145/3696409

Copyright © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MMAsia '24: ACM Multimedia Asia

December 3-6, 2024

Auckland, New Zealand

Acceptance Rates

Overall Acceptance Rate 59 of 204 submissions, 29%

Bibliometrics

Total Citations: 0
Total Downloads: 16
Downloads (Last 12 months): 16
Downloads (Last 6 weeks): 16

Reflects downloads up to 18 Jan 2025
