DOI: 10.1145/3503161.3547817
research-article

RKformer: Runge-Kutta Transformer with Random-Connection Attention for Infrared Small Target Detection

Published: 10 October 2022

Abstract

Infrared small target detection (IRSTD) refers to segmenting small targets from infrared images, a task of great significance in practical applications. However, because the targets are small and the background contains noise and clutter, current deep neural network-based methods struggle to extract features with discriminative semantics while preserving fine details. In this paper, we address this problem by proposing a novel RKformer model with an encoder-decoder structure, in which four specifically designed Runge-Kutta transformer (RKT) blocks are stacked sequentially in the encoder. Technically, it has three key designs. First, we adopt a parallel encoder block (PEB) that combines a transformer branch and a convolution branch, exploiting their respective strengths in long-range dependency modeling and locality modeling to extract semantics and preserve details. Second, we propose a novel random-connection attention (RCA) block, which has a reservoir structure and learns sparse attention via random connections during training. RCA encourages the target to attend to a sparse set of relevant positions instead of all the large-area background pixels, resulting in more informative attention scores. It has fewer parameters and requires less computation than the original self-attention in the transformer while performing better. Third, inspired by neural ordinary differential equations (ODEs), we stack two PEBs with several residual connections as the basic encoder block to implement the Runge-Kutta method for solving ODEs, which can effectively enhance features and suppress noise. Experiments on the public NUAA-SIRST and IRSTD-1k datasets demonstrate the superiority of RKformer over state-of-the-art methods.
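The link between residual connections and ODE solvers mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the RKT block itself: it assumes a scalar state, a hypothetical right-hand side `decay`, and a fixed step size `h`, and it uses the standard second-order Runge-Kutta (Heun) scheme. A plain residual update `x + h * f(x)` corresponds to one forward Euler step, whereas evaluating `f` twice and combining the results with extra skip connections gives a second-order step.

```python
import math


def euler_step(f, x, h):
    """One forward Euler step; structurally a plain residual connection."""
    return x + h * f(x)


def rk2_step(f, x, h):
    """One second-order Runge-Kutta (Heun) step built from two evaluations of f."""
    k1 = f(x)             # first evaluation (a first "block")
    k2 = f(x + h * k1)    # second evaluation on the Euler-updated state
    return x + 0.5 * h * (k1 + k2)


def integrate(step, f, x0, h, n_steps):
    """Apply the same step function repeatedly, like stacking identical blocks."""
    x = x0
    for _ in range(n_steps):
        x = step(f, x, h)
    return x


def decay(x):
    """Hypothetical toy dynamics dx/dt = -x, chosen because the exact solution is known."""
    return -x


exact = math.exp(-1.0)  # exact solution at t = 1 for x(0) = 1
euler_err = abs(integrate(euler_step, decay, 1.0, 0.1, 10) - exact)
rk2_err = abs(integrate(rk2_step, decay, 1.0, 0.1, 10) - exact)
print(f"Euler error: {euler_err:.5f}, RK2 error: {rk2_err:.5f}")
```

With the same number of steps, the two-evaluation scheme tracks the exact solution more closely than the single residual update, which is the general motivation for Runge-Kutta-style block designs.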

Supplementary Material

MP4 File (MM22-fp00321.mp4)
Here is a video description of our work "RKformer: Runge-Kutta Transformer with Random-Connection Attention for Infrared Small Target Detection". Infrared small target detection is useful in many practical applications. However, due to the characteristics of infrared images, current deep learning-based methods cannot preserve fine details while extracting features with discriminative semantics. To address this challenge, we propose a novel RKformer. Specifically, we adopt a parallel encoder block (PEB) that combines a transformer and a convolution to exploit their respective strengths in long-range dependency modeling and locality modeling. Guided by Runge-Kutta (RK) methods for solving ODEs, two PEBs are stacked with several residual connections to form the basic encoder block, which enhances features and suppresses noise. We also propose a random-connection attention (RCA) block, which learns sparse attention via random connections during training. Extensive experiments have verified the effectiveness of RKformer.
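The description above does not specify how the random connections in RCA are drawn or trained, so the following is only a rough illustration of the general idea of sparse attention over randomly selected positions, not the RCA block. It assumes plain scaled dot-product self-attention, a hypothetical `keep_prob` parameter controlling sparsity, and that each position always attends to itself so the softmax is never empty.

```python
import math
import random


def sparse_random_attention(queries, keys, values, keep_prob, rng):
    """Self-attention in which each query attends only to a random subset of positions.

    queries, keys, values: lists of equal length holding equal-length float vectors.
    keep_prob: probability of keeping each query-key connection (hypothetical knob).
    rng: a random.Random instance, passed in for reproducibility.
    """
    dim = len(queries[0])
    outputs = []
    for i, query in enumerate(queries):
        # Randomly keep connections; position i is always kept (assumption).
        kept = [j for j in range(len(keys)) if j == i or rng.random() < keep_prob]
        scores = [
            sum(q * k for q, k in zip(query, keys[j])) / math.sqrt(dim)
            for j in kept
        ]
        # Numerically stable softmax over the kept positions only.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w / total * values[j][c] for w, j in zip(weights, kept))
            for c in range(len(values[0]))
        ])
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
self_only = sparse_random_attention(tokens, tokens, tokens, 0.0, random.Random(0))
dense = sparse_random_attention(tokens, tokens, tokens, 1.0, random.Random(0))
```

Setting `keep_prob` to 0.0 reduces each output to the token's own value, while 1.0 recovers dense attention; intermediate values restrict each position to a sparse set of candidates, which also reduces the number of score computations.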

      Published In

      MM '22: Proceedings of the 30th ACM International Conference on Multimedia
      October 2022
      7537 pages
      ISBN:9781450392037
      DOI:10.1145/3503161

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Runge-Kutta method
      2. infrared small target detection
      3. transformer

      Qualifiers

      • Research-article

      Funding Sources

      • Shaanxi Province Key Research and Development Program Project
      • Youth Talent Promotion Project of Shaanxi University Science and Technology Association
      • Equipment Advance Research Field Fund Project
      • Chongqing Excellent Scientist Project
      • Special Project on Technological Innovation and Application Development
      • Young Elite Scientists Sponsorship Program by CAST
      • National Natural Science Foundation of China

      Conference

      MM '22

      Acceptance Rates

      Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

      Cited By

      • (2025) GeoIoU-SEA-YOLO: An Advanced Model for Detecting Unsafe Behaviors on Construction Sites. Sensors 25:4 (1238). DOI: 10.3390/s25041238. Online publication date: 18-Feb-2025
      • (2025) Fast Generalized Radon–Fourier Transform Based on Blind Speed Sidelobe Traction. Remote Sensing 17:3 (475). DOI: 10.3390/rs17030475. Online publication date: 30-Jan-2025
      • (2025) MDEformer: Mixed Difference Equation Inspired Transformer for Compressed Video Quality Enhancement. IEEE Transactions on Neural Networks and Learning Systems 36:2 (2410-2422). DOI: 10.1109/TNNLS.2024.3354982. Online publication date: Feb-2025
      • (2025) Saliency at the Helm: Steering Infrared Small Target Detection With Learnable Kernels. IEEE Transactions on Geoscience and Remote Sensing 63 (1-14). DOI: 10.1109/TGRS.2024.3521947. Online publication date: 2025
      • (2025) Infrared Small Target Detection via Local-Global Feature Fusion. IEEE Signal Processing Letters 32 (466-470). DOI: 10.1109/LSP.2024.3523226. Online publication date: 2025
      • (2025) Infrared Small Target Detection Based on Weak Feature Enhancement and Target Adaptive Proliferation. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 18 (2829-2850). DOI: 10.1109/JSTARS.2024.3509993. Online publication date: 2025
      • (2025) Self-supervised multimodal change detection based on difference contrast learning for remote sensing imagery. Pattern Recognition 159 (111148). DOI: 10.1016/j.patcog.2024.111148. Online publication date: Mar-2025
      • (2025) 5-D spatial–temporal information-based infrared small target detection in complex environments. Pattern Recognition 158 (111003). DOI: 10.1016/j.patcog.2024.111003. Online publication date: Feb-2025
      • (2025) Graph-based context learning network for infrared small target detection. Neurocomputing 616 (128949). DOI: 10.1016/j.neucom.2024.128949. Online publication date: Feb-2025
      • (2025) A fully locally selective large kernel network for traffic video detection. Measurement 242 (115779). DOI: 10.1016/j.measurement.2024.115779. Online publication date: Jan-2025
