Modality Balancing Mechanism for RGB-Infrared Object Detection in Aerial Image

Cai, Weibo; Li, Zheng; Dong, Junhao; Lai, Jianhuang; Xie, Xiaohua

doi:10.1007/978-981-99-8555-5_7

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14436)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

RGB-Infrared object detection in aerial images has gained significant attention due to its effectiveness in mitigating the challenges posed by illumination restrictions. Existing methods often focus heavily on enhancing the fusion of the two modalities while ignoring the optimization imbalance caused by inherent differences between them. In this work, we observe that the two modalities are optimized inconsistently during joint training, and that this hampers the model’s performance. Inspired by these findings, we argue that the focus of RGB-Infrared detection should be shifted to the optimization of the two modalities, and we further propose a Modality Balancing Mechanism (MBM) for training the detection model. Specifically, we first introduce an auxiliary detection head to inspect the training process of both modalities. Subsequently, the learning rates of the two backbones are dynamically adjusted using the Scaled Gaussian Function (SGF). Furthermore, the Multi-modal Feature Hybrid Sampling Module (MHSM) is introduced to augment the representation by combining complementary features extracted from both modalities. Benefiting from the design of the proposed mechanism, experimental results on DroneVehicle and LLVIP demonstrate that our approach achieves state-of-the-art performance. The code is available at https://github.com/ccccwb/Multimodal-Detection-and-Tracking-UAV.
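
The abstract does not give the formula for the Scaled Gaussian Function or the exact balancing rule, so the following is only a minimal sketch of the general idea, under stated assumptions: the imbalance is measured as the gap between the two auxiliary-head losses, the modality with the lower loss is treated as the dominant one, and its backbone learning rate is damped by a Gaussian of that gap. The names `scaled_gaussian`, `balanced_learning_rates`, `sigma`, and `scale` are hypothetical and are not taken from the paper or its repository.

```python
import math


def scaled_gaussian(gap: float, sigma: float = 1.0, scale: float = 1.0) -> float:
    """Assumed form of a scaled Gaussian: scale * exp(-gap^2 / (2 * sigma^2)).

    Returns `scale` when gap == 0 and decays toward 0 as |gap| grows.
    """
    return scale * math.exp(-(gap ** 2) / (2.0 * sigma ** 2))


def balanced_learning_rates(base_lr: float, loss_rgb: float, loss_ir: float,
                            sigma: float = 1.0) -> tuple[float, float]:
    """Return (lr_rgb, lr_ir) for the two backbones.

    Assumption (not from the paper): the modality whose auxiliary loss is
    lower is considered dominant, and only its learning rate is reduced.
    """
    gap = abs(loss_rgb - loss_ir)
    damp = scaled_gaussian(gap, sigma)
    if loss_rgb < loss_ir:      # RGB is ahead, so slow the RGB backbone
        return base_lr * damp, base_lr
    if loss_ir < loss_rgb:      # infrared is ahead, so slow the IR backbone
        return base_lr, base_lr * damp
    return base_lr, base_lr     # balanced: leave both unchanged


if __name__ == "__main__":
    # Hypothetical auxiliary-head losses for one training step.
    print(balanced_learning_rates(0.01, loss_rgb=0.5, loss_ir=1.5))
```

In a training loop, the two returned values would typically be written to the optimizer parameter groups of the RGB and infrared backbones before each update; how the paper schedules this adjustment is not stated in the abstract.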

Acknowledgments

This project is supported in part by the Key-Area Research and Development Program of Guangzhou (202206030003) and the National Natural Science Foundation of China (U22A2095, 62072482). We would like to thank Qi Chen for insightful discussions.

Author information

Authors and Affiliations

School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
Weibo Cai, Zheng Li, Junhao Dong, Jianhuang Lai & Xiaohua Xie
Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, Guangzhou, China
Weibo Cai, Zheng Li, Junhao Dong, Jianhuang Lai & Xiaohua Xie
Guangdong Province Key Laboratory of Information Security Technology, Guangzhou, China
Weibo Cai, Zheng Li, Junhao Dong, Jianhuang Lai & Xiaohua Xie

Corresponding author

Correspondence to Xiaohua Xie.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Qingshan Liu
Xiamen University, Xiamen, China
Hanzi Wang
Beijing University of Posts and Telecommunications, Beijing, China
Zhanyu Ma
Sun Yat-sen University, Guangzhou, China
Weishi Zheng
Peking University, Beijing, China
Hongbin Zha
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Chinese Academy of Sciences, Beijing, China
Liang Wang
Xiamen University, Xiamen, China
Rongrong Ji

About this paper

Cite this paper

Cai, W., Li, Z., Dong, J., Lai, J., Xie, X. (2024). Modality Balancing Mechanism for RGB-Infrared Object Detection in Aerial Image. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14436. Springer, Singapore. https://doi.org/10.1007/978-981-99-8555-5_7

DOI: https://doi.org/10.1007/978-981-99-8555-5_7
Published: 28 December 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8554-8
Online ISBN: 978-981-99-8555-5
eBook Packages: Computer Science, Computer Science (R0)
