Abstract
A growing body of research indicates that adapting large models to downstream tasks often yields remarkable performance. In ship detection, however, the potential of these large models is frequently underutilized due to domain shift. This paper introduces the Cross-Modal Ship Grounding (CSG) model, which uses an efficient Cross-Modal Adapter (CMA) to transfer the general detection capabilities of large models to ship images, addressing domain shift at minimal training cost. To mitigate complex and variable background interference, a Water-Land Separation (WLS) module is proposed that restricts attention to the water area, suppressing background targets and improving accuracy in complex scenes. Evaluations on both private and public datasets show that the CSG model outperforms state-of-the-art models.
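The paper's CMA is not reproduced on this page, but the general adapter idea it builds on can be illustrated with a minimal, dependency-free sketch: a frozen "large model" projection plus a trainable low-rank branch whose up-projection starts at zero, so the adapted model exactly reproduces the frozen one at initialization (the standard LoRA-style trick). All names here (`W_frozen`, `W_down`, `W_up`, dimensions `D`, `R`) are illustrative assumptions, not the paper's actual architecture.

```python
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def vec_add(a, b):
    return [ai + bi for ai, bi in zip(a, b)]

random.seed(0)
D, R = 8, 2  # feature dimension and adapter bottleneck rank (R << D)

# Frozen backbone projection: left untouched during adaptation.
W_frozen = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(D)]

# Adapter: down-projection randomly initialised, up-projection zero,
# so the adapted output equals the frozen output before any training.
W_down = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(R)]
W_up = [[0.0 for _ in range(R)] for _ in range(D)]

def adapted_forward(x):
    base = matvec(W_frozen, x)                # frozen path
    delta = matvec(W_up, matvec(W_down, x))   # trainable low-rank path
    return vec_add(base, delta)

x = [random.gauss(0, 1) for _ in range(D)]
print(adapted_forward(x) == matvec(W_frozen, x))  # True at initialisation
```

The appeal of this setup is the parameter count: only the two small matrices (2·D·R values) are trained, versus D·D for full fine-tuning, which is why adapter methods keep the cost of transferring a large model low.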
Acknowledgement
This work was supported by the National Natural Science Foundation of China (Grant No. 62271359).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Hu, Q., Chen, L., Feng, Z., Chen, Y. (2025). Cross-Modal Ship Grounding: Towards Large Model for Enhanced Few-Shot Learning. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15330. Springer, Cham. https://doi.org/10.1007/978-3-031-78113-1_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78112-4
Online ISBN: 978-3-031-78113-1