
Cross-Modal Ship Grounding: Towards Large Model for Enhanced Few-Shot Learning

  • Conference paper
Pattern Recognition (ICPR 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15330)


Abstract

A growing body of research indicates that adapting large models to downstream tasks often yields remarkable performance. In ship detection, however, the potential of these models is frequently underutilized due to domain shift. This paper introduces the Cross-Modal Ship Grounding (CSG) model, which uses an efficient Cross-Modal Adapter (CMA) to transfer the general detection capabilities of large models to ship images, addressing domain shift at minimal training cost. To mitigate interference from complex and variable backgrounds, a Water-Land Separation (WLS) module is proposed that restricts attention to the water area. This module effectively suppresses background targets and thereby improves the model's accuracy in complex scenes. Empirical evaluations on both private and public datasets demonstrate that the CSG model outperforms state-of-the-art models.
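The abstract describes adapting a frozen large model with a lightweight adapter rather than full fine-tuning. The paper's own CMA implementation is not shown on this page; as an illustrative sketch only, the snippet below shows the general low-rank-adapter idea (LoRA-style) that such efficient adaptation commonly builds on. All function names, shapes, and values here are assumptions for illustration, not the authors' code.

```python
# Illustrative sketch (not the paper's implementation): a LoRA-style
# low-rank adapter. A frozen weight matrix W is augmented with a small
# trainable update B @ A, so the adapted layer computes
#   y = W x + alpha * B (A x).
# Only A and B would be trained, which is what keeps adaptation cheap.

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def adapted_forward(W, A, B, x, alpha=1.0):
    """Frozen W plus trainable rank-r update: y = W x + alpha * B (A x)."""
    base = matvec(W, x)                 # frozen pretrained path
    low_rank = matvec(B, matvec(A, x))  # trainable adapter path
    return [b + alpha * l for b, l in zip(base, low_rank)]

# Tiny example: 2x2 frozen identity weight, rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]  # frozen
A = [[1.0, 1.0]]              # 1 x 2 down-projection (trainable)
B = [[0.5], [0.5]]            # 2 x 1 up-projection (trainable)
x = [2.0, 3.0]

print(adapted_forward(W, A, B, x))  # -> [4.5, 5.5]
```

With `A` and `B` initialized so their product starts near zero, the adapted model initially behaves exactly like the frozen backbone, and training only the small adapter matrices steers it toward the ship domain.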



Acknowledgement

This work was supported by the National Natural Science Foundation of China (62271359).

Author information

Corresponding author: Li Chen


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Hu, Q., Chen, L., Feng, Z., Chen, Y. (2025). Cross-Modal Ship Grounding: Towards Large Model for Enhanced Few-Shot Learning. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15330. Springer, Cham. https://doi.org/10.1007/978-3-031-78113-1_2


  • DOI: https://doi.org/10.1007/978-3-031-78113-1_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-78112-4

  • Online ISBN: 978-3-031-78113-1

  • eBook Packages: Computer Science (R0)
