WAFormer: Ship Detection in SAR Images Based on Window-Aware Swin-Transformer

  • Conference paper
  • Pattern Recognition and Computer Vision (PRCV 2022)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13536)

Abstract

Research on deep-learning-based target detection in synthetic aperture radar (SAR) images has made great progress. However, most existing work applies methods designed for optical images directly to SAR images, ignoring the characteristics of SAR targets: they are usually small and variable in size, their distribution is relatively sparse, and detection is hampered by complex background noise. In this paper, we propose an improved backbone network, called WAFormer, for ship target detection in SAR images, built on the recent Swin-Transformer. WAFormer improves the local window attention mechanism of Swin-Transformer by introducing new window settings that better match the shapes of targets, yielding more accurate detection in SAR images. Experimental results show that WAFormer achieves 74.4% mAP on the Official-SSDD SAR dataset, surpassing Swin-Transformer by 1.0 points, with particularly strong gains on large targets.
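To make the "window-aware" idea concrete: Swin-Transformer partitions a feature map into square local windows before computing self-attention, and the change described in the abstract amounts to allowing non-square windows whose aspect ratio better matches elongated ship targets. The sketch below is a minimal PyTorch illustration of such a rectangular window partition; the `window_partition` helper and the 4×16 window size are assumptions for illustration, not the authors' released code.

```python
import torch

def window_partition(x, window_h, window_w):
    """Split a feature map into non-overlapping rectangular windows.

    x: (B, H, W, C) feature map; H must be divisible by window_h
       and W by window_w.
    Returns: (num_windows * B, window_h, window_w, C), where
    self-attention would then be computed independently per window.
    """
    B, H, W, C = x.shape
    # Reshape so window rows/columns become separate axes.
    x = x.view(B, H // window_h, window_h, W // window_w, window_w, C)
    # Group the two window-grid axes together, then flatten them.
    windows = x.permute(0, 1, 3, 2, 4, 5).contiguous()
    return windows.view(-1, window_h, window_w, C)

# Example: a 64x64 feature map split into wide 4x16 windows, which
# may fit elongated ships better than Swin's square 8x8 windows.
feat = torch.randn(1, 64, 64, 96)
wins = window_partition(feat, 4, 16)
print(wins.shape)  # torch.Size([64, 4, 16, 96])
```

With square windows this reduces to the standard Swin partition (`window_h == window_w`); the rectangular setting only changes how tokens are grouped before attention, so the attention computation itself is unchanged.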



Author information

Corresponding author: Lingfeng Wang


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Wang, Z., Wang, L., Wang, W., Tian, S., Zhang, Z. (2022). WAFormer: Ship Detection in SAR Images Based on Window-Aware Swin-Transformer. In: Yu, S., et al. Pattern Recognition and Computer Vision. PRCV 2022. Lecture Notes in Computer Science, vol 13536. Springer, Cham. https://doi.org/10.1007/978-3-031-18913-5_41


  • DOI: https://doi.org/10.1007/978-3-031-18913-5_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-18912-8

  • Online ISBN: 978-3-031-18913-5

  • eBook Packages: Computer Science, Computer Science (R0)
