A Lightweight Anchor-Free Detector Using Cross-Dimensional Interactive Feature Balance for SAR Ship Detection
Abstract
In recent years, deep convolutional neural networks (CNNs) have developed rapidly and are widely used for target detection in SAR remote sensing. Many methods rely on preset anchor boxes for target classification and bounding-box coordinate regression. However, these methods face challenges when deployed on edge devices. First, ships in SAR images are sparsely and unevenly distributed, so most anchor boxes are redundant. Second, anchor-based detectors require extensive computation to generate anchor boxes and therefore typically run on hardware-rich platforms, whereas edge devices lack the hardware resources needed to deploy such CNN models. To address these issues, this paper proposes a lightweight anchor-free detector, the Cross-Dimensional Interactive Feature Balance Network (CDIFB-Net). We adopt a keypoint strategy to predict bounding boxes, eliminating the reliance on anchors. To keep the model lightweight, CDIFB-Net employs a feature extraction network built on partial convolution (PConv). In addition, considering the large scale differences and unbalanced distribution of targets in SAR ship images, and to mitigate the accuracy loss that lightweight design typically incurs, this paper introduces a cross-dimensional interactive feature balance pyramid (CDI-FBP), which balances the semantic information of different levels through feature pyramid aggregation and averaging; its cross-dimensional interaction module (CDIM) models the relationship between ship targets and the background. Experiments on the SAR Ship Detection Dataset (SSDD) and the High-Resolution SAR Images Dataset (HRSID), together with inference experiments in different hardware environments, validate the effectiveness of the network for SAR ship detection. The results demonstrate that the proposed CDIFB-Net maintains fast inference, low latency, and strong detection accuracy across various hardware environments.
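The abstract names two concrete mechanisms: a PConv-based lightweight backbone and a pyramid that balances levels by aggregation and averaging. Below is a minimal PyTorch sketch of both ideas, assuming a FasterNet-style partial convolution and a Libra R-CNN-style balance step; the module names, the 1/4 split ratio, the middle-level reference resolution, and the residual redistribution are illustrative assumptions, not the authors' exact design.

```python
# Minimal sketch of partial convolution and pyramid feature balancing.
# Channel sizes, split ratio, and fusion choices are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PConv(nn.Module):
    """Partial convolution (FasterNet-style): convolve only a fraction of
    the channels and pass the rest through untouched, cutting FLOPs."""

    def __init__(self, channels: int, ratio: float = 0.25):
        super().__init__()
        self.conv_ch = int(channels * ratio)     # channels that get convolved
        self.pass_ch = channels - self.conv_ch   # channels passed through as-is
        self.conv = nn.Conv2d(self.conv_ch, self.conv_ch, 3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        xc, xp = torch.split(x, [self.conv_ch, self.pass_ch], dim=1)
        return torch.cat([self.conv(xc), xp], dim=1)


class BalancedPyramid(nn.Module):
    """Feature balancing in the spirit of the abstract: resize all pyramid
    levels (assumed to share a channel count, as after an FPN) to one
    reference resolution, average them, then redistribute the balanced map
    back to every level as a residual."""

    def forward(self, feats: list[torch.Tensor]) -> list[torch.Tensor]:
        ref = feats[len(feats) // 2].shape[-2:]   # middle level as reference size
        gathered = [F.interpolate(f, size=ref, mode="nearest") for f in feats]
        balanced = torch.stack(gathered).mean(dim=0)   # average semantic content
        return [
            f + F.interpolate(balanced, size=f.shape[-2:], mode="nearest")
            for f in feats
        ]


if __name__ == "__main__":
    p = PConv(64)
    print(p(torch.randn(1, 64, 40, 40)).shape)   # torch.Size([1, 64, 40, 40])
    levels = [torch.randn(1, 64, s, s) for s in (80, 40, 20)]
    print([f.shape[-1] for f in BalancedPyramid()(levels)])   # [80, 40, 20]
```

In the sketch the balanced map is added back to each level as a residual; whether CDIFB-Net redistributes by residual addition, concatenation, or attention-weighted fusion (e.g., via its CDIM) is not specified in the abstract.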
Published In
VSIP '23: Proceedings of the 2023 5th International Conference on Video, Signal and Image Processing (Harbin, China, November 24–26, 2023). Association for Computing Machinery, New York, NY, United States. 237 pages. ISBN 9798400709272. DOI: 10.1145/3638682. Copyright © 2023 ACM.
Publication History
Published: 22 May 2024