Abstract
Underwater object detection is a fascinating but challenging subject in computer vision. Features are difficult to extract because underwater images suffer from color cast and blur. Moreover, because underwater objects are often small, some of their details are lost after several convolutional layers. Therefore, a multi-scale aggregation feature pyramid network is proposed to integrate multi-scale features and improve underwater object detection performance. Specifically, a lightweight and efficient network is used to extract the basic features. A dedicated subnet is designed to strengthen the feature extraction capability of the backbone network and enrich the detailed features of small underwater objects. In addition, a multi-scale feature pyramid is proposed to enrich the feature maps: each feature map gains contextual information through a combination of up-sampling and down-sampling. The centerness strategy of the fully convolutional one-stage object detection (FCOS) head is improved by adding corner point regression to raise the recall rate of small objects. Compared with intersection over union (IoU), generalized intersection over union (GIoU) better reflects the degree of overlap between the ground-truth box and the predicted box, so the regression loss is changed to the GIoU loss. The network is evaluated on an underwater image dataset, where it obtains 78.90% mAP, and on the PASCAL VOC datasets, where it obtains 84.3% mAP.
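To make the GIoU regression loss mentioned in the abstract concrete, the following is a minimal sketch of the standard GIoU definition for two axis-aligned boxes, with the loss taken as 1 - GIoU. It is an illustration only, not the authors' implementation: the function names `giou` and `giou_loss` and the `(x1, y1, x2, y2)` box layout are assumptions made for this example.

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes.

    Each box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2
    (a layout assumed for this sketch).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle; width/height clamp to 0 when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    # GIoU subtracts the fraction of the enclosing box not covered by the union.
    return iou - (enclose - union) / enclose


def giou_loss(pred_box, target_box):
    """Regression loss in the common form 1 - GIoU (range 0 to 2)."""
    return 1.0 - giou(pred_box, target_box)
```

Unlike plain IoU, which is 0 for every pair of non-overlapping boxes, GIoU becomes more negative as disjoint boxes move farther apart, so the loss still provides a useful signal when the predicted box misses the target entirely.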
Data availability
The data that support the findings of this study are available from Peng Cheng Laboratory. Restrictions apply to the availability of these data, which were used under license for this study. Data are available at https://aistudio.baidu.com/aistudio/datasetdetail/25886 with the permission of Peng Cheng Laboratory.
Acknowledgements
This work was supported in part by the Hebei Natural Science Foundation, China, under Grants F2020203037 and F2022203025; in part by the National Natural Science Foundation of China under Grants 61873224, 62271437, and 62003295; and in part by the Science and Technology Research Project of Universities in Hebei, China, under Grant QN2020301.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest related to this work.
About this article
Cite this article
Li, X., Yu, H. & Chen, H. Multi-scale aggregation feature pyramid with cornerness for underwater object detection. Vis Comput 40, 1299–1310 (2024). https://doi.org/10.1007/s00371-023-02849-3
DOI: https://doi.org/10.1007/s00371-023-02849-3