Multi-scale aggregation feature pyramid with cornerness for underwater object detection

  • Original article
  • The Visual Computer

Abstract

Underwater object detection is an appealing but challenging problem in computer vision. Features are difficult to extract because underwater images suffer from color cast and blur. Moreover, because underwater objects are often small, some of their details are lost after several convolutional layers. A multi-scale aggregation feature pyramid network is therefore proposed to integrate multi-scale features and improve underwater object detection performance. Specifically, a lightweight and efficient network extracts the basic features, and a dedicated subnet strengthens the feature extraction capability of the backbone network so that the detailed features of small underwater objects are enriched. In addition, a multi-scale feature pyramid is proposed to enrich the feature maps: each feature map gains contextual information through a combination of up-sampling and down-sampling. The centerness strategy of the fully convolutional one-stage object detection (FCOS) head is improved by adding corner point regression to enhance the recall rate of small objects. Because generalized intersection over union (GIoU) reflects the degree of overlap between the ground-truth box and the predicted box better than IoU does, the regression loss is changed to the GIoU loss. The network achieves 78.90% mAP on the underwater image dataset and 84.3% mAP on the PASCAL VOC datasets.
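To make the GIoU regression loss mentioned in the abstract concrete, the following is a minimal sketch of the standard GIoU definition for two axis-aligned boxes, with the loss taken as 1 − GIoU. It is not the authors' implementation: the `(x1, y1, x2, y2)` box layout and the function names `giou` and `giou_loss` are assumptions chosen for illustration.

```python
from typing import Tuple

# Assumed box layout (not from the paper): (x1, y1, x2, y2) with x1 <= x2, y1 <= y2.
Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate boxes have area 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def giou(pred: Box, target: Box) -> float:
    """Generalized IoU: IoU minus the fraction of the smallest
    enclosing box that is not covered by the union of the two boxes."""
    # Intersection rectangle (may be empty).
    ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
    ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    union = box_area(pred) + box_area(target) - inter
    if union <= 0.0:
        raise ValueError("both boxes have zero area")

    # Smallest axis-aligned box enclosing both inputs.
    ex1, ey1 = min(pred[0], target[0]), min(pred[1], target[1])
    ex2, ey2 = max(pred[2], target[2]), max(pred[3], target[3])
    enclose = (ex2 - ex1) * (ey2 - ey1)

    iou = inter / union
    return iou - (enclose - union) / enclose


def giou_loss(pred: Box, target: Box) -> float:
    """Regression loss in the usual form 1 - GIoU, which lies in [0, 2)."""
    return 1.0 - giou(pred, target)


if __name__ == "__main__":
    overlapping = giou_loss((0, 0, 2, 2), (1, 1, 3, 3))
    near = giou_loss((0, 0, 1, 1), (2, 0, 3, 1))
    far = giou_loss((0, 0, 1, 1), (5, 0, 6, 1))
    print(overlapping, near, far)
```

For two disjoint boxes the plain IoU is 0 regardless of how far apart they are, so an IoU-based loss gives no signal about distance. The GIoU loss in this sketch grows as the boxes move apart, which is the property that motivates using it for box regression.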

Data availability

The data that support the findings of this study are available from Peng Cheng Laboratory. Restrictions apply to the availability of these data, which were used under license for this study. Data are available at https://aistudio.baidu.com/aistudio/datasetdetail/25886 with the permission of Peng Cheng Laboratory.

Acknowledgements

This work was supported in part by the Hebei Natural Science Foundation, China, under Grants F2020203037 and F2022203025; in part by the National Natural Science Foundation of China under Grants 61873224, 62271437, and 62003295; and in part by the Science and Technology Research Project of Universities in Hebei, China, under Grant QN2020301.

Author information

Corresponding author

Correspondence to Haifeng Yu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest related to this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, X., Yu, H. & Chen, H. Multi-scale aggregation feature pyramid with cornerness for underwater object detection. Vis Comput 40, 1299–1310 (2024). https://doi.org/10.1007/s00371-023-02849-3

  • DOI: https://doi.org/10.1007/s00371-023-02849-3
