Abstract
Underwater target detection is crucial for ocean exploration, but existing methods struggle to achieve satisfactory results because of the complexity of the underwater environment. To improve both the accuracy and the real-time performance of underwater detection, we propose an improved YOLOv7 model. First, we introduce a multi-granularity feature attention method based on Efficient Channel Attention (ECA), which helps the model adapt to diverse underwater conditions and reduces its focus on redundant information. Second, we use coordinate convolution to give the network spatial awareness of input image coordinates, enabling more effective localization of target objects and reducing interference from similar background elements. Third, to accommodate small and densely packed underwater targets, we use the normalized Wasserstein distance to measure the similarity of bounding boxes. On the Underwater Robot Picking Contest 2019 (URPC 2019) dataset, our improved network reaches a mean Average Precision (mAP) of 86.19%, a 1.57% increase over the original YOLOv7 network, and runs at 124 frames per second (fps), which is also faster than the original network. The improved model also outperforms conventional target detection models, offering faster and more accurate detection in complex underwater environments.
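To illustrate why a Wasserstein-based similarity can suit small targets, the following is a minimal sketch of the normalized Wasserstein distance as it is commonly defined for boxes modeled as 2D Gaussians. It is not the authors' implementation: the function names `nwd` and `iou`, the `(cx, cy, w, h)` box layout, and the constant `c` are assumptions made for this example, and how the measure is integrated into the improved YOLOv7 is not specified in the abstract.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between two boxes given as (cx, cy, w, h).

    Each box is modeled as a 2D Gaussian with mean (cx, cy) and covariance
    diag(w^2 / 4, h^2 / 4). The constant c is dataset dependent; the default
    here is only an illustrative placeholder (assumption).
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two Gaussians.
    w2_sq = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    # Exponential normalization maps the distance into (0, 1].
    return math.exp(-math.sqrt(w2_sq) / c)


def iou(box_a, box_b):
    """Plain intersection over union for (cx, cy, w, h) boxes, for comparison."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


# Two 4x4 boxes whose centers are 5 pixels apart do not overlap at all.
small_a = (10.0, 10.0, 4.0, 4.0)
small_b = (15.0, 10.0, 4.0, 4.0)
print("IoU:", iou(small_a, small_b))
print("NWD:", nwd(small_a, small_b))
```

For the two non-overlapping small boxes, IoU is zero and carries no information about how far apart they are, whereas the Wasserstein-based score stays strictly between 0 and 1 and decreases smoothly as the boxes move apart.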









Data availability
The datasets used and analyzed during the current study are publicly available and can be accessed at http://en.urpc.org.cn. The relevant data sources and links are also cited in the manuscript.
Notes
Underwater Robot Professional Contest: http://en.urpc.org.cn.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant No. 62202499, the State Key Program of National Natural Science Foundation of China under Grant No. 62133016 and the Changsha Soft Science Research Program under Grant No. kh2302054.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wu, Q., Cen, L., Kan, S. et al. Real-time underwater target detection based on improved YOLOv7. J Real-Time Image Proc 22, 43 (2025). https://doi.org/10.1007/s11554-025-01621-1
DOI: https://doi.org/10.1007/s11554-025-01621-1