Abstract
Underwater object segmentation is challenging because of degraded image quality and the complexity of underwater environments. In recent years, deep learning has provided an effective approach to object segmentation. However, DeepLabV3+, a classical model designed for general scenes, struggles to deliver accurate, real-time segmentation under complex underwater conditions. To address this issue, we propose DeepLab-FusionNet, an extension of DeepLabV3+ designed specifically for underwater object segmentation. The model uses a multi-resolution parallel-branch structure to extract multi-scale information and adopts an improved inverted residual structure as the basic feature extraction module of the encoder. A structural reparameterization technique is introduced to reduce inference time and memory access cost at the inference stage. In addition, a module that links deep and shallow features is constructed to reduce the loss of detail and spatial information caused by downsampling and convolution. On the SUIM dataset, the model improves mean Intersection over Union (mIoU) by 3.3% and inference speed by 34 frames per second (FPS) compared with the baseline DeepLabV3+. Further comparisons with classic lightweight models and Transformer-based models on the UIIS and TrashCan datasets show that our model achieves good accuracy while keeping computational efficiency balanced in challenging underwater environments. The model still has room for improvement, as it is prone to overfitting and is limited by its fixed convolution kernels; future work will integrate Transformer-based methods. Our model offers an effective solution for real-time object segmentation on underwater robots, with broad applications in the exploration and development of marine resources. Our code is available at: https://github.com/sunmer1rain/deeplabv_fusionnet
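The abstract credits structural reparameterization with lower inference time and memory access cost. A minimal sketch of the general idea, in the RepVGG style, is shown below: a training-time block with a 3x3 branch, a 1x1 branch and an identity branch, each followed by BatchNorm, is collapsed into a single 3x3 convolution with a bias for inference. This is not the authors' implementation, and the exact branch layout used in DeepLab-FusionNet is not given in this abstract. The single-channel input, stride 1, the absence of a nonlinearity inside the block, and every function name here (`conv2d`, `fuse_conv_bn`, `reparameterize`, etc.) are assumptions chosen for illustration.

```python
import math
import random

# Assumed BN parameter tuple: (gamma, beta, running_mean, running_var, eps).


def conv2d(x, k, bias=0.0):
    """Single-channel 2D cross-correlation, stride 1, zero 'same' padding."""
    h, w = len(x), len(x[0])
    r = len(k) // 2  # kernel radius: 1 for 3x3, 0 for 1x1
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = bias
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        s += k[di + r][dj + r] * x[ii][jj]
            out[i][j] = s
    return out


def batch_norm(y, bn):
    """Inference-mode BatchNorm with frozen statistics."""
    gamma, beta, mean, var, eps = bn
    scale = gamma / math.sqrt(var + eps)
    return [[scale * (v - mean) + beta for v in row] for row in y]


def fuse_conv_bn(k, bn):
    """Fold BN into a bias-free conv kernel; returns (kernel, bias)."""
    gamma, beta, mean, var, eps = bn
    scale = gamma / math.sqrt(var + eps)
    return [[scale * v for v in row] for row in k], beta - scale * mean


def pad_to_3x3(k1):
    """Embed a 1x1 kernel at the centre of an otherwise zero 3x3 kernel."""
    k = [[0.0] * 3 for _ in range(3)]
    k[1][1] = k1[0][0]
    return k


IDENTITY_3X3 = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]


def train_block(x, k3, bn3, k1, bn1, bn_id):
    """Training-time form: three BN-normalised branches summed element-wise."""
    a = batch_norm(conv2d(x, k3), bn3)
    b = batch_norm(conv2d(x, k1), bn1)
    c = batch_norm(x, bn_id)
    return [[a[i][j] + b[i][j] + c[i][j] for j in range(len(x[0]))]
            for i in range(len(x))]


def reparameterize(k3, bn3, k1, bn1, bn_id):
    """Collapse the three branches into one 3x3 kernel plus a scalar bias."""
    branches = [fuse_conv_bn(k3, bn3),
                fuse_conv_bn(pad_to_3x3(k1), bn1),
                fuse_conv_bn(IDENTITY_3X3, bn_id)]
    kernel = [[sum(kb[0][i][j] for kb in branches) for j in range(3)]
              for i in range(3)]
    bias = sum(kb[1] for kb in branches)
    return kernel, bias


def rand_kernel(n):
    return [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]


def rand_bn():
    return (random.uniform(0.5, 1.5), random.uniform(-0.5, 0.5),
            random.uniform(-0.5, 0.5), random.uniform(0.5, 2.0), 1e-5)


# Usage: random weights, then compare both forms on the same input.
random.seed(0)
x = [[random.uniform(-1, 1) for _ in range(6)] for _ in range(5)]
k3, k1 = rand_kernel(3), rand_kernel(1)
bn3, bn1, bn_id = rand_bn(), rand_bn(), rand_bn()

y_train = train_block(x, k3, bn3, k1, bn1, bn_id)
kernel, bias = reparameterize(k3, bn3, k1, bn1, bn_id)
y_infer = conv2d(x, kernel, bias)
max_err = max(abs(a - b) for ra, rb in zip(y_train, y_infer)
              for a, b in zip(ra, rb))
print(f"max |train - infer| = {max_err:.2e}")
```

The two forms agree up to floating-point rounding because BatchNorm with frozen statistics is affine and convolution is linear, so every branch can be rewritten as a kernel plus a bias and the kernels simply add. At inference only one convolution runs, and the extra branches and their intermediate feature maps disappear, which is where this family of techniques gets its speed and memory-access savings. A real network would apply the same folding per output and input channel.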
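Accuracy on SUIM is reported as mean Intersection over Union. As a reminder of what that metric computes, here is a minimal sketch that builds a confusion matrix from flattened per-pixel labels and averages the per-class IoU = TP / (TP + FP + FN). The abstract does not specify the evaluation protocol. Skipping classes that appear in neither ground truth nor prediction, and the function names `confusion_matrix` and `mean_iou`, are assumptions made for this illustration.

```python
def confusion_matrix(gt, pred, num_classes):
    """cm[g][p] counts pixels whose ground truth is g and prediction is p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        cm[g][p] += 1
    return cm


def mean_iou(cm):
    """Per-class IoU = TP / (TP + FP + FN); mIoU is their mean."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class-c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp  # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:  # assumption: skip classes absent from both GT and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious), ious


# Usage: six flattened pixels, three classes.
gt = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
miou, per_class = mean_iou(confusion_matrix(gt, pred, 3))
print([round(v, 3) for v in per_class], round(miou, 3))
# prints [0.333, 0.667, 0.5] 0.5
```

In practice the confusion matrix is usually accumulated over the whole test set before the per-class IoU values are computed, rather than averaging per-image scores. The two choices can give different numbers, so comparisons such as the 3.3% gain quoted above are only meaningful when both models are scored the same way.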
Acknowledgements
This research was supported by the Guangdong Province Basic and Applied Basic Research Foundation (2022A1515110420), the Shenzhen Science and Technology Program (Grant No. RCBS20221008093227028), and the National Natural Science Foundation of China (Grant No. 12405214). We thank Ming Yang for helping to improve the manuscript and for his dedicated efforts in collecting the data required for the new experiments.
Author information
Contributions
Chengxiang Liu: Conceptualization, Methodology, Supervision, Writing - Reviewing and Editing, Project administration. Haoxin Yao: Software, Visualization, Data curation, Writing - Original Draft. Wenhui Qiu: Software, Methodology, Data curation. Hongyuan Cui: Supervision, Visualization, Investigation. Yubin Fang: Investigation, Validation. Anqi Xu: Conceptualization, Formal analysis, Supervision, Writing - Reviewing and Editing, Funding acquisition.
Ethics declarations
Competing interests
The authors declare that they have no competing interests relevant to this work.
Ethical and informed consent for data used
The authors declare that the submitted manuscript does not involve any ethical issues.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, C., Yao, H., Qiu, W. et al. Multi-scale feature map fusion encoding for underwater object segmentation. Appl Intell 55, 163 (2025). https://doi.org/10.1007/s10489-024-05971-4
Accepted:
Published:
DOI: https://doi.org/10.1007/s10489-024-05971-4