Multi-scale feature map fusion encoding for underwater object segmentation

Liu, Chengxiang; Yao, Haoxin; Qiu, Wenhui; Cui, Hongyuan; Fang, Yubin; Xu, Anqi

doi:10.1007/s10489-024-05971-4

Published: 13 December 2024

Applied Intelligence, volume 55, article number 163 (2025)

Chengxiang Liu¹, Haoxin Yao¹, Wenhui Qiu¹, Hongyuan Cui¹, Yubin Fang¹ & Anqi Xu² (ORCID: orcid.org/0000-0002-7073-171X)

Abstract

Underwater object segmentation is challenging because image quality degrades underwater and the environments are complex. In recent years, deep learning has provided an effective approach to object segmentation. However, DeepLabV3+, a classical model for general scenes, is limited in achieving accurate, real-time segmentation under complex underwater conditions. To address this issue, we propose DeepLab-FusionNet, an extended version of DeepLabV3+ designed specifically for underwater object segmentation. The model uses a multi-resolution parallel branch structure to extract multi-scale information and employs an improved inverted residual structure as the basic feature extraction module of the encoding network. A structural reparameterization technique is introduced to improve inference speed and reduce memory access cost at inference time. Additionally, a module linking deep-level and shallow-level information is constructed to reduce the loss of detail and spatial information during downsampling and convolution. Evaluation on the SUIM dataset shows a 3.3% increase in mean Intersection over Union (mIoU) and a speed improvement of 34 frames per second (FPS) over the baseline DeepLabV3+. Further comparisons with other classic lightweight models and Transformer-based models on the UIIS and TrashCan datasets demonstrate that our model achieves good accuracy with balanced computational efficiency in challenging underwater environments. The model still has room for improvement because of overfitting and the limitations of fixed convolution kernels; integration with Transformer methods is planned as future work. Our model offers an effective solution for real-time target segmentation on underwater robots, with broad applications in the exploration and development of marine resources. Our code is available at: https://github.com/sunmer1rain/deeplabv_fusionnet
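For readers unfamiliar with the mIoU metric quoted in the abstract, the following is a minimal sketch of how it is commonly computed for a single label map. It is an illustration only, not the authors' evaluation code: the function name `mean_iou`, the flattened-list input format, and the choice to skip classes absent from both prediction and ground truth are assumptions, and the exact protocol used in the paper is not given in the abstract.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean Intersection over Union across classes.

    `pred` and `target` are flattened per-pixel class indices.
    Classes that appear in neither input are skipped (an assumption;
    some benchmarks instead accumulate counts over the whole dataset).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")

    intersection = [0] * num_classes  # pixels where both agree on class c
    pred_count = [0] * num_classes    # pixels predicted as class c
    target_count = [0] * num_classes  # pixels labelled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example: a 2x3 label map flattened to six pixels, three classes.
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.3f}")
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. Benchmark results are usually obtained by accumulating the intersection and union counts over every image in the test set before dividing, rather than averaging per-image scores.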

Data availability and access

The SUIM, UIIS, and TrashCan datasets used in this research are available from references [21], [35], and [36], respectively. All data generated or analysed during this study are included in this published article.

References

[1] Hong L, Wang X, Zhang D (2024) CFD-based hydrodynamic performance investigation of autonomous underwater vehicles: A survey. Ocean Eng 305:117911
[2] Osayi Philip Igbinenikaro OOA, Etukudoh EA (2024) A comparative review of subsea navigation technologies in offshore engineering projects. Int J Front Eng Technol Res 6(2):019–034
[3] Hasan K, Ahmad S, Liaf AF, Karimi M, Ahmed T, Shawon MA, Mekhilef S (2024) Oceanic challenges to technological solutions: A review of autonomous underwater vehicle path technologies in biomimicry, control, navigation, and sensing. IEEE Access 12:46202–46231
[4] Huy DQ, Sadjoli N, Azam AB, Elhadidi B, Cai Y, Seet G (2023) Object perception in underwater environments: A survey on sensors and sensing methodologies. Ocean Eng 267
[5] Li M, Zhang H, Gruen A, Li D (2024) A survey on underwater coral image segmentation based on deep learning. Geo-spatial Inf Sci, pp 1–25
[6] Pergeorelis M, Bazik M, Saponaro P, Kim J, Kambhamettu C (2022) Synthetic data for semantic segmentation in underwater imagery. In: OCEANS 2022, Hampton Roads. IEEE, pp 1–6
[7] Ji L, Du Y, Dang Y, Gao W, Zhang H (2024) A survey of methods for addressing the challenges of referring image segmentation. Neurocomputing 583:127599
[8] Mo Y, Wu Y, Yang X, Liu F, Liao Y (2022) Review the state-of-the-art technologies of semantic segmentation based on deep learning. Neurocomputing 493:626–646
[9] Hao S, Zhou Y, Guo Y (2020) A brief survey on semantic segmentation with deep learning. Neurocomputing 406:302–321
[10] Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 3431–3440
[11] Ronneberger O, Fischer P, Brox T (2015) U-net: Convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention-MICCAI 2015: 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III 18. Springer, pp 234–241
[12] Wang J, Liu X (2021) Medical image recognition and segmentation of pathological slices of gastric cancer based on deeplab v3+ neural network. Comput Methods Prog Biomed 207:106210
[13] Chen L, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2017) Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Patt Anal Mach Intell 40(4):834–848
[14] Bai Z, Jing J (2023) Mobile-deeplab: a lightweight pixel segmentation-based method for fabric defect detection. J Intell Manuf
[15] Chen L, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European conference on computer vision (ECCV), pp 801–818
[16] Zhuang P, Wang Y, Qiao Y (2021) Wildfish++: A comprehensive fish benchmark for multimedia research. IEEE Trans Multimed 23:3603–3617
[17] Ditria EM, Connolly RM, Jinks EL, Lopez-Marcano S (2021) Annotated video footage for automated identification and counting of fish in unconstrained seagrass habitats. Front Marine Sci 8
[18] Cai L, Chen C, Chai H (2021) Underwater distortion target recognition network (udtrnet) via enhanced image features. Comput Intell Neurosci 2021:1–10
[19] Zhang P, Yu H, Li H, Zhang X, Wei S, Tu W, Yang Z, Wu J, Lin Y (2023) Msgnet: multi-source guidance network for fish segmentation in underwater videos. Front Marine Sci 10
[20] Martin-Abadal M, Guerrero-Font E, Bonin-Font F, Gonzalez-Cid Y (2018) Deep semantic segmentation in an auv for online posidonia oceanica meadows identification. IEEE Access 6:60956–60967
[21] Islam MJ, Edge C, Xiao Y, Luo P, Mehtaz M, Morse C, Enan SS, Sattar J (2020) Semantic segmentation of underwater imagery: Dataset and benchmark. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, pp 1769–1776
[22] Nezla N, Haridas TM, Supriya M (2021) Semantic segmentation of underwater images using unet architecture based deep convolutional encoder decoder model. In: 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS), vol 1. IEEE, pp 28–33
[23] Zhou J, Yang T, Zhang W (2023) Underwater vision enhancement technologies: a comprehensive review, challenges, and recent trends. Appl Intell 53(3):3594–3621
[24] Sun K, Xiao B, Liu D, Wang J (2019) Deep high-resolution representation learning for human pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5693–5703
[25] Wang J, Sun K, Cheng T, Jiang B, Deng C, Zhao Y, Liu D, Mu Y, Tan M, Wang X et al (2020) Deep high-resolution representation learning for visual recognition. IEEE Trans Patt Anal Mach Intell 43(10):3349–3364
[26] Sandler M, Howard A, Zhu M, Zhmoginov A, Chen LC (2018) Mobilenetv2: Inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4510–4520
[27] Howard A, Sandler M, Chu G, Chen LC, Chen B, Tan M, Wang W, Zhu Y, Pang R, Vasudevan V et al (2019) Searching for mobilenetv3. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 1314–1324
[28] Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826
[29] Szegedy C, Ioffe S, Vanhoucke V, Alemi A (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In: Proc AAAI Conf Artif Intell 31(1)
[30] Rahnemoonfar M, Dobbs D (2019) Semantic segmentation of underwater sonar imagery with deep learning. In: IGARSS 2019-2019 IEEE International Geoscience and Remote Sensing Symposium. IEEE, pp 9455–9458
[31] Tolie HF, Ren J, Elyan E (2024) Dicam: Deep inception and channel-wise attention modules for underwater image enhancement. Neurocomputing 584:127585
[32] Liu F, Fang M (2020) Semantic segmentation of underwater images based on improved deeplab. J Marine Sci Eng 8(3):188
[33] Jin A, Zeng X (2023) A novel deep learning method for underwater target recognition based on res-dense convolutional neural network with attention mechanism. J Marine Sci Eng 11(1):69
[34] Ding X, Zhang X, Ma N, Han J, Ding G, Sun J (2021) Repvgg: Making vgg-style convnets great again. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13733–13742
[35] Lian S, Li H, Cong R, Li S, Zhang W, Kwong S (2023) Watermask: Instance segmentation for underwater imagery. In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE
[36] Hong J, Fulton M, Sattar J (2020) Trashcan: A semantically-segmented dataset towards visual detection of marine debris. arXiv:2007.08097
[37] Yu C, Gao C, Wang J, Yu G, Shen C, Sang N (2021) Bisenet v2: Bilateral network with guided aggregation for real-time semantic segmentation. Int J Comput Vis 129(11):3051–3068
[38] Peng J, Liu Y, Tang S, Hao Y, Chu L, Chen G, Wu Z, Chen Z, Yu Z, Du Y et al (2022) Pp-liteseg: A superior real-time semantic segmentation model. arXiv:2204.02681
[39] Strudel R, Garcia R, Laptev I, Schmid C (2021) Segmenter: Transformer for semantic segmentation. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE
[40] Xie E, Wang W, Yu Z, Anandkumar A, Alvarez JM, Luo P (2021) Segformer: Simple and efficient design for semantic segmentation with transformers. Adv Neural Inf Process Syst 34:12077–12090
[41] Zhang W, Huang Z, Luo G, Chen T, Wang X, Liu W, Yu G, Shen C (2022) Topformer: Token pyramid transformer for mobile semantic segmentation. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE

Acknowledgements

This research is supported by the Guangdong Province Basic and Applied Basic Research Foundation (2022A1515110420), the Shenzhen Science and Technology Program (Grant No. RCBS20221008093227028), and the National Natural Science Foundation of China (Grant No. 12405214). We would like to thank Ming Yang for his participation in improving the manuscript and for his dedicated efforts in collecting the dataset required for new experiments.

Author information

Authors and Affiliations

College of Mechatronics and Control Engineering, Shenzhen University, 518060, Shenzhen, China
Chengxiang Liu, Haoxin Yao, Wenhui Qiu, Hongyuan Cui & Yubin Fang
College of Physics and Optoelectronic Engineering, Shenzhen University, 518060, Shenzhen, China
Anqi Xu

Contributions

Chengxiang Liu: Conceptualization, Methodology, Supervision, Writing - Reviewing and Editing, Project administration. Haoxin Yao: Software, Visualization, Data curation, Writing - Original Draft. Wenhui Qiu: Software, Methodology, Data curation. Hongyuan Cui: Supervision, Visualization, Investigation. Yubin Fang: Investigation, Validation. Anqi Xu: Conceptualization, Formal analysis, Supervision, Writing - Reviewing and Editing, Funding acquisition.

Corresponding author

Correspondence to Anqi Xu.

Ethics declarations

Competing interests

The authors declare that they have no competing interests related to this work.

Ethical and informed consent for data used

The authors declare that the submitted manuscript does not involve any ethical issues.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, C., Yao, H., Qiu, W. et al. Multi-scale feature map fusion encoding for underwater object segmentation. Appl Intell 55, 163 (2025). https://doi.org/10.1007/s10489-024-05971-4

Accepted: 05 October 2024
Published: 13 December 2024
DOI: https://doi.org/10.1007/s10489-024-05971-4
