Multi-scale feature map fusion encoding for underwater object segmentation

Applied Intelligence

Abstract

Underwater object segmentation is challenging because of degraded image quality and the complexity of underwater environments. In recent years, deep learning has provided an effective approach to object segmentation. However, DeepLabV3+, a classical model for general scenes, struggles to deliver accurate, real-time segmentation in complex underwater conditions. To address this issue, we propose DeepLab-FusionNet, an extended version of DeepLabV3+ designed specifically for underwater object segmentation. The model uses a multi-resolution parallel branch structure to extract multi-scale information and employs an improved inverted residual structure as the basic feature extraction module in the encoding network. A structural reparameterization technique is introduced to improve speed and reduce memory access cost at inference. Additionally, a module that links deep and shallow features is constructed to reduce the loss of detail and spatial information during downsampling and convolution. Evaluation on the SUIM dataset shows a 3.3% increase in mean Intersection over Union (mIoU) and a speed improvement of 34 frames per second (FPS) over the baseline DeepLabV3+. Further comparisons with classic lightweight models and Transformer-based models on the UIIS and TrashCan datasets show that our model achieves good accuracy with balanced computational efficiency in challenging underwater environments. The model still has room for improvement because of overfitting and the limitations of fixed convolution kernels; we plan to integrate Transformer methods in future work. Our model offers an effective solution for real-time target segmentation on underwater robots, with broad applications in the exploration and development of marine resources. Our code is available at: https://github.com/sunmer1rain/deeplabv_fusionnet
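
As a minimal illustration of the mean Intersection over Union (mIoU) metric reported in the abstract, the sketch below computes per-class IoU from a predicted and a ground-truth label map and averages it over the classes that occur. The function name `mean_iou`, the nested-list label format, and the convention of skipping classes absent from both maps are assumptions made for illustration; the evaluation code behind the reported results may follow a different convention.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in the prediction or the ground truth.

    pred, target: 2D lists of integer class labels with identical shapes.
    Hypothetical helper for illustration, not the authors' evaluation code.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel counts once toward both intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # A mismatch enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    # Assumption: classes absent from both maps are skipped, not scored as 0.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    pred = [[0, 0], [1, 1]]
    target = [[0, 1], [1, 1]]
    # Class 0: IoU = 1/2; class 1: IoU = 2/3.
    print(mean_iou(pred, target, num_classes=2))
```

In this toy example one of four pixels is misclassified, giving per-class IoU values of 1/2 and 2/3. On a full dataset, intersections and unions are usually accumulated over all images before dividing, rather than averaging per-image scores.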

Data availability and access

The SUIM, UIIS, and TrashCan datasets used in this research are available from references [21], [35], and [36], respectively. All data generated or analysed during this study are included in this published article.

References

  1. Hong L, Wang X, Zhang D (2024) CFD-based hydrodynamic performance investigation of autonomous underwater vehicles: A survey. Ocean Eng 305:117911

  2. Igbinenikaro OP, Adekoya OO, Etukudoh EA (2024) A comparative review of subsea navigation technologies in offshore engineering projects. Int J Front Eng Technol Res 6(2):019–034

  3. Hasan K, Ahmad S, Liaf AF, Karimi M, Ahmed T, Shawon MA, Mekhilef S (2024) Oceanic challenges to technological solutions: A review of autonomous underwater vehicle path technologies in biomimicry, control, navigation, and sensing. IEEE Access 12:46202–46231

  4. Huy DQ, Sadjoli N, Azam AB, Elhadidi B, Cai Y, Seet G (2023) Object perception in underwater environments: A survey on sensors and sensing methodologies. Ocean Eng 267

  5. Li M, Zhang H, Gruen A, Li D (2024) A survey on underwater coral image segmentation based on deep learning. Geo-spatial Inf Sci, pp 1–25

  6. Pergeorelis M, Bazik M, Saponaro P, Kim J, Kambhamettu C (2022) Synthetic data for semantic segmentation in underwater imagery. In: OCEANS 2022, Hampton Roads. IEEE, pp 1–6

  7. Ji L, Du Y, Dang Y, Gao W, Zhang H (2024) A survey of methods for addressing the challenges of referring image segmentation. Neurocomputing 583:127599

  8. Mo Y, Wu Y, Yang X, Liu F, Liao Y (2022) Review the state-of-the-art technologies of semantic segmentation based on deep learning. Neurocomputing 493:626–646

  9. Hao S, Zhou Y, Guo Y (2020) A brief survey on semantic segmentation with deep learning. Neurocomputing 406:302–321

  10. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 3431–3440

  11. Ronneberger O, Fischer P, Brox T (2015) U-Net: Convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015, 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III. Springer, pp 234–241

  12. Wang J, Liu X (2021) Medical image recognition and segmentation of pathological slices of gastric cancer based on DeepLab v3+ neural network. Comput Methods Prog Biomed 207:106210

  13. Chen L, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2017) DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Patt Anal Mach Intell 40(4):834–848

  14. Bai Z, Jing J (2023) Mobile-DeepLab: a lightweight pixel segmentation-based method for fabric defect detection. J Intell Manuf

  15. Chen L, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European conference on computer vision (ECCV), pp 801–818

  16. Zhuang P, Wang Y, Qiao Y (2021) WildFish++: A comprehensive fish benchmark for multimedia research. IEEE Trans Multimed 23:3603–3617

  17. Ditria EM, Connolly RM, Jinks EL, Lopez-Marcano S (2021) Annotated video footage for automated identification and counting of fish in unconstrained seagrass habitats. Front Marine Sci 8

  18. Cai L, Chen C, Chai H (2021) Underwater distortion target recognition network (UDTRNet) via enhanced image features. Comput Intell Neurosci 2021:1–10

  19. Zhang P, Yu H, Li H, Zhang X, Wei S, Tu W, Yang Z, Wu J, Lin Y (2023) MSGNet: multi-source guidance network for fish segmentation in underwater videos. Front Marine Sci 10

  20. Martin-Abadal M, Guerrero-Font E, Bonin-Font F, Gonzalez-Cid Y (2018) Deep semantic segmentation in an AUV for online Posidonia oceanica meadows identification. IEEE Access 6:60956–60967

  21. Islam MJ, Edge C, Xiao Y, Luo P, Mehtaz M, Morse C, Enan SS, Sattar J (2020) Semantic segmentation of underwater imagery: Dataset and benchmark. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, pp 1769–1776

  22. Nezla N, Haridas TM, Supriya M (2021) Semantic segmentation of underwater images using UNet architecture based deep convolutional encoder decoder model. In: 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS), vol 1. IEEE, pp 28–33

  23. Zhou J, Yang T, Zhang W (2023) Underwater vision enhancement technologies: a comprehensive review, challenges, and recent trends. Appl Intell 53(3):3594–3621

  24. Sun K, Xiao B, Liu D, Wang J (2019) Deep high-resolution representation learning for human pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5693–5703

  25. Wang J, Sun K, Cheng T, Jiang B, Deng C, Zhao Y, Liu D, Mu Y, Tan M, Wang X et al (2020) Deep high-resolution representation learning for visual recognition. IEEE Trans Patt Anal Mach Intell 43(10):3349–3364

  26. Sandler M, Howard A, Zhu M, Zhmoginov A, Chen LC (2018) MobileNetV2: Inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4510–4520

  27. Howard A, Sandler M, Chu G, Chen LC, Chen B, Tan M, Wang W, Zhu Y, Pang R, Vasudevan V et al (2019) Searching for MobileNetV3. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 1314–1324

  28. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the Inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826

  29. Szegedy C, Ioffe S, Vanhoucke V, Alemi A (2017) Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: Proceedings of the AAAI Conference on Artificial Intelligence 31(1)

  30. Rahnemoonfar M, Dobbs D (2019) Semantic segmentation of underwater sonar imagery with deep learning. In: IGARSS 2019-2019 IEEE International Geoscience and Remote Sensing Symposium. IEEE, pp 9455–9458

  31. Tolie HF, Ren J, Elyan E (2024) DICAM: Deep inception and channel-wise attention modules for underwater image enhancement. Neurocomputing 584:127585

  32. Liu F, Fang M (2020) Semantic segmentation of underwater images based on improved DeepLab. J Marine Sci Eng 8(3):188

  33. Jin A, Zeng X (2023) A novel deep learning method for underwater target recognition based on res-dense convolutional neural network with attention mechanism. J Marine Sci Eng 11(1):69

  34. Ding X, Zhang X, Ma N, Han J, Ding G, Sun J (2021) RepVGG: Making VGG-style convnets great again. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13733–13742

  35. Lian S, Li H, Cong R, Li S, Zhang W, Kwong S (2023) WaterMask: Instance segmentation for underwater imagery. In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE

  36. Hong J, Fulton M, Sattar J (2020) TrashCan: A semantically-segmented dataset towards visual detection of marine debris. arXiv:2007.08097

  37. Yu C, Gao C, Wang J, Yu G, Shen C, Sang N (2021) BiSeNet V2: Bilateral network with guided aggregation for real-time semantic segmentation. Int J Comput Vis 129(11):3051–3068

  38. Peng J, Liu Y, Tang S, Hao Y, Chu L, Chen G, Wu Z, Chen Z, Yu Z, Du Y et al (2022) PP-LiteSeg: A superior real-time semantic segmentation model. arXiv:2204.02681

  39. Strudel R, Garcia R, Laptev I, Schmid C (2021) Segmenter: Transformer for semantic segmentation. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE

  40. Xie E, Wang W, Yu Z, Anandkumar A, Alvarez JM, Luo P (2021) SegFormer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems 34:12077–12090

  41. Zhang W, Huang Z, Luo G, Chen T, Wang X, Liu W, Yu G, Shen C (2022) TopFormer: Token pyramid transformer for mobile semantic segmentation. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE

Acknowledgements

This research is supported by the Guangdong Province Basic and Applied Basic Research Foundation (2022A1515110420), the Shenzhen Science and Technology Program (Grant No. RCBS20221008093227028), and the National Natural Science Foundation of China (Grant No. 12405214). We thank Ming Yang for helping to improve the manuscript and for his dedicated efforts in collecting the dataset required for the new experiments.

Author information

Contributions

Chengxiang Liu: Conceptualization, Methodology, Supervision, Writing - Reviewing and Editing, Project administration. Haoxin Yao: Software, Visualization, Data curation, Writing - Original Draft. Wenhui Qiu: Software, Methodology, Data curation. Hongyuan Cui: Supervision, Visualization, Investigation. Yubin Fang: Investigation, Validation. Anqi Xu: Conceptualization, Formal analysis, Supervision, Writing - Reviewing and Editing, Funding acquisition.

Corresponding author

Correspondence to Anqi Xu.

Ethics declarations

Competing interests

The authors declare that they have no competing interests related to this work.

Ethical and informed consent for data used

The authors declare that this work does not involve any ethical issues.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, C., Yao, H., Qiu, W. et al. Multi-scale feature map fusion encoding for underwater object segmentation. Appl Intell 55, 163 (2025). https://doi.org/10.1007/s10489-024-05971-4

  • DOI: https://doi.org/10.1007/s10489-024-05971-4