PixMamba: Leveraging State Space Models in a Dual-Level Architecture for Underwater Image Enhancement

  • Conference paper
  • First Online:
Computer Vision – ACCV 2024 (ACCV 2024)

Abstract

Underwater Image Enhancement (UIE) is critical for marine research and exploration but is hindered by complex color distortions and severe blurring. Recent deep learning-based methods have achieved remarkable results, yet they struggle with high computational costs and insufficient global modeling, resulting in locally under- or over-adjusted regions. We present PixMamba, a novel architecture designed to overcome these challenges by leveraging State Space Models (SSMs) for efficient global dependency modeling. Unlike convolutional neural networks (CNNs), which have limited receptive fields, and transformer networks, which incur high computational costs, PixMamba captures global contextual information while remaining computationally efficient. Our dual-level strategy pairs the patch-level Efficient Mamba Net (EMNet), which reconstructs enhanced image features, with the pixel-level PixMamba Net (PixNet), which ensures fine-grained feature capture and global consistency of the enhanced image, properties that were previously difficult to obtain. PixMamba achieves state-of-the-art performance across various underwater image datasets and delivers visually superior results. Code is available at https://github.com/weitunglin/pixmamba.
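To make the dual-level idea concrete, below is a minimal, hypothetical PyTorch sketch of how a patch-level branch and a pixel-level branch might be combined for image enhancement. It uses a generic token-mixing block as a stand-in for the Mamba-style selective SSM blocks; the class names, dimensions, and residual fusion are illustrative placeholders and do not reproduce the released PixMamba implementation at the linked repository.

```python
# Hypothetical sketch (not the authors' released code): a dual-level layout in
# PyTorch, with a patch-level branch standing in for EMNet and a pixel-level
# branch standing in for PixNet. TokenMixer is a simple placeholder for the
# Mamba-style selective SSM blocks described in the paper.
import torch
import torch.nn as nn


class TokenMixer(nn.Module):
    """Placeholder for a selective SSM (Mamba) block over a token sequence."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mix = nn.Linear(dim, dim)  # stands in for the selective scan

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, C)
        return x + self.mix(self.norm(x))


class PatchLevelBranch(nn.Module):
    """EMNet stand-in: embed patches, model them globally, project back."""

    def __init__(self, dim: int = 32, patch: int = 4, depth: int = 4):
        super().__init__()
        self.patch = patch
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.blocks = nn.Sequential(*[TokenMixer(dim) for _ in range(depth)])
        self.unembed = nn.ConvTranspose2d(dim, 3, kernel_size=patch, stride=patch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        t = self.embed(x)                      # (B, C, H/p, W/p)
        seq = t.flatten(2).transpose(1, 2)     # (B, N, C) patch tokens
        seq = self.blocks(seq)
        t = seq.transpose(1, 2).reshape(b, -1, h // self.patch, w // self.patch)
        return self.unembed(t)


class PixelLevelBranch(nn.Module):
    """PixNet stand-in: one token per pixel for fine-grained global consistency."""

    def __init__(self, dim: int = 16):
        super().__init__()
        self.proj_in = nn.Conv2d(3, dim, kernel_size=1)
        self.block = TokenMixer(dim)
        self.proj_out = nn.Conv2d(dim, 3, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        t = self.proj_in(x)
        seq = t.flatten(2).transpose(1, 2)     # (B, H*W, C) pixel tokens
        seq = self.block(seq)
        t = seq.transpose(1, 2).reshape(b, -1, h, w)
        return self.proj_out(t)


class DualLevelUIE(nn.Module):
    """Fuse both branches as residuals over the degraded input image."""

    def __init__(self):
        super().__init__()
        self.patch_branch = PatchLevelBranch()
        self.pixel_branch = PixelLevelBranch()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.patch_branch(x) + self.pixel_branch(x)


if __name__ == "__main__":
    model = DualLevelUIE()
    out = model(torch.rand(1, 3, 256, 256))    # H and W must be divisible by patch
    print(out.shape)                           # torch.Size([1, 3, 256, 256])
```

The intent sketched here mirrors the abstract's motivation: the patch-level branch models long-range context cheaply over downsampled tokens, while the pixel-level branch touches every pixel to preserve fine detail and enforce globally consistent color, with both outputs fused as residuals over the input.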

Author information

Corresponding author

Correspondence to Kai-Lung Hua.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Lin, W.T., Lin, Y.X., Chen, J.W., Hua, K.L. (2025). PixMamba: Leveraging State Space Models in a Dual-Level Architecture for Underwater Image Enhancement. In: Cho, M., Laptev, I., Tran, D., Yao, A., Zha, H. (eds) Computer Vision – ACCV 2024. ACCV 2024. Lecture Notes in Computer Science, vol 15475. Springer, Singapore. https://doi.org/10.1007/978-981-96-0911-6_11

  • DOI: https://doi.org/10.1007/978-981-96-0911-6_11

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-96-0910-9

  • Online ISBN: 978-981-96-0911-6

  • eBook Packages: Computer Science, Computer Science (R0)
