Abstract
Underwater Image Enhancement (UIE) is critical for marine research and exploration but is hindered by complex color distortions and severe blurring. Recent deep learning-based methods have achieved remarkable results, yet they suffer from high computational costs and insufficient global modeling, which leads to locally under- or over-adjusted regions. We present PixMamba, a novel architecture designed to overcome these challenges by leveraging State Space Models (SSMs) for efficient global dependency modeling. Unlike convolutional neural networks (CNNs), which have limited receptive fields, and Transformer networks, which incur high computational costs, PixMamba captures global contextual information while remaining computationally efficient. Its dual-level design comprises the patch-level Efficient Mamba Net (EMNet), which reconstructs enhanced image features, and the pixel-level PixMamba Net (PixNet), which ensures fine-grained feature capture and global consistency of the enhanced image, properties that were previously difficult to obtain. PixMamba achieves state-of-the-art performance across various underwater image datasets and delivers visually superior results. Code is available at https://github.com/weitunglin/pixmamba.
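To make the dual-level idea concrete, below is a minimal, self-contained PyTorch sketch of a coarse patch-level branch followed by a pixel-level refinement branch. This is not the authors' implementation: the class names (GlobalTokenMixer, PatchLevelBranch, PixelLevelBranch, DualLevelUIE) and all layer choices are illustrative assumptions, and the simple token mixer merely stands in for the selective state-space (Mamba) blocks used by the actual EMNet and PixNet; see the linked repository for the real model.

```python
import torch
import torch.nn as nn


class GlobalTokenMixer(nn.Module):
    """Placeholder for a selective state-space (Mamba-style) token mixer.
    In PixMamba proper this would be an SSM block; a residual linear
    projection is used here only to keep the sketch self-contained."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):  # x: (B, N, C) token sequence
        return x + self.proj(self.norm(x))


class PatchLevelBranch(nn.Module):
    """EMNet-style branch: embed the image into patches, mix tokens
    globally, then project back to an image-sized coarse estimate."""
    def __init__(self, dim=32, patch=4):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.mixer = GlobalTokenMixer(dim)
        self.unembed = nn.ConvTranspose2d(dim, 3, kernel_size=patch, stride=patch)

    def forward(self, x):
        f = self.embed(x)                       # (B, C, H/p, W/p)
        b, c, h, w = f.shape
        t = f.flatten(2).transpose(1, 2)        # (B, N, C) tokens
        t = self.mixer(t)
        f = t.transpose(1, 2).reshape(b, c, h, w)
        return self.unembed(f)                  # coarse enhanced image


class PixelLevelBranch(nn.Module):
    """PixNet-style branch: per-pixel refinement of the coarse output,
    conditioned on the raw input, for fine detail and consistency."""
    def __init__(self, dim=16):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(6, dim, 3, padding=1), nn.GELU(),
            nn.Conv2d(dim, 3, 3, padding=1),
        )

    def forward(self, x, coarse):
        return coarse + self.refine(torch.cat([x, coarse], dim=1))


class DualLevelUIE(nn.Module):
    """Dual-level enhancement: patch-level reconstruction followed by
    pixel-level refinement (conceptual sketch only)."""
    def __init__(self):
        super().__init__()
        self.patch_branch = PatchLevelBranch()
        self.pixel_branch = PixelLevelBranch()

    def forward(self, x):
        coarse = self.patch_branch(x)
        return self.pixel_branch(x, coarse)


if __name__ == "__main__":
    model = DualLevelUIE()
    out = model(torch.randn(1, 3, 64, 64))
    print(out.shape)  # torch.Size([1, 3, 64, 64])
```

The design choice illustrated here is the split of responsibilities: the patch-level branch models long-range context cheaply over a downsampled token grid, while the pixel-level branch operates at full resolution to recover fine detail that patch embedding would otherwise discard.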
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Lin, WT., Lin, YX., Chen, JW., Hua, KL. (2025). PixMamba: Leveraging State Space Models in a Dual-Level Architecture for Underwater Image Enhancement. In: Cho, M., Laptev, I., Tran, D., Yao, A., Zha, H. (eds) Computer Vision – ACCV 2024. ACCV 2024. Lecture Notes in Computer Science, vol 15475. Springer, Singapore. https://doi.org/10.1007/978-981-96-0911-6_11
DOI: https://doi.org/10.1007/978-981-96-0911-6_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-96-0910-9
Online ISBN: 978-981-96-0911-6
eBook Packages: Computer Science, Computer Science (R0)