MambaDW: Semantic-Aware Mamba for Document Watermark Removal

  • Conference paper
Advances in Computer Graphics (CGI 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15338)

Abstract

Watermark removal is essential for optical character recognition (OCR) and digital document reconstruction. Document images differ significantly from natural scene images, necessitating tailored image processing methods. Transformer-based methods, though effective, face scalability issues due to the quadratic complexity of attention. Inspired by Mamba, which scales linearly with sequence length, we introduce a semantically guided Mamba approach, MambaDW. This two-stage method first removes most of the watermark and then applies semantic-guided 2D selective scanning for precise enhancement and global information fusion. We also incorporate a self-supervised loss to improve generalization to unlabeled real data. Our experiments show MambaDW’s effectiveness in document watermark removal.
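The page gives only the abstract, so the following is a minimal, hypothetical sketch of what semantic-guided 2D selective scanning might look like in PyTorch. It traverses the feature map in four directions, as in VMamba-style cross-scans, and replaces Mamba's full input-dependent selective state-space recurrence with a simplified gated linear scan for brevity. The module name, the gate_proj design, and the use of a single-channel semantic map are all assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch -- not the authors' code. Module and parameter names
# (SemanticGuided2DScan, gate_proj, ...) are invented for illustration.
import torch
import torch.nn as nn


class SemanticGuided2DScan(nn.Module):
    """Scan a feature map along four directions (VMamba-style cross-scan)
    and let a semantic map modulate the recurrence at every pixel. A full
    Mamba block uses input-dependent SSM parameters; a simplified gated
    linear recurrence keeps this example short and runnable."""

    def __init__(self, channels: int):
        super().__init__()
        self.in_proj = nn.Conv2d(channels, channels, 1)
        # Single-channel semantic map (e.g. text vs. background) -> per-pixel gates.
        self.gate_proj = nn.Conv2d(1, channels, 1)
        self.out_proj = nn.Conv2d(4 * channels, channels, 1)

    @staticmethod
    def _scan(x: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        # x, g: (B, C, L). Gated linear recurrence h_t = g_t * h_{t-1} + x_t.
        h = torch.zeros_like(x[..., 0])
        out = []
        for t in range(x.shape[-1]):
            h = g[..., t] * h + x[..., t]
            out.append(h)
        return torch.stack(out, dim=-1)

    def forward(self, feat: torch.Tensor, sem: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) image features; sem: (B, 1, H, W) semantic map.
        B, C, H, W = feat.shape
        x = self.in_proj(feat)
        g = torch.sigmoid(self.gate_proj(sem))
        xr, gr = x.flatten(2), g.flatten(2)                                  # row-major order
        xc, gc = x.transpose(2, 3).flatten(2), g.transpose(2, 3).flatten(2)  # column-major order
        s0 = self._scan(xr, gr).view(B, C, H, W)
        s1 = self._scan(xr.flip(-1), gr.flip(-1)).flip(-1).view(B, C, H, W)
        s2 = self._scan(xc, gc).view(B, C, W, H).transpose(2, 3)
        s3 = self._scan(xc.flip(-1), gc.flip(-1)).flip(-1).view(B, C, W, H).transpose(2, 3)
        # Fuse the four directional passes back into one feature map.
        return self.out_proj(torch.cat([s0, s1, s2, s3], dim=1))
```

A quick shape check: SemanticGuided2DScan(64)(torch.randn(1, 64, 32, 32), torch.rand(1, 1, 32, 32)) returns a (1, 64, 32, 32) tensor. How MambaDW actually injects semantics into the scan, and how its self-supervised loss is formulated, is not stated in the abstract, so those details are omitted rather than guessed.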

This work is supported by Shenzhen Science and Technology Innovation Commission (JSGG20220831105002004).


Author information

Corresponding author

Correspondence to Shifeng Chen.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, Y., Yan, M., Hua, H., Huang, J., Chen, S. (2025). MambaDW: Semantic-Aware Mamba for Document Watermark Removal. In: Magnenat-Thalmann, N., Kim, J., Sheng, B., Deng, Z., Thalmann, D., Li, P. (eds) Advances in Computer Graphics. CGI 2024. Lecture Notes in Computer Science, vol 15338. Springer, Cham. https://doi.org/10.1007/978-3-031-81806-6_12

  • DOI: https://doi.org/10.1007/978-3-031-81806-6_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-81805-9

  • Online ISBN: 978-3-031-81806-6

  • eBook Packages: Computer Science, Computer Science (R0)
