Abstract
Watermark removal is essential for Optical Character Recognition and digital document reconstruction. Document images differ substantially from natural scene images, so they require tailored image processing methods. Existing Transformer-based methods, though effective, scale poorly because attention computation is quadratic in sequence length. Inspired by Mamba, whose computation scales linearly with context length, we introduce MambaDW, a semantically guided Mamba approach. This two-stage method first removes most watermarks and then applies semantic-guided 2D selective scanning for precise enhancement and global information fusion. We also incorporate a self-supervised loss to improve generalization to unlabeled real-world data. Our experiments demonstrate MambaDW's effectiveness in document watermark removal.
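The 2D selective scanning mentioned in the abstract can be illustrated with a cross-scan sketch in the style of VMamba: a 2D feature map is unrolled into four 1D sequences (row-major, column-major, and their reverses) so that a linear-time state-space model can sweep the image in each direction, after which the four directional outputs are merged back into a 2D map. This is a minimal, speculative illustration of the scan/merge plumbing only, not the paper's actual model; the function names `cross_scan_2d` and `cross_merge_2d` are hypothetical, and the per-sequence SSM itself is omitted.

```python
import numpy as np

def cross_scan_2d(feat):
    """Unroll an (H, W, C) feature map into four 1D scan orders:
    row-major, reversed row-major, column-major, reversed column-major."""
    H, W, C = feat.shape
    row = feat.reshape(H * W, C)                     # left-to-right, top-to-bottom
    col = feat.transpose(1, 0, 2).reshape(H * W, C)  # top-to-bottom, left-to-right
    return [row, row[::-1], col, col[::-1]]

def cross_merge_2d(scans, H, W):
    """Invert each of the four scan orders back to row-major layout and
    average them into a single (H, W, C) map, fusing the information
    gathered along every traversal direction."""
    row = scans[0]
    row_rev = scans[1][::-1]
    col = scans[2].reshape(W, H, -1).transpose(1, 0, 2).reshape(H * W, -1)
    col_rev = scans[3][::-1].reshape(W, H, -1).transpose(1, 0, 2).reshape(H * W, -1)
    merged = (row + row_rev + col + col_rev) / 4.0
    return merged.reshape(H, W, -1)

# Round-trip check: merging the raw (un-processed) scans recovers the input.
x = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
assert np.allclose(cross_merge_2d(cross_scan_2d(x), 2, 3), x)
```

In a full model, each of the four sequences would pass through a selective state-space layer before merging, which is what gives every position a receptive field spanning the whole image at linear cost.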
This work is supported by Shenzhen Science and Technology Innovation Commission (JSGG20220831105002004).
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Liu, Y., Yan, M., Hua, H., Huang, J., Chen, S. (2025). MambaDW: Semantic-Aware Mamba for Document Watermark Removal. In: Magnenat-Thalmann, N., Kim, J., Sheng, B., Deng, Z., Thalmann, D., Li, P. (eds) Advances in Computer Graphics. CGI 2024. Lecture Notes in Computer Science, vol 15338. Springer, Cham. https://doi.org/10.1007/978-3-031-81806-6_12
DOI: https://doi.org/10.1007/978-3-031-81806-6_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-81805-9
Online ISBN: 978-3-031-81806-6
eBook Packages: Computer Science (R0)