MambaDW: Semantic-Aware Mamba for Document Watermark Removal

  • Conference paper
Advances in Computer Graphics (CGI 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15338)

Abstract

Watermark removal is essential for optical character recognition (OCR) and digital document reconstruction. Document images differ significantly from natural scene images, necessitating tailored image processing methods. Transformer-based methods, though effective, face scalability issues due to the quadratic complexity of attention. Inspired by Mamba, which scales linearly with sequence length, we introduce a semantically guided Mamba approach, MambaDW. This two-stage method first removes most of the watermark and then applies semantic-guided 2D selective scanning for precise enhancement and global information fusion. We also incorporate a self-supervised loss to improve generalization to unlabeled real data. Our experiments show MambaDW’s effectiveness in document watermark removal.
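The page gives only the abstract, so the following is a minimal, hypothetical sketch of what semantic-guided 2D selective scanning might look like in PyTorch. It traverses the feature map in four directions, as in VMamba-style cross-scans, and replaces Mamba's full input-dependent selective state-space recurrence with a simplified gated linear scan for brevity. The module name, the gate_proj design, and the use of a single-channel semantic map are all assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch -- not the authors' code. Module and parameter names
# (SemanticGuided2DScan, gate_proj, ...) are invented for illustration.
import torch
import torch.nn as nn


class SemanticGuided2DScan(nn.Module):
    """Scan a feature map along four directions (VMamba-style cross-scan)
    and let a semantic map modulate the recurrence at every pixel. A full
    Mamba block uses input-dependent SSM parameters; a simplified gated
    linear recurrence keeps this example short and runnable."""

    def __init__(self, channels: int):
        super().__init__()
        self.in_proj = nn.Conv2d(channels, channels, 1)
        # Single-channel semantic map (e.g. text vs. background) -> per-pixel gates.
        self.gate_proj = nn.Conv2d(1, channels, 1)
        self.out_proj = nn.Conv2d(4 * channels, channels, 1)

    @staticmethod
    def _scan(x: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        # x, g: (B, C, L). Gated linear recurrence h_t = g_t * h_{t-1} + x_t.
        h = torch.zeros_like(x[..., 0])
        out = []
        for t in range(x.shape[-1]):
            h = g[..., t] * h + x[..., t]
            out.append(h)
        return torch.stack(out, dim=-1)

    def forward(self, feat: torch.Tensor, sem: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) image features; sem: (B, 1, H, W) semantic map.
        B, C, H, W = feat.shape
        x = self.in_proj(feat)
        g = torch.sigmoid(self.gate_proj(sem))
        xr, gr = x.flatten(2), g.flatten(2)                                  # row-major order
        xc, gc = x.transpose(2, 3).flatten(2), g.transpose(2, 3).flatten(2)  # column-major order
        s0 = self._scan(xr, gr).view(B, C, H, W)
        s1 = self._scan(xr.flip(-1), gr.flip(-1)).flip(-1).view(B, C, H, W)
        s2 = self._scan(xc, gc).view(B, C, W, H).transpose(2, 3)
        s3 = self._scan(xc.flip(-1), gc.flip(-1)).flip(-1).view(B, C, W, H).transpose(2, 3)
        # Fuse the four directional passes back into one feature map.
        return self.out_proj(torch.cat([s0, s1, s2, s3], dim=1))
```

A quick shape check: SemanticGuided2DScan(64)(torch.randn(1, 64, 32, 32), torch.rand(1, 1, 32, 32)) returns a (1, 64, 32, 32) tensor. How MambaDW actually injects semantics into the scan, and how its self-supervised loss is formulated, is not stated in the abstract, so those details are omitted rather than guessed.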

This work is supported by Shenzhen Science and Technology Innovation Commission (JSGG20220831105002004).


Author information

Corresponding author

Correspondence to Shifeng Chen.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, Y., Yan, M., Hua, H., Huang, J., Chen, S. (2025). MambaDW: Semantic-Aware Mamba for Document Watermark Removal. In: Magnenat-Thalmann, N., Kim, J., Sheng, B., Deng, Z., Thalmann, D., Li, P. (eds) Advances in Computer Graphics. CGI 2024. Lecture Notes in Computer Science, vol 15338. Springer, Cham. https://doi.org/10.1007/978-3-031-81806-6_12

  • DOI: https://doi.org/10.1007/978-3-031-81806-6_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-81805-9

  • Online ISBN: 978-3-031-81806-6

  • eBook Packages: Computer Science, Computer Science (R0)
