Research Article · DOI: 10.1145/3658644.3690334

Neural Dehydration: Effective Erasure of Black-box Watermarks from DNNs with Limited Data

Published: 09 December 2024

Abstract

To protect the intellectual property of well-trained deep neural networks (DNNs), black-box watermarks, which are embedded into the prediction behavior of DNN models on a set of specially crafted samples and extracted from suspect models using only API access, have gained increasing popularity in both academia and industry. Watermark robustness is usually evaluated against attackers who steal the protected model and obfuscate its parameters to remove the watermark. However, current robustness evaluations are primarily performed under moderate attacks or unrealistic settings. Existing removal attacks can crack only a small subset of the mainstream black-box watermarks, and they fall short in four key aspects: incomplete removal, reliance on prior knowledge of the watermark, performance degradation, and heavy dependence on data.
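To make the black-box setting concrete, the following is a minimal, hypothetical sketch of how an owner might verify such a watermark through a suspect model's prediction API; the function names, trigger set, and decision threshold are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical black-box verification sketch: the owner queries a suspect
# model's prediction API on specially crafted trigger samples and checks how
# often the secret labels are reproduced. Threshold and names are assumptions.
import torch

def verify_watermark(api_predict, trigger_samples, secret_labels, threshold=0.8):
    """api_predict: a callable mapping a batch of inputs to predicted labels,
    i.e., the only access the verifier has to the suspect model."""
    predictions = api_predict(trigger_samples)
    match_rate = (predictions == secret_labels).float().mean().item()
    return match_rate >= threshold, match_rate
```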
In this paper, we propose a watermark-agnostic removal attack called Neural Dehydration (Dehydra for short), which effectively erases all ten mainstream black-box watermarks from DNNs with only limited, or even no, data dependence. In general, our attack pipeline exploits the internals of the protected model to recover and unlearn the watermark message. We further design target class detection and recovered sample splitting algorithms to reduce the utility loss and to achieve data-free watermark removal for five of the watermarking schemes. We conduct a comprehensive evaluation of Dehydra against the ten mainstream black-box watermarks on three benchmark datasets and DNN architectures. Compared with existing removal attacks, Dehydra achieves strong removal effectiveness across all the covered watermarks, preserving at least 90% of the stolen model's utility, under data-limited settings, i.e., with less than 2% of the training data or even no data at all. Our work reveals the vulnerabilities of existing black-box DNN watermarks in realistic settings, highlighting the urgent need for more robust watermarking techniques. To facilitate future studies, we open-source our code in the following repository: https://github.com/LouisVann/Dehydra.
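As a rough illustration of the recover-and-unlearn idea described above, the sketch below inverts a stolen classifier to synthesize samples it confidently assigns to a suspected watermark target class, then fine-tunes the model so its predictions on those samples revert toward the uniform distribution. This is a minimal PyTorch sketch under our own assumptions (function names, losses, and hyperparameters are hypothetical), not the authors' actual Dehydra implementation.

```python
# Hypothetical recover-and-unlearn sketch in the spirit of the pipeline
# described in the abstract. All names, losses, and hyperparameters are
# illustrative assumptions, not the authors' Dehydra code.
import torch
import torch.nn.functional as F

def recover_watermark_samples(model, target_class, num_samples=64,
                              shape=(3, 32, 32), steps=200, lr=0.1):
    """Invert the stolen model: optimize random inputs until the model
    confidently assigns them to the suspected watermark target class."""
    model.eval()
    x = torch.rand(num_samples, *shape, requires_grad=True)
    labels = torch.full((num_samples,), target_class, dtype=torch.long)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(model(x), labels).backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(0.0, 1.0)  # keep recovered samples in a valid pixel range
    return x.detach()

def unlearn_watermark(model, recovered, num_classes, steps=50, lr=1e-4):
    """Fine-tune the model so its predictions on the recovered samples drift
    back toward the uniform distribution, suppressing watermark behavior."""
    model.train()
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    uniform = torch.full((recovered.size(0), num_classes), 1.0 / num_classes)
    for _ in range(steps):
        opt.zero_grad()
        log_probs = F.log_softmax(model(recovered), dim=1)
        F.kl_div(log_probs, uniform, reduction="batchmean").backward()
        opt.step()
```

In a complete attack of this kind, the unlearning objective would presumably be combined with a utility-preserving term, e.g., standard cross-entropy on whatever limited clean data is available, so that the stolen model's accuracy is retained while the watermark is erased.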



      Published In

CCS '24: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, December 2024, 5188 pages. ISBN: 9798400706363. DOI: 10.1145/3658644.


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. model watermarking
      2. removal attack
      3. robustness


Acceptance Rates

Overall acceptance rate: 1,261 of 6,999 submissions (18%)
