Research Article · DOI: 10.1145/3658644.3690334

Neural Dehydration: Effective Erasure of Black-box Watermarks from DNNs with Limited Data

Published: 09 December 2024

Abstract

To protect the intellectual property of well-trained deep neural networks (DNNs), black-box watermarks, which are embedded into the prediction behavior of DNN models on a set of specially crafted samples and extracted from suspect models using only API access, have gained increasing popularity in both academia and industry. Watermark robustness is usually evaluated against attackers who steal the protected model and obfuscate its parameters to remove the watermark. However, current robustness evaluations are primarily performed under moderate attacks or unrealistic settings. Existing removal attacks can crack only a small subset of the mainstream black-box watermarks, and they fall short in four key aspects: incomplete removal, reliance on prior knowledge of the watermark, performance degradation, and heavy dependence on data.
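To make the black-box setting concrete, the following is a minimal, hypothetical sketch of how an owner might verify such a watermark through a suspect model's prediction API; the function names, trigger set, and decision threshold are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical black-box verification sketch: the owner queries a suspect
# model's prediction API on specially crafted trigger samples and checks how
# often the secret labels are reproduced. Threshold and names are assumptions.
import torch

def verify_watermark(api_predict, trigger_samples, secret_labels, threshold=0.8):
    """api_predict: a callable mapping a batch of inputs to predicted labels,
    i.e., the only access the verifier has to the suspect model."""
    predictions = api_predict(trigger_samples)
    match_rate = (predictions == secret_labels).float().mean().item()
    return match_rate >= threshold, match_rate
```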
In this paper, we propose a watermark-agnostic removal attack called Neural Dehydration (Dehydra for short), which effectively erases all ten mainstream black-box watermarks from DNNs with only limited, or even no, data dependence. In general, our attack pipeline exploits the internals of the protected model to recover and unlearn the watermark message. We further design target class detection and recovered sample splitting algorithms to reduce the utility loss and to achieve data-free watermark removal for five of the watermarking schemes. We conduct a comprehensive evaluation of Dehydra against the ten mainstream black-box watermarks on three benchmark datasets and DNN architectures. Compared with existing removal attacks, Dehydra achieves strong removal effectiveness across all the covered watermarks, preserving at least 90% of the stolen model's utility, under data-limited settings, i.e., with less than 2% of the training data or even no data at all. Our work reveals the vulnerabilities of existing black-box DNN watermarks in realistic settings, highlighting the urgent need for more robust watermarking techniques. To facilitate future studies, we open-source our code in the following repository: https://github.com/LouisVann/Dehydra.
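As a rough illustration of the recover-and-unlearn idea described above, the sketch below inverts a stolen classifier to synthesize samples it confidently assigns to a suspected watermark target class, then fine-tunes the model so its predictions on those samples revert toward the uniform distribution. This is a minimal PyTorch sketch under our own assumptions (function names, losses, and hyperparameters are hypothetical), not the authors' actual Dehydra implementation.

```python
# Hypothetical recover-and-unlearn sketch in the spirit of the pipeline
# described in the abstract. All names, losses, and hyperparameters are
# illustrative assumptions, not the authors' Dehydra code.
import torch
import torch.nn.functional as F

def recover_watermark_samples(model, target_class, num_samples=64,
                              shape=(3, 32, 32), steps=200, lr=0.1):
    """Invert the stolen model: optimize random inputs until the model
    confidently assigns them to the suspected watermark target class."""
    model.eval()
    x = torch.rand(num_samples, *shape, requires_grad=True)
    labels = torch.full((num_samples,), target_class, dtype=torch.long)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(model(x), labels).backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(0.0, 1.0)  # keep recovered samples in a valid pixel range
    return x.detach()

def unlearn_watermark(model, recovered, num_classes, steps=50, lr=1e-4):
    """Fine-tune the model so its predictions on the recovered samples drift
    back toward the uniform distribution, suppressing watermark behavior."""
    model.train()
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    uniform = torch.full((recovered.size(0), num_classes), 1.0 / num_classes)
    for _ in range(steps):
        opt.zero_grad()
        log_probs = F.log_softmax(model(recovered), dim=1)
        F.kl_div(log_probs, uniform, reduction="batchmean").backward()
        opt.step()
```

In a complete attack of this kind, the unlearning objective would presumably be combined with a utility-preserving term, e.g., standard cross-entropy on whatever limited clean data is available, so that the stolen model's accuracy is retained while the watermark is erased.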



      Published In

CCS '24: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, December 2024, 5188 pages. ISBN: 9798400706363. DOI: 10.1145/3658644.


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. model watermarking
      2. removal attack
      3. robustness


Acceptance Rates

Overall acceptance rate: 1,261 of 6,999 submissions (18%)
