DOI: 10.1145/3674399.3674446
Tracking the Leaker: An Encodable Watermarking Method for Dataset Intellectual Property Protection

Published: 30 July 2024

Abstract

Presently, numerous enterprises provide machine-learning cloud services. However, a service provider may exploit user-uploaded data for unauthorized model retraining, or illicitly collect user data to develop commercial models. This study introduces a traceable dataset-watermarking technique designed to ascertain the trustworthiness of third-party providers of machine-learning cloud services: in the event of a data breach, the leak can be traced back to the third party responsible. Specifically, we propose a method that employs the clean-label backdoor attack framework to infer whether a third-party model was trained on user data. A watermark, associated with an encoding and designed as a trigger, is injected into the dataset by a trained autoencoder. Experimental evaluation on three datasets demonstrates the effectiveness of the proposed method, yielding over 93% verification accuracy on average under normal conditions. A series of pruning and fine-tuning attacks were carried out against the method; the results indicate that these attacks have minimal impact, confirming the method's robustness.
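The paper's trigger is produced by a trained autoencoder, which the abstract does not detail; the sketch below is only a toy illustration of the underlying "encodable watermark" idea, in which each data owner's bit code deterministically selects a low-amplitude perturbation that can later be correlated against a suspect copy of the data. All function names, the additive-pattern construction, and the amplitude parameter are illustrative assumptions, not the paper's method.

```python
# Toy sketch of an encodable dataset watermark (NOT the paper's trained
# autoencoder): an owner's bit code seeds a deterministic low-amplitude
# pattern, which is added to each image; the residual of a suspect copy
# can then be correlated against candidate codes to identify the leaker.
import random

def code_to_pattern(code_bits, h, w, amplitude=2):
    """Deterministically map an owner's bit code to a +/-amplitude trigger."""
    seed = int("".join(map(str, code_bits)), 2)
    rng = random.Random(seed)
    return [[rng.choice((-amplitude, amplitude)) for _ in range(w)]
            for _ in range(h)]

def embed(image, pattern):
    """Add the trigger to a grayscale image, clipping pixels to [0, 255]."""
    return [[max(0, min(255, p + d)) for p, d in zip(row, drow)]
            for row, drow in zip(image, pattern)]

def match_score(original, suspect, pattern):
    """Normalized correlation between the residual (suspect - original) and
    a candidate pattern; a score near 1 suggests the suspect copy carries
    the code that generated this pattern."""
    num = den = 0
    for orow, srow, prow in zip(original, suspect, pattern):
        for o, s, p in zip(orow, srow, prow):
            num += (s - o) * p
            den += p * p
    return num / den if den else 0.0
```

As a usage example, embedding the pattern for code `1011` into a mid-gray image yields a residual that scores 1.0 against the correct code and near 0 against an unrelated code; the paper's learned autoencoder plays the role of `code_to_pattern`/`embed` here, producing a far less detectable, image-adaptive trigger.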



Published In

ACM-TURC '24: Proceedings of the ACM Turing Award Celebration Conference - China 2024
July 2024, 261 pages
ISBN: 9798400710117
DOI: 10.1145/3674399
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States


        Author Tags

        1. Backdoor
        2. Data Security
        3. Dataset Watermarking
        4. Deep Neural Networks
        5. Intellectual Property Protection

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

