DOI: 10.1145/3674399.3674446
Tracking the Leaker: An Encodable Watermarking Method for Dataset Intellectual Property Protection

Published: 30 July 2024

Abstract

Presently, numerous enterprises provide machine-learning cloud services. However, a service provider may exploit user-uploaded data for unauthorized model retraining, or illicitly collect user data to develop commercial models. This study introduces a traceable dataset-watermarking technique designed to ascertain the trustworthiness of third-party providers of machine-learning cloud services: in the event of a data breach, the leak can be traced back to the third party responsible. Specifically, we propose a method that employs the clean-label backdoor attack framework to infer whether a third-party model was trained on user data. A watermark, associated with an encoding and designed as a trigger, is injected into the dataset by a trained autoencoder. Experimental evaluation on three datasets demonstrates the effectiveness of the proposed method, yielding over 93% verification accuracy on average under normal conditions. A series of pruning and fine-tuning attacks were carried out against the method; the results indicate that these attacks have minimal impact, confirming the method's robustness.
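The paper's trigger is produced by a trained autoencoder, which the abstract does not detail; the sketch below is only a toy illustration of the underlying "encodable watermark" idea, in which each data owner's bit code deterministically selects a low-amplitude perturbation that can later be correlated against a suspect copy of the data. All function names, the additive-pattern construction, and the amplitude parameter are illustrative assumptions, not the paper's method.

```python
# Toy sketch of an encodable dataset watermark (NOT the paper's trained
# autoencoder): an owner's bit code seeds a deterministic low-amplitude
# pattern, which is added to each image; the residual of a suspect copy
# can then be correlated against candidate codes to identify the leaker.
import random

def code_to_pattern(code_bits, h, w, amplitude=2):
    """Deterministically map an owner's bit code to a +/-amplitude trigger."""
    seed = int("".join(map(str, code_bits)), 2)
    rng = random.Random(seed)
    return [[rng.choice((-amplitude, amplitude)) for _ in range(w)]
            for _ in range(h)]

def embed(image, pattern):
    """Add the trigger to a grayscale image, clipping pixels to [0, 255]."""
    return [[max(0, min(255, p + d)) for p, d in zip(row, drow)]
            for row, drow in zip(image, pattern)]

def match_score(original, suspect, pattern):
    """Normalized correlation between the residual (suspect - original) and
    a candidate pattern; a score near 1 suggests the suspect copy carries
    the code that generated this pattern."""
    num = den = 0
    for orow, srow, prow in zip(original, suspect, pattern):
        for o, s, p in zip(orow, srow, prow):
            num += (s - o) * p
            den += p * p
    return num / den if den else 0.0
```

As a usage example, embedding the pattern for code `1011` into a mid-gray image yields a residual that scores 1.0 against the correct code and near 0 against an unrelated code; the paper's learned autoencoder plays the role of `code_to_pattern`/`embed` here, producing a far less detectable, image-adaptive trigger.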



Published In

ACM-TURC '24: Proceedings of the ACM Turing Award Celebration Conference - China 2024
July 2024, 261 pages
ISBN: 9798400710117
DOI: 10.1145/3674399
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States


        Author Tags

        1. Backdoor
        2. Data Security
        3. Dataset Watermarking
        4. Deep Neural Networks
        5. Intellectual Property Protection

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

