ABSTRACT
How can we gauge the privacy provided by machine learning algorithms? Models trained with differential privacy (DP) provably limit information leakage, but the question remains open for non-DP models. In this talk, we present multiple techniques for membership inference, which estimates whether a given data sample is in the training set of a model. In particular, we introduce a watermarking-based method that enables very fast verification of data usage in a model: this technique creates so-called radioactive marks that propagate from the data to the model during training. The watermark is barely visible to the naked eye and allows data tracing even when the radioactive data represents only 1% of the training set.
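To illustrate the idea behind radioactive marking, here is a minimal sketch (not the authors' implementation): a secret random direction is added to the features of marked training samples, and after training, the alignment between the classifier's weights and that secret direction reveals whether the marked data was used. All names (`mark_direction`, `epsilon`, `train_linear`) and the toy linear setup are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 2000

# Clean two-class data: labels +/-1, features weakly correlated with a signal direction.
signal = rng.normal(size=d)
signal /= np.linalg.norm(signal)
y = rng.choice([-1.0, 1.0], size=n)
X = rng.normal(size=(n, d)) + 0.5 * y[:, None] * signal

# Radioactive mark: a secret random unit direction, added (weakly) to class +1 samples.
mark_direction = rng.normal(size=d)
mark_direction /= np.linalg.norm(mark_direction)
epsilon = 0.3  # mark strength: a small perturbation of the data
X_marked = X + epsilon * (y[:, None] > 0) * mark_direction

def train_linear(X, y, lr=0.1, steps=200):
    """Plain gradient-descent training of a linear classifier; returns the weights."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = np.tanh(X @ w)  # smooth surrogate for the prediction
        w += lr * X.T @ (y - p) / len(y)
    return w

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

w_clean = train_linear(X, y)
w_marked = train_linear(X_marked, y)

# Detection: only the model trained on marked data aligns with the secret direction.
cos_clean = cosine(w_clean, mark_direction)
cos_marked = cosine(w_marked, mark_direction)
print(f"clean: {cos_clean:.3f}  marked: {cos_marked:.3f}")
```

Verification here is a single inner product with the model's weights, which is what makes this kind of tracing fast compared to retraining-based membership inference.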
Index Terms
- Tracing Data through Learning with Watermarking