In order to examine the resilience and susceptibility of different Deep Neural Network-based watermarking methods, we expose the watermarked images to unseen benign and malicious transformations and evaluate the resulting bit recovery accuracy (BRA). The results highlighted in bold are the top performances.
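For reference, BRA is the fraction of embedded watermark bits that the decoder recovers correctly after a transformation. The minimal sketch below illustrates how such a metric can be computed; the `embed`, `decode`, and `g` callables in the commented usage are placeholders standing in for the watermark encoder, decoder, and a benign or malicious transform, not the actual model API.

```python
import numpy as np

def bit_recovery_accuracy(embedded_bits: np.ndarray, recovered_bits: np.ndarray) -> float:
    """Fraction of watermark bits recovered correctly, reported in percent."""
    assert embedded_bits.shape == recovered_bits.shape
    return 100.0 * float(np.mean(embedded_bits == recovered_bits))

# Hypothetical usage: embed a random secret, apply a transform g, then decode.
# secret = np.random.randint(0, 2, size=128)
# x_w = embed(x, secret)
# bra = bit_recovery_accuracy(secret, decode(g(x_w)))
```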
6.1 Benign Transforms
For benign transformations, we applied different levels of Gaussian blur, JPEG compression, and several Instagram filters, namely Aden, Brooklyn, and Clarendon. We implemented these filters with the open-source Pilgram library [2], which provides a range of image-processing filters, including Instagram-style filters, and used them to test their impact on our proposed watermarking scheme.
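For illustration, the filters can be applied with a few lines of Python using Pilgram; the file path below is a placeholder, and chaining the filter calls is one natural way to realize the combined settings (e.g., A\(+\)B) described later in this section.

```python
from PIL import Image
import pilgram  # pip install pilgram

im = Image.open("watermarked_face.png")  # placeholder path to a watermarked image

aden = pilgram.aden(im)            # filter A
brooklyn = pilgram.brooklyn(im)    # filter B
clarendon = pilgram.clarendon(im)  # filter C

# Combined filters via chaining, e.g. (A + B) and (A + B + C):
a_plus_b = pilgram.brooklyn(pilgram.aden(im))
a_plus_b_plus_c = pilgram.clarendon(pilgram.brooklyn(pilgram.aden(im)))
```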
Figure 4(a) illustrates Gaussian blur applied to invisible watermarked images using different kernel sizes and \(\sigma\) values. Figure 4(b) illustrates JPEG compression applied to invisible watermarked images at compression rates varying from \(25\%\) to \(75\%\). For Figure 4, the watermarked samples are generated using our U-Net\(+\)C\(+A_{adv}\) model trained only on benign transformations.
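For concreteness, the two benign transforms can be reproduced roughly as sketched below using OpenCV and Pillow; the specific kernel sizes, \(\sigma\) values, and quality factors in the example are assumptions standing in for the settings used in Figure 4 and Tables 3 and 4.

```python
import io
import cv2
import numpy as np
from PIL import Image

def gaussian_blur(img: np.ndarray, ksize: int = 5, sigma: float = 1.0) -> np.ndarray:
    """Apply Gaussian blur with an odd kernel size and standard deviation sigma."""
    return cv2.GaussianBlur(img, (ksize, ksize), sigma)

def jpeg_compress(img: np.ndarray, quality: int = 50) -> np.ndarray:
    """Round-trip the image through JPEG at the given quality factor."""
    buf = io.BytesIO()
    Image.fromarray(img).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return np.array(Image.open(buf))

# Assumed example settings, e.g. blur with k = 3, 5, 7 and JPEG quality 25/50/75:
# blurred = gaussian_blur(x_w, ksize=7, sigma=2.0)
# compressed = jpeg_compress(x_w, quality=25)
```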
Table 3 tabulates the BRA (in \(\%\)) for different watermarking techniques after applying Gaussian blur as a benign transform at different kernel sizes and \(\sigma\) values. Table 4 tabulates the BRA (in \(\%\)) for different watermarking techniques after applying JPEG compression (benign transform) at compression rates of \(25\%\), \(50\%\), and \(75\%\), respectively.
Table 5 tabulates the effect of unseen benign transformations on invisible watermarked images in terms of BRA using different Instagram filters. All models are trained and tested on the CelebA dataset.
The overall performance in terms of BRA is \(97.1\%\) for U-Net\(+\)C\(+A_{adv}\) on Gaussian blur, which is better than the \(96.3\%\) BRA of the second-best model, StegaStamp, when only benign transformations \(g_{bt}\) are used during the training stage. Similar observations hold for JPEG compression and the unseen benign transformations.
Figure 5 illustrates the application of individual Instagram filters and their combinations as unseen benign transforms on invisible watermarked images. In this study, the Instagram filters Aden, Brooklyn, and Clarendon are used; these filters are designed to enhance facial images with unique aesthetic effects, each altering images in distinctive ways to cater to diverse visual preferences and styles. In the figure, (A\(+\)B) denotes the combined application of the Aden and Brooklyn filters to the watermarked image, (B\(+\)C) denotes the combined application of the Brooklyn and Clarendon filters, and (A\(+\)B\(+\)C) denotes the combined application of the Aden, Brooklyn, and Clarendon filters. The evaluation shows that the model incorporating U-Net with a Critic (\(C\)) and an Adversary (\(A_{adv}\)) network performs best in terms of BRA. The overall performance in terms of BRA is \(98.73\%\) for U-Net\(+\)C\(+A_{adv}\) on unseen benign transformations, which is better than the \(98.29\%\) BRA of the second-best model, FaceSigns (Semi-Fragile), when only benign transformations \(g_{bt}\) are used during the training stage. This superior performance is consistently observed across settings involving different image filters. The consistent performance across all unknown benign transformations can be attributed to the model's specialized training exclusively on benign transformations such as cropping, compression, or subtle filtering. Training specifically on these types of transformations enables the model to become highly proficient in identifying and managing the specific patterns and distortions they introduce.
Overall, training the model using only benign transformations renders it robust to unseen benign transformations. Further, the integration of the adversary module during the training stage also plays a pivotal role in enhancing the robustness and imperceptibility of the watermarking process to unknown benign transformations. This process renders the watermark resilient against a variety of benign transformations, thereby preserving the integrity of the media content.
6.2 Malicious Transforms
For malicious transformations, we applied facial manipulations to the watermarked images using different generative models, namely auto-encoders, GANs, and diffusion models, and calculated the BRA. In this case, a low BRA is preferred, as it reflects fragility against malicious transformations such as Deepfakes.
FaceSwap-Based Malicious Transforms: The FaceSwap model is a graphics-based method that aligns facial landmarks to swap faces between the source and the target using an encoder-decoder style model. This technology is widely used for applications ranging from entertainment and media to more serious uses such as personalized advertisements and synthetic data generation for AI training.
Figure 6 shows sample watermarked facial images with identity swaps generated by the FaceSwap model. The inputs to the FaceSwap model are the source image \(x_{sw}\) (not watermarked) and the target image \(x_{tw}\) (watermarked). The output is the maliciously transformed facial image \(x_{mt}=g_{mt}(x_{sw},x_{tw})\) with the identity swapped between the source and the target. For a detailed technical description and implementation details, please refer to the original work on face-swap-based malicious transforms.
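The evaluation protocol for a malicious transform can be sketched as follows; `face_swap` and `decoder` are hypothetical callables standing in for the FaceSwap model and the watermark decoder, and a low BRA on \(x_{mt}\) is the desired outcome.

```python
import numpy as np

def evaluate_malicious_transform(x_sw, x_tw, secret_bits, face_swap, decoder):
    """Apply a malicious transform g_mt to a watermarked target and measure BRA.

    x_sw: source image (not watermarked); x_tw: target image (watermarked);
    secret_bits: bits embedded in x_tw. `face_swap` and `decoder` are placeholders
    for the manipulation model and the watermark decoder, respectively.
    """
    x_mt = face_swap(x_sw, x_tw)                      # x_mt = g_mt(x_sw, x_tw)
    recovered = decoder(x_mt)                         # recovered bit string
    bra = 100.0 * float(np.mean(recovered == secret_bits))
    return x_mt, bra                                  # for semi-fragility, bra should be low
```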
Table 6 shows the effect of FaceSwap-based malicious transformations (based on the encoder-decoder model) on invisible watermarked target images obtained using our proposed model, in terms of BRA. All models are trained on the FF\(++\) dataset and tested on the FF\(++\) and CelebA datasets. From the table, the overall performance in terms of BRA is \(42.62\%\) for U-Net\(+\)C\(+A_{adv}\) on FaceSwap-based malicious transforms, which is lower than the \(43.49\%\) BRA of the second-best model, FaceSigns (Semi-Fragile), when both benign \(g_{bt}\) and malicious \(g_{mt}\) transformations are used during training.
A lower BRA indicates increased fragility, which is particularly valuable in the context of detecting malicious transformations like Deepfakes. The superior performance in terms of low BRA can be attributed to the malicious transform used during the training stage, which is able to mimic the facial manipulation process in which facial features are perturbed. Similarly, Table 7 shows the BRA for FaceSwap-based malicious transformations when trained on CelebA and tested on the FF\(++\) and CelebA datasets. Again, the overall performance in terms of BRA is \(39.62\%\) for U-Net\(+\)C\(+A_{adv}\) on FaceSwap-based malicious transforms, which is lower than the \(41.95\%\) BRA of the second-best model, FaceSigns (Semi-Fragile), when both benign \(g_{bt}\) and malicious \(g_{mt}\) transformations are used during training. These results are consistent for the FaceSwap model in both intra- and cross-dataset settings.
GAN-Based Malicious Transforms: For this experiment, we used three popular GAN variants, namely FSGAN [43] (identity swap), StarGAN [13], and AttGAN [24] (attribute manipulation), to generate manipulated images. These GANs are widely used for identity-, expression-, and attribute-based facial manipulation. For detailed technical descriptions and implementation details, please refer to the original papers on FSGAN [43], StarGAN [13], and AttGAN [24]; this information is not included here for the sake of space.
Table 8 shows the effect of FSGAN [43], StarGAN [13], and AttGAN [24] based facial manipulations, in terms of BRA, on invisible watermarked facial images. These manipulations are applied to all three different versions of our model, as mentioned in Table 1.
Figure 7 gives examples of sample watermarked target facial images with the identity swapped, generated by the FSGAN model. The inputs to FSGAN are the source image \(x_{sw}\) (not watermarked) and the target image \(x_{tw}\) (watermarked). The output is the maliciously transformed facial image \(x_{mt}=g_{mt}(x_{sw},x_{tw})\) with the identity swapped between the source and the target. Similarly, Figure 8 gives examples of sample watermarked facial images with attribute manipulations generated by the StarGAN and AttGAN models. The input to the StarGAN and AttGAN models is the watermarked image \(x_{w}\), and the output is the maliciously transformed facial image \(x_{mt}=g_{mt}(x_{w})\) with manipulated facial attributes such as eyeglasses, facial expression, and hair color.
All these models, including our U-Net-based and the GAN-based models, were trained on the FF\(++\) dataset and then tested on both the FF\(++\) and CelebA datasets. As can be seen from Table 8, the overall performance in terms of BRA is \(49.01\%\) for U-Net\(+\)C\(+A_{adv}\) on GAN-based malicious transforms, which is lower than the \(50.91\%\) BRA of the second-best model, FaceSigns (Semi-Fragile), when both benign \(g_{bt}\) and malicious \(g_{mt}\) transformations are used during training. These results are consistent across the FSGAN, StarGAN, and AttGAN models in both the intra- and cross-dataset settings. Similarly, in Table 9 we use the same FSGAN, StarGAN, and AttGAN models for malicious transformations, with the models trained on CelebA and tested on the FF\(++\) and CelebA datasets. Again, the overall performance in terms of BRA is \(47.39\%\) for U-Net\(+\)C\(+A_{adv}\) on GAN-based malicious transforms, which is lower than the \(49.88\%\) BRA of the second-best model, FaceSigns (Semi-Fragile), when both benign \(g_{bt}\) and malicious \(g_{mt}\) transformations are used during training.
The superior performance of the model, indicated by a low BRA, can be traced back to the inclusion of malicious transformations during the training phase. This approach allows the model to become adept at detecting and responding to manipulations akin to those encountered in real-world scenarios, such as Deepfakes and other forms of digital forgery. In addition, we trained the proposed model exclusively on malicious transformations; nonetheless, its performance is not as good as that of the model trained on both benign and malicious transformations. The reason could be that a model trained solely on malicious transformations tends to develop a narrow focus, optimizing specifically for certain types of data alterations. This specialization may limit the model's capacity to generalize across a wider array of real-world scenarios, which could include benign transformations. Lacking exposure to these benign transformations, the model may struggle to accurately differentiate between genuinely malicious modifications and benign alterations in images, resulting in decreased overall performance.
Diffusion Model-Based Malicious Transforms: In this work, we use Stable Diffusion, a latent text-to-image/image-to-image diffusion model that can take any text prompt and produce realistic-looking images.
Figure 9 shows examples of maliciously transformed facial images produced by Stable Diffusion v1.5 and Stable Diffusion Inpainting. The input to the Stable Diffusion v1.5 and inpainting models is the watermarked image \(x_{w}\), and the output is a de-noised synthetic facial image \(x_{mt}=g_{mt}(x_{w})\).
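A minimal sketch of how such diffusion-based manipulations can be produced with the diffusers library is given below; the checkpoint identifiers, prompt, strength, and file paths are assumptions for illustration rather than the exact settings used in our experiments, and a CUDA GPU is assumed.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline, StableDiffusionInpaintPipeline

# Image-to-image manipulation of a watermarked face x_w (checkpoint names are assumptions).
img2img = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
x_w = Image.open("watermarked_face.png").convert("RGB")       # watermarked input x_w
x_mt = img2img(prompt="a photo of a face", image=x_w, strength=0.6).images[0]  # x_mt = g_mt(x_w)

# Inpainting-based manipulation: regenerate only the masked facial region.
inpaint = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")
mask = Image.open("face_mask.png").convert("L")                # region of the face to edit
x_mt_inpaint = inpaint(prompt="a photo of a face", image=x_w, mask_image=mask).images[0]
```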
In our experiments, we used Stable Diffusion v1.5 [26] and Stable Diffusion Inpainting [26] based diffusion models, which rely on a conditioning mechanism and generative modeling of latent representations following a reverse Markov process. To make the training process more efficient and faster, we used low-rank adaptation (LoRA), a simple training method that drastically lowers the total number of trainable parameters of specific computationally complex layers. Instead of modifying the entire weight matrix \(W\) of a layer, LoRA introduces two low-rank matrices \(A\) and \(B\), which are much smaller than \(W\); \(W\) is kept frozen and only the low-rank update \(BA\) is learned, so the effective weight becomes \(W+BA\). As a result, LoRA training is considerably quicker and more memory-efficient, and it produces smaller model weights that are simpler to share and store. For a detailed technical description and implementation details, please refer to the original work [26]; these are not discussed here for the sake of space.
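A minimal PyTorch sketch of the idea (not the exact LoRA implementation used in our training pipeline) is shown below: the frozen base weight \(W\) is augmented with a learnable low-rank update \(BA\), scaled by \(\alpha/r\).

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Linear layer with a frozen base weight W and a trainable low-rank update B @ A."""

    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # freeze the original weights
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # (r x d_in)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))        # (d_out x r), zero init
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Equivalent to applying (W + scale * B A) to x, with W frozen.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Example: wrap a projection layer so only the small A and B matrices are trained.
layer = LoRALinear(nn.Linear(768, 768), r=8, alpha=16.0)
y = layer(torch.randn(2, 768))
```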
Tables 10, 11, and 12 show the effect of Stable Diffusion v1.5 [26] and Stable Diffusion Inpainting [26] based malicious transformations on the invisible watermarked images in terms of BRA. The impact of these synthetic manipulations is evaluated on the different versions of our proposed model tabulated in Table 1. All the models are trained on the IMDB-WIKI dataset and tested on the FF\(++\), IMDB-WIKI, and CelebA datasets. From Table 10, the overall performance in terms of BRA is \(42\%\) for U-Net\(+\)C\(+A_{adv}\) on diffusion-based malicious transforms, which is lower than the \(52.74\%\) BRA of the second-best model, FaceSigns (Semi-Fragile), when both benign \(g_{bt}\) and malicious \(g_{mt}\) transformations are used during training. These results are consistent across the Stable Diffusion v1.5 and Stable Diffusion Inpainting models in both the intra- and cross-dataset settings.
Overall, when exposed to malicious transformations, the proposed model trained on both benign and malicious transforms exhibits a lower BRA, which is desirable. This lower BRA indicates increased fragility, a characteristic that is vital for detecting altered media generated using malicious transformations like Deepfakes. The model's performance is attributed to its training on both benign and malicious transformations, which enables it to acquire a comprehensive understanding of such transforms. We also experimented with training the model solely on malicious transformations, but its performance was notably inferior to that of the model trained on both benign and malicious transformations.