Abstract
In recent years, neural networks have become the basis for many kinds of applications, mainly owing to the impressive performance they achieve. However, these tools have proven highly vulnerable to malicious approaches such as gradient manipulation or the injection of adversarial samples. A further attack of this kind poisons a neural network at training time by injecting a perceptually barely visible trigger signal into a small portion of the dataset (the target class), thereby creating a backdoor in the trained model. This backdoor can then be exploited at test time to redirect predictions to the chosen target class. In this work, a novel backdoor attack that resorts to image watermarking algorithms to generate the trigger signal is presented. The watermark is almost imperceptible and is embedded in a portion of the images of the target class; two different watermarking algorithms have been tested. Experimental results on the MNIST and GTSRB datasets show satisfactory performance in terms of attack success rate and introduced distortion.
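The abstract does not specify which two watermarking algorithms were used, so the following is only an illustrative sketch of the general idea: a fixed, pseudo-random multiplicative spread-spectrum watermark is embedded in the DCT domain of a fraction of the target-class training images, while their labels are left untouched. All names (embed_watermark, poison_target_class) and parameter values (alpha, n_coeffs, rate) are assumptions for illustration, not the authors' implementation; NumPy and SciPy are assumed available.

# Hypothetical sketch of watermark-based training-set poisoning (not the paper's code).
# The same keyed watermark acts as the backdoor trigger in every poisoned image.
import numpy as np
from scipy.fft import dctn, idctn

def embed_watermark(img, key=0, alpha=0.05, n_coeffs=200):
    # Multiplicative spread-spectrum embedding: v'_i = v_i * (1 + alpha * x_i),
    # applied to the n_coeffs largest-magnitude DCT coefficients of a grayscale image.
    # n_coeffs must be smaller than the number of pixels (e.g. 784 for MNIST).
    rng = np.random.default_rng(key)            # fixed key -> identical trigger everywhere
    x = rng.standard_normal(n_coeffs)           # watermark sequence
    c = dctn(img.astype(np.float64), norm="ortho")
    flat = c.ravel()                            # view into c
    idx = np.argsort(np.abs(flat))[::-1][1:n_coeffs + 1]  # skip the largest (typically DC)
    flat[idx] *= 1.0 + alpha * x
    out = idctn(c, norm="ortho")
    return np.clip(out, 0, 255).astype(img.dtype)  # assumes 8-bit pixel values

def poison_target_class(images, labels, target_class, rate=0.3, **kw):
    # Watermark a fraction `rate` of the target-class images; labels are NOT changed.
    poisoned = images.copy()
    candidates = np.flatnonzero(labels == target_class)
    chosen = np.random.default_rng(1).choice(
        candidates, size=int(rate * candidates.size), replace=False)
    for i in chosen:
        poisoned[i] = embed_watermark(images[i], **kw)
    return poisoned

Because the trigger is spread across many transform coefficients rather than localized in a pixel patch, it stays perceptually hard to notice while remaining consistently learnable; keeping the labels unchanged is what distinguishes this corruption-without-label-poisoning setup from simple label flipping.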
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Abbate, G., Amerini, I., Caldelli, R. (2023). Image Watermarking Backdoor Attacks in CNN-Based Classification Tasks. In: Rousseau, JJ., Kapralos, B. (eds) Pattern Recognition, Computer Vision, and Image Processing. ICPR 2022 International Workshops and Challenges. ICPR 2022. Lecture Notes in Computer Science, vol 13646. Springer, Cham. https://doi.org/10.1007/978-3-031-37745-7_1
DOI: https://doi.org/10.1007/978-3-031-37745-7_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-37744-0
Online ISBN: 978-3-031-37745-7
eBook Packages: Computer Science, Computer Science (R0)