Abstract
Deep neural networks (DNNs) are easily fooled at inference time by adversarial examples, in which attackers add imperceptible perturbations to benign inputs. Many works focus on adversarial detection and adversarial training to defend against such attacks, but few explore the tool-chains behind adversarial examples, a task known as the Adversarial Attribution Problem (AAP). In this paper, AAP is defined as the recognition of three signatures: the attack algorithm, the victim model, and the hyperparameter. Existing works cast AAP as a single-label classification task and ignore the relationships among these three signatures. In fact, an owner-member relationship exists between the attack algorithm and its hyperparameter, meaning that hyperparameter recognition depends on the result of attack-algorithm classification. Moreover, because hyperparameter values are continuous, hyperparameter recognition should be treated as a regression task. AAP should therefore be considered a multi-task learning problem rather than a single-label classification or single-task learning problem. To address these issues, we propose a multi-task learning framework named Multi-Task Adversarial Attribution (MTAA) that recognizes the three signatures simultaneously. It accounts for the relationship between the attack algorithm and its corresponding hyperparameter, and it uses an uncertainty-weighted loss to adjust the weights of the three recognition tasks. Experimental results on MNIST and ImageNet show the feasibility and scalability of the proposed framework.
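The uncertainty-weighted loss the abstract refers to follows the homoscedastic-uncertainty formulation of Kendall et al. (CVPR 2018): each task loss is scaled by a learnable precision and regularized by a log-variance term, so the optimizer balances the two classification tasks (attack algorithm, victim model) against the hyperparameter regression task. The sketch below is an illustrative, framework-free restatement of that weighting scheme, not the paper's actual implementation; the function name and plain-Python form are assumptions.

```python
import math

def uncertainty_weighted_loss(task_losses, log_vars, is_regression):
    """Combine per-task losses with homoscedastic uncertainty weighting
    (Kendall et al., CVPR 2018).

    task_losses   -- raw loss value of each task
    log_vars      -- learnable log(sigma^2) per task (trained jointly)
    is_regression -- True for regression tasks, False for classification
    """
    total = 0.0
    for loss, log_var, reg in zip(task_losses, log_vars, is_regression):
        precision = math.exp(-log_var)      # 1 / sigma^2
        scale = 0.5 if reg else 1.0         # regression uses 1/(2*sigma^2)
        # Weighted task loss plus log(sigma) regularizer, which keeps the
        # optimizer from driving all variances to infinity.
        total += scale * precision * loss + 0.5 * log_var
    return total
```

In a full framework, `log_vars` would be trainable parameters updated by backpropagation alongside the network weights; a task whose loss is noisy learns a larger variance and is automatically down-weighted.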
Acknowledgements
This work was partially supported by National Natural Science Foundation of China (No. 61772284).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Guo, Z., Han, K., Ge, Y., Li, Y., Ji, W. (2024). Attribution of Adversarial Attacks via Multi-task Learning. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14448. Springer, Singapore. https://doi.org/10.1007/978-981-99-8082-6_7
DOI: https://doi.org/10.1007/978-981-99-8082-6_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8081-9
Online ISBN: 978-981-99-8082-6