Research Article · Open Access
DOI: 10.1145/3674399.3674448

Detecting Adversarial Examples via Reconstruction-based Semantic Inconsistency

Published: 30 July 2024

Abstract

Adversarial attacks pose a serious threat to the security of artificial intelligence systems. Adversarial training addresses them, but it incurs a high computational cost and degrades performance on clean inputs. For easier deployment, a number of detection-based defenses have emerged instead. However, most existing detectors exploit either the fragility of adversarial perturbations under input-level processing or distributional differences revealed by operations such as reconstruction or bit-depth reduction, so they struggle to detect attacks with different perturbation patterns and strengths. A very recent method, ContraNet, instead uses the victim model's prediction to guide the input reconstruction, but this guidance lowers the detection rate on clean images.
This paper likewise builds on input reconstruction, but relies on semantic-similarity comparison. Unlike ContraNet, we amplify the semantic inconsistency between an adversarial example and its reconstruction by exploiting the intrinsic features of the target classifier. In other words, we propose a reconstruction-based detector built on the classifier's intrinsic features, examining adversarial examples from the perspective of the target classifier rather than from the pixel-space distribution perceived by humans. To this end, a feature extractor is trained without labels via a modified SimCLR to extract semantic information; because the reconstruction of an adversarial example carries semantics that are inconsistent with its pixels, clean samples and adversarial examples (AEs) can be distinguished. Our method defends effectively against perturbations of varying strength and attacks of different types, maintaining both high robust detection accuracy and a high detection rate on clean images.
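For concreteness, the sketch below shows one plausible form of the detection rule described above: reconstruct the input, embed both versions with a contrastively trained encoder, and flag the input when the two embeddings disagree. This is a minimal illustration under stated assumptions, not the authors' implementation; the autoencoder, the SimCLR-style encoder, and the threshold tau are all hypothetical components.

```python
# Minimal sketch of reconstruction-based semantic-inconsistency detection.
# The autoencoder, encoder, and threshold are assumed, illustrative components.
import torch
import torch.nn.functional as F

@torch.no_grad()
def detect_adversarial(x, autoencoder, encoder, tau=0.9):
    """Flag inputs whose reconstructions are semantically inconsistent with
    the originals in the feature space of a contrastively trained
    (SimCLR-style) encoder. Returns a boolean tensor; True = adversarial."""
    x_rec = autoencoder(x)                        # input reconstruction
    z = F.normalize(encoder(x), dim=-1)           # embedding of the input
    z_rec = F.normalize(encoder(x_rec), dim=-1)   # embedding of its reconstruction
    sim = (z * z_rec).sum(dim=-1)                 # cosine similarity per sample
    return sim < tau                              # low similarity => flagged as AE
```

In practice, the threshold tau would be calibrated on held-out clean data to balance the clean detection rate against robustness to attacks.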

References

[1]
Nicholas Carlini and David Wagner. 2017. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP). IEEE, 39–57.
[2]
Fabio Carrara, Fabrizio Falchi, Roberto Caldelli, Giuseppe Amato, and Rudy Becarelli. 2019. Adversarial image detection in deep neural networks. Multimedia Tools and Applications 78 (2019), 2815–2835.
[3]
Pin-Yu Chen, Yash Sharma, Huan Zhang, Jinfeng Yi, and Cho-Jui Hsieh. 2018. EAD: elastic-net attacks to deep neural networks via adversarial examples. In Proceedings of the AAAI conference on artificial intelligence, Vol. 32.
[4]
Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. 2017. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM workshop on artificial intelligence and security. 15–26.
[5]
Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. A simple framework for contrastive learning of visual representations. In International conference on machine learning. PMLR, 1597–1607.
[6]
Vincent Dumoulin and Francesco Visin. 2016. A guide to convolution arithmetic for deep learning. arXiv preprint arXiv:1603.07285 (2016).
[7]
Yifei Gao, Zhiyu Lin, Yunfan Yang, and Jitao Sang. 2023. Towards Black-box Adversarial Example Detection: A Data Reconstruction-based Method. arXiv preprint arXiv:2306.02021 (2023).
[8]
Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014).
[9]
Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. 2020. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 9729–9738.
[10]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770–778.
[11]
Zhiyuan He, Yijun Yang, Pin-Yu Chen, Qiang Xu, and Tsung-Yi Ho. 2023. Be your own neighborhood: Detecting adversarial example by the neighborhood relations built on self-supervised learning. (2023).
[12]
Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. 2017. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 4700–4708.
[13]
Zhichao Huang and Tong Zhang. 2019. Black-box adversarial attack with transferable model-based embedding. arXiv preprint arXiv:1911.07140 (2019).
[14]
Alex Krizhevsky and Geoffrey Hinton. 2009. Learning multiple layers of features from tiny images. (2009).
[15]
Alexey Kurakin, Ian J Goodfellow, and Samy Bengio. 2018. Adversarial examples in the physical world. In Artificial intelligence safety and security. Chapman and Hall/CRC, 99–112.
[16]
Aleksander Mądry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2017. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017).
[17]
Dongyu Meng and Hao Chen. 2017. MagNet: a two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC conference on computer and communications security. 135–147.
[18]
Umberto Michelucci. 2022. An introduction to autoencoders. arXiv preprint arXiv:2201.03898 (2022).
[19]
Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Ken Nakae, and Shin Ishii. 2015. Distributional smoothing with virtual adversarial training. arXiv preprint arXiv:1507.00677 (2015).
[20]
Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. 2016. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 2574–2582.
[21]
Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. 2011. Reading digits in natural images with unsupervised feature learning. In NIPS workshop on deep learning and unsupervised feature learning, Vol. 2011. Granada, Spain, 7.
[22]
Weili Nie, Brandon Guo, Yujia Huang, Chaowei Xiao, Arash Vahdat, and Anima Anandkumar. 2022. Diffusion models for adversarial purification. arXiv preprint arXiv:2205.07460 (2022).
[23]
Johannes Stallkamp, Marc Schlipsing, Jan Salmen, and Christian Igel. 2012. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Networks 32 (2012), 323–332.
[24]
Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. 2004. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13, 4 (2004), 600–612.
[25]
Chaowei Xiao, Bo Li, Jun-Yan Zhu, Warren He, Mingyan Liu, and Dawn Song. 2018. Generating adversarial examples with adversarial networks. arXiv preprint arXiv:1801.02610 (2018).
[26]
Weilin Xu, David Evans, and Yanjun Qi. 2017. Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155 (2017).
[27]
Yijun Yang, Ruiyuan Gao, Yu Li, Qiuxia Lai, and Qiang Xu. 2022. What you see is not what the network infers: Detecting adversarial examples based on semantic contradiction. arXiv preprint arXiv:2201.09650 (2022).
[28]
Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. 2018. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition. 586–595.
[29]
Yao Zhu, Yuefeng Chen, Xiaodan Li, Kejiang Chen, Yuan He, Xiang Tian, Bolun Zheng, Yaowu Chen, and Qingming Huang. 2022. Toward understanding and boosting adversarial transferability from a distribution perspective. IEEE Transactions on Image Processing 31 (2022), 6487–6501.


    Information & Contributors

    Information

    Published In

    cover image ACM Other conferences
    ACM-TURC '24: Proceedings of the ACM Turing Award Celebration Conference - China 2024
    July 2024
    261 pages
    ISBN:9798400710117
    DOI:10.1145/3674399
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 30 July 2024

    Check for updates

    Author Tags

    1. Adversarial examples detection
    2. reconstruction
    3. semantic inconsistency

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • the Natural Science Foundation of China
    • the Natural Science Foundation of China
    • Key Research and Development program of Anhui Province

    Conference

    ACM-TURC '24

    Contributors

