Research Article · Open Access
DOI: 10.1145/3674399.3674448

Detecting Adversarial Examples via Reconstruction-based Semantic Inconsistency

Published: 30 July 2024

Abstract

Adversarial attacks pose a serious threat to the security of artificial intelligence systems. Adversarial training addresses them, but it incurs a high computational cost and degrades performance on clean inputs. For easier deployment, a number of detection-based defenses have emerged instead. However, most existing detectors exploit either the fragility of adversarial perturbations under input-level processing or distributional differences revealed by operations such as reconstruction or bit-depth reduction, so they struggle to detect attacks with different perturbation patterns and strengths. A very recent method, ContraNet, instead uses the victim model's prediction to guide the input reconstruction, but this guidance lowers the detection rate on clean images.
This paper likewise builds on input reconstruction, but relies on semantic-similarity comparison. Unlike ContraNet, we amplify the semantic inconsistency between an adversarial example and its reconstruction by exploiting the intrinsic features of the target classifier. In other words, we propose a reconstruction-based detector built on the classifier's intrinsic features, examining adversarial examples from the perspective of the target classifier rather than from the pixel-space distribution perceived by humans. To this end, a feature extractor is trained without labels via a modified SimCLR to extract semantic information; because the reconstruction of an adversarial example carries semantics that are inconsistent with its pixels, clean samples and adversarial examples (AEs) can be distinguished. Our method defends effectively against perturbations of varying strength and attacks of different types, maintaining both high robust detection accuracy and a high detection rate on clean images.
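For concreteness, the sketch below shows one plausible form of the detection rule described above: reconstruct the input, embed both versions with a contrastively trained encoder, and flag the input when the two embeddings disagree. This is a minimal illustration under stated assumptions, not the authors' implementation; the autoencoder, the SimCLR-style encoder, and the threshold tau are all hypothetical components.

```python
# Minimal sketch of reconstruction-based semantic-inconsistency detection.
# The autoencoder, encoder, and threshold are assumed, illustrative components.
import torch
import torch.nn.functional as F

@torch.no_grad()
def detect_adversarial(x, autoencoder, encoder, tau=0.9):
    """Flag inputs whose reconstructions are semantically inconsistent with
    the originals in the feature space of a contrastively trained
    (SimCLR-style) encoder. Returns a boolean tensor; True = adversarial."""
    x_rec = autoencoder(x)                        # input reconstruction
    z = F.normalize(encoder(x), dim=-1)           # embedding of the input
    z_rec = F.normalize(encoder(x_rec), dim=-1)   # embedding of its reconstruction
    sim = (z * z_rec).sum(dim=-1)                 # cosine similarity per sample
    return sim < tau                              # low similarity => flagged as AE
```

In practice, the threshold tau would be calibrated on held-out clean data to balance the clean detection rate against robustness to attacks.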

References

[1]
Nicholas Carlini and David Wagner. 2017. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP). IEEE, 39–57.
[2]
Fabio Carrara, Fabrizio Falchi, Roberto Caldelli, Giuseppe Amato, and Rudy Becarelli. 2019. Adversarial image detection in deep neural networks. Multimedia Tools and Applications 78 (2019), 2815–2835.
[3]
Pin-Yu Chen, Yash Sharma, Huan Zhang, Jinfeng Yi, and Cho-Jui Hsieh. 2018. EAD: elastic-net attacks to deep neural networks via adversarial examples. In Proceedings of the AAAI conference on artificial intelligence, Vol. 32.
[4]
Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. 2017. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM workshop on artificial intelligence and security. 15–26.
[5]
Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. A simple framework for contrastive learning of visual representations. In International conference on machine learning. PMLR, 1597–1607.
[6]
Vincent Dumoulin and Francesco Visin. 2016. A guide to convolution arithmetic for deep learning. arXiv preprint arXiv:1603.07285 (2016).
[7]
Yifei Gao, Zhiyu Lin, Yunfan Yang, and Jitao Sang. 2023. Towards Black-box Adversarial Example Detection: A Data Reconstruction-based Method. arXiv preprint arXiv:2306.02021 (2023).
[8]
Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014).
[9]
Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. 2020. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 9729–9738.
[10]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770–778.
[11]
Zhiyuan He, Yijun Yang, Pin-Yu Chen, Qiang Xu, and Tsung-Yi Ho. 2023. Be your own neighborhood: Detecting adversarial example by the neighborhood relations built on self-supervised learning. (2023).
[12]
Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. 2017. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 4700–4708.
[13]
Zhichao Huang and Tong Zhang. 2019. Black-box adversarial attack with transferable model-based embedding. arXiv preprint arXiv:1911.07140 (2019).
[14]
Alex Krizhevsky and Geoffrey Hinton. 2009. Learning multiple layers of features from tiny images. (2009).
[15]
Alexey Kurakin, Ian J Goodfellow, and Samy Bengio. 2018. Adversarial examples in the physical world. In Artificial intelligence safety and security. Chapman and Hall/CRC, 99–112.
[16]
Aleksander Mądry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2017. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017).
[17]
Dongyu Meng and Hao Chen. 2017. MagNet: a two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC conference on computer and communications security. 135–147.
[18]
Umberto Michelucci. 2022. An introduction to autoencoders. arXiv preprint arXiv:2201.03898 (2022).
[19]
Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Ken Nakae, and Shin Ishii. 2015. Distributional smoothing with virtual adversarial training. arXiv preprint arXiv:1507.00677 (2015).
[20]
Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. 2016. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 2574–2582.
[21]
Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. 2011. Reading digits in natural images with unsupervised feature learning. In NIPS workshop on deep learning and unsupervised feature learning, Vol. 2011. Granada, Spain, 7.
[22]
Weili Nie, Brandon Guo, Yujia Huang, Chaowei Xiao, Arash Vahdat, and Anima Anandkumar. 2022. Diffusion models for adversarial purification. arXiv preprint arXiv:2205.07460 (2022).
[23]
Johannes Stallkamp, Marc Schlipsing, Jan Salmen, and Christian Igel. 2012. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Networks 32 (2012), 323–332.
[24]
Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. 2004. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13, 4 (2004), 600–612.
[25]
Chaowei Xiao, Bo Li, Jun-Yan Zhu, Warren He, Mingyan Liu, and Dawn Song. 2018. Generating adversarial examples with adversarial networks. arXiv preprint arXiv:1801.02610 (2018).
[26]
Weilin Xu, David Evans, and Yanjun Qi. 2017. Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155 (2017).
[27]
Yijun Yang, Ruiyuan Gao, Yu Li, Qiuxia Lai, and Qiang Xu. 2022. What you see is not what the network infers: Detecting adversarial examples based on semantic contradiction. arXiv preprint arXiv:2201.09650 (2022).
[28]
Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. 2018. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition. 586–595.
[29]
Yao Zhu, Yuefeng Chen, Xiaodan Li, Kejiang Chen, Yuan He, Xiang Tian, Bolun Zheng, Yaowu Chen, and Qingming Huang. 2022. Toward understanding and boosting adversarial transferability from a distribution perspective. IEEE Transactions on Image Processing 31 (2022), 6487–6501.


    Information & Contributors

    Information

    Published In

    cover image ACM Other conferences
    ACM-TURC '24: Proceedings of the ACM Turing Award Celebration Conference - China 2024
    July 2024
    261 pages
    ISBN:9798400710117
    DOI:10.1145/3674399
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 30 July 2024

    Check for updates

    Author Tags

    1. Adversarial examples detection
    2. reconstruction
    3. semantic inconsistency

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • the Natural Science Foundation of China
    • the Natural Science Foundation of China
    • Key Research and Development program of Anhui Province

    Conference

    ACM-TURC '24

    Contributors

