DOI: 10.1145/3460120.3485378
research-article

Feature-Indistinguishable Attack to Circumvent Trapdoor-Enabled Defense

Published: 13 November 2021

ABSTRACT

Deep neural networks (DNNs) are vulnerable to adversarial attacks. Great effort has been directed toward developing effective defenses against adversarial attacks and toward finding vulnerabilities in proposed defenses. A recently proposed defense called Trapdoor-enabled Detection (TeD) deliberately injects trapdoors into DNN models to trap and detect adversarial examples targeting categories protected by TeD. TeD can effectively detect existing state-of-the-art adversarial attacks. In this paper, we propose a novel black-box adversarial attack on TeD, called Feature-Indistinguishable Attack (FIA). It circumvents TeD by crafting adversarial examples that are indistinguishable in the feature (i.e., neuron-activation) space from benign examples in the target category. To achieve this goal, FIA jointly minimizes the distance to the expectation of feature representations of benign samples in the target category and maximizes the distances to positive adversarial examples generated to query TeD in the preparation phase. A constraint is used to ensure that the feature vector of a generated adversarial example lies within the distribution of feature vectors of benign examples in the target category. Our extensive empirical evaluation with different configurations and variants of TeD indicates that our proposed FIA can effectively circumvent TeD. FIA opens the door to developing much more powerful adversarial attacks. The FIA code is available at: https://github.com/CGCL-codes/FeatureIndistinguishableAttack.
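The joint objective described in the abstract can be illustrated with a minimal PyTorch-style sketch. The code below assumes access to an intermediate feature extractor of the attacked model; all names (feature_extractor, mu_benign, pos_adv_feats, alpha, beta, step_size) are hypothetical placeholders introduced for illustration and are not taken from the authors' released implementation, which should be consulted for the actual FIA attack.

    # Hypothetical sketch of a joint feature-space objective in the spirit of FIA.
    # It pulls the adversarial example's feature vector toward the mean feature of
    # benign target-category samples and pushes it away from the features of the
    # positive adversarial examples used to probe the trapdoored model. The paper's
    # additional constraint (keeping the feature vector within the benign feature
    # distribution of the target category) is omitted here for brevity.
    import torch

    def fia_style_loss(x_adv, feature_extractor, mu_benign, pos_adv_feats,
                       alpha=1.0, beta=1.0):
        f = feature_extractor(x_adv).flatten()              # neuron-activation vector of x_adv
        attract = torch.norm(f - mu_benign, p=2)            # minimize: distance to benign mean
        repel = torch.norm(f.unsqueeze(0) - pos_adv_feats,  # maximize: distance to probe examples
                           p=2, dim=1).mean()
        return alpha * attract - beta * repel               # lower is better for the attacker

    # One gradient step of an iterative attack using this loss (sketch):
    # x_adv.requires_grad_(True)
    # loss = fia_style_loss(x_adv, feature_extractor, mu_benign, pos_adv_feats)
    # loss.backward()
    # with torch.no_grad():
    #     x_adv -= step_size * x_adv.grad.sign()
    #     x_adv.clamp_(0, 1)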


Supplemental Material

CCS21-fp462.mp4 (MP4 video, 225 MB)


Published in

CCS '21: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security
November 2021, 3558 pages
ISBN: 978-1-4503-8454-4
DOI: 10.1145/3460120
Copyright © 2021 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 1,261 of 6,999 submissions (18%)

