LeapAttack: Hard-Label Adversarial Attack on Text via Gradient-Based Optimization

ABSTRACT
Generating text adversarial examples in the hard-label setting is a more realistic and challenging black-box attack problem: because gradients cannot be computed directly from discrete word replacements, the effectiveness of gradient-based methods in this setting still leaves room for improvement. In this paper, we propose a gradient-based optimization method named LeapAttack to craft high-quality text adversarial examples in the hard-label setting. Specifically, LeapAttack uses the word embedding space to characterize the semantic deviation of each perturbed substitution as the difference vector between the embeddings of the two words involved. Building on this representation, LeapAttack gradually updates the perturbation direction and constructs adversarial examples in an iterative round trip between the discrete and continuous spaces: first, after moving the current adversarial example close to the decision boundary, the gradient is estimated by transforming randomly sampled word candidates into continuous difference vectors; second, the estimated gradient is mapped back to a new substitution word using the cosine similarity metric. Extensive experimental results show that, in the general case, LeapAttack efficiently generates high-quality text adversarial examples with the highest semantic similarity and the lowest perturbation rate in the hard-label setting.
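The round trip described above alternates between discrete words and continuous embedding vectors. The following is an illustrative sketch only, not the authors' implementation: the toy embedding table and the helper names `estimate_gradient` and `map_gradient_to_word` are invented for this example. It shows the two core steps in miniature: a signed average of sampled difference vectors as a crude gradient estimate, and mapping that estimate back to a concrete substitution word by cosine similarity.

```python
import numpy as np

# Toy embedding table (word -> vector). The paper builds on real word
# embeddings; three hand-made 3-d vectors suffice for illustration.
embeddings = {
    "good": np.array([0.9, 0.1, 0.0]),
    "great": np.array([0.8, 0.2, 0.1]),
    "terrible": np.array([-0.9, 0.0, 0.1]),
}

def estimate_gradient(diff_vectors, boundary_signs):
    """Crude Monte-Carlo-style gradient estimate: average the difference
    vectors of sampled substitutions, each signed by whether the sampled
    example stayed adversarial (+1) or crossed back (-1)."""
    return np.mean([s * v for s, v in zip(boundary_signs, diff_vectors)], axis=0)

def map_gradient_to_word(grad, candidates, embeddings, current_vec):
    """Map the continuous gradient estimate back to a discrete word: pick
    the candidate whose difference vector from the current word best
    aligns with the gradient under cosine similarity."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return max(candidates, key=lambda w: cos(embeddings[w] - current_vec, grad))

# Usage: with a gradient pointing from "good" toward "great", the
# cosine-similarity mapping recovers "great" as the substitution.
grad = embeddings["great"] - embeddings["good"]
word = map_gradient_to_word(grad, ["great", "terrible"], embeddings, embeddings["good"])
```

In LeapAttack itself, the sampling happens near the decision boundary and the candidate pool is constrained to semantics-preserving synonyms; the sketch only demonstrates the continuous-to-discrete mapping that makes gradient-based optimization possible over word replacements.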