Research article · DOI: 10.1145/3627673.3679639

SGFL-Attack: A Similarity-Guidance Strategy for Hard-Label Textual Adversarial Attack Based on Feedback Learning

Published: 21 October 2024

Abstract

Hard-label black-box textual adversarial attacks present a challenging task in which only the predictions of the victim model are available. Several constraints further complicate such attacks: the inherently discrete and non-differentiable nature of text data, and the need to introduce subtle perturbations that remain imperceptible to humans while preserving semantic similarity. Despite considerable research effort, existing methods still suffer from several limitations. Algorithms based on complex heuristic searches require extensive querying, rendering them computationally expensive; introducing continuous gradient strategies into discrete text spaces often leads to estimation errors; and geometry-based strategies are prone to falling into local optima. To address these limitations, this paper introduces SGFL-Attack, a novel approach that leverages a Similarity-Guidance strategy based on Feedback Learning for hard-label textual adversarial attacks under a limited query budget. Specifically, SGFL-Attack uses word embedding vectors to assess the importance of words and positions in text sequences, and employs a feedback learning mechanism that assigns reward or punishment based on changes in the predicted label caused by replacing words. In each iteration, SGFL-Attack guides the search using the knowledge acquired from this feedback mechanism, generating more similar samples while keeping perturbations low. Moreover, to reduce the query budget, we incorporate local hash mapping to avoid redundant queries during the search. Extensive experiments on seven widely used datasets show that SGFL-Attack significantly outperforms state-of-the-art baselines and defenses across multiple language models.
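To make the search loop described above concrete, the following is a minimal, hypothetical Python sketch of a feedback-guided word-substitution attack with a local hash cache. It is a reading of the abstract only, not the authors' implementation: the helpers `query_victim`, `embed`, and `synonym_candidates`, the reward/penalty constants, and the centroid-distance importance score are all illustrative assumptions.

```python
import hashlib
from collections import defaultdict

REWARD, PENALTY = 1.0, -1.0   # illustrative feedback magnitudes, not from the paper

def text_hash(tokens):
    """Hash a candidate token sequence so it is never sent to the victim twice."""
    return hashlib.md5(" ".join(tokens).encode("utf-8")).hexdigest()

def sgfl_sketch(tokens, true_label, query_victim, embed, synonym_candidates,
                max_queries=1000):
    """One pass of feedback-guided substitution under a hard query budget."""
    feedback = defaultdict(float)   # (position, word) -> accumulated feedback
    cache = {}                      # local hash map: candidate hash -> predicted label
    queries = 0

    def predict(candidate):
        nonlocal queries
        key = text_hash(candidate)
        if key not in cache:        # skip redundant queries to the victim model
            cache[key] = query_victim(candidate)
            queries += 1
        return cache[key]

    # Rank positions by an embedding-based importance score; here, the distance
    # of each word vector from the sentence centroid (an assumed stand-in for
    # the paper's word/position importance measure).
    vecs = [embed(w) for w in tokens]
    centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
    by_importance = sorted(
        range(len(tokens)),
        key=lambda i: -sum((a - b) ** 2 for a, b in zip(vecs[i], centroid)),
    )

    for pos in by_importance:
        if queries >= max_queries:
            break
        # Try substitutions, preferring those with the best accumulated feedback.
        for sub in sorted(synonym_candidates(tokens[pos]),
                          key=lambda w: -feedback[(pos, w)]):
            trial = tokens[:pos] + [sub] + tokens[pos + 1:]
            if predict(trial) != true_label:     # label flipped: reward and stop
                feedback[(pos, sub)] += REWARD
                return trial, queries
            feedback[(pos, sub)] += PENALTY      # no flip: punish this choice
    return None, queries                         # no adversarial example in budget
```

In this reading, the hash cache is the query-saving device: any candidate sequence already sent to the victim model is answered from the local table, so the feedback loop can revisit promising substitutions without spending budget.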


Published In

CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, October 2024, 5705 pages. ISBN 9798400704369. DOI: 10.1145/3627673.

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 1,861 of 8,427 submissions, 22%
