DOI: 10.1145/3488932.3517397

Generating Content-Preserving and Semantics-Flipping Adversarial Text

Published: 30 May 2022

Abstract

Natural Language Processing (NLP) models are often vulnerable to semantics-preserving adversarial attacks: they make different semantic predictions on input instances with similar content and similar semantics. However, it remains unclear to what extent modern NLP models are vulnerable to content-preserving and semantics-flipping (CPSF) adversarial attacks, in which they make the same semantic prediction on input instances with similar content but flipped semantics. Attackers can use either semantics-preserving or CPSF adversarial examples to create misunderstandings between humans and models and cause severe consequences in real-world applications, yet this equally important problem of CPSF adversarial examples has not been studied before. In this paper, we perform the first study of CPSF adversarial examples and propose CPSF adversarial attacks to reveal this new type of vulnerability in NLP models. We develop a two-stage approach to generate CPSF adversarial examples. Our experiments on two NLP tasks, sentiment analysis and textual entailment, demonstrate that CPSF adversarial examples can successfully fool victim models while, to humans, preserving the content and flipping the semantics. We further validate the strong transferability of CPSF adversarial examples against the NLP services of Microsoft and Google. Moreover, we demonstrate that adversarial training can mitigate CPSF adversarial attacks to a meaningful extent. Overall, our work implies that researchers need to improve the robustness of NLP models against CPSF adversarial attacks, which uniquely exploit blind spots where NLP models are insensitive even to large changes in semantics.
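
To make the described vulnerability concrete, the minimal sketch below probes whether a sentiment classifier keeps the same prediction on an input whose meaning a human would read as flipped. This is only an illustrative, assumption-laden sketch: the victim model (a Hugging Face sentiment-analysis pipeline) and the example sentences are stand-ins chosen for demonstration, not the paper's two-stage generation method.

    # Minimal sketch (not the paper's method): check a stand-in victim model for
    # the CPSF-style blind spot, i.e. an unchanged prediction despite an edit
    # that keeps similar content but flips the semantics for a human reader.
    from transformers import pipeline

    victim = pipeline("sentiment-analysis")  # assumed stand-in victim model

    original = "The food was great and the service was friendly."
    # Hand-crafted semantics-flipped variant: similar surface content, opposite meaning.
    flipped = "The food was hardly great and the service was anything but friendly."

    pred_orig = victim(original)[0]
    pred_flip = victim(flipped)[0]
    print(pred_orig, pred_flip)

    if pred_orig["label"] == pred_flip["label"]:
        # The model ignored a change that flips the meaning to a human reader.
        print("Potential CPSF-style blind spot: prediction unchanged despite flipped semantics.")

A real CPSF attack would search for such flipped variants automatically under content-similarity constraints; the sketch above only checks a single hand-written pair.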

Supplementary Material

MP4 File (ASIA-CCS22-fp338.mp4)
The presentation video of the paper "Generating Content-Preserving and Semantics-Flipping Adversarial Text".

Cited By

  • (2022) Towards Adversarial Attacks for Clinical Document Classification. Electronics 12(1): 129. DOI: 10.3390/electronics12010129. Online publication date: 28 Dec 2022.


      Published In

      ASIA CCS '22: Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security
      May 2022
      1291 pages
      ISBN: 9781450391405
      DOI: 10.1145/3488932

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 30 May 2022

      Author Tags

      1. adversarial examples
      2. sentiment analysis
      3. textual entailment

      Qualifiers

      • Research-article

      Funding Sources

      • National Science Foundation

      Conference

      ASIA CCS '22

      Acceptance Rates

      Overall Acceptance Rate 418 of 2,322 submissions, 18%

      Article Metrics

      • Downloads (Last 12 months): 54
      • Downloads (Last 6 weeks): 1
      Reflects downloads up to 05 Mar 2025
