Abstract:
In light of the widespread use of deep neural networks in natural language tasks, adversarial attacks targeting these models have emerged as a significant concern. By introducing subtle perturbations through word substitution, word-level adversarial attacks can deceive deep classifiers into making erroneous decisions. Previous word-level attacks have predominantly focused on substituting verbs and adjectives with their synonyms. In this paper, we explore whether deep textual classifiers are robust to the replacement of Named Entities (NEs) with other NEs of the same category. We introduce an effective NE adversarial attack method, which shows that deep textual classifiers are highly sensitive to such substitutions. To enhance the robustness of these classifiers, we also propose three defense strategies: 1) mask replacement, 2) concept replacement, and 3) data augmentation based on NE sampling. We assess these strategies on text classification tasks with various victim and attack models across a range of standard datasets. Experimental results demonstrate the efficacy of our defense strategies against NE adversarial attacks.
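As a rough illustration of the same-category NE substitution idea described above, the sketch below uses spaCy's off-the-shelf NER to locate entities and swap each for a candidate of the same category. The candidate pools, the model name (en_core_web_sm), and the naive pick-the-first-candidate policy are assumptions made for illustration only; the paper's actual attack would search candidates adversarially against a victim classifier.

```python
# Minimal sketch of same-category Named Entity substitution.
# Assumptions (not from the paper): spaCy's en_core_web_sm NER model,
# hand-picked candidate pools, and a trivial substitution policy.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed NER backbone

# Hypothetical pools of same-category replacement candidates.
CANDIDATES = {
    "PERSON": ["Alice Johnson", "Rahul Mehta"],
    "GPE": ["Oslo", "Nairobi"],
    "ORG": ["Acme Corp", "Globex"],
}

def substitute_entities(text: str) -> str:
    """Replace each recognized NE with a candidate of the same category."""
    doc = nlp(text)
    pieces, last = [], 0
    for ent in doc.ents:  # ents are yielded in document order
        pool = CANDIDATES.get(ent.label_)
        if not pool:
            continue  # no candidates for this category; leave the NE intact
        pieces.append(text[last:ent.start_char])
        pieces.append(pool[0])  # a real attack would search this pool adversarially
        last = ent.end_char
    pieces.append(text[last:])
    return "".join(pieces)

print(substitute_entities("Barack Obama visited Paris to meet Siemens executives."))
```

In an actual attack, the substitution choice would be driven by queries to the victim model (e.g., keeping the candidate that most reduces the true-class confidence); the paper's mask-replacement and concept-replacement defenses would correspondingly normalize such entities before classification.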
Published in: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 14-19 April 2024
Date Added to IEEE Xplore: 18 March 2024