Abstract:
Adversarial attacks pose a significant threat to security-critical applications by deliberately deceiving model predictions. Numerous works attempt to create robust models by encoding useful information into intermediate representations. However, these representations still retain too much information about the training data, which hinders improvements in model robustness. To mitigate this issue, we propose a novel approach, CHBaR, that incorporates class-conditioned information into intermediate representations. The class-conditioned information acts as weight components that are multiplied with the intermediate representations to produce class-conditioned representations. We obtain this class-conditioned information using an attribution-based explanation method. As a result, the weight components emphasize class-relevant features by highlighting information relevant to the target class. This weighting process integrates the target class without complex computations and suppresses uninformative representations, thereby improving model predictions by masking features unrelated to the class. Extensive experiments demonstrate the effectiveness of our proposed method in enhancing adversarial robustness. In particular, on the SVHN dataset, our method improves on the baseline model by 6.98 percentage points under the PGD40 adversarial attack in the TRADES training setting.
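The weighting step described above can be illustrated with a minimal sketch. The abstract does not specify the attribution method or model architecture, so the following uses a hypothetical linear classification head, where the gradient of the target-class logit with respect to the features reduces to that class's weight row; the function name and all shapes are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def class_conditioned_representation(features, W, target_class):
    """Hypothetical sketch of an attribution-based weighting step.

    features     : (d,)  intermediate representation
    W            : (num_classes, d) weights of an assumed linear head;
                   the gradient of the target-class logit w.r.t. the
                   features is W[target_class], standing in for an
                   attribution-based explanation.
    target_class : index of the class to condition on
    """
    attribution = W[target_class]            # d(logit_c) / d(features)
    weights = np.maximum(attribution, 0.0)   # keep positively relevant features
    if weights.max() > 0:
        weights = weights / weights.max()    # normalize weights to [0, 1]
    return features * weights                # mask class-irrelevant features

rng = np.random.default_rng(0)
feats = rng.normal(size=8)
W = rng.normal(size=(3, 8))
cond = class_conditioned_representation(feats, W, target_class=1)
```

Features whose attribution for the target class is non-positive are zeroed out, which mirrors the abstract's description of concealing representations unrelated to the class.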
Date of Conference: 06-10 October 2024
Date Added to IEEE Xplore: 20 January 2025