Abstract
Deep neural networks are vulnerable to adversarial examples. Prior defenses attempted to make deep networks more robust either by changing the network architecture or by augmenting the training set with adversarial examples, but both approaches have inherent limitations. Motivated by recent research showing that outliers in the training set have a strong negative influence on the trained model, we studied the relationship between model robustness and the quality of the training set. We first show that outliers give the model better generalization ability but weaker robustness. Next, we propose an adversarial example detection framework, in which we design two methods for removing outliers from the training set to obtain a sanitized model and then detect adversarial examples by measuring the difference between the outputs of the original and sanitized models. We evaluated the framework on both MNIST and SVHN. Based on the difference measured by Kullback-Leibler divergence, we could detect adversarial examples with accuracy ranging from 94.67% to 99.89%.
Y. Liu and J. Chen contributed equally.
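To make the detection idea from the abstract concrete, below is a minimal sketch, not the authors' implementation, of a thresholded Kullback-Leibler divergence test between the softmax outputs of the original model and the sanitized (outlier-free) model. The function names, the NumPy-based formulation, the `threshold` value, and the example probability vectors are illustrative assumptions; in the paper's setting the threshold would be chosen from validation data.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete probability vectors (softmax outputs)."""
    p = np.clip(np.asarray(p, dtype=np.float64), eps, 1.0)
    q = np.clip(np.asarray(q, dtype=np.float64), eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def is_adversarial(original_probs, sanitized_probs, threshold=1.0):
    """Flag an input as adversarial when the two models' output
    distributions diverge more than the chosen threshold.

    `threshold` is a placeholder value; it would be tuned on a
    validation set of benign and adversarial examples."""
    return kl_divergence(original_probs, sanitized_probs) > threshold

# Illustrative usage with made-up softmax outputs for a 3-class task:
benign_pair = ([0.80, 0.15, 0.05], [0.78, 0.17, 0.05])
adv_pair    = ([0.90, 0.05, 0.05], [0.10, 0.85, 0.05])
print(is_adversarial(*benign_pair))  # False: the two models largely agree
print(is_adversarial(*adv_pair))     # True: the outputs diverge sharply
```

The intuition is that a benign input is classified consistently by both models, while an adversarial perturbation crafted against the original model tends to transfer poorly to the sanitized model, producing a large divergence.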
Notes
1. Unlike data sanitization, which commonly modifies individual data points, we do not modify any example; we merely remove outliers from the training set.
2. Some computer fonts are difficult to recognize and are therefore excluded from our evaluation.
Acknowledgment
This material is based upon work supported by the National Science Foundation under Grant No. 1801751.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, Y., Chen, J., Chen, H. (2018). Less is More: Culling the Training Set to Improve Robustness of Deep Neural Networks. In: Bushnell, L., Poovendran, R., Başar, T. (eds) Decision and Game Theory for Security. GameSec 2018. Lecture Notes in Computer Science, vol. 11199. Springer, Cham. https://doi.org/10.1007/978-3-030-01554-1_6
DOI: https://doi.org/10.1007/978-3-030-01554-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01553-4
Online ISBN: 978-3-030-01554-1