Abstract
Convolutional neural network (CNN) models are widely used for image classification, but they are vulnerable to out-of-distribution (OoD) samples. This vulnerability makes it difficult to deploy CNN models in safety-critical applications such as autonomous driving or medical diagnostics. OoD samples occur either naturally or in an adversarial setting, and detecting them is an active area of research. Papernot and McDaniel [43] proposed a detection method that applies a nearest-neighbor (NN) search to the layer activations of the CNN; the result of the search indicates whether a sample is in-distribution or OoD. However, an NN search is slow and memory-intensive at inference time. We examine a more efficient alternative detection approach based on clustering. We conducted experiments with CNN models trained on MNIST, SVHN, and CIFAR-10, testing our approach on naturally occurring OoD samples as well as on several kinds of adversarial examples, and comparing different clustering strategies. Our results show that a clustering-based approach is suitable for detecting OoD samples while being faster and more memory-efficient than an NN approach.
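To make the abstract's comparison concrete, the sketch below shows one plausible form of a clustering-based detector: a k-means model is fit per monitored layer on in-distribution activations, and a test sample is scored by its distance to the nearest centroid, averaged over layers. This is a minimal illustration, not the paper's actual procedure; the cluster count, the distance score, the percentile threshold, and the input format (`train_activations` as one array per layer) are all assumptions made here for the example.

```python
# A minimal sketch of clustering-based OoD detection on CNN layer
# activations. Assumptions (not taken from the paper): k-means clustering,
# a mean distance-to-nearest-centroid score, and a threshold calibrated
# on held-out in-distribution data.
import numpy as np
from sklearn.cluster import KMeans


def fit_layer_clusters(train_activations, n_clusters=10, seed=0):
    """Fit one k-means model per layer.

    train_activations: list of (n_samples, n_features) arrays,
    one array per monitored CNN layer (hypothetical input format).
    """
    return [KMeans(n_clusters=n_clusters, random_state=seed).fit(acts)
            for acts in train_activations]


def ood_score(models, sample_activations):
    """Mean distance to the nearest centroid across layers.

    Larger scores mean the sample lies farther from the training
    distribution in activation space.
    """
    dists = []
    for model, act in zip(models, sample_activations):
        # transform() returns distances to all centroids; keep the smallest.
        dists.append(model.transform(act.reshape(1, -1)).min())
    return float(np.mean(dists))


# Usage sketch: calibrate a threshold on held-out in-distribution scores,
# e.g. their 95th percentile, then flag samples that exceed it.
# models = fit_layer_clusters(train_acts)
# threshold = np.percentile([ood_score(models, a) for a in val_acts], 95)
# is_ood = ood_score(models, test_acts) > threshold
```

Under these assumptions, the efficiency claim is easy to see: an NN search must store all training activations and scan them at inference, whereas this detector keeps only k centroids per layer and computes k distances per layer per sample.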
References
Ackermann, M.R., Blömer, J., Kuntze, D., Sohler, C.: Analysis of agglomerative clustering. Algorithmica 69, 184–215 (2014)
Ankerst, M., Breunig, M.M., Kriegel, H.P., Sander, J.: OPTICS: ordering points to identify the clustering structure. In: Proceedings of SIGMOD, pp. 49–60. ACM, Philadelphia (1999)
Biggio, B., et al.: Evasion attacks against machine learning at test time. In: Blockeel, H., Kersting, K., Nijssen, S., Železný, F. (eds.) ECML PKDD 2013. LNCS (LNAI), vol. 8190, pp. 387–402. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40994-3_25
Blundell, C., Cornebise, J., Kavukcuoglu, K., Wierstra, D.: Weight uncertainty in neural networks. In: Bach, F., Blei, D. (eds.) ICML, vol. 37, pp. 1613–1622. PMLR, Lille (2015)
Chen, B., et al.: Detecting backdoor attacks on deep neural networks by activation clustering. In: Espinoza, H., hÉigeartaigh, S.Ó., Huang, X., Hernández-Orallo, J., Castillo-Effen, M. (eds.) Workshop on SafeAI@AAAI. CEUR Workshop, vol. 2301. ceur-ws.org, Honolulu (2019)
Chen, T., Navratil, J., Iyengar, V., Shanmugam, K.: Confidence scoring using whitebox meta-models with linear classifier probes. In: Chaudhuri, K., Sugiyama, M. (eds.) AISTATS, vol. 89, pp. 1467–1475. PMLR, Naha (2019)
Chou, E., Tramer, F., Pellegrino, G.: SentiNet: detecting localized universal attacks against deep learning systems. ArXiv https://arxiv.org/abs/1812.00292 (2020)
Clanuwat, T., Bober-Irizar, M., Kitamoto, A., Lamb, A., Yamamoto, K., Ha, D.: Deep learning for classical Japanese literature. ArXiv https://arxiv.org/abs/1812.01718 (2018)
Cohen, G., Sapiro, G., Giryes, R.: Detecting adversarial samples using influence functions and nearest neighbors. In: CVPR, pp. 14441–14450. IEEE, Seattle (2020)
Crecchi, F., Bacciu, D., Biggio, B.: Detecting adversarial examples through nonlinear dimensionality reduction. ArXiv https://arxiv.org/abs/1904.13094 (2019)
Croce, F., Hein, M.: Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In: ICML, vol. 119, pp. 2206–2216. PMLR (2020)
DeVries, T., Taylor, G.W.: Learning confidence for out-of-distribution detection in neural networks. ArXiv https://arxiv.org/abs/1802.04865 (2018)
Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD, pp. 226–231. AAAI Press, Portland (1996)
Gal, Y.: Uncertainty in deep learning. Ph.D. thesis, University of Cambridge (2016)
Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: Balcan, M., Weinberger, K. (eds.) ICML, vol. 48, pp. 1050–1059. PMLR, New York (2016)
Goodfellow, I., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) ICLR, San Diego, CA, USA (2015)
Grosse, K., Manoharan, P., Papernot, N., Backes, M., McDaniel, P.: On the (statistical) detection of adversarial examples. ArXiv https://arxiv.org/abs/1702.06280 (2017)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778. IEEE, Las Vegas (2016)
Hendrycks, D., Gimpel, K.: A baseline for detecting misclassified and out-of-distribution examples in neural networks. In: ICLR. Toulon, France (2017)
Hendrycks, D., Mazeika, M., Kadavath, S., Song, D.: Using self-supervised learning can improve model robustness and uncertainty. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R. (eds.) NeurIPS, vol. 32, pp. 15637–15648. CAI, Vancouver (2019)
Hendrycks, D., Zhao, K., Basart, S., Steinhardt, J., Song, D.: Natural adversarial examples. ArXiv https://arxiv.org/abs/1907.07174 (2020)
Huang, H., Li, Z., Wang, L., Chen, S., Dong, B., Zhou, X.: Feature space singularity for out-of-distribution detection. ArXiv https://arxiv.org/abs/2011.14654 (2020)
Kim, H.: Torchattacks: a PyTorch repository for adversarial attacks. ArXiv https://arxiv.org/abs/2010.01950 (2020)
Krizhevsky, A.: Learning multiple layers of features from tiny images. Tech. rep., University of Toronto (2009)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) NIPS, vol. 25, pp. 1097–1105. CAI, Lake Tahoe (2012)
Kurakin, A., Goodfellow, I.J., Bengio, S.: Adversarial examples in the physical world. In: ICLR. Toulon, France (2017)
LeCun, Y., Cortes, C., Burges, C.: MNIST handwritten digit database. AT&T Labs [Online]. http://yann.lecun.com/exdb/mnist (2010)
Lee, K., Lee, H., Lee, K., Shin, J.: Training confidence-calibrated classifiers for detecting out-of-distribution samples. In: ICLR. Vancouver, CA (2018)
Lee, K., Lee, K., Lee, H., Shin, J.: A simple unified framework for detecting out-of-distribution samples and adversarial attacks. ArXiv https://arxiv.org/abs/1807.03888 (2018)
Li, X., Li, F.: Adversarial examples detection in deep networks with convolutional filter statistics. In: ICCV, pp. 5775–5783. IEEE, Venice, Italy (2017)
Liang, S., Li, Y., Srikant, R.: Enhancing the reliability of out-of-distribution image detection in neural networks. In: ICLR. Vancouver, CA (2018)
Liu, W., Wang, X., Owens, J., Li, Y.: Energy-based out-of-distribution detection. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) NeurIPS, vol. 33, pp. 21464–21475. CAI (2020)
Ma, X., et al.: Characterizing adversarial subspaces using local intrinsic dimensionality. In: ICLR. Vancouver, CA (2018)
van der Maaten, L.J.P.: Learning a parametric embedding by preserving local structure. In: van Dyk, D., Welling, M. (eds.) AISTATS, vol. 5, pp. 384–391. PMLR, Clearwater Beach (2009)
MacQueen, J.B.: Some methods for classification and analysis of multivariate observations. In: Le Cam, L.M., Neyman, J. (eds.) Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. University of California Press (1967)
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. In: ICLR. Vancouver, CA (2018)
McInnes, L., Healy, J., Melville, J.: UMAP: uniform manifold approximation and projection for dimension reduction. ArXiv https://arxiv.org/abs/1802.03426 (2018)
Meng, D., Chen, H.: MagNet: a two-pronged defense against adversarial examples. In: SIGSAC, pp. 135–147. ACM, Dallas (2017)
Metzen, J.H., Genewein, T., Fischer, V., Bischoff, B.: On detecting adversarial perturbations. In: ICLR. Toulon, France (2017)
Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning. In: NIPS Workshop on Deep Learning and Unsupervised Feature Learning (2011)
Nguyen, A., Yosinski, J., Clune, J.: Multifaceted feature visualization: uncovering the different types of features learned by each neuron in deep neural networks. In: Visualization for Deep Learning Workshop, ICML (2016). ArXiv https://arxiv.org/abs/1602.03616
Pang, T., Du, C., Dong, Y., Zhu, J.: Towards robust detection of adversarial examples. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) NeurIPS, vol. 31, pp. 4584–4594. CAI, Montreal (2018)
Papernot, N., McDaniel, P.: Deep k-nearest neighbors: towards confident, interpretable and robust deep learning. ArXiv https://arxiv.org/abs/1803.04765 (2018)
Pearson, K.: LIII. On lines and planes of closest fit to systems of points in space. London Edinb. Dublin Philos. Mag. J. Sci. 2(11), 559–572 (1901)
Qin, Y., Frosst, N., Sabour, S., Raffel, C., Cottrell, G., Hinton, G.E.: Detecting and diagnosing adversarial images with class-conditional capsule reconstructions. In: ICLR. Addis Ababa, Ethiopia (2020)
Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20(1), 53–65 (1987)
Szegedy, C., et al.: Intriguing properties of neural networks. In: Bengio, Y., LeCun, Y. (eds.) ICLR. Banff, CA (2014)
Xu, W., Evans, D., Qi, Y.: Feature squeezing: detecting adversarial examples in deep neural networks. ArXiv https://arxiv.org/abs/1704.01155 (2017)
Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10590-1_53
Zhang, H., Dauphin, Y.N., Ma, T.: Fixup initialization: residual learning without normalization. ArXiv https://arxiv.org/abs/1901.09321 (2019)
Cite this paper
Lehmann, D., Ebner, M. (2021). Layer-Wise Activation Cluster Analysis of CNNs to Detect Out-of-Distribution Samples. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. ICANN 2021. Lecture Notes in Computer Science(), vol 12894. Springer, Cham. https://doi.org/10.1007/978-3-030-86380-7_18