Abstract
Detecting out-of-distribution (OOD) inputs is pivotal for real-world applications. However, because OOD samples are inaccessible during the training phase, supervised binary classification with in-distribution (ID) and OOD labels is not feasible. Previous works therefore typically employ a proxy ID classification task to learn feature representations for OOD detection. In this study, we examine the relationship between the two tasks through the lens of information theory. Our analysis reveals that optimizing the classification objective inevitably causes over-confidence and an undesired compression of information relevant to OOD detection. To address these two problems, we propose OOD Entropy Regularization (OER), which regularizes the information captured by classification-oriented representation learning for detecting OOD samples. Both theoretical analyses and experimental results underscore the consistent improvement that OER brings to OOD detection.
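The excerpt above does not reproduce the exact OER formulation, but the core idea of adding an entropy-based regularizer on top of the standard classification objective can be sketched as follows. This is a minimal, hypothetical illustration: the loss form, the `lambda_oer` weight, and the use of the predictive-distribution entropy are assumptions for exposition, not the authors' exact method.

```python
import torch
import torch.nn.functional as F

def entropy_regularized_loss(logits: torch.Tensor,
                             labels: torch.Tensor,
                             lambda_oer: float = 0.1) -> torch.Tensor:
    """Cross-entropy classification loss plus an entropy-based regularizer.

    Hypothetical sketch: the proxy ID classification objective is kept,
    while the entropy of the predictive distribution is encouraged so the
    representation is not over-compressed into over-confident predictions.
    """
    # Standard ID classification objective (the proxy task).
    ce_loss = F.cross_entropy(logits, labels)

    # Entropy of the softmax predictive distribution, averaged over the batch.
    probs = F.softmax(logits, dim=-1)
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(probs * log_probs).sum(dim=-1).mean()

    # Subtracting the entropy term counteracts over-confidence;
    # lambda_oer trades off classification accuracy against regularization.
    return ce_loss - lambda_oer * entropy
```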
Acknowledgments
This work is partially supported by the National Key R&D Program of China (No. 2021ZD0111901) and the National Natural Science Foundation of China (NSFC) under Grants 62376259 and 62276246.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hu, J., Liu, W., Chang, H., Ma, B., Shan, S., Chen, X. (2025). An Information Theoretical View for Out-of-Distribution Detection. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15113. Springer, Cham. https://doi.org/10.1007/978-3-031-73001-6_24
DOI: https://doi.org/10.1007/978-3-031-73001-6_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73000-9
Online ISBN: 978-3-031-73001-6
eBook Packages: Computer Science, Computer Science (R0)