An Information Theoretical View for Out-of-Distribution Detection

  • Conference paper
  • Part of the proceedings: Computer Vision – ECCV 2024 (ECCV 2024)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15113)

Abstract

Detecting out-of-distribution (OOD) inputs is pivotal for real-world applications. However, because OOD samples are inaccessible during the training phase, supervised binary classification with in-distribution (ID) and OOD labels is not feasible. Previous works therefore typically employ a proxy ID classification task to learn feature representations for the OOD detection task. In this study, we examine the relationship between the two tasks through the lens of information theory. Our analysis reveals that optimizing the classification objective inevitably causes over-confidence and undesired compression of information relevant to OOD detection. To address these two problems, we propose OOD Entropy Regularization (OER), which regularizes the information captured in classification-oriented representation learning so that it remains useful for detecting OOD samples. Both theoretical analyses and experimental results underscore the consistent improvement that OER brings to OOD detection.
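The paper itself defines the exact OER objective; as a rough, hypothetical illustration of the general idea described in the abstract (a classification loss augmented with an entropy-based penalty that discourages over-confident predictions on OOD-like inputs), one could sketch something like the following. The function names, the auxiliary-batch setup, and the weight `lam` are all illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class dimension.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def predictive_entropy(logits):
    # Shannon entropy (in nats) of the softmax predictive distribution.
    p = softmax(logits)
    return -(p * np.log(p + 1e-12)).sum(axis=1)

def oer_style_loss(id_logits, id_labels, aux_logits, lam=0.5):
    # Standard cross-entropy on in-distribution samples ...
    p = softmax(id_logits)
    ce = -np.log(p[np.arange(len(id_labels)), id_labels] + 1e-12).mean()
    # ... plus a penalty that is large when predictions on auxiliary,
    # OOD-like inputs are confident (low entropy), pushing the model
    # toward high-entropy outputs away from the ID data.
    reg = -predictive_entropy(aux_logits).mean()
    return ce + lam * reg
```

Under this sketch, a batch of OOD-like inputs with near-uniform predictions contributes a small penalty, while confident predictions on the same inputs raise the loss.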



Acknowledgments

This work is partially supported by the National Key R&D Program of China (No. 2021ZD0111901) and the National Natural Science Foundation of China (NSFC) under grants 62376259 and 62276246.

Author information

Corresponding author

Correspondence to Hong Chang.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hu, J., Liu, W., Chang, H., Ma, B., Shan, S., Chen, X. (2025). An Information Theoretical View for Out-of-Distribution Detection. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15113. Springer, Cham. https://doi.org/10.1007/978-3-031-73001-6_24

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-73001-6_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73000-9

  • Online ISBN: 978-3-031-73001-6

  • eBook Packages: Computer Science (R0)
