Investigating Neural Network Training on a Feature Level Using Conditional Independence

  • Conference paper
  • In: Computer Vision – ECCV 2022 Workshops (ECCV 2022)

Abstract

There are still open questions about how the learned representations of deep models change during training. Understanding this process could aid in validating the training. Towards this goal, previous works analyze the training in the mutual information plane. We use a different approach and base our analysis on a method built on Reichenbach’s common cause principle. Using this method, we test whether the model utilizes information contained in human-defined features. Given such a set of features, we investigate how the relative feature usage changes throughout the training process. We analyze multiple networks trained on different tasks, including melanoma classification as a real-world application. We find that, over the course of training, models concentrate on features containing information relevant to the task. This concentration is a form of representation compression. Crucially, we also find that the selected features can differ between training from scratch and fine-tuning a pre-trained network.
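To make the testing idea concrete, the sketch below illustrates one way to check, in the spirit of Reichenbach’s common cause principle, whether a network’s predictions depend on a human-defined feature beyond what the ground-truth label already explains. This is a minimal illustration only, assuming a simple partial-correlation conditional independence test (which presumes roughly linear relationships); the paper builds on more general CI tests (e.g. kernel-based ones), and every name below is a hypothetical placeholder rather than the authors’ code.

    import numpy as np
    from scipy import stats

    def feature_usage_pvalue(feature, prediction, label):
        """p-value for H0: feature is independent of prediction given label.

        A small p-value is evidence that the model's output depends on the
        feature even after controlling for the ground-truth label, i.e. that
        the feature is "used". Expects 1-D numeric arrays of equal length.
        """
        # Regress the conditioning variable (the label) out of both quantities.
        Z = np.column_stack([label, np.ones_like(label)])  # add intercept
        res_f = feature - Z @ np.linalg.lstsq(Z, feature, rcond=None)[0]
        res_p = prediction - Z @ np.linalg.lstsq(Z, prediction, rcond=None)[0]
        # Correlation of the residuals = partial correlation given the label.
        r, _ = stats.pearsonr(res_f, res_p)
        # Fisher z-test; one conditioning variable gives the n - 4 factor.
        z_stat = np.arctanh(r) * np.sqrt(len(feature) - 4)
        return 2 * stats.norm.sf(abs(z_stat))

    # Hypothetical usage at one training checkpoint: 'asymmetry' could be an
    # ABCD-rule feature computed per melanoma image, 'logits' the model outputs.
    # p = feature_usage_pvalue(asymmetry, logits, diagnosis)

Repeating such a test for every feature in a fixed set at successive checkpoints yields the relative feature-usage trajectories studied in the paper; in practice a kernel-based test is preferable because it does not assume linear relationships.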

Notes

  1. Time per update step: \(\approx 0.24\) s; time per CI test: \(\approx 2\) s.

Author information

Corresponding author

Correspondence to Niklas Penzel.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 4239 KB)

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Penzel, N., Reimers, C., Bodesheim, P., Denzler, J. (2023). Investigating Neural Network Training on a Feature Level Using Conditional Independence. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13806. Springer, Cham. https://doi.org/10.1007/978-3-031-25075-0_27

  • DOI: https://doi.org/10.1007/978-3-031-25075-0_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25074-3

  • Online ISBN: 978-3-031-25075-0

  • eBook Packages: Computer Science, Computer Science (R0)
