Investigating Neural Network Training on a Feature Level Using Conditional Independence

  • Conference paper
  • In: Computer Vision – ECCV 2022 Workshops (ECCV 2022)

Abstract

There are still open questions about how the learned representations of deep models change during training. Understanding this process could aid in validating the training. Towards this goal, previous works analyze the training in the mutual information plane. We use a different approach and base our analysis on a method built on Reichenbach’s common cause principle. Using this method, we test whether the model utilizes information contained in human-defined features. Given such a set of features, we investigate how the relative feature usage changes throughout the training process. We analyze multiple networks trained on different tasks, including melanoma classification as a real-world application. We find that, over the course of training, models concentrate on features containing information relevant to the task. This concentration is a form of representation compression. Crucially, we also find that the selected features can differ between training from scratch and fine-tuning a pre-trained network.
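To make the testing idea concrete, the sketch below illustrates one way to check, in the spirit of Reichenbach’s common cause principle, whether a network’s predictions depend on a human-defined feature beyond what the ground-truth label already explains. This is a minimal illustration only, assuming a simple partial-correlation conditional independence test (which presumes roughly linear relationships); the paper builds on more general CI tests (e.g. kernel-based ones), and every name below is a hypothetical placeholder rather than the authors’ code.

    import numpy as np
    from scipy import stats

    def feature_usage_pvalue(feature, prediction, label):
        """p-value for H0: feature is independent of prediction given label.

        A small p-value is evidence that the model's output depends on the
        feature even after controlling for the ground-truth label, i.e. that
        the feature is "used". Expects 1-D numeric arrays of equal length.
        """
        # Regress the conditioning variable (the label) out of both quantities.
        Z = np.column_stack([label, np.ones_like(label)])  # add intercept
        res_f = feature - Z @ np.linalg.lstsq(Z, feature, rcond=None)[0]
        res_p = prediction - Z @ np.linalg.lstsq(Z, prediction, rcond=None)[0]
        # Correlation of the residuals = partial correlation given the label.
        r, _ = stats.pearsonr(res_f, res_p)
        # Fisher z-test; one conditioning variable gives the n - 4 factor.
        z_stat = np.arctanh(r) * np.sqrt(len(feature) - 4)
        return 2 * stats.norm.sf(abs(z_stat))

    # Hypothetical usage at one training checkpoint: 'asymmetry' could be an
    # ABCD-rule feature computed per melanoma image, 'logits' the model outputs.
    # p = feature_usage_pvalue(asymmetry, logits, diagnosis)

Repeating such a test for every feature in a fixed set at successive checkpoints yields the relative feature-usage trajectories studied in the paper; in practice a kernel-based test is preferable because it does not assume linear relationships.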

Notes

  1. Time per update step: \(\approx 0.24\) s; time per CI test: \(\approx 2\) s.

Author information

Corresponding author

Correspondence to Niklas Penzel.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 4239 KB)

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Penzel, N., Reimers, C., Bodesheim, P., Denzler, J. (2023). Investigating Neural Network Training on a Feature Level Using Conditional Independence. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13806. Springer, Cham. https://doi.org/10.1007/978-3-031-25075-0_27

  • DOI: https://doi.org/10.1007/978-3-031-25075-0_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25074-3

  • Online ISBN: 978-3-031-25075-0

  • eBook Packages: Computer Science, Computer Science (R0)
