Abstract
This paper proposes a novel regularization approach that biases Convolutional Neural Networks (CNNs) toward utilizing edge and line features in their hidden layers. Rather than learning arbitrary kernels, we constrain the convolutional layers to edge and line detection kernels. This intentional bias regularizes the models and improves generalization performance, especially on small datasets. As a result, test accuracies improve by margins of \(5-11\) percentage points across four challenging fine-grained classification datasets with limited training data, while the number of trainable parameters remains identical. Instead of traditional convolutional layers, we use Pre-defined Filter Modules, which convolve the input data with a fixed set of \(3 \times 3\) pre-defined edge and line filters. A subsequent ReLU erases information that did not trigger any positive response. Next, a \(1 \times 1\) convolutional layer generates linear combinations. Notably, the pre-defined filters are a fixed component of the architecture and remain unchanged during training. Our findings reveal that the number of dimensions spanned by the set of pre-defined filters has little impact on recognition performance. However, the size of the filter set matters, with nine or more filters providing the best results.
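For concreteness, the following is a minimal PyTorch sketch of such a Pre-defined Filter Module: a fixed bank of \(3 \times 3\) edge and line kernels, a ReLU, and a learnable \(1 \times 1\) convolution. The specific kernel values, the depthwise application, and the class name are illustrative assumptions, not the exact configuration used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PredefinedFilterModule(nn.Module):
    """Sketch of a Pre-defined Filter Module (PFM): fixed 3x3 edge/line
    filters -> ReLU -> learnable 1x1 convolution.

    The concrete kernels and the depthwise arrangement are assumptions;
    only the overall structure follows the description in the paper.
    """

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # A few hand-crafted 3x3 edge and line kernels (assumed examples).
        kernels = torch.tensor([
            [[-1., 0., 1.], [-1., 0., 1.], [-1., 0., 1.]],    # vertical edge
            [[-1., -1., -1.], [0., 0., 0.], [1., 1., 1.]],    # horizontal edge
            [[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]],      # line / Laplacian
            [[-1., -1., 2.], [-1., 2., -1.], [2., -1., -1.]], # diagonal line
        ])
        n_filters = kernels.shape[0]
        # Fixed (non-trainable) filters, applied to every input channel.
        self.register_buffer(
            "weight", kernels.unsqueeze(1).repeat(in_channels, 1, 1, 1))
        self.in_channels = in_channels
        # Learnable 1x1 convolution forms linear combinations of the responses.
        self.pointwise = nn.Conv2d(in_channels * n_filters, out_channels,
                                   kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Depthwise convolution with the fixed filters (groups = in_channels).
        y = F.conv2d(x, self.weight, padding=1, groups=self.in_channels)
        y = F.relu(y)  # discard responses that did not trigger positively
        return self.pointwise(y)
```

Such a module could serve as a drop-in replacement for a \(3 \times 3\) convolutional layer, e.g. `PredefinedFilterModule(64, 128)`; only the \(1 \times 1\) convolution contributes trainable parameters. Note that only four example kernels are shown here, whereas the paper reports that nine or more pre-defined filters give the best results.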
References
Belkin, M., Hsu, D., Ma, S., Mandal, S.: Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proc. Natl. Acad. Sci. 116(32), 15849–15854 (2019)
Gavrikov, P., Keuper, J.: CNN filter DB: an empirical investigation of trained convolutional filters. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 19044–19054. IEEE, New Orleans (2022)
Gavrikov, P., Keuper, J.: Rethinking 1x1 convolutions: can we train CNNs with frozen random filters? arXiv preprint arXiv:2301.11360 [cs] (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1026–1034. IEEE, Santiago (2015)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778. IEEE, Las Vegas (2016)
Hertel, L., Barth, E., Käster, T., Martinetz, T.: Deep convolutional neural networks as generic feature extractors. arXiv preprint arXiv:1710.02286 [cs] (2017)
Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)
Krause, J., Stark, M., Deng, J., Fei-Fei, L.: 3D object representations for fine-grained categorization. In: 2013 IEEE International Conference on Computer Vision Workshops, pp. 554–561. IEEE, Sydney (2013)
Linse, C., Barth, E., Martinetz, T.: Convolutional neural networks do work with pre-defined filters. In: 2023 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2023)
Linse, C., Martinetz, T.: Large neural networks learning from scratch with very few data and without explicit regularization. In: Proceedings of the 2023 15th International Conference on Machine Learning and Computing (ICMLC 2023), pp. 279–283. Association for Computing Machinery, New York (2023)
Maji, S., Rahtu, E., Kannala, J., Blaschko, M., Vedaldi, A.: Fine-grained visual classification of aircraft. arXiv preprint arXiv:1306.5151 [cs] (2013)
Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML 2010), pp. 807–814. Omnipress, Madison (2010)
Nilsback, M.E., Zisserman, A.: Automated flower classification over a large number of classes. In: 2008 Sixth Indian Conference on Computer Vision, Graphics and Image Processing, pp. 722–729. IEEE, Bhubaneswar (2008)
Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32, pp. 8024–8035. Curran Associates, Inc. (2019)
Ramanujan, V., Wortsman, M., Kembhavi, A., Farhadi, A., Rastegari, M.: What’s hidden in a randomly weighted neural network? arXiv preprint arXiv:1911.13299 [cs] (2020)
Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vision 115(3), 211–252 (2015)
Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The Caltech-UCSD Birds-200-2011 Dataset. Tech. Rep. CNS-TR-2011-001, California Institute of Technology (2011)
Wimmer, P., Mehnert, J., Condurache, A.: Interspace pruning: using adaptive filter representations to improve training of sparse CNNs. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12527–12537 (2022)
Acknowledgment
The work of Christoph Linse was supported by the Bundesministerium für Wirtschaft und Klimaschutz through the Mittelstand-Digital Zentrum Schleswig-Holstein Project.
Appendix
1.1 Training Hyperparameters for ImageNet
The networks are trained with a batch size of 48 on five NVIDIA GeForce RTX 4090 GPUs. The remaining training hyperparameters are taken from the PyTorch training reference [15]. The cross-entropy loss is minimized using stochastic gradient descent for 90 epochs with a momentum of 0.9 and a weight decay of 0.0001. The initial learning rate of 0.1 is reduced by a factor of 0.1 every 30 epochs.
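A minimal sketch of this schedule in PyTorch is given below; the model and the data loader are placeholders (a torchvision ResNet and random tensors), not the actual networks and ImageNet pipeline used in the paper.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder model and data; replace with the actual network and ImageNet loader.
model = models.resnet18(weights=None).to(device)
train_loader = DataLoader(
    TensorDataset(torch.randn(96, 3, 224, 224), torch.randint(0, 1000, (96,))),
    batch_size=48, shuffle=True)

criterion = torch.nn.CrossEntropyLoss()
# SGD with momentum 0.9, weight decay 0.0001, initial learning rate 0.1.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
# Reduce the learning rate by a factor of 0.1 every 30 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    model.train()
    for images, targets in train_loader:
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```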
1.2 Linear Independence of ReLU-Based Functions
Two functions \(f_1, f_2 : X \rightarrow Y\) are linearly independent if \(c_1 f_1(\textbf{x}) + c_2 f_2(\textbf{x}) = 0\) for all \(\textbf{x} \in X\) implies \(c_1 = c_2 = 0\).
Consider the functions from (7) that occur in the \(\text{PFM}\) module. These functions are linearly dependent if the pre-defined filters \(\tilde{w}_1\) and \(\tilde{w}_2\) are linearly dependent with \(a \tilde{w}_1 = \tilde{w}_2\), \(a \ge 0\).
Proof
Choose some arbitrary \(\textbf{x} \in \mathbb{R}^{M \times N}\). Choose \(c_1 \in \mathbb{R} \backslash \{0\}\) and \(c_2 = - c_1 / a\). Then, since the ReLU is positively homogeneous and therefore \(f_2(\textbf{x}) = a f_1(\textbf{x})\), we obtain \(c_1 f_1(\textbf{x}) + c_2 f_2(\textbf{x}) = c_1 f_1(\textbf{x}) - \frac{c_1}{a}\, a f_1(\textbf{x}) = 0\), i.e., a non-trivial linear combination vanishes everywhere.
\(\Box \)
The functions in (7) are linearly independent if \(\tilde{w}_1\) and \(\tilde{w}_2\) are linearly dependent with \(a \tilde{w}_1 = \tilde{w}_2\), \(a < 0\).
Proof
The \(\leftarrow \) direction is clear. To show the \(\rightarrow \) direction, let \(\textbf{x} \in \mathbb{R}^{M \times N}\) and consider the linear combination \(c_1 f_1(\textbf{x}) + c_2 f_2(\textbf{x})\). Since \(a < 0\), the responses \(\tilde{w}_1 * \textbf{x}\) and \(\tilde{w}_2 * \textbf{x} = a\,(\tilde{w}_1 * \textbf{x})\) have opposite signs, so after the ReLU at most one of \(f_1(\textbf{x})\) and \(f_2(\textbf{x})\) can be non-zero. The sum has to be zero for all \(\textbf{x} \in \mathbb{R}^{M \times N}\); choosing \(\textbf{x}\) with a positive response forces \(c_1 = 0\), and choosing \(\textbf{x}\) with a negative response forces \(c_2 = 0\). This means that both coefficients \(c_1\) and \(c_2\) have to be zero. \(\Box \)
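As an illustrative numerical check (not part of the original argument), the positive homogeneity of the ReLU and the disjoint supports for negative scaling factors can be verified directly on random responses:

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)
z = rng.standard_normal(1000)  # stand-in for the filter responses w1 * x

# a >= 0: ReLU(a*z) = a*ReLU(z), so the two functions are linearly dependent.
a = 2.5
print(np.allclose(relu(a * z), a * relu(z)))  # True

# a < 0: ReLU(z) and ReLU(a*z) are never non-zero at the same position,
# so only the trivial combination vanishes everywhere (linear independence).
a = -2.5
print(np.all(relu(z) * relu(a * z) == 0))  # True
```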
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Linse, C., Brückner, B., Martinetz, T. (2024). Enhancing Generalization in Convolutional Neural Networks Through Regularization with Edge and Line Features. In: Wand, M., Malinovská, K., Schmidhuber, J., Tetko, I.V. (eds) Artificial Neural Networks and Machine Learning – ICANN 2024. ICANN 2024. Lecture Notes in Computer Science, vol 15016. Springer, Cham. https://doi.org/10.1007/978-3-031-72332-2_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72331-5
Online ISBN: 978-3-031-72332-2