Optimizing sparse topologies via competitive joint unstructured neural networks

  • Regular Paper
  • Published in Progress in Artificial Intelligence

Abstract

A major research problem of artificial neural networks (NNs) is to reduce the number of model parameters. The available approaches are pruning methods, which remove connections from a dense model, and natively sparse models, which are trained using meta-heuristics to preserve their topological properties. In this paper, the limits of both approaches are discussed, and a novel hybrid training approach is developed and experimentally evaluated. The approach is based on a linear combination of sparse unstructured NNs, which are joint because they share connections. Such NNs dynamically compete during optimization, as the less important networks are iteratively pruned until only the most important network remains. The method, called Competitive Joint Unstructured NNs (CJUNNs), is formalized with an efficient derivation in tensor algebra, which has been implemented and publicly released. Experimental results show its effectiveness on benchmark datasets compared to structured pruning.
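
To make the scheme concrete, the following is a minimal PyTorch-style sketch of the idea summarized above. It is an illustration under simplifying assumptions, not the released cjunn implementation: the class CompetitiveJointSketch, the random sparse masks, and the prune_weakest routine are hypothetical names introduced here. Several candidate sparse networks draw their weights from a single shared pool (making them joint), their outputs are combined linearly through learnable importance coefficients, and the candidate with the smallest coefficient is periodically dropped until one network remains.

import torch

class CompetitiveJointSketch(torch.nn.Module):
    def __init__(self, n_in, n_out, n_weights, n_candidates, density=0.2):
        super().__init__()
        # Shared weight pool: the candidate networks are "joint" because they reuse these entries.
        self.pool = torch.nn.Parameter(0.1 * torch.randn(n_weights))
        # Each candidate is a fixed random sparse mask over an (n_out, n_in) weight matrix,
        # whose nonzero entries are drawn from the shared pool.
        self.register_buffer("masks", (torch.rand(n_candidates, n_out, n_in) < density).float())
        self.register_buffer("pool_idx", torch.randint(0, n_weights, (n_candidates, n_out, n_in)))
        # One learnable importance score per candidate network.
        self.alpha = torch.nn.Parameter(torch.zeros(n_candidates))
        self.register_buffer("alive", torch.ones(n_candidates, dtype=torch.bool))

    def forward(self, x):
        w = self.pool[self.pool_idx] * self.masks             # (K, n_out, n_in) sparse weights
        y = torch.einsum("koi,bi->kbo", w, x)                 # per-candidate outputs
        coeff = torch.softmax(self.alpha.masked_fill(~self.alive, float("-inf")), dim=0)
        return torch.einsum("k,kbo->bo", coeff, y)            # linear combination of candidates

    @torch.no_grad()
    def prune_weakest(self):
        # Drop the surviving candidate with the lowest importance score.
        if self.alive.sum() > 1:
            scores = self.alpha.masked_fill(~self.alive, float("inf"))
            self.alive[scores.argmin()] = False

In a training loop, prune_weakest would be invoked every few epochs until a single candidate survives; the mask of the surviving candidate then defines the final sparse topology.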

Availability of data and material

The data used in this work is publicly available at: https://github.com/galatolofederico/cjunn.

Code availability

The software used in this work is publicly available at: https://github.com/galatolofederico/cjunn.

Funding

Work partially supported: (i) by the Italian Ministry of Education and Research (MIUR) in the frameworks of the FoReLab project (Departments of Excellence), of the National Recovery and Resilience Plan (National Center for Sustainable Mobility MOST/Spoke10), and of the "Reasoning" project, PRIN 2020 LS Programme, Project number 2493 04-11-2021; (ii) by the PNRR - M4C2 - Investimento 1.3, Partenariato Esteso PE00000013 - "FAIR - Future Artificial Intelligence Research" - Spoke 1 "Human-centered AI", funded by the European Commission under the NextGeneration EU program; (iii) by the University of Pisa, in the framework of the PRA_2022_101 project “Decision Support Systems for territorial networks for managing ecosystem services”.

Author information

Contributions

Federico A. Galatolo: public responsibility for the content, concept, design, analysis, writing, and revision of the manuscript. Mario G.C.A. Cimino: public responsibility for the content, concept, design, analysis, writing, and revision of the manuscript.

Corresponding author

Correspondence to Federico A. Galatolo.

Ethics declarations

Conflict of interest

All the authors declare that they have no known conflict of interest that could have appeared to influence the work reported in this paper.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Justification for Eq. 11

Given \(i\) input nodes, \(h\) hidden nodes, and \(o\) output nodes, there are \(i \cdot \binom{h}{l-1} \cdot o\) paths of length \(l\) between input and output nodes. The minimum path length is \(PL_{\min }=1\) and the maximum is \(PL_{\max }=h+1\). The average path length is obtained by summing the path lengths from 1 to \(h+1\), each weighted by the number of paths of that length, and dividing by the total number of paths:

$$\begin{aligned} \widehat{PL}_{\text{avg}} = \frac{\sum _{l=1}^{h+1} i \binom{h}{l-1} o \, l}{\sum _{l=1}^{h+1} i \binom{h}{l-1} o} \end{aligned}$$
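
For instance, with \(i=1\), \(h=2\), and \(o=1\), the path counts are \(\binom{2}{0}=1\), \(\binom{2}{1}=2\), and \(\binom{2}{2}=1\) for lengths 1, 2, and 3, respectively, so that:

$$\begin{aligned} \widehat{PL}_{\text{avg}} = \frac{1 \cdot 1 + 2 \cdot 2 + 1 \cdot 3}{1 + 2 + 1} = \frac{8}{4} = 2 = \frac{h}{2} + 1 \end{aligned}$$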

According to the binomial theorem:

$$\begin{aligned} (x + a)^n = \sum _{k=0}^{n} \binom{n}{k} x^k a^{n-k} \end{aligned}$$

For \(x=1\) and \(a=1\) it follows:

$$\begin{aligned} \sum _{k=0}^{n} \binom{n}{k} = 2^n \end{aligned}$$

Then, the denominator of Eq. 11 can be simplified as follows:

$$\begin{aligned} \sum _{l=1}^{h+1} i \binom{h}{l-1} o = i o \sum _{l=0}^{h} \binom{h}{l} = i o \, 2^h \end{aligned}$$

As for the numerator of Eq. 11, it can be rewritten as follows:

$$\begin{aligned} \begin{aligned} \sum _{l=1}^{h+1} i \binom{h}{l-1} o \, l&= i o \sum _{l=0}^{h} \binom{h}{l} (l+1)\\&= i o \left( \sum _{l=0}^{h} \binom{h}{l} l + \sum _{l=0}^{h} \binom{h}{l} \right) \\&= i o \left( \sum _{l=0}^{h} \binom{h}{l} l + 2^h \right) \end{aligned} \end{aligned}$$

The first addend \(\sum _{l=0}^{h} \binom{h}{l} l\) can be simplified by considering the binomial theorem for \(a=1\):

$$\begin{aligned} (x + 1)^n = \sum _{k=0}^{n} \binom{n}{k} x^k \end{aligned}$$

Taking the derivative of both sides with respect to \(x\):

$$\begin{aligned} \begin{aligned} \frac{\partial }{\partial x}(x + 1)^n&= \frac{\partial }{\partial x}\sum _{k=0}^{n} \binom{n}{k} x^k \\ n(x+1)^{n-1}&= \sum _{k=0}^{n} k \binom{n}{k} x^{k-1} \end{aligned} \end{aligned}$$

For \(x=1\) it follows that:

$$\begin{aligned} \sum _{k=0}^{n} k \binom{n}{k} = n \, 2^{n-1} \end{aligned}$$

i.e.,

$$\begin{aligned} \sum _{l=0}^{h} \binom{h}{l} l = h \, 2^{h-1} \end{aligned}$$
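
For example, for \(h=3\) the identity gives \(\sum _{l=0}^{3} \binom{3}{l}\, l = 0 + 3 + 6 + 3 = 12 = 3 \cdot 2^{2}\).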

As a consequence, Eq. 11 can be rewritten as follows:

$$\begin{aligned} \begin{aligned} \widehat{PL}_{\text{avg}}&= \frac{\sum _{l=1}^{h+1} i \binom{h}{l-1} o \, l}{\sum _{l=1}^{h+1} i \binom{h}{l-1} o} = \frac{i o (h \, 2^{h-1} + 2^h)}{i o \, 2^h} \\&= \frac{2^{h-1}(h+2)}{2^h} = \frac{h+2}{2} = \frac{h}{2} + 1 \end{aligned} \end{aligned}$$
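
As a quick numerical sanity check of this closed form, the weighted average can be evaluated directly with binomial coefficients and compared against \(h/2 + 1\). The snippet below is a standalone Python sketch written for this purpose; it is not part of the released cjunn code, and the function name avg_path_length is introduced here only for illustration.

from math import comb

def avg_path_length(h: int) -> float:
    # Weighted average of the path lengths l = 1, ..., h + 1, where the number of
    # paths of length l per input-output pair is C(h, l - 1); the i*o factors cancel.
    num = sum(comb(h, l - 1) * l for l in range(1, h + 2))
    den = sum(comb(h, l - 1) for l in range(1, h + 2))
    return num / den

for h in range(1, 16):
    assert abs(avg_path_length(h) - (h / 2 + 1)) < 1e-9  # matches h/2 + 1 for every h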

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Galatolo, F.A., Cimino, M.G.C.A. Optimizing sparse topologies via competitive joint unstructured neural networks. Prog Artif Intell 13, 335–349 (2024). https://doi.org/10.1007/s13748-024-00339-8
