Optimizing sparse topologies via competitive joint unstructured neural networks

  • Regular Paper
  • Published in Progress in Artificial Intelligence

Abstract

A major research problem of artificial neural networks (NNs) is to reduce the number of model parameters. The available approaches are pruning methods, which remove connections from a dense model, and natively sparse models, which are trained using meta-heuristics to preserve their topological properties. In this paper, the limits of both approaches are discussed, and a novel hybrid training approach is developed and experimentally evaluated. The approach is based on a linear combination of sparse unstructured NNs, which are joint because they share connections. Such NNs dynamically compete during optimization, as the less important networks are iteratively pruned until only the most important network remains. The method, called Competitive Joint Unstructured NNs (CJUNNs), is formalized with an efficient derivation in tensor algebra, which has been implemented and publicly released. Experimental results show its effectiveness on benchmark datasets compared to structured pruning.
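
To make the scheme concrete, the following is a minimal PyTorch-style sketch of the idea summarized above. It is an illustration under simplifying assumptions, not the released cjunn implementation: the class CompetitiveJointSketch, the random sparse masks, and the prune_weakest routine are hypothetical names introduced here. Several candidate sparse networks draw their weights from a single shared pool (making them joint), their outputs are combined linearly through learnable importance coefficients, and the candidate with the smallest coefficient is periodically dropped until one network remains.

import torch

class CompetitiveJointSketch(torch.nn.Module):
    def __init__(self, n_in, n_out, n_weights, n_candidates, density=0.2):
        super().__init__()
        # Shared weight pool: the candidate networks are "joint" because they reuse these entries.
        self.pool = torch.nn.Parameter(0.1 * torch.randn(n_weights))
        # Each candidate is a fixed random sparse mask over an (n_out, n_in) weight matrix,
        # whose nonzero entries are drawn from the shared pool.
        self.register_buffer("masks", (torch.rand(n_candidates, n_out, n_in) < density).float())
        self.register_buffer("pool_idx", torch.randint(0, n_weights, (n_candidates, n_out, n_in)))
        # One learnable importance score per candidate network.
        self.alpha = torch.nn.Parameter(torch.zeros(n_candidates))
        self.register_buffer("alive", torch.ones(n_candidates, dtype=torch.bool))

    def forward(self, x):
        w = self.pool[self.pool_idx] * self.masks             # (K, n_out, n_in) sparse weights
        y = torch.einsum("koi,bi->kbo", w, x)                 # per-candidate outputs
        coeff = torch.softmax(self.alpha.masked_fill(~self.alive, float("-inf")), dim=0)
        return torch.einsum("k,kbo->bo", coeff, y)            # linear combination of candidates

    @torch.no_grad()
    def prune_weakest(self):
        # Drop the surviving candidate with the lowest importance score.
        if self.alive.sum() > 1:
            scores = self.alpha.masked_fill(~self.alive, float("inf"))
            self.alive[scores.argmin()] = False

In a training loop, prune_weakest would be invoked every few epochs until a single candidate survives; the mask of the surviving candidate then defines the final sparse topology.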

Availability of data and material

The data used in this work is publicly available at: https://github.com/galatolofederico/cjunn.

Code availability

The software used in this work is publicly available at: https://github.com/galatolofederico/cjunn.

Funding

Work partially supported: (i) by the Italian Ministry of Education and Research (MIUR) in the frameworks of the FoReLab project (Departments of Excellence), of the National Recovery and Resilience Plan (National Center for Sustainable Mobility MOST/Spoke10), and of the "Reasoning" project, PRIN 2020 LS Programme, Project number 2493 04-11-2021; (ii) by the PNRR - M4C2 - Investimento 1.3, Partenariato Esteso PE00000013 - "FAIR - Future Artificial Intelligence Research" - Spoke 1 "Human-centered AI", funded by the European Commission under the NextGeneration EU program; (iii) by the University of Pisa, in the framework of the PRA_2022_101 project “Decision Support Systems for territorial networks for managing ecosystem services”.

Author information

Contributions

Federico A. Galatolo: public responsibility for the content, concept, design, analysis, writing, and revision of the manuscript. Mario G.C.A. Cimino: public responsibility for the content, concept, design, analysis, writing, and revision of the manuscript.

Corresponding author

Correspondence to Federico A. Galatolo.

Ethics declarations

Conflict of interest

All the authors declare that they have no known conflict of interest that could have appeared to influence the work reported in this paper.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Justification for Eq. 11

Given \(i\) input nodes, \(h\) hidden nodes, and \(o\) output nodes, there are \(i \cdot \binom{h}{l-1} \cdot o\) paths of length \(l\) between input and output nodes. The minimum path length is \(PL_{\min }=1\) and the maximum is \(PL_{\max }=h+1\). The average path length is obtained by summing the path lengths from 1 to \(h+1\), each weighted by the number of paths of that length, and dividing by the total number of paths:

$$\begin{aligned} \widehat{PL}_{\text{avg}} = \frac{\sum _{l=1}^{h+1} i \binom{h}{l-1} o \, l}{\sum _{l=1}^{h+1} i \binom{h}{l-1} o} \end{aligned}$$
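
For instance, with \(i=1\), \(h=2\), and \(o=1\), the path counts are \(\binom{2}{0}=1\), \(\binom{2}{1}=2\), and \(\binom{2}{2}=1\) for lengths 1, 2, and 3, respectively, so that:

$$\begin{aligned} \widehat{PL}_{\text{avg}} = \frac{1 \cdot 1 + 2 \cdot 2 + 1 \cdot 3}{1 + 2 + 1} = \frac{8}{4} = 2 = \frac{h}{2} + 1 \end{aligned}$$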

According to the binomial theorem:

$$\begin{aligned} (x + a)^n = \sum _{k=0}^{n} \binom{n}{k} x^k a^{n-k} \end{aligned}$$

For \(x=1\) and \(a=1\) it follows:

$$\begin{aligned} \sum _{k=0}^{n} \binom{n}{k} = 2^n \end{aligned}$$

Then, the denominator of Eq. 11 can be simplified as follows:

$$\begin{aligned} \sum _{l=1}^{h+1} i \binom{h}{l-1} o = i o \sum _{l=0}^{h} \binom{h}{l} = i o \, 2^h \end{aligned}$$

As for the numerator of Eq. 11, it can be rewritten as follows:

$$\begin{aligned} \begin{aligned} \sum _{l=1}^{h+1} i \binom{h}{l-1} o \, l&= i o \sum _{l=0}^{h} \binom{h}{l} (l+1)\\&= i o \left( \sum _{l=0}^{h} \binom{h}{l} l + \sum _{l=0}^{h} \binom{h}{l} \right) \\&= i o \left( \sum _{l=0}^{h} \binom{h}{l} l + 2^h \right) \end{aligned} \end{aligned}$$

The first addend \(\sum _{l=0}^{h} \binom{h}{l} l\) can be simplified by considering the binomial theorem for \(a=1\):

$$\begin{aligned} (x + 1)^n = \sum _{k=0}^{n} \binom{n}{k} x^k \end{aligned}$$

Taking the derivative of both sides with respect to \(x\):

$$\begin{aligned} \begin{aligned} \frac{\partial }{\partial x}(x + 1)^n&= \frac{\partial }{\partial x}\sum _{k=0}^{n} \binom{n}{k} x^k \\ n(x+1)^{n-1}&= \sum _{k=0}^{n} k \binom{n}{k} x^{k-1} \end{aligned} \end{aligned}$$

For \(x=1\) it follows that:

$$\begin{aligned} \sum _{k=0}^{n} k \binom{n}{k} = n \, 2^{n-1} \end{aligned}$$

i.e.,

$$\begin{aligned} \sum _{l=0}^{h} \binom{h}{l} l = h \, 2^{h-1} \end{aligned}$$
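
For example, for \(h=3\) the identity gives \(\sum _{l=0}^{3} \binom{3}{l}\, l = 0 + 3 + 6 + 3 = 12 = 3 \cdot 2^{2}\).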

As a consequence, Eq. 11 can be rewritten as follows:

$$\begin{aligned} \begin{aligned} \widehat{PL}_{\text{avg}}&= \frac{\sum _{l=1}^{h+1} i \binom{h}{l-1} o \, l}{\sum _{l=1}^{h+1} i \binom{h}{l-1} o} = \frac{i o (h \, 2^{h-1} + 2^h)}{i o \, 2^h} \\&= \frac{2^{h-1}(h+2)}{2^h} = \frac{h+2}{2} = \frac{h}{2} + 1 \end{aligned} \end{aligned}$$
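
As a quick numerical sanity check of this closed form, the weighted average can be evaluated directly with binomial coefficients and compared against \(h/2 + 1\). The snippet below is a standalone Python sketch written for this purpose; it is not part of the released cjunn code, and the function name avg_path_length is introduced here only for illustration.

from math import comb

def avg_path_length(h: int) -> float:
    # Weighted average of the path lengths l = 1, ..., h + 1, where the number of
    # paths of length l per input-output pair is C(h, l - 1); the i*o factors cancel.
    num = sum(comb(h, l - 1) * l for l in range(1, h + 2))
    den = sum(comb(h, l - 1) for l in range(1, h + 2))
    return num / den

for h in range(1, 16):
    assert abs(avg_path_length(h) - (h / 2 + 1)) < 1e-9  # matches h/2 + 1 for every h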

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Galatolo, F.A., Cimino, M.G.C.A. Optimizing sparse topologies via competitive joint unstructured neural networks. Prog Artif Intell 13, 335–349 (2024). https://doi.org/10.1007/s13748-024-00339-8
