Abstract
Multilayer neural architectures with complete bipartite topologies between layers incur high training time and memory requirements. Solid evidence suggests that not every connection contributes to performance, which has motivated network sparsification. We draw inspiration from the topology of real biological neural networks, which are scale-free. Departing from the usual complete bipartite topology between layers, we start training from a structured sparse topology known in network science (e.g., scale-free) and end training in a structured sparse topology as well (e.g., scale-free), applying informed link-rewiring methods to maintain these sparse topologies. The number of trainable parameters is thus reduced, directly lowering both training time and memory requirements. We design several variants of this concept (SF2SFrand, SF2SFba, SF2SF5, SF2SW, and SW2SW), each treating the neural network topology as scale-free or small-world at the start and end of training. In our experiments, we cut 30% of the network's links in every epoch and prescribe how they are replaced. Our best-performing method, which starts from a scale-free topology and produces a scale-free-like topology (SF2SFrand), reduces training time without sacrificing accuracy, while also cutting the memory required to store the network.
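To make the rewiring scheme concrete, the following is a minimal Python/NumPy sketch of the SF2SFrand idea as described above: a sparse, scale-free connectivity mask between two layers, with 30% of the links cut and regrown in every epoch. The scale_free_mask and rewire helpers, the magnitude-based cut criterion, and the preferential-attachment regrowth rule are illustrative assumptions, not the authors' exact algorithm.

import numpy as np

rng = np.random.default_rng(0)

def scale_free_mask(n_in, n_out, n_links):
    # Sample a sparse boolean mask with a heavy-tailed degree distribution
    # by attaching each new link preferentially to already-busy neurons.
    mask = np.zeros((n_in, n_out), dtype=bool)
    deg_in = np.ones(n_in)    # +1 smoothing keeps every neuron reachable
    deg_out = np.ones(n_out)
    while mask.sum() < n_links:
        i = rng.choice(n_in, p=deg_in / deg_in.sum())
        j = rng.choice(n_out, p=deg_out / deg_out.sum())
        if not mask[i, j]:
            mask[i, j] = True
            deg_in[i] += 1
            deg_out[j] += 1
    return mask

def rewire(mask, weights, frac=0.30):
    # Cut the weakest `frac` of the existing links, then regrow as many
    # new links via the same preferential-attachment rule, so the
    # topology stays scale-free-like across epochs.
    live = np.argwhere(mask)                  # row-major list of links
    k = int(frac * len(live))
    order = np.abs(weights[mask]).argsort()   # same row-major order
    for i, j in live[order[:k]]:              # drop the k smallest weights
        mask[i, j] = False
        weights[i, j] = 0.0
    deg_in = mask.sum(axis=1) + 1.0
    deg_out = mask.sum(axis=0) + 1.0
    grown = 0
    while grown < k:
        i = rng.choice(mask.shape[0], p=deg_in / deg_in.sum())
        j = rng.choice(mask.shape[1], p=deg_out / deg_out.sum())
        if not mask[i, j]:
            mask[i, j] = True
            weights[i, j] = rng.normal(scale=0.01)  # fresh small weight
            deg_in[i] += 1
            deg_out[j] += 1
            grown += 1
    return mask, weights

During training, gradient updates apply only to surviving links (for example, by multiplying the weight matrix by the mask after every optimizer step), so the layer remains sparse from the first epoch to the last.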
Acknowledgements
Part of this work was done in the context of the BSc theses (2019) of the first two authors at the University of Thessaly, entitled “Neural Network Training Techniques Based on Topology Sparsification” and “Speeding up Neural Network Training via Topology Sparsification”.
Ethics declarations
The authors declare that they have no known competing financial or non-financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The research work is supported by the Hellenic Foundation for Research and Innovation (HFRI) under the 3rd Call for HFRI PhD Fellowships (Fellowship Number: 5631).
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Fragkou, E., Koultouki, M. & Katsaros, D. Model reduction of feed forward neural networks for resource-constrained devices. Appl Intell 53, 14102–14127 (2023). https://doi.org/10.1007/s10489-022-04195-8