Abstract
Over the last decade, deep neural networks have shown great success in machine learning and computer vision. The convolutional neural network (CNN) is currently one of the most successful such networks and has been applied in a wide variety of domains, including pattern recognition, medical diagnosis and signal processing. Despite CNNs' impressive performance, their architectural design remains a significant challenge for researchers and practitioners. Hyperparameter selection is particularly difficult because the search space grows exponentially as the number of layers increases. Moreover, all existing classical and evolutionary pruning methods take an already pre-trained or designed architecture as input; none of them considers pruning during the design process. Yet, to evaluate the quality and potential compactness of any generated architecture, filter pruning should be applied before the architecture is evaluated on the data set to compute the classification error. For instance, a medium-quality architecture in terms of classification could become a very light and accurate architecture after pruning, and vice versa; many such cases are possible, and the number of possibilities is huge. This motivated us to frame the whole process as a bi-level optimization problem in which (1) architecture generation is performed at the upper level (with minimum NB and NNB), while (2) filter pruning is optimized at the lower level. Motivated by the success of evolutionary algorithms (EAs) in bi-level optimization, we adopt the recently proposed co-evolutionary migration-based algorithm (CEMBA) as the search engine for our bi-level architectural optimization problem. The performance of the proposed approach, called Bi-CNN-D-C (bi-level convolutional neural network design and compression), is evaluated on the widely used image classification benchmarks CIFAR-10, CIFAR-100 and ImageNet, and validated through a set of comparative experiments against relevant state-of-the-art architectures.









References
Louati H, Bechikh S, Louati A, Hung C-C, Ben Said L (2021) Deep convolutional neural network architecture design as a bi-level optimization problem. Neurocomputing 439:44–62
Louati A (2020) A hybridization of deep learning techniques to predict and control traffic disturbances. Artif Intell Rev 53(8):5675–5704
Louati A, Louati H, Li Z (2021) Deep learning and case-based reasoning for predictive and adaptive traffic emergency management. J Supercomput 77(5):4389–4418
Bengio Y, Lamblin P, Popovici D, Larochelle H (2006) Greedy layerwise training of deep networks. In: Scholkopf B, Platt JC, Hofmann T (eds) Advances in neural information processing systems 19, Proceedings of the twentieth annual conference on neural information processing systems, pp 153–160
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
Zhen X, Chakraborty R, Singh V (2021) Simpler certified radius maximization by propagating covariances. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp 770–778
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
Lopez-Rincon A, Tonda A, Elati M, Schwander O, Piwowarski B, Gallinari P (2018) Evolutionary optimization of convolutional neural networks for cancer mirna biomarkers classification. Appl Soft Comput 65:91–100
Darwish A, Hassanien AE, Das S (2020) A survey of swarm and evolutionary computing approaches for deep learning. Artif Intell Rev 53(3):1767–1812
Chauhan J, Rajasegaran J, Seneviratne S, Misra A, Seneviratne A (2018) Performance characterization of deep learning models for breathing based authentication on resource-constrained devices. In: IMWUT, pp 1–24
Perenda E, Rajendran S, Bovet G, Pollin S, Zheleva M (2021) Evolutionary optimization of residual neural network architectures for modulation classification. IEEE Trans Cogn Commun Netw. https://doi.org/10.1109/TCCN.2021.3137519
Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: IEEE conference on computer vision and pattern recognition CVPR, pp 1251–1258
Hu H, Peng R, Tai Y-W, Tang C-K (2016) Network trimming: a data-driven neuron pruning approach towards efficient deep architectures. arXiv:1607.03250
Mishra R, Gupta HP, Dutta T (2020) A survey on deep neural network compression: Challenges, overview, and solutions. arXiv:2010.03954
Abd Elaziz M, Dahou A, Abualigah L, Yu L, Alshinwan M, Khasawneh AM, Lu S (2021) Advanced metaheuristic optimization techniques in applications of deep neural networks: a review. Neural Comput Appl 33(21):14079–14099
Ünal HT, Başçiftçi F (2022) Evolutionary design of neural network architectures: a review of three decades of research. Artif Intell Rev 55:1723–1802
Said R, Bechikh S, Louati A, Aldaej A, Said LB (2020) Solving combinatorial multi-objective bi-level optimization problems using multiple populations and migration schemes. IEEE Access 8:141674–141695
Cheung B, Sable C (2011) Hybrid evolution of convolutional networks. In: 2011 10th international conference on machine learning and applications and workshops, pp 293–297
Deng L (2012) The mnist database of handwritten digit images for machine learning research. IEEE Signal Process Mag 29:141–142
Fujino S, Mori N, Matsumoto K (2012) The mnist database of handwritten digit images for machine learning research. IEEE Signal Process Mag 29:141–142
Real E, Moore S, Selle A, Saxena S, Suematsu YL, Tan J, Le Q, Kurakin A (2017) Large-scale evolution of image classifiers. In: 34th international conference on machine learning, pp 2902–2911
Xie S, Girshick R, Dollár P, Tu Z, He K (2017) Aggregated residual transformations for deep neural networks. In: IEEE conference on computer vision and pattern recognition, pp 1492–1500
Mirjalili S (2019) Evolutionary algorithms and neural networks. In: Studies in computational intelligence, ISBN:978-3-319-93025-1
Martín A, Lara-Cabrera R, Fuentes-Hurtado F, Naranjo V, Camacho D (2018) EvoDeep: a new evolutionary approach for automatic deep neural networks parametrisation. J Parallel Distrib Comput 117:180–191
Real E, Aggarwal A, Huang Y, Le QV (2019) Regularized evolution for image classifier architecture search. In: AAAI conference on artificial intelligence, pp 4780–4789
Sun Y, Xue B, Zhang M, Yen GG (2020) Completely automated CNN architecture design based on blocks. IEEE Trans Neural Netw Learn Syst 31(4):1242–1254
Liang J, Guo Q, Yue C, Qu B, Yu K (2018) A self-organizing multi-objective particle swarm optimization algorithm for multimodal multi-objective problems. In: International conference on swarm intelligence, pp 550–560
Mishra R, Gupta HP, Dutta T (2020) A survey on deep neural network compression: challenges, overview, and solutions. arXiv:2010.03954
Fernandes FE Jr, Yen GG (2021) Pruning deep convolutional neural networks architectures with evolution strategy. Inf Sci 552:29–47
Li H, Kadav A, Durdanovic I, Samet H, Graf HP (2016) Pruning filters for efficient ConvNets. arXiv:1608.08710
Luo J, Wu J, Lin W (2017) Thinet: a filter level pruning method for deep neural network compression. In: ICCV, pp 5058–5066
Denton EL, Zaremba W, Bruna J, LeCun Y, Fergus R (2014) Exploiting linear structure within convolutional networks for efficient evaluation. In: NIPS, pp 1269–1277
Hu H, Peng R, Tai Y-W, Tang C-K (2016) Network trimming: a data-driven neuron pruning approach towards efficient deep architectures. arXiv:1607.03250
Han S, Mao H, Dally WJ (2015) Deep compression: compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv:1510.00149
Qin Q, Ren J, Yu J, Wang H, Gao L, Zheng J, Feng Y, Fang J, Wang Z (2018) To compress, or not to compress: characterizing deep learning model compression for embedded inference. In: 2018 IEEE international conference on parallel, pp 729–736
Chauhan J, Rajasegaran J, Seneviratne S, Misra A, Seneviratne A, Lee Y (2018) Performance characterization of deep learning models for breathing-based authentication on resource-constrained devices. Proc ACM Interact Mob Wearable Ubiquitous Technol 2(4):1–24
Jacob B, Kligys S, Chen B, Zhu M, Tang M, Howard A, Adam H, Kalenichenko D (2018) Quantization and training of neural networks for efficient integer-arithmetic-only inference. In: CVPR, pp 2704–2713
Han S, Mao H, Dally WJ (2016) Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. In: ICLR
Schmidhuber J, Heil S (1995) Predictive coding with neural nets: application to text compression. In: NeurIPS, pp 1047–1054
Han S, Mao H, Dally WJ (2015) Deep compression: compressing deep neural networks with pruning, trained quantization and huffman coding, arXiv:1510.00149
Louati A, Louati H, Nusir M, Hardjono B (2020) Multi-agent deep neural networks coupled with LQF-MWM algorithm for traffic control and emergency vehicles guidance. J Ambi Intell Humanized Comput 11(11):5611–5627
Liang F, Tian Z, Dong M, Cheng S, Sun L, Li H, Chen Y, Zhang G (2021) Efficient neural network using pointwise convolution kernels with linear phase constraint. Neurocomputing 423:572–579
Bhattacharya S, Lane ND (2016) Sparsification and separation of deep learning layers for constrained resource inference on wearables. In: SenSys, pp 176–189
Zhou Y, Yen GG, Yi Z (2021) A knee-guided evolutionary algorithm for compressing deep neural networks. IEEE Trans Cybern 51(3):1626–1638
Huynh LN, Lee Y, Balan RK (2017) Deepmon: Mobile gpu-based deep learning framework for continuous vision applications. In: SenSys, pp 82–95
Han S, Mao H, Dally WJ (2016) Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. In: ICLR
Elias P (1975) Universal codeword sets and representations of the integers. IEEE Trans Inf Theory 21(2):194–203
Gallager R, van Voorhis D (1975) Optimal source codes for geometrically distributed integer alphabets (corresp.). IEEE Trans Inf Theory 21(2):228–230
Xie L, Yuille A (2017) Genetic CNN. In: Proceedings of the IEEE international conference on computer vision, pp 1379–1388
Spears WM, De Jong KA (1991) On the virtues of parameterized uniform crossover. In: Fourth international conference on genetic algorithms, pp 230–236
Settle TF, Krauss TP, Ramaswamy K (2006) U.S. Patent No. 7,079,585. Washington, DC: U.S. Patent and Trademark Office
Chakraborty UK, Janikow CZ (2003) An analysis of gray versus binary encoding in genetic search. Inf Sci 156:253–269
Chakraborty UK, Janikow CZ (2003) An analysis of gray versus binary encoding in genetic search. Inf Sci 156(3–4):253–269
Lu Z, Whalen I, Dhebar Y, Deb K, Goodman E, Banzhaf W, Boddeti VN (2019) Multi-criterion evolutionary design of deep convolutional neural networks. arXiv:1912.01369
Dwork C, Feldman V, Hardt M, Pitassi T, Reingold O, Roth A (2015) The reusable holdout: preserving validity in adaptive data analysis. Science 349(6248):636–638
Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324
Shinozaki T, Watanabe S (2015) Structure discovery of deep neural network based on evolutionary algorithms. In: 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 4979–4983
Li H, Kadav A, Durdanovic I, Samet H, Graf HP (2016) Pruning filters for efficient convnets. arXiv:1608.08710
Eiben AE, Smit SK (2011) Parameter tuning for configuring and analyzing evolutionary algorithms. Swarm Evol Comput 1(1):19–31
Lu Z, Whalen I, Dhebar Y, Deb K, Goodman E, Banzhaf W, Boddeti VN (2019) NSGA-Net: neural architecture search using multi-objective genetic algorithm. In: Genetic and evolutionary computation conference, pp 419–427
Cohen JP, Morrison P, Dao L, Roth K, Duong TQ, Ghassemi M (2020) COVID-19 image data collection: prospective predictions are the future. Journal of Machine Learning for Biomedical Imaging (MELBA). https://github.com/ieee8023/covid-chestxray-dataset
Canayaz M, Şehribanoğlu S, Özdağ R, Demir M (2022) COVID-19 diagnosis on CT images with Bayes optimization-based deep neural networks and machine learning algorithms. Neural Comput Appl 34(7):5349–5365
Louati H, Bechikh S, Louati A, Aldaej A, Said LB (2021) Evolutionary optimization of convolutional neural network architecture design for thoracic x-ray image classification. In: Advances and trends in artificial intelligence. Artificial Intelligence Practices, pp 121–132
Louati H, Bechikh S, Louati A, Aldaej A, Said LB (2022) Evolutionary optimization for cnn compression using thoracic x-ray image classification. In: the 34th international conference on industrial, engineering & other applications of applied intelligent systems
Shan F, Gao Y, Wang J, Shi W, Shi N, Han M, Xue Z, Shen D, Shi Y (2020) Lung infection quantification of COVID-19 in CT images with deep learning. arXiv:2003.04655
Sethy PK, Behera SK (2020) Detection of coronavirus disease (COVID-19) based on deep features. Int J Math Eng Manag Sci 5(4):643–651
Butt C, Gill J, Chun D, Babu BA (2020) Deep learning system to screen coronavirus disease 2019 pneumonia. Appl Intell. https://doi.org/10.1007/s10489-020-01714-3
Wang S, Kang B, Ma J, Zeng X, Xiao M, Guo J, Cai M, Yang J, Li Y, Meng X, Xu B (2021) A deep learning algorithm using CT images to screen for corona virus disease (COVID-19). Eur Radiol 31(8):6096–6104
Louati A, Lahyani R, Aldaej A, Aldumaykhi A, Otai S (2022) Price forecasting for real estate using machine learning: a case study on Riyadh city. Concurr Comput Pract Exp 34(6):6748
Louati A, Masmoudi F, Lahyani R (2022) Traffic disturbance mining and feedforward neural network to enhance the immune network control performance. In: Proceedings of seventh international congress on information and communication technology
Banan A, Nasiri A, Taheri-Garavand A (2020) Deep learning-based appearance features extraction for automated carp species identification. Aquacult Eng 89:102053
Shamshirband S, Rabczuk T, Chau K-W (2019) A survey of deep learning techniques: application in wind and solar energy resources. IEEE Access 7:164650–164666
Fan Y, Xu K, Wu H, Zheng Y, Tao B (2020) Spatiotemporal modeling for nonlinear distributed thermal processes based on kl decomposition, mlp and lstm network. IEEE Access 8:25111–25121
Azzouz R, Bechikh S, Ben Said L (2014) A multiple reference point-based evolutionary algorithm for dynamic multi-objective optimization with undetectable changes. In: 2014 IEEE congress on evolutionary computation (CEC)
Kolstad CD (1985) A review of the literature on bi-level mathematical programming. Report Number: LA-10284-MS
Candler WV, Townsley R (1962) A study of the demand for butter in the United Kingdom. Australian J Agricult Econom 6:36–48
Louati A, Lahyani R, Aldaej A, Mellouli R, Nusir M (2021) Mixed integer linear programming models to solve a real-life vehicle routing problem with pickup and delivery. Appl Sci 11(20):9551
Bard JF, Falk JE (1982) An explicit solution to the multi-level programming problem. Comput Oper Res 9(1):77–100
Shimizu K, Kobayashi Y, Muraoka K (1981) Midperipheral fundus involvement in diabetic retinopathy. Ophthalmology 88(7):601–612
Białas S, Garloff J (1985) Convex combinations of stable polynomials. J Franklin Inst 319(3):373–377
Sinha A, Malo P, Frantsev A, Deb K (2013) Multi-objective stackelberg game between a regulating authority and a mining company: a case study in environmental economics. In: 2013 IEEE congress on evolutionary computation, pp 478–485
Sinha A, Bedi S, Deb K (2018) Bilevel optimization based on kriging approximations of lower level optimal value function. In: 2018 IEEE congress on evolutionary computation (CEC), pp 1–8
Sinha A, Malo P, Deb K (2017) A review on bilevel optimization: from classical to evolutionary approaches and applications. IEEE Trans Evol Comput 22(2):276–295
Said R, Elarbi M, Bechikh S, Ben Said L (2021) Solving combinatorial bi-level optimization problems using multiple populations and migration schemes. Oper Res 1–39
Ross PJ (1996) Taguchi techniques for quality engineering: loss function, orthogonal experiments, parameter and tolerance design
Eiben AE, Smit SK (2011) Parameter tuning for configuring and analyzing evolutionary algorithms. Swarm Evol Comput 1(1):19–31
Acknowledgements
The authors thank the Deanship of Scientific Research at Prince Sattam bin Abdulaziz University for supporting this work.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A Manually designed CNN architectures used for compression
The most representative examples of manually designed architectures are VGG16, VGG19, ResNet50, DenseNet50 and ResNet110. The details of the CNN architectures used for comparison with the proposed approach are summarized in Table 10.
Appendix B Bi-level optimization’s main definitions
Most academic and real-world optimization problems involve a single level of optimization. Many problems, however, are structured in two levels and are referred to as bi-level optimization problems (BLOPs) [75]. In such scenarios, an optimization problem is nested within the constraints of an external optimization problem. The external optimization task is called the upper-level, or leader, problem, while the nested internal optimization task is called the lower-level, or follower, problem; the overall two-level problem is referred to as a leader–follower problem or a Stackelberg game [76]. The follower problem acts as a constraint at the upper level, so only a solution that is optimal for the follower problem can be considered a feasible leader candidate.
Definition: Let \(F:\Re ^{n}\times \Re ^{m} \rightarrow \Re\) be the leader (upper-level) objective and \(f:\Re ^{n}\times \Re ^{m} \rightarrow \Re\) the follower (lower-level) objective. Analytically, a BLOP can be stated as follows:
There are two types of variables in a BLOP: the upper-level variables \(x_{u}\) and the lower-level variables \(x_{l}\). The follower problem is optimized with respect to the \(x_{l}\) variables, with the \(x_{u}\) variables acting as fixed parameters. As a result, each \(x_{u}\) defines a new follower problem whose optimal solution is a function of \(x_{u}\) and must be determined. In the leader problem, all variables \((x_{u}, x_{l})\) are considered, but the optimization is performed over \(x_{u}\), since \(x_{l}\) is constrained to be a follower optimum. The formal definition of a BLOP is provided below:
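A standard single-objective formulation is sketched below; the constraint functions \(G_{k}\) and \(g_{j}\) and the sets \(X_{U}\), \(X_{L}\) are generic placeholders following the usual convention in the bi-level literature, not necessarily the exact notation of the paper:

$$\begin{aligned} &\min _{x_{u}\in X_{U},\; x_{l}\in X_{L}} \; F(x_{u},x_{l})\\ &\text {s.t.}\quad G_{k}(x_{u},x_{l})\le 0, \quad k=1,\ldots ,K,\\ &\qquad \; x_{l}\in \mathop {\mathrm {argmin}}\limits _{x_{l}\in X_{L}}\left\{ f(x_{u},x_{l}) \; : \; g_{j}(x_{u},x_{l})\le 0,\; j=1,\ldots ,J \right\} \end{aligned}$$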
BLOPs are intrinsically more complex to solve than single-level problems. It is therefore not surprising that most previous research has focused on the more tractable instances of BLOP, such as problems with favorable properties like linear objectives and constraint functions, convexity or quadraticity [77]. Although the first studies on bi-level optimization date back to the 1970s, the utility of these mathematical programs in modeling hierarchical decision-making and engineering processes was not recognized until the early 1980s, after which BLOPs attracted dedicated research attention. Kolstad [75] compiled the first bibliographic survey on the subject in the mid-1980s.
An example of a single-objective BLOP with two levels is illustrated in [1].
Existing BLOP-solving methods can be divided into two families: (1) classical methods and (2) evolutionary methods. The first family comprises extreme-point-based approaches [78], branch-and-bound [79], penalty function methods [80], complementary pivoting [81] and trust-region methods [82]. These strategies share the disadvantage of relying heavily on the mathematical properties of the BLOP at hand. The second family consists of metaheuristics, mostly evolutionary algorithms (EAs). Several EAs have recently proved effective on such problems thanks to their insensitivity to the mathematical properties of the problem and their ability to handle large problem instances by producing satisfactory solutions in a reasonable amount of time; notable examples include [80, 81].
Appendix C Main principle of CEMBA
Since for each upper-level architecture there exists a whole search space of possible filter pruning decisions, the joint neural architecture search and (filter) compression are framed as a bi-level optimization problem. Motivated by the recent emergence of the field of EBO (evolutionary bi-level optimization) and the interesting results it has achieved in many application fields, we decided to use an evolutionary algorithm as the search engine for our bi-level optimization problem. The main difficulty faced in EBO is its high computational cost: the fitness evaluation of each upper-level solution requires running a whole lower-level evolutionary process to approximate the corresponding optimal lower-level solution. The literature contains many EBO algorithms [83], but most of them focus on problems with continuous variables (using approximation techniques and gradient information); far fewer algorithms have been designed for the discrete case. Examples of discrete EBO algorithms are NBEA, BLMA, CODBA, CODCRO and CEMBA [84]. Since most algorithms address the continuous situation, CEMBA has been demonstrated to be among the most effective bi-level EAs for dealing with discrete scenarios [17]. Every upper-level solution is evaluated in two stages: first, the upper-level variables are passed to the lower level; second, the indicator-based multi-objective local search converges toward the solution with the maximum marginal contribution in terms of the multi-objective quality indicator and sends it back to the upper level to complete the evaluation of the solution under consideration. To summarize how it works, we go over the search processes of its upper and lower levels (a simplified code sketch of the whole bi-level loop is given after the following list):
-
Upper and lower population initialization Create the upper-level population and the lower-level population from scratch. Two starting populations are obtained by applying the discrete space decomposition method twice. The goal of using a decomposition approach is to produce uniform coverage of the decision space and, as far as possible, a collection of solutions that are evenly dispersed over each level's decision space.
-
Lower-level fitness assignment Assessing an upper-level solution in a BLOP necessitates executing an entire lower-level method, which is the BLOP's fundamental challenge. Consequently, we decompose the problem and solve it with two populations, one per level. The lower-level algorithm of each lower population uses the upper solutions of the matching upper population to evaluate the lower-level solutions.
-
Local search procedure The local search is applied to each lower-level population using the IBMOLS principle for the lower-level method. We begin by computing the normalized values of the objective functions. Then, for each lower-level solution, we generate a neighborhood and compute its fitness value using an indicator I and the normalized objective values. The fitness values are updated, the worst solution is removed, and the fitness values are updated again. Note that neighborhood generation halts in one of two situations: (1) when the entire neighborhood of a solution has been examined, or (2) when a neighboring solution that improves on I is found. Once all lower-level members have been visited, the entire local search procedure ends.
-
Best indicator-contribution lower-level solution determination Because evaluating the upper-level population's leader solutions necessitates approximating the follower solution of each matching lower-level population, the lower-level solutions are compared and the one with the best indicator contribution is associated with the corresponding upper-level member.
-
Upper-level indicator-based procedure After obtaining the lower-level solutions with the best indicator contribution for the follower problem, each upper-level population runs its algorithm based on IBEA. Specifically, the individual with the lowest fitness value is identified and removed, and the fitness values of the remaining individuals are updated, until the stopping criterion is reached. Mating selection and variation are then applied. Note that using IBEA helps the algorithm approximate the best upper-level front.
-
Migration strategy (every \(\alpha\) generations) After a given number of generations, a migration strategy is applied. The parameter \(\beta\) is used to select a set of \(\beta\) solutions in the follower objective space, and the migration step exchanges this pre-selected collection of solutions between the co-evolving populations.
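The following Python sketch illustrates, at a very high level, how the nested loop described above could be organized. All names (random_architecture, lower_fitness, the population sizes, etc.) are hypothetical placeholders, not the authors' implementation; in particular, the fitness functions stand in for the actual training, pruning and validation steps of Bi-CNN-D-C, and the migration step is only indicated by a comment.

```python
# High-level sketch of a bi-level co-evolutionary loop (illustrative only).
# Upper level: architecture generation; lower level: filter pruning.
import random

POP_SIZE, UPPER_GENS, LOWER_GENS, ALPHA, BETA = 10, 20, 5, 5, 3

def random_architecture():
    # Upper-level variables (placeholders): number of blocks and nodes per block.
    return {"blocks": random.randint(2, 8), "nodes": random.randint(2, 6)}

def random_pruning_mask(arch):
    # Lower-level variables: one keep/prune bit per filter (size is illustrative).
    return [random.random() > 0.5 for _ in range(arch["blocks"] * arch["nodes"] * 8)]

def lower_fitness(arch, mask):
    # Placeholder for: prune the filters, fine-tune briefly, measure error and size.
    return (random.random(), sum(mask))

def local_search(arch, mask):
    # IBMOLS-flavored step: visit neighbors (one bit flip) until one improves.
    best_fit = lower_fitness(arch, mask)
    for i in range(len(mask)):
        neighbor = mask[:]
        neighbor[i] = not neighbor[i]
        fit = lower_fitness(arch, neighbor)
        if fit < best_fit:                # (error, size) compared lexicographically
            return neighbor, fit
    return mask, best_fit

def solve_lower_level(arch):
    # Approximate the follower optimum for one fixed upper-level architecture.
    best, best_fit = None, (float("inf"), float("inf"))
    pop = [random_pruning_mask(arch) for _ in range(POP_SIZE)]
    for _ in range(LOWER_GENS):
        for mask in pop:
            cand, fit = local_search(arch, mask)
            if fit < best_fit:
                best, best_fit = cand, fit
    return best, best_fit

def upper_fitness(arch):
    # Each upper-level evaluation embeds an entire lower-level search.
    _, (error, _) = solve_lower_level(arch)
    return error

upper_pop = [random_architecture() for _ in range(POP_SIZE)]
for gen in range(UPPER_GENS):
    upper_pop.sort(key=upper_fitness)
    # Keep the best half and refill with fresh architectures (variation placeholder).
    upper_pop = upper_pop[: POP_SIZE // 2] + [
        random_architecture() for _ in range(POP_SIZE - POP_SIZE // 2)
    ]
    if gen % ALPHA == 0:
        migrants = upper_pop[:BETA]
        # In CEMBA, these BETA solutions would migrate between co-evolving populations.

print("best architecture found:", upper_pop[0])
```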
Appendix D Tuning of parameters
The Taguchi method [85] is a more sophisticated variant of the trial-and-error approach [86]. To further justify and verify the proposed parameter tuning values, we applied the Taguchi method, in which the signal-to-noise ratio (SNR) is calculated as follows:
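A smaller-the-better form of the SNR is assumed here, with \(y_{i}\) denoting the observed response (upper-level fitness, i.e., classification error) of run \(i\):

$$\mathrm {SNR} = -10\,\log _{10}\left( \frac{1}{N}\sum _{i=1}^{N} y_{i}^{2}\right)$$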
where N represents the number of performed runs. The SNR reflects both the variability and the mean of the experimental data. The parameters considered for tuning are: (1) population size (Pop. size), (2) upper-level generation number (UGen. nb) and (3) lower-level generation number (LGen. nb). The levels considered for each parameter correspond to the orthogonal array L27(3\(^{3}\)), which involves 27 experiments, 3 parameters and 3 levels per parameter. Figure 11 displays the SNR results obtained for IB-CEMBA. Moreover, Fig. 12 displays the results computed for Bi-CNN-D-C in terms of mean upper-level fitness values in the Taguchi experimental analysis; these mean values confirm the optimal levels identified with the SNR.
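As an illustration of how the SNR can be aggregated per parameter level, the short sketch below averages the SNR over the runs of each orthogonal-array experiment and picks, for each parameter, the level with the highest mean SNR. The array rows and error values are illustrative placeholders, not the paper's data.

```python
import math
from collections import defaultdict

# One row per Taguchi experiment: chosen level of each parameter plus the
# upper-level fitness (classification error) observed over N repeated runs.
experiments = [
    {"pop_size": 20, "ugen": 10, "lgen": 5,  "errors": [0.12, 0.13, 0.11]},
    {"pop_size": 30, "ugen": 20, "lgen": 10, "errors": [0.10, 0.09, 0.11]},
    # ... remaining rows of the L27 orthogonal array
]

def snr_smaller_is_better(errors):
    # Smaller-the-better SNR over the N runs of one experiment.
    n = len(errors)
    return -10.0 * math.log10(sum(e * e for e in errors) / n)

# Average SNR per level of each parameter; the level with the highest mean
# SNR is retained as the tuned value.
for param in ("pop_size", "ugen", "lgen"):
    per_level = defaultdict(list)
    for exp in experiments:
        per_level[exp[param]].append(snr_smaller_is_better(exp["errors"]))
    best_level = max(per_level, key=lambda lvl: sum(per_level[lvl]) / len(per_level[lvl]))
    print(param, "-> best level:", best_level)
```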
Cite this article
Louati, H., Bechikh, S., Louati, A. et al. Joint design and compression of convolutional neural networks as a Bi-level optimization problem. Neural Comput & Applic 34, 15007–15029 (2022). https://doi.org/10.1007/s00521-022-07331-0