
Joint design and compression of convolutional neural networks as a Bi-level optimization problem

  • Original Article
  • Published:
Neural Computing and Applications

Abstract

Over the last decade, deep neural networks have shown great success in the fields of machine learning and computer vision. Currently, the CNN (convolutional neural network) is one of the most successful networks, having been applied in a wide variety of application domains, including pattern recognition, medical diagnosis and signal processing. Despite CNNs’ impressive performance, their architectural design remains a significant challenge for researchers and practitioners. Selecting hyperparameters is extremely important for these networks because the search space grows exponentially with the number of layers. In fact, all existing classical and evolutionary pruning methods take as input an already pre-trained or designed architecture; none of them takes pruning into account during the design process. However, to evaluate the quality and potential compactness of any generated architecture, filter pruning should be applied before the architecture is evaluated on the data set to compute the classification error. For instance, a medium-quality architecture in terms of classification could become a very light and accurate architecture after pruning, and vice versa. Many cases are possible, and the number of possibilities is huge. This motivated us to frame the whole process as a bi-level optimization problem in which (1) architecture generation is performed at the upper level (with minimum NB and NNB), while (2) filter pruning optimization is performed at the lower level. Motivated by the success of evolutionary algorithms (EAs) in bi-level optimization, we use the recently proposed co-evolutionary migration-based algorithm (CEMBA) as a search engine to address our bi-level architectural optimization problem. The performance of our suggested technique, called Bi-CNN-D-C (bi-level convolutional neural network design and compression), is evaluated on the widely used image classification benchmark data sets CIFAR-10, CIFAR-100 and ImageNet. Our proposed approach is validated by means of a set of comparative experiments against relevant state-of-the-art architectures.


References

  1. Louati H, Bechikh S, Louati A, Hung C-C, Ben Said L (2021) Deep convolutional neural network architecture design as a bi-level optimization problem. Neurocomputing 439:44–62


  2. Louati A (2020) A hybridization of deep learning techniques to predict and control traffic disturbances. Artif Intell Rev 53(8):5675–5704


  3. Louati A, Louati H, Li Z (2021) Deep learning and case-based reasoning for predictive and adaptive traffic emergency management. J Supercomput 77(5):4389–4418


  4. Bengio Y, Lamblin P, Popovici D, Larochelle H (2006) Greedy layer-wise training of deep networks. In: Schölkopf B, Platt JC, Hofmann T (eds) Advances in neural information processing systems 19, Proceedings of the twentieth annual conference on neural information processing systems, pp 153–160

  5. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444

  6. Zhen X, Chakraborty R, Singh V (2021) Simpler certified radius maximization by propagating covariances. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp 770–778

  7. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556

  8. Lopez-Rincon A, Tonda A, Elati M, Schwander O, Piwowarski B, Gallinari P (2018) Evolutionary optimization of convolutional neural networks for cancer mirna biomarkers classification. Appl Soft Comput 65:91–100


  9. Darwish A, Hassanien AE, Das S (2020) A survey of swarm and evolutionary computing approaches for deep learning. Artif Intell Rev 53(3):1767–1812


  10. Chauhan J, Rajasegaran J, Seneviratne S, Misra A, Seneviratne A (2018) Performance characterization of deep learning models for breathing based authentication on resource-constrained devices. In: IMWUT, pp 1–24

  11. Perenda E, Rajendran S, Bovet G, Pollin S, Zheleva M (2021) Evolutionary optimization of residual neural network architectures for modulation classification. IEEE Trans Cogn Commun Netw. https://doi.org/10.1109/TCCN.2021.3137519


  12. Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: IEEE conference on computer vision and pattern recognition CVPR, pp 1251–1258

  13. Hu H, Peng R, Tai Y-W, Tang C-K (2016) Network trimming: a data-driven neuron pruning approach towards efficient deep architectures. arXiv:1607.03250

  14. Mishra R, Gupta HP, Dutta T (2020) A survey on deep neural network compression: Challenges, overview, and solutions. arXiv:2010.03954

  15. Abd Elaziz M, Dahou A, Abualigah L, Yu L, Alshinwan M, Khasawneh AM, Lu S (2021) Advanced metaheuristic optimization techniques in applications of deep neural networks: a review. Neural Comput Appl 33(21):14079–14099


  16. Ünal HT, Başçiftçi F (2022) Evolutionary design of neural network architectures: a review of three decades of research. Artif Intell Rev 55:1723–1802


  17. Said R, Bechikh S, Louati A, Aldaej A, Said LB (2020) Solving combinatorial multi-objective bi-level optimization problems using multiple populations and migration schemes. IEEE Access 8:141674–141695


  18. Cheung B, Sable C (2011) Hybrid evolution of convolutional networks. In: 2011 10th international conference on machine learning and applications and workshops, pp 293–297

  19. Deng L (2012) The mnist database of handwritten digit images for machine learning research. IEEE Signal Process Mag 29:141–142


  20. Fujino S, Mori N, Matsumoto K (2012) The mnist database of handwritten digit images for machine learning research. IEEE Signal Process Mag 29:141–142


  21. Real E, Moore S, Selle A, Saxena S, Suematsu YL, Tan J, Le Q, Kurakin A (2017) Large-scale evolution of image classifiers. In: 34th international conference on machine learning, pp 2902–2911

  22. Xie S, Girshick R, Dollar P, Tu Z, He K (2017) Aggregated residual transformations for deep neural networks. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 1492–1500

  23. Mirjalili S (2019) Evolutionary algorithms and neural networks. In: Studies in computational intelligence, ISBN:978-3-319-93025-1

  24. Martín A, Lara-Cabrera R, Fuentes-Hurtado F, Naranjo V, Camacho D (2018) EvoDeep: a new evolutionary approach for automatic deep neural networks parametrisation. J Parallel Distrib Comput 117:180–191


  25. Real E, Aggarwal A, Huang Y, Le QV (2019) Regularized evolution for image classifier architecture search. In: AAAI conference on artificial intelligence, pp 4780–4789

  26. Sun Y, Xue B, Zhang M, Yen GG (2020) Completely automated CNN architecture design based on blocks. IEEE Trans Neural Netw Learn Syst 31(4):1242–1254


  27. Liang J, Guo Q, Yue C, Qu B, Yu K (2018) A self-organizing multiobjective particle swarm optimization algorithm for multimodal multi-objective problems. In: International conference on swarm intelligence, pp 550–560

  28. Mishra R, Gupta HP, Dutta T (2020) A survey on deep neural network compression: challenges, overview, and solutions. arXiv:2010.03954

  29. Fernandes FE Jr, Yen GG (2021) Pruning deep convolutional neural networks architectures with evolution strategy. Inf Sci 552:29–47

  30. Li H, Kadav A, Durdanovic I, Samet H, Graf HP (2016) Pruning filters for efficient convnets. arXiv:1608.08710

  31. Luo J, Wu J, Lin W (2017) Thinet: a filter level pruning method for deep neural network compression. In: ICCV, pp 5058–5066

  32. Denton EL, Zaremba W, Bruna J, LeCun Y, Fergus R (2014) Exploiting linear structure within convolutional networks for efficient evaluation. In: NIPS, pp 1269–1277

  33. Hu H, Peng R, Tai YW, Tang CK (2016) Network trimming: a data-driven neuron pruning approach towards efficient deep architectures. arXiv:1607.03250

  34. Han S, Mao H, Dally WJ (2015) Deep compression: compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv:1510.00149

  35. Qin Q, Ren J, Yu J, Wang H, Gao L, Zheng J, Feng Y, Fang J, Wang Z (2018) To compress, or not to compress: characterizing deep learning model compression for embedded inference. In: 2018 IEEE international conference on parallel, pp 729–736

  36. Chauhan J, Rajasegaran J, Seneviratne S, Misra A, Seneviratne A, Lee Y (2018) Performance characterization of deep learning models for breathing-based authentication on resource-constrained devices. Proc ACM Interact Mob Wearable Ubiquitous Technol 2(4):1–24


  37. Jacob B, Kligys S, Chen B, Zhu M, Tang M, Howard A, Adam H, Kalenichenko D (2018) Quantization and training of neural networks for efficient integer-arithmetic-only inference. In: CVPR, pp 2704–2713

  38. Han S, Mao H, Dally, WJ (2016) Deep compression: Compressing deep neural networks with pruning, trained quantization & huffman coding. In: ICLR

  39. Schmidhuber J, Heil S (1995) Predictive coding with neural nets: application to text compression. In: NeurIPS, pp 1047–1054

  40. Han S, Mao H, Dally WJ (2015) Deep compression: compressing deep neural networks with pruning, trained quantization and huffman coding, arXiv:1510.00149

  41. Louati A, Louati H, Nusir M, Hardjono B (2020) Multi-agent deep neural networks coupled with LQF-MWM algorithm for traffic control and emergency vehicles guidance. J Ambi Intell Humanized Comput 11(11):5611–5627


  42. Liang F, Tian Z, Dong M, Cheng S, Sun L, Li H, Chen Y, Zhang G (2021) Efficient neural network using pointwise convolution kernels with linear phase constraint. Neurocomputing 423:572–579


  43. Bhattacharya S, Lane ND (2016) Sparsification and separation of deep learning layers for constrained resource inference on wearables. In: SenSys, pp 176–189

  44. Zhou Y, Yen GG, Yi Z (2021) A knee-guided evolutionary algorithm for compressing deep neural networks. IEEE Trans Cybern 51(3):1626–1638


  45. Huynh LN, Lee Y, Balan RK (2017) Deepmon: Mobile gpu-based deep learning framework for continuous vision applications. In: SenSys, pp 82–95

  46. Han S, Mao H, Dally WJ (2016) Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. In: ICLR

  47. Elias P (1975) Universal codeword sets and representations of the integers. IEEE Trans Inf Theory 21(2):194–203

  48. Gallager R, van Voorhis D (1975) Optimal source codes for geometrically distributed integer alphabets (corresp.). IEEE Trans Inf Theory 21(2):228–230


  49. Xie L, Yuille A (2017) Genetic CNN. In: Proceedings of the IEEE international conference on computer vision, pp 1379–1388

  50. Spears WM, De Jong KA (1991) On the virtues of parameterized uniform crossover. In: Fourth international conference on genetic algorithms, pp 230–236

  51. Settle TF, Krauss TP, Ramaswamy K (2006) U.S. Patent No. 7,079,585. Washington, DC: U.S. Patent and Trademark Office

  52. Chakraborty UK, Janikow CZ (2003) An analysis of gray versus binary encoding in genetic search. Inf Sci 156:253–269

  53. Chakraborty UK, Janikow CZ (2003) An analysis of gray versus binary encoding in genetic search. Inf Sci 156(3–4):253–269


  54. Lu Z, Whalen I, Dhebar Y, Deb K, Goodman E, Banzhaf W, Boddeti VN (2019) Multi-criterion evolutionary design of deep convolutional neural networks. arXiv:1912.01369

  55. Dwork C, Feldman V, Hardt M, Pitassi T, Reingold O, Roth A (2015) The reusable holdout: preserving validity in adaptive data analysis. Science 349(6248):636–638

  56. Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324


  57. Shinozaki T, Watanabe S (2015) Structure discovery of deep neural network based on evolutionary algorithms. In: 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 4979–4983

  58. Li H, Kadav A, Durdanovic I, Samet H, Graf HP (2016) Pruning filters for efficient convnets. arXiv:1608.08710

  59. Eiben AE, Smit SK (2011) Parameter tuning for configuring and analyzing evolutionary algorithms. Swarm Evol Comput 1(1):19–31


  60. Lu Z, Whalen I, Dhebar Y, Deb K, Goodman E, Banzhaf W, Boddeti VN (2019) NSGA-Net: neural architecture search using multi-objective genetic algorithm. In: Genetic and evolutionary computation conference, pp 419–427

  61. Cohen JP, Morrison P, Dao L, Roth K, Duong TQ, Ghassemi M (2020) COVID-19 image data collection: prospective predictions are the future. Journal of Machine Learning for Biomedical Imaging (MELBA). https://github.com/ieee8023/covid-chestxray-dataset

  62. Canayaz M, Şehribanoğlu S, Özdağ R, Demir M (2022) COVID-19 diagnosis on CT images with Bayes optimization-based deep neural networks and machine learning algorithms. Neural Comput Appl 34(7):5349–5365


  63. Louati H, Bechikh S, Louati A, Aldaej A, Said LB (2021) Evolutionary optimization of convolutional neural network architecture design for thoracic x-ray image classification. In: Advances and trends in artificial intelligence. Artificial Intelligence Practices, pp 121–132

  64. Louati H, Bechikh S, Louati A, Aldaej A, Said LB (2022) Evolutionary optimization for cnn compression using thoracic x-ray image classification. In: the 34th international conference on industrial, engineering & other applications of applied intelligent systems

  65. Shan F, Gao Y, Wang J, Shi W, Shi N, Han M, Xue Z, Shen D, Shi Y (2020) Lung infection quantification of COVID-19 in CT images with deep learning. arXiv:2003.04655

  66. Sethy PK, Behera SK (2020) Detection of coronavirus disease (COVID-19) based on deep features. Int J Math Eng Manag Sci 5(4):643–651


  67. Butt C, Gill J, Chun D, Babu BA (2020) Deep learning system to screen coronavirus disease 2019 pneumonia. Appl Intell. https://doi.org/10.1007/s10489-020-01714-3


  68. Wang S, Kang B, Ma J, Zeng X, Xiao M, Guo J, Cai M, Yang J, Li Y, Meng X, Xu B (2021) A deep learning algorithm using CT images to screen for corona virus disease (COVID-19). Eur Radiol 31(8):6096–6104


  69. Louati A, Lahyani R, Aldaej A, Aldumaykhi A, Otai S (2022) Price forecasting for real estate using machine learning: a case study on Riyadh city. Concurr Comput Pract Exp 34(6):6748

  70. Louati A, Masmoudi F, Lahyani R (2022) Traffic disturbance mining and feedforward neural network to enhance the immune network control performance. In: Proceedings of seventh international congress on information and communication technology

  71. Banan A, Nasiri A, Taheri-Garavand A (2020) Deep learning-based appearance features extraction for automated carp species identification. Aquacult Eng 89:102053


  72. Shamshirband S, Rabczuk T, Chau K-W (2019) A survey of deep learning techniques: application in wind and solar energy resources. IEEE Access 7:164650–164666


  73. Fan Y, Xu K, Wu H, Zheng Y, Tao B (2020) Spatiotemporal modeling for nonlinear distributed thermal processes based on KL decomposition, MLP and LSTM network. IEEE Access 8:25111–25121

  74. Azzouz R, Bechikh S, Ben Said L (2014) A multiple reference point-based evolutionary algorithm for dynamic multi-objective optimization with undetectable changes. In: 2014 IEEE congress on evolutionary computation (CEC)

  75. Kolstad CD (1985) A review of the literature on bi-level mathematical programming. Report Number: LA-10284-MS

  76. Candler WV, Townsley R (1962) A study of the demand for butter in the United Kingdom. Australian J Agricult Econom 6:36–48

  77. Louati A, Lahyani R, Aldaej A, Mellouli R, Nusir M (2021) Mixed integer linear programming models to solve a real-life vehicle routing problem with pickup and delivery. Appl Sci 11(20):9551


  78. Bard JF, Falk JE (1982) An explicit solution to the multi-level programming problem. Comput Oper Res 9(1):77–100


  79. Shimizu K, Kobayashi Y, Muraoka K (1981) Midperipheral fundus involvement in diabetic retinopathy. Ophthalmology 88(7):601–612


  80. Białas S, Garloff J (1985) Convex combinations of stable polynomials. J Franklin Inst 319(3):373–377


  81. Sinha A, Malo P, Frantsev A, Deb K (2013) Multi-objective stackelberg game between a regulating authority and a mining company: a case study in environmental economics. In: 2013 IEEE congress on evolutionary computation, pp 478–485

  82. Sinha A, Bedi S, Deb K (2018) Bilevel optimization based on kriging approximations of lower level optimal value function. In: 2018 IEEE congress on evolutionary computation (CEC), pp 1–8

  83. Sinha A, Malo P, Deb K (2017) A review on bilevel optimization: from classical to evolutionary approaches and applications. IEEE Trans Evol Comput 22(2):276–295


  84. Said R, Elarbi M, Bechikh S, Ben Said L (2021) Solving combinatorial bi-level optimization problems using multiple populations and migration schemes. Oper Res 1–39

  85. Ross PJ (1996) Taguchi techniques for quality engineering: loss function, orthogonal experiments, parameter and tolerance design

  86. Eiben AE, Smit SK (2011) Parameter tuning for configuring and analyzing evolutionary algorithms. Swarm Evol Comput 1(1):19–31



Acknowledgements

The authors thank the Deanship of Scientific Research at Prince Sattam bin Abdulaziz University for supporting this work.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ali Louati.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A Manually designed CNN architectures used for compression

The most representative examples of manually designed architectures are VGG16, VGG19, ResNet50, DenseNet50 and ResNet110. The details of the CNN architectures used for comparison with the proposed approach are summarized in Table 10.

Table 10 Overview of the CNN architectures used for comparison with the suggested approach

Appendix B Bi-level optimization’s main definitions

In academic and real-world optimization, most problems involve a single level of optimization. Many problems, however, are structured over two levels and are referred to as bi-level optimization problems (BLOPs) [75]. In such scenarios, an optimization problem is nested within the constraints of an outer optimization problem. The outer task is known as the upper-level or leader problem, while the nested inner task is known as the lower-level or follower problem; the resulting two-level problem is referred to as a leader–follower problem or a Stackelberg game [76]. The follower problem acts as a constraint at the upper level, so only a solution that is optimal for the follower problem can be retained as a leader candidate.

Definition: Assuming \(L:\Re ^{n}\times \Re ^{n} \rightarrow \Re\) to be the leader objective function and \(f:\Re ^{n}\times \Re ^{n} \rightarrow \Re\) to be the follower one, a BLOP can be stated analytically as follows:

$$\begin{aligned}&\mathrm{Min}_{x_{u} \in X_{U},\, x_{l}\in X_{L}}\; L(x_{u},x_{l}) \\&\text {subject to } \left\{ \begin{array}{l} x_{l}\in \mathrm{ArgMin} \left\{ f(x_{u},x_{l}) \mid g_{j}(x_{u},x_{l})\le 0,\ j=1,\ldots ,J \right\} \\ G_{k}(x_{u},x_{l})\le 0,\ k=1,\ldots ,K. \end{array}\right. \end{aligned}$$
(B1)

There are two types of variables in a BLOP: upper-level variables \(x_{u}\) and lower-level variables \(x_{l}\). The follower problem is optimized with respect to the \(x_{l}\) variables, with the \(x_{u}\) variables acting as fixed parameters. As a result, each \(x_{u}\) defines a new follower problem, whose optimal solution is a function of \(x_{u}\) and must be determined. In the leader problem, both \(x_{u}\) and \(x_{l}\) are considered, but \(x_{l}\) is not free: it must be an optimal response to the follower problem induced by \(x_{u}\).
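To make the leader–follower structure of (B1) concrete, the following minimal Python sketch enumerates a toy discrete BLOP by brute force: for every upper-level decision, the follower problem is solved first, and only the follower's optimal response may be paired with that decision at the upper level. The objective functions, constraints and decision spaces are invented for illustration and are not taken from the paper.

```python
# Illustrative sketch only: a tiny discrete BLOP solved by exhaustive nested search.
# The objectives, constraints and decision spaces are hypothetical toy choices that
# merely mirror the leader/follower structure of Eq. (B1).

X_U = [0, 1, 2, 3]        # upper-level (leader) decision space
X_L = [0, 1, 2, 3, 4]     # lower-level (follower) decision space

def f(xu, xl):            # follower objective (toy)
    return (xl - xu) ** 2 + xl

def L(xu, xl):            # leader objective (toy)
    return xu + 2 * xl

def follower_argmin(xu):
    """For a fixed xu, return the follower's optimal response xl*."""
    feasible = [xl for xl in X_L if xl <= xu + 2]   # toy follower constraint g(xu, xl) <= 0
    return min(feasible, key=lambda xl: f(xu, xl))

# The leader may only pair each xu with the follower's optimal response xl*(xu).
candidates = [(xu, follower_argmin(xu)) for xu in X_U if xu >= 1]   # toy leader constraint
best = min(candidates, key=lambda pair: L(*pair))
print("leader optimum (xu, xl*):", best, "with L =", L(*best))
```

The nesting is the key point: the follower's argmin must be recomputed for every candidate leader decision, which is what makes BLOPs so much more expensive than single-level problems.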

BLOPs are intrinsically more complex to solve than single-level problems. It is therefore not surprising that most previous research has focused on the more tractable cases of BLOP, such as problems with convenient properties like linear objectives, linear constraint functions, convexity or quadraticity [77]. Although the first studies on bi-level optimization date from the 1970s, the utility of these mathematical programs for representing hierarchical decision-making processes in engineering was not recognized until the early 1980s. Since then, BLOPs have received special attention from researchers. Kolstad [75] compiled the first bibliographic survey on the subject in the mid-1980s (1985).

Fig. 10 An example of a single-objective BLOP with two levels [1]

Existing BLOP-solving methods can be divided into two groups: (1) classical methods and (2) evolutionary methods. The first family comprises extreme-point-based approaches [78], branch-and-bound [79], penalty function methods [80], complementary pivoting [81] and trust-region methods [82]. These strategies share the disadvantage of being strongly dependent on the mathematical properties of the BLOP at hand. The second family contains metaheuristics, mostly evolutionary algorithms (EAs). Several EAs have recently proved effective on such problems thanks to their insensitivity to the mathematical properties of the problem and their capacity to handle large problem instances by delivering satisfactory solutions in a reasonable amount of time; notable examples include [80, 81].

Appendix C Main principle of CEMBA

As, for each upper-level architecture, there exists a whole search space of possible filter pruning decisions, the joint neural architecture search and filter pruning task is framed as a bi-level optimization problem. Motivated by the recent emergence of evolutionary bi-level optimization (EBO) and the interesting results it has achieved in many application fields, we use an evolutionary algorithm as the search engine for our bi-level optimization problem. The main difficulty in EBO is its high computational cost: the fitness evaluation of each upper-level solution requires running an entire lower-level evolutionary process to approximate the corresponding optimal lower-level solution. The literature contains many EBO algorithms [83], but most of them target problems with continuous variables (using approximation techniques and gradient information); the number of algorithms designed for the discrete case is much smaller. Examples of discrete EBO algorithms are NBEA, BLMA, CODBA, CODCRO and CEMBA [84]. Among them, CEMBA has been shown to be one of the most effective bi-level EAs for discrete scenarios [17]. Every upper-level solution is evaluated in two stages: first, the upper-level variables are passed to the lower level; second, an indicator-based multi-objective local search selects the lower-level solution with the maximum marginal contribution in terms of the multi-objective quality indicator and sends it back to the upper level to complete the evaluation of the solution under consideration. To summarize how it works, we go over the search processes of its upper and lower levels below (a simplified sketch of the nested evaluation loop is given after the list):

  • Upper- and lower-level population initialization: the upper-level and lower-level populations are created from scratch. Two starting populations are obtained by applying the discrete space decomposition method twice. The goal of using a decomposition approach is to produce uniform coverage of the decision space and, as far as possible, a collection of solutions that are evenly dispersed over each level's decision space.

  • Lower-level fitness assignment: assessing an upper-level solution requires executing an entire lower-level search, which is the fundamental challenge of any BLOP. Consequently, each problem level is handled with its own population. The lower-level algorithm of each lower population uses the upper solutions of the matching upper population to evaluate the lower-level solutions.

  • Local search procedure: a local search following the IBMOLS principle is applied to each lower-level population. We begin by computing the normalized objective values. For each lower-level solution, we then generate a neighborhood and compute its fitness value using an indicator I on the normalized objective values. The fitness values are updated, the worst solution is removed, and the fitness values are updated again. Neighborhood generation stops in one of two situations: (1) when the entire neighborhood of a solution has been examined, or (2) when a neighboring solution that improves on it (with respect to I) is found. Once all lower-level members have been visited, the whole local search procedure ends.

  • Best indicator contribution lower-level solution determination: because evaluating the leader solutions of the upper-level population requires approximating the follower solution of each matching lower-level population, the lower-level solutions are compared and the one with the best indicator contribution is associated with the corresponding upper-level member.

  • Upper-level indicator-based procedure: after obtaining the lower-level solutions with the best indicator contribution for the follower problem, each upper-level population runs its algorithm based on IBEA. The individual with the lowest fitness value is found and removed, and the fitness values of the remaining individuals are updated, until the stopping criterion is reached. After that, mating selection and variation are applied. We remark that using IBEA helps the algorithm approximate the best upper-level front.

  • Migration strategy (every \(\alpha\) generations): after a fixed number of generations, a migration strategy is applied. The parameter \(\beta\) is used to select a set of \(\beta\) solutions from the follower objective space, and the migration step operates on this pre-selected collection of solutions.
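The following simplified Python sketch illustrates the nested evaluation scheme described above, in which every upper-level individual (an architecture encoding) is scored only after a lower-level evolutionary search over filter-pruning masks has been run for it. It is not an implementation of CEMBA (there is no decomposition-based initialization, IBMOLS local search, IBEA ranking or migration); every function name, fitness stub and parameter value is a hypothetical placeholder.

```python
# Hedged sketch of a nested (bi-level) evolutionary loop; NOT the CEMBA algorithm.
# The fitness stubs return random values and would have to be replaced by real
# training/evaluation of the pruned CNN on the target data set.

import random

def evaluate_pruned_architecture(arch, mask):
    """Stub: would train/evaluate the pruned CNN and return an (error, kept_filters) pair."""
    kept = sum(mask)
    return random.random() + 0.01 * kept, kept        # placeholder objectives

def lower_level_search(arch, n_filters=32, pop_size=10, generations=15):
    """Evolve a binary filter-pruning mask for a fixed architecture (follower problem)."""
    population = [[random.randint(0, 1) for _ in range(n_filters)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda m: evaluate_pruned_architecture(arch, m))
        parents = ranked[: pop_size // 2]
        children = []
        for _ in range(pop_size - len(parents)):
            p1, p2 = random.sample(parents, 2)
            child = [random.choice(bits) for bits in zip(p1, p2)]   # uniform crossover
            child[random.randrange(n_filters)] ^= 1                 # bit-flip mutation
            children.append(child)
        population = parents + children
    return min(population, key=lambda m: evaluate_pruned_architecture(arch, m))

def upper_level_fitness(arch):
    """The follower's (approximate) optimal response is needed before scoring the leader."""
    best_mask = lower_level_search(arch)
    error, kept = evaluate_pruned_architecture(arch, best_mask)
    return error + 0.001 * kept                        # toy scalarization of the objectives

# Upper level: keep the best of a few random architecture encodings (blocks, nodes per block).
architectures = [(random.randint(1, 4), random.randint(2, 6)) for _ in range(5)]
print("best architecture encoding:", min(architectures, key=upper_level_fitness))
```

The sketch makes the computational burden of EBO visible: the whole lower-level loop is executed once per upper-level fitness evaluation, which is exactly the cost that CEMBA's co-evolving populations and migration scheme are designed to mitigate.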

Appendix D Tuning of parameters

The Taguchi method [85] is a more systematic refinement of the trial-and-error approach [86]. To further verify the proposed parameter tuning values, we applied the Taguchi method, in which the signal-to-noise ratio (SNR) is computed as follows:

$$\begin{aligned} \hbox {SNR}=-\log _{10}\left( \frac{1}{N}\sum _{i=1}^{N}\hbox {(objective function)}_{i}^{2}\right) \end{aligned}$$
(D2)
where N represents the number of performed runs. The SNR reflects both the variability and the mean of the experimental data. The parameters considered for tuning are: (1) the population size (Pop. size), (2) the upper-level generation number (UGen. nb) and (3) the lower-level generation number (LGen. nb). The levels considered for each parameter were arranged in the orthogonal array L27(3^3), i.e., 27 experiments with 3 variables at 3 levels each. Figure 11 displays the SNR results obtained for IB-CEMBA, while Fig. 12 displays the results computed for Bi-CNN-D-C in terms of mean upper-level fitness values in the Taguchi experimental analysis; the computed mean upper-level fitness values confirm the optimal levels identified with the SNR.

Fig. 11 Signal to noise

Fig. 12 Mean of means
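As a small worked example, the following Python snippet evaluates the SNR of Eq. (D2) for a single parameter configuration; the objective values are hypothetical placeholders standing in for the N upper-level fitness values collected over repeated runs.

```python
# Worked example of the SNR computation in Eq. (D2) for one parameter configuration.
# The objective values are hypothetical placeholders, not results from the paper.

import math

def snr(objective_values):
    """SNR as written in Eq. (D2): minus the log10 of the mean squared objective value."""
    n = len(objective_values)
    return -math.log10(sum(y * y for y in objective_values) / n)

runs = [0.21, 0.19, 0.23]             # N = 3 hypothetical upper-level fitness values
print(f"SNR = {snr(runs):.3f}")        # configurations with larger SNR are preferred
```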


About this article


Cite this article

Louati, H., Bechikh, S., Louati, A. et al. Joint design and compression of convolutional neural networks as a Bi-level optimization problem. Neural Comput & Applic 34, 15007–15029 (2022). https://doi.org/10.1007/s00521-022-07331-0



  • DOI: https://doi.org/10.1007/s00521-022-07331-0
