
A hierarchical evolution of neural architecture search method based on state transition algorithm

Original Article · International Journal of Machine Learning and Cybernetics

Abstract

Among the search strategies used to realize neural architecture search (NAS), evolutionary algorithms (EAs) have attracted much attention due to their global optimization capability. However, the large number of performance evaluations makes EA-based methods extremely time-consuming. To address this issue, this paper proposes an efficient framework called HENA (Hierarchical Evolution of Neural Architecture). In HENA, NAS is divided hierarchically into two consecutive phases: candidate operation search and connection relationship search. For this purpose, a supernet is defined to subsume the whole search space, from which every candidate architecture can inherit weights. A novel evolutionary algorithm, the state transition algorithm (STA), is used to traverse the search space continuously. In the first phase, sampled architectures are trained on mini-group data to search for better operations at different positions within the network. In the second phase, better connection relationships among operations are determined directly, without training, by inheriting the trained weights. Finally, the proposed method is evaluated on widely used datasets, and the experimental results show that (1) the architecture learned by HENA obtains state-of-the-art performance (2.93% and 20.44% error rates on CIFAR-10 and CIFAR-100, respectively), and (2) the learned architecture achieves a 24.8% top-1 error rate on ImageNet.
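To make the two-phase pipeline described above concrete, the following is a minimal illustrative sketch in Python, not the authors' implementation: the supernet training and inherited-weight evaluation are replaced by hypothetical toy proxy objectives (op_fitness, conn_fitness), and the STA transformation is reduced to a single substitute-style move. The sketch only reproduces the control flow the abstract describes: operations are searched first, then connection relationships are searched without any further training.

```python
import random

random.seed(0)

OP_CHOICES = ["conv3x3", "conv5x5", "sep3x3", "maxpool", "identity"]  # hypothetical pool
N_POS = 6  # number of positions whose operation is searched

def transform(state, n_values):
    # Toy stand-in for a discrete STA transformation operator (substitute one
    # component); the actual discrete STA uses swap/shift/symmetry/substitute moves.
    new = list(state)
    new[random.randrange(len(new))] = random.randrange(n_values)
    return new

def sta_search(init, n_values, fitness, iters=200, sample_size=8):
    # STA-style loop: sample a batch of transformed states from the incumbent
    # and greedily keep the best one found so far.
    best, best_fit = list(init), fitness(init)
    for _ in range(iters):
        cands = [transform(best, n_values) for _ in range(sample_size)]
        fits = [fitness(c) for c in cands]
        k = max(range(sample_size), key=fits.__getitem__)
        if fits[k] > best_fit:
            best, best_fit = cands[k], fits[k]
    return best

# Phase 1: candidate operation search. In HENA each sampled architecture would
# be briefly trained on mini-group data with supernet-inherited weights; a toy
# proxy objective stands in here.
def op_fitness(ops):
    return -sum(abs(o - 2) for o in ops)  # toy: pretend op index 2 is best

init_ops = [random.randrange(len(OP_CHOICES)) for _ in range(N_POS)]
ops = sta_search(init_ops, len(OP_CHOICES), op_fitness)

# Phase 2: connection relationship search. Candidates reuse the weights trained
# in phase 1, so scoring needs only a forward evaluation, no training (again
# mocked by a toy proxy).
N_EDGES = N_POS * (N_POS - 1) // 2  # possible pairwise connections
def conn_fitness(conn):
    return -abs(sum(conn) - N_POS)  # toy: prefer about N_POS active edges

conn = sta_search([random.randrange(2) for _ in range(N_EDGES)], 2, conn_fitness)

print("operations:", [OP_CHOICES[o] for o in ops])
print("edge mask :", conn)
```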


Data availability

Some or all data, models, or code generated or used during the study are available from the corresponding author upon request.


Acknowledgements

The work presented in this paper was supported by the National Natural Science Foundation of China (Grant Nos. 62273357 and 61860206014), the National Key Research and Development Program of China (2022YFC2904502), and the Hunan Provincial Natural Science Foundation of China (Grant No. 2021JJ20082).

Author information

Corresponding author

Correspondence to Xiaojun Zhou.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Du, Y., Zhou, X., Huang, T. et al. A hierarchical evolution of neural architecture search method based on state transition algorithm. Int. J. Mach. Learn. & Cyber. 14, 2723–2738 (2023). https://doi.org/10.1007/s13042-023-01794-w
