Abstract
Neural architecture search (NAS) is characterized by a wide search space and a time-consuming objective function. Many papers have addressed reducing the cost of evaluating this objective function. Among them, DARTS [1] proposes to relax the original discrete problem into a continuous one: it builds an overparameterized network, called a hypernetwork, whose edges are weighted by continuous coefficients. This approach considerably reduces the computational cost, but the quality of the resulting architectures is highly variable. We propose to reduce this variability by introducing a convex depth regularization. We also add a heuristic that controls the number of unweighted operations, with the goal of correcting the short-term bias introduced by the hypergradient approximation. Finally, we show the efficiency of these proposals by revisiting the work developed in the DOTS paper [2].
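To make the continuous relaxation concrete, the sketch below shows how a single edge of the hypernetwork can mix candidate operations through softmax-weighted continuous coefficients, in the spirit of DARTS [1]. This is a minimal PyTorch illustration with an assumed, reduced operation set (identity, 3x3 convolution, average pooling); it is not the authors' implementation and it omits the bilevel optimization of the coefficients.

```python
# Minimal sketch of a DARTS-style continuous relaxation (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

def candidate_ops(channels):
    """Illustrative subset of candidate operations on one edge."""
    return nn.ModuleList([
        nn.Identity(),                                            # skip connection
        nn.Conv2d(channels, channels, 3, padding=1, bias=False),  # 3x3 convolution
        nn.AvgPool2d(3, stride=1, padding=1),                     # 3x3 average pooling
    ])

class MixedEdge(nn.Module):
    """One edge of the overparameterized hypernetwork: the discrete choice of
    an operation is relaxed into a softmax-weighted sum of all candidates."""
    def __init__(self, channels):
        super().__init__()
        self.ops = candidate_ops(channels)
        # Continuous architecture coefficients, one per candidate operation.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)  # continuous edge weights
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# After the search, the edge is typically discretized by keeping the
# operation with the largest coefficient (argmax over alpha).
```

In the full method the coefficients alpha are trained on validation data while the operation weights are trained on training data, which is where the hypergradient approximation and its bias enter.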
References
Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: International Conference on Learning Representations (ICLR) (2019)
Gu, Y.C., et al.: DOTS: decoupling operation and topology in differentiable architecture search. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2021)
He, C., Ye, H., Shen, L., Zhang, T.: MiLeNAS: efficient neural architecture search via mixed-level reformulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Zhou, P., Xiong, C., Socher, R., Hoi, S.C.H.: Theory-inspired path-regularized differential network architecture search. In: Neural Information Processing Systems (NeurIPS) (2020)
Vicol, P., Metz, L., Sohl-Dickstein, J.: Unbiased gradient estimation in unrolled computation graphs with persistent evolution strategies. In: International Conference on Machine Learning (ICML) (2021)
Metz, L., Maheswaranathan, N., Sun, R., Daniel Freeman, C., Poole, B., Sohl-Dickstein, J.: Using a thousand optimization tasks to learn hyperparameter search strategies. In: Neural Information Processing Systems (NeurIPS) (2020)
Fu, J., Luo, H., Feng, J., Low, K.H., Chua, T.S.: DrMAD: distilling reverse-mode automatic differentiation for optimizing hyperparameters of deep neural networks. In: International Joint Conference on Artificial Intelligence (IJCAI) (2016)
Hou, P., Jin, Y., Chen, Y.: Single-DARTS: towards stable architecture search. In: IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) (2021)
Wu, Y., Ren, M., Liao, R., Grosse, R.: Understanding short-horizon bias in stochastic meta-optimizations. In: International Conference on Learning Representations (ICLR) (2018)
Luketina, J., Berglund, M., Greff, K., Raiko, T.: Scalable gradient-based tuning of continuous regularization hyperparameters. In: International Conference on Machine Learning (ICML) (2016)
Lee, H.B., Lee, H., Shin, J., Yang, E., Hospedales, T.M., Hwang, S.J.: Online hyperparameter meta-learning with hypergradient distillation. In: International Conference on Learning Representations (ICLR) (2022)
Franceschi, L., Donini, M., Frasconi, P., Pontil, M.: Forward and reverse gradient-based hyperparameter optimization. In: International Conference on Machine Learning (ICML) (2017)
Lorraine, J., Vicol, P., Duvenaud, D.: Optimizing millions of hyperparameters by implicit differentiation. In: International Conference on Artificial Intelligence and Statistics (AISTATS) (2020)
Choe, H., Na, B., Mok, J., Yoon, S.: Variance-stationary differentiable NAS. In: British Machine Vision Conference (BMVC) (2021)
Wei, T., Wang, C., Rui, Y., Chen, C.W.: Network morphism. In: International Conference on Machine Learning (ICML) (2016)
Hong, W., et al.: DropNAS: grouped operation dropout for differentiable architecture search. In: International Joint Conference on Artificial Intelligence (IJCAI) (2020)
Lin, M., et al.: Zen-NAS: a zero-shot NAS for high-performance image recognition. In: International Conference on Computer Vision (ICCV) (2021)
Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., Talwalkar, A.: Hyperband: a novel bandit-based approach to hyperparameter optimization. J. Mach. Learn. Res. 18, 6765–6816 (2018)
Zoph, B., Vasudevan, V., Shlens, J., Le, Q.V.: Learning transferable architectures for scalable image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
Real, E., Aggarwal, A., Huang, Y., Le, Q.V.: Regularized evolution for image classifier architecture search. In: Conference on Artificial Intelligence (AAAI) (2019)
Vicol, P., Lorraine, J.P., Pedregosa, F., Duvenaud, D., Grosse, R.B.: On implicit bias in overparameterized bilevel optimization. In: International Conference on Machine Learning (ICML) (2022)
Li, G., Qian, G., Delgadillo, I.C., Muller, M., Thabet, A., Ghanem, B.: SGAS: sequential greedy architecture search. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Kingma, D.P., Ba, J.L.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (ICLR) (2015)
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: International Conference on Learning Representations (ICLR) (2019)
Ang, A., Ma, J., Liu, N., Huang, K., Wang, Y.: Fast projection onto the capped simplex with applications to sparse regression in bioinformatics. In: Neural Information Processing Systems (NeurIPS) (2021)
Wang, W., Lu, C.: Projection onto the capped simplex. arXiv preprint arXiv:1503.01002 (2015)
Cite this paper
Lacharme, G., Cardot, H., Lenté, C., Monmarché, N. (2023). DARTS with Degeneracy Correction. In: Pertusa, A., Gallego, A.J., Sánchez, J.A., Domingues, I. (eds) Pattern Recognition and Image Analysis. IbPRIA 2023. Lecture Notes in Computer Science, vol 14062. Springer, Cham. https://doi.org/10.1007/978-3-031-36616-1_4