
Kronecker-factored Approximate Curvature with adaptive learning rate for optimizing model-agnostic meta-learning

  • Regular Paper
  • Journal: Multimedia Systems

Abstract

Among few-shot learning algorithms, model-agnostic meta-learning (MAML) stands out for its ability to adapt quickly to new tasks from only a small amount of labeled training data. However, its computational cost is high, because the second-order gradient update in MAML produces a large number of second-order terms. In addition, owing to the non-convexity of neural networks, the loss landscape contains many flat regions, which slows convergence and prolongs training. In this paper, the second-order optimization method Kronecker-factored Approximate Curvature (K-FAC) is applied to approximate Natural Gradient Descent. K-FAC reduces computational complexity by approximating the large Fisher information matrix as the Kronecker product of two much smaller matrices, and it makes full use of second-order information to accelerate convergence. Moreover, to address the sensitivity of Natural Gradient Descent to the learning rate, this paper proposes Kronecker-factored Approximate Curvature with adaptive learning rate for optimizing model-agnostic meta-learning (AK-MAML), which automatically adjusts the learning rate according to the curvature and improves training efficiency. Experimental results show that AK-MAML achieves faster convergence, lower computation, and higher accuracy on few-shot datasets.
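For intuition, the sketch below illustrates the Kronecker-factored Fisher approximation for a single fully connected layer; it is not the paper's implementation, and the function name, shapes, and damping value are illustrative assumptions. The layer's Fisher block is approximated as F ≈ A ⊗ G, where A is the covariance of the layer's inputs and G the covariance of the back-propagated output gradients, so the natural-gradient step (A ⊗ G)^(-1) vec(∇W) reduces to G^(-1) ∇W A^(-1) without ever forming the full Fisher matrix.

import numpy as np

def kfac_natural_gradient(acts, grads_out, grad_W, damping=1e-3):
    # acts: (batch, n_in) layer inputs; grads_out: (batch, n_out) gradients
    # w.r.t. the layer's pre-activations; grad_W: (n_out, n_in) weight gradient.
    batch = acts.shape[0]
    A = acts.T @ acts / batch                # input covariance, (n_in, n_in)
    G = grads_out.T @ grads_out / batch      # output-gradient covariance, (n_out, n_out)
    A = A + damping * np.eye(A.shape[0])     # Tikhonov damping for stable inversion
    G = G + damping * np.eye(G.shape[0])
    # (A ⊗ G)^(-1) vec(grad_W) == vec(G^(-1) grad_W A^(-1))
    return np.linalg.solve(G, grad_W) @ np.linalg.inv(A)

# Hypothetical usage with random statistics, just to check shapes
rng = np.random.default_rng(0)
acts = rng.normal(size=(32, 64))        # batch of 32, 64 inputs
grads_out = rng.normal(size=(32, 10))   # 10 outputs
grad_W = grads_out.T @ acts / 32
update = kfac_natural_gradient(acts, grads_out, grad_W)
print(update.shape)  # (10, 64)

The key saving is that only an n_in × n_in and an n_out × n_out matrix are inverted, instead of the (n_in·n_out) × (n_in·n_out) Fisher matrix of the full layer.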


Data availability

All data are available from authors upon reasonable request.


Acknowledgements

This work was supported by the Fundamental Research Funds for the Central Universities (Grant No. B220202019), the Changzhou Sci&Tech Program (Grant No. CJ20210092), the Young Talent Development Plan of Changzhou Health Commission (Grant No. CZQM2020025), and the Key Research and Development Program of Jiangsu (Grants BK20192004 and BE2018004-04).

Author information


Contributions

All the authors contributed to the research reported in this paper. Ce Zhang proposed the scheme and carried out the experiments under the guidance of Xiao Yao. Xiao Yao and Ce Zhang wrote the first draft together. Changfeng Shi and Min Gu were responsible for data organization and analysis. All the authors read the article and suggested revisions.

Corresponding author

Correspondence to Xiao Yao.

Ethics declarations

Conflict of interest

The authors certify that they have no conflict of interest.

Additional information

Communicated by A. Sur.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, C., Yao, X., Shi, C. et al. Kronecker-factored Approximate Curvature with adaptive learning rate for optimizing model-agnostic meta-learning. Multimedia Systems 29, 3169–3177 (2023). https://doi.org/10.1007/s00530-023-01159-x

