Abstract
Detecting and mitigating overfitting in deep neural networks remains a critical challenge in modern machine learning. This paper addresses that challenge for vision transformer-based models. Combining meta-learning and reinforcement learning, we introduce transformer-based loss landscape exploration (TLLE), which uses the validation loss landscape to guide gradient-descent optimization. Unlike conventional methods, TLLE employs an actor-critic algorithm to learn the mapping from model weights to future values, enabling efficient sample collection and accurate value prediction. Experiments on image classification and segmentation show that TLLE-enhanced transformer models outperform their baselines, demonstrating the effectiveness of our approach for optimizing deep learning models in image analysis.
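To make the mechanism described above concrete, the following is a minimal, self-contained PyTorch sketch of the general idea: a critic network learns to map a model's weight vector to a predicted future validation loss, and that prediction is used to select among candidate gradient-descent steps. The toy linear model, the synthetic data, and the candidate-learning-rate action space are all illustrative assumptions; this is not the paper's TLLE implementation, which operates on vision transformers within a full actor-critic framework.

```python
# Illustrative sketch only: a critic predicts future validation loss
# from model weights and guides the choice of update step. All design
# choices below are assumptions, not the paper's TLLE method.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(8, 1)  # toy stand-in for the task network

def flat_weights(m: nn.Module) -> torch.Tensor:
    """Flatten all parameters into one vector (the critic's input 'state')."""
    return torch.cat([p.detach().reshape(-1) for p in m.parameters()])

# Critic: maps the weight vector to a scalar predicted validation loss.
critic = nn.Sequential(nn.Linear(flat_weights(model).numel(), 32),
                       nn.ReLU(), nn.Linear(32, 1))
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

# Synthetic train/validation splits for the sketch.
x_tr, y_tr = torch.randn(64, 8), torch.randn(64, 1)
x_va, y_va = torch.randn(64, 8), torch.randn(64, 1)
mse = nn.MSELoss()

def val_loss(m: nn.Module) -> torch.Tensor:
    with torch.no_grad():
        return mse(m(x_va), y_va)

for step in range(100):
    # Candidate "actions": gradient steps at a few learning rates.
    grads = torch.autograd.grad(mse(model(x_tr), y_tr), list(model.parameters()))
    candidates = [
        (lr, torch.cat([(p - lr * g).reshape(-1)
                        for p, g in zip(model.parameters(), grads)]).detach())
        for lr in (1e-3, 1e-2, 1e-1)
    ]

    # Pick the candidate whose weights the critic scores lowest
    # (lowest predicted future validation loss).
    with torch.no_grad():
        scores = [critic(w).item() for _, w in candidates]
    _, w_best = candidates[scores.index(min(scores))]

    # Write the chosen weights back into the model.
    with torch.no_grad():
        offset = 0
        for p in model.parameters():
            n = p.numel()
            p.copy_(w_best[offset:offset + n].reshape(p.shape))
            offset += n

    # Train the critic against the validation loss actually observed,
    # so its value predictions improve as samples accumulate.
    critic_loss = mse(critic(flat_weights(model)), val_loss(model).reshape(1))
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
```

In the paper's method, an actor-critic algorithm learns this weight-to-value mapping; the fixed grid of candidate learning rates above merely stands in for a learned policy.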
Data availability
No datasets were generated or analyzed during the current study.
Author information
Contributions
E. Zhang: Writing–original draft, Writing–review and editing, and Funding acquisition. Rui Zhong: Investigation, Methodology, Formal analysis, and Writing–review and editing. Xingbang Du: Investigation, Methodology, Formal analysis, and Writing–review and editing. Muhamed Wahib: Writing–review and editing and Project administration. Masaharu Munetomo: Writing–review and editing and Project administration.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, E., Zhong, R., Du, X. et al. Vision transformer-based meta loss landscape exploration with actor-critic method. J Supercomput 81, 350 (2025). https://doi.org/10.1007/s11227-024-06867-3