Vision transformer-based meta loss landscape exploration with actor-critic method

The Journal of Supercomputing

Abstract

Detecting and mitigating overfitting in deep neural networks remains a critical challenge in modern machine learning. This paper addresses that challenge for vision transformer-based models. By combining meta-learning techniques with a reinforcement learning framework, we introduce transformer-based loss landscape exploration (TLLE), which uses the validation loss landscape to guide gradient descent optimization. Unlike conventional methods, TLLE employs the actor-critic algorithm to learn the mapping from model weights to future values, facilitating efficient sample collection and precise value predictions. Experimental results demonstrate the superior performance of TLLE-enhanced transformer models in image classification and segmentation tasks, showing the efficacy of our approach in optimizing deep learning models for image analysis.
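The idea of learning a mapping from model weights to future values can be illustrated with a toy example. The sketch below is not the TLLE method: it assumes a two-dimensional weight vector, quadratic training and validation losses, plain gradient descent in place of an actor, and a critic that is linear in quadratic features and fitted by least squares on rollout returns. All names (`features`, `rollout_return`, `fit_critic`) and all constants are hypothetical.

```python
import numpy as np

def features(w):
    """Quadratic features of a 2-D weight vector (assumed critic input)."""
    w1, w2 = w
    return np.array([1.0, w1, w2, w1 * w1, w2 * w2, w1 * w2])

def val_loss(w, b):
    """Toy validation loss: 0.5 * ||w - b||^2."""
    return 0.5 * float(np.sum((w - b) ** 2))

def rollout_return(w0, a, b, lr=0.1, gamma=0.9, steps=100):
    """Discounted sum of negative validation losses along gradient
    descent on the toy training loss 0.5 * ||w - a||^2."""
    w = np.array(w0, dtype=float)
    total, discount = 0.0, 1.0
    for _ in range(steps):
        w = w - lr * (w - a)            # gradient of the training loss is (w - a)
        total += discount * (-val_loss(w, b))
        discount *= gamma
    return total

def fit_critic(a, b, n_samples=200, seed=0):
    """Fit critic parameters so that features(w) @ theta approximates
    the rollout return starting from weights w."""
    rng = np.random.default_rng(seed)
    starts = rng.uniform(-3.0, 3.0, size=(n_samples, 2))
    X = np.stack([features(w) for w in starts])
    y = np.array([rollout_return(w, a, b) for w in starts])
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

a = np.array([1.0, -1.0])    # minimiser of the training loss (assumed)
b = np.array([0.5, -0.5])    # minimiser of the validation loss (assumed)
theta = fit_critic(a, b)

w_test = np.array([2.0, 2.0])
predicted = float(features(w_test) @ theta)   # critic's value estimate
actual = rollout_return(w_test, a, b)         # value obtained by rolling out
print(abs(predicted - actual))
```

In this toy setting the return is exactly quadratic in the starting weights, so the fitted critic reproduces the rollout value at unseen weights up to numerical error. A neural critic trained with temporal-difference updates would play the same role when no closed form exists.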

Data availability

No datasets were generated or analyzed during the current study.

Author information

Contributions

Writing–original draft, Writing–review and editing, and Funding acquisition. Rui Zhong: Investigation, Methodology, Formal Analysis, and Writing–review and editing. Xingbang Du: Investigation, Methodology, Formal Analysis, and Writing–review and editing. Muhamed Wahib: Writing–review and editing and Project administration. Masaharu Munetomo: Writing–review and editing and Project administration.

Corresponding author

Correspondence to Enzhi Zhang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, E., Zhong, R., Du, X. et al. Vision transformer-based meta loss landscape exploration with actor-critic method. J Supercomput 81, 350 (2025). https://doi.org/10.1007/s11227-024-06867-3

  • DOI: https://doi.org/10.1007/s11227-024-06867-3
