Using Multiple Heads to Subsize Meta-memorization Problem

  • Conference paper
  • First Online:
Artificial Neural Networks and Machine Learning – ICANN 2022 (ICANN 2022)

Abstract

The memorization problem is a meta-level overfitting phenomenon in meta-learning: the trained model prefers to remember previously learned tasks instead of adapting to new ones, which limits the ability of many meta-learning approaches to generalize. In this paper, we mitigate this limitation by introducing multiple sources of supervision through a multi-objective optimization process. The design leads to a Multi-Input Multi-Output (MIMO) configuration for meta-learning in which the model produces multiple outputs through different heads, and each head is supervised by a different ordering of the labels for the same task. The heads therefore form different memories, and the resulting meta-level conflicts act as regularization against meta-overfitting. The MIMO configuration is applicable to all MAML-like algorithms with only a minor increase in training computation; at inference, the cost can be reduced through an early-exit policy, or better performance can be achieved through a low-cost ensemble. In experiments with identical models and training settings across all test cases, our proposed design suppresses the meta-overfitting issue, achieves smoother loss landscapes, and improves generalisation.
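To make the idea concrete, below is a minimal sketch (not the authors' released code) of how such a MIMO-style head arrangement could look in PyTorch: a shared backbone feeds several classification heads, each head is trained against its own permutation of the task's labels, and at inference one can either stop at a single head (early exit) or average the heads for a low-cost ensemble. All names (`MultiHeadNet`, `permuted_labels`, `multi_head_loss`, `ensemble_predict`) and the backbone architecture are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadNet(nn.Module):
    """Shared feature extractor followed by several classification heads."""

    def __init__(self, num_classes: int, num_heads: int = 3, feat_dim: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, num_classes) for _ in range(num_heads)]
        )

    def forward(self, x):
        z = self.backbone(x)
        return [head(z) for head in self.heads]  # one logit tensor per head


def permuted_labels(y, num_classes, num_heads, generator=None):
    """Give each head its own relabelling (permutation) of the task's classes."""
    perms = [torch.randperm(num_classes, generator=generator) for _ in range(num_heads)]
    return [perm[y] for perm in perms]


def multi_head_loss(model, x, y, num_classes):
    """Sum of per-head cross-entropy losses. Each head sees a different label
    order, so the heads cannot share a single memorized input-to-label mapping."""
    logits_per_head = model(x)
    targets = permuted_labels(y, num_classes, len(model.heads))
    return sum(F.cross_entropy(lg, t) for lg, t in zip(logits_per_head, targets))


def ensemble_predict(model, x):
    """Low-cost ensemble: average the heads' softmax outputs.
    Assumes the heads have been adapted to the current task with a common label
    order (e.g., after a MAML-style inner loop), so their outputs are comparable.
    Early exit would instead use only the first head: model(x)[0]."""
    probs = torch.stack([F.softmax(lg, dim=-1) for lg in model(x)])
    return probs.mean(dim=0).argmax(dim=-1)
```

In a MAML-style training loop, `multi_head_loss` would stand in for the usual single-head cross-entropy in the inner and outer objectives; at meta-test time, adapting every head with a common label order keeps their predictions in the same label space, which is what `ensemble_predict` assumes.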

K. L. E. Law gratefully acknowledges the financial support provided by Macao Polytechnic University through the research funding programme (#RP/ESCA-09/2021).



Author information

Corresponding author

Correspondence to Lu Wang.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, L., Eddie Law, K.L. (2022). Using Multiple Heads to Subsize Meta-memorization Problem. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13532. Springer, Cham. https://doi.org/10.1007/978-3-031-15937-4_42

  • DOI: https://doi.org/10.1007/978-3-031-15937-4_42

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15936-7

  • Online ISBN: 978-3-031-15937-4

  • eBook Packages: Computer Science, Computer Science (R0)
