
AdaHAT: Adaptive Hard Attention to the Task in Task-Incremental Learning

  • Conference paper
  • In: Machine Learning and Knowledge Discovery in Databases. Research Track (ECML PKDD 2024)

Abstract

Catastrophic forgetting is a major issue in task-incremental learning: a neural network loses what it has learned from previous tasks after being trained on new ones. A number of architecture-based approaches have been proposed to address this issue, but they face a further problem of network capacity when learning long sequences of tasks: as the network is trained on more and more new tasks, an increasing share of its parameters becomes static in order to protect what was learned before. In this paper, we propose an adaptive task-based hard attention mechanism that allows adaptive updates to these static parameters, taking into account both their importance to previous tasks and the current network capacity. We develop a new neural network architecture incorporating the proposed Adaptive Hard Attention to the Task (AdaHAT) mechanism. AdaHAT extends an existing architecture-based approach, Hard Attention to the Task (HAT), to learn long sequences of tasks incrementally. We conduct experiments on a number of datasets and compare AdaHAT with several baselines, including HAT. Our results show that AdaHAT achieves better average performance over tasks than these baselines, especially on long task sequences, demonstrating the benefit of balancing the trade-off between the stability and plasticity of a network when learning such sequences. Our code is available at github.com/pengxiang-wang/continual-learning-arena.
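To make the mechanism described above concrete: HAT gates each layer's units with a per-task sigmoid attention mask and shrinks the gradients of weights attached to units that earlier tasks attended to, while AdaHAT additionally allows small, adaptive updates to those otherwise static weights based on parameter importance and remaining capacity. The sketch below only illustrates this idea and is not the paper's implementation: the network shape, the hyper-parameters, and in particular the adahat_like_grad_factor rule are assumptions for exposition; the authors' actual code is in the repository linked in the abstract.

```python
# Illustrative sketch of HAT-style hard attention with an "adaptive" gradient
# factor in the spirit of AdaHAT. NOT the authors' implementation: the network,
# hyper-parameters and adahat_like_grad_factor are assumptions for exposition.
import torch
import torch.nn as nn


class HATMLP(nn.Module):
    """Two-layer MLP whose hidden units are gated by per-task attention masks.
    A single output head is used for brevity; task-incremental setups
    typically use one head per task."""

    def __init__(self, num_tasks, in_dim=784, hidden=256, out_dim=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.head = nn.Linear(hidden, out_dim)
        # One learnable task embedding per gated layer, as in HAT.
        self.emb1 = nn.Embedding(num_tasks, hidden)
        self.emb2 = nn.Embedding(num_tasks, hidden)

    def masks(self, task_id, scale):
        """Hard attention masks: sigmoid of the scaled task embedding.
        A large scale pushes the masks towards binary values."""
        t = torch.tensor([task_id])
        m1 = torch.sigmoid(scale * self.emb1(t)).squeeze(0)
        m2 = torch.sigmoid(scale * self.emb2(t)).squeeze(0)
        return m1, m2

    def forward(self, x, task_id, scale=400.0):
        m1, m2 = self.masks(task_id, scale)
        h = torch.relu(self.fc1(x)) * m1  # gate first hidden layer
        h = torch.relu(self.fc2(h)) * m2  # gate second hidden layer
        return self.head(h)


def adahat_like_grad_factor(cum_out, cum_in, capacity_left, eps=1e-3):
    """Per-weight gradient multiplier for a linear layer.

    HAT uses 1 - min(cum_out_i, cum_in_j), freezing weights between units
    that previous tasks fully attended to. The eps * capacity_left term
    re-opens frozen weights slightly when free capacity remains -- a crude
    stand-in for AdaHAT's adaptive adjustment rate, not the paper's rule.
    """
    hat_factor = 1.0 - torch.min(cum_out.unsqueeze(1), cum_in.unsqueeze(0))
    return torch.clamp(hat_factor + eps * capacity_left, max=1.0)


# Toy usage: one training step on task 1 after task 0 has claimed some units.
model = HATMLP(num_tasks=2)
cum_hidden1 = torch.zeros(256)            # cumulative mask m^{<=0}: all zeros
cum_hidden1[:64] = 1.0                    # pretend task 0 attended 64 units
cum_input = torch.ones(784)               # raw inputs count as fully "used"
capacity_left = 1.0 - cum_hidden1.mean()  # crude measure of free capacity

x, y = torch.randn(8, 784), torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(model(x, task_id=1), y)
loss.backward()
with torch.no_grad():
    factor = adahat_like_grad_factor(cum_hidden1, cum_input, capacity_left)
    model.fc1.weight.grad *= factor       # shape (256, 784) matches fc1.weight
```

In real HAT/AdaHAT training the attention scale is annealed within each epoch and the cumulative masks are updated after every task; both are omitted here for brevity.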


Notes

  1. https://github.com/pengxiang-wang/continual-learning-arena.

  2. \(\textbf{m}^{\le 0}_l\) is initialized to all zeros in order to calculate \(\textbf{m}^{\le 1}_l\).

  3. AdaHAT denotes the adjustment rate as \(a^\star_{l,ij}\).
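As background for notes 2 and 3: in the original HAT mechanism that AdaHAT extends, the cumulative attention mask of layer \(l\) is the element-wise maximum of the current task's mask and the previous cumulative mask (which is why note 2 needs an all-zero mask before the first task), and the gradient of each weight is scaled by a factor derived from the cumulative masks of the units it connects. Written in the notes' notation, this is HAT's formulation:

\[
\textbf{m}^{\le t}_l = \max\left(\textbf{m}^{t}_l,\ \textbf{m}^{\le t-1}_l\right), \qquad \textbf{m}^{\le 0}_l = \textbf{0},
\]
\[
g'_{l,ij} = \left(1 - \min\left(m^{\le t-1}_{l,i},\ m^{\le t-1}_{l-1,j}\right)\right) g_{l,ij}.
\]

In AdaHAT, the adjustment rate \(a^\star_{l,ij}\) from note 3 plays the role of this per-weight factor, computed adaptively from the importance of parameters to previous tasks and the remaining network capacity, as described in the abstract.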


Author information


Corresponding author

Correspondence to Pengxiang Wang.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Wang, P., Bo, H., Hong, J., Liu, W., Mu, K. (2024). AdaHAT: Adaptive Hard Attention to the Task in Task-Incremental Learning. In: Bifet, A., Davis, J., Krilavičius, T., Kull, M., Ntoutsi, E., Žliobaitė, I. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol 14943. Springer, Cham. https://doi.org/10.1007/978-3-031-70352-2_9


  • DOI: https://doi.org/10.1007/978-3-031-70352-2_9


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70351-5

  • Online ISBN: 978-3-031-70352-2

  • eBook Packages: Computer Science (R0)
