Abstract
Catastrophic forgetting is a major issue in task-incremental learning, where a neural network loses what it has learned from previous tasks after being trained on new ones. A number of architecture-based approaches have been proposed to address this issue. However, these approaches suffer from a further problem of limited network capacity when the network learns long sequences of tasks: as training proceeds over more and more new tasks, more parameters become static in order to prevent the network from forgetting what it has learned in previous tasks. In this paper, we propose an adaptive task-based hard attention mechanism that allows adaptive updates to static parameters by taking into account information about previous tasks, namely both the importance of these parameters to previous tasks and the current network capacity. We develop a new neural network architecture incorporating our proposed Adaptive Hard Attention to the Task (AdaHAT) mechanism. AdaHAT extends an existing architecture-based approach, Hard Attention to the Task (HAT), to learn long sequences of tasks incrementally. We conduct experiments on a number of datasets and compare AdaHAT with a number of baselines, including HAT. Our experimental results show that AdaHAT achieves better average performance over tasks than these baselines, especially on long sequences of tasks, demonstrating the benefit of balancing the trade-off between the stability and plasticity of a network when learning such sequences. Our code is available at github.com/pengxiang-wang/continual-learning-arena.
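The abstract describes the mechanism only at a high level, so the snippet below is a minimal sketch of the underlying idea: HAT blocks gradient updates to weights reserved by previous tasks via cumulative attention masks, while AdaHAT instead scales those updates by a per-weight adjustment rate. The function names, tensor shapes, and the exact form of the adaptive term are assumptions for illustration, not the paper's equations.

```python
# A minimal sketch (not the paper's exact equations) of HAT-style hard gating
# and an AdaHAT-style relaxation, assuming a PyTorch linear layer whose weight
# (and gradient) has shape (out_features, in_features).
import torch


def gate_gradients_hat(grad, prev_mask_out, prev_mask_in):
    """Hard gating as in HAT: block updates to weights connecting units
    that previous tasks have fully reserved."""
    # reserve[i, j] = min(mask_out[i], mask_in[j]); 1 means fully reserved
    reserve = torch.min(prev_mask_out.unsqueeze(1), prev_mask_in.unsqueeze(0))
    return grad * (1.0 - reserve)


def gate_gradients_adahat(grad, prev_mask_out, prev_mask_in, adjustment_rate):
    """Soft gating in the spirit of AdaHAT (assumed form): reserved weights
    are updated at a small per-weight adjustment rate instead of not at all."""
    reserve = torch.min(prev_mask_out.unsqueeze(1), prev_mask_in.unsqueeze(0))
    # adjustment_rate is assumed to lie in [0, 1] and to shrink for weights
    # that are important to previous tasks or when network capacity runs low
    return grad * ((1.0 - reserve) + adjustment_rate * reserve)


# Toy usage on one layer's gradient
grad = torch.randn(4, 3)
m_out = torch.tensor([1.0, 1.0, 0.0, 0.0])  # cumulative mask of output units
m_in = torch.tensor([1.0, 0.0, 1.0])        # cumulative mask of input units
a_star = torch.full((4, 3), 0.01)           # hypothetical adjustment rate
print(gate_gradients_hat(grad, m_out, m_in))
print(gate_gradients_adahat(grad, m_out, m_in, a_star))
```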
Notes
- 1.
- 2. \({\textbf {m}}^{\le 0}_l\) starts with all zeros to calculate \({\textbf {m}}^{\le 1}_l\) (see the remark following these notes).
- 3. AdaHAT denotes the adjustment rate as \(a^\star _{l,ij}\).
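For context (stated here as a minimal remark, assuming the standard HAT accumulation rule): the cumulative mask is built element-wise as \({\textbf {m}}^{\le t}_l = \max ({\textbf {m}}^{\le t-1}_l, {\textbf {m}}^{t}_l)\), so setting \({\textbf {m}}^{\le 0}_l = {\textbf {0}}\) simply makes the recursion well defined at \(t = 1\); the adjustment rate \(a^\star _{l,ij}\) is then the per-parameter factor by which AdaHAT scales updates to weights that the cumulative mask would otherwise freeze.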
References
Ahn, H., Cha, S., Lee, D., Moon, T.: Uncertainty-based continual learning with adaptive regularization. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
Aljundi, R., Chakravarty, P., Tuytelaars, T.: Expert gate: lifelong learning with a network of experts. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3366–3375 (2017)
Buzzega, P., Boschini, M., Porrello, A., Abati, D., Calderara, S.: Dark experience for general continual learning: a strong, simple baseline. In: Advances in Neural Information Processing Systems, vol. 33, pp. 15920–15930 (2020)
De Lange, M., et al.: A continual learning survey: defying forgetting in classification tasks. IEEE Trans. Pattern Anal. Mach. Intell. 44(7), 3366–3385 (2021)
Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256. JMLR Workshop and Conference Proceedings (2010)
Goodfellow, I.J., Mirza, M., Xiao, D., Courville, A., Bengio, Y.: An empirical investigation of catastrophic forgetting in gradient-based neural networks. arXiv preprint arXiv:1312.6211 (2013)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Hung, C.Y., Tu, C.H., Wu, C.E., Chen, C.H., Chan, Y.M., Chen, C.S.: Compacting, picking and growing for unforgetting continual learning. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
Kaushik, P., Gain, A., Kortylewski, A., Yuille, A.: Understanding catastrophic forgetting and remembering in continual learning with optimal relevance mapping. arXiv preprint arXiv:2102.11343 (2021)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Kirkpatrick, J., et al.: Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. 114(13), 3521–3526 (2017)
Lee, S.W., Kim, J.H., Jun, J., Ha, J.W., Zhang, B.T.: Overcoming catastrophic forgetting by incremental moment matching. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Li, Z., Hoiem, D.: Learning without forgetting. IEEE Trans. Pattern Anal. Mach. Intell. 40(12), 2935–2947 (2017)
Lopez-Paz, D., Ranzato, M.: Gradient episodic memory for continual learning. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Mallya, A., Davis, D., Lazebnik, S.: Piggyback: adapting a single network to multiple tasks by learning to mask weights. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 67–82 (2018)
Mallya, A., Lazebnik, S.: PackNet: adding multiple tasks to a single network by iterative pruning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7765–7773 (2018)
McCloskey, M., Cohen, N.J.: Catastrophic interference in connectionist networks: the sequential learning problem. In: Psychology of Learning and Motivation, vol. 24, pp. 109–165. Elsevier (1989)
Nguyen, C.V., Li, Y., Bui, T.D., Turner, R.E.: Variational continual learning. arXiv preprint arXiv:1710.10628 (2017)
Ratcliff, R.: Connectionist models of recognition memory: constraints imposed by learning and forgetting functions. Psychol. Rev. 97(2), 285 (1990)
Rebuffi, S.A., Kolesnikov, A., Sperl, G., Lampert, C.H.: iCaRL: incremental classifier and representation learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2001–2010 (2017)
Rusu, A.A., et al.: Progressive neural networks. arXiv preprint arXiv:1606.04671 (2016)
Serra, J., Suris, D., Miron, M., Karatzoglou, A.: Overcoming catastrophic forgetting with hard attention to the task. In: International Conference on Machine Learning, pp. 4548–4557. PMLR (2018)
Srivastava, R.K., Masci, J., Kazerounian, S., Gomez, F., Schmidhuber, J.: Compete to compute. In: Advances in Neural Information Processing Systems, vol. 26 (2013)
Van de Ven, G.M., Tolias, A.S.: Three scenarios for continual learning. arXiv preprint arXiv:1904.07734 (2019)
Wang, L., Zhang, X., Su, H., Zhu, J.: A comprehensive survey of continual learning: theory, method and application. IEEE Trans. Pattern Anal. Mach. Intell. 46, 5362–5383 (2024)
Wortsman, M., et al.: Supermasks in superposition. In: Advances in Neural Information Processing Systems, vol. 33, pp. 15173–15184 (2020)
Zenke, F., Poole, B., Ganguli, S.: Continual learning through synaptic intelligence. In: International Conference on Machine Learning, pp. 3987–3995. PMLR (2017)
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, P., Bo, H., Hong, J., Liu, W., Mu, K. (2024). AdaHAT: Adaptive Hard Attention to the Task in Task-Incremental Learning. In: Bifet, A., Davis, J., Krilavičius, T., Kull, M., Ntoutsi, E., Žliobaitė, I. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol. 14943. Springer, Cham. https://doi.org/10.1007/978-3-031-70352-2_9
DOI: https://doi.org/10.1007/978-3-031-70352-2_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70351-5
Online ISBN: 978-3-031-70352-2