Abstract
Catastrophic forgetting is a major issue in task-incremental learning, where a neural network loses what it has learned from previous tasks after being trained on new ones. A number of architecture-based approaches have been proposed to address this issue. However, these approaches suffer from a further problem of limited network capacity when the network learns long sequences of tasks: as training proceeds over more and more new tasks, more parameters become static in order to prevent the network from forgetting what it has learned in previous tasks. In this paper, we propose an adaptive task-based hard attention mechanism that allows adaptive updates to static parameters by taking into account information about previous tasks, namely both the importance of these parameters to previous tasks and the current network capacity. We develop a new neural network architecture incorporating our proposed Adaptive Hard Attention to the Task (AdaHAT) mechanism. AdaHAT extends an existing architecture-based approach, Hard Attention to the Task (HAT), to learn long sequences of tasks incrementally. We conduct experiments on a number of datasets and compare AdaHAT with a number of baselines, including HAT. Our experimental results show that AdaHAT achieves better average performance over tasks than these baselines, especially on long sequences of tasks, demonstrating the benefit of balancing the trade-off between the stability and plasticity of a network when learning such sequences. Our code is available at github.com/pengxiang-wang/continual-learning-arena.
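The abstract describes the mechanism only at a high level, so the snippet below is a minimal sketch of the underlying idea: HAT blocks gradient updates to weights reserved by previous tasks via cumulative attention masks, while AdaHAT instead scales those updates by a per-weight adjustment rate. The function names, tensor shapes, and the exact form of the adaptive term are assumptions for illustration, not the paper's equations.

```python
# A minimal sketch (not the paper's exact equations) of HAT-style hard gating
# and an AdaHAT-style relaxation, assuming a PyTorch linear layer whose weight
# (and gradient) has shape (out_features, in_features).
import torch


def gate_gradients_hat(grad, prev_mask_out, prev_mask_in):
    """Hard gating as in HAT: block updates to weights connecting units
    that previous tasks have fully reserved."""
    # reserve[i, j] = min(mask_out[i], mask_in[j]); 1 means fully reserved
    reserve = torch.min(prev_mask_out.unsqueeze(1), prev_mask_in.unsqueeze(0))
    return grad * (1.0 - reserve)


def gate_gradients_adahat(grad, prev_mask_out, prev_mask_in, adjustment_rate):
    """Soft gating in the spirit of AdaHAT (assumed form): reserved weights
    are updated at a small per-weight adjustment rate instead of not at all."""
    reserve = torch.min(prev_mask_out.unsqueeze(1), prev_mask_in.unsqueeze(0))
    # adjustment_rate is assumed to lie in [0, 1] and to shrink for weights
    # that are important to previous tasks or when network capacity runs low
    return grad * ((1.0 - reserve) + adjustment_rate * reserve)


# Toy usage on one layer's gradient
grad = torch.randn(4, 3)
m_out = torch.tensor([1.0, 1.0, 0.0, 0.0])  # cumulative mask of output units
m_in = torch.tensor([1.0, 0.0, 1.0])        # cumulative mask of input units
a_star = torch.full((4, 3), 0.01)           # hypothetical adjustment rate
print(gate_gradients_hat(grad, m_out, m_in))
print(gate_gradients_adahat(grad, m_out, m_in, a_star))
```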
Notes
- 1.
- 2. \({\textbf {m}}^{\le 0}_l\) starts with all zeros to calculate \({\textbf {m}}^{\le 1}_l\) (see the remark following these notes).
- 3. AdaHAT denotes the adjustment rate as \(a^\star _{l,ij}\).
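For context (stated here as a minimal remark, assuming the standard HAT accumulation rule): the cumulative mask is built element-wise as \({\textbf {m}}^{\le t}_l = \max ({\textbf {m}}^{\le t-1}_l, {\textbf {m}}^{t}_l)\), so setting \({\textbf {m}}^{\le 0}_l = {\textbf {0}}\) simply makes the recursion well defined at \(t = 1\); the adjustment rate \(a^\star _{l,ij}\) is then the per-parameter factor by which AdaHAT scales updates to weights that the cumulative mask would otherwise freeze.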
References
Ahn, H., Cha, S., Lee, D., Moon, T.: Uncertainty-based continual learning with adaptive regularization. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
Aljundi, R., Chakravarty, P., Tuytelaars, T.: Expert gate: lifelong learning with a network of experts. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3366–3375 (2017)
Buzzega, P., Boschini, M., Porrello, A., Abati, D., Calderara, S.: Dark experience for general continual learning: a strong, simple baseline. In: Advances in Neural Information Processing Systems, vol. 33, pp. 15920–15930 (2020)
De Lange, M., et al.: A continual learning survey: defying forgetting in classification tasks. IEEE Trans. Pattern Anal. Mach. Intell. 44(7), 3366–3385 (2021)
Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256. JMLR Workshop and Conference Proceedings (2010)
Goodfellow, I.J., Mirza, M., Xiao, D., Courville, A., Bengio, Y.: An empirical investigation of catastrophic forgetting in gradient-based neural networks. arXiv preprint arXiv:1312.6211 (2013)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Hung, C.Y., Tu, C.H., Wu, C.E., Chen, C.H., Chan, Y.M., Chen, C.S.: Compacting, picking and growing for unforgetting continual learning. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
Kaushik, P., Gain, A., Kortylewski, A., Yuille, A.: Understanding catastrophic forgetting and remembering in continual learning with optimal relevance mapping. arXiv preprint arXiv:2102.11343 (2021)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Kirkpatrick, J., et al.: Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. 114(13), 3521–3526 (2017)
Lee, S.W., Kim, J.H., Jun, J., Ha, J.W., Zhang, B.T.: Overcoming catastrophic forgetting by incremental moment matching. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Li, Z., Hoiem, D.: Learning without forgetting. IEEE Trans. Pattern Anal. Mach. Intell. 40(12), 2935–2947 (2017)
Lopez-Paz, D., Ranzato, M.: Gradient episodic memory for continual learning. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Mallya, A., Davis, D., Lazebnik, S.: Piggyback: adapting a single network to multiple tasks by learning to mask weights. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 67–82 (2018)
Mallya, A., Lazebnik, S.: PackNet: adding multiple tasks to a single network by iterative pruning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7765–7773 (2018)
McCloskey, M., Cohen, N.J.: Catastrophic interference in connectionist networks: the sequential learning problem. In: Psychology of Learning and Motivation, vol. 24, pp. 109–165. Elsevier (1989)
Nguyen, C.V., Li, Y., Bui, T.D., Turner, R.E.: Variational continual learning. arXiv preprint arXiv:1710.10628 (2017)
Ratcliff, R.: Connectionist models of recognition memory: constraints imposed by learning and forgetting functions. Psychol. Rev. 97(2), 285 (1990)
Rebuffi, S.A., Kolesnikov, A., Sperl, G., Lampert, C.H.: iCaRL: incremental classifier and representation learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2001–2010 (2017)
Rusu, A.A., et al.: Progressive neural networks. arXiv preprint arXiv:1606.04671 (2016)
Serra, J., Suris, D., Miron, M., Karatzoglou, A.: Overcoming catastrophic forgetting with hard attention to the task. In: International Conference on Machine Learning, pp. 4548–4557. PMLR (2018)
Srivastava, R.K., Masci, J., Kazerounian, S., Gomez, F., Schmidhuber, J.: Compete to compute. In: Advances in Neural Information Processing Systems, vol. 26 (2013)
Van de Ven, G.M., Tolias, A.S.: Three scenarios for continual learning. arXiv preprint arXiv:1904.07734 (2019)
Wang, L., Zhang, X., Su, H., Zhu, J.: A comprehensive survey of continual learning: theory, method and application. IEEE Trans. Pattern Anal. Mach. Intell. 46, 5362–5383 (2024)
Wortsman, M., et al.: Supermasks in superposition. In: Advances in Neural Information Processing Systems, vol. 33, pp. 15173–15184 (2020)
Zenke, F., Poole, B., Ganguli, S.: Continual learning through synaptic intelligence. In: International Conference on Machine Learning, pp. 3987–3995. PMLR (2017)
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, P., Bo, H., Hong, J., Liu, W., Mu, K. (2024). AdaHAT: Adaptive Hard Attention to the Task in Task-Incremental Learning. In: Bifet, A., Davis, J., Krilavičius, T., Kull, M., Ntoutsi, E., Žliobaitė, I. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol. 14943. Springer, Cham. https://doi.org/10.1007/978-3-031-70352-2_9
DOI: https://doi.org/10.1007/978-3-031-70352-2_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70351-5
Online ISBN: 978-3-031-70352-2