A robust and anti-forgettiable model for class-incremental learning

Published in: Applied Intelligence

Abstract

In many real-world scenarios, neural network models are not fixed once deployed; they are expected to adapt to a dynamic environment and incrementally learn new knowledge. However, catastrophic forgetting is a major challenge for incremental learning in neural networks, since updating the model parameters to incorporate new knowledge often degrades performance on previous tasks. In this paper, we focus on class-incremental learning (CIL) and attempt to mitigate catastrophic forgetting by improving the robustness of neural networks. Specifically, we modify two aspects of the models. First, we argue that plain batch normalization (BN) has a negative effect on CIL. Hence, we propose a BN variant, called noisy batch normalization (NBN), which introduces Gaussian noise to resist shifts in the feature distributions and improve the robustness of the feature representations. Second, to address the task-level overfitting problem in CIL, we introduce a decoder-based regularization (DBR) term, which employs a decoder following the feature encoder to reconstruct the input. DBR helps avoid overfitting to the current task and provides a distillation loss to retain the knowledge of previous tasks. We design two CIL scenarios and validate our approaches on the CIFAR-100, MiniImageNet, Fashion MNIST, and Omniglot datasets. The results show that CIL algorithms equipped with our approaches outperform the original algorithms, indicating that our approaches enhance model robustness and help the networks extract anti-forgettable feature representations.
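As an illustrative aid only, the sketch below shows one plausible PyTorch-style reading of the two components described above: a BN layer that injects Gaussian noise into its normalized output (NBN), and a reconstruction-plus-distillation regularizer built around a decoder (DBR). The class and function names, the noise scale sigma, the weight lam, and the use of a simple MSE feature-distillation term are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NoisyBatchNorm2d(nn.BatchNorm2d):
    """Sketch of NBN: standard BatchNorm2d whose normalized output is
    perturbed with Gaussian noise during training (assumed formulation)."""

    def __init__(self, num_features, sigma=0.1, **kwargs):
        super().__init__(num_features, **kwargs)
        self.sigma = sigma  # hypothetical noise-scale hyperparameter

    def forward(self, x):
        out = super().forward(x)                            # usual batch normalization
        if self.training and self.sigma > 0:
            out = out + self.sigma * torch.randn_like(out)  # Gaussian perturbation
        return out


def dbr_loss(x, encoder, decoder, old_encoder=None, lam=1.0):
    """Sketch of a decoder-based regularizer: reconstruct the input from the
    encoded features (anti-overfitting term) and, once a frozen copy of the
    previous-task encoder exists, add a simple feature-distillation term
    (stand-in for the paper's distillation loss)."""
    z = encoder(x)
    recon = F.mse_loss(decoder(z), x)              # reconstruction of the input
    distill = torch.tensor(0.0, device=x.device)
    if old_encoder is not None:
        with torch.no_grad():
            z_old = old_encoder(x)                 # features from the previous model
        distill = F.mse_loss(z, z_old)             # retain previous-task knowledge
    return recon + lam * distill
```

In this reading, the noise is applied only in training mode, so inference behaves like standard BN, and the distillation term becomes active only after the first task, when a frozen copy of the previous encoder is available.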

References

  1. Aljundi R, Lin M, Goujaud B et al (2019) Gradient based sample selection for online continual learning. In: NeurIPS, pp 11816–11825

  2. Buzzega P, Boschini M, Porrello A et al (2020) Dark experience for general continual learning: a strong, simple baseline. In: NeurIPS, pp 15920–15930

  3. Buzzega P, Boschini M, Porrello A et al (2021) Rethinking experience replay: a bag of tricks for continual learning. In: ICPR, pp 2180–2187. https://doi.org/10.1109/ICPR48806.2021.9412614

  4. Castro FM, Marín-Jiménez MJ, Guil N et al (2018) End-to-end incremental learning. In: ECCV, pp 241–257. https://doi.org/10.1007/978-3-030-01258-8_15

  5. Chaudhry A, Ranzato M, Rohrbach M et al (2019) Efficient lifelong learning with A-GEM. In: ICLR

  6. Deecke L, Murray I, Bilen H (2019) Mode normalization. In: ICLR

  7. Ding J (2022) Incremental learning with open set based discrimination enhancement. Appl Intell 52(5):5159–5172. https://doi.org/10.1007/s10489-021-02643-5

  8. Douillard A, Cord M, Ollion C et al (2020) Podnet: pooled outputs distillation for small-tasks incremental learning. In: ECCV, pp 86–102. https://doi.org/10.1007/978-3-030-58565-5_6

  9. Farajtabar M, Azizan N, Mott A et al (2020) Orthogonal gradient descent for continual learning. In: AISTATS, pp 3762–3773

  10. Fayek HM, Cavedon L, Wu HR (2020) Progressive learning: a deep learning framework for continual learning. Neural Netw 128:345–357. https://doi.org/10.1016/j.neunet.2020.05.011

  11. Gao Y, Ascoli GA, Zhao L (2021) Schematic memory persistence and transience for efficient and robust continual learning. Neural Netw 144:49–60. https://doi.org/10.1016/j.neunet.2021.08.011

  12. Hinton G, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network. CoRR arXiv:1503.02531

  13. Hou S, Pan X, Loy CC et al (2019) Learning a unified classifier incrementally via rebalancing. In: CVPR, pp 831–839. https://doi.org/10.1109/CVPR.2019.00092

  14. Ioffe S (2017) Batch renormalization: towards reducing minibatch dependence in batch-normalized models. In: NIPS, pp 1945–1953

  15. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: ICML, pp 448–456

  16. Ji Z, Liu J, Wang Q et al (2021) Coordinating experience replay: a harmonious experience retention approach for continual learning. Knowl-Based Syst 234:107589. https://doi.org/10.1016/j.knosys.2021.107589

  17. Jiang M, Li F, Liu L (2022) Continual meta-learning algorithm. Appl Intell 52(4):4527–4542. https://doi.org/10.1007/s10489-021-02543-8

  18. Kemker R, Kanan C (2018) FearNet: brain-inspired model for incremental learning. In: ICLR

  19. Kirkpatrick J, Pascanu R, Rabinowitz N et al (2017) Overcoming catastrophic forgetting in neural networks. Proc National Acad Sci 114(13):3521–3526. https://doi.org/10.1073/pnas.1611835114

  20. Lake BM, Salakhutdinov R, Tenenbaum JB (2015) Human-level concept learning through probabilistic program induction. Science 350(6266):1332–1338. https://doi.org/10.1126/science.aab3050

  21. Li Z, Hoiem D (2018) Learning without forgetting. IEEE Trans Pattern Anal Mach Intell 40(12):2935–2947. https://doi.org/10.1109/TPAMI.2017.2773081

  22. Lomonaco V, Maltoni D, Pellegrini L (2020) Rehearsal-free continual learning over small non-i.i.d. batches. In: CVPR workshop, pp 989–998. https://doi.org/10.1109/CVPRW50498.2020.00131

  23. Lopez-Paz D, Ranzato M (2017) Gradient episodic memory for continual learning. In: NeurIPS, pp 6467–6476

  24. Mai Z, Li R, Kim H et al (2021) Supervised contrastive replay: revisiting the nearest class mean classifier in online class-incremental continual learning. In: CVPR workshop, pp 3589–3599. https://doi.org/10.1109/CVPRW53098.2021.00398

  25. McCloskey M, Cohen NJ (1989) Catastrophic interference in connectionist networks: the sequential learning problem. In: Bower GH (ed) Psychology of learning and motivation, vol 24. Academic Press, pp 109–165. https://doi.org/10.1016/S0079-7421(08)60536-8

  26. Pham Q, Liu C, Hoi S (2022) Continual normalization: rethinking batch normalization for online continual learning. In: ICLR

  27. Rebuffi S, Kolesnikov A, Sperl G et al (2017) iCaRL: incremental classifier and representation learning. In: CVPR, pp 5533–5542. https://doi.org/10.1109/CVPR.2017.587

  28. Riemer M, Cases I, Ajemian R et al (2019) Learning to learn without forgetting by maximizing transfer and minimizing interference. In: ICLR

  29. Rosenfeld A, Tsotsos JK (2020) Incremental learning through deep adaptation. IEEE Trans Pattern Anal Mach Intell 42(3):651–663. https://doi.org/10.1109/TPAMI.2018.2884462

  30. Saha G, Garg I, Roy K (2021) Gradient projection memory for continual learning. In: ICLR

  31. Serrà J, Suris D, Miron M et al (2018) Overcoming catastrophic forgetting with hard attention to the task. In: ICML, pp 4555–4564

  32. Shim D, Mai Z, Jeong J et al (2021) Online class-incremental continual learning with adversarial Shapley value. In: AAAI, pp 9630–9638

  33. Shin H, Lee JK, Kim J et al (2017) Continual learning with deep generative replay. In: NeurIPS, pp 2990–2999

  34. Sokar G, Mocanu DC, Pechenizkiy M (2021) SpaceNet: make free space for continual learning. Neurocomputing 439:1–11. https://doi.org/10.1016/j.neucom.2021.01.078

  35. Vinyals O, Blundell C, Lillicrap T et al (2016) Matching networks for one shot learning. In: NeurIPS, pp 3630–3638

  36. Wu C, Herranz L, Liu X et al (2018) Memory replay gans: learning to generate new categories without forgetting. In: NeurIPS, pp 5966–5976

  37. Wu Y, Chen Y, Wang L et al (2019) Large scale incremental learning. In: CVPR, pp 374–382. https://doi.org/10.1109/CVPR.2019.00046

  38. Xiao H, Rasul K, Vollgraf R (2017) Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. CoRR arXiv:1708.07747

  39. Yan S, Xie J, He X (2021) DER: dynamically expandable representation for class incremental learning. In: CVPR, pp 3014–3023

  40. Yu H, Dai Q (2022) DWE-IL: a new incremental learning algorithm for non-stationary time series prediction via dynamically weighting ensemble learning. Appl Intell 52(1):174–194. https://doi.org/10.1007/s10489-021-02385-4

  41. Zenke F, Poole B, Ganguli S (2017) Continual learning through synaptic intelligence. In: ICML, pp 3987–3995

  42. Zhao B, Xiao X, Gan G et al (2020) Maintaining discrimination and fairness in class incremental learning. In: CVPR, pp 13205–13214. https://doi.org/10.1109/CVPR42600.2020.01322

  43. Zhou D, Wang F, Ye H et al (2021) PyCIL: a Python toolbox for class-incremental learning. CoRR arXiv:2112.12533

  44. Zhu F, Zhang XY, Wang C et al (2021) Prototype augmentation and self-supervision for incremental learning. In: CVPR, pp 5871–5880

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 72071145) and the National Key Research and Development Program of China (No. 2019YFB1704402).

Author information

Corresponding author

Correspondence to Yang Xiang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

Fig. 11

t-SNE visualization of the feature distributions of the test samples from the first task (10 categories) at the first and second steps under the normal protocol on the CIFAR-100 dataset. (a) and (b) show the feature distributions with BN and NBN, respectively, in the iCaRL algorithm; (c) and (d) show the feature distributions with BN and NBN, respectively, in the UCIR algorithm. Different colors indicate different classes. The marker “o” denotes the features of the first task immediately after it has been trained; the marker “x” denotes the features of the first task after the model has learned the second task.
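A minimal sketch of how such a visualization can be produced with scikit-learn and matplotlib is given below. It assumes the first-task test features have already been extracted after step 1 and after step 2; the function name and arguments are illustrative, not the authors' code.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE


def plot_task1_drift(feats_step1, feats_step2, labels):
    """Project first-task features extracted after step 1 ('o') and after
    step 2 ('x') into a shared 2-D t-SNE embedding, colored by class.
    feats_step1 and feats_step2 are (N, d) arrays from the same test samples."""
    joint = TSNE(n_components=2, random_state=0).fit_transform(
        np.concatenate([feats_step1, feats_step2], axis=0))
    n = len(feats_step1)
    plt.scatter(joint[:n, 0], joint[:n, 1], c=labels, marker="o", s=10)  # after step 1
    plt.scatter(joint[n:, 0], joint[n:, 1], c=labels, marker="x", s=10)  # after step 2
    plt.show()
```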

Table 4 Architecture of the feature encoder, classifier, and decoder (used for DBR) based on ResNet-20 on the F-MNIST dataset
Table 5 Architecture of the feature encoder, classifier, and decoder (used for DBR) based on ResNet-32 on the CIFAR-100 dataset
Table 6 Architecture of the feature encoder, classifier, and decoder (used for DBR) based on ResNet-18 on the MiniImageNet dataset
Table 7 Architecture of the feature encoder, classifier, and decoder (used for DBR) based on ResNet-10 on the Omniglot dataset

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, J., Xiang, Y. A robust and anti-forgettiable model for class-incremental learning. Appl Intell 53, 14128–14145 (2023). https://doi.org/10.1007/s10489-022-04239-z
