Continual text classification based on knowledge distillation and class-aware experience replay

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

Continual text classification aims to classify texts arriving from a potentially infinite text stream while preserving stable classification performance on previously seen texts. Avoiding catastrophic forgetting is a core issue in continual text classification. Most existing methods handle catastrophic forgetting through regularization or replay. However, regularization-based strategies usually consider only a single neural network layer and ignore the knowledge contained in other layers, while replay-based strategies neglect class information. In this paper, we introduce two strategies, knowledge distillation and class-aware experience replay, which exploit knowledge at two levels of the neural network together with class information to mitigate catastrophic forgetting. We use BERT as the encoder of our method. Extensive experimental results on large-scale benchmarks show that our method is superior to state-of-the-art methods under the continual learning setting.
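To make the two ideas named in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' released code: it illustrates (1) distilling knowledge from the previous model at two levels (output logits and an intermediate representation such as BERT's [CLS] vector) and (2) a replay buffer that keeps a class-balanced memory. All names and hyperparameters here (kd_loss, ClassAwareBuffer, temperature T, weights alpha and beta, per_class) are illustrative assumptions, not the paper's exact formulation.

    # Hedged sketch of two-level knowledge distillation and class-aware replay.
    import random
    from collections import defaultdict

    import torch.nn.functional as F


    def kd_loss(student_logits, teacher_logits,
                student_hidden, teacher_hidden,
                T=2.0, alpha=1.0, beta=1.0):
        """Two-level distillation: KL on softened logits plus MSE on hidden states."""
        # Prediction-level knowledge, softened by temperature T.
        logit_term = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        # Feature-level knowledge from an intermediate layer (e.g. BERT's [CLS]).
        feature_term = F.mse_loss(student_hidden, teacher_hidden)
        return alpha * logit_term + beta * feature_term


    class ClassAwareBuffer:
        """Replay memory that stores at most `per_class` examples for each class."""

        def __init__(self, per_class=20):
            self.per_class = per_class
            self.store = defaultdict(list)  # class label -> list of stored examples

        def add(self, example, label):
            bucket = self.store[label]
            if len(bucket) < self.per_class:
                bucket.append(example)
            else:
                # Replace a random stored example so the per-class budget is kept.
                bucket[random.randrange(self.per_class)] = example

        def sample(self, n):
            """Draw a replay batch spread round-robin over the classes seen so far."""
            labels = list(self.store)
            batch = []
            while labels and len(batch) < n:
                label = labels[len(batch) % len(labels)]
                batch.append((random.choice(self.store[label]), label))
            return batch

In a training loop one would minimize the usual cross-entropy on the new task's data plus kd_loss computed against a frozen copy of the previously trained model, and mix batches drawn from the class-aware buffer into each step; how these terms are weighted and how the memory is refreshed is where the paper's actual method may differ from this sketch.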

Acknowledgements

We are very grateful to the anonymous reviewers; their insightful comments were very helpful in improving the paper. This work is supported by the National Key R&D Program of China (No. 2022YFA1003701) and a project of the Changchun Municipal Science and Technology Bureau under Grant 21ZY31.

Author information

Contributions

FY and ZF wrote the main manuscript text, and YC conducted all the experiments. MK prepared all tables and SL prepared all figures. All authors reviewed the manuscript.

Corresponding author

Correspondence to Zhiguo Fu.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, F., Che, Y., Kang, M. et al. Continual text classification based on knowledge distillation and class-aware experience replay. Knowl Inf Syst 65, 3923–3944 (2023). https://doi.org/10.1007/s10115-023-01889-4
