Abstract
Continual text classification aims to classify texts from a potentially infinite text stream while maintaining stable classification performance on previously seen texts. Avoiding catastrophic forgetting is a core issue in continual text classification. Most existing methods for handling catastrophic forgetting are based on regularization or replay. However, regularization-based strategies usually consider only one neural network layer and ignore the knowledge contained in other layers, while replay-based strategies neglect class information. In this paper, we introduce two strategies, knowledge distillation and class-aware experience replay, which exploit two levels of knowledge in a neural network together with class information to mitigate catastrophic forgetting. We use BERT as the encoder of our method. Extensive experimental results on large-scale benchmarks show that our method is superior to state-of-the-art methods under the continual learning setting.
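The two strategies named above can be illustrated with a minimal, self-contained sketch. This is not the paper's implementation (which distills knowledge at two levels of a BERT encoder); it only shows the generic shape of the two ideas: a temperature-scaled distillation loss between a frozen "teacher" (the model before the new task) and the current "student", and a replay buffer that reserves memory slots per class so replayed batches cover every seen class. All names (`distillation_loss`, `ClassAwareReplayBuffer`, `per_class`) are illustrative assumptions.

```python
import math
import random
from collections import defaultdict

def softmax(logits, T=1.0):
    """Temperature-scaled softmax (higher T -> softer distribution)."""
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between softened teacher and student outputs,
    scaled by T^2 as in standard knowledge distillation."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_t, p_s)) * (T * T)

class ClassAwareReplayBuffer:
    """Keeps up to `per_class` stored examples for every seen class,
    so a replayed batch represents all previously seen classes
    instead of being dominated by frequent ones."""
    def __init__(self, per_class=5, seed=0):
        self.per_class = per_class
        self.memory = defaultdict(list)   # class label -> stored examples
        self.seen = defaultdict(int)      # per-class stream counters
        self.rng = random.Random(seed)

    def add(self, example, label):
        # Per-class reservoir sampling: each stored set stays a
        # uniform sample of that class's stream.
        self.seen[label] += 1
        slot = self.memory[label]
        if len(slot) < self.per_class:
            slot.append(example)
        else:
            j = self.rng.randrange(self.seen[label])
            if j < self.per_class:
                slot[j] = example

    def sample_batch(self):
        # One stored example per seen class -> class-balanced replay.
        return {c: self.rng.choice(slot) for c, slot in self.memory.items()}
```

In a full training loop the distillation term would be added to the classification loss on new-task data, and `sample_batch()` would supply class-balanced examples from earlier tasks at each step.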
Acknowledgements
We are very grateful to the anonymous reviewers, whose insightful comments greatly helped improve the paper. The work is supported by the National Key R&D Program of China (No. 2022YFA1003701) and the project of Changchun Municipal Science and Technology Bureau under Grant 21ZY31.
Author information
Contributions
FY and ZF wrote the main manuscript text and YC conducted all the experiments. MK prepared all tables and SL prepared all figures. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, F., Che, Y., Kang, M. et al. Continual text classification based on knowledge distillation and class-aware experience replay. Knowl Inf Syst 65, 3923–3944 (2023). https://doi.org/10.1007/s10115-023-01889-4