Abstract
Continual text classification aims to classify texts from a potentially infinite text stream while maintaining stable classification performance on previously seen texts. Avoiding catastrophic forgetting is a core issue in continual text classification. Most existing methods for handling catastrophic forgetting are based on regularization or replay. However, regularization-based strategies usually consider only one neural network layer and ignore the knowledge contained in other layers, while replay-based strategies neglect class information. In this paper, we introduce two strategies, knowledge distillation and class-aware experience replay, which exploit two levels of knowledge in a neural network together with class information to mitigate catastrophic forgetting. We use BERT as the encoder of our method. Extensive experimental results on large-scale benchmarks show that our method is superior to state-of-the-art methods under the continual learning setting.
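The two strategies named above can be illustrated with a minimal, self-contained sketch. This is not the paper's implementation (which distills knowledge at two levels of a BERT encoder); it only shows the generic shape of the two ideas: a temperature-scaled distillation loss between a frozen "teacher" (the model before the new task) and the current "student", and a replay buffer that reserves memory slots per class so replayed batches cover every seen class. All names (`distillation_loss`, `ClassAwareReplayBuffer`, `per_class`) are illustrative assumptions.

```python
import math
import random
from collections import defaultdict

def softmax(logits, T=1.0):
    """Temperature-scaled softmax (higher T -> softer distribution)."""
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between softened teacher and student outputs,
    scaled by T^2 as in standard knowledge distillation."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_t, p_s)) * (T * T)

class ClassAwareReplayBuffer:
    """Keeps up to `per_class` stored examples for every seen class,
    so a replayed batch represents all previously seen classes
    instead of being dominated by frequent ones."""
    def __init__(self, per_class=5, seed=0):
        self.per_class = per_class
        self.memory = defaultdict(list)   # class label -> stored examples
        self.seen = defaultdict(int)      # per-class stream counters
        self.rng = random.Random(seed)

    def add(self, example, label):
        # Per-class reservoir sampling: each stored set stays a
        # uniform sample of that class's stream.
        self.seen[label] += 1
        slot = self.memory[label]
        if len(slot) < self.per_class:
            slot.append(example)
        else:
            j = self.rng.randrange(self.seen[label])
            if j < self.per_class:
                slot[j] = example

    def sample_batch(self):
        # One stored example per seen class -> class-balanced replay.
        return {c: self.rng.choice(slot) for c, slot in self.memory.items()}
```

In a full training loop the distillation term would be added to the classification loss on new-task data, and `sample_batch()` would supply class-balanced examples from earlier tasks at each step.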
Acknowledgements
We are very grateful to the anonymous reviewers, whose insightful comments greatly helped improve the paper. The work is supported by the National Key R&D Program of China (No. 2022YFA1003701) and the project of Changchun Municipal Science and Technology Bureau under Grant 21ZY31.
Author information
Contributions
FY and ZF wrote the main manuscript text and YC conducted all the experiments. MK prepared all tables and SL prepared all figures. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, F., Che, Y., Kang, M. et al. Continual text classification based on knowledge distillation and class-aware experience replay. Knowl Inf Syst 65, 3923–3944 (2023). https://doi.org/10.1007/s10115-023-01889-4