ABSTRACT
Catastrophic forgetting is a major challenge in continual learning: knowledge of old tasks is lost when the model is updated on new ones. Existing solutions typically address this challenge with generative models or exemplar-replay strategies. However, such methods may still generate or select low-quality samples for replay, which directly reduces the effectiveness of the model, especially under class imbalance, noise, or redundancy. How to select a suitable coreset during continual learning therefore becomes significant in such settings. In this work, we propose a novel approach, continual coreset sampling (CCS), to address these challenges. We aim to select the most representative subset at each iteration: when the model is trained on new tasks, the gradient of the selected subset with respect to the model parameters closely approximates the gradients of both the previous and current tasks, making adaptation of the model to new datasets more efficient. Furthermore, instead of storing old data to maintain old knowledge, our approach preserves it in the latent space: we augment the previous classes in the embedding space as pseudo sample vectors drawn from the old encoder's output, strengthened by joint training with the selected new data. This avoids data-privacy invasions in real-world applications when the model is updated. Our experiments over various CV and NLP datasets validate the effectiveness of the proposed approach against current baselines, and show clear improvements in model adaptation and forgetting reduction in a data-free manner.
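The abstract's two core ideas can be sketched in a few lines. The paper's actual CCS algorithm is not reproduced here; the greedy gradient-matching rule, the function names, and the noise-based latent augmentation below are illustrative assumptions, not the authors' method:

```python
import numpy as np

def coreset_by_gradient_matching(grads, k):
    """Greedily pick k samples whose mean gradient best matches the
    mean gradient of the full task (an illustrative selection rule).

    grads: (n, d) array of per-sample gradient vectors.
    Returns the list of selected sample indices.
    """
    n, _ = grads.shape
    target = grads.mean(axis=0)        # gradient of the whole task
    selected = []
    running = np.zeros_like(target)    # sum of gradients picked so far
    for _ in range(k):
        best_i, best_err = None, np.inf
        for i in range(n):
            if i in selected:
                continue
            cand_mean = (running + grads[i]) / (len(selected) + 1)
            err = np.linalg.norm(cand_mean - target)
            if err < best_err:
                best_i, best_err = i, err
        selected.append(best_i)
        running += grads[best_i]
    return selected

def augment_latent(prototype, radius, n_samples, rng):
    """Generate pseudo feature vectors for an old class by perturbing
    its prototype in the embedding space (hypothetical augmentation)."""
    noise = rng.standard_normal((n_samples, prototype.shape[0]))
    return prototype + radius * noise
```

In this sketch the coreset is chosen so that training on it moves the parameters roughly as training on all data would, and old classes are replayed as noisy copies of their embedding-space prototypes rather than as stored raw samples, which is what makes the setting data-free.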
Index Terms
- Latent Coreset Sampling based Data-Free Continual Learning