research-article

Distilled Meta-learning for Multi-Class Incremental Learning

Published: 15 March 2023

Abstract

Meta-learning approaches have recently achieved promising performance in multi-class incremental learning. However, meta-learners still suffer from catastrophic forgetting: they tend to forget knowledge learned from old tasks while rapidly adapting to the new classes of the current task. To solve this problem, we propose a novel distilled meta-learning (DML) framework for multi-class incremental learning that seamlessly integrates meta-learning with knowledge distillation in each incremental stage. Specifically, during inner-loop training, knowledge distillation is incorporated into DML to overcome catastrophic forgetting. During outer-loop training, a meta-update rule is designed so that the meta-learner learns across tasks and quickly adapts to new ones. By virtue of this bilevel optimization, our model is encouraged to strike a balance between retaining old knowledge and learning new knowledge. Experimental results on four benchmark datasets demonstrate the effectiveness of our proposal and show that our method significantly outperforms other state-of-the-art incremental learning methods.
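The bilevel scheme the abstract describes can be pictured with a minimal PyTorch-style sketch of one incremental stage. This is not the authors' implementation: the function names (`dml_incremental_stage`, `distillation_loss`), the Reptile-style first-order meta-update, the temperature `T`, and the loss weight `lam` are all illustrative assumptions.

```python
import copy

import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Soft-target knowledge distillation loss (Hinton et al., 2015):
    KL divergence between temperature-softened teacher and student
    distributions, scaled by T^2 to keep gradient magnitudes stable."""
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)


def dml_incremental_stage(meta_model, old_model, task_loader,
                          inner_steps=5, inner_lr=0.01, meta_lr=0.1, lam=1.0):
    """One incremental stage: inner-loop adaptation with distillation,
    then an outer-loop meta-update toward the adapted weights."""
    old_model.eval()  # frozen snapshot of the model from the previous stage
    fast_model = copy.deepcopy(meta_model)
    opt = torch.optim.SGD(fast_model.parameters(), lr=inner_lr)

    # Inner loop: learn the new classes while distilling old knowledge.
    for _ in range(inner_steps):
        for x, y in task_loader:
            logits = fast_model(x)
            ce = F.cross_entropy(logits, y)  # plasticity: fit the new classes
            with torch.no_grad():
                old_logits = old_model(x)
            # Assumes the classifier head lists old classes first, then new ones.
            n_old = old_logits.size(1)
            kd = distillation_loss(logits[:, :n_old], old_logits)  # stability
            loss = ce + lam * kd
            opt.zero_grad()
            loss.backward()
            opt.step()

    # Outer loop: Reptile-style meta-update, moving the meta-parameters
    # a step toward the task-adapted fast weights.
    with torch.no_grad():
        for meta_p, fast_p in zip(meta_model.parameters(),
                                  fast_model.parameters()):
            meta_p.add_(meta_lr * (fast_p - meta_p))
    return meta_model
```

The two loss terms make the stability-plasticity trade-off explicit: the cross-entropy term pulls the fast weights toward the new classes, the distillation term anchors them to the old model's predictions, and the outer-loop step size `meta_lr` controls how far the meta-learner moves toward each task-adapted solution.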



Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 4
July 2023, 263 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3582888
Editor: Abdulmotaleb El Saddik


Publisher

Association for Computing Machinery, New York, NY, United States

              Publication History

              • Published: 15 March 2023
              • Online AM: 17 January 2023
              • Accepted: 7 December 2022
              • Revised: 17 October 2022
              • Received: 8 June 2022
Published in TOMM Volume 19, Issue 4

