research-article

Distilled Meta-learning for Multi-Class Incremental Learning

Published: 15 March 2023

Abstract

Meta-learning approaches have recently achieved promising performance in multi-class incremental learning. However, meta-learners still suffer from catastrophic forgetting: they tend to forget knowledge learned from old tasks while rapidly adapting to the new classes of the current task. To solve this problem, we propose a novel distilled meta-learning (DML) framework for multi-class incremental learning that seamlessly integrates meta-learning with knowledge distillation in each incremental stage. Specifically, during inner-loop training, knowledge distillation is incorporated into DML to overcome catastrophic forgetting. During outer-loop training, a meta-update rule is designed so that the meta-learner learns across tasks and quickly adapts to new ones. By virtue of this bilevel optimization, our model is encouraged to strike a balance between retaining old knowledge and learning new knowledge. Experimental results on four benchmark datasets demonstrate the effectiveness of our proposal and show that our method significantly outperforms other state-of-the-art incremental learning methods.
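The bilevel scheme the abstract describes can be pictured with a minimal PyTorch-style sketch of one incremental stage. This is not the authors' implementation: the function names (`dml_incremental_stage`, `distillation_loss`), the Reptile-style first-order meta-update, the temperature `T`, and the loss weight `lam` are all illustrative assumptions.

```python
import copy

import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Soft-target knowledge distillation loss (Hinton et al., 2015):
    KL divergence between temperature-softened teacher and student
    distributions, scaled by T^2 to keep gradient magnitudes stable."""
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)


def dml_incremental_stage(meta_model, old_model, task_loader,
                          inner_steps=5, inner_lr=0.01, meta_lr=0.1, lam=1.0):
    """One incremental stage: inner-loop adaptation with distillation,
    then an outer-loop meta-update toward the adapted weights."""
    old_model.eval()  # frozen snapshot of the model from the previous stage
    fast_model = copy.deepcopy(meta_model)
    opt = torch.optim.SGD(fast_model.parameters(), lr=inner_lr)

    # Inner loop: learn the new classes while distilling old knowledge.
    for _ in range(inner_steps):
        for x, y in task_loader:
            logits = fast_model(x)
            ce = F.cross_entropy(logits, y)  # plasticity: fit the new classes
            with torch.no_grad():
                old_logits = old_model(x)
            # Assumes the classifier head lists old classes first, then new ones.
            n_old = old_logits.size(1)
            kd = distillation_loss(logits[:, :n_old], old_logits)  # stability
            loss = ce + lam * kd
            opt.zero_grad()
            loss.backward()
            opt.step()

    # Outer loop: Reptile-style meta-update, moving the meta-parameters
    # a step toward the task-adapted fast weights.
    with torch.no_grad():
        for meta_p, fast_p in zip(meta_model.parameters(),
                                  fast_model.parameters()):
            meta_p.add_(meta_lr * (fast_p - meta_p))
    return meta_model
```

The two loss terms make the stability-plasticity trade-off explicit: the cross-entropy term pulls the fast weights toward the new classes, the distillation term anchors them to the old model's predictions, and the outer-loop step size `meta_lr` controls how far the meta-learner moves toward each task-adapted solution.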



Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 4
July 2023, 263 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3582888
Editor: Abdulmotaleb El Saddik


Publisher

Association for Computing Machinery, New York, NY, United States

              Publication History

              • Published: 15 March 2023
              • Online AM: 17 January 2023
              • Accepted: 7 December 2022
              • Revised: 17 October 2022
              • Received: 8 June 2022
Published in TOMM Volume 19, Issue 4

