DOI: 10.1145/3664647.3681164

Incremental Learning via Robust Parameter Posterior Fusion

Published: 28 October 2024

Abstract

The posterior estimation of parameters based on Bayesian theory is a crucial technique in Incremental Learning (IL). The estimated posterior is typically used to impose a loss regularization that aligns the current model parameters with the previously learned posterior, thereby mitigating catastrophic forgetting, a major challenge in IL. However, this additional regularization can also hinder learning and prevent the model from reaching the true global optimum. To overcome this limitation, this paper introduces a novel Bayesian IL framework, Robust Parameter Posterior Fusion (RP2F). Unlike traditional methods, RP2F directly estimates the parameter posterior for new data without introducing extra loss regularization, allowing the model to absorb new knowledge more fully. It then fuses this new posterior with the existing ones under the Maximum A Posteriori (MAP) principle, ensuring effective knowledge sharing across tasks. Furthermore, RP2F incorporates a common parameter-robustness prior to facilitate seamless integration during posterior fusion. Comprehensive experiments on the CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets show that RP2F not only effectively mitigates catastrophic forgetting but also achieves backward knowledge transfer.
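
The fusion step described above can be pictured as combining Gaussian approximations of the per-task parameter posteriors. Below is a minimal Python sketch of precision-weighted MAP fusion for diagonal Gaussian posteriors; the function name, the diagonal-Gaussian (Laplace-style) assumption, and the zero-mean robustness prior are illustrative placeholders under stated assumptions, not the exact RP2F procedure.

    # Hypothetical sketch: MAP fusion of two diagonal Gaussian parameter
    # posteriors with a zero-mean Gaussian "robustness" prior. Illustrative
    # only; not the paper's exact algorithm.
    import numpy as np

    def fuse_posteriors(mu_old, prec_old, mu_new, prec_new, prior_prec=1e-3):
        # The product of the two diagonal Gaussians and the zero-mean prior is
        # again Gaussian; its mode (the MAP estimate) is the precision-weighted
        # average of the means, and its precision is the sum of the precisions.
        prec_fused = prec_old + prec_new + prior_prec
        mu_fused = (prec_old * mu_old + prec_new * mu_new) / prec_fused
        return mu_fused, prec_fused

    # Toy usage on a 5-dimensional parameter vector.
    rng = np.random.default_rng(0)
    mu_old, prec_old = rng.normal(size=5), np.full(5, 10.0)  # old-task posterior (high confidence)
    mu_new, prec_new = rng.normal(size=5), np.full(5, 2.0)   # new-task posterior (lower confidence)
    mu_fused, prec_fused = fuse_posteriors(mu_old, prec_old, mu_new, prec_new)
    print("fused mean:", mu_fused)
    print("fused precision:", prec_fused)

Under this assumption the fused mode leans toward whichever task estimated a parameter with higher precision, which is one way knowledge can be shared across tasks without adding a regularization term to the new-task loss.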



Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024
11719 pages
ISBN: 9798400706868
DOI: 10.1145/3664647


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 28 October 2024


Author Tags

  1. Bayesian theory
  2. catastrophic forgetting
  3. incremental learning
  4. lifelong learning

Qualifiers

  • Research-article


Conference

MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne, VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate: 1,150 of 4,385 submissions, 26%
Overall Acceptance Rate: 2,145 of 8,556 submissions, 25%
