Sharpness and Gradient Aware Minimization for Memory-based Continual Learning

Published: 07 December 2023

ABSTRACT

Memory-based Continual Learning (CL) methods preserve performance on old data by storing a small buffer of previously seen samples and re-learning them alongside current data. Despite their impressive results, these methods may still converge to sub-optimal solutions by overfitting the training data, especially the limited buffer, which can be attributed to their reliance on empirical risk minimization. To overcome this problem, we leverage Sharpness Aware Minimization (SAM), a recently proposed training technique, to improve model generalization and thus CL performance. In particular, SAM seeks flat minima whose neighbors also have low loss, by guiding the model along a SAM gradient direction that points simultaneously toward low-loss and flat regions. However, we conjecture that directly applying SAM to replay-based CL methods, whose loss function combines multiple objectives, may cause gradient conflicts among those objectives. We therefore propose to manipulate each objective's SAM gradient so that their potential conflict is minimized, by adopting a gradient aggregation strategy from Multi-task Learning. Finally, through extensive experiments, we empirically verify our hypothesis and show consistent improvements of our method over strong memory-replay baselines.
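
The abstract describes two ingredients: a SAM gradient computed per objective (perturb the weights toward higher loss, then differentiate at the perturbed point) and a conflict-aware aggregation of those gradients. The snippet below is a minimal sketch of these mechanics on a toy two-objective problem, not the authors' implementation: it uses finite-difference gradients instead of backpropagation, a PCGrad-style projection as one possible Multi-task Learning aggregation strategy, and illustrative names such as current_loss, buffer_loss, and rho that are assumptions for this example.

```python
import numpy as np

# Sketch: SAM gradients per objective plus PCGrad-style conflict
# resolution on a toy 2-D problem. Illustrative only, not the paper's code.

def numerical_grad(loss_fn, w, eps=1e-5):
    """Central-difference gradient of loss_fn at w (stands in for backprop)."""
    g = np.zeros_like(w)
    for i in range(w.size):
        d = np.zeros_like(w)
        d[i] = eps
        g[i] = (loss_fn(w + d) - loss_fn(w - d)) / (2 * eps)
    return g

def sam_grad(loss_fn, w, rho=0.05):
    """SAM gradient (Foret et al., 2021): differentiate the loss at the
    adversarially perturbed point w + rho * g / ||g||."""
    g = numerical_grad(loss_fn, w)
    eps_hat = rho * g / (np.linalg.norm(g) + 1e-12)
    return numerical_grad(loss_fn, w + eps_hat)

def project_conflicts(g1, g2):
    """PCGrad-style surgery (Yu et al., 2020): if the two gradients conflict
    (negative inner product), project each onto the normal plane of the
    other before summing."""
    g1p, g2p = g1.copy(), g2.copy()
    if np.dot(g1, g2) < 0:
        g1p = g1 - np.dot(g1, g2) / (np.dot(g2, g2) + 1e-12) * g2
        g2p = g2 - np.dot(g2, g1) / (np.dot(g1, g1) + 1e-12) * g1
    return g1p + g2p

# Toy objectives standing in for the current-task loss and the
# replay-buffer loss; their minima deliberately disagree.
current_loss = lambda w: (w[0] - 1.0) ** 2 + 0.5 * w[1] ** 2
buffer_loss = lambda w: (w[0] + 1.0) ** 2 + 0.5 * (w[1] - 1.0) ** 2

w, lr = np.array([0.0, 0.0]), 0.1
for _ in range(100):
    g_cur = sam_grad(current_loss, w)
    g_buf = sam_grad(buffer_loss, w)
    w -= lr * project_conflicts(g_cur, g_buf)

print("final parameters:", w)
```

The point illustrated here is that conflict resolution is applied to the per-objective SAM gradients rather than to a single gradient of the summed loss, so the flatness-seeking update of one objective is prevented from directly opposing that of the other.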

Published in

SOICT '23: Proceedings of the 12th International Symposium on Information and Communication Technology
December 2023
1058 pages
ISBN: 9798400708916
DOI: 10.1145/3628797

      Copyright © 2023 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 7 December 2023

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

Overall acceptance rate: 147 of 318 submissions, 46%