Sharpness and Gradient Aware Minimization for Memory-based Continual Learning

Published: 07 December 2023

ABSTRACT

Memory-based Continual Learning (CL) methods preserve performance on old data by storing a small buffer of previously seen samples and re-learning them alongside current data. Despite their impressive results, these methods may still converge to sub-optimal solutions by overfitting the training data, especially the limited buffer, which can be attributed to their reliance on empirical risk minimization. To overcome this problem, we leverage Sharpness Aware Minimization (SAM), a recently proposed training technique, to improve model generalization and thus CL performance. In particular, SAM seeks flat minima whose neighbors also have low loss, by guiding the model along a SAM gradient direction that points simultaneously toward low-loss and flat regions. However, we conjecture that directly applying SAM to replay-based CL methods, whose loss function combines multiple objectives, may cause gradient conflicts among those objectives. We therefore propose to manipulate each objective's SAM gradient so that their potential conflict is minimized, by adopting a gradient aggregation strategy from Multi-task Learning. Finally, through extensive experiments, we empirically verify our hypothesis and show consistent improvements of our method over strong memory-replay baselines.
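
The abstract describes two ingredients: a SAM gradient computed per objective (perturb the weights toward higher loss, then differentiate at the perturbed point) and a conflict-aware aggregation of those gradients. The snippet below is a minimal sketch of these mechanics on a toy two-objective problem, not the authors' implementation: it uses finite-difference gradients instead of backpropagation, a PCGrad-style projection as one possible Multi-task Learning aggregation strategy, and illustrative names such as current_loss, buffer_loss, and rho that are assumptions for this example.

```python
import numpy as np

# Sketch: SAM gradients per objective plus PCGrad-style conflict
# resolution on a toy 2-D problem. Illustrative only, not the paper's code.

def numerical_grad(loss_fn, w, eps=1e-5):
    """Central-difference gradient of loss_fn at w (stands in for backprop)."""
    g = np.zeros_like(w)
    for i in range(w.size):
        d = np.zeros_like(w)
        d[i] = eps
        g[i] = (loss_fn(w + d) - loss_fn(w - d)) / (2 * eps)
    return g

def sam_grad(loss_fn, w, rho=0.05):
    """SAM gradient (Foret et al., 2021): differentiate the loss at the
    adversarially perturbed point w + rho * g / ||g||."""
    g = numerical_grad(loss_fn, w)
    eps_hat = rho * g / (np.linalg.norm(g) + 1e-12)
    return numerical_grad(loss_fn, w + eps_hat)

def project_conflicts(g1, g2):
    """PCGrad-style surgery (Yu et al., 2020): if the two gradients conflict
    (negative inner product), project each onto the normal plane of the
    other before summing."""
    g1p, g2p = g1.copy(), g2.copy()
    if np.dot(g1, g2) < 0:
        g1p = g1 - np.dot(g1, g2) / (np.dot(g2, g2) + 1e-12) * g2
        g2p = g2 - np.dot(g2, g1) / (np.dot(g1, g1) + 1e-12) * g1
    return g1p + g2p

# Toy objectives standing in for the current-task loss and the
# replay-buffer loss; their minima deliberately disagree.
current_loss = lambda w: (w[0] - 1.0) ** 2 + 0.5 * w[1] ** 2
buffer_loss = lambda w: (w[0] + 1.0) ** 2 + 0.5 * (w[1] - 1.0) ** 2

w, lr = np.array([0.0, 0.0]), 0.1
for _ in range(100):
    g_cur = sam_grad(current_loss, w)
    g_buf = sam_grad(buffer_loss, w)
    w -= lr * project_conflicts(g_cur, g_buf)

print("final parameters:", w)
```

The point illustrated here is that conflict resolution is applied to the per-objective SAM gradients rather than to a single gradient of the summed loss, so the flatness-seeking update of one objective is prevented from directly opposing that of the other.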

Published in

SOICT '23: Proceedings of the 12th International Symposium on Information and Communication Technology
December 2023
1058 pages
ISBN: 9798400708916
DOI: 10.1145/3628797

      Copyright © 2023 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 7 December 2023

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

Overall acceptance rate: 147 of 318 submissions, 46%