Abstract
Training data for many vision tasks often arrives sequentially in practice, e.g., in autonomous driving or video surveillance applications. This raises a fundamental challenge: how to keep improving the performance on a specific task by learning from sequentially available training splits. This paper investigates this problem as Incremental Model Enhancement (IME). IME is distinct from conventional Incremental Learning (IL), where each training split typically corresponds to a set of independent classes, domains, or tasks. In IME, each training split may cover only part of the entire data distribution of the target vision task. Consequently, the IME model should be optimized towards the joint distribution of all available training splits, rather than towards each newly arrived split as in IL methods. To deal with these issues, our method stores feature vectors of previously observed training data in a memory bank, which preserves compressed knowledge of the previous training data. We then train on the memorized features together with each newly arrived training split via Memory-based Contrastive Learning (MCL). A new Contrastive Relation Preserving (CRP) scheme updates the memory bank to prevent the preserved features from becoming obsolete, and works with MCL simultaneously to boost model performance. Experiments on several large-scale image classification benchmarks demonstrate the effectiveness of our method. Our method also works well on semantic segmentation, showing strong generalization ability across diverse vision tasks.
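To make the memory-based contrastive idea concrete, below is a minimal PyTorch sketch of a supervised contrastive loss computed between features of the current split and a memory bank of stored features, in the spirit of MCL. All names (memory_contrastive_loss, query, memory) are illustrative assumptions, not the authors' implementation, and the paper's exact loss formulation and CRP update may differ.

import torch
import torch.nn.functional as F

def memory_contrastive_loss(query, q_labels, memory, m_labels, tau=0.07):
    """Sketch of a supervised contrastive loss between current-split features
    and a memory bank (illustrative of MCL, not the authors' exact code).

    query:    (B, D) L2-normalized features from the newly arrived split
    q_labels: (B,)   class labels of the query features
    memory:   (M, D) L2-normalized features stored from earlier splits
    m_labels: (M,)   class labels of the memorized features
    """
    logits = query @ memory.t() / tau                            # (B, M) similarities
    pos_mask = (q_labels[:, None] == m_labels[None, :]).float()  # positives share a label
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    # Average the log-likelihood over each query's positive memory entries;
    # the clamp avoids division by zero when a query has no positives in memory.
    loss = -(pos_mask * log_prob).sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1)
    return loss.mean()

# Toy usage: a batch of 4 queries scored against a 16-entry memory bank.
q = F.normalize(torch.randn(4, 128), dim=1)
m = F.normalize(torch.randn(16, 128), dim=1)
print(memory_contrastive_loss(q, torch.randint(0, 5, (4,)), m, torch.randint(0, 5, (16,))))

In such a scheme, the stored features would gradually drift out of sync with the evolving encoder, which is the obsoleteness problem the CRP update is designed to address; the specifics of that update are given in the paper itself.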
Data Availability
This paper uses public datasets to conduct experiments. These datasets are available at the following URLs. TinyImageNet (Stanford, 2015): http://tiny-imagenet.herokuapp.com/. miniImageNet (Vinyals et al., 2016): https://goo.gl/e3orz6/. ImageNet1K (Russakovsky et al., 2015): https://www.image-net.org/. CUB (Wah et al., 2011): http://www.vision.caltech.edu/datasets/cub_200_2011/. Cityscapes (Cordts et al., 2016): https://www.cityscapes-dataset.com/. OfficeHome (Venkateswara et al., 2017): https://www.hemanthdv.org/officeHomeDataset.html/. iNaturalist2018 (Van Horn et al., 2018): https://github.com/visipedia/inat_comp/tree/master/2018.
References
Aljundi, R., Lin, M., Goujaud, B., & Bengio, Y. (2019). Gradient based sample selection for online continual learning. arXiv preprint arXiv:1903.08671.
Ashok, A., Joseph, K., & Balasubramanian, V. N. (2022). Class-incremental learning with cross-space clustering and controlled transfer. In ECCV (pp. 105–122). Springer.
Bang, J., Kim, H., Yoo, Y., Ha, J. W., & Choi, J. (2021). Rainbow memory: Continual learning with a memory of diverse samples. In CVPR (pp. 8218–8227).
Belouadah, E., & Popescu, A. (2019). Il2m: Class incremental learning with dual memory. In ICCV (pp. 583–592).
Bobu, A., Tzeng, E., Hoffman, J., & Darrell, T. (2018). Adapting to continuously shifting domains. In ICLR workshop.
Buzzega, P., Boschini, M., Porrello, A., Abati, D., & Calderara, S. (2020). Dark experience for general continual learning: A strong, simple baseline. arXiv preprint arXiv:2004.07211.
Cha, H., Lee, J., & Shin, J. (2021). Co2l: Contrastive continual learning. In ICCV (pp. 9516–9525).
Chen, L. C., Papandreou, G., Schroff, F., & Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587.
Chen, X., & He, K. (2021). Exploring simple siamese representation learning. In CVPR (pp. 15750–15758).
Chen, X., Fan, H., Girshick, R., & He, K. (2020b). Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297.
Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020a). A simple framework for contrastive learning of visual representations. In ICML, PMLR (pp. 1597–1607).
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., & Schiele, B. (2016). The cityscapes dataset for semantic urban scene understanding. In CVPR.
Dhar, P., Singh, R. V., Peng, K. C., Wu, Z., & Chellappa, R. (2019). Learning without memorizing. In CVPR (pp. 5138–5146).
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., & Uszkoreit, J. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
Douillard, A., Cord, M., Ollion, C., Robert, T., & Valle, E. (2020). Podnet: Pooled outputs distillation for small-tasks incremental learning. In ECCV (pp. 86–102). Springer.
Fang, Z., Wang, J., Wang, L., Zhang, L., Yang, Y., & Liu, Z. (2021). Seed: Self-supervised distillation for visual representation. arXiv preprint arXiv:2101.04731.
Farajtabar, M., Azizan, N., Mott, A., & Li, A. (2020). Orthogonal gradient descent for continual learning. In International conference on artificial intelligence and statistics (pp. 3762–3773). PMLR.
Gou, J., Yu, B., Maybank, S. J., & Tao, D. (2021). Knowledge distillation: A survey. IJCV, 129, 1789–1819.
Hadsell, R., Chopra, S., & LeCun, Y. (2006). Dimensionality reduction by learning an invariant mapping. In CVPR (pp. 1735–1742). IEEE.
Hayes, T. L., Kafle, K., Shrestha, R., Acharya, M., & Kanan, C. (2020). Remind your neural network to prevent catastrophic forgetting. In ECCV (pp. 466–483). Springer.
Hayes, T. L., & Kanan, C. (2020). Lifelong machine learning with deep streaming linear discriminant analysis. In CVPRW (pp. 220–221).
He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. (2020). Momentum contrast for unsupervised visual representation learning. In CVPR (pp. 9729–9738).
He, Y., Zhang, X., & Sun, J. (2017). Channel pruning for accelerating very deep neural networks. In ICCV (pp. 1389–1397).
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR (pp. 770–778).
Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.
Hoi, S. C., Sahoo, D., Lu, J., & Zhao, P. (2021). Online learning: A comprehensive survey. Neurocomputing, 459, 249–289.
Iscen, A., Zhang, J., Lazebnik, S., & Schmid, C. (2020). Memory-efficient incremental learning through feature adaptation. In ECCV (pp. 699–715). Springer.
Jung, H., Ju, J., Jung, M., & Kim, J. (2018). Less-forgetful learning for domain expansion in deep neural networks. In AAAI.
Kalantidis, Y., Sariyildiz, M. B., Pion, N., Weinzaepfel, P., & Larlus, D. (2020). Hard negative mixing for contrastive learning. NeurIPS, 33, 21798–21809.
Kang, B., Li, Y., Xie, S., Yuan, Z., & Feng, J. (2020). Exploring balanced feature spaces for representation learning. In ICLR.
Khosla, P., Teterwak, P., Wang, C., Sarna, A., Tian, Y., Isola, P., Maschinot, A., Liu, C., & Krishnan, D. (2020). Supervised contrastive learning. arXiv preprint arXiv:2004.11362.
Kim, G., Xiao, C., Konishi, T., & Liu, B. (2023). Learnability and algorithm for continual learning. arXiv preprint arXiv:2306.12646.
Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A. A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., & Hassabis, D. (2017). Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13), 3521–3526.
Kovashka, A., Russakovsky, O., Fei-Fei, L., & Grauman, K. (2016). Crowdsourcing in computer vision. Foundations and Trends® in Computer Graphics and Vision, 10(3), 177–243.
Krause, J., Stark, M., Deng, J., & Fei-Fei, L. (2013). 3d object representations for fine-grained categorization. In ICCVW (pp. 554–561).
Lee, S. W., Kim, J. H., Jun, J., Ha, J. W., & Zhang, B. T. (2017). Overcoming catastrophic forgetting by incremental moment matching. arXiv preprint arXiv:1703.08475.
Li, J., Zhou, P., Xiong, C., & Hoi, S. C. (2020). Prototypical contrastive learning of unsupervised representations. arXiv preprint arXiv:2005.04966.
Li, Z., & Hoiem, D. (2017). Learning without forgetting. TPAMI, 40(12), 2935–2947.
Lopez-Paz, D., & Ranzato, M. (2017). Gradient episodic memory for continual learning. NeurIPS, 30, 6467–6476.
Mallya, A., & Lazebnik, S. (2018). Packnet: Adding multiple tasks to a single network by iterative pruning. In CVPR (pp. 7765–7773).
Mancini, M., Bulo, S. R., Caputo, B., & Ricci, E. (2019). Adagraph: Unifying predictive and continuous domain adaptation through graphs. In CVPR (pp. 6568–6577).
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., & Desmaison, A. (2019). Pytorch: An imperative style, high-performance deep learning library. NeurIPS, 32, 8026–8037.
Petit, G., Popescu, A., Belouadah, E., Picard, D., & Delezoide, B. (2023). Plastil: Plastic and stable exemplar-free class-incremental learning. In Conference on lifelong learning agents (pp. 399–414). PMLR.
Prabhu, A., Torr, P. H., & Dokania, P. K. (2020). Gdumb: A simple approach that questions our progress in continual learning. In ECCV (pp. 524–540). Springer.
Pu, N., Chen, W., Liu, Y., Bakker, E. M., & Lew, M. S. (2021). Lifelong person re-identification via adaptive knowledge accumulation. In CVPR (pp. 7901–7910).
Rebuffi, S. A., Kolesnikov, A., Sperl, G., & Lampert, C. H. (2017). iCaRL: Incremental classifier and representation learning. In CVPR (pp. 2001–2010).
Romero, A., Ballas, N., & Kahou, S. E. (2014). Fitnets: Hints for thin deep nets. arXiv preprint arXiv:1412.6550.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., & Berg, A. C. (2015). Imagenet large scale visual recognition challenge. IJCV, 115(3), 211–252.
Saha, G., Garg, I., & Roy, K. (2021). Gradient projection memory for continual learning. arXiv preprint arXiv:2103.09762.
Stanford. (2015). Tiny ImageNet Challenge (CS231n). http://tiny-imagenet.herokuapp.com/
Sun, Q., Liu, Y., Chua, T. S., & Schiele, B. (2019). Meta-transfer learning for few-shot learning. In CVPR (pp. 403–412).
Tao, X., Chang, X., Hong, X., Wei, X., & Gong, Y. (2020). Topology-preserving class-incremental learning. In ECCV (pp. 254–270). Springer.
Van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.
Van Horn, G., Mac Aodha, O., Song, Y., Cui, Y., Sun, C., Shepard, A., Adam, H., Perona, P., & Belongie, S. (2018). The inaturalist species classification and detection dataset. In CVPR (pp. 8769–8778).
Venkateswara, H., Eusebio, J., Chakraborty, S., & Panchanathan, S. (2017). Deep hashing network for unsupervised domain adaptation. In CVPR (pp. 5018–5027).
Vijayanarasimhan, S., & Grauman, K. (2011). Cost-sensitive active visual category learning. IJCV, 91, 24–44.
Vinyals, O., Blundell, C., Lillicrap, T., & Wierstra, D. (2016). Matching networks for one shot learning. NeurIPS, 29, 3630–3638.
Wah, C., Branson, S., Welinder, P., Perona, P., & Belongie, S. (2011). The caltech-ucsd birds-200-2011 dataset. California Institute of Technology.
Wang, Q., Fink, O., Van Gool, L., & Dai, D. (2022b). Continual test-time domain adaptation. In CVPR (pp. 7201–7211).
Wang, S., Li, X., Sun, J., & Xu, Z. (2021). Training networks in null space of feature covariance for continual learning. In CVPR (pp. 184–193).
Wang, Z., Zhang, Z., Lee, C. Y., Zhang, H., Sun, R., Ren, X., Su, G., Perot, V., Dy, J., & Pfister, T. (2022d). Learning to prompt for continual learning. In CVPR (pp. 139–149).
Wang, F. Y., Zhou, D. W., Ye, H. J., & Zhan, D. C. (2022a). Foster: Feature boosting and compression for class-incremental learning. In ECCV (pp. 398–414). Springer.
Wang, Y., Huang, Z., & Hong, X. (2022c). S-prompts learning with pre-trained transformers: An Occam’s razor for domain incremental learning. NeurIPS, 35, 5682–5695.
Xie, J., Yan, S., & He, X. (2022). General incremental learning with domain-aware categorical representations. In CVPR (pp. 14351–14360).
Yan, S., Xie, J., & He, X. (2021). Der: Dynamically expandable representation for class incremental learning. In CVPR (pp. 3014–3023).
Yao, X., Bai, Y., Zhang, X., Zhang, Y., Sun, Q., Chen, R., Li, R. & Yu, B. (2022). Pcl: Proxy-based contrastive learning for domain generalization. In CVPR (pp. 7097–7107).
Yoon, J., Yang, E., Lee, J., & Hwang, S. J. (2017). Lifelong learning with dynamically expandable networks. arXiv preprint arXiv:1708.01547.
Yu, L., Twardowski, B., Liu, X., Herranz, L., Wang, K., Cheng, Y., Jui, S., & Weijer, J. V. D. (2020). Semantic drift compensation for class-incremental learning. In CVPR (pp. 6982–6991).
Yuan, Y., Chen, X., & Wang, J. (2020). Object-contextual representations for semantic segmentation. In ECCV (pp. 173–190). Springer.
Zagoruyko, S., & Komodakis, N. (2016). Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. arXiv preprint arXiv:1612.03928.
Zhu, R., Zhao, B., Liu, J., Sun, Z., & Chen, C. W. (2021). Improving contrastive learning by visualizing feature transformation. In ICCV (pp. 10306–10315).
Acknowledgements
This work is supported in part by the Natural Science Foundation of China under Grants No. U20B2052 and 61936011, and in part by the Okawa Foundation Research Award.
Additional information
Communicated by Nicu Sebe.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xuan, S., Yang, M. & Zhang, S. Incremental Model Enhancement via Memory-based Contrastive Learning. Int J Comput Vis 133, 65–83 (2025). https://doi.org/10.1007/s11263-024-02138-z