Research Article

Instance-Based Continual Learning: A Real-World Dataset and Baseline for Fresh Recognition

Published: 24 August 2023

Abstract

Real-time learning on real-world data streams with temporal relations is essential for intelligent agents. However, current online Continual Learning (CL) benchmarks adopt the mini-batch setting and are composed of temporally unrelated, disjoint tasks with pre-set class boundaries. In this paper, we delve into a real-world CL scenario for fresh recognition, where algorithms are required to recognize a huge variety of products to speed up checkout. Products mainly consist of packaged cereals, seasonal fruits, and vegetables from local farms or shipped from overseas. Since algorithms process instance streams consisting of sequential images, we term this real-world CL problem Instance-Based Continual Learning (IBCL). Unlike the current online CL setting, algorithms are required to perform instant testing and learning upon each incoming instance. Moreover, IBCL has no task boundaries or class boundaries and allows both the evolution and the forgetting of old samples within each class. To promote research on real-world CL challenges, we propose the first real-world CL dataset, coined the Continual Fresh Recognition (CFR) dataset, which consists of fresh recognition data streams (766 K labelled images in total) collected from 30 supermarkets. Based on the CFR dataset, we extensively evaluate current online CL methods under various settings and find that prominent online CL methods operate at high latency and demand significant memory to cache old samples for replay. Therefore, we make the first attempt to design an efficient and effective Instant Training-Free Learning (ITFL) framework for IBCL. ITFL consists of feature extractors trained in a metric-learning manner and reformulates CL as a temporal classification problem among the most similar classes.
Unlike current online CL methods, which cache image samples (150 KB per image) and rely on training to learn new knowledge, our framework caches only features (2 KB per image) and is free of training in deployment. Extensive evaluations across three datasets demonstrate that our method achieves recognition accuracy comparable to current methods with lower latency and less resource consumption. Our code and datasets will be publicly available at https://github.com/detectRecog/IBCL.
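As a rough illustration of the training-free, test-then-learn loop the abstract describes, the sketch below caches per-class feature vectors (rather than raw images) and classifies each incoming instance before updating the cache. This is not the authors' implementation: the class name, cache size, nearest-class-mean rule, and toy feature stream are all our assumptions.

```python
import numpy as np
from collections import deque, defaultdict

class TrainingFreeLearner:
    """Hypothetical sketch of instant test-then-learn on a feature stream.

    Only fixed-size feature vectors are cached per class; prediction is
    nearest class mean over the cache, and "learning" is a cache update,
    so no gradient steps are needed at deployment time.
    """

    def __init__(self, cache_per_class=20):
        # Bounded deques forget the oldest features within each class,
        # allowing intra-class evolution without task boundaries.
        self.cache = defaultdict(lambda: deque(maxlen=cache_per_class))

    def predict(self, feat):
        # Cosine similarity to each class mean in feature space.
        best, best_sim = None, -np.inf
        for label, feats in self.cache.items():
            mean = np.mean(feats, axis=0)
            sim = float(feat @ mean) / (np.linalg.norm(mean) + 1e-12)
            if sim > best_sim:
                best, best_sim = label, sim
        return best  # None while the cache is still empty

    def learn(self, feat, label):
        self.cache[label].append(feat)

# Illustrative stream: three toy product classes with noisy features.
learner = TrainingFreeLearner()
rng = np.random.default_rng(0)
for t in range(60):
    label = t % 3
    feat = np.eye(8)[label] + 0.1 * rng.standard_normal(8)
    feat /= np.linalg.norm(feat)
    guess = learner.predict(feat)   # instant test first ...
    learner.learn(feat, label)      # ... then instant learn
```

Under these assumptions, replacing a 150 KB image with a 2 KB feature vector makes the replay cache roughly 75x smaller, and the per-instance update is a constant-time append rather than a training step.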



Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 1
January 2024, 639 pages
EISSN: 1551-6865
DOI: 10.1145/3613542
Editor: Abdulmotaleb El Saddik

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 August 2023
Online AM: 25 April 2023
Accepted: 26 March 2023
Revised: 19 March 2023
Received: 18 October 2022
Published in TOMM Volume 20, Issue 1


Author Tags

  1. Online continual learning
  2. fresh recognition


Funding Sources

  • “Pioneer” and “Leading Goose” R&D Program of Zhejiang
  • National Natural Science Foundation of China
  • China Postdoctoral Science Foundation
  • Natural Science Foundation of Zhejiang province
  • Fundamental Research Funds for the Central Universities
