Abstract
Real-time learning on real-world data streams with temporal relations is essential for intelligent agents. However, current online Continual Learning (CL) benchmarks adopt the mini-batch setting and are composed of temporally unrelated and disjoint tasks as well as pre-set class boundaries. In this paper, we delve into a real-world CL scenario for fresh recognition where algorithms are required to recognize a huge variety of products to facilitate the checkout speed. Products mainly consists of packaged cereals, seasonal fruits, and vegetables from local farms or shipped from overseas. Since algorithms process instance streams consisting of sequential images, we name this real-world CL problem as Instance-Based Continual Learning (IBCL). Different from the current online CL setting, algorithms are required to perform instant testing and learning upon each incoming instance. Moreover, IBCL has no task boundaries or class boundaries and allows the evolution and the forgetting of old samples within each class. To promote the researches on real CL challenges, we propose the first real-world CL dataset coined the Continual Fresh Recognition (CFR) dataset, which consists of fresh recognition data streams (766 K labelled images in total) collected from 30 supermarkets. Based on the CFR dataset, we extensively evaluate the performance of current online CL methods under various settings and find that current prominent online CL methods operate at high latency and demand significant memory consumption to cache old samples for replaying. Therefore, we make the first attempt to design an efficient and effective Instant Training-Free Learning (ITFL) framework for IBCL. ITFL consists of feature extractors trained in the metric learning manner and reformulates CL as a temporal classification problem among several most similar classes. 
Unlike current online CL methods that cache image samples (150 KB per image) and rely on training to learn new knowledge, our framework caches only features (2 KB per image) and requires no training in deployment. Extensive evaluations across three datasets demonstrate that our method achieves recognition accuracy comparable to current methods with lower latency and less resource consumption. Our code and datasets will be publicly available at https://github.com/detectRecog/IBCL.
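To make the training-free idea above concrete, the following is a minimal sketch (not the paper's actual implementation) of instant testing and learning with a feature cache: a frozen, metric-learned extractor is assumed to produce embeddings, each incoming instance is first classified by similarity against cached per-class features, and its own feature is then cached so the class can evolve while the oldest samples are forgotten. The class name `FeatureCacheClassifier` and the eviction policy are illustrative assumptions.

```python
import numpy as np

class FeatureCacheClassifier:
    """Sketch of training-free instant learning: instead of caching raw
    images and retraining, cache per-class feature vectors (assumed to come
    from a frozen metric-learned extractor) and classify by cosine
    similarity against the cache."""

    def __init__(self, cache_per_class=50):
        self.cache_per_class = cache_per_class
        self.cache = {}  # label -> list of L2-normalized feature vectors

    @staticmethod
    def _normalize(feature):
        return feature / (np.linalg.norm(feature) + 1e-12)

    def predict(self, feature):
        """Instant testing: return the cached class whose features are most
        similar to the incoming feature (None before anything is learned)."""
        f = self._normalize(feature)
        best_label, best_sim = None, -np.inf
        for label, feats in self.cache.items():
            sim = max(float(f @ g) for g in feats)  # cosine similarity
            if sim > best_sim:
                best_label, best_sim = label, sim
        return best_label

    def learn(self, feature, label):
        """Instant learning: append the new feature and evict the oldest
        entry, so the class representation evolves over the stream."""
        feats = self.cache.setdefault(label, [])
        feats.append(self._normalize(feature))
        if len(feats) > self.cache_per_class:
            feats.pop(0)  # forget the oldest sample of this class
```

On each stream instance one would call `predict` first and `learn` immediately after the label is revealed, mirroring the instant test-then-learn protocol described above; note the per-instance memory cost is one feature vector rather than one image.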
Index Terms
- Instance-Based Continual Learning: A Real-World Dataset and Baseline for Fresh Recognition