DOI: 10.1145/3442381.3449820
Research Article

CLEAR: Contrastive-Prototype Learning with Drift Estimation for Resource Constrained Stream Mining

Published: 03 June 2021

Abstract

Non-stationary data stream mining aims to classify large-scale online instances that arrive continuously. Compared with offline learning, the most apparent challenge is the continual emergence of new categories, i.e., a non-static categorical distribution. Non-stationary stream settings appear in many real-world applications, e.g., online classification of newly listed products in e-commerce systems, or summarization of news topics on social networks such as Twitter. Ideally, a learning model should be able to learn novel concepts from labeled data (in new tasks) while avoiding abrupt performance degradation on old concepts (the catastrophic forgetting problem). In this work, we focus on improving stream mining performance under constrained resources, where both the memory available for old data and the labels for new instances are limited or scarce. We propose a simple yet efficient resource-constrained framework, CLEAR, to address these challenges during one-pass stream mining. Specifically, CLEAR creates and calibrates a class representation (the prototype) for each class in the embedding space. We first apply contrastive-prototype learning on a large amount of unlabeled data to generate a discriminative prototype for each class. Then, when updating on new tasks/categories, we propose a drift estimation strategy that calibrates and compensates for the drift of each class prototype, reducing knowledge forgetting without storing any previous data. Experiments on public datasets (e.g., CUB200, CIFAR100) under the stream setting show that our approach consistently and clearly outperforms many state-of-the-art methods under both memory and annotation restrictions.
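The prototype-and-drift idea described above can be illustrated with a minimal sketch. This is not the paper's actual implementation; it assumes class prototypes are mean embeddings, drift is estimated as the mean displacement of the same instances embedded before and after a model update, and classification is nearest-prototype. All function names here are hypothetical.

```python
import numpy as np

def class_prototypes(embeddings, labels):
    # One prototype per class: the mean embedding of its instances
    # (a common construction; the paper's prototypes may differ).
    return {int(c): embeddings[labels == c].mean(axis=0)
            for c in np.unique(labels)}

def estimate_drift(old_emb, new_emb):
    # Estimate embedding-space drift as the mean displacement of
    # the same instances before vs. after the model update.
    return (new_emb - old_emb).mean(axis=0)

def compensate(protos, drift):
    # Shift stored prototypes by the estimated drift so old-class
    # prototypes stay aligned with the updated embedding space,
    # without storing any previous raw data.
    return {c: p + drift for c, p in protos.items()}

def predict(protos, query):
    # Nearest-prototype classification by Euclidean distance.
    classes = list(protos)
    dists = [np.linalg.norm(query - protos[c]) for c in classes]
    return classes[int(np.argmin(dists))]
```

The key point of the sketch is that `compensate` updates only the prototypes, so old-task knowledge is preserved without a replay memory; the paper's drift estimation is per-class and more refined than this single global shift.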


    Published In

    WWW '21: Proceedings of the Web Conference 2021
    April 2021
    4054 pages
    ISBN:9781450383127
    DOI:10.1145/3442381

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Contrastive Learning
    2. Prototype Drift
    3. Resource Constrained
    4. Stream Mining

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    WWW '21: The Web Conference 2021
    April 19 - 23, 2021
    Ljubljana, Slovenia

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Cited By

    • (2024) Dynamic Environment Responsive Online Meta-Learning with Fairness Awareness. ACM Transactions on Knowledge Discovery from Data 18(6), 1–23. DOI: 10.1145/3648684
    • (2024) Online Drift Detection with Maximum Concept Discrepancy. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2924–2935. DOI: 10.1145/3637528.3672016
    • (2024) ConfliLPC: Logits and Parameter Calibration for Political Conflict Analysis in Continual Learning. 2024 IEEE International Conference on Big Data (BigData), 6320–6329. DOI: 10.1109/BigData62323.2024.10826026
    • (2023) A Transparent Blockchain-Based College Admissions Platform. 2023 IEEE 8th International Conference on Smart Cloud (SmartCloud), 116–123. DOI: 10.1109/SmartCloud58862.2023.00029
    • (2022) Model and Training Method of the Resilient Image Classifier Considering Faults, Concept Drift, and Adversarial Attacks. Algorithms 15(10), 384. DOI: 10.3390/a15100384
    • (2022) COCOA. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 6(3), 1–28. DOI: 10.1145/3550316
    • (2022) Towards Robust False Information Detection on Social Networks with Contrastive Learning. Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 1441–1450. DOI: 10.1145/3511808.3557477
    • (2022) Latent Coreset Sampling based Data-Free Continual Learning. Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2077–2087. DOI: 10.1145/3511808.3557375
    • (2022) CAPT: Contrastive Pre-Training based Semi-Supervised Open-Set Learning. 2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR), 7–13. DOI: 10.1109/MIPR54900.2022.00009
    • (2022) How Out-of-Distribution Data Hurts Semi-Supervised Learning. 2022 IEEE International Conference on Data Mining (ICDM), 763–772. DOI: 10.1109/ICDM54844.2022.00087
