Combining Diverse Meta-Features to Accurately Identify Recurring Concept Drift in Data Streams

Authors:
Ben Halstead, University of Auckland, Auckland, New Zealand (ORCID 0000-0002-1597-4284)
Yun Sing Koh, University of Auckland, Auckland, New Zealand (ORCID 0000-0001-7256-4049)
Patricia Riddle, University of Auckland, Auckland, New Zealand (ORCID 0000-0001-8616-0053)
Mykola Pechenizkiy, Eindhoven University of Technology, AE Eindhoven, The Netherlands (ORCID 0000-0003-4955-0743)
Albert Bifet, University of Waikato and LTCI, Télécom Paris, IP-Paris (ORCID 0000-0002-8339-7773)

ACM Transactions on Knowledge Discovery from Data, Volume 17, Issue 8, Article No. 107, pp. 1–36. https://doi.org/10.1145/3587098

Published: 12 May 2023

Abstract

Learning from streaming data is challenging because the distribution of incoming data may change over time, a phenomenon known as concept drift. The predictive patterns, or experience, learned under one distribution may become irrelevant as conditions change under concept drift, but may become relevant once again when those conditions recur. Adaptive learning methods adapt a classifier to concept drift by identifying which distribution, or concept, is currently present in order to determine which experience is relevant. Identifying a concept requires some representation to be stored for comparison, and the quality of that representation is key to accurate identification. Existing concept representations are based on meta-features, efficient univariate summaries of a concept. However, no single meta-feature can fully represent a concept, leading to severe accuracy loss when existing representations cannot describe concept drift. To avoid these failure cases, we propose the first general framework for combining a diverse range of meta-features into a single representation. We solve two main challenges. First, we present a method of efficiently computing, storing, and querying an arbitrary set of meta-features as a single representation, showing that a combination of meta-features may successfully avoid failure cases seen with existing methods. Second, we present the first method for dynamically learning which meta-features distinguish concepts in any given dataset, significantly improving performance. Our proposed approach enables state-of-the-art feature selection methods, such as mutual information, to be applied to concept representation meta-features for the first time. We investigate tradeoffs between memory budget and classification performance, observing accuracy increases of up to 16% by dynamically weighting the contribution of each meta-feature.
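To make the idea of a combined, weighted concept representation concrete, the following is a minimal Python sketch. It is not the authors' implementation: the choice of meta-features (mean, standard deviation, lag-1 autocorrelation, error rate), the Fisher-score-style weighting used here as a simple stand-in for a feature selection method, the weighted squared Euclidean distance, and all function names are assumptions made for illustration only.

```python
import statistics


def meta_features(values, errors):
    """Summarise one window as a vector of univariate meta-features.

    values: observations of a single input feature in the window
    errors: 0/1 flags, 1 where the classifier's prediction was wrong
    (Illustrative choice of meta-features, not the paper's set.)
    """
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std > 0:
        # Lag-1 autocorrelation of the window.
        cov = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
        acf = cov / (n * std * std)
    else:
        acf = 0.0
    return [mean, std, acf, statistics.fmean(errors)]


def fingerprint(vectors):
    """Stored concept representation: the mean of each meta-feature."""
    return [statistics.fmean(column) for column in zip(*vectors)]


def fisher_weights(windows_by_concept):
    """Weight each meta-feature by how well it separates known concepts.

    Uses between-concept variance divided by mean within-concept variance
    (a Fisher-score-style ratio, assumed here for simplicity).
    """
    n_features = len(next(iter(windows_by_concept.values()))[0])
    weights = []
    for j in range(n_features):
        concept_means, within = [], []
        for vectors in windows_by_concept.values():
            column = [v[j] for v in vectors]
            concept_means.append(statistics.fmean(column))
            within.append(statistics.pvariance(column))
        between = statistics.pvariance(concept_means)
        weights.append(between / (statistics.fmean(within) + 1e-9))
    total = sum(weights)
    if total == 0:
        # No meta-feature separates the concepts: fall back to uniform weights.
        return [1.0 / n_features] * n_features
    return [w / total for w in weights]


def weighted_distance(a, b, weights):
    """Weighted squared Euclidean distance between two representations."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))


def identify(vector, fingerprints, weights):
    """Return the stored concept whose fingerprint is closest to the window."""
    return min(fingerprints, key=lambda c: weighted_distance(vector, fingerprints[c], weights))
```

In this sketch, meta-features that do not differ between stored concepts receive a weight near zero, so they contribute little to identification. A real system would also need to normalise meta-features to comparable scales and update fingerprints and weights incrementally as the stream evolves.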

Index Terms

Computing methodologies → Machine learning

Published in
ACM Transactions on Knowledge Discovery from Data, Volume 17, Issue 8
September 2023, 348 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/3596449
Editor: Charu Aggarwal, IBM T. J. Watson Research, USA
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 12 May 2023
- Online AM: 7 March 2023
- Accepted: 27 February 2023
- Revised: 30 December 2022
- Received: 17 June 2022
Author Tags
Data Streaming
adaptive learning
meta-features
Qualifiers
- research-article