Streaming Data Analytics for Feature Importance Measures in Concept Drift Detection and Adaptation

Alizadeh Mansouri, Ali; Javadtalab, Abbas; Shiri, Nematollaah

doi:10.1007/978-3-031-39847-6_8

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14146)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Numerous applications require the ability to detect and adapt to concept drift in streaming data on the fly. This is made difficult by limited computational resources and limited access to archival storage. In this paper, we study features that capture the evolving relationship between raw data features and target labels, and techniques to extract those features. In particular, we focus on the relationship between feature importance measures in streaming data and the predictive performance of the main classifier. To this end, we consider two groups of feature importance measures, impurity-based and permutation-based, both computed over an auxiliary online gradient boosted decision tree ensemble that runs in parallel with the main classifier and processes the same data stream. We found strong evidence that feature importance measures follow the long-term trend of the performance metrics, even when the data streams are non-stationary or the measures deviate from the performance metrics in the short term. Our study also shows that classification models that process data with a constant or monotonic rate of drift are robust in terms of the stationarity of feature importance measures and the learner's predictive performance. Moreover, we found evidence that permutation feature importance measurements are more consistent and reliable than impurity-based ones when the data exhibits periodic or non-monotonic rates of drift, or when the drift pattern is not known a priori. Our results indicate that the feature importance measures considered are viable sources of information for concept drift detection and adaptation. We established this through a solution to these problems that we developed based on vector error-correction analysis.
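
As an illustration of the permutation-based measure mentioned in the abstract, the following is a minimal sketch of permutation feature importance computed over a single window of a stream. It is not the authors' implementation: the paper computes importances over an auxiliary online gradient boosted decision tree ensemble, whereas this sketch assumes a fixed threshold rule as a stand-in learner. All names (`accuracy`, `permutation_importance`, `window_X`, `window_y`, `model`) and the toy data are hypothetical.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows in the window that the model labels correctly."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in the window."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, breaking its link to the labels.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy window (assumption): the label depends only on feature 0.
data_rng = random.Random(42)
window_X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
window_y = [int(row[0] > 0.5) for row in window_X]


def model(row):
    """Stand-in for the auxiliary learner: a fixed threshold on feature 0."""
    return int(row[0] > 0.5)


scores = permutation_importance(model, window_X, window_y)
print(scores)  # feature 0 should score well above feature 1
```

Recomputing such scores on successive windows yields one time series per feature, which is the kind of signal the abstract relates to the classifier's performance metrics.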

This work was partially supported by Concordia University.

Author information

Authors and Affiliations

Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada
Ali Alizadeh Mansouri, Abbas Javadtalab & Nematollaah Shiri

Authors

Ali Alizadeh Mansouri
Abbas Javadtalab
Nematollaah Shiri

Corresponding author

Correspondence to Ali Alizadeh Mansouri.

Editor information

Editors and Affiliations

University of Vienna, Vienna, Austria
Christine Strauss
University of Tsukuba, Ibaraki, Japan
Toshiyuki Amagasa
Johannes Kepler University Linz, Linz, Austria
Gabriele Kotsis
Vienna University of Technology, Vienna, Austria
A Min Tjoa
Johannes Kepler University Linz, Linz, Austria
Ismail Khalil

Cite this paper

Alizadeh Mansouri, A., Javadtalab, A., Shiri, N. (2023). Streaming Data Analytics for Feature Importance Measures in Concept Drift Detection and Adaptation. In: Strauss, C., Amagasa, T., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2023. Lecture Notes in Computer Science, vol 14146. Springer, Cham. https://doi.org/10.1007/978-3-031-39847-6_8

DOI: https://doi.org/10.1007/978-3-031-39847-6_8
Published: 18 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-39846-9
Online ISBN: 978-3-031-39847-6
eBook Packages: Computer Science, Computer Science (R0)