Abstract
Over the last decade, interest in adaptive models for non-stationary environments has grown within the research community, driven by an increasing number of applications that generate non-stationary data streams. In this context the literature has been especially rich in ensemble techniques, most of which exploit past information in the form of already trained predictive models or similar alternatives. This manuscript elaborates on a rather different approach, which hinges on extracting the essential predictive information of past trained models and determining from it the best candidates (intelligent sample matchmaking) for training the predictive model of the current data batch. This novel perspective is particularly useful for data streams characterized by short, imbalanced data batches, a situation in which the so-called trade-off between plasticity and stability must be carefully balanced. The approach is evaluated on a synthetic data set that simulates a non-stationary environment with recurring concept drift. The proposed approach is shown to perform competitively with the state of the art when adapting to a sudden and recurrent change, but without storing all the past trained models and with lower computational complexity in terms of model evaluations. These promising results motivate future research aimed at validating the proposed strategy in other scenarios under concept drift, such as those characterized by semi-supervised data streams.
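To make the evaluation setting more concrete, the following is a minimal sketch of a synthetic stream with short, imbalanced batches and sudden, recurring concept drift. It is not the data set used in the chapter: the two-Gaussian layout, the function names `make_batch` and `recurring_stream`, and the parameters `size`, `minority_ratio` and `period` are all assumptions chosen for illustration.

```python
import random


def make_batch(rng, concept, size=50, minority_ratio=0.1):
    """Return one short, imbalanced batch of (x1, x2, label) tuples.

    Hypothetical setup: two Gaussian classes whose means swap sides
    depending on the active concept (0 or 1).
    """
    # Concept 0 centres the minority class at +2, concept 1 at -2.
    minority_mean = 2.0 if concept == 0 else -2.0
    majority_mean = -minority_mean
    batch = []
    for _ in range(size):
        # Label 1 is the minority class, drawn with low probability.
        label = 1 if rng.random() < minority_ratio else 0
        mean = minority_mean if label == 1 else majority_mean
        batch.append((rng.gauss(mean, 1.0), rng.gauss(mean, 1.0), label))
    return batch


def recurring_stream(n_batches=12, period=3, seed=0):
    """Yield (concept, batch); the concept flips suddenly every `period` batches."""
    rng = random.Random(seed)
    for t in range(n_batches):
        concept = (t // period) % 2  # alternates 0, 1, 0, 1, ... so concepts recur
        yield concept, make_batch(rng, concept)


stream = list(recurring_stream())
concepts = [concept for concept, _ in stream]
```

Because the concept alternates between two states, a model trained on an earlier batch becomes relevant again each time its concept reappears, which is the situation in which reusing information from past models is expected to help.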
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Lobo, J.L., Del Ser, J., Bilbao, M.N., Laña, I., Salcedo-Sanz, S. (2017). A Probabilistic Sample Matchmaking Strategy for Imbalanced Data Streams with Concept Drift. In: Badica, C., et al. Intelligent Distributed Computing X. IDC 2016. Studies in Computational Intelligence, vol 678. Springer, Cham. https://doi.org/10.1007/978-3-319-48829-5_23
DOI: https://doi.org/10.1007/978-3-319-48829-5_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48828-8
Online ISBN: 978-3-319-48829-5
eBook Packages: Engineering, Engineering (R0)