A Probabilistic Sample Matchmaking Strategy for Imbalanced Data Streams with Concept Drift

  • Conference paper
Intelligent Distributed Computing X (IDC 2016)

Part of the book series: Studies in Computational Intelligence ((SCI,volume 678))

Abstract

Over the last decade, interest in adaptive models for non-stationary environments has grown within the research community, driven by an increasing number of application scenarios that generate non-stationary data streams. In this context the literature has been especially rich in ensemble techniques, most of which exploit past information in the form of already trained predictive models and similar alternatives. This manuscript elaborates on a rather different approach, which hinges on extracting the essential predictive information of past trained models and determining from it the best candidates (intelligent sample matchmaking) for training the predictive model of the current data batch. This perspective is particularly useful for data streams characterized by short, imbalanced data batches, a situation in which the so-called trade-off between plasticity and stability must be carefully balanced. The approach is evaluated on a synthetic data set that simulates a non-stationary environment with recurrently changing concept drift. When adapting to a sudden and recurrent change, the proposed approach is shown to perform competitively with the state of the art, yet without storing all past trained models and with lower computational complexity in terms of model evaluations. These promising results motivate future research aimed at validating the proposed strategy in other scenarios under concept drift, such as those characterized by semi-supervised data streams.

Author information

Corresponding author

Correspondence to Jesus L. Lobo.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Lobo, J.L., Del Ser, J., Bilbao, M.N., Laña, I., Salcedo-Sanz, S. (2017). A Probabilistic Sample Matchmaking Strategy for Imbalanced Data Streams with Concept Drift. In: Badica, C., et al. Intelligent Distributed Computing X. IDC 2016. Studies in Computational Intelligence, vol 678. Springer, Cham. https://doi.org/10.1007/978-3-319-48829-5_23

  • DOI: https://doi.org/10.1007/978-3-319-48829-5_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48828-8

  • Online ISBN: 978-3-319-48829-5

  • eBook Packages: Engineering, Engineering (R0)
