Learning from Imbalanced Data Streams Based on Over-Sampling and Instance Selection

  • Conference paper
Computational Science – ICCS 2021 (ICCS 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12744)

Abstract

Learning from imbalanced data streams is a challenge for classification algorithms. This paper proposes and validates a new approach for learning from data streams, with a focus on the problem of class-imbalanced data. It discusses a hybrid approach that shifts the class distribution towards a more balanced one using over-sampling and instance selection techniques. The approach assumes that classifiers are induced from incoming blocks of instances, called data chunks. Each data chunk consists of incoming instances from different classes, and the hybrid approach is used to balance them. The balanced data chunks are then used to induce classifier ensembles. The approach is validated experimentally on several selected benchmark datasets, and the results of the computational experiment are presented and discussed. They show that the proposed approach for eliminating class imbalance in data streams can help increase the performance of online learning algorithms.
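The chunk-balancing idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not say which over-sampling or instance selection techniques are used, so the code below assumes the simplest stand-ins (random duplication of under-represented instances and random selection from over-represented ones). The names `chunks`, `balance_chunk`, and the `target` parameter are hypothetical.

```python
import random
from collections import Counter, defaultdict


def chunks(stream, size):
    """Group an iterable of (features, label) pairs into fixed-size data chunks."""
    chunk = []
    for item in stream:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:  # last, possibly shorter, chunk
        yield chunk


def balance_chunk(chunk, target, rng):
    """Return a copy of `chunk` with exactly `target` instances per class.

    Assumed stand-in techniques (not taken from the paper):
    - a class with more than `target` instances is reduced by
      random instance selection,
    - a class with fewer is grown by random over-sampling (duplication).
    """
    by_class = defaultdict(list)
    for features, label in chunk:
        by_class[label].append((features, label))

    balanced = []
    for instances in by_class.values():
        if len(instances) > target:
            balanced.extend(rng.sample(instances, target))
        else:
            missing = target - len(instances)
            extra = [rng.choice(instances) for _ in range(missing)]
            balanced.extend(instances + extra)
    return balanced


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stream: roughly 10% of instances belong to the minority class.
    stream = (((rng.random(),), "min" if rng.random() < 0.1 else "maj")
              for _ in range(200))
    for chunk in chunks(stream, 50):
        before = Counter(label for _, label in chunk)
        target = len(chunk) // len(before)
        after = Counter(label for _, label in balance_chunk(chunk, target, rng))
        print(dict(before), "->", dict(after))
```

In a full pipeline, each balanced chunk would then be used to train one base classifier of the ensemble. That step is omitted here because the abstract does not specify the base learner or how the ensemble members are combined.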

Notes

  1. The best solution obtained by the compared algorithms is indicated in bold. The underline indicates the best solution obtained by the WECOI or WECU algorithm.

Author information

Corresponding author

Correspondence to Ireneusz Czarnowski.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Czarnowski, I. (2021). Learning from Imbalanced Data Streams Based on Over-Sampling and Instance Selection. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol 12744. Springer, Cham. https://doi.org/10.1007/978-3-030-77967-2_32

  • DOI: https://doi.org/10.1007/978-3-030-77967-2_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-77966-5

  • Online ISBN: 978-3-030-77967-2

  • eBook Packages: Computer Science, Computer Science (R0)
