Abstract
Learning from imbalanced data streams is one of the challenges facing classification algorithms. The goal of this paper is to propose and validate a new approach to learning from data streams, with a focus on the problem of class-imbalanced data. A hybrid approach is discussed that shifts the class distribution towards a more balanced one using over-sampling and instance selection techniques. The proposed approach assumes that classifiers are induced from incoming blocks of instances, called data chunks. Each data chunk consists of incoming instances from different classes, and the balance between the classes is obtained through the hybrid approach. The balanced data chunks are then used to induce classifier ensembles. The approach is validated experimentally on several benchmark datasets, and the results of the computational experiment are presented and discussed. The results show that the proposed approach to eliminating class imbalance in data streams can help increase the performance of online learning algorithms.
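The chunk-based scheme outlined in the abstract can be illustrated with a minimal sketch. It is not the algorithm evaluated in the paper: the abstract does not specify the over-sampling method, the instance selection procedure, the base classifier or the ensemble rule. The sketch therefore assumes random over-sampling with replacement, random subset selection as a stand-in for instance selection, a nearest-centroid base classifier, and unweighted majority voting over the most recent chunks. All names (`balance_chunk`, `ChunkEnsemble`, `max_members`) are hypothetical.

```python
import random
from collections import Counter, deque


def balance_chunk(chunk, rng):
    """Return a class-balanced copy of a chunk of (features, label) pairs."""
    by_class = {}
    for x, y in chunk:
        by_class.setdefault(y, []).append((x, y))
    sizes = sorted(len(items) for items in by_class.values())
    # Assumed target size: midpoint between the smallest and largest class.
    target = (sizes[0] + sizes[-1]) // 2
    balanced = []
    for items in by_class.values():
        if len(items) > target:
            # Stand-in for instance selection: keep a random subset.
            balanced.extend(rng.sample(items, target))
        else:
            # Random over-sampling: add duplicates drawn with replacement.
            balanced.extend(items + rng.choices(items, k=target - len(items)))
    return balanced


def train_centroids(data):
    """Nearest-centroid base classifier: one mean vector per class."""
    sums, counts = {}, Counter()
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] += 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_one(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
    )


class ChunkEnsemble:
    """Keeps one base classifier per recent chunk and predicts by majority vote."""

    def __init__(self, max_members=5, seed=0):
        self.members = deque(maxlen=max_members)  # oldest member is dropped first
        self.rng = random.Random(seed)

    def learn_chunk(self, chunk):
        balanced = balance_chunk(chunk, self.rng)
        self.members.append(train_centroids(balanced))

    def predict(self, x):
        votes = Counter(predict_one(member, x) for member in self.members)
        return votes.most_common(1)[0][0]
```

In use, each incoming chunk would be passed to `learn_chunk`, and `predict` would be called on instances arriving between chunks. Bounding the ensemble with a fixed-length queue is one simple way to forget old chunks; it is a design choice of this sketch, not something stated in the abstract.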
Notes
- 1. The best solution obtained by the compared algorithms is indicated in bold. The underline indicates the best solution obtained by the WECOI or WECU algorithm.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Czarnowski, I. (2021). Learning from Imbalanced Data Streams Based on Over-Sampling and Instance Selection. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol 12744. Springer, Cham. https://doi.org/10.1007/978-3-030-77967-2_32
DOI: https://doi.org/10.1007/978-3-030-77967-2_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77966-5
Online ISBN: 978-3-030-77967-2
eBook Packages: Computer Science, Computer Science (R0)