Abstract
Mining data streams has become an important topic due to the increased availability of vast amounts of online data. In such incremental learning scenarios, observations arrive sequentially over time and are subject to changes in the data distribution, also known as concept drift. Interleaved test-then-train evaluation is often used during supervised learning from streaming data. The idea is intuitive: each instance is first used to test the model and is then used for training. However, true class labels may be missing or may arrive well after the prediction, which means that they cannot be used for training and/or drift detection. Based on these considerations, we introduce LESS-TWE, an ensemble-based method for online learning in domains where full reliance on labels would be infeasible. Our approach combines weighted soft voting and unsupervised drift detection to reduce the dependency on labels during model construction. When the label is unavailable, the most confident label, as predicted through weighted soft voting, is selected. Similarly, our unlabelled drift detector flags drifts based on the voting confidence, rather than relying on the true label. Our experimental evaluation indicates that our algorithm is very fast, achieves predictive accuracy comparable to the state of the art, and outperforms baseline methods.
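To make the voting idea concrete, the following is a minimal Python sketch of weighted soft voting with a fallback to the most confident predicted label when the true label is missing. It is an illustration only, not the LESS-TWE implementation: the function names, the member weights, and the windowed confidence threshold used to flag drift are all assumptions introduced here, since the abstract does not specify them.

```python
from typing import Dict, List, Optional, Tuple


def weighted_soft_vote(probas: List[Dict[str, float]],
                       weights: List[float]) -> Dict[str, float]:
    """Combine per-member class probabilities into one weighted distribution."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    combined: Dict[str, float] = {}
    for member_proba, weight in zip(probas, weights):
        for label, prob in member_proba.items():
            combined[label] = combined.get(label, 0.0) + weight * prob
    # Normalise by the total weight so the result is again a distribution.
    return {label: score / total for label, score in combined.items()}


def choose_training_label(true_label: Optional[str],
                          combined: Dict[str, float]) -> Tuple[str, float]:
    """Return the label to train on and the ensemble's voting confidence.

    If the true label is missing (None), fall back to the most confident
    label from the soft vote.
    """
    predicted = max(combined, key=combined.get)
    confidence = combined[predicted]
    label = true_label if true_label is not None else predicted
    return label, confidence


def confidence_drift(confidences: List[float],
                     window: int = 3,
                     threshold: float = 0.6) -> bool:
    """Hypothetical label-free drift flag (an assumption, not the paper's rule):
    signal drift when the mean confidence of the last `window` votes
    drops below `threshold`."""
    if len(confidences) < window:
        return False
    recent = confidences[-window:]
    return sum(recent) / window < threshold


# Two ensemble members; the first is trusted three times as much.
member_probas = [{"a": 0.9, "b": 0.1}, {"a": 0.4, "b": 0.6}]
member_weights = [3.0, 1.0]
combined = weighted_soft_vote(member_probas, member_weights)
label, confidence = choose_training_label(None, combined)
```

In this sketch the vote is a weighted average of class probabilities, so a heavily weighted member can outvote a less trusted one, and the winning probability doubles as the confidence signal that a label-free drift detector could monitor.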
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Floyd, S.L.A., Viktor, H.L. (2020). Soft Voting Windowing Ensembles for Learning from Partially Labelled Streams. In: Ceci, M., Loglisci, C., Manco, G., Masciari, E., Ras, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2019. Lecture Notes in Computer Science, vol 11948. Springer, Cham. https://doi.org/10.1007/978-3-030-48861-1_6
DOI: https://doi.org/10.1007/978-3-030-48861-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-48860-4
Online ISBN: 978-3-030-48861-1
eBook Packages: Computer Science, Computer Science (R0)