Soft Voting Windowing Ensembles for Learning from Partially Labelled Streams

Floyd, Sean L. A.; Viktor, Herna L.

doi:10.1007/978-3-030-48861-1_6

Conference paper
First Online: 14 May 2020

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11948)

Abstract

Mining data streams has become an important topic due to the increased availability of vast amounts of online data. In such incremental learning scenarios, observations arrive in a sequence over time and are subject to changes in data distributions, also known as concept drifts. Interleaved test-then-train evaluations are often used during supervised learning from streaming data. The idea is intuitive: each instance is first used to test the model and is then used to train it. However, true class labels may be missing or arrive well after the prediction, which implies that they cannot be used for training and/or drift detection. Based on these considerations, we introduce our LESS-TWE ensemble-based method for online learning in domains where full reliance on labels would be infeasible. Our approach combines weighted soft voting and unsupervised drift detection to reduce the dependency on labels during model construction. In cases where the label is unavailable, the most confident label, as predicted through weighted soft voting, is selected. Similarly, our unlabelled drift detector flags drifts based on the voting confidence, rather than relying on the true label. Our experimental evaluation indicates that our algorithm is very fast, achieves predictive accuracy comparable to the state of the art, and outperforms baseline methods.
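The ideas named in the abstract (weighted soft voting, falling back to the most confident voted label when the true label is missing, and flagging drift from voting confidence alone) can be illustrated with a minimal sketch. This is not the LESS-TWE implementation: the function and class names, the window size, and the confidence-drop threshold are all hypothetical, and the drift rule shown (a drop in mean confidence relative to an initial reference window) is a simple stand-in rather than the detector used in the paper.

```python
from collections import deque
from typing import Optional, Sequence, Tuple


def weighted_soft_vote(probas: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> Tuple[int, float]:
    """Weighted average of per-model class probabilities.

    Returns the winning class index and its averaged probability,
    which serves as the ensemble's voting confidence.
    """
    total = sum(weights)
    n_classes = len(probas[0])
    avg = [sum(w * p[c] for p, w in zip(probas, weights)) / total
           for c in range(n_classes)]
    label = max(range(n_classes), key=avg.__getitem__)
    return label, avg[label]


def training_label(true_label: Optional[int], voted_label: int) -> int:
    """Use the true label when available, otherwise the voted label."""
    return voted_label if true_label is None else true_label


class ConfidenceDriftFlag:
    """Hypothetical label-free drift flag (an assumption, not the paper's detector).

    The mean confidence of the first full window becomes the reference;
    a drift is flagged when the mean of the most recent window falls
    more than `drop` below that reference.
    """

    def __init__(self, window: int = 3, drop: float = 0.2) -> None:
        self.window = window
        self.drop = drop
        self.recent = deque(maxlen=window)
        self.reference: Optional[float] = None

    def update(self, confidence: float) -> bool:
        self.recent.append(confidence)
        if len(self.recent) < self.window:
            return False  # not enough observations yet
        mean = sum(self.recent) / self.window
        if self.reference is None:
            self.reference = mean  # first full window sets the baseline
            return False
        return self.reference - mean > self.drop


# Three ensemble members vote on a two-class instance with no true label.
label, confidence = weighted_soft_vote(
    [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]], weights=[0.5, 0.3, 0.2])
y_train = training_label(None, label)  # falls back to the voted label
```

In this sketch the member weights are fixed inputs; in a streaming setting they would typically be updated as members are evaluated, which is outside the scope of the example.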

Notes

1. https://github.com/SeanLF/scikit-multiflow

Author information

Authors and Affiliations

School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON, Canada
Sean L. A. Floyd & Herna L. Viktor

Corresponding author

Correspondence to Herna L. Viktor.

Editor information

Editors and Affiliations

University of Bari Aldo Moro, Bari, Italy
Michelangelo Ceci
University of Bari Aldo Moro, Bari, Italy
Corrado Loglisci
CNR-ICAR, Rende, Italy
Giuseppe Manco
Federico II University, Naples, Italy
Elio Masciari
University of North Carolina, Charlotte, NC, USA
Zbigniew Ras

About this paper

Cite this paper

Floyd, S.L.A., Viktor, H.L. (2020). Soft Voting Windowing Ensembles for Learning from Partially Labelled Streams. In: Ceci, M., Loglisci, C., Manco, G., Masciari, E., Ras, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2019. Lecture Notes in Computer Science (LNAI), vol 11948. Springer, Cham. https://doi.org/10.1007/978-3-030-48861-1_6

DOI: https://doi.org/10.1007/978-3-030-48861-1_6
Published: 14 May 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-48860-4
Online ISBN: 978-3-030-48861-1
eBook Packages: Computer Science, Computer Science (R0)
