Abstract
Over the years, several classification algorithms have been proposed in the machine learning field to address the challenges posed by data that arrive continuously over time, formally known as data streams. The implementations of these approaches are of vital importance for the applications in which they are used, and many have been modified specifically to address concept drift, a phenomenon inherent to classification problems over data streams. The k-nearest neighbors (k-NN) algorithm is one of the family of lazy learning methods applied to this problem in online learning, but it still faces open challenges, such as efficiently choosing the number of neighbors k used in the learning process. This article proposes paired k-NN learners with dynamically adjusted number of neighbors (PL-kNN), a method that dynamically and incrementally adjusts the number of neighbors used by its pair of k-NN learners during online learning on data streams with concept drift. To validate it, experiments were carried out on both artificial and real-world datasets, and the results were evaluated using accuracy, run-time, memory usage, and the Friedman statistical test with the Nemenyi post hoc test. The experimental results show that PL-kNN improves the accuracy of k-NN with fixed values of k in most of the tested scenarios.
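The core idea described above can be illustrated with a minimal sketch. The code below is not the authors' exact PL-kNN algorithm (whose details are not given here): it assumes a pair of k-NN learners sharing one sliding window, evaluated prequentially (test then train), with the k of the worse-performing learner periodically nudged toward the better one. All class and parameter names are hypothetical.

```python
import math
from collections import deque

def knn_predict(window, x, k):
    """Majority vote among the k nearest stored examples (Euclidean distance)."""
    if not window:
        return None
    neighbors = sorted(window, key=lambda xy: math.dist(xy[0], x))[:k]
    votes = {}
    for _, label in neighbors:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

class PairedKNN:
    """Illustrative sketch of paired k-NN learners with dynamically
    adjusted k (hypothetical adjustment rule, not the published one)."""

    def __init__(self, k1=3, k2=7, window_size=200, adjust_every=50):
        self.window = deque(maxlen=window_size)  # shared sliding window
        self.ks = [k1, k2]                       # one k per learner
        self.hits = [0, 0]                       # prequential hit counts
        self.seen = 0
        self.adjust_every = adjust_every

    def predict(self, x):
        # The pair answers with the currently better-performing learner.
        best = 0 if self.hits[0] >= self.hits[1] else 1
        return knn_predict(self.window, x, self.ks[best])

    def partial_fit(self, x, y):
        # Prequential update: score each learner on (x, y), then store it.
        for i in range(2):
            if knn_predict(self.window, x, self.ks[i]) == y:
                self.hits[i] += 1
        self.window.append((x, y))
        self.seen += 1
        if self.seen % self.adjust_every == 0:
            # Move the worse learner's k one step toward the better one.
            worse = 0 if self.hits[0] < self.hits[1] else 1
            better = 1 - worse
            step = 1 if self.ks[better] > self.ks[worse] else -1
            self.ks[worse] = max(1, self.ks[worse] + step)
            self.hits = [0, 0]  # restart the comparison window
```

After a concept drift, the fixed-k learner that handles the new concept better pulls its partner's k toward its own value, which is one plausible way a paired scheme can outperform any single fixed k.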
Notes
MOAManager is freely available at https://github.com/brunom4ciel/moamanager/.
Available at http://mlkd.csd.auth.gr/datasets.html.
Acknowledgements
Juan Hidalgo is a PhD student previously supported by a postgraduate grant from Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES); Silas Santos is a researcher supported by postdoctorate Grant Number 88887.374884/2019-00 from CAPES; and Prof. Roberto S. M. Barros is supported by research Grant Number 310092/2019-1 from Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).
Author information
Authors and Affiliations
Contributions
JH and RB were responsible for conceptualization, validation, writing, reviewing, and editing; SS was involved in validation, writing, reviewing, and editing.
Corresponding author
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
All run-time and memory usage tables of results omitted from the main body of the article are provided in this appendix, which can also be regarded as supplementary material. In summary, PL-kNN tends to demand slightly more run-time and memory than the other methods, except for SAMkNN, which presents very high memory consumption.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hidalgo, J.I.G., Santos, S.G.T.C. & de Barros, R.S.M. Paired k-NN learners with dynamically adjusted number of neighbors for classification of drifting data streams. Knowl Inf Syst 65, 1787–1816 (2023). https://doi.org/10.1007/s10115-022-01817-y