Imbalance-Robust Multi-Label Self-Adjusting kNN

Published: 26 July 2024

Abstract

In the task of multi-label classification in data streams, instances arriving in real time must be associated with multiple labels simultaneously. Various methods based on the k Nearest Neighbors algorithm have been proposed to address this task. However, these methods face limitations when dealing with imbalanced data streams, a problem that has received limited attention in existing works. To address this gap, this article introduces the Imbalance-Robust Multi-Label Self-Adjusting kNN (IRMLSAkNN), designed to tackle multi-label imbalanced data streams. IRMLSAkNN's strength lies in retaining relevant instances of imbalanced (minority) labels through a discarding mechanism that accounts for the imbalance ratio per label. In addition, it evaluates subwindows with an imbalance-aware measure so that older instances that hurt performance are discarded. We conducted statistical experiments on 32 benchmark data streams, evaluating IRMLSAkNN against eight multi-label classification algorithms using common accuracy-aware and imbalance-aware measures. The results demonstrate that IRMLSAkNN consistently outperforms these algorithms in predictive capacity and time cost across various levels of imbalance.
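
The two mechanisms the abstract describes can be pictured with a short sketch. The Python fragment below is a minimal illustration under assumptions of our own: the IRLbl-style per-label imbalance ratio, the rule protecting instances that carry minority labels, and the function names (per_label_imbalance_ratio, discard_candidates) are all hypothetical, not the paper's actual implementation.

import numpy as np

def per_label_imbalance_ratio(y):
    # y: (n_instances, n_labels) binary label matrix of the current window.
    # IRLbl-style ratio: positives of the most frequent label over each
    # label's positives; larger values mean rarer (minority) labels.
    positives = np.maximum(y.sum(axis=0), 1)  # guard against empty labels
    return positives.max() / positives

def discard_candidates(y, max_window):
    # When the window overflows, prefer discarding the oldest instances
    # that carry no minority label, so rare-label examples are retained.
    overflow = y.shape[0] - max_window
    if overflow <= 0:
        return []
    ir = per_label_imbalance_ratio(y)
    minority = ir > ir.mean()                  # labels rarer than average
    has_minority = y[:, minority].sum(axis=1) > 0
    oldest_first = np.where(~has_minority)[0]  # rows are in arrival order
    return oldest_first[:overflow].tolist()

A subwindow search in the same spirit would score each candidate window size with an imbalance-aware measure (for example, macro-averaged F1 over recent predictions) and keep the size that maximizes it, discarding everything older.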


Cited By

  • Permutation driven evolutionary ordering with dependency filtering for multi-label classification. International Journal of Machine Learning and Cybernetics (2025). DOI: 10.1007/s13042-024-02502-y. Online publication date: 16 January 2025.

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 8
September 2024, 700 pages
EISSN: 1556-472X
DOI: 10.1145/3613713

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 July 2024
Online AM: 11 May 2024
Accepted: 23 April 2024
Revised: 09 April 2024
Received: 20 November 2023
Published in TKDD Volume 18, Issue 8


Author Tags

  1. Multi-label learning
  2. data stream classification
  3. class imbalance
  4. nearest neighbors

Qualifiers

  • Research-article

Funding Sources

  • CEPID-CeMEAI (Center for Mathematical Sciences Applied to Industry)

