Imbalance-Robust Multi-Label Self-Adjusting kNN

Published: 26 July 2024

Abstract

In the task of multi-label classification in data streams, instances arriving in real time must be associated with multiple labels simultaneously. Various methods based on the k Nearest Neighbors algorithm have been proposed to address this task. However, these methods face limitations when dealing with imbalanced data streams, a problem that has received limited attention in existing works. To address this gap, this article introduces the Imbalance-Robust Multi-Label Self-Adjusting kNN (IRMLSAkNN), designed to tackle multi-label imbalanced data streams. IRMLSAkNN's strength lies in retaining relevant instances of imbalanced (minority) labels through a discarding mechanism that accounts for the imbalance ratio per label. In addition, it evaluates subwindows with an imbalance-aware measure so that older instances that hurt performance are discarded. We conducted statistical experiments on 32 benchmark data streams, evaluating IRMLSAkNN against eight multi-label classification algorithms using common accuracy-aware and imbalance-aware measures. The results demonstrate that IRMLSAkNN consistently outperforms these algorithms in predictive capacity and time cost across various levels of imbalance.
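
The two mechanisms the abstract describes can be pictured with a short sketch. The Python fragment below is a minimal illustration under assumptions of our own: the IRLbl-style per-label imbalance ratio, the rule protecting instances that carry minority labels, and the function names (per_label_imbalance_ratio, discard_candidates) are all hypothetical, not the paper's actual implementation.

import numpy as np

def per_label_imbalance_ratio(y):
    # y: (n_instances, n_labels) binary label matrix of the current window.
    # IRLbl-style ratio: positives of the most frequent label over each
    # label's positives; larger values mean rarer (minority) labels.
    positives = np.maximum(y.sum(axis=0), 1)  # guard against empty labels
    return positives.max() / positives

def discard_candidates(y, max_window):
    # When the window overflows, prefer discarding the oldest instances
    # that carry no minority label, so rare-label examples are retained.
    overflow = y.shape[0] - max_window
    if overflow <= 0:
        return []
    ir = per_label_imbalance_ratio(y)
    minority = ir > ir.mean()                  # labels rarer than average
    has_minority = y[:, minority].sum(axis=1) > 0
    oldest_first = np.where(~has_minority)[0]  # rows are in arrival order
    return oldest_first[:overflow].tolist()

A subwindow search in the same spirit would score each candidate window size with an imbalance-aware measure (for example, macro-averaged F1 over recent predictions) and keep the size that maximizes it, discarding everything older.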


Cited By

  • Permutation driven evolutionary ordering with dependency filtering for multi-label classification. International Journal of Machine Learning and Cybernetics (2025). DOI: 10.1007/s13042-024-02502-y. Online publication date: 16 January 2025.

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 8
September 2024, 700 pages
EISSN: 1556-472X
DOI: 10.1145/3613713

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 July 2024
Online AM: 11 May 2024
Accepted: 23 April 2024
Revised: 09 April 2024
Received: 20 November 2023
Published in TKDD Volume 18, Issue 8


Author Tags

  1. Multi-label learning
  2. data stream classification
  3. class imbalance
  4. nearest neighbors

Qualifiers

  • Research-article

Funding Sources

  • CEPID-CeMEAI (Center for Mathematical Sciences Applied to Industry)

