A Weighted Ensemble Classification Algorithm Based on Nearest Neighbors for Multi-Label Data Stream

Published: 27 February 2023

Abstract

With the rapid growth of data streams, multi-label algorithms for mining dynamic data have become increasingly important. When the data distribution changes, concept drift occurs, rendering existing classification models ineffective. Ensemble methods have been applied to multi-label classification, but few of them consider both the accuracy and the diversity of the base classifiers. To address this problem, a Weighted Ensemble classification algorithm based on Nearest Neighbors for Multi-Label data streams (WENNML) is proposed. WENNML uses data blocks to train Active candidate Ensemble Classifiers (AEC) and Passive candidate Ensemble Classifiers (PEC), whose base classifiers are dynamically updated using geometric and diversity-based weighting methods. When the difference between the number of current instances and the number of warning instances reaches the passive warning value, the algorithm selects the optimal base classifiers from AEC and PEC according to subset accuracy and Hamming score, and places them into the predictive ensemble. Experiments are carried out on 12 datasets against 9 comparison algorithms. The results show that WENNML achieves the best average ranking on all four evaluation metrics.
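The two selection criteria named in the abstract, subset accuracy and Hamming score, are standard multi-label evaluation metrics. The sketch below shows one common formulation over binary label-indicator matrices — subset accuracy as the exact-match rate and Hamming score as 1 minus the Hamming loss. This is an illustrative assumption, not the paper's implementation; some works instead define Hamming score in a Jaccard-style per-instance form.

```python
import numpy as np

def subset_accuracy(y_true, y_pred):
    """Fraction of instances whose entire label set is predicted exactly."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(np.all(y_true == y_pred, axis=1)))

def hamming_score(y_true, y_pred):
    """Per-label accuracy averaged over all instances and labels,
    i.e., 1 minus the Hamming loss on binary indicators."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

# Tiny example: 3 instances, 4 labels.
y_true = [[1, 0, 1, 0],
          [0, 1, 0, 0],
          [1, 1, 0, 1]]
y_pred = [[1, 0, 1, 0],   # exact match
          [0, 1, 1, 0],   # one label wrong
          [1, 0, 0, 1]]   # one label wrong

print(subset_accuracy(y_true, y_pred))  # 1/3: only the first row matches fully
print(hamming_score(y_true, y_pred))    # 10/12: 2 of 12 label entries are wrong
```

Subset accuracy is the stricter of the two (a single wrong label zeroes an instance's contribution), so ranking candidate base classifiers on both, as WENNML does, balances exact-match quality against per-label quality.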


Cited By

  • (2024) Imbalance-Robust Multi-Label Self-Adjusting kNN. ACM Transactions on Knowledge Discovery from Data 18, 8 (2024), 1–30. DOI:10.1145/3663575. Online publication date: 26 July 2024.

      Published In

      ACM Transactions on Knowledge Discovery from Data  Volume 17, Issue 5
      June 2023
      386 pages
      ISSN:1556-4681
      EISSN:1556-472X
      DOI:10.1145/3583066

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 27 February 2023
      Online AM: 30 November 2022
      Accepted: 03 November 2022
      Revised: 25 July 2022
      Received: 21 May 2022
      Published in TKDD Volume 17, Issue 5


      Author Tags

      1. Multi-label
      2. ensemble classification
      3. data stream
      4. dynamic update
      5. concept drift

      Qualifiers

      • Research-article

      Funding Sources

      • National Nature Science Foundation of China
      • Ningxia Natural Science Foundation Project
