A Weighted Ensemble Classification Algorithm Based on Nearest Neighbors for Multi-Label Data Stream

Published: 27 February 2023

Abstract

With the rapid growth of data streams, multi-label algorithms for mining dynamic data have become increasingly important. When the data distribution changes, concept drift occurs, rendering existing classification models ineffective. Ensemble methods have been applied to multi-label classification, but few of them consider both the accuracy and the diversity of the base classifiers. To address this problem, a Weighted Ensemble classification algorithm based on Nearest Neighbors for Multi-Label data streams (WENNML) is proposed. WENNML uses data blocks to train Active candidate Ensemble Classifiers (AEC) and Passive candidate Ensemble Classifiers (PEC), whose base classifiers are dynamically updated using geometric and diversity-based weighting methods. When the difference between the number of current instances and the number of warning instances reaches the passive warning value, the algorithm selects the optimal base classifiers from AEC and PEC according to subset accuracy and Hamming score, and places them into the predictive ensemble. Experiments are carried out on 12 datasets against 9 comparison algorithms. The results show that WENNML achieves the best average ranking on all four evaluation metrics.
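The two selection criteria named in the abstract, subset accuracy and Hamming score, are standard multi-label evaluation metrics. The sketch below shows one common formulation over binary label-indicator matrices — subset accuracy as the exact-match rate and Hamming score as 1 minus the Hamming loss. This is an illustrative assumption, not the paper's implementation; some works instead define Hamming score in a Jaccard-style per-instance form.

```python
import numpy as np

def subset_accuracy(y_true, y_pred):
    """Fraction of instances whose entire label set is predicted exactly."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(np.all(y_true == y_pred, axis=1)))

def hamming_score(y_true, y_pred):
    """Per-label accuracy averaged over all instances and labels,
    i.e., 1 minus the Hamming loss on binary indicators."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

# Tiny example: 3 instances, 4 labels.
y_true = [[1, 0, 1, 0],
          [0, 1, 0, 0],
          [1, 1, 0, 1]]
y_pred = [[1, 0, 1, 0],   # exact match
          [0, 1, 1, 0],   # one label wrong
          [1, 0, 0, 1]]   # one label wrong

print(subset_accuracy(y_true, y_pred))  # 1/3: only the first row matches fully
print(hamming_score(y_true, y_pred))    # 10/12: 2 of 12 label entries are wrong
```

Subset accuracy is the stricter of the two (a single wrong label zeroes an instance's contribution), so ranking candidate base classifiers on both, as WENNML does, balances exact-match quality against per-label quality.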


Cited By

  • (2024) Imbalance-Robust Multi-Label Self-Adjusting kNN. ACM Transactions on Knowledge Discovery from Data 18, 8 (2024), 1–30. DOI:10.1145/3663575. Online publication date: 26 July 2024.

      Published In

      ACM Transactions on Knowledge Discovery from Data  Volume 17, Issue 5
      June 2023
      386 pages
      ISSN:1556-4681
      EISSN:1556-472X
      DOI:10.1145/3583066

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 27 February 2023
      Online AM: 30 November 2022
      Accepted: 03 November 2022
      Revised: 25 July 2022
      Received: 21 May 2022
      Published in TKDD Volume 17, Issue 5


      Author Tags

      1. Multi-label
      2. ensemble classification
      3. data stream
      4. dynamic update
      5. concept drift

      Qualifiers

      • Research-article

      Funding Sources

      • National Nature Science Foundation of China
      • Ningxia Natural Science Foundation Project
