DOI: 10.1145/3587828.3587833
Research Article

Comparison of Distance Metrics for Generating Cluster-based Ensemble Learning

Published: 20 June 2023

Abstract

Ensemble learning combines multiple learning algorithms to achieve better predictive performance than any of the individual learners. Despite its many advantages, ensemble learning raises several issues that need attention, one of which is finding a set of diverse base learners. Recently, clustering has been used as an alternative to bagging for generating diverse base learners. The main advantages of cluster-based ensemble learners are their robustness and versatility. The key parameters when applying a clustering algorithm are the cluster size and the distance metric. The contribution of this study is to compare four distance metrics, namely the Euclidean, Manhattan, Chebyshev, and Canberra distances, in the clustering method for ensemble generation, and to evaluate them in terms of accuracy, purity, and diversity. The methodology is tested on 10 benchmark UCI datasets. The results show that the Chebyshev and Canberra distances achieved higher accuracy than the Euclidean and Manhattan distances, while the Chebyshev distance outperformed the other three in both purity and diversity.
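The following sketch illustrates the four compared metrics and how a distance metric feeds into cluster-based ensemble generation: points are grouped around centroids under the chosen metric, and one base learner would then be trained per cluster. This is a minimal illustration, not the paper's implementation; NumPy is assumed, and `assign_clusters` and the toy data are hypothetical.

```python
import numpy as np

def euclidean(a, b):
    # L2 norm: sqrt(sum_i (a_i - b_i)^2)
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):
    # L1 norm: sum_i |a_i - b_i|
    return np.sum(np.abs(a - b))

def chebyshev(a, b):
    # L-infinity norm: max_i |a_i - b_i|
    return np.max(np.abs(a - b))

def canberra(a, b):
    # Weighted L1: sum_i |a_i - b_i| / (|a_i| + |b_i|); 0/0 terms count as 0
    num = np.abs(a - b)
    den = np.abs(a) + np.abs(b)
    return np.sum(np.divide(num, den, out=np.zeros_like(num), where=den != 0))

def assign_clusters(X, centroids, metric):
    # Hypothetical helper: index of the nearest centroid for each point
    # under the given metric (the assignment step of k-means-style clustering).
    return np.array([min(range(len(centroids)),
                         key=lambda j: metric(x, centroids[j])) for x in X])

# Toy data: two well-separated groups and two candidate centroids.
X = np.array([[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]])
centroids = np.array([[1.5, 1.5], [8.5, 8.5]])
for metric in (euclidean, manhattan, chebyshev, canberra):
    print(metric.__name__, assign_clusters(X, centroids, metric))
```

In a cluster-based ensemble, each cluster assignment of this kind would define the training subset for one base learner, so the choice of metric directly shapes the diversity of the resulting ensemble.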





        Published In

        ICSCA '23: Proceedings of the 2023 12th International Conference on Software and Computer Applications
        February 2023
        385 pages
ISBN: 9781450398589
DOI: 10.1145/3587828
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States



        Author Tags

        1. Clustering
        2. Distance Metrics
        3. Ensemble Generation
        4. Ensemble Learning

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Funding Sources

• Beasiswa Pendidikan Indonesia from the Center for Higher Education Fund (Balai Pembiayaan Pendidikan Tinggi) / Center for Education Services (Pusat Layanan Pendidikan), Ministry of Education, Culture, Research, and Technology of the Republic of Indonesia

        Conference

        ICSCA 2023

        Article Metrics

• Downloads (last 12 months): 16
• Downloads (last 6 weeks): 1
Reflects downloads up to 14 Feb 2025


        Cited By

• (2024) An accuracy analysis of classical and quantum-enhanced K-nearest neighbor algorithm using Canberra distance metric. Knowledge and Information Systems 67:1, 767-788. DOI: 10.1007/s10115-024-02229-w. Online publication date: 5-Oct-2024.
• (2024) The Implementation of Quantum Annealing for Ensemble Pruning. Proceedings of Ninth International Congress on Information and Communication Technology, 239-249. DOI: 10.1007/978-981-97-3305-7_19. Online publication date: 30-Jul-2024.
• (2023) Annealing-Based Optimization for Selecting Training Space in Ensemble Learning. 2023 10th International Conference on Advanced Informatics: Concept, Theory and Application (ICAICTA), 1-6. DOI: 10.1109/ICAICTA59291.2023.10390142. Online publication date: 7-Oct-2023.
• (2023) A hybrid quantum annealing method for generating ensemble classifiers. Journal of King Saud University - Computer and Information Sciences 35:10, 101831. DOI: 10.1016/j.jksuci.2023.101831. Online publication date: Dec-2023.
