DOI: 10.1145/3587828.3587833
Research Article

Comparison of Distance Metrics for Generating Cluster-based Ensemble Learning

Published: 20 June 2023

Abstract

Ensemble learning combines multiple learning algorithms to achieve better predictive performance than any of the individual learners. Despite its many advantages, ensemble learning raises several issues that need attention, one of which is finding a set of diverse base learners. Recently, clustering has been used as an alternative to bagging for generating diverse base learners. The main advantages of cluster-based ensemble learners are their robustness and versatility. The key parameters when applying a clustering algorithm are the cluster size and the distance metric. The contribution of this study is to compare four distance metrics, namely the Euclidean, Manhattan, Chebyshev, and Canberra distances, in the clustering method for ensemble generation, and to evaluate them in terms of accuracy, purity, and diversity. The methodology is tested on 10 benchmark UCI datasets. The results show that the Chebyshev and Canberra distances achieved higher accuracy than the Euclidean and Manhattan distances, while the Chebyshev distance outperformed the other three in both purity and diversity.
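The following sketch illustrates the four compared metrics and how a distance metric feeds into cluster-based ensemble generation: points are grouped around centroids under the chosen metric, and one base learner would then be trained per cluster. This is a minimal illustration, not the paper's implementation; NumPy is assumed, and `assign_clusters` and the toy data are hypothetical.

```python
import numpy as np

def euclidean(a, b):
    # L2 norm: sqrt(sum_i (a_i - b_i)^2)
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):
    # L1 norm: sum_i |a_i - b_i|
    return np.sum(np.abs(a - b))

def chebyshev(a, b):
    # L-infinity norm: max_i |a_i - b_i|
    return np.max(np.abs(a - b))

def canberra(a, b):
    # Weighted L1: sum_i |a_i - b_i| / (|a_i| + |b_i|); 0/0 terms count as 0
    num = np.abs(a - b)
    den = np.abs(a) + np.abs(b)
    return np.sum(np.divide(num, den, out=np.zeros_like(num), where=den != 0))

def assign_clusters(X, centroids, metric):
    # Hypothetical helper: index of the nearest centroid for each point
    # under the given metric (the assignment step of k-means-style clustering).
    return np.array([min(range(len(centroids)),
                         key=lambda j: metric(x, centroids[j])) for x in X])

# Toy data: two well-separated groups and two candidate centroids.
X = np.array([[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]])
centroids = np.array([[1.5, 1.5], [8.5, 8.5]])
for metric in (euclidean, manhattan, chebyshev, canberra):
    print(metric.__name__, assign_clusters(X, centroids, metric))
```

In a cluster-based ensemble, each cluster assignment of this kind would define the training subset for one base learner, so the choice of metric directly shapes the diversity of the resulting ensemble.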





        Published In

        ICSCA '23: Proceedings of the 2023 12th International Conference on Software and Computer Applications
        February 2023
        385 pages
ISBN: 9781450398589
DOI: 10.1145/3587828
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States



        Author Tags

        1. Clustering
        2. Distance Metrics
        3. Ensemble Generation
        4. Ensemble Learning

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Funding Sources

• Beasiswa Pendidikan Indonesia from the Center for Higher Education Fund (Balai Pembiayaan Pendidikan Tinggi) / Center for Education Services (Pusat Layanan Pendidikan), Ministry of Education, Culture, Research, and Technology of the Republic of Indonesia

        Conference

        ICSCA 2023

        Article Metrics

• Downloads (last 12 months): 16
• Downloads (last 6 weeks): 1
Reflects downloads up to 14 Feb 2025


        Cited By

• (2024) An accuracy analysis of classical and quantum-enhanced K-nearest neighbor algorithm using Canberra distance metric. Knowledge and Information Systems 67:1, 767-788. DOI: 10.1007/s10115-024-02229-w. Online publication date: 5-Oct-2024.
• (2024) The Implementation of Quantum Annealing for Ensemble Pruning. Proceedings of Ninth International Congress on Information and Communication Technology, 239-249. DOI: 10.1007/978-981-97-3305-7_19. Online publication date: 30-Jul-2024.
• (2023) Annealing-Based Optimization for Selecting Training Space in Ensemble Learning. 2023 10th International Conference on Advanced Informatics: Concept, Theory and Application (ICAICTA), 1-6. DOI: 10.1109/ICAICTA59291.2023.10390142. Online publication date: 7-Oct-2023.
• (2023) A hybrid quantum annealing method for generating ensemble classifiers. Journal of King Saud University - Computer and Information Sciences 35:10, 101831. DOI: 10.1016/j.jksuci.2023.101831. Online publication date: Dec-2023.
