Abstract
The emergence of novel data collection methods has led to the accumulation of vast amounts of unlabelled data. Discovering well-separated groups of data samples through clustering is a critical but challenging task. In recent years, various techniques have been developed to detect isolated and boundary points. In this work, we propose a clustering methodology that discovers boundary data effectively and distinguishes them from outliers. The methodology builds a new ensemble scheme on top of a well-established density-based clustering method designed for high-dimensional data. Experimental results demonstrate very good performance, indicating that the approach has the potential to be used in diverse domains.
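The abstract does not spell out the ensemble scheme, so the following is only a generic illustration of how an ensemble of density-based clusterings can separate boundary points from outliers. It is not the authors' method. It runs plain DBSCAN (Ester et al., KDD 1996) once per value in a hypothetical list of radii `eps_values` and records how often each point is labelled noise. Points that are noise in every run are treated as outliers, and points that are noise in only some runs are treated as boundary points. The helper names `dbscan`, `ensemble_noise_rate` and `categorise`, as well as the all/some/none voting rule, are assumptions made for this sketch.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
NOISE = -1      # DBSCAN label for points not assigned to any cluster
UNVISITED = -2  # internal marker for points not yet processed


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN: returns a cluster id per point, or NOISE (-1)."""
    n = len(points)
    # eps-neighbourhoods (each point counts itself); O(n^2), fine for a toy example.
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [UNVISITED] * n
    cluster = 0
    for i in range(n):
        if labels[i] != UNVISITED:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # i is a core point: grow a new cluster from it.
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # j is also a core point: keep expanding
        cluster += 1
    return labels


def ensemble_noise_rate(
    points: Sequence[Point], eps_values: Sequence[float], min_pts: int
) -> List[float]:
    """Fraction of ensemble runs (one DBSCAN run per eps value) in which each point is noise."""
    runs = [dbscan(points, eps, min_pts) for eps in eps_values]
    return [sum(run[i] == NOISE for run in runs) / len(runs) for i in range(len(points))]


def categorise(
    points: Sequence[Point], eps_values: Sequence[float], min_pts: int
) -> List[str]:
    """Hypothetical voting rule: always noise -> outlier, sometimes noise -> boundary."""
    labels = []
    for rate in ensemble_noise_rate(points, eps_values, min_pts):
        if rate == 1.0:
            labels.append("outlier")
        elif rate > 0.0:
            labels.append("boundary")
        else:
            labels.append("interior")
    return labels
```

For example, with eleven points at 0, 1, ..., 10 on a line, an extra point at 12.5 and a distant point at 50, `categorise(pts, [1.5, 3.0, 5.0], min_pts=3)` labels the first eleven points interior, the point at 12.5 boundary (it is noise only for the smallest radius) and the point at 50 outlier. The quadratic neighbourhood computation and the fixed list of radii keep the example short. A method aimed at high-dimensional data, as described in the abstract, would need a different base clusterer and aggregation rule.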
Acknowledgment
We acknowledge support of this work by the project “Par-ICT CENG: Enhancing ICT research infrastructure in Central Greece to enable processing of Big data from sensor stream, multimedia content, and complex mathematical modeling and simulations” (MIS 5047244), which is implemented under the Action “Reinforcement of the Research and Innovation Infrastructure”, funded by the Operational Programme “Competitiveness, Entrepreneurship and Innovation” (NSRF 2014–2020) and co-financed by Greece and the European Union (European Regional Development Fund).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Anagnostou, P., Pavlidis, N.G., Tasoulis, S. (2024). Ensemble Clustering for Boundary Detection in High-Dimensional Data. In: Nicosia, G., Ojha, V., La Malfa, E., La Malfa, G., Pardalos, P.M., Umeton, R. (eds) Machine Learning, Optimization, and Data Science. LOD 2023. Lecture Notes in Computer Science, vol 14506. Springer, Cham. https://doi.org/10.1007/978-3-031-53966-4_24
DOI: https://doi.org/10.1007/978-3-031-53966-4_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53965-7
Online ISBN: 978-3-031-53966-4
eBook Packages: Computer Science, Computer Science (R0)