Abstract
The analysis of complex, weakly labeled data is increasingly common and poses distinct challenges. Traditional unsupervised clustering aims to uncover groups of related objects using feature-based similarity, but this approach often reaches its limits on complex multimedia data. Semi-supervised clustering, which exploits small amounts of labeled training data, has therefore gained traction recently. In this paper, we propose LabeledPAM, a semi-supervised extension of FasterPAM, a state-of-the-art k-medoids clustering algorithm. Our approach applies both to semi-supervised classification, where labels are assigned to clusters using minimal labeled data, and to semi-supervised clustering, where new clusters with unknown labels are identified. We evaluate our proposal against other semi-supervised clustering techniques suitable for arbitrary distances and demonstrate its efficacy and versatility.
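To make the setting concrete, the following is a minimal sketch of the general idea, not the LabeledPAM algorithm itself. It assumes a precomputed distance matrix (so any distance can be used), runs a simple alternating k-medoids heuristic in place of the PAM/FasterPAM swap procedure, and then names each cluster by majority vote over the few labeled points it contains. Clusters without any labeled point keep the label `None`, which corresponds to a new cluster with an unknown label. All function names and the toy data are hypothetical.

```python
from collections import Counter

def assign_to_medoids(dist, medoids):
    """Return, for every point, the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]

def k_medoids(dist, init_medoids, max_iter=100):
    """Alternating k-medoids on a distance matrix (a stand-in, not FasterPAM)."""
    medoids = list(init_medoids)
    for _ in range(max_iter):
        assign = assign_to_medoids(dist, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i, a in enumerate(assign) if a == c]
            # New medoid: the member with the smallest total distance
            # to its cluster; an empty cluster keeps its old medoid.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
                if members else old)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign_to_medoids(dist, medoids)

def label_clusters(assign, known_labels, k):
    """Majority vote of the labeled points in each cluster; None if there are none."""
    votes = [Counter() for _ in range(k)]
    for i, label in known_labels.items():
        votes[assign[i]][label] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]

# Toy data (assumption): nine 1-D points in three groups, absolute difference
# as the distance, and only two labeled points.
points = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4, 10.0, 10.3, 10.6]
dist = [[abs(p - q) for q in points] for p in points]
known_labels = {0: "a", 4: "b"}

medoids, assign = k_medoids(dist, init_medoids=[0, 3, 6])
cluster_labels = label_clusters(assign, known_labels, k=3)
predicted = [cluster_labels[a] for a in assign]
```

In this toy run the third group contains no labeled point, so its members receive no label. In the sketch the labels are used only after clustering to name the clusters; a semi-supervised method such as the one proposed here would instead let the labeled data inform the clustering itself.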
Acknowledgments
This work was supported by the Czech Science Foundation, project No. GF23-07040K. Computational resources were provided by the e-INFRA CZ project, supported by the Ministry of Education, Youth and Sports of the Czech Republic, project No. ID:90254.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Jánošová, M., Lang, A., Budikova, P., Schubert, E., Dohnal, V. (2025). Advancing the PAM Algorithm to Semi-supervised k-Medoids Clustering. In: Chávez, E., Kimia, B., Lokoč, J., Patella, M., Sedmidubsky, J. (eds) Similarity Search and Applications. SISAP 2024. Lecture Notes in Computer Science, vol 15268. Springer, Cham. https://doi.org/10.1007/978-3-031-75823-2_19
DOI: https://doi.org/10.1007/978-3-031-75823-2_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-75822-5
Online ISBN: 978-3-031-75823-2
eBook Packages: Computer Science, Computer Science (R0)