Advancing the PAM Algorithm to Semi-supervised k-Medoids Clustering

  • Conference paper
Similarity Search and Applications (SISAP 2024)

Abstract

The analysis of complex, weakly labeled data is increasingly popular, presenting unique challenges. Traditional unsupervised clustering aims to uncover interrelated sets of objects using feature-based similarity of the objects, but this approach often hits its limits for complex multimedia data. Thus, semi-supervised clustering that exploits small amounts of labeled training data has gained traction recently. In this paper, we propose LabeledPAM, a semi-supervised extension of FasterPAM, a state-of-the-art k-medoids clustering algorithm. Our approach is applicable in semi-supervised classification tasks, where labels are assigned to clusters with minimal labeled data, as well as in semi-supervised clustering scenarios, identifying new clusters with unknown labels. We evaluate our proposal against other semi-supervised clustering techniques suitable for arbitrary distances, demonstrating its efficacy and versatility.
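The two usage scenarios in the abstract can be illustrated with a minimal sketch. This is not the LabeledPAM algorithm, whose details are not given here. It is a generic illustration under stated assumptions: the medoids are already chosen, distances come from an arbitrary precomputed matrix, each cluster takes the majority label of its few labeled members, and a cluster without labeled members is reported as unknown. All function names and the toy data are hypothetical.

```python
from collections import Counter


def assign_to_medoids(dist, medoids):
    """For each point, return the position (within `medoids`) of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def total_deviation(dist, medoids):
    """k-medoids objective: sum of distances from every point to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def label_clusters(assignment, labels, k):
    """Majority label among the labeled members of each cluster, or None if it has none."""
    votes = [Counter() for _ in range(k)]
    for cluster, label in zip(assignment, labels):
        if label is not None:
            votes[cluster][label] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]


# Toy 1-D data (assumption); any distance function could fill the matrix.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0, 50.0, 51.0]
dist = [[abs(a - b) for b in points] for a in points]

# Only two points carry a label; None marks unlabeled points.
labels = ["a", None, None, None, "b", None, None, None]

# Medoids are given as indices into `points` (assumed to be already optimized).
medoids = [1, 4, 6]

assignment = assign_to_medoids(dist, medoids)
cluster_labels = label_clusters(assignment, labels, len(medoids))
predicted = [cluster_labels[c] for c in assignment]
cost = total_deviation(dist, medoids)
```

In this toy setup, the first two clusters inherit the labels of their single labeled member (the semi-supervised classification case), while the third cluster contains no labeled point and keeps the label `None`, corresponding to a newly identified cluster with an unknown label.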

Acknowledgments

This work was supported by the Czech Science Foundation, project No. GF23-07040K. Computational resources were provided by the e-INFRA CZ project, supported by the Ministry of Education, Youth and Sports of the Czech Republic, project No. ID:90254.

Author information

Corresponding author

Correspondence to Miriama Jánošová.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Jánošová, M., Lang, A., Budikova, P., Schubert, E., Dohnal, V. (2025). Advancing the PAM Algorithm to Semi-supervised k-Medoids Clustering. In: Chávez, E., Kimia, B., Lokoč, J., Patella, M., Sedmidubsky, J. (eds) Similarity Search and Applications. SISAP 2024. Lecture Notes in Computer Science, vol 15268. Springer, Cham. https://doi.org/10.1007/978-3-031-75823-2_19

  • DOI: https://doi.org/10.1007/978-3-031-75823-2_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-75822-5

  • Online ISBN: 978-3-031-75823-2

  • eBook Packages: Computer Science, Computer Science (R0)
