
Agglomerative Constrained Clustering Through Similarity and Distance Recalculation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12344)

Abstract

Constrained clustering has become a topic of considerable interest in machine learning, as it has been shown to produce promising results in domains where only partial information about how to solve the problem is available. Constrained clustering can be viewed as a semi-supervised generalization of clustering, which is traditionally unsupervised: it leverages an additional type of information, encoded as constraints, to guide the clustering process. This study focuses on instance-level must-link and cannot-link constraints (a must-link constraint requires two instances to be placed in the same cluster; a cannot-link constraint requires them to be placed in different clusters). We propose an agglomerative constrained clustering algorithm that incorporates constraints into the partitioning process by combining two families of methods: distance-based methods and methods that adapt the clustering engine. The algorithm first computes a similarity measure from the distances in the dataset and the constraints in the constraint set, and then applies an agglomerative clustering method whose engine has been adapted to take both the constraints and the raw distances into account. We show that it produces high-quality results for the constrained clustering problem by comparing its performance with previous proposals on several datasets with increasing levels of constraint-based information.
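The abstract does not give the paper's actual similarity formula or merge rule, so the following is only a hypothetical sketch of the two ingredients it names, not the authors' algorithm. The distance-based step sets must-link distances to zero and recomputes all-pairs shortest paths, in the spirit of the space-level approach of Klein, Kamvar and Manning (2002). The adapted clustering engine is assumed here to be average linkage that simply refuses any merge joining a cannot-link pair. The function names `recalc_distances` and `constrained_agglomerative`, the choice of Euclidean distance, and the use of average linkage are all illustrative assumptions.

```python
import math
from itertools import combinations


def recalc_distances(X, must_link):
    """Hypothetical distance recalculation (assumption, not the paper's formula):
    zero out must-link pairs, then propagate that closeness with all-pairs
    shortest paths (Floyd-Warshall)."""
    n = len(X)
    # Start from plain Euclidean distances between points.
    D = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    # Must-linked points are treated as coincident.
    for i, j in must_link:
        D[i][j] = D[j][i] = 0.0
    # Shortest paths let the neighbours of must-linked points move closer too.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D


def constrained_agglomerative(D, cannot_link, k):
    """Average-linkage agglomerative clustering (assumed linkage) that never
    merges two clusters containing a cannot-link pair.
    Returns one cluster label per instance."""
    clusters = [{i} for i in range(len(D))]
    cl = {frozenset(pair) for pair in cannot_link}

    def violates(a, b):
        return any(frozenset((i, j)) in cl for i in a for j in b)

    def avg_link(a, b):
        return sum(D[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        best = None
        for p, q in combinations(range(len(clusters)), 2):
            if violates(clusters[p], clusters[q]):
                continue  # adapted engine: skip merges that break a cannot-link
            d = avg_link(clusters[p], clusters[q])
            if best is None or d < best[0]:
                best = (d, p, q)
        if best is None:
            break  # every remaining merge would break a cannot-link
        _, p, q = best
        clusters[p] |= clusters[q]
        del clusters[q]  # safe: combinations yields p < q

    labels = [0] * len(D)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


# Toy example: two obvious pairs on a line, with constraints that override them.
X = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
D = recalc_distances(X, must_link=[(1, 2)])
labels = constrained_agglomerative(D, cannot_link=[(0, 1)], k=2)
print(labels)  # [0, 1, 1, 1]
```

Without constraints this toy data splits into {0, 1} and {2, 3}. With the must-link (1, 2), the recalculated distance from instance 0 to instance 2 drops from 10 to 1, and the cannot-link (0, 1) blocks the otherwise cheapest merge, so the result is {0} and {1, 2, 3}. Enforcing cannot-links as a hard check during merging, rather than only through distances, is one simple way to let the engine itself respect constraints.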



Author information

Correspondence to Germán González-Almagro.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

González-Almagro, G., Suarez, J.L., Luengo, J., Cano, J.R., García, S. (2020). Agglomerative Constrained Clustering Through Similarity and Distance Recalculation. In: de la Cal, E.A., Villar Flecha, J.R., Quintián, H., Corchado, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2020. Lecture Notes in Computer Science, vol 12344. Springer, Cham. https://doi.org/10.1007/978-3-030-61705-9_35

  • DOI: https://doi.org/10.1007/978-3-030-61705-9_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-61704-2

  • Online ISBN: 978-3-030-61705-9

  • eBook Packages: Computer Science, Computer Science (R0)
