An Effective and Efficient Constrained Ward’s Hierarchical Agglomerative Clustering Method

  • Conference paper
Intelligent Systems and Applications (IntelliSys 2019)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1037)


Abstract

Constraint-based hierarchical clustering (HC) has emerged as an important improvement over existing clustering algorithms. Triple-wise relative constraints are well suited to HC because they enable the derivation of a cluster hierarchy rather than a flat partition. This paper proposes the Constrained Ward's Hierarchical Agglomerative Clustering algorithm (CWHAC), a novel variation of Ward's hierarchical agglomerative clustering method built on triple-wise relative constraints. The algorithm relies on an ultrametric transformation of the dissimilarity matrix, which exploits the triple-wise relative constraints as background knowledge to create a new metric for data similarity. The IPoptim and UltraTran methods are introduced to incorporate the triple-wise relative constraints when modifying and updating the similarity metric. This study addresses the problem of unsatisfied triple-wise relative constraints in HC, improving the effectiveness of CWHAC by handling constraint violation and redundancy. Furthermore, the paper presents three computational optimization strategies for generating constraints, which enhance the efficiency of CWHAC on massive data sets. The proposed algorithm is validated on seven benchmark UCI datasets, using F-score to measure effectiveness and execution time to measure efficiency while varying the proportion of constraints. Experimental results demonstrate the improvements achieved by the proposed algorithm over existing constraint-based and unsupervised Ward's hierarchical clustering. Finally, a Mann-Whitney test confirms that the improvement demonstrated by the proposed algorithm is statistically significant.
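The abstract describes the core idea only at a high level: triple-wise relative constraints (i, j, k), meaning "i should be closer to j than to k", are used as background knowledge to modify the dissimilarity matrix before Ward's agglomerative clustering is run. The sketch below is an illustrative approximation of that idea, not the paper's actual IPoptim/UltraTran procedures; the `shrink` and `grow` factors and the `constrained_ward` function are assumptions introduced here for demonstration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist, squareform


def constrained_ward(X, triplets, shrink=0.5, grow=1.5):
    """Ward HAC on a dissimilarity matrix adjusted by triple-wise
    relative constraints. Each triplet (i, j, k) states that point i
    should be closer to j than to k; violated constraints are repaired
    by shrinking d(i, j) and growing d(i, k). Illustrative only."""
    D = squareform(pdist(X))  # full symmetric dissimilarity matrix
    for i, j, k in triplets:
        if D[i, j] >= D[i, k]:  # constraint currently violated
            D[i, j] = D[j, i] = D[i, j] * shrink
            D[i, k] = D[k, i] = D[i, k] * grow
    # Ward linkage on the modified (condensed) dissimilarities
    return linkage(squareform(D, checks=False), method="ward")


# Toy usage: two tight pairs of 2-D points, one constraint.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
Z = constrained_ward(X, triplets=[(0, 1, 2)])
labels = fcluster(Z, t=2, criterion="maxclust")
```

Cutting the resulting dendrogram at two clusters separates the two pairs, and any constraint that is already satisfied leaves the dissimilarity matrix untouched.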



Author information


Correspondence to Abeer A. Aljohani, Eran A. Edirisinghe or Daphne Teck Ching Lai.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Aljohani, A.A., Edirisinghe, E.A., Lai, D.T.C. (2020). An Effective and Efficient Constrained Ward’s Hierarchical Agglomerative Clustering Method. In: Bi, Y., Bhatia, R., Kapoor, S. (eds) Intelligent Systems and Applications. IntelliSys 2019. Advances in Intelligent Systems and Computing, vol 1037. Springer, Cham. https://doi.org/10.1007/978-3-030-29516-5_46
