An Effective and Efficient Constrained Ward’s Hierarchical Agglomerative Clustering Method

  • Conference paper
Intelligent Systems and Applications (IntelliSys 2019)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1037)


Abstract

Constraint-based hierarchical clustering (HC) has emerged as an important improvement over existing clustering algorithms. Triple-wise relative constraints are well suited to HC because they enable the derivation of a cluster hierarchy rather than a flat partition. This paper proposes the Constrained Ward's Hierarchical Agglomerative Clustering algorithm (CWHAC), a novel variation of Ward's hierarchical agglomerative clustering method built on triple-wise relative constraints. The algorithm relies on an ultrametric transformation of the dissimilarity matrix, which exploits the triple-wise relative constraints as background knowledge to create a new metric for data similarity. The IPoptim and UltraTran methods are introduced to incorporate the triple-wise relative constraints when modifying and updating the similarity metric. This study addresses the problem of unsatisfied triple-wise relative constraints in HC, improving the effectiveness of CWHAC by handling constraint violation and redundancy. Furthermore, the paper presents three computational optimization strategies for generating constraints, which enhance the efficiency of CWHAC on massive data sets. The proposed algorithm is validated on seven benchmark UCI datasets, using F-score to measure effectiveness and execution time to measure efficiency while varying the proportion of constraints. Experimental results demonstrate the improvements achieved by the proposed algorithm over existing constraint-based and unsupervised Ward's hierarchical clustering. Finally, a Mann-Whitney test confirms that the improvement demonstrated by the proposed algorithm is statistically significant.
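The abstract describes the core idea only at a high level: triple-wise relative constraints (i, j, k), meaning "i should be closer to j than to k", are used as background knowledge to modify the dissimilarity matrix before Ward's agglomerative clustering is run. The sketch below is an illustrative approximation of that idea, not the paper's actual IPoptim/UltraTran procedures; the `shrink` and `grow` factors and the `constrained_ward` function are assumptions introduced here for demonstration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist, squareform


def constrained_ward(X, triplets, shrink=0.5, grow=1.5):
    """Ward HAC on a dissimilarity matrix adjusted by triple-wise
    relative constraints. Each triplet (i, j, k) states that point i
    should be closer to j than to k; violated constraints are repaired
    by shrinking d(i, j) and growing d(i, k). Illustrative only."""
    D = squareform(pdist(X))  # full symmetric dissimilarity matrix
    for i, j, k in triplets:
        if D[i, j] >= D[i, k]:  # constraint currently violated
            D[i, j] = D[j, i] = D[i, j] * shrink
            D[i, k] = D[k, i] = D[i, k] * grow
    # Ward linkage on the modified (condensed) dissimilarities
    return linkage(squareform(D, checks=False), method="ward")


# Toy usage: two tight pairs of 2-D points, one constraint.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
Z = constrained_ward(X, triplets=[(0, 1, 2)])
labels = fcluster(Z, t=2, criterion="maxclust")
```

Cutting the resulting dendrogram at two clusters separates the two pairs, and any constraint that is already satisfied leaves the dissimilarity matrix untouched.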



Author information


Correspondence to Abeer A. Aljohani, Eran A. Edirisinghe or Daphne Teck Ching Lai.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Aljohani, A.A., Edirisinghe, E.A., Lai, D.T.C. (2020). An Effective and Efficient Constrained Ward’s Hierarchical Agglomerative Clustering Method. In: Bi, Y., Bhatia, R., Kapoor, S. (eds) Intelligent Systems and Applications. IntelliSys 2019. Advances in Intelligent Systems and Computing, vol 1037. Springer, Cham. https://doi.org/10.1007/978-3-030-29516-5_46
