Abstract
Constraint-based hierarchical clustering (HC) has emerged as an important improvement over existing clustering algorithms. Triple-wise relative constraints are well suited to HC because they enable the derivation of a cluster hierarchy rather than a flat partition. This paper proposes the Constrained Ward’s Hierarchical Agglomerative Clustering algorithm (CWHAC), a novel variant of Ward’s hierarchical agglomerative clustering method built on triple-wise relative constraints. The algorithm rests on an ultrametric transformation of the dissimilarity matrix, which exploits the triple-wise relative constraints as background knowledge to construct a new metric for data similarity. The IPoptim and UltraTran methods are introduced to incorporate the triple-wise relative constraints by modifying and updating the similarity metric for the proposed algorithm. This study addresses the non-satisfaction of triple-wise relative constraints in HC, improving the effectiveness of CWHAC by handling constraint violation and redundancy. Furthermore, the paper presents three computational optimization strategies for generating constraints, enhancing the efficiency of CWHAC on massive data sets. The proposed algorithm is validated on seven benchmark UCI datasets, using F-score for effectiveness and execution time for efficiency while varying the proportion of constraints. Experimental results demonstrate the improvements of the proposed algorithm over the existing constraint-based Ward’s hierarchical clustering algorithm and over unsupervised HC. Finally, the Mann-Whitney test is performed to confirm the significance of the improvement achieved by the proposed algorithm.
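The abstract describes modifying a dissimilarity matrix with triple-wise relative constraints before running Ward’s agglomerative clustering. Since the IPoptim and UltraTran updates themselves are not given in the abstract, the sketch below substitutes a deliberately simple stand-in rule: for each constraint (a, b, c) meaning “a is more similar to b than to c”, any violating distance d(a, b) is shrunk below d(a, c). All names, the shrink factor `gamma`, and the toy data are illustrative assumptions, not the paper’s method.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist, squareform

def apply_triplet_constraints(D, triplets, gamma=0.5):
    """Enforce triple-wise relative constraints on a square distance matrix.

    Each triplet (a, b, c) encodes 'a is more similar to b than to c'.
    This is a generic stand-in for the paper's IPoptim/UltraTran updates:
    a violated constraint is repaired by shrinking d(a, b) below d(a, c).
    """
    D = D.copy()
    for a, b, c in triplets:
        if D[a, b] >= D[a, c]:          # constraint violated: adjust
            D[a, b] = D[b, a] = gamma * D[a, c]
    return D

# Toy data: two well-separated Gaussian blobs of five points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (5, 2)),
               rng.normal(3.0, 0.3, (5, 2))])
D = squareform(pdist(X))

# Hypothetical background knowledge: point 0 is closer to 1 than to 5, etc.
triplets = [(0, 1, 5), (2, 3, 7)]
D_mod = apply_triplet_constraints(D, triplets)

# Ward's agglomerative clustering on the constraint-adjusted distances.
Z = linkage(squareform(D_mod, checks=False), method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Note that SciPy’s `method="ward"` is strictly defined for Euclidean distances, so after the constraint adjustment the modified matrix is treated heuristically, which is one reason the paper’s ultrametric transformation (UltraTran) is needed in the full method.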
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Aljohani, A.A., Edirisinghe, E.A., Lai, D.T.C. (2020). An Effective and Efficient Constrained Ward’s Hierarchical Agglomerative Clustering Method. In: Bi, Y., Bhatia, R., Kapoor, S. (eds) Intelligent Systems and Applications. IntelliSys 2019. Advances in Intelligent Systems and Computing, vol 1037. Springer, Cham. https://doi.org/10.1007/978-3-030-29516-5_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29515-8
Online ISBN: 978-3-030-29516-5
eBook Packages: Intelligent Technologies and Robotics (R0)