Knowledge Integration in Deep Clustering

Nghiem, Nguyen-Viet-Dung; Vrain, Christel; Dao, Thi-Bich-Hanh

doi:10.1007/978-3-031-26387-3_11


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13713)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases


Abstract

Constrained clustering, which integrates knowledge in the form of constraints into the clustering process, has been studied for more than two decades. Popular clustering algorithms such as K-means, spectral clustering and recent deep clustering methods already have constrained versions, but these usually lack expressiveness in the form of constraints they accept. In this paper we consider prior knowledge, expressed in propositional logic, about relations between some data points and their assignments to clusters, and we show how a deep clustering framework can be extended to integrate this knowledge. To achieve this, we define an expert loss based on the weighted models of the logical formulas; the weights depend on the soft assignment of points to clusters, which is computed dynamically by the deep learner. This loss is integrated into the deep clustering method. We show how it can be computed efficiently using Weighted Model Counting and decomposition techniques. This method has the advantages of both integrating general knowledge and being independent of the neural architecture. Indeed, we have integrated the expert loss into two well-known deep clustering algorithms (IDEC and SCAN). Experiments have been conducted to compare our systems IDEC-LK and SCAN-LK to state-of-the-art methods for pairwise and triplet constraints in terms of computational cost, clustering quality and constraint satisfaction. We show that IDEC-LK can achieve results comparable to those of these systems, which are tailored to these specific constraints. To show the flexibility of our approach in learning from high-level domain constraints, we have integrated implication constraints and a new constraint, called the span-limited constraint, that limits the number of clusters a set of points can belong to. Some experiments also show that constraints on some points can be extrapolated to other similar points.
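The idea of a loss built from the weighted models of a logical formula can be illustrated with a small sketch. The abstract does not give the exact formula, so the code below makes several assumptions: the loss is taken to be the negative logarithm of the weighted model count, the weight of a candidate assignment is the product of the soft assignments it selects, and the count is obtained by brute-force enumeration rather than by the efficient Weighted Model Counting and decomposition techniques the paper refers to. All function names (`weighted_model_count`, `expert_loss`, `must_link`, `cannot_link`, `implies`, `span_limited`) and the matrix `q` are hypothetical and chosen for illustration only.

```python
import itertools
import math


def weighted_model_count(q, points, formula):
    """Sum the weights of all assignments of `points` that satisfy `formula`.

    q[i][c] is the soft assignment of point i to cluster c. A candidate
    assignment maps each listed point to exactly one cluster; its weight
    is the product of the matching soft assignments (an assumption of
    this sketch). Brute force: k ** len(points) candidates are visited.
    """
    k = len(q[points[0]])
    total = 0.0
    for clusters in itertools.product(range(k), repeat=len(points)):
        assign = dict(zip(points, clusters))
        if formula(assign):
            weight = 1.0
            for i, c in assign.items():
                weight *= q[i][c]
            total += weight
    return total


def expert_loss(q, points, formula, eps=1e-12):
    """Assumed loss form: negative log of the weighted model count."""
    return -math.log(weighted_model_count(q, points, formula) + eps)


# Hypothetical constraint builders; each returns a predicate over an
# assignment dict {point index: cluster index}.
def must_link(a, b):
    return lambda s: s[a] == s[b]


def cannot_link(a, b):
    return lambda s: s[a] != s[b]


def implies(a, cluster_a, b, cluster_b):
    # "if point a is in cluster_a then point b is in cluster_b"
    return lambda s: s[a] != cluster_a or s[b] == cluster_b


def span_limited(points, max_clusters):
    # the listed points may occupy at most `max_clusters` distinct clusters
    return lambda s: len({s[p] for p in points}) <= max_clusters


# Made-up soft assignments for 3 points and 3 clusters (rows sum to 1).
q = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
]

print(expert_loss(q, [0, 1], must_link(0, 1)))
print(expert_loss(q, [0, 1], cannot_link(0, 1)))
print(expert_loss(q, [0, 1], implies(0, 0, 1, 0)))
print(expert_loss(q, [0, 1, 2], span_limited([0, 1, 2], 2)))
```

In this sketch the loss shrinks as the soft assignments put more mass on assignments that satisfy the formula, which is the behaviour one would want from such a term when it is added to a deep clustering objective. Because every constraint is just a predicate, pairwise, implication and span-limited constraints are handled by the same counting routine; the enumeration is exponential in the number of constrained points and is only meant for toy inputs.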



Author information

Authors and Affiliations

Univ. Orléans, INSA Centre Val de Loire, LIFO EA 4022, 45067, Orléans, France
Nguyen-Viet-Dung Nghiem, Christel Vrain & Thi-Bich-Hanh Dao


Corresponding author

Correspondence to Nguyen-Viet-Dung Nghiem.

Editor information

Editors and Affiliations

Grenoble Alpes University, Saint Martin d’Hères, France
Massih-Reza Amini
INSA Rouen Normandy, Saint Etienne du Rouvray, France
Stéphane Canu
Ruhr-Universität Bochum, Bochum, Germany
Asja Fischer
KU Leuven, Leuven, Belgium
Tias Guns
Central European University, Vienna, Austria
Petra Kralj Novak
Aristotle University of Thessaloniki, Thessaloniki, Greece
Grigorios Tsoumakas



About this paper

Cite this paper

Nghiem, NVD., Vrain, C., Dao, TBH. (2023). Knowledge Integration in Deep Clustering. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13713. Springer, Cham. https://doi.org/10.1007/978-3-031-26387-3_11


DOI: https://doi.org/10.1007/978-3-031-26387-3_11
Published: 17 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26386-6
Online ISBN: 978-3-031-26387-3
eBook Packages: Computer Science, Computer Science (R0)
