Abstract
Classical clustering problems search for a partition of objects into a fixed number of clusters. In many scenarios, however, the number of clusters is not known or not necessarily fixed. Further, clusters are sometimes only considered significant if they have a certain size. We discuss clustering into sets of minimum cardinality k without a fixed number of sets and present a general model for these types of problems. This general framework allows the comparison of different measures to assess the quality of a clustering. We specifically consider nine quality measures and classify the complexity of the resulting problems with respect to k. Further, we derive some polynomial-time solvable cases for \(k=2\) with connections to matching-type problems, which, together with other graph problems, are then used to compute approximations for larger values of k.
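To make the general model concrete, here is a minimal brute-force sketch, assuming points in the plane and small instances. It enumerates every partition of the objects into clusters of size at least k, with no bound on the number of clusters, and keeps the partition of lowest cost. The abstract does not spell out the nine quality measures, so the maximum cluster diameter used as `max_diameter` is purely an illustrative assumption, as are all function names. The running time is exponential, so this is a specification of the problem, not a practical algorithm.

```python
from itertools import combinations
from math import dist, inf
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Partition = List[List[int]]


def partitions_min_size(items: List[int], k: int) -> Iterator[Partition]:
    """Yield every partition of `items` into blocks of size at least k."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    # The block containing `first` receives `size` further members, so it has
    # size + 1 >= k elements; the number of blocks is left unconstrained.
    for size in range(max(k - 1, 0), len(rest) + 1):
        for others in combinations(rest, size):
            remaining = [x for x in rest if x not in others]
            for tail in partitions_min_size(remaining, k):
                yield [[first, *others]] + tail


def max_diameter(partition: Partition, pts: Sequence[Point]) -> float:
    """Hypothetical cost: largest distance between two points in one cluster."""
    return max(
        (max((dist(pts[i], pts[j]) for i, j in combinations(block, 2)), default=0.0)
         for block in partition),
        default=0.0,
    )


def best_clustering(
    pts: Sequence[Point],
    k: int,
    cost: Callable[[Partition, Sequence[Point]], float] = max_diameter,
) -> Tuple[Optional[Partition], float]:
    """Exhaustively search all feasible partitions and return a cheapest one."""
    best, best_cost = None, inf
    for partition in partitions_min_size(list(range(len(pts))), k):
        value = cost(partition, pts)
        if value < best_cost:
            best, best_cost = partition, value
    return best, best_cost


pts = [(0, 0), (0, 1), (10, 0), (10, 1), (10, 2)]
print(best_clustering(pts, k=2))  # ([[0, 1], [2, 3, 4]], 2.0)
```

Swapping in another measure only requires passing a different `cost` function, which mirrors the idea of comparing quality measures within one framework.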
Notes
This covering problem is sometimes also called Unweighted Simplex Matching and is equivalent to \(\{K_2,K_3\}\)-packing, an old, well-studied generalisation of the classical matching problem [7].
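As an illustration of the \(\{K_2,K_3\}\)-packing notion, the sketch below decides by exhaustive backtracking whether the vertex set of a small graph can be partitioned into edges (\(K_2\)) and triangles (\(K_3\)). It is exponential in the worst case and is not the polynomial-time method of Cornuéjols, Hartvigsen and Pulleyblank [7]; the function name and the adjacency-set representation (a symmetric dict of neighbour sets) are assumptions made for this example.

```python
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[int, Set[int]]  # symmetric adjacency sets (assumed representation)


def perfect_k2_k3_packing(graph: Graph) -> Optional[List[Tuple[int, ...]]]:
    """Return a partition of all vertices into edges and triangles, or None."""

    def solve(uncovered: Set[int]) -> Optional[List[Tuple[int, ...]]]:
        if not uncovered:
            return []
        v = min(uncovered)  # v must lie in some part, so branch on its part
        rest = uncovered - {v}
        nbrs = sorted(graph[v] & rest)
        # Option 1: cover v by an edge {v, u}.
        for u in nbrs:
            sub = solve(rest - {u})
            if sub is not None:
                return [(v, u)] + sub
        # Option 2: cover v by a triangle {v, u, w}.
        for i, u in enumerate(nbrs):
            for w in nbrs[i + 1:]:
                if w in graph[u]:
                    sub = solve(rest - {u, w})
                    if sub is not None:
                        return [(v, u, w)] + sub
        return None  # no edge or triangle through v extends to a full packing

    return solve(set(graph))


path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(perfect_k2_k3_packing(path4))  # [(0, 1), (2, 3)]
```

On a path with three vertices the function returns `None`, since one vertex is always left over, whereas a triangle is covered by a single \(K_3\).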
References
Abu-Khzam, F.N., Bazgan, C., Casel, K., Fernau, H.: Building clusters with lower-bounded sizes. In: Hong, S. (ed.) 27th International Symposium on Algorithms and Computation, ISAAC, LIPIcs, vol. 64, pp. 4:1–4:13. Schloss Dagstuhl-Leibniz-Zentrum für Informatik (2016)
Aggarwal, G., Panigrahy, R., Feder, T., Thomas, D., Kenthapadi, K., Khuller, S., Zhu, A.: Achieving anonymity via clustering. ACM Trans. Algorithms 6(3), 49 (2010)
Anshelevich, E., Karagiozova, A.: Terminal backup, 3D matching, and covering cubic graphs. SIAM J. Comput. 40(3), 678–708 (2011)
Armon, A.: On min–max \(r\)-gatherings. Theor. Comput. Sci. 412(7), 573–582 (2011)
Blocki, J., Williams, R.: Resolving the complexity of some data privacy problems. In: Abramsky, S., Gavoille, C., Kirchner, C., auf der Heide, F.M., Spirakis, P.G. (eds.) Proceedings of the 37th International Colloquium Conference on Automata, Languages and Programming, ICALP’10: Part II, LNCS, vol. 6199, pp. 393–404. Springer (2010)
Byun, J.W., Kamra, A., Bertino, E., Li, N.: Efficient \(k\)-anonymization using clustering techniques. In: Kotagiri, R., Krishna, P.R., Mohania, M., Nantajeewarawat, E. (eds.) Advances in Databases: Concepts, Systems and Applications, LNCS, vol. 4443, pp. 188–200. Springer, Berlin (2007)
Cornuéjols, G., Hartvigsen, D., Pulleyblank, W.: Packing subgraphs in a graph. Oper. Res. Lett. 1(4), 139–143 (1982)
Domingo-Ferrer, J., Mateo-Sanz, J.M.: Practical data-oriented microaggregation for statistical disclosure control. IEEE Trans. Knowl. Data Eng. 14(1), 189–201 (2002)
Domingo-Ferrer, J., Sebé, F.: Optimal multivariate 2-microaggregation for microdata protection: a 2-approximation. In: Domingo-Ferrer, J., Franconi, L. (eds.) Privacy in Statistical Databases, PSD’06, LNCS, vol. 4302, pp. 129–138. Springer, Berlin (2006)
Edmonds, J., Johnson, E.L.: Matching, Euler tours and the Chinese postman. Math. Program. 5, 88–124 (1973)
Ergün, F., Kumar, R., Rubinfeld, R.: Fast approximate PCPs. In: Proceedings of the Thirty-First Annual ACM Symposium on Theory of Computing, 1–4 May 1999, Atlanta, Georgia, USA, pp. 41–50 (1999)
Goemans, M., Williamson, D.: A general approximation technique for constrained forest problems. SIAM J. Comput. 24(2), 296–317 (1995)
Guha, S., Meyerson, A., Munagala, K.: Hierarchical placement and network design problems. In: Proceedings of the 41st Annual IEEE Symposium on Foundations of Computer Science, FOCS’00, pp. 603–612. IEEE Computer Society (2000)
King, V., Rao, S., Tarjan, R.: A faster deterministic maximum flow algorithm. J. Algorithms 17(3), 447–474 (1994)
Orlin, J.B.: Max flows in \(O(nm)\) time, or better. In: Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, STOC, pp. 765–774. ACM (2013)
Papadimitriou, C.H., Yannakakis, M.: Optimization, approximation, and complexity classes. J. Comput. Syst. Sci. 43, 425–440 (1991)
Samarati, P.: Protecting respondents’ identities in microdata release. IEEE Trans. Knowl. Data Eng. 13(6), 1010–1027 (2001)
Schrijver, A.: Combinatorial Optimization. Springer, Berlin (2003)
Shalita, A., Zwick, U.: Efficient algorithms for the 2-gathering problem. ACM Trans. Algorithms 6(2), 34 (2010)
Stokes, K.: On computational anonymity. In: Privacy in Statistical Databases—UNESCO Chair in Data Privacy, International Conference, PSD 2012, Palermo, Italy, 26–28 September 2012. Proceedings, pp. 336–347 (2012)
Tovey, C.: A simplified NP-complete satisfiability problem. Discrete Appl. Math. 8(1), 85–89 (1984)
Xu, D., Anshelevich, E., Chiang, M.: On survivable access network design: complexity and algorithms. In: INFOCOM 2008. 27th IEEE International Conference on Computer Communications, Joint Conference of the IEEE Computer and Communications Societies, 13–18 April 2008, Phoenix, AZ, USA, pp. 186–190 (2008)
Acknowledgements
Katrin Casel and Henning Fernau were supported by the German Science Foundation Deutsche Forschungsgemeinschaft (FE 560/6-1). Faisal Abu-Khzam and Cristina Bazgan were partially supported by the bilateral research cooperation CEDRE between France and Lebanon (Grant Number 30885TM). We are grateful for the helpful comments of the anonymous reviewers.
About this article
Cite this article
Abu-Khzam, F.N., Bazgan, C., Casel, K. et al. Clustering with Lower-Bounded Sizes. Algorithmica 80, 2517–2550 (2018). https://doi.org/10.1007/s00453-017-0374-5
DOI: https://doi.org/10.1007/s00453-017-0374-5