Clustering with Lower-Bounded Sizes

A General Graph-Theoretic Framework

Algorithmica

Abstract

Classical clustering problems search for a partition of objects into a fixed number of clusters. In many scenarios, however, the number of clusters is not known or not necessarily fixed. Further, clusters are sometimes only considered significant if they have a certain size. We discuss clustering into sets of minimum cardinality k without a fixed number of sets and present a general model for these types of problems. This general framework allows the comparison of different measures to assess the quality of a clustering. We specifically consider nine quality measures and classify the complexity of the resulting problems with respect to k. Further, we derive some polynomial-time solvable cases for \(k=2\) with connections to matching-type problems; these, among other graph problems, are then used to compute approximations for larger values of k.
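
To make the problem statement concrete, the following is a minimal brute-force sketch of clustering with lower-bounded cluster sizes: it enumerates every partition whose blocks all have at least k elements and keeps the one that minimises a quality measure. It is an illustration only and not the algorithm of the paper. The function names (`partitions_min_size`, `max_diameter`, `best_clustering`) are hypothetical, and the measure used here (the largest Euclidean diameter of any cluster) is an assumption chosen for the example; it is not claimed to be one of the nine measures studied in the article. The enumeration takes exponential time and is only meant for very small inputs.

```python
import math
from itertools import combinations


def partitions_min_size(items, k):
    """Yield every partition of `items` into blocks of size at least k."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    # The block containing `first` needs at least k - 1 further members.
    for size in range(max(k - 1, 0), len(rest) + 1):
        for idx in combinations(range(len(rest)), size):
            chosen = set(idx)
            block = [first] + [rest[i] for i in idx]
            remaining = [rest[i] for i in range(len(rest)) if i not in chosen]
            for tail in partitions_min_size(remaining, k):
                yield [block] + tail


def max_diameter(clustering):
    """Assumed quality measure: largest pairwise distance inside any cluster."""
    return max(
        (
            max((math.dist(p, q) for p, q in combinations(c, 2)), default=0.0)
            for c in clustering
        ),
        default=0.0,
    )


def best_clustering(points, k, cost=max_diameter):
    """Return a cost-minimal clustering, or None if no feasible one exists."""
    return min(partitions_min_size(points, k), key=cost, default=None)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    print(best_clustering(pts, 2))
```

Note that the number of clusters is not an input: it is determined by the optimisation itself, subject only to the size bound k. Passing a different `cost` function swaps in another quality measure without changing the enumeration.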

Notes

  1. This covering problem is sometimes also called Unweighted Simplex Matching and is equivalent to \(\{K_2,K_3\}\)-packing, an old, well-studied generalisation of the classical matching problem [7].

References

  1. Abu-Khzam, F.N., Bazgan, C., Casel, K., Fernau, H.: Building clusters with lower-bounded sizes. In: Hong, S. (ed.) 27th International Symposium on Algorithms and Computation, ISAAC, LIPIcs, vol. 64, pp. 4:1–4:13. Schloss Dagstuhl-Leibniz-Zentrum für Informatik (2016)

  2. Aggarwal, G., Panigrahy, R., Feder, T., Thomas, D., Kenthapadi, K., Khuller, S., Zhu, A.: Achieving anonymity via clustering. ACM Trans. Algorithms 6(3), 49 (2010)

  3. Anshelevich, E., Karagiozova, A.: Terminal backup, 3D matching, and covering cubic graphs. SIAM J. Comput. 40(3), 678–708 (2011)

  4. Armon, A.: On min–max \(r\)-gatherings. Theor. Comput. Sci. 412(7), 573–582 (2011)

  5. Blocki, J., Williams, R.: Resolving the complexity of some data privacy problems. In: Abramsky, S., Gavoille, C., Kirchner, C., auf der Heide, F.M., Spirakis, P.G. (eds.) Proceedings of the 37th International Colloquium on Automata, Languages and Programming, ICALP’10: Part II, LNCS, vol. 6199, pp. 393–404. Springer (2010)

  6. Byun, J.W., Kamra, A., Bertino, E., Li, N.: Efficient \(k\)-anonymization using clustering techniques. In: Kotagiri, R., Krishna, P.R., Mohania, M., Nantajeewarawat, E. (eds.) Advances in Databases: Concepts, Systems and Applications, LNCS, vol. 4443, pp. 188–200. Springer, Berlin (2007)

  7. Cornuéjols, G., Hartvigsen, D., Pulleyblank, W.: Packing subgraphs in a graph. Oper. Res. Lett. 1(4), 139–143 (1982)

  8. Domingo-Ferrer, J., Mateo-Sanz, J.M.: Practical data-oriented microaggregation for statistical disclosure control. IEEE Trans. Knowl. Data Eng. 14(1), 189–201 (2002)

  9. Domingo-Ferrer, J., Sebé, F.: Optimal multivariate 2-microaggregation for microdata protection: a 2-approximation. In: Domingo-Ferrer, J., Franconi, L. (eds.) Privacy in Statistical Databases, PSD’06, LNCS, vol. 4302, pp. 129–138. Springer, Berlin (2006)

  10. Edmonds, J., Johnson, E.L.: Matching, Euler tours and the Chinese postman. Math. Program. 5, 88–124 (1973)

  11. Ergün, F., Kumar, R., Rubinfeld, R.: Fast approximate PCPs. In: Proceedings of the Thirty-First Annual ACM Symposium on Theory of Computing, 1–4 May 1999, Atlanta, Georgia, USA, pp. 41–50 (1999)

  12. Goemans, M., Williamson, D.: A general approximation technique for constrained forest problems. SIAM J. Comput. 24(2), 296–317 (1995)

  13. Guha, S., Meyerson, A., Munagala, K.: Hierarchical placement and network design problems. In: Proceedings of the 41st Annual IEEE Symposium on Foundations of Computer Science, FOCS’00, pp. 603–612. IEEE Computer Society (2000)

  14. King, V., Rao, S., Tarjan, R.: A faster deterministic maximum flow algorithm. J. Algorithms 17(3), 447–474 (1994)

  15. Orlin, J.B.: Max flows in \(O(nm)\) time, or better. In: Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, STOC, pp. 765–774. ACM (2013)

  16. Papadimitriou, C.H., Yannakakis, M.: Optimization, approximation, and complexity classes. J. Comput. Syst. Sci. 43, 425–440 (1991)

  17. Samarati, P.: Protecting respondents’ identities in microdata release. IEEE Trans. Knowl. Data Eng. 13(6), 1010–1027 (2001)

  18. Schrijver, A.: Combinatorial Optimization. Springer, Berlin (2003)

  19. Shalita, A., Zwick, U.: Efficient algorithms for the 2-gathering problem. ACM Trans. Algorithms 6(2), 34 (2010)

  20. Stokes, K.: On computational anonymity. In: Privacy in Statistical Databases—UNESCO Chair in Data Privacy, International Conference, PSD 2012, Palermo, Italy, 26–28 September 2012. Proceedings, pp. 336–347 (2012)

  21. Tovey, C.: A simplified NP-complete satisfiability problem. Discrete Appl. Math. 8(1), 85–89 (1984)

  22. Xu, D., Anshelevich, E., Chiang, M.: On survivable access network design: complexity and algorithms. In: INFOCOM 2008. 27th IEEE International Conference on Computer Communications, Joint Conference of the IEEE Computer and Communications Societies, 13–18 April 2008, Phoenix, AZ, USA, pp. 186–190 (2008)

Acknowledgements

Katrin Casel and Henning Fernau were supported by the German Research Foundation (Deutsche Forschungsgemeinschaft, Grant Number FE 560/6-1). Faisal Abu-Khzam and Cristina Bazgan were partially supported by the bilateral research cooperation CEDRE between France and Lebanon (Grant Number 30885TM). We are grateful for the helpful comments of the anonymous reviewers.

Author information

Correspondence to Henning Fernau.

About this article

Cite this article

Abu-Khzam, F.N., Bazgan, C., Casel, K. et al. Clustering with Lower-Bounded Sizes. Algorithmica 80, 2517–2550 (2018). https://doi.org/10.1007/s00453-017-0374-5

  • DOI: https://doi.org/10.1007/s00453-017-0374-5