On the Advantage of Overlapping Clusters for Minimizing Conductance

Algorithmica

Abstract

Graph clustering is an important problem with applications to bioinformatics, community discovery in social networks, distributed computing, and more. While most of the research in this area has focused on clustering using disjoint clusters, many real datasets have inherently overlapping clusters. We compare overlapping and non-overlapping clusterings in graphs in the context of minimizing their conductance. It is known that allowing clusters to overlap gives better results in practice. We prove that overlapping clustering may be significantly better than non-overlapping clustering with respect to conductance, even in a theoretical setting.

For minimizing the maximum conductance over the clusters, we give examples demonstrating that allowing overlaps can yield significantly better clusterings, namely, ones with a much smaller optimum. In addition, for the min-max variant, the overlapping version admits a simple approximation algorithm, while our algorithm for the non-overlapping version is complex and yields a worse approximation ratio due to the presence of the additional constraint. Somewhat surprisingly, for the problem of minimizing the sum of conductances, we find that allowing overlap does not help. We show how to apply a general technique to transform any overlapping clustering into a non-overlapping one with only a modest increase in the sum of conductances. This uncrossing technique is of independent interest and may find further applications in the future.

We consider this work a step toward a rigorous comparison of overlapping and non-overlapping clusterings and hope that it stimulates further research in this area.
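
The two objectives compared in the abstract, the maximum and the sum of the per-cluster scores, can be evaluated on a small graph as a rough illustration. The sketch below is not the paper's algorithm: it only scores a given clustering, and it uses the cardinality-based sparsity (cut weight divided by the size of the smaller side) as a stand-in for conductance. The function names, the toy graph, and the restriction to integer edge weights are all assumptions made for this example.

```python
from fractions import Fraction


def cut_weight(edges, cluster):
    """Total weight of edges with exactly one endpoint in `cluster`."""
    return sum(w for (u, v), w in edges.items() if (u in cluster) != (v in cluster))


def sparsity(edges, nodes, cluster):
    """Cut weight divided by min(|S|, |complement of S|).

    Assumes integer edge weights (so Fraction stays exact) and that
    `cluster` is a proper, non-empty subset of `nodes`.
    """
    cluster = set(cluster)
    smaller_side = min(len(cluster), len(nodes) - len(cluster))
    if smaller_side == 0:
        raise ValueError("cluster must be a proper, non-empty subset of the nodes")
    return Fraction(cut_weight(edges, cluster), smaller_side)


def min_max_objective(edges, nodes, clusters):
    """Largest per-cluster score (the min-max variant minimizes this)."""
    return max(sparsity(edges, nodes, c) for c in clusters)


def min_sum_objective(edges, nodes, clusters):
    """Sum of per-cluster scores (the min-sum variant minimizes this)."""
    return sum(sparsity(edges, nodes, c) for c in clusters)


# Hypothetical toy graph: two unit-weight triangles joined by the edge (2, 3).
nodes = set(range(6))
edges = {
    (0, 1): 1, (0, 2): 1, (1, 2): 1,
    (3, 4): 1, (3, 5): 1, (4, 5): 1,
    (2, 3): 1,
}

disjoint = [{0, 1, 2}, {3, 4, 5}]
overlapping = [{0, 1, 2, 3}, {2, 3, 4, 5}]  # clusters share nodes 2 and 3

for name, clusters in [("disjoint", disjoint), ("overlapping", overlapping)]:
    print(name,
          "max:", min_max_objective(edges, nodes, clusters),
          "sum:", min_sum_objective(edges, nodes, clusters))
```

On this particular toy graph the disjoint clustering happens to score better under both objectives; the example only shows how a clustering is scored and does not reproduce the separating examples described in the abstract.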

Notes

  1. Alternatively, we can define the conductance (or, more appropriately, the sparsity) of a cluster as \(\sum_{e\in\delta(S)} w_{e}/\min\{|S|,|\overline{S}|\}\). Our results also hold for this definition.

  2. The notation \(\widetilde{O}(f(n))\) ignores factors polylogarithmic in \(n\).

  3. A similar consideration does not hold for the non-overlapping version of the min-sum problem since the overlapping and non-overlapping versions turn out to be equivalent for the min-sum problem. This is explained in detail in Sect. 2.

  4. If \((1+\delta)^{q_{1}} \leq a < (1+\delta)^{h+q_{1}}\) and \((1+\delta)^{q_{2}} \leq b < (1+\delta)^{h+q_{2}}\), then \((1+\delta)^{q} \leq a+b < (1+\delta)^{h+1+q}\), where \(q\) is such that \((1+\delta)^{q} \leq (1+\delta)^{q_{1}}+(1+\delta)^{q_{2}} < (1+\delta)^{q+1}\).
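
The bound on \(a+b\) in note 4 can be verified in two short steps. The following is a sketch that assumes \(\delta > 0\) and \(h \geq 0\), as the notation suggests; the shorthand \(s\) is introduced here only for the derivation. Write \(s = (1+\delta)^{q_{1}} + (1+\delta)^{q_{2}}\), so that \((1+\delta)^{q} \leq s < (1+\delta)^{q+1}\) by the choice of \(q\). Adding the two lower bounds and the two upper bounds gives

\[
a + b \;\geq\; (1+\delta)^{q_{1}} + (1+\delta)^{q_{2}} \;=\; s \;\geq\; (1+\delta)^{q},
\]

\[
a + b \;<\; (1+\delta)^{h+q_{1}} + (1+\delta)^{h+q_{2}} \;=\; (1+\delta)^{h}\, s \;<\; (1+\delta)^{h}(1+\delta)^{q+1} \;=\; (1+\delta)^{h+1+q}.
\]

In words, adding two quantities that are each within a factor \((1+\delta)^{h}\) of their rounded values widens the window by at most one extra factor of \(1+\delta\).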

Acknowledgements

We thank David Gleich for useful discussions, for validating the importance of our model, and for conducting an initial experimental study. We thank two anonymous referees for their comments, which helped improve the presentation. We also thank one of the referees for suggesting an example, contrasting the optima for non-overlapping and overlapping min-max conductance, that is much more natural than the one we presented.

Author information

Corresponding author

Correspondence to Guy Kortsarz.

Additional information

Guy Kortsarz was partially supported by NSF Award Grant number 434923.

About this article

Cite this article

Khandekar, R., Kortsarz, G. & Mirrokni, V. On the Advantage of Overlapping Clusters for Minimizing Conductance. Algorithmica 69, 844–863 (2014). https://doi.org/10.1007/s00453-013-9761-8

  • DOI: https://doi.org/10.1007/s00453-013-9761-8
