Abstract
Graph clustering is an important problem with applications to bioinformatics, community discovery in social networks, distributed computing, and more. While most of the research in this area has focused on clustering using disjoint clusters, many real datasets have inherently overlapping clusters. We compare overlapping and non-overlapping clusterings in graphs in the context of minimizing their conductance. It is known that allowing clusters to overlap gives better results in practice. We prove that overlapping clustering may be significantly better than non-overlapping clustering with respect to conductance, even in a theoretical setting.
For minimizing the maximum conductance over the clusters, we give examples demonstrating that allowing overlaps can yield significantly better clusterings, namely, ones with a much smaller optimum. In addition, for the min-max variant, the overlapping version admits a simple approximation algorithm, while our algorithm for the non-overlapping version is complex and yields a worse approximation ratio due to the presence of the additional constraint. Somewhat surprisingly, for the problem of minimizing the sum of conductances, we find that allowing overlap does not help. We show how to apply a general technique to transform any overlapping clustering into a non-overlapping one with only a modest increase in the sum of conductances. This uncrossing technique is of independent interest and may find further applications in the future.
We consider this work a step toward a rigorous comparison of overlapping and non-overlapping clusterings and hope that it stimulates further research in this area.
Notes
Alternatively, we can define the conductance (or, more appropriately, the sparsity) of a cluster as \(\sum_{e\in\delta(S)} w_{e}/\min\{|S|,|\overline {S}|\}\). Our results also hold for this definition.
The notation \(\widetilde{O}(f(n))\) ignores factors polylogarithmic in \(n\).
A similar consideration does not hold for the non-overlapping version of the min-sum problem since the overlapping and non-overlapping versions turn out to be equivalent for the min-sum problem. This is explained in detail in Sect. 2.
If \((1+\delta)^{q_{1}} \leq a < (1+\delta)^{h+q_{1}}\) and \((1+\delta)^{q_{2}} \leq b < (1+\delta )^{h+q_{2}}\), then \((1+\delta)^{q} \leq a+b < (1+\delta)^{h+1+q}\), where \((1+\delta)^{q} \leq(1+\delta)^{q_{1}}+(1+\delta)^{q_{2}} < (1+\delta )^{q+1}\).
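The bound in the preceding note can be checked in two lines. The following is only a sketch, assuming \(\delta > 0\) and taking \(q\) to be the integer fixed by the last inequality of that note. For the lower bound, add the two lower bounds on \(a\) and \(b\):
\[
a+b \;\geq\; (1+\delta)^{q_{1}}+(1+\delta)^{q_{2}} \;\geq\; (1+\delta)^{q}.
\]
For the upper bound, add the two upper bounds and factor out \((1+\delta)^{h}\):
\[
a+b \;<\; (1+\delta)^{h+q_{1}}+(1+\delta)^{h+q_{2}} \;=\; (1+\delta)^{h}\left((1+\delta)^{q_{1}}+(1+\delta)^{q_{2}}\right) \;<\; (1+\delta)^{h}\,(1+\delta)^{q+1} \;=\; (1+\delta)^{h+1+q}.
\]
So summing two quantities that are each known up to a factor of \((1+\delta)^{h}\) yields a quantity known up to a factor of \((1+\delta)^{h+1}\).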
Acknowledgements
We thank David Gleich for useful discussions, for validating the importance of our model, and for conducting some initial experimental study. We thank two anonymous referees for their comments that helped improve the presentation. We also thank one of the referees for suggesting an example comparing the optima of non-overlapping and overlapping min-max conductance that is much more natural than the one we presented.
Additional information
Guy Kortsarz was partially supported by NSF Award Grant number 434923.
Cite this article
Khandekar, R., Kortsarz, G. & Mirrokni, V. On the Advantage of Overlapping Clusters for Minimizing Conductance. Algorithmica 69, 844–863 (2014). https://doi.org/10.1007/s00453-013-9761-8
DOI: https://doi.org/10.1007/s00453-013-9761-8