Abstract
Given a real world graph, how can we find a large subgraph whose partition quality is much better than the original? How can we use a partition of that subgraph to discover a high quality global partition? Although graph partitioning especially with balanced sizes has received attentions in various applications, it is known NP-hard, and also known that there is no good cut at a large scale for real graphs. In this paper, we propose a novel approach for graph partitioning. Our first focus is on finding a large subgraph with high quality partitions, in terms of conductance. Despite the difficulty of the task for the whole graph, we observe that there is a large connected subgraph whose partition quality is much better than the original. Our proposed method MTP finds such a subgraph by removing “hub” nodes with large degrees, and taking the remaining giant connected component. Further, we extend MTP to gb MTP (Global Balanced MTP) for discovering a global balanced partition. gb MTP attaches the excluded nodes in MTP to the partition found by MTP in a greedy way. In experiments, we demonstrate that MTP finds a subgraph of a large size with low conductance graph partitions, compared with competing methods. We also show that the competitors cannot find connected subgraphs while our method does, by construction. This improvement in partition quality for the subgraph is especially noticeable for large scale cuts—for a balanced partition, down to 14 % of the original conductance with the subgraph size 70 % of the total. As a result, the found subgraph has clear partitions at almost all scales compared with the original. Moreover, gb MTP generally discovers global balanced partitions whose conductance are lower than those found by METIS, the state-of-the-art graph partitioning method.














Similar content being viewed by others
Notes
We use subset to indicate a set of nodes in a graph, and subsets for its plural.
a balanced partition for a subset of nodes.
A principal submatrix with size \(n^{\prime }\times n^{\prime }\) of a matrix A with size n×n for \(n^{\prime }\leq n\) is a submatrix by taking the first \(n^{\prime }\) rows and columns from A.
References
Abou-Rjeili, A., Karypis, G.: Multilevel algorithms for partitioning power-law graphs. Proceedings of the 20th International Conference on Parallel and Distributed Processing, 124–124 (2006).
Ahn, Y. -Y., Bagrow, J.P., Lehmann, S.: Link communities reveal multiscale complexity in networks. Nature. 466, 761–764 (2010).
Albert, R., Jeong, H., Barabási, A.L.: Internet: Diameter of the world-wide Web. Nature. 401 (6749), 130–131 (1999).
Andersen, R., Chung, F.R.K., Lang, K.J.: Local graph partitioning using pagerank vectors. 47th Annual IEEE Symposium on Foundations of Computer Science, 475–486 (2006).
Arora, S., Ge, R., Sachdeva, S.: Finding overlapping communities in social networks Toward a rigorous approach. Proceedings of the 13th ACM Conference on Electronic Commerce, 37–54 (2012).
Bourse, F., Lelarge, M., Vojnovic, M.: Balanced graph edge partition. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1456–1465 (2014).
Chakrabarti, D., Papadimitriou, S., Modha, D.S., Faloutsos, C.: Fully automatic cross-associations. Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 79–88 (2004).
Chung, F., Linyuan, L.: The average distances in random graphs with given expected degrees. Proc. Natl. Acad. Sci. U.S.A. 99 (25), 15879–15882 (2002).
Ciglan, M., Laclavík, M., Nørvåg, K.: On community detection in real-world networks and the importance of degree assortativity. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1007–1015 (2013).
Dhillon, I.S., Mallela, S., Modha, D.S.: Information-theoretic co-clustering. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 89–98 (2003).
Evans, T.S., Lambiotte, R.: Line graphs, link partitions, and overlapping communities. Phys. Rev. E. 80 (1), 016105+ (2009).
Faloutsos, M., Faloutsos, P., Faloutsos, C.: On power-law relationships of the internet topology. Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, 251–262 (1999).
Fiduccia, C.M., Mattheyses, R.M.: A linear-time heuristic for improving network partitions. Proceedings of the 19th Design Automation Conference, 175–181 (1982).
Fortunato, S.: Community detection in graphs. Phys. Rep. 486 (3), 75–174 (2010).
Gopalan, P., Mimno, D.M., Gerrish, S., Freedman, M.J., Blei, D.M.: Scalable inference of overlapping communities. 26th Annual Conference on Neural Information Processing Systems, 2258–2266 (2012).
Hendrickson, B., Leland, R.: The Chaco User’s Guide Version 2.0. Technical report, Sandia National Laboratories, 1995.
Hendrickson, B., Leland, R.: An improved spectral graph partitioning algorithm for mapping parallel computations. SIAM J. Sci. Comput. 16 (2), 452–469 (1995).
Hendrickson, B., Leland, R.W.: A multi-level algorithm for partitioning graphs. Proceedings Supercomputing, 28 (1995).
Hoare, C.A.R.: Algorithm 65: Find. Commun. ACM. 4 (7), 321–322 (1961).
Kang, U., Faloutsos, C.: Beyond caveman communities: Hubs and spokes for graph compression and mining. 11th IEEE International Conference on Data Mining, 300–309 (2011).
Kang, U., Lee, J.Y., Koutra, D., Faloutsos, C.: Net-ray: Visualizing and mining billion-scale graphs. Advances in Knowledge Discovery and Data Mining - 18th Pacific-Asia Conference, 348–361 (2014).
Kannan, R, Vempala, S, Vetta, A: On clusterings: good, bad and spectral. J. ACM. 51 (3), 497–515 (2004).
Karypis, G., Kumar, V.: Multilevel k-way partitioning scheme for irregular graphs. J. Parallel Distrib. Comput. 48 (1), 96–129 (1998).
Kernighan, B.W., Lin, S.: An Efficient Heuristic Procedure for Partitioning Graphs. Bell Sys. Tech. J. 49 (2) (1970).
Khadivi, A., Rad, A.A., Hasler, M.: Network community-detection enhancement by proper weighting. Phys. Rev. E. 83 (4), 046104 (2011).
Kolmogorov, V., Boykov, Y., Rother, C.: Applications of parametric maxflow in computer vision. IEEE 11th International Conference on Computer Vision, 1–8 (2007).
Koutra, D., Kang, U., Vreeken, J., Faloutsos, C.: VOG: Summarizing and understanding large graphs. Proceedings of the 2014 SIAM International Conference on Data Mining, 91–99 (2014).
Lang, K.J., Rao, S.: A flow-based method for improving the expansion or conductance of graph cuts. Integer Programming and Combinatorial Optimization, 10th International IPCO Conference, 325–337 (2004).
Jure, L. http://snap.stanford.edu/snap/.
Leskovec, J., Lang, K.J., Dasgupta, A., Mahoney, M.W.: Statistical properties of community structure in large social and information networks. Proceedings of the 17th International Conference on World Wide Web, 695–704 (2008).
Lim, S., Ryu, S., Kwon, S., Jung, K., Lee, J.-G.: Linkscan*: Overlapping community detection using the link-space transformation. IEEE 30th International Conference on Data Engineering, 292–303 (2014).
Lim, Y., Jung, K., Kohli, P.: Energy minimization under constraints on label counts. 11th European Conference on Computer Vision, 535–551 (2010).
Lim, Y., Jung, K., Kohli, P.: Efficient energy minimization for enforcing label statistics, Vol. 36 (2014).
Lim, Y., Kang, U., Faloutsos, C.: SlashBurn: Graph compression and mining beyond caveman communities. IEEE Trans. Knowl. Data Eng. 26 (12), 3077–3089 (2014).
Lim, Y., Lee, W.-J., Choi, H.-J., Kang, U.: Discovering large subsets with high quality partitions in real world graphs. 2015 International Conference on Big Data and Smart Computing, 186–193 (2015).
Lin, Z., Cao, N., Tong, H., Wang, F., Kang, U., Chau, D.H.: Interactive multi-resolution exploration of million node graphs. IEEE VIS (2013).
Lin, Z., Cao, N., Tong, H., Wang, F., Kang, U., Horng, D.: (Polo) Chau. Demonstrating interactive multi-resolution large graph exploration. 13th IEEE International Conference on Data Mining Workshops, 1097–1100 (2013).
Nagano, K., Kawahara, Y., Aihara, K.: Size-constrained submodular minimization through minimum norm base. Proceedings of the 28th International Conference on Machine Learning, 977–984 (2011).
Naor, J., Schwartz, R.: Balanced metric labeling. Proceedings of the 37th Annual ACM Symposium on Theory of Computing, 582–591 (2005).
Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E. 69 (2), 026113 (2004).
Nishimura, J., Ugander, J.: Restreaming graph partitioning: simple versatile algorithms for advanced balancing. The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1106–1114 (2013).
Raghavan, U.N., Albert, R., Kumara, S.: Near linear time algorithm to detect community structures in large-scale networks. Phys Rev. E. 76 (3) (2007).
Satuluri, V., Parthasarathy, S.: Scalable graph clustering using stochastic flows: applications to community discovery. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 737–746 (2009).
Schaeffer, E.S.: Graph clustering. Comput. Sci. Rev. 1 (1), 27–64 (2007).
Sen, A., Deng, H., Guha, S.: On a graph partition problem with application to vlsi layout. Inf. Process. Lett. 43 (2), 87–94 (1992).
Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22 (8), 888–905 (2000).
Siganos, G., Tauro, S.L., Michalis, F.: Jellyfish: A conceptual model for the as internet topology. J. Commun. Netw. 8 (3), 339–350 (2006).
Spielman, D.A., Teng, S.-H.: Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. Proceedings of the 36th Annual ACM Symposium on Theory of Computing, 81–90 (2004).
Stanton, I.: Streaming balanced graph partitioning algorithms for random graphs. Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, 1287–1301 (2014).
Stanton, I., Kliot, G.: Streaming graph partitioning for large distributed graphs. The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1222–1230 (2012).
Tsourakakis, C.E., Gkantsidis, C., Radunovic, B., Vojnovic, M.: FENNEL: Streaming graph partitioning for massive scale graphs. Seventh ACM International Conference on Web Search and Data Mining, 333–342 (2014).
Ugander, J., Backstrom, L.: Balanced label propagation for partitioning massive graphs. Sixth ACM International Conference on Web Search and Data Mining, 507–516 (2013).
Wang, S., Siskind, J.M.: Image segmentation with ratio cut. IEEE Trans. Pattern Anal. Intell. 25 (6), 675–690 (2003).
Xu, N., Chen, L., Loggp, B.C.: A log-based dynamic graph partitioning method. PVLDB. 7 (14), 1917–1928 (2014).
Xu, N., Cui, B., Chen, L., Zi, H., Shao, Y.: Heterogeneous environment aware streaming graph partitioning. IEEE Trans. Knowl. Data Eng. 27 (6), 1560–1572 (2015).
Yang, J., Leskovec, J.: Defining and evaluating network communities based on ground-truth. 12th IEEE International Conference on Data Mining, 745–754 (2012).
Yang, J., Leskovec, J.: Overlapping community detection at scale: a nonnegative matrix factorization approach. Sixth ACM International Conference on Web Search and Data Mining, 587–596 (2013).
Yang, T., Jin, R., Chi, Y., Zhu, S.: Combining link and content for community detection: a discriminative approach. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 927–936 (2009).
Acknowledgments
This work was supported by the National Research Foundation of Korea(NRF) funded by the Ministry of Science, ICT and Future Planning (NRF-2015K1A3A1A14021055), and IT R&D program of MSIP/IITP [10044970, Development of Core Technology for Human-like Self-taught Learning based on Symbolic Approach]. This work was also funded by Institute for Information communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No.R0190-15-2012, ”High Performance Big Data Analytics Platform Performance Acceleration Technologies Development”). The ICT at Seoul National University provides research facilities for this study. The Institute of Engineering Research at Seoul National University provided research facilities for this work.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Lim, Y., Lee, WJ., Choi, HJ. et al. MTP: discovering high quality partitions in real world graphs. World Wide Web 20, 491–514 (2017). https://doi.org/10.1007/s11280-016-0393-1
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11280-016-0393-1