Skip to main content
Log in

MTP: discovering high quality partitions in real world graphs

  • Published:
World Wide Web Aims and scope Submit manuscript

Abstract

Given a real world graph, how can we find a large subgraph whose partition quality is much better than the original? How can we use a partition of that subgraph to discover a high quality global partition? Although graph partitioning especially with balanced sizes has received attentions in various applications, it is known NP-hard, and also known that there is no good cut at a large scale for real graphs. In this paper, we propose a novel approach for graph partitioning. Our first focus is on finding a large subgraph with high quality partitions, in terms of conductance. Despite the difficulty of the task for the whole graph, we observe that there is a large connected subgraph whose partition quality is much better than the original. Our proposed method MTP finds such a subgraph by removing “hub” nodes with large degrees, and taking the remaining giant connected component. Further, we extend MTP to gb MTP (Global Balanced MTP) for discovering a global balanced partition. gb MTP attaches the excluded nodes in MTP to the partition found by MTP in a greedy way. In experiments, we demonstrate that MTP finds a subgraph of a large size with low conductance graph partitions, compared with competing methods. We also show that the competitors cannot find connected subgraphs while our method does, by construction. This improvement in partition quality for the subgraph is especially noticeable for large scale cuts—for a balanced partition, down to 14 % of the original conductance with the subgraph size 70 % of the total. As a result, the found subgraph has clear partitions at almost all scales compared with the original. Moreover, gb MTP generally discovers global balanced partitions whose conductance are lower than those found by METIS, the state-of-the-art graph partitioning method.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
Figure 10
Figure 11
Figure 12
Figure 13
Figure 14

Similar content being viewed by others

Notes

  1. We use subset to indicate a set of nodes in a graph, and subsets for its plural.

  2. URL http://snap.stanford.edu/data/index.html

  3. a balanced partition for a subset of nodes.

  4. A principal submatrix with size \(n^{\prime }\times n^{\prime }\) of a matrix A with size n×n for \(n^{\prime }\leq n\) is a submatrix by taking the first \(n^{\prime }\) rows and columns from A.

References

  • Abou-Rjeili, A., Karypis, G.: Multilevel algorithms for partitioning power-law graphs. Proceedings of the 20th International Conference on Parallel and Distributed Processing, 124–124 (2006).

  • Ahn, Y. -Y., Bagrow, J.P., Lehmann, S.: Link communities reveal multiscale complexity in networks. Nature. 466, 761–764 (2010).

    Article  Google Scholar 

  • Albert, R., Jeong, H., Barabási, A.L.: Internet: Diameter of the world-wide Web. Nature. 401 (6749), 130–131 (1999).

    Article  Google Scholar 

  • Andersen, R., Chung, F.R.K., Lang, K.J.: Local graph partitioning using pagerank vectors. 47th Annual IEEE Symposium on Foundations of Computer Science, 475–486 (2006).

  • Arora, S., Ge, R., Sachdeva, S.: Finding overlapping communities in social networks Toward a rigorous approach. Proceedings of the 13th ACM Conference on Electronic Commerce, 37–54 (2012).

  • Bourse, F., Lelarge, M., Vojnovic, M.: Balanced graph edge partition. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1456–1465 (2014).

  • Chakrabarti, D., Papadimitriou, S., Modha, D.S., Faloutsos, C.: Fully automatic cross-associations. Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 79–88 (2004).

  • Chung, F., Linyuan, L.: The average distances in random graphs with given expected degrees. Proc. Natl. Acad. Sci. U.S.A. 99 (25), 15879–15882 (2002).

    Article  MathSciNet  MATH  Google Scholar 

  • Ciglan, M., Laclavík, M., Nørvåg, K.: On community detection in real-world networks and the importance of degree assortativity. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1007–1015 (2013).

  • Dhillon, I.S., Mallela, S., Modha, D.S.: Information-theoretic co-clustering. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 89–98 (2003).

  • Evans, T.S., Lambiotte, R.: Line graphs, link partitions, and overlapping communities. Phys. Rev. E. 80 (1), 016105+ (2009).

    Article  Google Scholar 

  • Faloutsos, M., Faloutsos, P., Faloutsos, C.: On power-law relationships of the internet topology. Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, 251–262 (1999).

  • Fiduccia, C.M., Mattheyses, R.M.: A linear-time heuristic for improving network partitions. Proceedings of the 19th Design Automation Conference, 175–181 (1982).

  • Fortunato, S.: Community detection in graphs. Phys. Rep. 486 (3), 75–174 (2010).

    Article  MathSciNet  Google Scholar 

  • Gopalan, P., Mimno, D.M., Gerrish, S., Freedman, M.J., Blei, D.M.: Scalable inference of overlapping communities. 26th Annual Conference on Neural Information Processing Systems, 2258–2266 (2012).

  • Hendrickson, B., Leland, R.: The Chaco User’s Guide Version 2.0. Technical report, Sandia National Laboratories, 1995.

  • Hendrickson, B., Leland, R.: An improved spectral graph partitioning algorithm for mapping parallel computations. SIAM J. Sci. Comput. 16 (2), 452–469 (1995).

    Article  MathSciNet  MATH  Google Scholar 

  • Hendrickson, B., Leland, R.W.: A multi-level algorithm for partitioning graphs. Proceedings Supercomputing, 28 (1995).

  • Hoare, C.A.R.: Algorithm 65: Find. Commun. ACM. 4 (7), 321–322 (1961).

    Article  Google Scholar 

  • Kang, U., Faloutsos, C.: Beyond caveman communities: Hubs and spokes for graph compression and mining. 11th IEEE International Conference on Data Mining, 300–309 (2011).

  • Kang, U., Lee, J.Y., Koutra, D., Faloutsos, C.: Net-ray: Visualizing and mining billion-scale graphs. Advances in Knowledge Discovery and Data Mining - 18th Pacific-Asia Conference, 348–361 (2014).

  • Kannan, R, Vempala, S, Vetta, A: On clusterings: good, bad and spectral. J. ACM. 51 (3), 497–515 (2004).

    Article  MathSciNet  MATH  Google Scholar 

  • Karypis, G., Kumar, V.: Multilevel k-way partitioning scheme for irregular graphs. J. Parallel Distrib. Comput. 48 (1), 96–129 (1998).

    Article  MATH  Google Scholar 

  • Kernighan, B.W., Lin, S.: An Efficient Heuristic Procedure for Partitioning Graphs. Bell Sys. Tech. J. 49 (2) (1970).

  • Khadivi, A., Rad, A.A., Hasler, M.: Network community-detection enhancement by proper weighting. Phys. Rev. E. 83 (4), 046104 (2011).

    Article  Google Scholar 

  • Kolmogorov, V., Boykov, Y., Rother, C.: Applications of parametric maxflow in computer vision. IEEE 11th International Conference on Computer Vision, 1–8 (2007).

  • Koutra, D., Kang, U., Vreeken, J., Faloutsos, C.: VOG: Summarizing and understanding large graphs. Proceedings of the 2014 SIAM International Conference on Data Mining, 91–99 (2014).

  • Lang, K.J., Rao, S.: A flow-based method for improving the expansion or conductance of graph cuts. Integer Programming and Combinatorial Optimization, 10th International IPCO Conference, 325–337 (2004).

  • Jure, L. http://snap.stanford.edu/snap/.

  • Leskovec, J., Lang, K.J., Dasgupta, A., Mahoney, M.W.: Statistical properties of community structure in large social and information networks. Proceedings of the 17th International Conference on World Wide Web, 695–704 (2008).

  • Lim, S., Ryu, S., Kwon, S., Jung, K., Lee, J.-G.: Linkscan*: Overlapping community detection using the link-space transformation. IEEE 30th International Conference on Data Engineering, 292–303 (2014).

  • Lim, Y., Jung, K., Kohli, P.: Energy minimization under constraints on label counts. 11th European Conference on Computer Vision, 535–551 (2010).

  • Lim, Y., Jung, K., Kohli, P.: Efficient energy minimization for enforcing label statistics, Vol. 36 (2014).

  • Lim, Y., Kang, U., Faloutsos, C.: SlashBurn: Graph compression and mining beyond caveman communities. IEEE Trans. Knowl. Data Eng. 26 (12), 3077–3089 (2014).

    Article  Google Scholar 

  • Lim, Y., Lee, W.-J., Choi, H.-J., Kang, U.: Discovering large subsets with high quality partitions in real world graphs. 2015 International Conference on Big Data and Smart Computing, 186–193 (2015).

  • Lin, Z., Cao, N., Tong, H., Wang, F., Kang, U., Chau, D.H.: Interactive multi-resolution exploration of million node graphs. IEEE VIS (2013).

  • Lin, Z., Cao, N., Tong, H., Wang, F., Kang, U., Horng, D.: (Polo) Chau. Demonstrating interactive multi-resolution large graph exploration. 13th IEEE International Conference on Data Mining Workshops, 1097–1100 (2013).

  • Nagano, K., Kawahara, Y., Aihara, K.: Size-constrained submodular minimization through minimum norm base. Proceedings of the 28th International Conference on Machine Learning, 977–984 (2011).

  • Naor, J., Schwartz, R.: Balanced metric labeling. Proceedings of the 37th Annual ACM Symposium on Theory of Computing, 582–591 (2005).

  • Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E. 69 (2), 026113 (2004).

  • Nishimura, J., Ugander, J.: Restreaming graph partitioning: simple versatile algorithms for advanced balancing. The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1106–1114 (2013).

  • Raghavan, U.N., Albert, R., Kumara, S.: Near linear time algorithm to detect community structures in large-scale networks. Phys Rev. E. 76 (3) (2007).

  • Satuluri, V., Parthasarathy, S.: Scalable graph clustering using stochastic flows: applications to community discovery. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 737–746 (2009).

  • Schaeffer, E.S.: Graph clustering. Comput. Sci. Rev. 1 (1), 27–64 (2007).

    Article  MATH  Google Scholar 

  • Sen, A., Deng, H., Guha, S.: On a graph partition problem with application to vlsi layout. Inf. Process. Lett. 43 (2), 87–94 (1992).

    Article  MathSciNet  MATH  Google Scholar 

  • Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22 (8), 888–905 (2000).

    Article  Google Scholar 

  • Siganos, G., Tauro, S.L., Michalis, F.: Jellyfish: A conceptual model for the as internet topology. J. Commun. Netw. 8 (3), 339–350 (2006).

    Article  Google Scholar 

  • Spielman, D.A., Teng, S.-H.: Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. Proceedings of the 36th Annual ACM Symposium on Theory of Computing, 81–90 (2004).

  • Stanton, I.: Streaming balanced graph partitioning algorithms for random graphs. Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, 1287–1301 (2014).

  • Stanton, I., Kliot, G.: Streaming graph partitioning for large distributed graphs. The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1222–1230 (2012).

  • Tsourakakis, C.E., Gkantsidis, C., Radunovic, B., Vojnovic, M.: FENNEL: Streaming graph partitioning for massive scale graphs. Seventh ACM International Conference on Web Search and Data Mining, 333–342 (2014).

  • Ugander, J., Backstrom, L.: Balanced label propagation for partitioning massive graphs. Sixth ACM International Conference on Web Search and Data Mining, 507–516 (2013).

  • Wang, S., Siskind, J.M.: Image segmentation with ratio cut. IEEE Trans. Pattern Anal. Intell. 25 (6), 675–690 (2003).

    Article  Google Scholar 

  • Xu, N., Chen, L., Loggp, B.C.: A log-based dynamic graph partitioning method. PVLDB. 7 (14), 1917–1928 (2014).

    Google Scholar 

  • Xu, N., Cui, B., Chen, L., Zi, H., Shao, Y.: Heterogeneous environment aware streaming graph partitioning. IEEE Trans. Knowl. Data Eng. 27 (6), 1560–1572 (2015).

    Article  Google Scholar 

  • Yang, J., Leskovec, J.: Defining and evaluating network communities based on ground-truth. 12th IEEE International Conference on Data Mining, 745–754 (2012).

  • Yang, J., Leskovec, J.: Overlapping community detection at scale: a nonnegative matrix factorization approach. Sixth ACM International Conference on Web Search and Data Mining, 587–596 (2013).

  • Yang, T., Jin, R., Chi, Y., Zhu, S.: Combining link and content for community detection: a discriminative approach. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 927–936 (2009).

Download references

Acknowledgments

This work was supported by the National Research Foundation of Korea(NRF) funded by the Ministry of Science, ICT and Future Planning (NRF-2015K1A3A1A14021055), and IT R&D program of MSIP/IITP [10044970, Development of Core Technology for Human-like Self-taught Learning based on Symbolic Approach]. This work was also funded by Institute for Information communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No.R0190-15-2012, ”High Performance Big Data Analytics Platform Performance Acceleration Technologies Development”). The ICT at Seoul National University provides research facilities for this study. The Institute of Engineering Research at Seoul National University provided research facilities for this work.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to U Kang.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Lim, Y., Lee, WJ., Choi, HJ. et al. MTP: discovering high quality partitions in real world graphs. World Wide Web 20, 491–514 (2017). https://doi.org/10.1007/s11280-016-0393-1

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11280-016-0393-1

Keywords

Navigation