Bipartite-Oriented Distributed Graph Partitioning for Big Learning

  • Regular Paper
  • Journal of Computer Science and Technology

Abstract

Many machine learning and data mining (MLDM) problems, such as recommendation, topic modeling, and medical diagnosis, can be modeled as computation on bipartite graphs. However, most distributed graph-parallel systems are oblivious to the unique characteristics of such graphs, and existing online graph partitioning algorithms usually cause excessive vertex replication as well as significant pressure on network communication. This article identifies the challenges and opportunities of partitioning bipartite graphs for distributed MLDM processing and proposes BiGraph, a set of bipartite-oriented graph partitioning algorithms. BiGraph leverages observations such as the skewed distribution of vertices, the differentiated computation load, and the imbalanced data sizes between the two subsets of vertices to derive a set of optimal graph partitioning algorithms that minimize vertex replication and network communication. BiGraph has been implemented on PowerGraph and is shown to deliver speedups from 1.16X up to 17.75X for four typical MLDM algorithms, by reducing vertex replication by up to 80% and network traffic by up to 96%.
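
As a rough illustration of why awareness of the two vertex subsets can reduce replication, the sketch below compares two edge-placement rules on a synthetic skewed bipartite graph and reports the average number of copies per vertex. It is a toy under stated assumptions, not BiGraph's actual algorithm: the function names (`place_edges`, `replication_factor`, `edge_hash`, `by_u`), the placement rules, and the generated graph are all hypothetical and chosen only to make the metric concrete.

```python
from collections import defaultdict

def place_edges(edges, num_parts, key):
    """Assign each edge (u, v) to partition key(u, v) % num_parts and
    return, per vertex, the set of partitions holding a copy of it."""
    replicas = defaultdict(set)
    for u, v in edges:
        p = key(u, v) % num_parts
        replicas[("U", u)].add(p)  # tag sides so ids from U and V never collide
        replicas[("V", v)].add(p)
    return replicas

def replication_factor(replicas):
    """Average number of copies per vertex (1.0 means no replication)."""
    return sum(len(parts) for parts in replicas.values()) / len(replicas)

def edge_hash(u, v):
    # Side-oblivious baseline (assumed): spread edges using both endpoints.
    return u * 31 + v

def by_u(u, v):
    # Bipartite-aware toy rule (assumed): keep all edges of a U-vertex together,
    # so vertices in U are never replicated.
    return u

# Synthetic skewed graph: 1000 U-vertices, each linked to 3 of only 10 V-vertices.
edges = [(u, (u + k) % 10) for u in range(1000) for k in range(3)]

baseline = place_edges(edges, 4, edge_hash)
aware = place_edges(edges, 4, by_u)
print("side-oblivious:", replication_factor(baseline))
print("bipartite-aware:", replication_factor(aware))
```

With the toy rule, only the small subset V is replicated, so the average stays close to 1 when U is much larger than V. Real partitioners must also balance load and account for the data size attached to each subset, which this sketch ignores.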

Author information

Corresponding author

Correspondence to Hai-Bo Chen.

Additional information

This work was supported in part by the Doctoral Fund of Ministry of Education of China under Grant No. 20130073120040, the Program for New Century Excellent Talents in University of Ministry of Education of China, the Shanghai Science and Technology Development Funds under Grant No. 12QA1401700, the Foundation for the Author of National Excellent Doctoral Dissertation of China, the Open Project Program of the State Key Laboratory of Mathematical Engineering and Advanced Computing under Grant No. 2014A05, the National Natural Science Foundation of China under Grant Nos. 61003002 and 61402284, the Shanghai Science and Technology Development Fund for High-Tech Achievement Translation under Grant No. 14511100902, and the Singapore National Research Foundation under Grant No. CREATE E2S2.

About this article

Cite this article

Chen, R., Shi, JX., Chen, HB. et al. Bipartite-Oriented Distributed Graph Partitioning for Big Learning. J. Comput. Sci. Technol. 30, 20–29 (2015). https://doi.org/10.1007/s11390-015-1501-x

  • DOI: https://doi.org/10.1007/s11390-015-1501-x
