Cost Model for Pregel on GraphX

  • Conference paper
Advances in Databases and Information Systems (ADBIS 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10509)

Abstract

The graph partitioning strategy plays a vital role in the overall execution of an algorithm in a distributed graph processing system. Choosing the best strategy is challenging, as no single strategy is the best fit for all kinds of graphs or algorithms. In this paper, we help users choose a suitable partitioning strategy for algorithms based on the Pregel model by providing a cost model for the Pregel implementation in Spark-GraphX. The cost model shows the relationship between four major parameters: (1) the input graph, (2) the cluster configuration, (3) the algorithm properties, and (4) the partitioning strategy. We validate the accuracy of the cost model on 17 different combinations of input graph, algorithm, and partitioning strategy. As such, the cost model can serve as a basis for yet-to-be-developed optimizers for Pregel.
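
To make the trade-off behind "no single strategy is the best fit" concrete, the following is a minimal sketch in plain Python. It is not the cost model from the paper and does not use Spark or GraphX. It compares two hypothetical edge-assignment functions (`by_source` and `by_edge_pair`, invented here for illustration) on a small star graph using two commonly used metrics for vertex-cut partitioning: the replication factor (average number of partitions holding a copy of each vertex) and the edge balance (largest partition size divided by the mean partition size).

```python
from collections import defaultdict


def replication_factor(edges, num_partitions, assign):
    """Average number of partitions in which each vertex appears."""
    parts_of = defaultdict(set)
    for src, dst in edges:
        p = assign(src, dst, num_partitions)
        parts_of[src].add(p)
        parts_of[dst].add(p)
    return sum(len(parts) for parts in parts_of.values()) / len(parts_of)


def edge_balance(edges, num_partitions, assign):
    """Largest partition size divided by the mean partition size (1.0 is ideal)."""
    load = [0] * num_partitions
    for src, dst in edges:
        load[assign(src, dst, num_partitions)] += 1
    return max(load) / (len(edges) / num_partitions)


def by_source(src, dst, n):
    # Hypothetical strategy: all out-edges of a vertex share one partition.
    return src % n


def by_edge_pair(src, dst, n):
    # Hypothetical strategy: spread edges using both endpoints.
    return (src * 31 + dst) % n


# Star graph: hub vertex 0 points to leaves 1..8.
star = [(0, leaf) for leaf in range(1, 9)]

for name, strategy in [("by_source", by_source), ("by_edge_pair", by_edge_pair)]:
    rf = replication_factor(star, 4, strategy)
    bal = edge_balance(star, 4, strategy)
    print(f"{name}: replication factor = {rf:.2f}, edge balance = {bal:.2f}")
```

On this input, `by_source` keeps every vertex in a single partition but places all edges on one worker, while `by_edge_pair` balances the edges at the price of replicating the hub. Which of the two is cheaper overall depends on the graph, the cluster, and the algorithm, which is the kind of relationship a cost model has to capture.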

Acknowledgement

This work was supported by the Fonds de la Recherche Scientifique-FNRS under Grant no. T.0183.14 PDR. The student is also part of the IT4BI DC program.

Corresponding author

Correspondence to Rohit Kumar.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kumar, R., Abelló, A., Calders, T. (2017). Cost Model for Pregel on GraphX. In: Kirikova, M., Nørvåg, K., Papadopoulos, G. (eds) Advances in Databases and Information Systems. ADBIS 2017. Lecture Notes in Computer Science, vol 10509. Springer, Cham. https://doi.org/10.1007/978-3-319-66917-5_11

  • DOI: https://doi.org/10.1007/978-3-319-66917-5_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66916-8

  • Online ISBN: 978-3-319-66917-5

  • eBook Packages: Computer Science, Computer Science (R0)
