Towards an Algebraic Cost Model for Graph Operators

  • Conference paper
Algorithmic Aspects of Cloud Computing (ALGOCLOUD 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10739)

Abstract

Graph analytics has been gaining increasing attention in recent years. This has given rise to numerous graph processing and storage engines, each with its own computation, storage and execution model and its own performance profile. Multi-engine analytics offer a way to schedule complex workflows adaptively, on the basis of cost, onto the best available underlying technology. To achieve this for graph analytics, detailed and accurate cost models for the various runtimes and operators must be defined and exported so that intelligent planning can take place. In this work, we take a first step towards defining a cost model for graph-based operators, based on an algebra and its primitives. We evaluate its accuracy on a state-of-the-art graph database and discuss its advantages and shortcomings.
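To make the planning idea above concrete, here is a minimal sketch, not the system described in this work. It assumes that each engine exports one cost function per operator and that a planner picks the engine with the lowest summed estimate for a plan. The `Engine` class, the `plan_cost` and `cheapest_engine` helpers, the engine names and the linear cost coefficients are all hypothetical, chosen only for illustration. They are not measurements from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A plan step: (operator name, estimated input size in rows). Hypothetical encoding.
Step = Tuple[str, int]


@dataclass
class Engine:
    """A graph engine that exports one cost function per supported operator."""
    name: str
    cost_model: Dict[str, Callable[[int], float]]


def plan_cost(engine: Engine, plan: List[Step]) -> float:
    """Estimated cost of running every step of the plan on one engine."""
    return sum(engine.cost_model[op](size) for op, size in plan)


def cheapest_engine(engines: List[Engine], plan: List[Step]) -> Engine:
    """Pick the engine that supports all operators and has the lowest estimate."""
    candidates = [e for e in engines if all(op in e.cost_model for op, _ in plan)]
    if not candidates:
        raise ValueError("no engine supports every operator in the plan")
    return min(candidates, key=lambda e: plan_cost(e, plan))


# Toy cost models: engine_a has a high fixed cost but a low per-row cost.
engine_a = Engine("engine_a", {"expandOut": lambda n: 100 + 2.0 * n})
engine_b = Engine("engine_b", {"expandOut": lambda n: 5.0 * n})

print(cheapest_engine([engine_a, engine_b], [("expandOut", 10)]).name)    # engine_b
print(cheapest_engine([engine_a, engine_b], [("expandOut", 1000)]).name)  # engine_a
```

The sketch ignores the cost of moving data between engines, which a multi-engine planner would also have to estimate.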

Notes

  1. We use the notation \(\circ\) to denote composition of functions.

  2. Using default arguments: alpha=0.41, beta=0.54, gamma=0.05, delta_in=0.2, delta_out=0.
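The parameter names and values in note 2 match the defaults of the directed scale-free graph model of Bollobás, Borgs, Chayes and Riordan. NetworkX's `scale_free_graph` is one library that exposes this model. The excerpt does not name the generator that was used, so this correspondence is our assumption.

The sketch below implements that model in plain Python. At each step, one of three moves is made:

- With probability alpha, a new node links to an existing node chosen in proportion to in-degree + delta_in.
- With probability beta, an edge joins two existing nodes. The source is chosen by out-degree + delta_out and the target by in-degree + delta_in.
- With probability gamma, an existing node, chosen by out-degree + delta_out, links to a new node.

The function name `directed_scale_free` and the 3-cycle seed graph are illustrative choices.

```python
import random


def directed_scale_free(n, alpha=0.41, beta=0.54, gamma=0.05,
                        delta_in=0.2, delta_out=0.0, seed=None):
    """Sketch of the Bollobas-Borgs-Chayes-Riordan directed scale-free model.

    Returns a list of directed edges (u, v) over nodes 0..n-1. Parallel edges
    and self-loops may occur, as in the original model.
    """
    if n < 3:
        raise ValueError("n must be at least 3 (size of the seed graph)")
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("alpha + beta + gamma must equal 1")
    rng = random.Random(seed)
    # Hypothetical seed graph: a directed 3-cycle, so every node has in/out degree 1.
    edges = [(0, 1), (1, 2), (2, 0)]
    in_deg = [1, 1, 1]
    out_deg = [1, 1, 1]

    def pick(degrees, delta):
        # Preferential attachment: node i is chosen with weight degrees[i] + delta.
        weights = [d + delta for d in degrees]
        return rng.choices(range(len(degrees)), weights=weights)[0]

    while len(in_deg) < n:
        r = rng.random()
        if r < alpha:
            # A new node v points to an existing node w chosen by in-degree.
            w = pick(in_deg, delta_in)
            v = len(in_deg)
            in_deg.append(0)
            out_deg.append(0)
        elif r < alpha + beta:
            # An edge between two existing nodes.
            v = pick(out_deg, delta_out)
            w = pick(in_deg, delta_in)
        else:
            # An existing node v, chosen by out-degree, points to a new node w.
            v = pick(out_deg, delta_out)
            w = len(in_deg)
            in_deg.append(0)
            out_deg.append(0)
        edges.append((v, w))
        out_deg[v] += 1
        in_deg[w] += 1
    return edges


edges = directed_scale_free(1000, seed=42)
print(len(edges))
```

Each degree-weighted choice scans all nodes, which keeps the sketch short but makes generation quadratic. A production generator would sample from an edge list instead.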

Author information

Corresponding author: Alexander Singh.

A Appendix

A.1 Random 4-Walk Benchmarks

For the random 4-walks, we first note that \(SpaceCost(RandRow(R)) = 1\). RandRow returns a single row, so the final space cost of every 4-walk query collapses to 1. Fig. 11 presents the time costs. In this case, both the default and the modified models predict well. We attribute this to the simplicity of the query and to the fact that it uses only expandOut operations.

Fig. 11. Space and time costs (actual, and projected via the default and modified models) of 4-walk queries on graphs with various connectivity probabilities.
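As a rough illustration of why the space cost collapses to 1, the sketch below builds a random 4-walk from two toy relational operators on a directed G(n, p) graph. Here p plays the role of the connectivity probability. The functions `expand_out`, `rand_row`, `space_cost`, `compose` and `random_k_walk` are hypothetical stand-ins modelled on the text's expandOut, RandRow and SpaceCost. In this sketch, space cost is simply the row count. The excerpt does not give the paper's exact operator definitions or the query plan it benchmarks.

We assume that each step expands the current path by its out-neighbours and then keeps one random row. Each step is therefore written as the composition rand_row \(\circ\) expand_out. A walk that reaches a vertex with no out-edges ends with an empty relation.

```python
import random
from functools import reduce
from typing import Callable, Dict, List, Tuple

Graph = Dict[int, List[int]]
Relation = List[Tuple[int, ...]]  # each row is a path of vertex ids


def gnp_graph(n: int, p: float, seed=None) -> Graph:
    """Directed G(n, p): each ordered pair (u, v) with u != v is an edge with probability p."""
    rng = random.Random(seed)
    return {u: [v for v in range(n) if v != u and rng.random() < p] for u in range(n)}


def expand_out(graph: Graph, relation: Relation) -> Relation:
    """Toy expandOut: extend every path by each out-neighbour of its last vertex."""
    return [row + (w,) for row in relation for w in graph[row[-1]]]


def rand_row(relation: Relation, rng: random.Random) -> Relation:
    """Toy RandRow: keep one uniformly chosen row; an empty relation stays empty."""
    return [rng.choice(relation)] if relation else []


def space_cost(relation: Relation) -> int:
    """Toy space measure: the number of rows in the relation."""
    return len(relation)


def compose(*fs: Callable) -> Callable:
    """compose(f, g)(x) == f(g(x)), i.e. f applied after g."""
    return reduce(lambda f, g: lambda x: f(g(x)), fs)


def random_k_walk(graph: Graph, k: int, seed=None) -> Relation:
    """Random walk of k >= 1 steps, each step being rand_row after expand_out."""
    rng = random.Random(seed)
    step = compose(lambda r: rand_row(r, rng), lambda r: expand_out(graph, r))
    walk = compose(*[step] * k)
    start = rand_row([(v,) for v in graph], rng)  # a single random start vertex
    return walk(start)


g = gnp_graph(50, 0.2, seed=7)
result = random_k_walk(g, 4, seed=7)
print(result, space_cost(result))
```

Because RandRow runs after every expansion in this sketch, intermediate relations never exceed one vertex's out-degree. The final relation holds at most one row.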

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Singh, A., Tsoumakos, D. (2018). Towards an Algebraic Cost Model for Graph Operators. In: Alistarh, D., Delis, A., Pallis, G. (eds) Algorithmic Aspects of Cloud Computing. ALGOCLOUD 2017. Lecture Notes in Computer Science, vol 10739. Springer, Cham. https://doi.org/10.1007/978-3-319-74875-7_6

  • DOI: https://doi.org/10.1007/978-3-319-74875-7_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-74874-0

  • Online ISBN: 978-3-319-74875-7

  • eBook Packages: Computer Science (R0)
