Abstract
Graph analytics has attracted increasing attention in recent years. This has led to the development of numerous graph processing and storage engines, each with its own computation, storage and execution model and its own performance characteristics. Multi-engine analytics offers a way to schedule complex workflows adaptively, on a cost basis, onto the best available underlying technology. To achieve this for graph analytics, detailed and accurate cost models for the various runtimes and operators must be defined and exported so that intelligent planning can take place. In this work, we take a first step towards defining a cost model for graph-based operators based on an algebra and its primitives. We evaluate its accuracy over a state-of-the-art graph database and discuss its advantages and shortcomings.
Notes
- 1. We use the notation \(\circ\) to denote composition of functions.
- 2. Using default arguments: alpha=0.41, beta=0.54, gamma=0.05, delta_in=0.2, delta_out=0.
A Appendix
A.1 Random 4-Walk Benchmarks
For the random 4-walks, we first note that \(SpaceCost(RandRow(R)) = 1\), so the final space costs collapse to 1. The time costs are shown in Fig. 11. In this case, both the default and the modified models predict well, which can be attributed to the simplicity of the query and to the fact that it uses only expandOut operations.
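To illustrate why the space cost collapses to 1, the following minimal sketch models one plausible reading of a random 4-walk: a random row is selected, and each of the four steps expands along outgoing edges and then keeps a single random row. The adjacency-dictionary graph, the function names `rand_row`, `expand_out` and `random_k_walk`, and the handling of dead ends are all assumptions made for illustration; they are not the operators as defined in the paper, nor the way the evaluated database executes the query.

```python
import random


def rand_row(rows, rng):
    """Keep one uniformly chosen row (hypothetical stand-in for RandRow)."""
    return [rng.choice(rows)] if rows else []


def expand_out(rows, adjacency):
    """Extend each partial path by every outgoing edge of its last vertex
    (hypothetical stand-in for an expandOut step)."""
    return [
        path + (nxt,)
        for path in rows
        for nxt in adjacency.get(path[-1], [])
    ]


def random_k_walk(adjacency, k=4, seed=None):
    """Return at most one path with k edges, built from out-expansions only."""
    rng = random.Random(seed)
    # Start from a single randomly chosen vertex.
    rows = rand_row([(v,) for v in adjacency], rng)
    for _ in range(k):
        # The intermediate result never holds more than one row after rand_row,
        # which is why the final space cost is 1 in this sketch.
        rows = rand_row(expand_out(rows, adjacency), rng)
    return rows


if __name__ == "__main__":
    # Toy directed graph given as vertex -> list of out-neighbours (assumed data).
    graph = {0: [1, 2], 1: [2], 2: [0]}
    print(random_k_walk(graph, k=4, seed=7))
```

In this sketch the result is empty if the walk reaches a vertex with no outgoing edges; how the actual query treats such dead ends is not stated in the text above.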
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Singh, A., Tsoumakos, D. (2018). Towards an Algebraic Cost Model for Graph Operators. In: Alistarh, D., Delis, A., Pallis, G. (eds) Algorithmic Aspects of Cloud Computing. ALGOCLOUD 2017. Lecture Notes in Computer Science, vol 10739. Springer, Cham. https://doi.org/10.1007/978-3-319-74875-7_6
DOI: https://doi.org/10.1007/978-3-319-74875-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-74874-0
Online ISBN: 978-3-319-74875-7
eBook Packages: Computer Science, Computer Science (R0)