ABSTRACT
Join optimization has been dominated by Selinger-style, pairwise optimizers for decades. But, Selinger-style algorithms are asymptotically suboptimal for applications in graphic analytics. This sub-optimality is one of the reasons that many have advocated supplementing relational engines with specialized graph processing engines. Recently, new join algorithms have been discovered that achieve optimal worst-case run times for any join or even so-called beyond worst-case (or instance optimal) run time guarantees for specialized classes of joins. These new algorithms match or improve on those used in specialized graph-processing systems. This paper asks can these new join algorithms allow relational engines to close the performance gap with graph engines?
We examine this question for graph-pattern queries or join queries. We find that classical relational databases like Postgres and MonetDB or newer graph databases/stores like Virtuoso and Neo4j may be orders of magnitude slower than these new approaches compared to a fully featured RDBMS, LogicBlox, using these new ideas. Our results demonstrate that an RDBMS with such new algorithms can perform as well as specialized engines like GraphLab -- while retaining a high-level interface. We hope our work adds to the ongoing debate of the role of graph accelerators, new graph systems, and relational systems in modern workloads.
- C. Aberger, A. Nötzli, K. Olukotun, and C. Ré. EmptyHeaded: Boolean Algebra Based Graph Processing. ArXiv e-prints, Mar. 2015.Google Scholar
- S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995. Google ScholarDigital Library
- A. Atserias, M. Grohe, and D. Marx. Size bounds and query plans for relational joins. SIAM J. Comput., 42(4):1737--1767, 2013.Google ScholarDigital Library
- R. Fagin. Degrees of acyclicity for hypergraphs and relational database schemes. J. ACM, 30(3):514--550, 1983. Google ScholarDigital Library
- R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. In PODS, 2001. Google ScholarDigital Library
- M. Grohe and D. Marx. Constraint solving via fractional edge covers. In SODA, pages 289--298. ACM Press, 2006. Google ScholarDigital Library
- J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.Google Scholar
- H. Q. Ngo, D. T. Nguyen, C. Re, and A. Rudra. Beyond worst-case analysis for joins with Minesweeper. In PODS, pages 234--245, 2014. Google ScholarDigital Library
- H. Q. Ngo, E. Porat, C. Ré, and A. Rudra. Worst-case optimal join algorithms: {extended abstract}. In PODS, pages 37--48, 2012. Google ScholarDigital Library
- H. Q. Ngo, C. Ré, and A. Rudra. Skew strikes back: New developments in the theory of join algorithms. In SIGMOD RECORD, pages 5--16, 2013. Google ScholarDigital Library
- M. Rudolf, M. Paradies, C. Bornhövd, and W. Lehner. The graph story of the sap hana database. In BTW, 2013.Google Scholar
- P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. Access path selection in a relational database management system. In SIGMOD, pages 23--34, New York, NY, USA, 1979. ACM. Google ScholarDigital Library
- T. L. Veldhuizen. Incremental maintenance for leapfrog triejoin. CoRR, abs/1303.5313, 2013.Google Scholar
- T. L. Veldhuizen. Leapfrog triejoin: A simple, worst-case optimal join algorithm. In ICDT, pages 96--106, 2014.Google Scholar
- T. L. Veldhuizen. Transaction repair: Full serializability without locks. CoRR, abs/1403.5645, 2014.Google Scholar
- M. Yannakakis. Algorithms for acyclic database schemes. In VLDB, pages 82--94, 1981. Google ScholarDigital Library
Index Terms
- Join Processing for Graph Patterns: An Old Dog with New Tricks
Recommendations
Multi-way spatial join selectivity for the ring join graph
Efficient spatial query processing is very important since the applications of the spatial DBMS (e.g. GIS, CAD/CAM, LBS) handle massive amount of data and consume much time. Many spatial queries contain the multi-way spatial join due to the fact that ...
Functional-join processing
Inter-object references are one of the key concepts of object-relational and object-oriented database systems. In this work, we investigate alternative techniques to implement inter-object references and make the best use of them in query processing, ...
Distributed stream join query processing with semijoins
This paper addresses the distributed stream processing of window-based multi-way join queries considering the semijoin as a key join operator. In distributed stream processing, data streams arriving at remote sites need to be shipped to the processing ...
Comments