ABSTRACT
The join operation of relational algebra is a cornerstone of relational database systems. Computing the join of several relations is NP-hard in general, whereas special (and typical) cases are tractable. This paper considers joins having an acyclic join graph, for which current methods initially apply a full reducer to efficiently eliminate tuples that will not contribute to the result of the join. From a worst-case perspective, previous algorithms for computing an acyclic join of k fully reduced relations, occupying a total of n≥k blocks on disk, use Ω((n+z)k) I/Os, where z is the size of the join result in blocks.

In this paper we show how to compute the join in a time bound that is within a constant factor of the cost of running a full reducer plus sorting the output. For a broad class of acyclic join graphs this is O(sort(n+z)) I/Os, removing the dependence on k from previous bounds. Traditional methods decompose the join into a number of binary joins, which are then carried out one by one. Departing from this approach, our technique is based on computing the size of certain subsets of the result, and using these sizes to compute the location(s) of each data item in the result.

Finally, as an initial study of cyclic joins in the I/O model, we show how to compute a join whose join graph is a 3-cycle, in O(n²/m+sort(n+z)) I/Os, where m is the number of blocks in internal memory.
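As a rough illustration of two ideas mentioned in the abstract, the following in-memory sketch applies a full reducer (two passes of semijoins over a rooted join tree) and then counts the size of the join result without materializing it. It is not the paper's I/O-efficient algorithm. It assumes natural joins on shared attribute names and a tree that is a valid join tree for the relations. All names (`semijoin`, `full_reduce`, `join_size`, the sample relations R, S, T) are hypothetical and chosen only for this example.

```python
from collections import Counter

def shared_key(attrs_a, attrs_b):
    """Attributes common to two relations, in a fixed order."""
    return tuple(sorted(set(attrs_a) & set(attrs_b)))

def project(row, attrs, key):
    """Values of the attributes in `key`, taken from one row."""
    return tuple(row[attrs.index(a)] for a in key)

def semijoin(rows_a, attrs_a, rows_b, attrs_b):
    """Keep the rows of A that match at least one row of B."""
    key = shared_key(attrs_a, attrs_b)
    seen = {project(r, attrs_b, key) for r in rows_b}
    return [r for r in rows_a if project(r, attrs_a, key) in seen]

def full_reduce(relations, schema, children, root):
    """Semijoin pass from leaves to root, then from root to leaves."""
    rel = {name: list(rows) for name, rows in relations.items()}
    order, stack = [], [root]
    while stack:  # preorder: every parent appears before its descendants
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))
    for node in reversed(order):  # bottom-up: reduce parents by children
        for c in children.get(node, []):
            rel[node] = semijoin(rel[node], schema[node], rel[c], schema[c])
    for node in order:  # top-down: reduce children by parents
        for c in children.get(node, []):
            rel[c] = semijoin(rel[c], schema[c], rel[node], schema[node])
    return rel

def join_size(rel, schema, children, root):
    """Count result tuples by propagating per-tuple counts up the tree."""
    def weights(node):
        w = [1] * len(rel[node])
        for c in children.get(node, []):
            key = shared_key(schema[node], schema[c])
            totals = Counter()
            for row, wc in zip(rel[c], weights(c)):
                totals[project(row, schema[c], key)] += wc
            w = [wi * totals[project(r, schema[node], key)]
                 for wi, r in zip(w, rel[node])]
        return w
    return sum(weights(root))

# Hypothetical sample data: a path join R(a,b), S(b,c), T(c,d).
schema = {"R": ("a", "b"), "S": ("b", "c"), "T": ("c", "d")}
relations = {
    "R": [(1, 10), (2, 20), (3, 30)],
    "S": [(10, 100), (10, 101), (20, 200), (40, 400)],
    "T": [(100, "x"), (100, "y"), (101, "z"), (300, "w")],
}
children = {"R": ["S"], "S": ["T"]}

reduced = full_reduce(relations, schema, children, "R")
print(reduced)
print(join_size(reduced, schema, children, "R"))
```

On this sample data, the reducer leaves one tuple in R, two in S, and three in T, and the count is 3. The per-tuple counts computed in `join_size` are one simple instance of "sizes of subsets of the result"; how the paper uses such sizes to place items in the output is not reproduced here.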
Index Terms
- Scalable computation of acyclic joins