DOI: 10.1145/1142351.1142384
Article

Scalable computation of acyclic joins

Published: 26 June 2006

ABSTRACT

The join operation of relational algebra is a cornerstone of relational database systems. Computing the join of several relations is NP-hard in general, whereas special (and typical) cases are tractable. This paper considers joins having an acyclic join graph, for which current methods initially apply a full reducer to efficiently eliminate tuples that will not contribute to the result of the join. From a worst-case perspective, previous algorithms for computing an acyclic join of k fully reduced relations, occupying a total of n ≥ k blocks on disk, use Ω((n+z)k) I/Os, where z is the size of the join result in blocks.

In this paper we show how to compute the join in a time bound that is within a constant factor of the cost of running a full reducer plus sorting the output. For a broad class of acyclic join graphs this is O(sort(n+z)) I/Os, removing the dependence on k from previous bounds. Traditional methods decompose the join into a number of binary joins, which are then carried out one by one. Departing from this approach, our technique is based on computing the size of certain subsets of the result, and using these sizes to compute the location(s) of each data item in the result.

Finally, as an initial study of cyclic joins in the I/O model, we show how to compute a join whose join graph is a 3-cycle, in O(n^2/m + sort(n+z)) I/Os, where m is the number of blocks in internal memory.
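To make the "full reducer" step in the abstract concrete, the following is a minimal in-memory sketch, not the paper's I/O-efficient algorithm. It assumes the join graph is given as a rooted tree and runs one bottom-up and one top-down pass of semijoins, the classical way to fully reduce an acyclic join. The names `semijoin` and `full_reduce`, the dictionary-based row representation, and the edge-list encoding of the tree are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]      # attribute name -> value (assumed representation)
Relation = List[Row]      # all rows of a relation share the same attributes


def semijoin(left: Relation, right: Relation) -> Relation:
    """Keep the rows of `left` that match some row of `right` on shared attributes."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    keys = {tuple(row[a] for a in shared) for row in right}
    return [row for row in left if tuple(row[a] for a in shared) in keys]


def full_reduce(
    relations: Dict[str, Relation],
    edges: List[Tuple[str, str]],
) -> Dict[str, Relation]:
    """Fully reduce relations whose join graph is a tree.

    `edges` lists (parent, child) pairs so that the edge leading into a node
    appears before the edges leading out of it (for example, preorder).
    """
    reduced = dict(relations)
    # Bottom-up pass: each parent drops rows with no partner in its child.
    for parent, child in reversed(edges):
        reduced[parent] = semijoin(reduced[parent], reduced[child])
    # Top-down pass: each child drops rows with no partner in its parent.
    for parent, child in edges:
        reduced[child] = semijoin(reduced[child], reduced[parent])
    return reduced


if __name__ == "__main__":
    # Chain join R(a,b), S(b,c), T(c,d), rooted at S.
    R = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
    S = [{"b": 10, "c": 100}, {"b": 20, "c": 200}, {"b": 40, "c": 400}]
    T = [{"c": 100, "d": 7}, {"c": 500, "d": 8}]
    result = full_reduce({"R": R, "S": S, "T": T}, [("S", "R"), ("S", "T")])
    for name, rows in result.items():
        print(name, rows)
```

After both passes, every remaining row takes part in at least one tuple of the join result, which is the precondition the abstract's bounds assume ("k fully reduced relations").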

    • Published in

      PODS '06: Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
      June 2006
      382 pages
      ISBN: 1595933182
      DOI: 10.1145/1142351

      Copyright © 2006 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      PODS '06 Paper Acceptance Rate: 35 of 185 submissions, 19%
      Overall Acceptance Rate: 642 of 2,707 submissions, 24%
