DOI: 10.1145/773153.773167
Article

On producing join results early

Published: 9 June 2003

ABSTRACT

Support for exploratory interaction with databases in applications such as data mining requires that the first few results of an operation be available as quickly as possible. We study, from an algorithmic perspective, what can and cannot be achieved when processing join operations. We develop strategies that modify the strict two-phase processing of the sort-merge paradigm, intermingling join steps with selected merge phases of the sort. We propose an algorithm that produces early join results for a broad class of join problems, including many that hash-based algorithms do not handle well. Compared with standard sort-merge algorithms, our algorithm does not significantly increase the number of I/O operations needed to complete the join.
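The general idea of producing some join results while the inputs are still being sorted, rather than only after both inputs are fully sorted, can be sketched in Python. This is a simplified, hypothetical illustration and not the algorithm from the paper: it joins each in-memory load as soon as that load is sorted into runs, then merges all runs in one pass and emits only the pairs whose tuples came from different loads. The function names `join_sorted` and `early_sort_merge_join`, the `memory` parameter (counted in tuples), and the `(key, payload)` tuple layout are assumptions for illustration; runs are kept in memory instead of being written to disk.

```python
import heapq
from itertools import groupby
from math import ceil
from operator import itemgetter


def join_sorted(r_entries, s_entries, accept):
    """Merge-join two key-sorted streams of (key, load_id, tuple) entries.

    Yields (r_tuple, s_tuple) for every key match where accept(r_load, s_load)
    is true.
    """
    r_groups = groupby(r_entries, key=itemgetter(0))
    s_groups = groupby(s_entries, key=itemgetter(0))
    rg, sg = next(r_groups, None), next(s_groups, None)
    while rg is not None and sg is not None:
        if rg[0] < sg[0]:
            rg = next(r_groups, None)
        elif rg[0] > sg[0]:
            sg = next(s_groups, None)
        else:
            # Materialize both key groups before the groupby iterators advance.
            r_group, s_group = list(rg[1]), list(sg[1])
            for _, r_load, r in r_group:
                for _, s_load, s in s_group:
                    if accept(r_load, s_load):
                        yield r, s
            rg, sg = next(r_groups, None), next(s_groups, None)


def early_sort_merge_join(R, S, memory=4):
    """Hypothetical sketch: equi-join R and S on field 0, yielding (phase, r, s)."""
    half = max(1, memory // 2)  # assumption: memory split evenly between R and S
    loads = max(ceil(len(R) / half), ceil(len(S) / half))
    r_runs, s_runs = [], []
    # Phase 1 (run generation): sort each memory load into a run and join it
    # immediately, so the first results appear before any merging starts.
    for load in range(loads):
        chunk = slice(load * half, (load + 1) * half)
        r_run = sorted(((t[0], load, t) for t in R[chunk]), key=itemgetter(0))
        s_run = sorted(((t[0], load, t) for t in S[chunk]), key=itemgetter(0))
        for r, s in join_sorted(r_run, s_run, lambda a, b: True):
            yield "run", r, s
        r_runs.append(r_run)  # a real system would write these runs to disk
        s_runs.append(s_run)
    # Phase 2 (merge): merge all runs by key and emit only pairs whose tuples
    # came from different loads; same-load pairs were already produced above.
    r_merged = heapq.merge(*r_runs, key=itemgetter(0))
    s_merged = heapq.merge(*s_runs, key=itemgetter(0))
    for r, s in join_sorted(r_merged, s_merged, lambda a, b: a != b):
        yield "merge", r, s


if __name__ == "__main__":
    R = [(3, "r1"), (1, "r2"), (2, "r3"), (3, "r4")]
    S = [(3, "s1"), (2, "s2"), (1, "s3"), (3, "s4")]
    for phase, r, s in early_sort_merge_join(R, S, memory=4):
        print(phase, r, s)
    # run (3, 'r1') (3, 's1')
    # run (3, 'r4') (3, 's4')
    # merge (1, 'r2') (1, 's3')
    # merge (2, 'r3') (2, 's2')
    # merge (3, 'r1') (3, 's4')
    # merge (3, 'r4') (3, 's1')
```

Because phase 2 skips pairs from the same load, each result is emitted exactly once. How many results appear early depends on how many matches happen to fall within the same memory load; the sketch does not model disk I/O or the paper's strategy of interleaving join steps with selected merge phases.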

Published in

PODS '03: Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2003
291 pages
ISBN: 1581136706
DOI: 10.1145/773153
Conference Chair: Frank Neven
General Chair: Catriel Beeri
Program Chair: Tova Milo

Copyright © 2003 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

PODS '03 paper acceptance rate: 27 of 136 submissions (20%). Overall acceptance rate: 642 of 2,707 submissions (24%).
