ABSTRACT
Support for exploratory interaction with databases in applications such as data mining requires that the first few results of an operation be available as quickly as possible. We study the algorithmic side of what can and what cannot be achieved for processing join operations. We develop strategies that modify the strict two-phase processing of the sort-merge paradigm, intermingling join steps with selected merge phases of the sort. We propose an algorithm that produces early join results for a broad class of join problems, including many not addressed well by hash-based algorithms. Compared to standard sort-merge algorithms, our algorithm does not significantly increase the number of I/O operations needed to complete the join.
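The idea of intermingling join steps with the sort can be illustrated with a small sketch. This is not the algorithm proposed in the paper; it is a minimal in-memory illustration for equi-joins only, under several assumptions: the function name `early_sort_merge_join` and the parameter `run_size` are hypothetical, Python lists stand in for sorted runs on disk, and all runs are merged in a single final pass. Each pair of runs is joined as soon as it has been sorted, so the first results appear before the inputs are fully read; the final merge then reports only pairs that come from different runs.

```python
import heapq
from itertools import groupby, islice


def _key(t):
    # Tuples are (join_key, run_id, value); order and group by join_key only.
    return t[0]


def _groups(sorted_tuples):
    # Materialise each block of equal join keys from a key-sorted stream.
    for k, grp in groupby(sorted_tuples, key=_key):
        yield k, list(grp)


def _join_sorted(left, right, keep):
    # Classic merge join over two key-sorted streams. A matching pair is
    # emitted only if keep(left_run_id, right_run_id) is true.
    lgroups, rgroups = _groups(left), _groups(right)
    l, r = next(lgroups, None), next(rgroups, None)
    while l is not None and r is not None:
        if l[0] < r[0]:
            l = next(lgroups, None)
        elif l[0] > r[0]:
            r = next(rgroups, None)
        else:
            for _, lrun, lval in l[1]:
                for _, rrun, rval in r[1]:
                    if keep(lrun, rrun):
                        yield (l[0], lval, rval)
            l, r = next(lgroups, None), next(rgroups, None)


def early_sort_merge_join(r_input, s_input, run_size):
    # Hypothetical sketch: inputs are iterables of (join_key, value) pairs.
    r_iter, s_iter = iter(r_input), iter(s_input)
    r_runs, s_runs = [], []
    run_id = 0
    # Run-generation phase: sort one chunk of each input and join the two
    # chunks immediately, which yields the early results.
    while True:
        r_chunk = list(islice(r_iter, run_size))
        s_chunk = list(islice(s_iter, run_size))
        if not r_chunk and not s_chunk:
            break
        r_run = sorted(((k, run_id, v) for k, v in r_chunk), key=_key)
        s_run = sorted(((k, run_id, v) for k, v in s_chunk), key=_key)
        yield from _join_sorted(r_run, s_run, keep=lambda a, b: True)
        r_runs.append(r_run)  # stand-in for writing the run to disk
        s_runs.append(s_run)
        run_id += 1
    # Merge phase: join the merged streams, skipping pairs from the same
    # run index because those were already reported above.
    merged_r = heapq.merge(*r_runs, key=_key)
    merged_s = heapq.merge(*s_runs, key=_key)
    yield from _join_sorted(merged_r, merged_s, keep=lambda a, b: a != b)
```

Because the function is a generator, a caller can take the first result with `next(...)` after only one chunk of each input has been consumed. Tagging tuples with a run index is one simple way to avoid duplicate results; the sketch makes no claim about I/O cost.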
Index Terms
- On producing join results early