On producing join results early

Authors:
Jens-Peter Dittrich, University of Marburg, Marburg, Germany
Bernhard Seeger, University of Marburg, Marburg, Germany
David Scot Taylor, ETH Zürich, Zürich, Switzerland
Peter Widmayer, ETH Zürich, Zürich, Switzerland

PODS '03: Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 2003, Pages 134–142. https://doi.org/10.1145/773153.773167

Published: 09 June 2003

ABSTRACT

Support for exploratory interaction with databases in applications such as data mining requires that the first few results of an operation be available as quickly as possible. We study the algorithmic side of what can and what cannot be achieved for processing join operations. We develop strategies that modify the strict two-phase processing of the sort-merge paradigm, intermingling join steps with selected merge phases of the sort. We propose an algorithm that produces early join results for a broad class of join problems, including many not addressed well by hash-based algorithms. Our algorithm has no significant increase in the number of I/O operations needed to complete the join compared to standard sort-merge algorithms.
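
To make the idea of intermingling join steps with the sort more concrete, the following is a minimal sketch, not the authors' algorithm. It assumes an equi-join on the first tuple field, uses in-memory lists in place of disk-resident runs, and joins only during run generation and the final merge rather than during selected intermediate merge phases. The names `early_sort_merge_join`, `_merge_join`, and the `memory` parameter are hypothetical.

```python
import heapq
from itertools import groupby, zip_longest


def _chunks(seq, size):
    """Split a list into consecutive pieces of at most `size` tuples."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]


def _merge_join(r_sorted, s_sorted, keep):
    """Join two key-sorted streams of (key, run_id, value) tuples.

    `keep(r_run, s_run)` decides whether a matching pair is emitted.
    """
    tagged = heapq.merge(
        ((k, 0, run, v) for k, run, v in r_sorted),  # side 0 = R
        ((k, 1, run, v) for k, run, v in s_sorted),  # side 1 = S
        key=lambda t: t[0],
    )
    for key, group in groupby(tagged, key=lambda t: t[0]):
        rs, ss = [], []
        for _, side, run, value in group:
            (rs if side == 0 else ss).append((run, value))
        for r_run, r_val in rs:
            for s_run, s_val in ss:
                if keep(r_run, s_run):
                    yield (key, r_val, s_val)


def early_sort_merge_join(r, s, memory):
    """Yield (key, r_value, s_value) for lists r and s of (key, value) tuples.

    `memory` is the assumed number of tuples that fit in memory at once;
    each input gets half of it during run generation.
    """
    half = max(1, memory // 2)
    r_runs, s_runs = [], []
    pairs = zip_longest(_chunks(r, half), _chunks(s, half), fillvalue=[])
    # Phase 1: sort one chunk of each input and join the two chunks at once,
    # so the first results appear before either input is fully sorted.
    for run_id, (rc, sc) in enumerate(pairs):
        r_run = [(k, run_id, v) for k, v in sorted(rc, key=lambda t: t[0])]
        s_run = [(k, run_id, v) for k, v in sorted(sc, key=lambda t: t[0])]
        yield from _merge_join(r_run, s_run, lambda a, b: True)
        r_runs.append(r_run)
        s_runs.append(s_run)
    # Phase 2: merge all runs and emit only pairs from different run ids;
    # pairs with equal run ids were already produced in phase 1.
    merged_r = heapq.merge(*r_runs, key=lambda t: t[0])
    merged_s = heapq.merge(*s_runs, key=lambda t: t[0])
    yield from _merge_join(merged_r, merged_s, lambda a, b: a != b)
```

Because the function is a generator, a caller can stop after the first few results without paying for the merge phase. Tagging each tuple with its run id is one simple way to avoid emitting a pair twice; it is a design choice of this sketch, not something stated in the abstract.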

Index Terms

On producing join results early
1. Information systems
  1. Information retrieval
2. Theory of computation
  1. Computational complexity and cryptography
    1. Complexity classes

Published in
PODS '03: Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2003
291 pages
ISBN:1581136706
DOI:10.1145/773153
Conference Chair: Frank Neven, Limburgs Universitair Centrum
General Chair: Catriel Beeri, Hebrew University of Jerusalem
Program Chair: Tova Milo, Tel Aviv University & INRIA
Copyright © 2003 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
data mining
join processing
non-blocking
query processing
spatial data
Qualifiers
- Article
Conference

Acceptance Rates
PODS '03 Paper Acceptance Rate: 27 of 136 submissions, 20%
Overall Acceptance Rate: 642 of 2,707 submissions, 24%
Article Metrics
- Total Citations: 14
- Total Downloads: 635
- Downloads (Last 12 months): 0
- Downloads (Last 6 weeks): 0