Abstract
Time intervals are often associated with tuples to represent their valid time in temporal relations, where overlap join is crucial for various kinds of queries. Many existing overlap join algorithms use indices based on tree structures such as quad-tree, B+-tree and interval tree. These algorithms usually have high CPU cost since deep path traversals are unavoidable, which makes them not so competitive as data-partition or plane-sweep based algorithms. This paper proposes an efficient overlap join algorithm based on a new two-layer flat index named as Overlap Interval Inverted Index (i.e., O2i Index). It uses an array to record the end points of intervals and approximates the nesting structures of intervals via two functions in the first layer, and the second layer uses inverted lists to trace all intervals satisfying the approximated nesting structures. With the help of the new index, the join algorithm only visits the must-be-scanned lists and skips all others. Analyses and experiments on both real and synthetic datasets show that the proposed algorithm is as competitive as the state-of-the-art algorithms.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Cafagna F, Böhlen M H. Disjoint interval partitioning. The VLDB Journal, 2017, 26(3): 447-466.
Bouros P, Mamoulis N. A forward scan based plane sweep algorithm for parallel interval joins. Proceedings of the VLDB Endowment, 2017, 10(11): 1346-1357.
Piatov D, Helmer S, Dignös A. An interval join optimized for modern hardware. In Proc. the 32nd IEEE International Conference on Data Engineering, May 2016, pp.1098-1109.
Dignös A, Böhlen M H, Gamper J. Overlap interval partition join. In Proc. the 2014 ACM SIGMOD International Conference on Management of Data, June 2014, pp.1459-1470.
Gao D, Jensen C S, Snodgrass R T, Soo M. Join operations in temporal databases. The VLDB Journal, 2005, 14(1): 2-29.
Elmasri R, Wuu G T J, Kim Y J. The time index: An access structure for temporal data. In Proc. the 16th International Conference on Very Large Databases, Aug. 1990, pp.1-12.
Son D, Elmasri R. Efficient temporal join processing using time index. In Proc. the 8th International Conference on Scientific and Statistical Data Base Management, Jun. 1996, pp.252-261.
Soo M D, Snodgrass R T, Jensen C S. Efficient evaluation of the valid-time natural join. In Proc. the 10th International Conference on Data Engineering, Feb. 1994, pp.282-292.
Sitzmann I, Stuckey P J. Improving temporal joins using histograms. In Proc. the 11th International Conference on Database and Expert Systems Applications, Sept. 2000, pp.488-498.
Gunadhi H, Segev A. Query processing algorithms for temporal intersection joins. In Proc. the 7th International Conference on Data Engineering, April 1991, pp.336-344.
Rana S P, Fotouhi F. Efficient processing of time-joins in temporal data bases. In Proc. the 3rd International Conference on Database Systems for Advanced Applications, April 1993, pp.427-432.
Pfser D, Jensen S J. Incremental join of time-oriented data. In Proc. the 11th International Conference on Scientific and Statistical Data Base Management, Jul. 1999, pp.232-243.
Zhang C, Naughton J, DeWitt D, Luo Q, Lohman G. On supporting containment queries in relational database management systems. In Proc. the 2001 ACM SIGMOD International Conference on Management of Data, May 2001, pp.425-436.
Zhang D, Tsotras V J, Seeger B. Efficient temporal join processing using indices. In Proc. the 18th International Conference on Data Engineering, Feb. 2002, pp.103-113.
Kriegel H P, Pötke M, Seidl T. Managing intervals efficiently in object-relational databases. In Proc. the 26th International Conference on Very Large Data Bases, Sept. 2000, pp.407-418.
Tsotras V J, Kangelaris N. The snapshot index: An I/O-optimal access method for timeslice queries. Information Systems, 1995, 20(3): 237-260.
Beckmann N, Seeger B. A revised R ⋆-tree in comparison with related index structures. In Proc. the 2009 ACM SIGMOD International Conference on Management of Data, June 2009, pp.799-812.
Ozsoyoglu G, Snodgrass R T. Temporal and real-time databases: A survey. IEEE Transactions on Knowledge and Data Engineering, 1995, 7(4): 513-532.
Beckmann N, Kriegel H P, Schneider R, Seeger B. The R ⋆-tree: An efficient and robust access method for points and rectangles. In Proc. the 1990 ACM SIGMOD International Conference on Management of Data, May 1990, pp.322-331.
Koudas N, Sevcik K C. Size separation spatial join. In Proc. the 1997 ACM SIGMOD International Conference on Management of Data, May 1997, pp.324-335.
Nobari S, Tauheed F, Heinis T, Karras P, Bressan S, Ailamaki A. TOUCH: In-memory spatial join by hierarchical data-oriented partitioning. In Proc. the 2013 ACM SIGMOD International Conference on Management of Data, Jun. 2013, pp.701-712.
Leung T Y C, Muntz R. Stream processing: Temporal query processing and optimization. In Temporal Databases: Theory, Design, and Implementation, Tassel A U, Clifford J, Gadia S et al. (eds.), Benjamin-Cummings, 1993, pp.329–355.
Lu H, Ooi B C, Tan K L. On spatially partitioned temporal join. In Proc. the 20th International Conference on Very Large Data Bases, Sept. 1994, pp.546-557.
Shen H, Ooi B C, Lu H. The TP-index: A dynamic and efficient indexing mechanism for temporal databases. In Proc. the 10th Int. Conf. Data Engineering, Feb. 1994, pp.274-281.
Arge L, Procopiuc O, Ramaswamy S et al. Scalable sweeping-based spatial join. In Proc. the 24th International Conference on Very Large Data Bases, Aug. 1998, pp.570-581.
Lo M L, Ravishankar C V. Spatial hash-joins. In Proc. the 1996 ACM SIGMOD International Conference on Management of Data, June 1996, pp.247-258.
Patel J M, Dewitt D J. Partition based spatial-merge join. In Proc. the 1996 ACM SIGMOD International Conference on Management of Data, Jun. 1996, pp.259-270.
Samet H. Foundation of Multidimensional and Metric Data Structures (1st edition). Morgan Kaufmann, 2006.
Enderle J, Hampel M, Seidl T. Joining interval data in relational databases. In Proc. the 2004 ACM SIGMOD International Conference on Management of Data, Jun. 2004, pp.683-694.
Zhang D, Tsotras V J. Index based processing of semi-restrictive temporal joins. In Proc. the 9th International Symposium on Temporal Representation and Reasoning, July 2002, pp.70-77.
Luo J Z, Shi S F, Wang H Z, Li J Z. FrepJoin: An efficient partition-based algorithm for edit similarity join. Frontiers of Information Technology & Electronic Engineering, 2017, 18(10): 1499-1510.
Wang C P, Wang C K, Wang H, Chen J, Ye X J. Preference join on heterogeneous data. In Proc. the 18th Asia Pacific Web Conference, Sept. 2016, pp.317-329.
Khayyat Z, Lucia W, Singh M, Ouzzani M, Papotti P, Quiané-Ruiz J A, Tang N, Kalnis P. Fast and scalable inequality joins. The VLDB Journal, 2017, 26(1): 125–150.
Acknowledgment
We would like to appreciate Zeng-Long Wang, a Master student of Ji-Zhou Luo, for his work on early experiments.
Author information
Authors and Affiliations
Corresponding author
Electronic supplementary material
Below is the link to the electronic supplementary material.
ESM 1
(PDF 784 kb)
Rights and permissions
About this article
Cite this article
Luo, JZ., Shi, SF., Yang, G. et al. O2iJoin: An Efficient Index-Based Algorithm for Overlap Interval Join. J. Comput. Sci. Technol. 33, 1023–1038 (2018). https://doi.org/10.1007/s11390-018-1872-x
Received:
Revised:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11390-018-1872-x