Abstract
Top-k joins have been extensively studied when numerical-valued attributes are joined on an equality predicate. Other types of join attributes and predicates have received little to no attention. In this paper, we consider spatial objects that are assigned a score (e.g., a ranking). Given two collections R, S of such objects and a spatial distance threshold 𝜖, we introduce the top-k spatial distance join (k-SDJoin) to identify the k pairs of objects that have the highest combined score (based on an aggregate function γ) among all object pairs in R × S with a spatial distance of at most 𝜖. State-of-the-art methods for relational top-k joins can be adapted for k-SDJoin, but their focus is on minimizing the number of objects accessed from the inputs; when spatial objects are joined, however, the computational cost can easily become the bottleneck. In view of this, we propose a novel evaluation algorithm, which greatly reduces the computational cost without compromising the access cost. The main idea is to access and efficiently join blocks of objects from each collection, using appropriate bounds to avoid computing the entire spatial 𝜖-distance join. As the performance of our solution heavily relies on the size of the input blocks, we devise an approach for automated block size tuning, enhanced by a novel generic model for estimating the number of objects to be accessed from each input. Contrary to previous efforts, our model employs cheap-to-compute statistics and requires no prior knowledge of the data distribution. Our extensive experimental analysis demonstrates the efficiency of our algorithm compared to methods based on existing literature that prioritize either the ranking or the spatial join component of k-SDJoin queries.
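To make the query definition and the block-wise idea concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes point objects represented as hypothetical `(x, y, score)` tuples, Euclidean distance, a monotone γ (summation by default), and a fixed block size `lam` instead of automated tuning. The termination test is the standard corner bound used by relational rank joins, and the nested-loop join inside each block stands in for an efficient spatial join.

```python
import heapq
from math import hypot

# Assumption: an object is a tuple (x, y, score); gamma is monotone.
def _dist(r, s):
    return hypot(r[0] - s[0], r[1] - s[1])

def k_sdjoin_naive(R, S, k, eps, gamma=lambda a, b: a + b):
    """Reference definition: evaluate the full eps-distance join, keep the top k."""
    candidates = ((gamma(r[2], s[2]), r, s)
                  for r in R for s in S if _dist(r, s) <= eps)
    return heapq.nlargest(k, candidates, key=lambda t: t[0])

def k_sdjoin_blocks(R, S, k, eps, lam=2, gamma=lambda a, b: a + b):
    """Block-wise sketch: read blocks in descending score order and stop early."""
    if not R or not S or k <= 0:
        return []
    R = sorted(R, key=lambda o: o[2], reverse=True)
    S = sorted(S, key=lambda o: o[2], reverse=True)
    seen_r, seen_s = [], []
    top = []                      # min-heap holding the best k (score, r, s)
    i = j = 0                     # read positions in R and S
    pull_r = True                 # alternate between the two inputs
    while i < len(R) or j < len(S):
        if (pull_r and i < len(R)) or j >= len(S):
            block, i = R[i:i + lam], i + lam
            pairs = [(r, s) for r in block for s in seen_s]
            seen_r.extend(block)
        else:
            block, j = S[j:j + lam], j + lam
            pairs = [(r, s) for s in block for r in seen_r]
            seen_s.extend(block)
        pull_r = not pull_r
        for r, s in pairs:        # placeholder for an efficient spatial join
            if _dist(r, s) <= eps:
                heapq.heappush(top, (gamma(r[2], s[2]), r, s))
                if len(top) > k:
                    heapq.heappop(top)
        if len(top) == k:
            # Upper bound on the score of any pair with an unseen member.
            bounds = []
            if i < len(R):
                bounds.append(gamma(seen_r[-1][2], S[0][2]))
            if j < len(S):
                bounds.append(gamma(R[0][2], seen_s[-1][2]))
            if not bounds or top[0][0] >= max(bounds):
                break
    return sorted(top, reverse=True)
```

Both functions return `(score, r, s)` triples in descending score order. On inputs without score ties they agree, and the block-wise variant can stop before reading either input completely once the k-th best score reaches the bound.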
Notes
An exception is the work of [21] which, however, is restricted to a specific type of attributes (probabilities) and a specific aggregation function (product).
Input collections R and S need not be sorted explicitly on their scoring attribute if, for example, they stem from previous query operators that already produce such interesting orders.
When a dataset is sorted in descending order of its scoring attribute, the lowest seen score is equivalent to the last seen score.
In this paper, we define the dist function on non-leaf entries as the minimum distance between the MBRs of two tree entries, or between the MBR of a tree entry and an object, i.e., \(dist(e,e^{\prime }) = MINDIST(e,e^{\prime })\) or \(dist(o,e) = MINDIST(o,e)\), respectively.
We briefly discuss the cost of automatically determining block size λ in the next section.
We denote by \(r_{c_{R}}\) and \(s_{c_{S}}\) the \(c_{R}\)-th and \(c_{S}\)-th objects in the sorted inputs, respectively.
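The MINDIST-based dist function mentioned in the notes above can be illustrated with a short sketch. It assumes two-dimensional, axis-aligned MBRs stored as hypothetical `(xmin, ymin, xmax, ymax)` tuples and point objects stored as `(x, y)`; the function names are illustrative only.

```python
from math import hypot

def mindist_mbr(a, b):
    """Minimum Euclidean distance between two axis-aligned rectangles.

    Assumption: each rectangle is (xmin, ymin, xmax, ymax).
    The result is 0.0 when the rectangles touch or overlap.
    """
    dx = max(0.0, a[0] - b[2], b[0] - a[2])  # gap along the x axis
    dy = max(0.0, a[1] - b[3], b[1] - a[3])  # gap along the y axis
    return hypot(dx, dy)

def mindist_point(o, e):
    """Minimum distance between a point o = (x, y) and an MBR e."""
    # A point is treated as a degenerate rectangle.
    return mindist_mbr((o[0], o[1], o[0], o[1]), e)
```

Because MINDIST never exceeds the distance between any two objects enclosed by the entries, a pair of entries with MINDIST greater than 𝜖 cannot contribute a qualifying object pair and can be pruned.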
References
Arge L, Procopiuc O, Ramaswamy S, Suel T, Vitter JS (1998) Scalable sweeping-based spatial join. In: VLDB’98, Proceedings of 24th International Conference on Very Large Data Bases, New York City, pp 570–581
Belussi A, Faloutsos C (1995) Estimating the selectivity of spatial queries using the ‘correlation’ fractal dimension. In: VLDB’95, Proceedings of 21st International Conference on Very Large Data Bases, Zurich, pp 299–310
Brinkhoff T, Kriegel HP, Seeger B (1993) Efficient processing of spatial joins using R-trees. In: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, pp 237–246
Chakrabarti K, Chaudhuri S, Ganti V (2011) Interval-based pruning for top-k processing over compressed lists. In: Proceedings of the 27th International Conference on Data Engineering, ICDE 2011, Hannover, pp 709–720
Chan EPF (2003) Buffer queries. IEEE TKDE 15(4):895–910
Corral A, Manolopoulos Y, Theodoridis Y, Vassilakopoulos M (2000) Closest pair queries in spatial databases. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, pp 189–200
Doulkeridis C, Vlachou A, Kotidis Y, Polyzotis N (2012) Processing of rank joins in highly distributed systems. In: IEEE 28th International Conference on Data Engineering (ICDE 2012), Washington, pp 606–617
Fagin R, Lotem A, Naor M (2001) Optimal aggregation algorithms for middleware. In: Proceedings of the Twentieth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Santa Barbara, pp 102–113
Faloutsos C, Seeger B, Traina A, Traina C Jr (2000) Spatial join selectivity using power laws. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, pp 177–188
Finger J, Polyzotis N (2009) Robust and efficient algorithms for rank join evaluation. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2009, Providence, pp 415–428
Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: SIGMOD’84, Proceedings of Annual Meeting, Boston, pp 47–57
Hjaltason GR, Samet H (1998) Incremental distance join algorithms for spatial databases. In: SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, Seattle, pp 237–248
Hu H, Li G, Bao Z, Feng J, Wu Y, Gong Z, Xu Y (2016) Top-k spatio-textual similarity join. IEEE TKDE 28(2):551–565
Ilyas IF, Aref WG, Elmagarmid AK (2003) Supporting top-k join queries in relational databases. In: VLDB 2003, Proceedings of 29th International Conference on Very Large Data Bases, Berlin, pp 754–765
Ilyas IF, Shah R, Aref WG, Vitter JS, Elmagarmid AK (2004) Rank-aware query optimization. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Paris, pp 203–214
Jacox EH, Samet H (2007) Spatial join techniques. ACM Trans Database Syst 32(1):7
Kiefer J (1953) Sequential minimax search for a maximum. Proc Am Math Soc 4(3):502–506
Kim Y, Shim K (2012) Parallel top-k similarity join algorithms using MapReduce. In: IEEE 28th International Conference on Data Engineering (ICDE 2012), Washington, pp 510–521
Koudas N, Muthukrishnan S, Srivastava D (2000) Optimal histograms for hierarchical range queries. In: Proceedings of the Nineteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Dallas, pp 196–204
Li C, Chang KCC, Ilyas IF, Song S (2005) RankSQL: Query algebra and optimization for relational top-k queries. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Baltimore, pp 131–142
Ljosa V, Singh AK (2008) Top-k spatial joins of probabilistic objects. In: Proceedings of the 24th International Conference on Data Engineering, ICDE 2008, Cancún, pp 566–575
Mamoulis N, Papadias D (2001) Multiway spatial joins. ACM Trans Database Syst 26(4):424–475
Mamoulis N, Yiu ML, Cheng KH, Cheung DW (2007) Efficient top-k aggregation of ranked inputs. ACM TODS 32(3):19–63
Martinenghi D, Tagliasacchi M (2010) Proximity rank join. PVLDB 3(1):352–363
Natsev A, Chang YC, Smith JR, Li CS, Vitter JS (2001) Supporting incremental join queries on ranked inputs. In: VLDB 2001, Proceedings of 27th International Conference on Very Large Data Bases, Roma, pp 281–290
Nobari S, Tauheed F, Heinis T, Karras P, Bressan S, Ailamaki A (2013) TOUCH: in-memory spatial join by hierarchical data-oriented partitioning. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, pp 701–712
Ntarmos N, Patlakas I, Triantafillou P (2014) Rank join queries in NoSQL databases. PVLDB 7(7):493–504
Papadias D, Kalnis P, Zhang J, Tao Y (2001) Efficient OLAP operations in spatial data warehouses. In: Advances in spatial and temporal databases, 7th international symposium, SSTD 2001, Redondo Beach, Proceedings, pp 443–459
Patel JM, DeWitt DJ (1996) Partition based spatial-merge join. In: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, pp 259–270
Petersen SB, Neves-Petersen MT, Henriksen SB, Mortensen RJ, Geertz-Hansen HM (2012) Scale-free behaviour of amino acid pair interactions in folded proteins. PLoS ONE 7(7):1–14
Poosala V, Haas PJ, Ioannidis YE, Shekita EJ (1996) Improved histograms for selectivity estimation of range predicates. In: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, pp 294–305
Qi S, Bouros P, Mamoulis N (2013) Efficient top-k spatial distance joins. In: Advances in spatial and temporal databases - 13th international symposium, SSTD 2013, Munich, pp 1–18
Qian Z, Xu J, Zheng K, Zhao P, Zhou X (2018) Semantic-aware top-k spatial keyword queries. World Wide Web 21(3):573–594
Ray S, Simion B, Brown AD, Johnson R (2014) Skew-resistant parallel in-memory spatial join. In: Conference on scientific and statistical database management, SSDBM’14, Aalborg, pp 6:1–6:12
Roussopoulos N, Kelley S, Vincent F (1995) Nearest neighbor queries. In: Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, San Jose, pp 71–79
Saouk M, Doulkeridis C, Vlachou A, Nørvåg K (2016) Efficient processing of top-k joins in MapReduce. In: 2016 IEEE International Conference on Big Data, BigData 2016, Washington, pp 570–577
Schnaitter K, Polyzotis N (2010) Optimal algorithms for evaluating rank joins in database systems. ACM TODS 35(1):6:1–6:47
Schnaitter K, Spiegel J, Polyzotis N (2007) Depth estimation for ranking query optimization. In: Proceedings of the 33rd International Conference on Very Large Data Bases. University of Vienna, Austria, pp 902–913
Shin H, Moon B, Lee S (2000) Adaptive multi-stage distance join processing. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, pp 343–354
Smith AJ (1978) Sequentiality and prefetching in database systems. ACM TODS 3(3):223–247
Wu M, Berti-Équille L, Marian A, Procopiuc CM, Srivastava D (2010) Processing top-k join queries. PVLDB 3(1-2):860–870
Xiao C, Wang W, Lin X, Shang H (2009) Top-k set similarity joins. In: Proceedings of the 2009 IEEE International Conference on Data Engineering, pp 916–927
Xin D, Han J, Chang KC (2007) Progressive and selective merge: computing top-k with ad-hoc ranking functions. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Beijing, pp 103–114
Zhang S, Han J, Liu Z, Wang K, Xu Z (2009) SJMR: parallelizing spatial join with MapReduce on clusters. In: Proceedings of the 2009 IEEE International Conference on Cluster Computing, New Orleans, pp 1–8
Zhao K, Zhou S, Tan KL, Zhou A (2005) Supporting ranked join in peer-to-peer networks. In: 16th International Workshop on Database and Expert Systems Applications (DEXA’05), pp 796–800
Zhu M, Papadias D, Lee DL, Zhang J (2005) Top-k spatial joins. IEEE TKDE 17(4):567–579
Cite this article
Qi, S., Bouros, P. & Mamoulis, N. Top-k spatial distance joins. Geoinformatica 24, 591–631 (2020). https://doi.org/10.1007/s10707-020-00393-z
DOI: https://doi.org/10.1007/s10707-020-00393-z