Abstract
We consider the join operation in metric spaces. Given two sets A and B of objects drawn from some universe \(\mathbb U\), we want to compute the set \(A \Join B = \{(a,b) \in A \times B\;|\;d(a,b) \leq r\}\) efficiently, where \(d : \mathbb U \times \mathbb U \to \mathbb R^+\) is a metric distance function and \(r \in \mathbb R^+\) is a user-supplied query radius. In particular, we are interested in the case where no index is available for either A or B, and we cannot afford to build one. In this paper we improve the Quickjoin algorithm (Jacox and Samet, 2008), which is based on the well-known Quicksort algorithm, by (i) replacing the low-level component that handles small subsets, essentially a brute-force nested loop, with a more efficient method; (ii) showing that, in contrast to Quicksort, unbalanced partitioning can improve Quickjoin; and (iii) making the algorithm probabilistic while still obtaining most of the relevant results. We also show how to use Quickjoin to compute k-nearest neighbor joins. The experimental results show that the method works well in practice.
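To make the join definition concrete, the following minimal sketch computes \(A \Join B\) with a plain nested loop, the brute-force baseline that the abstract says Quickjoin falls back to for small subsets. It is an illustration only, not the authors' implementation: the function name `nested_loop_join`, the sample points, and the choice of Euclidean distance as the metric \(d\) are all assumptions made for this example.

```python
from math import dist  # Euclidean distance; an assumed example metric, any metric d works


def nested_loop_join(A, B, r, d=dist):
    """Return every pair (a, b) in A x B with d(a, b) <= r.

    Hypothetical helper for illustration: it evaluates the metric
    |A| * |B| times and uses no index and no pruning.
    """
    return [(a, b) for a in A for b in B if d(a, b) <= r]


# Small example with 2-D points (sample data invented for this sketch).
A = [(0.0, 0.0), (5.0, 5.0)]
B = [(0.0, 1.0), (4.0, 5.0), (9.0, 9.0)]
result = nested_loop_join(A, B, r=1.0)
print(result)
```

The quadratic number of distance evaluations in this baseline is the cost that partitioning-based methods such as Quickjoin aim to reduce.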
References
Chávez, E., Navarro, G., Baeza-Yates, R., Marroquin, J.: Searching in metric spaces. ACM Computing Surveys 33(3), 273–321 (2001)
Hjaltason, G., Samet, H.: Index-driven similarity search in metric spaces. ACM Transactions on Database Systems 28(4), 517–580 (2003)
Samet, H.: Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann Publishers Inc., San Francisco (2005)
Hoare, C.A.R.: Quicksort. Comput. J. 5(1), 10–15 (1962)
Jacox, E.H., Samet, H.: Metric space similarity joins. ACM Trans. Database Syst. 33(2) (2008)
Vidal, E.: An algorithm for finding nearest neighbors in (approximately) constant average time. Pattern Recognition Letters 4, 145–157 (1986)
Chávez, E., Navarro, G.: A compact space decomposition for effective metric indexing. Pattern Recognition Letters 26(9), 1363–1376 (2005)
Dohnal, V., Gennaro, C., Zezula, P.: Similarity join in metric spaces using eD-index. In: Mařík, V., Štěpánková, O., Retschitzegger, W. (eds.) DEXA 2003. LNCS, vol. 2736, pp. 484–493. Springer, Heidelberg (2003)
Dohnal, V., Gennaro, C., Savino, P., Zezula, P.: D-index: Distance searching index for metric data sets. Multimedia Tools Appl. 21(1), 9–33 (2003)
Paredes, R., Reyes, N.: Solving similarity joins and range queries in metric spaces with the list of twin clusters. Journal of Discrete Algorithms (JDA) 7, 18–35 (2009)
Fredriksson, K.: Exploiting distance coherence to speed up range queries in metric indexes. Information Processing Letters 95(1), 287–292 (2005)
Uhlmann, J.: Satisfying general proximity/similarity queries with metric trees. Information Processing Letters, 175–179 (1991)
Chávez, E., Navarro, G.: Probabilistic proximity search: Fighting the curse of dimensionality in metric spaces. Information Processing Letters 85, 39–46 (2003)
Fredriksson, K.: Engineering efficient metric indexes. Pattern Recognition Letters (PRL) 28(1), 75–84 (2007)
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fredriksson, K., Braithwaite, B. (2013). Quicker Similarity Joins in Metric Spaces. In: Brisaboa, N., Pedreira, O., Zezula, P. (eds) Similarity Search and Applications. SISAP 2013. Lecture Notes in Computer Science, vol 8199. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41062-8_13
DOI: https://doi.org/10.1007/978-3-642-41062-8_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41061-1
Online ISBN: 978-3-642-41062-8
eBook Packages: Computer Science, Computer Science (R0)