Quicker Similarity Joins in Metric Spaces

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8199)

Abstract

We consider the join operation in metric spaces. Given two sets A and B of objects drawn from some universe \(\mathbb U\), we want to compute the set \(A \Join B = \{(a,b) \in A \times B\;|\;d(a,b) \leq r\}\) efficiently, where \(d : \mathbb U \times \mathbb U \to \mathbb R^+\) is a metric distance function and \(r \in \mathbb R^+\) is a user-supplied query radius. In particular, we are interested in the case where no index is available for either A or B and we cannot afford to build one. In this paper we improve the Quickjoin algorithm (Jacox and Samet, 2008), which is based on the well-known Quicksort algorithm, by (i) replacing the low-level component that handles small subsets, essentially a brute-force nested loop, with a more efficient method; (ii) showing that, contrary to Quicksort, unbalanced partitioning can improve Quickjoin; and (iii) making the algorithm probabilistic while still obtaining most of the relevant results. We also show how to use Quickjoin to compute k-nearest neighbor joins. The experimental results show that the method works well in practice.
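To make the join definition concrete, the following is a minimal sketch of the brute-force nested-loop baseline that computes \(A \Join B\) directly from the definition. It is not the Quickjoin algorithm or the improved method of the paper; the function name `nested_loop_join` and the example metric `abs_diff` are illustrative assumptions.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def nested_loop_join(
    A: Iterable[T],
    B: Iterable[T],
    d: Callable[[T, T], float],
    r: float,
) -> List[Tuple[T, T]]:
    """Return every pair (a, b) in A x B with d(a, b) <= r.

    Baseline only: it evaluates the distance for all |A| * |B| pairs
    and uses no index and no pruning.
    """
    b_list = list(B)  # materialize B so it can be scanned once per a
    return [(a, b) for a in A for b in b_list if d(a, b) <= r]


def abs_diff(x: float, y: float) -> float:
    """Hypothetical example metric: absolute difference on the real line."""
    return abs(x - y)


A = [0.0, 1.0, 5.0]
B = [0.5, 4.0, 9.0]
pairs = nested_loop_join(A, B, abs_diff, 1.0)
print(pairs)
```

With the radius set to 1.0, the pair (5.0, 4.0) is included because the definition uses \(d(a,b) \leq r\), not a strict inequality. The quadratic number of distance evaluations in this baseline is the cost that partitioning-based methods such as Quickjoin aim to reduce.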

References

  1. Chávez, E., Navarro, G., Baeza-Yates, R., Marroquin, J.: Searching in metric spaces. ACM Computing Surveys 33(3), 273–321 (2001)

  2. Hjaltason, G., Samet, H.: Index-driven similarity search in metric spaces. ACM Transactions on Database Systems 28(4), 517–580 (2003)

  3. Samet, H.: Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann Publishers Inc., San Francisco (2005)

  4. Hoare, C.A.R.: Quicksort. Comput. J. 5(1), 10–15 (1962)

  5. Jacox, E.H., Samet, H.: Metric space similarity joins. ACM Trans. Database Syst. 33(2) (2008)

  6. Vidal, E.: An algorithm for finding nearest neighbors in (approximately) constant average time. Pattern Recognition Letters 4, 145–157 (1986)

  7. Chávez, E., Navarro, G.: A compact space decomposition for effective metric indexing. Pattern Recognition Letters 26(9), 1363–1376 (2005)

  8. Dohnal, V., Gennaro, C., Zezula, P.: Similarity join in metric spaces using eD-index. In: Mařík, V., Štěpánková, O., Retschitzegger, W. (eds.) DEXA 2003. LNCS, vol. 2736, pp. 484–493. Springer, Heidelberg (2003)

  9. Dohnal, V., Gennaro, C., Savino, P., Zezula, P.: D-index: Distance searching index for metric data sets. Multimedia Tools Appl. 21(1), 9–33 (2003)

  10. Paredes, R., Reyes, N.: Solving similarity joins and range queries in metric spaces with the list of twin clusters. Journal of Discrete Algorithms (JDA) 7, 18–35 (2009)

  11. Fredriksson, K.: Exploiting distance coherence to speed up range queries in metric indexes. Information Processing Letters 95(1), 287–292 (2005)

  12. Uhlmann, J.: Satisfying general proximity/similarity queries with metric trees. Information Processing Letters, 175–179 (1991)

  13. Chávez, E., Navarro, G.: Probabilistic proximity search: Fighting the curse of dimensionality in metric spaces. Information Processing Letters 85, 39–46 (2003)

  14. Fredriksson, K.: Engineering efficient metric indexes. Pattern Recognition Letters (PRL) 28(1), 75–84 (2007)

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fredriksson, K., Braithwaite, B. (2013). Quicker Similarity Joins in Metric Spaces. In: Brisaboa, N., Pedreira, O., Zezula, P. (eds) Similarity Search and Applications. SISAP 2013. Lecture Notes in Computer Science, vol 8199. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41062-8_13

  • DOI: https://doi.org/10.1007/978-3-642-41062-8_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41061-1

  • Online ISBN: 978-3-642-41062-8

  • eBook Packages: Computer Science, Computer Science (R0)
