Efficient \(k\)-closest pair queries in general metric spaces

  • Regular Paper
The VLDB Journal

Abstract

Given two object sets \(P\) and \(Q\), a \(k\)-closest pair \((k\hbox {CP})\) query finds the \(k\) closest object pairs from \(P\times Q\). This operation is common in many real-life applications such as GIS, data mining, and recommender systems. Although it has received much attention in Euclidean space, there is little prior work on metric spaces. In this paper, we study the problem of \(k\hbox {CP}\) query processing in general metric spaces, namely Metric \(k\hbox {CP}\) \((\hbox {M}k\hbox {CP})\) search, and propose several efficient algorithms using dynamic disk-based metric indexes (e.g., the M-tree), which can be applied to arbitrary types of data as long as a metric distance is defined and satisfies the triangle inequality. Our approaches follow depth-first and/or best-first traversal paradigms, employ effective pruning rules based on metric space properties and the counting information preserved in the metric index, and take advantage of aggressive pruning and compensation to further boost query efficiency. We also derive a node-based cost model for \(\hbox {M}k\hbox {CP}\) retrieval. In addition, we extend our techniques to tackle two interesting variants of \(\hbox {M}k\hbox {CP}\) queries. Extensive experiments with both real and synthetic data sets demonstrate the performance of our proposed algorithms, the effectiveness of our developed pruning rules, and the accuracy of our presented cost model.
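
To make the query definition concrete, the following Python sketch computes the \(k\) closest pairs from \(P\times Q\) under an arbitrary metric, first by exhaustive enumeration and then with a simple filter based on the triangle inequality. It is an illustration only and not one of the algorithms proposed in the paper: it uses no M-tree or other disk-based index, and the function names (`edit_distance`, `kcp_brute_force`, `kcp_pivot_filter`) and the single-pivot filter are assumptions introduced here. The filter relies on the fact that, for any pivot object \(v\), \(|d(p,v)-d(q,v)|\le d(p,q)\), so a pair whose lower bound already reaches the current \(k\)-th best distance need not be evaluated.

```python
import heapq
from itertools import product


def edit_distance(a, b):
    """Levenshtein distance between two strings (a metric on strings)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def kcp_brute_force(P, Q, k, dist):
    """Reference answer: evaluate dist on every pair of P x Q."""
    pairs = ((dist(p, q), p, q) for p, q in product(P, Q))
    return heapq.nsmallest(k, pairs, key=lambda t: t[0])


def kcp_pivot_filter(P, Q, k, dist, pivot):
    """Same answer distances, but skip pairs ruled out by the triangle inequality.

    Hypothetical single-pivot filter (not the paper's index-based method).
    Returns (sorted result list, number of pair distances computed).
    """
    if k <= 0:
        return [], 0
    dp = [dist(p, pivot) for p in P]   # distances from P objects to the pivot
    dq = [dist(q, pivot) for q in Q]   # distances from Q objects to the pivot
    heap = []                          # max-heap of the best k, stored as (-d, i, j)
    computed = 0
    for i, p in enumerate(P):
        for j, q in enumerate(Q):
            # |d(p,v) - d(q,v)| is a lower bound on d(p,q)
            if len(heap) == k and abs(dp[i] - dq[j]) >= -heap[0][0]:
                continue               # cannot beat the current k-th best pair
            d = dist(p, q)
            computed += 1
            if len(heap) < k:
                heapq.heappush(heap, (-d, i, j))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, i, j))
    result = sorted((-nd, P[i], Q[j]) for nd, i, j in heap)
    return result, computed
```

For example, with `dist = lambda x, y: abs(x - y)`, `P = [1, 10, 20]`, `Q = [2, 12, 30]`, `k = 2`, and pivot `0`, both functions report the pairs `(1, 2)` and `(10, 12)`, while the filtered version evaluates fewer than the nine possible pair distances. The same functions accept `edit_distance` on strings, since only the metric properties are used.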

Notes

  1. Available at http://www.census.gov/geo/www/tiger/.

  2. Available at http://www.dbs.informatik.uni-muenchen.de/~seidl.

  3. Available at http://www.sisap.org/metric_space_library.html.

References

  1. Achtert, E., Kriegel, H.P., Kroger, P., Renz, M., Zufle, A.: Reverse \(k\)-nearest neighbor search in dynamic and general metric databases. In: EDBT, pp. 886–897 (2009)

  2. Alvarez, M., Pan, A., Raposo, J., Bellas, F., Cacheda, F.: Using clustering and edit distance techniques for automatic web data extraction. In: WISE, pp. 212–224 (2007)

  3. Angiulli, F., Pizzuti, C.: An approximate algorithm for top-\(k\) closest pairs join query in large high dimensional data. Data Knowl. Eng. 53(3), 263–281 (2005)

  4. Arumugam, S., Jermaine, C.: Closest-point-of-approach join for moving object histories. In: ICDE, pp. 86–95 (2006)

  5. Bohm, C.: A cost model for query processing in high dimensional data spaces. ACM Trans. Database Syst. 25(2), 129–178 (2000)

  6. Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: identifying density-based local outliers. In: SIGMOD, pp. 93–104 (2000)

  7. Bustos, B., Navarro, G., Chavez, E.: Pivot selection techniques for proximity searching in metric spaces. Pattern Recognit. Lett. 24(14), 2357–2366 (2003)

  8. Chavez, E., Navarro, G., Baeza-Yates, R., Marroquin, J.L.: Searching in metric spaces. ACM Comput. Surv. 33(3), 273–321 (2001)

  9. Cheema, M.A., Lin, X., Wang, H., Wang, J., Zhang, W.: A unified approach for computing top-\(k\) pairs in multi-dimensional space. In: ICDE, pp. 1031–1042 (2011)

  10. Chen, C., Sun, W., Zheng, B., Mao, D., Liu, W.: An incremental approach to closest pair queries in spatial networks using best-first search. In: DEXA, pp. 136–143 (2011)

  11. Chen, L., Lian, X.: Efficient processing of metric skyline queries. IEEE Trans. Knowl. Data Eng. 21(3), 351–365 (2009)

  12. Ciaccia, P., Nanni, A., Patella, M.: A query-sensitive cost model for similarity queries with M-tree. In: ADC, pp. 65–76 (1999)

  13. Ciaccia, P., Patella, M.: PAC nearest neighbor queries: approximate and controlled search in high dimensional and metric spaces. In: ICDE, pp. 244–255 (2000)

  14. Ciaccia, P., Patella, M., Zezula, P.: M-tree: an efficient access method for similarity search in metric spaces. In: VLDB, pp. 426–435 (1997)

  15. Ciaccia, P., Patella, M., Zezula, P.: A cost model for similarity queries in metric spaces. In: PODS, pp. 59–68 (1998)

  16. Corral, A., Almendros-Jiménez, J.: A performance comparison of distance-based query algorithms using R-trees in spatial databases. Inf. Sci. 177(11), 2207–2237 (2007)

  17. Corral, A., Manolopoulos, Y., Theodoridis, Y., Vassilakopoulos, M.: Closest pair queries in spatial databases. In: SIGMOD, pp. 189–200 (2000)

  18. Corral, A., Manolopoulos, Y., Theodoridis, Y., Vassilakopoulos, M.: Algorithms for processing \(k\)-closest-pair queries in spatial databases. Data Knowl. Eng. 49(1), 67–104 (2004)

  19. Corral, A., Manolopoulos, Y., Theodoridis, Y., Vassilakopoulos, M.: Cost models for distance joins queries using R-trees. Data Knowl. Eng. 57(1), 1–36 (2006)

  20. Corral, A., Vassilakopoulos, M.: On approximate algorithms for distance-based queries using R-trees. Comput. J. 48(2), 220–238 (2005)

  21. Eppstein, D.: Fast hierarchical clustering and other applications of dynamic closest pairs. J. Exp. Algorithm. 5, article 1 (2000)

  22. Fredriksson, K., Braithwaite, B.: Quicker similarity joins in metric spaces. In: SISAP, pp. 127–140 (2013)

  23. Fuhry, D., Jin, R., Zhang, D.: Efficient skyline computation in metric space. In: EDBT, pp. 1042–1051 (2009)

  24. Gutierrez, G., Saez, P.: The \(k\) closest pairs in spatial databases. GeoInformatica 17(4), 543–565 (2013)

  25. Hjaltason, G.R., Samet, H.: Incremental distance join algorithms for spatial databases. In: SIGMOD, pp. 237–248 (1998)

  26. Hjaltason, G.R., Samet, H.: Index-driven similarity search in metric spaces. ACM Trans. Database Syst. 28(4), 517–580 (2003)

  27. Jacox, E.H., Samet, H.: Metric space similarity joins. ACM Trans. Database Syst. 33(2), article 7 (2008)

  28. Kim, Y.J., Patel, J.M.: Performance comparison of the R*-tree and the quadtree for \(k\)nn and distance join queries. IEEE Trans. Knowl. Data Eng. 22(7), 1014–1027 (2010)

  29. Kurasawa, H., Takasu, A., Adachi, J.: Finding the \(k\)-closest pairs in metric spaces. In: NTSS, pp. 8–13 (2011)

  30. Nanopoulos, A., Theodoridis, Y., Manolopoulos, Y.: C2P: Clustering based on closest pairs. In: VLDB, pp. 331–340 (2001)

  31. Papadopoulos, A.N., Nanopoulos, A., Manolopoulos, Y.: Processing distance join queries with constraints. Comput. J. 49(3), 281–296 (2006)

  32. Paredes, R., Reyes, N.: Solving similarity joins and range queries in metric spaces with the list of twin clusters. J. Discrete Algorithms 7(1), 18–35 (2009)

  33. Pearson, S.S., Silva, Y.N.: Index-based R-S similarity joins. In: SISAP, pp. 106–112 (2014)

  34. Ristad, E.S., Yianilos, P.N.: Learning string-edit distance. IEEE Trans. Pattern Anal. Mach. Intell. 20(5), 522–532 (1998)

  35. Roumelis, G., Vassilakopoulos, M., Corral, A., Manolopoulos, Y.: A new plane-sweep algorithm for the \(k\)-closest-pairs query. In: SOFSEM, pp. 478–490 (2014)

  36. Samet, H.: Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann, San Francisco (2006)

  37. Sarma, A.D., He, Y., Chaudhuri, S.: Clusterjoin: a similarity joins framework using map-reduce. PVLDB 7(12), 1059–1070 (2014)

  38. Shan, J., Zhang, D., Salzberg, B.: On spatial-range closest-pair query. In: SSTD, pp. 252–269 (2003)

  39. Shin, H., Moon, B., Lee, S.: Adaptive multi-stage distance join processing. In: SIGMOD, pp. 343–354 (2000)

  40. Shin, H., Moon, B., Lee, S.: Adaptive and incremental processing for distance join queries. IEEE Trans. Knowl. Data Eng. 15(6), 1561–1578 (2003)

  41. Silva, Y.N., Pearson, S.: Exploiting database similarity joins for metric spaces. In: VLDB, pp. 1922–1925 (2012)

  42. Silva, Y.N., Pearson, S., Cheney, J.A.: Database similarity join for metric spaces. In: SISAP, pp. 266–279 (2013)

  43. Skopal, T., Lokoc, J.: Answering metric skyline queries by PM-tree. In: DATESO, pp. 22–37 (2010)

  44. Skopal, T., Pokorny, J., Snasel, V.: PM-tree: pivoting metric tree for similarity search in multimedia databases. In: ADBIS, pp. 803–815 (2004)

  45. Tao, Y., Yi, K., Sheng, C., Kalnis, P.: Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. ACM Trans. Database Syst. 35(3), article 20 (2010)

  46. Tao, Y., Yiu, M.L., Mamoulis, N.: Reverse nearest neighbor search in metric spaces. IEEE Trans. Knowl. Data Eng. 18(9), 1239–1252 (2006)

  47. Tao, Y., Zhang, J., Papadias, D., Mamoulis, N.: An efficient cost model for optimization of nearest neighbor search in low and medium dimensional spaces. IEEE Trans. Knowl. Data Eng. 16(10), 1169–1184 (2004)

  48. U, L.H., Mamoulis, N., Yiu, M.L.: Computation and monitoring of exclusive closest pairs. IEEE Trans. Knowl. Data Eng. 20(12), 1641–1654 (2008)

  49. Vlachou, A., Doulkeridis, C., Kotidis, Y.: Metric-based similarity search in unstructured peer-to-peer systems. Trans. Large Scale Data Knowl. Cent. Syst. 7100, 28–48 (2012)

  50. Wang, Y., Metwally, A., Parthasarathy, S.: Scalable all-pairs similarity search in metric spaces. In: KDD, pp. 829–837 (2013)

  51. Xiao, C., Wang, W., Lin, X., Shang, H.: Top-\(k\) set similarity joins. In: ICDE, pp. 916–927 (2009)

  52. Yang, C., Lin, K.I.: An index structure for improving closest pairs and related join queries in spatial databases. In: IDEAS, pp. 140–149 (2002)

  53. Zezula, P., Savino, P., Amato, G., Rabitti, F.: Approximate similarity retrieval with M-trees. VLDB J. 7(4), 275–293 (1998)

  54. Zhou, P., Zhang, D., Salzberg, B., Cooperman, G., Kollios, G.: Close pair queries in moving object databases. In: GIS, pp. 2–11 (2005)

Acknowledgments

Yunjun Gao was supported in part by the National Key Basic Research and Development Program (i.e., 973 Program) No. 2015CB352502, NSFC Grant No. 61379033, the Cyber Innovation Joint Research Center of Zhejiang University, and the Key Project of Zhejiang University Excellent Young Teacher Fund (Zijin Plan). We would like to thank Prof. A. Corral and Prof. T. Skopal for their useful feedback on the source code of their proposed algorithms in [18, 44]. We would also like to thank the anonymous reviewers for their valuable and helpful comments, which improved the technical quality and presentation of this paper.

Author information

Corresponding author

Correspondence to Yunjun Gao.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 189 KB)

About this article

Cite this article

Gao, Y., Chen, L., Li, X. et al. Efficient \(k\)-closest pair queries in general metric spaces. The VLDB Journal 24, 415–439 (2015). https://doi.org/10.1007/s00778-015-0383-4

  • DOI: https://doi.org/10.1007/s00778-015-0383-4
