Reference-based indexing for metric spaces with costly distance measures

  • Regular Paper
The VLDB Journal

Abstract

We consider the problem of similarity search in databases with costly metric distance measures. Given limited main memory, our goal is to develop a reference-based index that reduces the number of comparisons needed to answer a query. The idea in reference-based indexing is to select a small set of reference objects that serve as a surrogate for the other objects in the database. We consider novel strategies for selecting references and assigning them to database objects. For dynamic databases with frequent updates, we propose two incremental versions of the selection algorithm. Our experimental results show that our selection and assignment methods far outperform competing methods.
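
To make the surrogate idea concrete, the following is a minimal sketch of generic reference-based (pivot) filtering for a range query. It is not the selection or assignment method proposed in the article: the references are simply passed in by the caller, and every object is compared against every reference. The names `edit_distance`, `build_index`, and `range_query` are hypothetical, and edit distance is used only as a stand-in for a costly metric. The sketch relies on the triangle inequality: for a reference r, |d(q, r) - d(o, r)| is a lower bound and d(q, r) + d(o, r) is an upper bound on d(q, o).

```python
from typing import Callable, List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: a metric on strings, used here as the 'costly' measure."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute or match
        prev = curr
    return prev[-1]


def build_index(db: Sequence[str], refs: Sequence[str],
                dist: Callable[[str, str], int]) -> List[List[int]]:
    """Precompute the distance from every database object to every reference."""
    return [[dist(obj, r) for r in refs] for obj in db]


def range_query(q: str, eps: int, db: Sequence[str], refs: Sequence[str],
                table: List[List[int]],
                dist: Callable[[str, str], int]) -> Tuple[List[str], int]:
    """Return objects within eps of q and the number of distance calls made.

    Assumes refs is non-empty and dist satisfies the triangle inequality.
    """
    q_to_ref = [dist(q, r) for r in refs]
    computed = len(refs)  # query-to-reference distances are always paid
    results = []
    for obj, row in zip(db, table):
        lower = max(abs(dq - do) for dq, do in zip(q_to_ref, row))
        upper = min(dq + do for dq, do in zip(q_to_ref, row))
        if lower > eps:
            continue                # pruned: cannot be within eps
        if upper <= eps:
            results.append(obj)     # accepted without computing d(q, obj)
            continue
        computed += 1               # bounds are inconclusive: pay for the real distance
        if dist(q, obj) <= eps:
            results.append(obj)
    return results, computed
```

The second return value counts how many times the distance function was evaluated at query time, which is the cost that a reference-based index tries to reduce. How well the bounds prune depends heavily on which references are chosen, which is the question the article addresses.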

Author information

Corresponding author

Correspondence to Jayendra Venkateswaran.

Additional information

This work is partially supported by the National Science Foundation under Grant No. 0347408.

About this article

Cite this article

Venkateswaran, J., Kahveci, T., Jermaine, C. et al. Reference-based indexing for metric spaces with costly distance measures. The VLDB Journal 17, 1231–1251 (2008). https://doi.org/10.1007/s00778-007-0062-1

  • DOI: https://doi.org/10.1007/s00778-007-0062-1
