On Index-Free Similarity Search in Metric Spaces

  • Conference paper
Database and Expert Systems Applications (DEXA 2009)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 5690))

Abstract

Metric access methods (MAMs) serve as a tool for speeding up similarity queries. However, all MAMs developed so far are index-based; they need to build an index on a given database. The indexing itself is either static (the whole database is indexed at once) or dynamic (insertions and deletions are supported), but a preprocessing step is always needed. In this paper, we propose the D-file, the first MAM that requires no indexing at all. This feature is especially beneficial in domains such as data mining and streaming databases, where data is produced much more intensively than it is queried, so that indexing becomes the bottleneck of the entire production/querying scheme. The idea of the D-file is to extend the trivial sequential file (in effect, an abstraction over the original database) with a so-called D-cache. The D-cache is a main-memory structure that keeps track of the distance computations spent on processing all similarity queries so far (within a runtime session). Based on the distances stored in the D-cache, the D-file can cheaply determine lower bounds of some distances without computing those distances explicitly, which results in faster queries. Our experimental evaluation shows that the query efficiency of the D-file is comparable to that of index-based state-of-the-art MAMs, but at zero indexing cost.
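
The lower-bounding idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `DistanceCache` and `range_query`, the dictionary-based cache, and the choice to treat earlier query objects as pivots are all assumptions made for illustration. The only property taken as given is the triangle inequality of a metric, which yields |d(q,p) - d(p,o)| <= d(q,o) for any objects q, p, o, so a cached d(p,o) plus one computed d(q,p) gives a lower bound on d(q,o) for free.

```python
from typing import Callable, Dict, Hashable, List, Optional, Tuple


class DistanceCache:
    """Hypothetical in-memory store of distances computed so far."""

    def __init__(self) -> None:
        self._store: Dict[frozenset, float] = {}

    def put(self, a: Hashable, b: Hashable, d: float) -> None:
        # The key is unordered because a metric is symmetric.
        self._store[frozenset((a, b))] = d

    def get(self, a: Hashable, b: Hashable) -> Optional[float]:
        return self._store.get(frozenset((a, b)))


def range_query(
    db: Dict[Hashable, object],
    q_id: Hashable,
    q: object,
    radius: float,
    dist: Callable[[object, object], float],
    cache: DistanceCache,
    old_queries: Dict[Hashable, object],
) -> Tuple[List[Hashable], int]:
    """Sequential scan that skips objects whose lower bound exceeds radius.

    Returns the ids of matching objects and the number of distance
    computations spent on database objects.
    """
    # Distances from the new query to earlier queries are computed
    # explicitly (an assumption of this sketch).
    to_old = {p_id: dist(q, p) for p_id, p in old_queries.items()}

    hits: List[Hashable] = []
    computed = 0
    for o_id, o in db.items():
        # Best lower bound on d(q, o) obtainable from cached distances.
        lower = 0.0
        for p_id, d_qp in to_old.items():
            d_po = cache.get(p_id, o_id)
            if d_po is not None:
                lower = max(lower, abs(d_qp - d_po))
        if lower > radius:
            continue  # o cannot qualify; no distance computation needed
        d = dist(q, o)
        computed += 1
        cache.put(q_id, o_id, d)  # remember it for later queries
        if d <= radius:
            hits.append(o_id)
    return hits, computed


# Toy usage on 1-D points with the absolute difference as the metric.
points = {"a": 0.0, "b": 1.0, "c": 10.0, "d": 11.0}
metric = lambda x, y: abs(x - y)
cache = DistanceCache()

first_hits, first_cost = range_query(points, "q1", 0.5, 1.0, metric, cache, {})
second_hits, second_cost = range_query(
    points, "q2", 0.6, 1.0, metric, cache, {"q1": 0.5}
)
print(first_hits, first_cost)
print(second_hits, second_cost)
```

In this toy run the first query has nothing cached and computes all four distances, while the second query computes one distance to the earlier query and then only two distances to database objects, because the cached distances rule out `c` and `d`. A real D-cache would also need a bounded size and a replacement policy, which this sketch leaves out.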

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Skopal, T., Bustos, B. (2009). On Index-Free Similarity Search in Metric Spaces. In: Bhowmick, S.S., Küng, J., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2009. Lecture Notes in Computer Science, vol 5690. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03573-9_44

  • DOI: https://doi.org/10.1007/978-3-642-03573-9_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03572-2

  • Online ISBN: 978-3-642-03573-9

  • eBook Packages: Computer Science, Computer Science (R0)