DOI: 10.1145/3019612.3019674
research-article

Similarity search through one-dimensional embeddings

Published: 03 April 2017

ABSTRACT

Similarity queries are often optimized with specialized data structures known as metric access methods. The use of B+trees to index high-dimensional data for range and nearest-neighbor search in metric spaces has recently been proposed. This work introduces a new access method called GroupSim, together with query algorithms for indexing and retrieving complex data by similarity. GroupSim employs a single B+tree to dynamically index data elements with respect to a set of one-dimensional embeddings. Our strategy uses a new scheme to store distance information, which makes it possible to determine directly whether each element lies in the intersection of the embeddings. We compare GroupSim with two related methods, iDistance and OmniB-Forest, and show empirically that the new access method outperforms them in the time required to run similarity queries.
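The general idea of answering a range query through one-dimensional embeddings held in a single ordered structure can be illustrated with a minimal sketch. It is not the GroupSim algorithm or its storage scheme, which the abstract does not detail. The sketch assumes that each embedding maps an element to its distance from a reference element (a pivot), that the distance is a metric, and that a sorted list searched with `bisect` stands in for the B+tree. The names `PivotIndex`, `range_query`, `euclidean`, the stride-based key layout, and the explicit set intersection are all hypothetical choices made for illustration.

```python
import bisect
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PivotIndex:
    """Toy index: one sorted list of (key, element_id) pairs stands in for a B+tree.

    Assumes at least one data element and one pivot, and a metric `dist`.
    """

    EPS = 1e-9  # slack so floating-point rounding never drops a true answer

    def __init__(self, data, pivots, dist=euclidean):
        self.data = list(data)
        self.pivots = list(pivots)
        self.dist = dist
        # Largest stored distance; the stride keeps each pivot's keys in a disjoint range.
        self.max_d = max(dist(x, p) for x in self.data for p in self.pivots)
        self.stride = self.max_d + 1.0
        # One-dimensional key for element x under pivot i: i * stride + d(x, pivot_i).
        self.entries = sorted(
            (i * self.stride + dist(x, p), eid)
            for eid, x in enumerate(self.data)
            for i, p in enumerate(self.pivots)
        )

    def range_query(self, q, radius):
        """Return sorted ids of all elements within `radius` of `q`."""
        candidates = None
        for i, p in enumerate(self.pivots):
            dq = self.dist(q, p)
            # Triangle inequality: d(q, x) <= r implies |d(x, p) - d(q, p)| <= r.
            lo = i * self.stride + max(dq - radius - self.EPS, 0.0)
            hi = i * self.stride + min(dq + radius + self.EPS, self.max_d)
            left = bisect.bisect_left(self.entries, (lo, -1))
            right = bisect.bisect_right(self.entries, (hi, len(self.data)))
            ids = {eid for _, eid in self.entries[left:right]}
            # Keep only elements that fall inside the interval of every pivot so far.
            candidates = ids if candidates is None else candidates & ids
            if not candidates:
                return []
        # Final verification with the real distance removes false positives.
        return sorted(e for e in candidates if self.dist(q, self.data[e]) <= radius)


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0), (10.0, 10.0)]
index = PivotIndex(points, pivots=[(0.0, 0.0), (10.0, 10.0)])
print(index.range_query((0.2, 0.1), 1.5))
```

In this sketch the intersection of the per-pivot intervals is computed with an explicit set of ids after each interval scan. The abstract indicates that GroupSim instead stores distance information so that membership in the intersection can be determined directly, which this illustration does not attempt to reproduce.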

      Published in

      SAC '17: Proceedings of the Symposium on Applied Computing
      April 2017
      2004 pages
      ISBN: 9781450344869
      DOI: 10.1145/3019612

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%
