research-article

Similarity search through one-dimensional embeddings

Authors:
Humberto Razente, Universidade Federal de Uberlândia, Brazil
Rafael L. Bernardes Lima, Universidade Federal de Uberlândia, Brazil
Maria Camila N. Barioni, Universidade Federal de Uberlândia, Brazil

SAC '17: Proceedings of the Symposium on Applied Computing, April 2017, Pages 874–879, https://doi.org/10.1145/3019612.3019674

Published: 03 April 2017

ABSTRACT

The optimization of similarity queries is often done with specialized data structures known as metric access methods. The use of B+trees has recently been proposed to index high-dimensional data for range and nearest neighbor search in metric spaces. This work introduces GroupSim, a new access method, together with query algorithms for indexing and retrieving complex data by similarity. It employs a single B+tree to dynamically index data elements with regard to a set of one-dimensional embeddings. Our strategy uses a new scheme to store distance information, which makes it possible to determine directly whether each element lies in the intersection of the embeddings. We compare GroupSim with two related methods, iDistance and OmniB-Forest, and show empirically that the new access method outperforms them with regard to the time required to run similarity queries.
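
The general idea of searching through one-dimensional embeddings can be illustrated with a minimal sketch. This is not GroupSim itself: it is a generic pivot-based filter, assuming Euclidean distance, one sorted list per pivot standing in for a B+tree (the abstract describes a single B+tree), and hypothetical names (`PivotIndex`, `range_query`). Each element is mapped to its distance from a pivot; by the triangle inequality, any element within `radius` of the query must have a key in the interval [d(q, p) - radius, d(q, p) + radius] for every pivot p, so the candidates lie in the intersection of those intervals.

```python
import bisect
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PivotIndex:
    """Illustrative index: one sorted key list per pivot.

    Assumption: a sorted Python list stands in for a B+tree, and the
    pivots are supplied by the caller (no pivot selection is done here).
    """

    def __init__(self, data, pivots, dist=euclidean):
        self.data = list(data)
        self.pivots = list(pivots)
        self.dist = dist
        # One-dimensional embedding per pivot: key = distance to the pivot.
        self.embeddings = [
            sorted((dist(x, p), i) for i, x in enumerate(self.data))
            for p in self.pivots
        ]

    def range_query(self, q, radius):
        """Return the ids of all elements within `radius` of `q`."""
        candidates = set(range(len(self.data)))
        for p, keys in zip(self.pivots, self.embeddings):
            dq = self.dist(q, p)
            # Triangle inequality: |d(x, p) - d(q, p)| <= d(x, q) <= radius.
            lo = bisect.bisect_left(keys, (dq - radius, -1))
            hi = bisect.bisect_right(keys, (dq + radius, len(self.data)))
            # Keep only elements that fall in every pivot's interval.
            candidates &= {i for _, i in keys[lo:hi]}
        # The filter can admit false positives, so verify each candidate.
        return sorted(i for i in candidates
                      if self.dist(self.data[i], q) <= radius)
```

The final verification step is what keeps the result exact: the interval intersection only prunes elements that cannot qualify, and the remaining candidates still need a real distance computation.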

Index Terms

Information systems → Data management systems → Data structures → Data access methods → Multidimensional range search

Published in
SAC '17: Proceedings of the Symposium on Applied Computing
April 2017
2004 pages
ISBN:9781450344869
DOI:10.1145/3019612
Conference Chair: Sung Y. Shin, South Dakota State University
Program Chairs: Dongwan Shin, New Mexico Tech; Maria Lencastre, University of Pernambuco, Brazil
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
metric access methods
metric indexing
nearest neighbor queries
similarity search
Conference

Acceptance Rates
Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%