ABSTRACT
Similarity queries are often optimized with specialized data structures known as metric access methods. The use of B+trees to index high-dimensional data for range and nearest-neighbor search in metric spaces has recently been proposed. This work introduces a new access method, called GroupSim, together with query algorithms for indexing and retrieving complex data by similarity. GroupSim employs a single B+tree to dynamically index data elements with regard to a set of one-dimensional embeddings. Our strategy uses a new scheme to store distance information, which makes it possible to determine directly whether each element lies in the intersection of the embeddings. We compare GroupSim with two related methods, iDistance and OmniB-Forest, and show empirically that the new access method outperforms both in the time required to run similarity queries.
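The abstract does not spell out GroupSim's storage scheme, so what follows is only a minimal sketch of the general idea it builds on, under stated assumptions. Each pivot p defines a one-dimensional embedding x -> d(x, p). A single sorted key list with composite keys (pivot index, distance, object id) stands in for the single B+tree, and a range query intersects the per-pivot key intervals before computing any real distance. The class `PivotKeyIndex`, its methods, the Euclidean metric, and the pivot choice are all hypothetical. This is plain pivot filtering via the triangle inequality, not GroupSim's method of recording intersection membership in the index entries.

```python
import bisect
import math
from typing import Callable, List, Optional, Sequence, Set, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """L2 distance; any metric satisfying the triangle inequality would do."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PivotKeyIndex:
    """Hypothetical sketch: one sorted key list standing in for a single B+tree.

    Every object x is stored once per pivot p_i under the composite key
    (i, d(x, p_i), oid), so the keys of pivot i form one 1-D embedding.
    """

    def __init__(self, pivots: Sequence[Point], dist: Metric = euclidean):
        self.pivots = list(pivots)
        self.dist = dist
        self.objects: List[Point] = []
        # Kept sorted at all times, like the leaf level of a B+tree.
        self.keys: List[Tuple[int, float, int]] = []

    def insert(self, x: Point) -> int:
        """Dynamic insertion: add one key per pivot and return the object id."""
        oid = len(self.objects)
        self.objects.append(x)
        for i, p in enumerate(self.pivots):
            bisect.insort(self.keys, (i, self.dist(x, p), oid))
        return oid

    def _scan(self, i: int, lo: float, hi: float) -> Set[int]:
        """Ids whose distance to pivot i lies in [lo, hi] (one leaf range scan)."""
        start = bisect.bisect_left(self.keys, (i, lo, -1))
        end = bisect.bisect_right(self.keys, (i, hi, len(self.objects)))
        return {oid for _, _, oid in self.keys[start:end]}

    def range_query(self, q: Point, r: float) -> List[int]:
        """Return the sorted ids of all objects x with d(x, q) <= r."""
        candidates: Optional[Set[int]] = None
        for i, p in enumerate(self.pivots):
            dq = self.dist(q, p)
            # Triangle inequality: d(x, q) <= r implies |d(x, p) - d(q, p)| <= r.
            ring = self._scan(i, dq - r, dq + r)
            # Keep only objects lying in the intersection of all pivot rings.
            candidates = ring if candidates is None else candidates & ring
        if candidates is None:  # no pivots, so nothing can be pruned
            candidates = set(range(len(self.objects)))
        # Refinement: compute real distances only for the surviving candidates.
        return sorted(oid for oid in candidates
                      if self.dist(self.objects[oid], q) <= r)


if __name__ == "__main__":
    index = PivotKeyIndex(pivots=[(0.0, 0.0), (10.0, 0.0)])
    for pt in [(1.0, 1.0), (2.0, 2.0), (8.0, 1.0), (5.0, 5.0)]:
        index.insert(pt)
    print(index.range_query((1.5, 1.5), 1.0))  # [0, 1]
```

Keeping all embeddings in one ordered key space mirrors the single-tree design the abstract describes. In this sketch the intersection of the embeddings is computed explicitly with set operations after one range scan per pivot; according to the abstract, that membership test is exactly what GroupSim's distance-storage scheme lets it decide directly.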
REFERENCES
- G. Amato, A. Esuli, and F. Falchi. A comparison of pivot selection techniques for permutation-based indexing. Information Systems, 52:176-188, 2015.
- B. Bustos, G. Navarro, and E. Chavez. Pivot selection techniques for proximity searching in metric spaces. Pattern Recognition Letters, 24(14):2357-2366, 2003.
- P. Ciaccia, M. Patella, and P. Zezula. M-tree: An efficient access method for similarity search in metric spaces. In International Conference on Very Large Data Bases (VLDB), pages 426-435, Athens, 1997.
- H. V. Jagadish, B. C. Ooi, K.-L. Tan, C. Yu, and R. Zhang. iDistance: An adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans. Database Syst., 30(2):364-397, 2005.
- M. Lichman. UCI Machine Learning Repository. Univ. California, Irvine, http://archive.ics.uci.edu/ml, 2013.
- M. L. Mico, J. Oncina, and E. Vidal. A new version of the nearest-neighbour approximating and eliminating search algorithm (AESA) with linear preprocessing time and memory requirements. Pattern Recognition Letters, 15(1):9-17, 1994.
- A. N. Papadopoulos, K. Tsichlas, A. Gounaris, and Y. Manolopoulos. Access methods. In Computing Handbook, Third Edition: Information Systems and Information Technology, pages 1-18, 2014.
- O. Pedreira and N. R. Brisaboa. Spatial selection of sparse pivots for similarity search in metric spaces. In Conference on Current Trends in Theory and Practice of Computer Science, LNCS 4362, pages 434-445, Harrachov, Czech Republic, 2007. Springer.
- H. Samet. Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann, San Francisco, 2006.
- R. Socorro, L. Mico, and J. Oncina. A fast pivot-based indexing algorithm for metric spaces. Pattern Recognition Letters, 32(11):1511-1516, 2011.
- C. Traina-Jr, R. F. Filho, A. Traina, M. R. Vieira, and C. Faloutsos. The omni-family of all-purpose access methods: A simple and effective way to make similarity search more efficient. The VLDB Journal, 16(4):483-505, 2007.
- P. Zezula, G. Amato, V. Dohnal, and M. Batko. Similarity Search: The Metric Space Approach (Advances in Database Systems). Springer, 2005.
Index Terms
- Similarity search through one-dimensional embeddings