Abstract
The entire history and, we dare say, future of similarity search is governed by the underlying notion of partition. A partition is an equivalence relation defined over the space; each element of the space is therefore contained within precisely one of the equivalence classes of the partition. All attempts to search a finite space efficiently, whether exactly or approximately, rely on some set of principles which imply that if the query is within one equivalence class, then one or more other classes either cannot, or probably do not, contain any of its solutions.
In most early research, partitions relied only on the metric postulates, and logarithmic search time could be obtained on low-dimensional spaces. In these cases, it was straightforward to identify multiple partitions, each of which gave a relatively high probability of identifying subsets of the space which could not contain solutions. Over time, the datasets being searched have become more complex, leading to higher-dimensional spaces. It is now understood that even an approximate search in a very high-dimensional space is destined to require \(\mathcal{O}(n)\) time and space.
Almost entirely missing from the research literature, however, is any analysis of exactly when this effect takes over. In this paper, we make a start on tackling this important issue. Using a quantitative approach, we aim to shed some light on the notion of the exclusion power of partitions, in an attempt to better understand their nature with respect to increasing dimensionality.
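To make the idea of exclusion concrete, the following is a minimal sketch and not the authors' code. It assumes one common binary partition, a ball partition around a single pivot with the median distance as radius, and it reads "exclusion power" as the fraction of range queries for which the triangle inequality rules out one of the two classes. The function names, the unit-hypercube data, and the choice of threshold are all illustrative assumptions.

```python
import math
import random
import statistics


def uniform_points(n, dim, rng):
    """Generate n points uniformly at random in the unit hypercube [0, 1]^dim."""
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


def ball_exclusion_power(data, queries, pivot, t):
    """Fraction of queries that can exclude one side of a ball partition.

    The partition (an assumption of this sketch) splits the data into points
    within the median distance m of the pivot and points further away.
    For a range query (q, t), the triangle inequality guarantees:
      d(q, pivot) >  m + t  -> no solution lies inside the ball
      d(q, pivot) <= m - t  -> no solution lies outside the ball
    """
    m = statistics.median(math.dist(pivot, s) for s in data)
    excluded = 0
    for q in queries:
        dq = math.dist(q, pivot)
        if dq > m + t or dq <= m - t:
            excluded += 1
    return excluded / len(queries)


if __name__ == "__main__":
    rng = random.Random(0)
    for dim in (2, 8, 32):
        data = uniform_points(500, dim, rng)
        queries = uniform_points(100, dim, rng)
        # Hypothetical threshold: the mean nearest-neighbour distance of the queries.
        t = statistics.mean(min(math.dist(q, s) for s in data) for q in queries)
        print(dim, ball_exclusion_power(data, queries, data[0], t))
```

Running the script prints one estimate per dimension. With a threshold tied to nearest-neighbour distances, as assumed here, it gives a rough way to observe how the estimate changes as the dimension grows.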
Notes
1. A nearest neighbour query can be formulated as a range query where the query threshold is not known in advance but it is set iteratively as the distance to the current k-th nearest neighbour [16].
2. All results in this article are derived using randomly generated uniformly distributed Euclidean data in different dimensions as stated. All code is available on request from the authors.
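The formulation in Note 1 can be sketched as follows. This is an illustrative linear scan under assumed names, not the authors' implementation: the threshold starts unbounded and is tightened to the distance of the current k-th nearest neighbour each time a closer object is found. In an indexed search, the same shrinking threshold would be the value used to decide which partitions can be excluded.

```python
import heapq
import math


def knn_as_shrinking_range(data, q, k):
    """Return the k nearest neighbours of q as sorted (distance, index) pairs.

    The search behaves like a range query whose threshold is unknown at the
    start and is set iteratively to the current k-th nearest distance.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    threshold = math.inf  # not known in advance
    best = []  # max-heap simulated with negated distances: (-distance, index)
    for i, s in enumerate(data):
        d = math.dist(q, s)
        if d < threshold:
            heapq.heappush(best, (-d, i))
            if len(best) > k:
                heapq.heappop(best)  # drop the current furthest candidate
            if len(best) == k:
                threshold = -best[0][0]  # distance to the current k-th neighbour
    return sorted((-neg_d, i) for neg_d, i in best)
```

For example, `knn_as_shrinking_range([(0, 0), (1, 0), (3, 0), (0.5, 0)], (0, 0), 2)` keeps the objects at indices 0 and 3, and the threshold shrinks from infinity to 1.0 and then to 0.5 during the scan.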
References
Chávez, E., Navarro, G.: A compact space decomposition for effective metric indexing. Pattern Recogn. Lett. 26(9), 1363–1376 (2005). https://doi.org/10.1016/j.patrec.2004.11.014
Chávez, E., Navarro, G., Baeza-Yates, R., Marroquín, J.L.: Searching in metric spaces. ACM Comput. Surv. 33(3), 273–321 (2001). https://doi.org/10.1145/502807.502808
Chávez, E., Navarro, G.: A compact space decomposition for effective metric indexing. Pattern Recogn. Lett. 26(9), 1363–1376 (2005). https://doi.org/10.1016/j.patrec.2004.11.014. https://linkinghub.elsevier.com/retrieve/pii/S0167865504003733
Connor, R., Cardillo, F.A., Vadicamo, L., Rabitti, F.: Hilbert exclusion: improved metric search through finite isometric embeddings. ACM Trans. Inf. Syst. (TOIS) 35(3), 17:1–17:27 (2016). https://doi.org/10.1145/3001583
Connor, R., Vadicamo, L., Rabitti, F.: High-dimensional simplexes for supermetric search. In: Beecks, C., Borutta, F., Kröger, P., Seidl, T. (eds.) SISAP 2017. LNCS, vol. 10609, pp. 96–109. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68474-1_7
Connor, R., Dearle, A., Vadicamo, L.: Investigating binary partition power in metric query. In: Proceedings of the 30th Italian Symposium on Advanced Database Systems, SEBD 2022, CEUR Workshop Proceedings, vol. 3194, pp. 415–426. Tirrenia (PI), Italy, 19–22 June 2022. http://ceur-ws.org/Vol-3194/paper49.pdf, http://CEUR-WS.org
Connor, R., Vadicamo, L., Cardillo, F.A., Rabitti, F.: Supermetric search. Inf. Syst. 80, 108–123 (2019). https://doi.org/10.1016/j.is.2018.01.002
Hetland, M.L.: Comparison-based indexing from first principles. http://arxiv.org/abs/1908.06318
Hetland, M.L.: Metrics and ambits and sprawls, oh my. In: Satoh, S., et al. (eds.) SISAP 2020. LNCS, vol. 12440, pp. 126–139. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-60936-8_10. http://arxiv.org/abs/2008.09654
Naidan, B., Boytsov, L., Nyberg, E.: Permutation search methods are efficient, yet faster search is possible. Proc. Int. Conf. Very Large Data Bases 8(12), 1618–1629 (2015)
Pestov, V., Stojmirović, A.: Indexing schemes for similarity search: an illustrated paradigm. Fund. Inform. 70(4), 367–385 (2006)
Sadit Tellez, E., Chávez, E.: The list of clusters revisited. In: Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F., Olvera López, J.A., Boyer, K.L. (eds.) MCPR 2012. LNCS, vol. 7329, pp. 187–196. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31149-9_19
Uhlmann, J.K.: Satisfying general proximity/similarity queries with metric trees. Inf. Process. Lett. 40(4), 175–179 (1991)
Weber, R., Schek, H.J., Blott, S.: A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: Proceedings International Conference on Very Large Data Bases, vol. 98, pp. 194–205 (1998)
Yianilos, P.N.: Data structures and algorithms for nearest neighbor search in general metric spaces. In: Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 1993, pp. 311–321. Society for Industrial and Applied Mathematics (1993)
Zezula, P., Amato, G., Dohnal, V., Batko, M.: Similarity Search: The Metric Space Approach, vol. 32. Springer, New York (2006). https://doi.org/10.1007/0-387-29151-2
Acknowledgements
This work was partially funded by AI4Media - A European Excellence Centre for Media, Society, and Democracy (EC, H2020 n. 951911) and by Economic & Social Research Council, ADR UK Programme ES/W010321/1.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Vadicamo, L., Dearle, A., Connor, R. (2022). On the Expected Exclusion Power of Binary Partitions for Metric Search. In: Skopal, T., Falchi, F., Lokoč, J., Sapino, M.L., Bartolini, I., Patella, M. (eds) Similarity Search and Applications. SISAP 2022. Lecture Notes in Computer Science, vol 13590. Springer, Cham. https://doi.org/10.1007/978-3-031-17849-8_9
DOI: https://doi.org/10.1007/978-3-031-17849-8_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-17848-1
Online ISBN: 978-3-031-17849-8
eBook Packages: Computer Science, Computer Science (R0)