Abstract
Processing large volumes of diverse data requires index structures that can organize the data efficiently on secondary memory. Methods based on pivot permutations have become popular because of their excellent query performance. Pivot permutations can be viewed as a recursive Voronoi tessellation with a fixed set of anchors. Their disadvantage is that they cannot adapt well to the data distribution, which leads to cells with unbalanced occupancy and unevenly filled disk buckets.
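To illustrate the link between pivot permutations and Voronoi cells, here is a minimal Python sketch. It assumes objects are Euclidean vectors; the approach itself works in general metric spaces, so `dist` can be swapped for any metric. The helper names `pivot_permutation` and `cell_id` are hypothetical. The first entry of the permutation identifies the ordinary Voronoi cell of the object, and each further entry picks the nearest remaining pivot, which subdivides that cell recursively. Because the pivots are fixed, nothing in this construction reacts to how many objects fall into each cell.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Metric = Callable[[Vector, Vector], float]

def pivot_permutation(obj: Vector, pivots: Sequence[Vector],
                      dist: Metric = math.dist) -> List[int]:
    """Return pivot indexes ordered by increasing distance from obj.

    Ties are broken by pivot index, since sorted() is stable.
    """
    return sorted(range(len(pivots)), key=lambda i: dist(obj, pivots[i]))

def cell_id(obj: Vector, pivots: Sequence[Vector], prefix_len: int,
            dist: Metric = math.dist) -> Tuple[int, ...]:
    """Permutation prefix = cell of a recursive Voronoi tessellation.

    prefix_len == 1 gives the plain Voronoi cell of the nearest pivot;
    each extra level splits that cell by the next-nearest remaining pivot.
    """
    return tuple(pivot_permutation(obj, pivots, dist)[:prefix_len])

if __name__ == "__main__":
    pivots = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    print(pivot_permutation((1.0, 8.0), pivots))  # [2, 0, 1]
    print(cell_id((1.0, 8.0), pivots, 2))         # (2, 0)
```

The object at (1, 8) is closest to pivot 2, then pivot 0, then pivot 1, so its level-2 cell is (2, 0) regardless of how crowded that cell is.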
In this paper, we address this issue and propose a novel scheme called the BM-index. It exploits weighted Voronoi partitioning, which is able to respect the data distribution. We present an algorithm to balance the data partitions and show its correctness. As a result, secondary memory is accessed efficiently, as shown by experiments executing k-nearest-neighbor queries on the real-life image collection CoPhIR.
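The effect of weights on cell occupancy can be sketched in the same way. This is a minimal Python illustration under explicit assumptions, not the BM-index balancing algorithm: it assumes additive weights (an object goes to the pivot minimizing d(o, p_i) + w_i, so raising w_i shrinks cell i), Euclidean vectors, and a naive hypothetical heuristic `balance_weights` that repeatedly raises the weight of the fullest cell by a fixed `step`. Unlike the algorithm presented in the paper, this heuristic carries no correctness guarantee, which is why it gives up after `max_iter` rounds.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def assign(objs: Sequence[Vector], pivots: Sequence[Vector],
           weights: Sequence[float]) -> List[int]:
    """Additively weighted Voronoi assignment: argmin_i d(o, p_i) + w_i.

    Ties go to the lowest pivot index (min() returns the first minimum).
    """
    return [min(range(len(pivots)),
                key=lambda i: math.dist(o, pivots[i]) + weights[i])
            for o in objs]

def balance_weights(objs: Sequence[Vector], pivots: Sequence[Vector],
                    capacity: int, step: float = 0.5,
                    max_iter: int = 1000) -> Tuple[List[float], List[int]]:
    """Hypothetical heuristic: grow the weight of the fullest cell until
    every cell holds at most `capacity` objects (e.g. one disk bucket)."""
    weights = [0.0] * len(pivots)
    for _ in range(max_iter):
        cells = assign(objs, pivots, weights)
        sizes = [cells.count(i) for i in range(len(pivots))]
        fullest = max(range(len(pivots)), key=lambda i: sizes[i])
        if sizes[fullest] <= capacity:
            return weights, sizes
        # A larger additive weight makes this pivot look farther away,
        # so objects near the border of its cell move to a neighbour.
        weights[fullest] += step
    raise RuntimeError("cells could not be balanced within max_iter rounds")

if __name__ == "__main__":
    pivots = [(0.0, 0.0), (10.0, 0.0)]
    objs = [(float(x), 0.0) for x in (0, 1, 2, 3, 4, 6, 9, 10)]
    print(assign(objs, pivots, [0.0, 0.0]).count(0))  # 5
    print(balance_weights(objs, pivots, capacity=4))  # ([2.5, 0.0], [4, 4])
```

With unweighted cells, five of the eight objects land in cell 0, exceeding the capacity of 4. After five increments of the weight of pivot 0, the object at x = 4 crosses into cell 1 and both cells fit the capacity, while the pivots themselves never move.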
Acknowledgment
The publication of this paper and the follow-up research were supported by the ERDF “CyberSecurity, CyberCrime and Critical Information Infrastructures Center of Excellence” (No. CZ.02.1.01/0.0/0.0/16_019/0000822).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Antol, M., Dohnal, V. (2019). BM-index: Balanced Metric Space Index Based on Weighted Voronoi Partitioning. In: Welzer, T., Eder, J., Podgorelec, V., Kamišalić Latifić, A. (eds) Advances in Databases and Information Systems. ADBIS 2019. Lecture Notes in Computer Science, vol. 11695. Springer, Cham. https://doi.org/10.1007/978-3-030-28730-6_21
DOI: https://doi.org/10.1007/978-3-030-28730-6_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28729-0
Online ISBN: 978-3-030-28730-6
eBook Packages: Computer Science, Computer Science (R0)