BM-index: Balanced Metric Space Index Based on Weighted Voronoi Partitioning

Antol, Matej; Dohnal, Vlastislav

doi:10.1007/978-3-030-28730-6_21

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11695)

Included in the following conference series:

European Conference on Advances in Databases and Information Systems

Abstract

Processing large volumes of varied data requires index structures that can organize the data efficiently in secondary memory. Methods based on pivot permutations have become popular because of their tremendous querying performance. Pivot permutations can be viewed as a recursive Voronoi tessellation with a fixed set of anchors. The disadvantage of this tessellation is that it adapts poorly to the data distribution, which leads to cells with unbalanced occupancy and unevenly filled disk buckets.
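As an illustration of the idea above, the following is a minimal sketch of how a pivot permutation can be computed and how its prefix identifies a cell of the recursive tessellation. The function names, the Euclidean distance, and the tie-breaking by pivot index are assumptions made for this example; they are not taken from the paper.

```python
import math
from collections import Counter


def pivot_permutation(obj, pivots, dist=math.dist, prefix_len=None):
    """Return the indices of the pivots ordered by increasing distance to obj.

    A prefix of length 1 names the ordinary Voronoi cell of obj; a longer
    prefix names a cell of the recursively refined tessellation.
    Ties are broken by pivot index (an assumption of this sketch).
    """
    order = sorted(range(len(pivots)), key=lambda i: (dist(obj, pivots[i]), i))
    return tuple(order[:prefix_len])


def cell_occupancy(data, pivots, prefix_len):
    """Count how many objects fall into each cell for a given prefix length."""
    return Counter(
        pivot_permutation(o, pivots, prefix_len=prefix_len) for o in data
    )
```

With fixed pivots and skewed data, `cell_occupancy` makes the imbalance visible: most objects share the same permutation prefix, so the corresponding bucket fills up while others stay nearly empty.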

In this paper, we address this issue and propose a novel schema called the BM-index. It exploits weighted Voronoi partitioning, which can respect the data distribution. We present an algorithm that balances the data partitions and show its correctness. Secondary memory is then accessed efficiently, as shown in experiments executing k-nearest-neighbor queries on the real-life image collection CoPhIR.
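To make the notion of weighted Voronoi partitioning concrete, here is a minimal sketch under stated assumptions. The abstract does not specify the weighting scheme, so this example assumes multiplicative weights (an object goes to the pivot that minimizes weight times distance). The function names are hypothetical, and the balancing algorithm of the paper is not reproduced here.

```python
import math
from collections import Counter


def weighted_cell(obj, pivots, weights, dist=math.dist):
    """Assign obj to the pivot minimizing weights[i] * dist(obj, pivots[i]).

    Assumption of this sketch: weights are multiplicative, so a larger
    weight shrinks the cell of that pivot. Ties are broken by pivot index.
    """
    return min(
        range(len(pivots)),
        key=lambda i: (weights[i] * dist(obj, pivots[i]), i),
    )


def occupancy(data, pivots, weights):
    """Count the objects assigned to each pivot under the given weights."""
    return Counter(weighted_cell(o, pivots, weights) for o in data)
```

For example, with pivots `[(0, 0), (10, 0)]` and the objects `(1, 0)`, `(2, 0)`, `(3, 0)`, `(4, 0)`, `(9, 0)`, equal weights put four objects in the first cell and one in the second, whereas weights `[3, 1]` move the boundary toward the first pivot and yield a two-to-three split.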

Acknowledgment

The publication of this paper and the follow-up research were supported by the ERDF “CyberSecurity, CyberCrime and Critical Information Infrastructures Center of Excellence” (No. CZ.02.1.01/0.0/0.0/16_019/0000822).

Author information

Authors and Affiliations

Faculty of Informatics, Masaryk University, Botanicka 68a, Brno, Czech Republic
Matej Antol & Vlastislav Dohnal

Corresponding author

Correspondence to Vlastislav Dohnal.

Editor information

Editors and Affiliations

University of Maribor, Maribor, Slovenia
Tatjana Welzer
Alpen-Adria Universität Klagenfurt, Klagenfurt, Austria
Johann Eder
University of Maribor, Maribor, Slovenia
Vili Podgorelec
University of Maribor, Maribor, Slovenia
Aida Kamišalić Latifić

About this paper

Cite this paper

Antol, M., Dohnal, V. (2019). BM-index: Balanced Metric Space Index Based on Weighted Voronoi Partitioning. In: Welzer, T., Eder, J., Podgorelec, V., Kamišalić Latifić, A. (eds) Advances in Databases and Information Systems. ADBIS 2019. Lecture Notes in Computer Science, vol 11695. Springer, Cham. https://doi.org/10.1007/978-3-030-28730-6_21

DOI: https://doi.org/10.1007/978-3-030-28730-6_21
Published: 13 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28729-0
Online ISBN: 978-3-030-28730-6
eBook Packages: Computer Science, Computer Science (R0)
