SISAP 2023 Indexing Challenge – Learned Metric Index

Slanináková, Terèzia; Procházka, David; Antol, Matej; Olha, Jaroslav; Dohnal, Vlastislav

doi:10.1007/978-3-031-46994-7_24

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14289)

Included in the following conference series:

International Conference on Similarity Search and Applications

Abstract

This submission into the SISAP Indexing Challenge examines the experimental setup and performance of the Learned Metric Index, which uses an architecture of interconnected learned models to answer similarity queries. An inherent part of this design is a great deal of flexibility in the implementation, such as the choice of particular machine learning models, or their arrangement in the overall architecture of the index. Therefore, for the sake of transparency and reproducibility, this report thoroughly describes the details of the specific Learned Metric Index implementation used to tackle the challenge.
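To make the idea of answering similarity queries with a learned model concrete, the following is a minimal single-level sketch, not the implementation described in the report. It assumes a plain k-means partitioning written in pure Python (rather than FAISS) and a softmax linear classifier trained by stochastic gradient descent (rather than a neural network), and it uses one model instead of an architecture of interconnected models. All function names, parameters, and the toy dataset are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(data, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns one bucket label per data point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(data, k)]
    labels = [0] * len(data)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in data]
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def predict_proba(weights, x):
    """Softmax over per-bucket linear scores; the last weight in a row is the bias."""
    scores = [sum(w * v for w, v in zip(row, x)) + row[-1] for row in weights]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_model(data, labels, k, epochs=30, lr=0.1):
    """Fit the classifier to predict each point's bucket (cross-entropy, SGD)."""
    dim = len(data[0])
    weights = [[0.0] * (dim + 1) for _ in range(k)]
    for _ in range(epochs):
        for x, y in zip(data, labels):
            probs = predict_proba(weights, x)
            for c in range(k):
                grad = probs[c] - (1.0 if c == y else 0.0)
                for j in range(dim):
                    weights[c][j] -= lr * grad * x[j]
                weights[c][dim] -= lr * grad
    return weights


def search(weights, buckets, data, query, k_nn, n_buckets):
    """Scan only the n_buckets most probable buckets and rank their members."""
    probs = predict_proba(weights, query)
    ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
    candidates = [i for c in ranked[:n_buckets] for i in buckets[c]]
    candidates.sort(key=lambda i: sq_dist(data[i], query))
    return candidates[:k_nn]


# Toy data: 200 random 2-D points split into 4 buckets (hypothetical sizes).
rng = random.Random(42)
data = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
K = 4
labels = kmeans_labels(data, K)
buckets = [[i for i, lab in enumerate(labels) if lab == c] for c in range(K)]
weights = train_model(data, labels, K)

query = [0.5, -0.2]
approx = search(weights, buckets, data, query, k_nn=5, n_buckets=1)  # fast, approximate
exact = search(weights, buckets, data, query, k_nn=5, n_buckets=K)   # scans everything
```

In this sketch, `n_buckets` is the knob that trades answer quality for speed: visiting a single bucket is cheapest but may miss true neighbors, while visiting all buckets degenerates to an exact sequential scan.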

The publication of this paper and the follow-up research was supported by the Czech Science Foundation project No. GF23-07040K (all authors but V. Dohnal) and by the ERDF “CyberSecurity, CyberCrime and Critical Information Infrastructures Center of Excellence” (No. CZ.02.1.01/0.0/0.0/16_019/0000822) – V. Dohnal. Computational resources were supplied by the project “e-Infrastruktura CZ” (e-INFRA CZ LM2018140) and ELIXIR CZ Research Infrastructure (ID LM2018131) supported by the Ministry of Education, Youth and Sports of the Czech Republic.

Notes

1. K-Means is adopted from the FAISS library [6], the FasterPAM [13] implementation of K-Medoids is used, and DBSCAN [12] is taken from the scikit-learn library, v. 0.24.2.
2. The neural network is implemented in the PyTorch library (v. 1.1.0), using the ReLU activation function and the Adam optimization algorithm.
3. Python’s NumPy library [16] (v. 1.19.5) is used to achieve this optimization.
4. https://github.com/TerkaSlan/sisap23-laion-challenge-learned-index

References

1. Antol, M., Oľha, J., Slanináková, T., Dohnal, V.: Learned metric index - proposition of learned indexing for unstructured data. Inf. Syst. 100 (2021)
2. Berrendorf, M., Borutta, F., Kröger, P.: k-Distance approximation for memory-efficient RkNN retrieval. In: Amato, G., Gennaro, C., Oria, V., Radovanović, M. (eds.) SISAP 2019. LNCS, vol. 11807, pp. 57–71. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32047-8_6
3. Dong, Y., Indyk, P., Razenshteyn, I.P., Wagner, T.: Learning space partitions for nearest neighbor search. In: 8th International Conference on Learning Representations, ICLR, Addis Ababa, Ethiopia, 26–30 April 2020 (2020)
4. Galakatos, A., Markovitch, M., Binnig, C., Fonseca, R., Kraska, T.: FITing-tree: a data-aware index structure. In: Proceedings of the International Conference on Management of Data (SIGMOD), pp. 1189–1206. ACM (2019)
5. Hünemörder, M., Kröger, P., Renz, M.: Towards a learned index structure for approximate nearest neighbor search query processing. In: Reyes, N., et al. (eds.) SISAP 2021. LNCS, vol. 13058, pp. 95–103. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-89657-7_8
6. Johnson, J., Douze, M., Jégou, H.: Billion-scale similarity search with GPUs. IEEE Trans. Big Data 7(3), 535–547 (2019)
7. Lin, K.-I., Yang, C.: The Ann-tree: an index for efficient approximate nearest neighbor search. In: Proceedings Seventh International Conference on Database Systems for Advanced Applications. DASFAA 2001, pp. 174–181, April 2001
8. Kraska, T., et al.: SageDB: a learned database system. In: CIDR 2019, 9th Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, 13–16 January 2019, Online Proceedings (2019). www.cidrdb.org
9. Kraska, T., Beutel, A., Chi, E.H., Dean, J., Polyzotis, N.: The case for learned index structures. In: Proceedings of the 2018 International Conference on Management of Data. SIGMOD ’18, pp. 489–504. Association for Computing Machinery (2018)
10. Muja, M., Lowe, D.G.: Fast approximate nearest neighbors with automatic algorithm configuration. In: International Conference on Computer Vision Theory and Applications (VISAPP), pp. 331–340 (2009)
11. Olha, J., Slanináková, T., Gendiar, M., Antol, M., Dohnal, V.: Learned indexing in proteins: substituting complex distance calculations with embedding and clustering techniques. In: Skopal, T., Falchi, F., Lokoč, J., Sapino, M.L., Bartolini, I., Patella, M. (eds.) SISAP 2022. LNCS, vol. 13590, pp. 274–282. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-17849-8_22
12. Sander, J., Ester, M., Kriegel, H.P., Xu, X.: Density-based clustering in spatial databases: the algorithm GDBScan and its applications. Data Min. Knowl. Disc. 2(2), 169–194 (1998)
13. Schubert, E., Rousseeuw, P.J.: Faster k-medoids clustering: improving the PAM, CLARA, and CLARANS algorithms. In: Amato, G., Gennaro, C., Oria, V., Radovanović, M. (eds.) SISAP 2019. LNCS, vol. 11807, pp. 171–187. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32047-8_16
14. Slanináková, T., Antol, M., Oľha, J., Kaňa, V., Dohnal, V.: Data-driven learned metric index: an unsupervised approach. In: Reyes, N., et al. (eds.) SISAP 2021. LNCS, vol. 13058, pp. 81–94. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-89657-7_7
15. Slanináková, T., Antol, M., Oľha, J., Dohnal, V., Ladra, S., Martínez-Prieto, M.A.: Reproducible experiments with learned metric index framework. Inf. Syst. 102255 (2023). https://doi.org/10.1016/j.is.2023.102255
16. Van Der Walt, S., Colbert, S.C., Varoquaux, G.: The numpy array: a structure for efficient numerical computation. Comput. Sci. Eng. 13(2) (2011)
17. Zhang, C., Koishida, K., Hansen, J.H.: Text-independent speaker verification based on triplet convolutional neural network embeddings. IEEE/ACM Trans. Audio Speech Lang. Process. 26(9), 1633–1644 (2018)

Author information

Authors and Affiliations

Faculty of Informatics and Institute of Computer Science, Masaryk University, Botanická 68a and Šumavská 15, 602 00, Brno, Czech Republic
Terèzia Slanináková, David Procházka, Matej Antol, Jaroslav Olha & Vlastislav Dohnal

Corresponding author

Correspondence to Terèzia Slanináková.

Editor information

Editors and Affiliations

University of A Coruña, Coruña, Spain
Oscar Pedreira
Pompeu Fabra University, Barcelona, Spain
Vladimir Estivill-Castro

About this paper

Cite this paper

Slanináková, T., Procházka, D., Antol, M., Olha, J., Dohnal, V. (2023). SISAP 2023 Indexing Challenge – Learned Metric Index. In: Pedreira, O., Estivill-Castro, V. (eds) Similarity Search and Applications. SISAP 2023. Lecture Notes in Computer Science, vol 14289. Springer, Cham. https://doi.org/10.1007/978-3-031-46994-7_24

DOI: https://doi.org/10.1007/978-3-031-46994-7_24
Published: 27 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46993-0
Online ISBN: 978-3-031-46994-7
eBook Packages: Computer Science, Computer Science (R0)
