Abstract
This submission to the SISAP Indexing Challenge examines the experimental setup and performance of the Learned Metric Index, which uses an architecture of interconnected learned models to answer similarity queries. This design inherently allows a great deal of flexibility in implementation, such as the choice of particular machine learning models or their arrangement in the overall architecture of the index. Therefore, for the sake of transparency and reproducibility, this report describes in detail the specific Learned Metric Index implementation used to tackle the challenge.
The publication of this paper and the follow-up research was supported by the Czech Science Foundation project No. GF23-07040K (all authors but V. Dohnal) and by the ERDF “CyberSecurity, CyberCrime and Critical Information Infrastructures Center of Excellence” (No. CZ.02.1.01/0.0/0.0/16_019/0000822) (V. Dohnal). Computational resources were supplied by the project “e-Infrastruktura CZ” (e-INFRA CZ LM2018140) and the ELIXIR CZ Research Infrastructure (ID LM2018131), both supported by the Ministry of Education, Youth and Sports of the Czech Republic.
Notes
- 1.
- 2. The neural network is implemented in the PyTorch library (v. 1.1.0), using the ReLU activation function and the Adam optimization algorithm.
- 3. Python’s NumPy library [16] (v. 1.19.5) is used to achieve this optimization.
- 4.
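Note 2 names the building blocks of the neural network (PyTorch, ReLU, Adam) but not its shape. The following is a minimal sketch of how a single learned model of this kind could be defined and trained; it is not the implementation used in the challenge. The layer sizes, number of buckets, learning rate, epoch count, the synthetic data, and the use of the model as a classifier that ranks buckets are all assumptions made for illustration, and the helper names `make_node_model` and `train_node` are hypothetical.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration (not taken from the paper).
DIM, HIDDEN, N_BUCKETS = 16, 32, 4


def make_node_model(dim=DIM, hidden=HIDDEN, n_buckets=N_BUCKETS):
    """One learned model: an MLP mapping a vector to scores over buckets."""
    return nn.Sequential(
        nn.Linear(dim, hidden),
        nn.ReLU(),  # activation function named in note 2
        nn.Linear(hidden, n_buckets),
    )


def train_node(model, data, labels, epochs=50, lr=1e-2):
    """Fit the model on (vector, bucket label) pairs with full-batch Adam."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # named in note 2
    loss_fn = nn.CrossEntropyLoss()
    losses = []
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(data), labels)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses


# Synthetic stand-in for a dataset: random vectors, each assigned to the
# bucket whose index matches its largest coordinate among the first four.
data = torch.randn(256, DIM)
labels = data[:, :N_BUCKETS].argmax(dim=1)

model = make_node_model()
losses = train_node(model, data, labels)

# Assumed query-time use: the predicted probabilities order the buckets.
with torch.no_grad():
    query = torch.randn(1, DIM)
    probs = torch.softmax(model(query), dim=1)
    bucket_order = torch.argsort(probs, dim=1, descending=True)[0].tolist()
```

In this sketch, `bucket_order` lists the bucket indices from most to least likely for the query, which is one plausible way a learned model could direct a similarity search toward a subset of the data.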
References
Antol, M., Oľha, J., Slanináková, T., Dohnal, V.: Learned metric index - proposition of learned indexing for unstructured data. Inf. Syst. 100 (2021)
Berrendorf, M., Borutta, F., Kröger, P.: k-Distance approximation for memory-efficient RkNN retrieval. In: Amato, G., Gennaro, C., Oria, V., Radovanović, M. (eds.) SISAP 2019. LNCS, vol. 11807, pp. 57–71. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32047-8_6
Dong, Y., Indyk, P., Razenshteyn, I.P., Wagner, T.: Learning space partitions for nearest neighbor search. In: 8th International Conference on Learning Representations, ICLR, Addis Ababa, Ethiopia, 26–30 April 2020 (2020)
Galakatos, A., Markovitch, M., Binnig, C., Fonseca, R., Kraska, T.: FITing-tree: a data-aware index structure. In: Proceedings of the International Conference on Management of Data (SIGMOD), pp. 1189–1206. ACM (2019)
Hünemörder, M., Kröger, P., Renz, M.: Towards a learned index structure for approximate nearest neighbor search query processing. In: Reyes, N., et al. (eds.) SISAP 2021. LNCS, vol. 13058, pp. 95–103. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-89657-7_8
Johnson, J., Douze, M., Jégou, H.: Billion-scale similarity search with GPUs. IEEE Trans. Big Data 7(3), 535–547 (2019)
Lin, K.-I., Yang, C.: The Ann-tree: an index for efficient approximate nearest neighbor search. In: Proceedings Seventh International Conference on Database Systems for Advanced Applications. DASFAA 2001, pp. 174–181, April 2001
Kraska, T., et al.: SageDB: a learned database system. In: CIDR 2019, 9th Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, 13–16 January 2019, Online Proceedings (2019). www.cidrdb.org
Kraska, T., Beutel, A., Chi, E.H., Dean, J., Polyzotis, N.: The case for learned index structures. In: Proceedings of the 2018 International Conference on Management of Data. SIGMOD ’18, pp. 489–504. Association for Computing Machinery (2018)
Muja, M., Lowe, D.G.: Fast approximate nearest neighbors with automatic algorithm configuration. In: International Conference on Computer Vision Theory and Applications (VISAPP), pp. 331–340 (2009)
Olha, J., Slanináková, T., Gendiar, M., Antol, M., Dohnal, V.: Learned indexing in proteins: substituting complex distance calculations with embedding and clustering techniques. In: Skopal, T., Falchi, F., Lokoč, J., Sapino, M.L., Bartolini, I., Patella, M. (eds.) SISAP 2022. LNCS, vol. 13590, pp. 274–282. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-17849-8_22
Sander, J., Ester, M., Kriegel, H.P., Xu, X.: Density-based clustering in spatial databases: the algorithm GDBScan and its applications. Data Min. Knowl. Disc. 2(2), 169–194 (1998)
Schubert, E., Rousseeuw, P.J.: Faster k-medoids clustering: improving the PAM, CLARA, and CLARANS algorithms. In: Amato, G., Gennaro, C., Oria, V., Radovanović, M. (eds.) SISAP 2019. LNCS, vol. 11807, pp. 171–187. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32047-8_16
Slanináková, T., Antol, M., Oľha, J., Kaňa, V., Dohnal, V.: Data-driven learned metric index: an unsupervised approach. In: Reyes, N., et al. (eds.) SISAP 2021. LNCS, vol. 13058, pp. 81–94. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-89657-7_7
Slanináková, T., Antol, M., Oľha, J., Dohnal, V., Ladra, S., Martínez-Prieto, M.A.: Reproducible experiments with learned metric index framework. Inf. Syst. 102255 (2023). https://doi.org/10.1016/j.is.2023.102255
Van Der Walt, S., Colbert, S.C., Varoquaux, G.: The numpy array: a structure for efficient numerical computation. Comput. Sci. Eng. 13(2) (2011)
Zhang, C., Koishida, K., Hansen, J.H.: Text-independent speaker verification based on triplet convolutional neural network embeddings. IEEE/ACM Trans. Audio Speech Lang. Process. 26(9), 1633–1644 (2018)
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Slanináková, T., Procházka, D., Antol, M., Olha, J., Dohnal, V. (2023). SISAP 2023 Indexing Challenge – Learned Metric Index. In: Pedreira, O., Estivill-Castro, V. (eds) Similarity Search and Applications. SISAP 2023. Lecture Notes in Computer Science, vol 14289. Springer, Cham. https://doi.org/10.1007/978-3-031-46994-7_24
DOI: https://doi.org/10.1007/978-3-031-46994-7_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46993-0
Online ISBN: 978-3-031-46994-7
eBook Packages: Computer Science (R0)