Abstract
Similarity searching consists of retrieving the elements of a database that are most similar to a given query. It is a central problem in many real applications, and it becomes intractable when the database is large. One way to overcome this problem is to retrieve a few objects as a promising candidate list for the answer. In this paper, the most relevant and efficient permutation-based algorithms for high-dimensional spaces are compared. A permutation-based algorithm builds, for each element, a permutation of some special objects, which allows us to organize the space of the elements in a database. One of the indexes compared uses the complete permutation, and the other uses a small part of the permutation together with an inverted index.
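The complete-permutation idea can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: it assumes Euclidean vectors, it assumes that each permutation orders the special objects (called `permutants` here) by increasing distance to the element, and it assumes that the Spearman footrule is used to compare permutations. All function names are hypothetical.

```python
import math


def permutation(obj, permutants):
    """Permutant ids ordered by increasing distance to obj (ties broken by id)."""
    return sorted(range(len(permutants)),
                  key=lambda i: (math.dist(obj, permutants[i]), i))


def inverse(perm):
    """pos[p] is the rank of permutant p inside perm."""
    pos = [0] * len(perm)
    for rank, pid in enumerate(perm):
        pos[pid] = rank
    return pos


def build_index(database, permutants):
    """Store one inverse permutation per database element."""
    return [inverse(permutation(obj, permutants)) for obj in database]


def footrule(pos_a, pos_b):
    """Spearman footrule: total displacement of the permutants (assumed measure)."""
    return sum(abs(a - b) for a, b in zip(pos_a, pos_b))


def candidate_list(query, permutants, index, size):
    """Ids of the `size` elements whose permutations are closest to the query's."""
    q_pos = inverse(permutation(query, permutants))
    order = sorted(range(len(index)),
                   key=lambda oid: (footrule(q_pos, index[oid]), oid))
    return order[:size]
```

Only the elements in the returned candidate list would then be compared with the query using the real distance function, which is where the savings in distance computations come from.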
Our research focuses on two proposed ideas: the first uses a similar inverted index, but with less information per object, and computes the candidate list in a different way; the second changes a parameter at query time in order to achieve a better prediction of the nearest neighbors. Our experiments show that our proposals yield a better predictor and that the nearest neighbor can be found while computing up to 45% fewer distances per query.
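The abstract does not specify what is stored per object or which parameter is tuned, so the following is only a rough sketch under stated assumptions. It assumes that "less information" means posting lists hold object ids without permutant positions, that candidates are ranked by how many of their closest permutants they share with the query, and that the query-time parameter is the number of permutants taken from the query (`k_query`), which may differ from the number used when indexing (`k_index`). The names and the voting scheme are hypothetical.

```python
import math
from collections import Counter, defaultdict


def closest_permutants(obj, permutants, k):
    """Ids of the k permutants nearest to obj (a short permutation prefix)."""
    order = sorted(range(len(permutants)),
                   key=lambda i: (math.dist(obj, permutants[i]), i))
    return order[:k]


def build_inverted_index(database, permutants, k_index):
    """Map each permutant id to the ids of the objects that have it in their prefix."""
    postings = defaultdict(list)
    for oid, obj in enumerate(database):
        for pid in closest_permutants(obj, permutants, k_index):
            postings[pid].append(oid)  # object id only, no position stored
    return postings


def candidates(query, permutants, postings, k_query, size):
    """Rank objects by the number of prefix permutants shared with the query."""
    votes = Counter()
    for pid in closest_permutants(query, permutants, k_query):
        votes.update(postings.get(pid, ()))
    ranked = sorted(votes, key=lambda oid: (-votes[oid], oid))
    return ranked[:size]
```

Because `k_query` is passed at query time, it can be varied without rebuilding the index: a larger value visits more posting lists and tends to produce a longer, less selective candidate list.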
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Figueroa, K., Reyes, N., Camarena-Ibarrola, A. (2020). Candidate List Obtained from Metric Inverted Index for Similarity Searching. In: Martínez-Villaseñor, L., Herrera-Alcántara, O., Ponce, H., Castro-Espinoza, F.A. (eds) Advances in Computational Intelligence. MICAI 2020. Lecture Notes in Computer Science, vol 12469. Springer, Cham. https://doi.org/10.1007/978-3-030-60887-3_3
DOI: https://doi.org/10.1007/978-3-030-60887-3_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60886-6
Online ISBN: 978-3-030-60887-3
eBook Packages: Computer Science, Computer Science (R0)