Abstract
Similarity search is a common approach to support new applications that deal with complex data (e.g., images, videos, georeferenced data, etc.). As a consequence, appropriate indexing structures to support this task have been proposed in the literature. Recently, graph-based methods have shown to be very efficient for approximate similarity search. However, some of the main types of graphs used still suffer from two main drawbacks: (i) slow construction, and (ii) inaccurate retrieval. To reduce these drawbacks, in this paper, we propose the HGraph method. HGraph is a divide-and-conquer method for building graphs for similarity search that recursively partitions the input dataset and connect vertices across partitions at different levels. The method can be used with different types of graphs proposed in the literature to speed up the graph construction time as well as to increase the approximate search results quality through long-range edges connecting pivots of different partitions. We present experimental results using real datasets that show that HGraph applied to the k-NNG graph was able to decrease the construction time while increasing the approximate search recall when compared to the k-NNG. Regarding the application of HGraph to the NSW graph, the query recall also increased, however with a higher computational cost. An analysis of different combinations of the tested methods demonstrated HGraph query times given a recall rate were always among the top results regarding different setups.
This work has been supported by the Brazilian agencies CAPES and CNPq.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Amato, G., Esuli, A., Falchi, F.: A comparison of pivot selection techniques for permutation-based indexing. Inf. Syst. 52(C), 176–188 (2015)
Barioni, M.C.N., Kaster, D.D.S., Razente, H.L., Traina, A.J., Júnior, C.T.: Advanced Database Query Systems. IGI Global (2011)
Boytsov, L., Naidan, B.: Engineering efficient and effective non-metric space library. In: Brisaboa, N., Pedreira, O., Zezula, P. (eds.) SISAP 2013. LNCS, vol. 8199, pp. 280–293. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-41062-8_28
Bustos, B., Navarro, G., Chavez, E.: Pivot selection techniques for proximity searching in metric spaces. In: SCCC, pp. 33–40, November 2001
Chávez, E., Navarro, G., Baeza-Yates, R., Marroquín, J.L.: Searching in metric spaces. ACM Comput. Surv. 33(3), 273–321 (2001)
Chen, J., Fangand, H.R., Saad, Y.: Fast approximate KNN graph construction for high dimensional data via recursive Lanczos bisection. J. Mach. Learn. Res. 10, 1989–2012 (2009)
Hajebi, K., Abbasi-Yadkori, Y., Shahbazi, H., Zhang, H.: Fast approximate nearest-neighbor search with k-nearest neighbor graph. In: IJCAI, pp. 1312–1317 (2011)
Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
Malkov, Y., et al.: Approximate nearest neighbor algorithm based on navigable small world graphs. Inf. Syst. 45, 61–68 (2014)
Ocsa, A., Bedregal, C., Cuadros-Vargas, E.: A new approach for similarity queries using neighborhood graphs. In: Brazilian Symposium on Databases, pp. 131–142 (2007)
Ortega, M., Rui, Y., Chakrabarti, K., Porkaew, K., Mehrotra, S., Huang, T.S.: Supporting ranked boolean similarity queries in MARS. TKDE 10(6), 905–925 (1998)
Paredes, R., Chávez, E.: Using the k-nearest neighbor graph for proximity searching in metric spaces. In: Consens, M., Navarro, G. (eds.) SPIRE 2005. LNCS, vol. 3772, pp. 127–138. Springer, Heidelberg (2005). https://doi.org/10.1007/11575832_14
Park, H.S., Jun, C.H.: A simple and fast algorithm for k-medoids clustering. Expert Syst. Appl. 36(2, Part 2), 3336–3341 (2009)
Shimomura, L.C., Vieira, M.R., Kaster, D.S.: Performance analysis of graph-based methods for exact and approximate similarity search in metric spaces. In: Marchand-Maillet, S., Silva, Y.N., Chávez, E. (eds.) SISAP 2018. LNCS, vol. 11223, pp. 18–32. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-02224-2_2
Traina Jr., C., Filho, R.F., Traina, A.J., Vieira, M.R., Faloutsos, C.: The Omni-family of all-purpose access methods: a simple and effective way to make similarity search more efficient. VLDB J. 16(4), 483–505 (2007)
Uhlmann, J.K.: Satisfying general proximity/similarity queries with metric trees. Inf. Process. Lett. 40(4), 175–179 (1991)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Shimomura, L.C., Kaster, D.S. (2019). HGraph: A Connected-Partition Approach to Proximity Graphs for Similarity Search. In: Hartmann, S., Küng, J., Chakravarthy, S., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2019. Lecture Notes in Computer Science(), vol 11706. Springer, Cham. https://doi.org/10.1007/978-3-030-27615-7_8
Download citation
DOI: https://doi.org/10.1007/978-3-030-27615-7_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27614-0
Online ISBN: 978-3-030-27615-7
eBook Packages: Computer ScienceComputer Science (R0)