Pivot selection algorithms in metric spaces: a survey and experimental study

  • Regular Paper
The VLDB Journal

Abstract

Similarity search in metric spaces is widely used in areas such as multimedia retrieval, data mining, and data integration. To accelerate metric similarity search, pivot-based indexing is often employed. Pivot-based indexing first computes the distances between data objects and pivots, and then applies filtering techniques that use the triangle inequality on these pre-computed distances to prune the search space at query time. The performance of pivot-based indexing depends on the quality of the pivots used, and many algorithms have been proposed for selecting high-quality pivots. We present a comprehensive empirical study of pivot selection algorithms. Specifically, we classify all existing algorithms into three categories according to the types of distances they use for selecting pivots. We also propose a new pivot selection algorithm that exploits the power-law probability distribution. Next, we report on an empirical evaluation of the search performance enabled by different pivot selection approaches, using different datasets and indexes, thus contributing new insight into the strengths and weaknesses of existing selection techniques. Finally, we offer advice on how to choose appropriate pivot selection algorithms for different settings.

Notes

  1. https://github.com/ZJU-DAILY/PSAMS

  2. http://www.dbs.informatik.uni-muenchen.de/seidl

  3. http://icon.shef.ac.uk/Moby/

  4. http://cophir.isti.cnr.it/

Acknowledgements

This work was supported in part by the NSFC under Grants No. 62025206 and 61972338. Yunjun Gao is the corresponding author of the work.

Author information

Corresponding author

Correspondence to Lu Chen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhu, Y., Chen, L., Gao, Y. et al. Pivot selection algorithms in metric spaces: a survey and experimental study. The VLDB Journal 31, 23–47 (2022). https://doi.org/10.1007/s00778-021-00691-4

  • DOI: https://doi.org/10.1007/s00778-021-00691-4
