An eigenvalue-based pivot selection strategy for efficient indexing and searching in metric spaces

Cluster Computing

Abstract

Pivots are widely used for indexing and searching in metric spaces. The distances from the pivots to the data objects being indexed are precomputed and stored, so that they can be used to prune unpromising objects during search. Search efficiency depends on which pivots are used, but choosing good pivots is a challenging task. In this paper, we propose a new pivot selection method that incrementally chooses pivots using an eigenvalue-based uncorrelatedness scoring function. We also present a GPU implementation for computing the uncorrelatedness score to accelerate pivot selection. Our experimental results show that the proposed method outperforms previously described pivot selection methods.
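The abstract names two mechanisms: pruning with precomputed pivot distances, and incremental pivot selection driven by an eigenvalue-based uncorrelatedness score. The abstract does not define that score, so the Python sketch below is an illustration under stated assumptions, not the authors' method. Pruning uses the standard triangle-inequality lower bound |d(q, p) - d(o, p)| <= d(q, o). For the score it substitutes a hypothetical stand-in: the determinant of the correlation matrix of the chosen pivots' distance columns over a random sample of objects, which equals the product of that matrix's eigenvalues. All function and parameter names (`select_pivots`, `range_search`, `num_candidates`, `sample_size`) are hypothetical, and the GPU acceleration described in the abstract is not attempted.

```python
import math
import random
from typing import Callable, List, Sequence

Metric = Callable[[Sequence[float], Sequence[float]], float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Example metric; any function obeying the triangle inequality works."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_pivot_table(data, pivots, dist: Metric) -> List[List[float]]:
    """Precompute d(o, p) for every data object o and every pivot p."""
    return [[dist(o, p) for p in pivots] for o in data]


def range_search(query, radius, data, pivots, table, dist: Metric) -> List[int]:
    """Return indices i with d(query, data[i]) <= radius.

    |d(q, p) - d(o, p)| <= d(q, o) holds for every pivot p, so an object whose
    lower bound exceeds `radius` is skipped without computing d(q, o).
    """
    qp = [dist(query, p) for p in pivots]
    hits = []
    for i, row in enumerate(table):
        lower = max((abs(a - b) for a, b in zip(qp, row)), default=0.0)
        if lower <= radius and dist(query, data[i]) <= radius:
            hits.append(i)
    return hits


def correlation_matrix(columns: List[List[float]]) -> List[List[float]]:
    """Pearson correlation between pivot distance columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[x - m for x in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(x * x for x in c)) or 1.0 for c in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]


def determinant(m: List[List[float]]) -> float:
    """det(M), i.e. the product of M's eigenvalues, via Gaussian elimination."""
    a = [row[:] for row in m]
    n, det = len(a), 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            return 0.0  # near-singular: columns are (nearly) linearly dependent
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return det


def select_pivots(data, k: int, dist: Metric, num_candidates: int = 20,
                  sample_size: int = 50, seed: int = 0) -> List[int]:
    """Greedily add the candidate whose column keeps the pivots least correlated.

    Hypothetical score: det(R) for the correlation matrix R of the chosen
    columns. For non-constant columns R's eigenvalues are non-negative and
    sum to len(R), so det(R) <= 1, with equality only if R is the identity.
    """
    rng = random.Random(seed)
    sample = rng.sample(range(len(data)), min(sample_size, len(data)))
    chosen: List[int] = []
    columns: List[List[float]] = []
    for _ in range(k):
        pool = [i for i in range(len(data)) if i not in chosen]
        best, best_score, best_col = None, -math.inf, None
        for c in rng.sample(pool, min(num_candidates, len(pool))):
            col = [dist(data[s], data[c]) for s in sample]
            score = determinant(correlation_matrix(columns + [col]))
            if score > best_score:
                best, best_score, best_col = c, score, col
        chosen.append(best)
        columns.append(best_col)
    return chosen


if __name__ == "__main__":
    rng = random.Random(42)
    data = [tuple(rng.random() for _ in range(4)) for _ in range(300)]
    ids = select_pivots(data, k=4, dist=euclidean)
    pivots = [data[i] for i in ids]
    table = build_pivot_table(data, pivots, euclidean)
    query = tuple(rng.random() for _ in range(4))
    print("pivots:", ids)
    print("hits:", range_search(query, 0.3, data, pivots, table, euclidean))
```

Because the filter only discards an object when its lower bound already exceeds the radius, `range_search` returns exactly the brute-force answer; pivot quality affects only how many exact distance computations are skipped, which is what a better selection score is meant to improve.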

References

  1. Beecks, C., Lokoč, J., Seidl, T., Skopal, T.: Indexing the signature quadratic form distance for efficient content-based multimedia retrieval. In: Proceedings of the 1st ACM International Conference on Multimedia Retrieval (ICMR), p. 24 (2011)

  2. Böhm, C., Berchtold, S., Keim, D.A.: Searching in high-dimensional spaces: index structures for improving the performance of multimedia databases. ACM Comput. Surv. 33(3), 322–373 (2001)

  3. Böhm, C., Braunmüller, B., Breunig, M., Kriegel, H.P.: High performance clustering based on the similarity join. In: Proceedings of the 9th International Conference on Information and Knowledge Management (CIKM), pp. 298–305 (2000)

  4. Brin, S.: Near neighbor search in large metric spaces. In: Proceedings of the 21st Conference on Very Large Data Bases (VLDB), pp. 574–584 (1995)

  5. Bustos, B., Navarro, G., Chavez, E.: Pivot selection techniques for proximity searching in metric spaces. Pattern Recognit. Lett. 24, 2357–2366 (2003)

  6. Chavez, E., Navarro, G., Baeza-Yates, R., Marroquin, J.: Searching in metric spaces. ACM Comput. Surv. 33(3), 273–321 (2001)

  7. Chen, L., Gao, Y., Li, X., Jensen, C.S., Chen, G.: Efficient metric indexing for similarity search. In: Proceedings of the 31st IEEE International Conference on Data Engineering (ICDE), pp. 591–602 (2015)

  8. Coskun, B., Giura, P.: Mitigating SMS spam by online detection of repetitive near-duplicate messages. In: Proceedings of the IEEE International Conference on Communications (ICC), pp. 999–1004 (2012)

  9. Farago, A., Linder, T., Lugosi, G.: Fast nearest-neighbor search in dissimilarity spaces. IEEE Trans. Pattern Anal. Mach. Intell. 15(9), 957–962 (1993)

  10. Traina Jr., C., Filho, R.F.S., Traina, A.J.M., Vieira, M.R.: The omni-family of all purpose access method: a simple and effective way to make similarity search more efficient. VLDB J. 16, 483–505 (2007)

  11. Kim, S.H., Lee, D.Y., Cho, H.G.: An eigenvalue-based pivot selection strategy for improving search efficiency in metric spaces. In: Proceedings of the 2016 International Conference on Big Data and Smart Computing (BigComp), pp. 207–214 (2016)

  12. Mao, R., Miranker, L., Miranker, D.P.: Pivot selection: Dimension reduction for distance-based indexing. J. Discrete Algorithms 13, 32–46 (2012)

  13. Mao, R., Liu, S., Xu, H., Zhang, D., Miranker, D.P.: On data partitioning in tree structure metric-space indexes. In: Lecture Notes in Computer Science: Database Systems for Advanced Applications, vol. 8421, pp. 141–155 (2014)

  14. Micó, M.L., Oncina, J.: A new version of the nearest-neighbour approximating and eliminating search algorithm (AESA) with linear preprocessing time and memory requirements. Pattern Recognit. Lett. 15(1), 9–17 (1994)

  15. Sarawagi, S., Bhamidipaty, A.: Interactive deduplication using active learning. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 269–278 (2002)

  16. Savary, A.: Typographical nearest-neighbor search in a finite-state lexicon and its application to spelling correction. In: Proceedings of the 6th International Conference on Implementation and Application of Automata (CIAA), pp. 251–260 (2001)

  17. Uhlmann, J.: Satisfying general proximity/similarity queries with metric trees. Inf. Process. Lett. 40, 175–179 (1991)

  18. Uribe-Paredes, R., Valero-Lara, ., Arias, E., Sánchez, J.L., Cazorla, D.: A GPU-based implementation for range queries on spaghettis data structure. In: Proceedings of the 11th International Conference on Computational Science and Its Applications. Lecture Notes in Computer Science, vol. 6782, pp. 615–629 (2011)

  19. Yoon, T., Park, S.Y., Cho, H.G.: A smart filtering system for newly coined profanities by using approximate string alignment. In: Proceedings of the 10th IEEE International Conference on Computer and Information Technology (CIT), pp. 643–650 (2010)

  20. Zhou, X., Wang, G., Zhou, X., Yu, G.: Bm+-tree: A hyperplane-based index method for high-dimensional metric spaces. In: Lecture Notes in Computer Science: Database Systems for Advanced Applications, vol. 3453, pp. 398–409 (2005)

Acknowledgements

This research was supported by Basic Research Laboratory through the National Research Foundation of Korea, funded by the Ministry of Science, ICT and Future Planning (NRF-2015R1A4A1041584). A preliminary version of this paper appeared in [11].

Author information

Corresponding author

Correspondence to Hwan-Gue Cho.

About this article

Cite this article

Kim, SH., Lee, DY. & Cho, HG. An eigenvalue-based pivot selection strategy for efficient indexing and searching in metric spaces. Cluster Comput 20, 3643–3655 (2017). https://doi.org/10.1007/s10586-017-1153-4

  • DOI: https://doi.org/10.1007/s10586-017-1153-4
