Abstract
Pivots are widely used for indexing and searching in metric spaces. The distances from the pivots to the indexed data objects are precomputed and stored, so that they can be used to prune unpromising objects during a search. Search efficiency depends on which pivots are used, but choosing good pivots is a challenging task. In this paper, we propose a new pivot selection method that chooses pivots incrementally using an eigenvalue-based uncorrelatedness scoring function. We also present a GPU implementation for computing the uncorrelatedness score in order to accelerate the pivot selection process. Our experimental results demonstrate that the proposed method performs better than previously described pivot selection methods.
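The pruning step that the abstract relies on follows from the triangle inequality: for any pivot p, |d(q, p) - d(o, p)| <= d(q, o), so an object o can be discarded from a range query of radius r whenever that lower bound exceeds r. The following is a minimal sketch of that general idea only; it does not reproduce the eigenvalue-based pivot selection proposed in the paper. The function names, the Euclidean distance, and the flat pivot table are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance, used here as a stand-in for any metric."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def build_pivot_table(
    objects: Sequence[Point], pivots: Sequence[Point], dist: Distance
) -> List[List[float]]:
    """Precompute d(object, pivot) for every object/pivot pair."""
    return [[dist(o, p) for p in pivots] for o in objects]


def range_search(
    query: Point,
    radius: float,
    objects: Sequence[Point],
    pivots: Sequence[Point],
    table: Sequence[Sequence[float]],
    dist: Distance,
) -> Tuple[List[int], int]:
    """Return (indices of objects within radius, number of distances computed).

    An object is skipped without computing d(query, object) when some pivot
    gives a lower bound |d(q, p) - d(o, p)| larger than the radius.
    """
    q_to_pivots = [dist(query, p) for p in pivots]
    hits: List[int] = []
    computed = 0
    for idx, (obj, row) in enumerate(zip(objects, table)):
        if any(abs(qp - op) > radius for qp, op in zip(q_to_pivots, row)):
            continue  # pruned by the triangle inequality
        computed += 1
        if dist(query, obj) <= radius:
            hits.append(idx)
    return hits, computed
```

How many objects survive the filter depends on the pivots chosen, which is why pivot selection affects search cost.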
References
Beecks, C., Lokoč, J., Seidl, T., Skopal, T.: Indexing the signature quadratic form distance for efficient content-based multimedia retrieval. In: Proceedings of the 1st ACM International Conference on Multimedia Retrieval (ICMR), p. 24 (2011)
Böhm, C., Berchtold, S., Keim, D.A.: Searching in high-dimensional spaces: index structures for improving the performance of multimedia databases. ACM Comput. Surv. 33(3), 322–373 (2001)
Böhm, C., Braunmüller, B., Breunig, M., Kriegel, H.P.: High performance clustering based on the similarity join. In: Proceedings of the 9th International Conference on Information and Knowledge Management (CIKM), pp. 298–305 (2000)
Brin, S.: Near neighbor search in large metric spaces. In: Proceedings of the 21st Conference on Very Large Databases (VLDB), pp. 574–584 (1995)
Bustos, B., Navarro, G., Chavez, E.: Pivot selection techniques for proximity searching in metric spaces. Pattern Recognit. Lett. 24, 2357–2366 (2003)
Chavez, E., Navarro, G., Baeza-Yates, R., Marroquin, J.: Searching in metric spaces. ACM Comput. Surv. 33(3), 273–321 (2001)
Chen, L., Gao, Y., Li, X., Jensen, C.S., Chen, G.: Efficient metric indexing for similarity search. In: Proceedings of the 31st IEEE International Conference on Data Engineering (ICDE), pp. 591–602 (2015)
Coskun, B., Giura, P.: Mitigating SMS spam by online detection of repetitive near-duplicate messages. In: Proceedings of the IEEE International Conference on Communications (ICC), pp. 999–1004 (2012)
Farago, A., Linder, T., Lugosi, G.: Fast nearest-neighbor search in dissimilarity spaces. IEEE Trans. Pattern Anal. Mach. Intell. 15(9), 957–962 (1993)
Traina Jr., C., Filho, R.F.S., Traina, A.J.M., Vieira, M.R.: The omni-family of all-purpose access methods: a simple and effective way to make similarity search more efficient. VLDB J. 16, 483–505 (2007)
Kim, S.H., Lee, D.Y., Cho, H.G.: An eigenvalue-based pivot selection strategy for improving search efficiency in metric spaces. In: Proceedings of the 2016 International Conference on Big Data and Smart Computing (BigComp), pp. 207–214 (2016)
Mao, R., Miranker, L., Miranker, D.P.: Pivot selection: Dimension reduction for distance-based indexing. J. Discret. Algorithms 13, 32–46 (2012)
Mao, R., Liu, S., Xu, H., Zhang, D., Miranker, D.P.: On data partitioning in tree structure metric-space indexes. In: Lecture Notes in Computer Science: Database Systems for Advanced Applications, vol. 8421, pp. 141–155 (2014)
Micó, M.L., Oncina, J.: A new version of the nearest-neighbour approximating and eliminating search algorithm (AESA) with linear preprocessing time and memory requirements. Pattern Recognit. Lett. 15(1), 9–17 (1994)
Sarawagi, S., Bhamidipaty, A.: Interactive deduplication using active learning. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 269–278 (2002)
Savary, A.: Typographical nearest-neighbor search in a finite-state lexicon and its application to spelling correction. In: Proceedings of the 6th International Conference on Implementation and Application of Automata (CIAA), pp. 251–260 (2001)
Uhlmann, J.: Satisfying general proximity/similarity queries with metric trees. Inf. Process. Lett. 40, 175–179 (1991)
Uribe-Paredes, R., Valero-Lara, P., Arias, E., Sánchez, J.L., Cazorla, D.: A GPU-based implementation for range queries on spaghettis data structure. In: Proceedings of the 11th International Conference on Computational Science and Its Applications. Lecture Notes in Computer Science, vol. 6782, pp. 615–629 (2011)
Yoon, T., Park, S.Y., Cho, H.G.: A smart filtering system for newly coined profanities by using approximate string alignment. In: Proceedings of the 10th IEEE International Conference on Computer and Information Technology (CIT), pp. 643–650 (2010)
Zhou, X., Wang, G., Zhou, X., Yu, G.: BM+-tree: A hyperplane-based index method for high-dimensional metric spaces. In: Lecture Notes in Computer Science: Database Systems for Advanced Applications, vol. 3453, pp. 398–409 (2005)
Acknowledgements
This research was supported by the Basic Research Laboratory program through the National Research Foundation of Korea, funded by the Ministry of Science, ICT and Future Planning (NRF-2015R1A4A1041584). A preliminary version of this paper appeared in [11].
Cite this article
Kim, SH., Lee, DY. & Cho, HG. An eigenvalue-based pivot selection strategy for efficient indexing and searching in metric spaces. Cluster Comput 20, 3643–3655 (2017). https://doi.org/10.1007/s10586-017-1153-4
DOI: https://doi.org/10.1007/s10586-017-1153-4