Abstract
Diversified similarity searching embeds result diversification directly into the query procedure, which improves computational performance by orders of magnitude. Although metric indexes have untapped potential for improving such procedures, building a suitable, fast, and incremental solution for diversified similarity searching is still an open issue. This study presents a novel index-and-search algorithm, coined diversity browsing, that combines an optimized implementation of the vantage-point tree (VP-Tree) index with the distance browsing search strategy and coverage-based query criteria. Our proposal maps data elements into VP-Tree nodes, which are incrementally evaluated to solve diversified neighborhood searches. This evaluation considers not only the distance between the query and candidate objects but also the distances from each candidate to the data elements already in the partial search result (called influencers). Accordingly, we exploit these distance-based relationships to prune VP-Tree branches that are themselves influenced by elements in the result set. As a result, diversity browsing benefits from data indexing by (i) eliminating nodes that hold no valid candidate elements and (ii) examining the minimum number of partitions with respect to the query element. Experiments on real-world datasets show that our approach outperformed the competitors GMC and GNE by at least 4.91 orders of magnitude, and the baseline algorithm BRID\(_k\) by at least \(87.51\%\), in elapsed query time.
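The abstract combines three ingredients: a VP-Tree, a best-first incremental nearest-neighbor traversal (distance browsing), and pruning of branches whose elements would all be influenced by the partial result. The Python sketch below illustrates one way these pieces can fit together; it is not the authors' implementation, and their optimized VP-Tree, coverage criteria, and pruning rules may differ. It assumes Euclidean points, a BRID\(_k\)-style influence rule in which a candidate s is discarded when some result element x satisfies d(x, s) < d(q, s), and a simplified VP-Tree whose nodes store a vantage point, a median split radius, and a covering radius. The names `VPNode`, `build`, `influenced`, `diversity_browse`, and `sequential_scan` are hypothetical; `sequential_scan` is only an unindexed stand-in for a BRID\(_k\)-like baseline.

```python
import heapq
import itertools
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class VPNode:
    vp: int                     # index of this node's vantage point
    mu: float                   # median distance splitting inner/outer children
    cover: float                # max distance from vp to any point below it
    inner: Optional["VPNode"]   # points s with d(vp, s) < mu
    outer: Optional["VPNode"]   # points s with d(vp, s) >= mu


def build(points: Sequence[Point], ids: List[int]) -> Optional[VPNode]:
    """Recursively build a VP-Tree; the first id is used as the vantage point."""
    if not ids:
        return None
    vp, rest = ids[0], ids[1:]
    dists = sorted((math.dist(points[vp], points[i]), i) for i in rest)
    if not dists:
        return VPNode(vp, 0.0, 0.0, None, None)
    mu = dists[len(dists) // 2][0]
    inner = [i for d, i in dists if d < mu]
    outer = [i for d, i in dists if d >= mu]
    return VPNode(vp, mu, dists[-1][0], build(points, inner), build(points, outer))


def influenced(points, result, s, d_qs):
    """Assumed rule: s is influenced if some x in result is closer to s than q is."""
    return any(math.dist(points[x], points[s]) < d_qs for x in result)


def diversity_browse(points, root, q, k):
    """Best-first (distance browsing) traversal skipping influenced points and branches."""
    result: List[int] = []
    tick = itertools.count()  # unique tie-breaker so payloads are never compared
    # Entry: (lower bound on d(q, s), rank, tick, payload); rank 0 = point, 1 = subtree.
    heap = [] if root is None else [(0.0, 1, next(tick), (root, None, 0.0))]
    while heap and len(result) < k:
        key, rank, _, payload = heapq.heappop(heap)
        if rank == 0:  # a point: key is its exact distance to q
            if not influenced(points, result, payload, key):
                result.append(payload)
            continue
        node, pv, hi = payload  # every s below node satisfies d(pv, s) <= hi
        # Prune: d(x, s) <= d(x, pv) + hi < key <= d(q, s), so x influences every s here.
        if pv is not None and any(math.dist(points[x], points[pv]) + hi < key for x in result):
            continue
        d_qv = math.dist(q, points[node.vp])
        heapq.heappush(heap, (d_qv, 0, next(tick), node.vp))
        if node.inner is not None:
            lb = max(0.0, d_qv - node.mu)
            heapq.heappush(heap, (lb, 1, next(tick), (node.inner, node.vp, node.mu)))
        if node.outer is not None:
            lb = max(0.0, node.mu - d_qv, d_qv - node.cover)
            heapq.heappush(heap, (lb, 1, next(tick), (node.outer, node.vp, node.cover)))
    return result


def sequential_scan(points, q, k):
    """Unindexed reference: visit points by increasing d(q, s) with the same rule."""
    result: List[int] = []
    for s in sorted(range(len(points)), key=lambda i: math.dist(q, points[i])):
        if len(result) == k:
            break
        if not influenced(points, result, s, math.dist(q, points[s])):
            result.append(s)
    return result
```

Under these assumptions, distance browsing pops points in nondecreasing distance from q, and a branch is discarded only when the triangle inequality guarantees that every point in it would be rejected anyway, so `diversity_browse` should return the same list as `sequential_scan` while skipping whole subtrees. The result may hold fewer than k elements when every remaining candidate is influenced.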
M. Bedo: This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001 and by the Research Support Foundation of Rio de Janeiro State - G. E-26/010.101237/2018.
References
Aggarwal, C.: Data Mining: The Textbook. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-14142-8
Agrawal, R., Gollapudi, S., Halverson, A., Ieong, S.: Diversifying search results. In: ACM WSDM, pp. 5–14 (2009)
Chávez, E., Navarro, G., Baeza-Yates, R., Marroquín, J.: Searching in metric spaces. ACM CSUR 33(3), 273–321 (2001)
Chen, L., Gao, Y., Song, X., Li, Z., Miao, X., Jensen, C.: Indexing metric spaces for exact similarity search. arXiv preprint arXiv:2005.03468 (2020)
Chen, L., Gao, Y., Zheng, B., Jensen, C., Yang, H., Yang, K.: Pivot-based metric indexing. PVLDB 10(10), 1058–1069 (2017)
Costa, V., Santos, R., Macdonald, C., Ounis, I.: Sparse spatial selection for novelty-based search result diversification. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 344–355. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24583-1_34
Drosou, M., Jagadish, H., Pitoura, E., Stoyanovich, J.: Diversity in big data: a review. Big Data 5(2), 73–84 (2017)
Drosou, M., Pitoura, E.: Multiple radii disc diversity: result diversification based on dissimilarity and coverage. ACM TODS 40(1), 1–43 (2015)
Hetland, M.: The basic principles of metric indexing. In: Coello, C.A.C., Dehuri, S., Ghosh, S. (eds.) Swarm Intelligence for Multi-objective Problems in Data Mining. SCI, vol. 242, pp. 199–232. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03625-5_9
Hjaltason, G., Samet, H.: Index-driven similarity search in metric spaces. ACM TODS 28(4), 517–580 (2003)
Novak, D., Zezula, P.: PPP-codes for large-scale similarity searching. In: Hameurlain, A., Küng, J., Wagner, R., Decker, H., Lhotska, L., Link, S. (eds.) Transactions on Large-Scale Data- and Knowledge-Centered Systems. LNCS, vol. 9510, pp. 61–87. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49214-7_2
Padmanabhan, D., Deshpande, P.: Operators for Similarity Search - Semantics, Techniques and Usage Scenarios. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21257-9
Pestov, V.: Lower bounds on performance of metric tree indexing schemes for exact similarity search in high dimensions. Algorithmica 66(2), 310–328 (2013)
Pisinger, D.: Upper bounds and exact algorithms for p-dispersion problems. Comput. Oper. Res. 33(5), 1380–1398 (2006)
Santos, L., Blanco, G., Oliveira, D., Traina, A., Traina Jr., C., Bedo, M.: Exploring diversified similarity with Kundaha. In: ACM CIKM, pp. 1903–1906 (2018)
Santos, L., Oliveira, W., Ferreira, M., Traina, A., Traina Jr., C.: Parameter-free and domain-independent similarity search with diversity. In: SSDBM, pp. 1–12 (2013)
Traina Jr., C., Santos, R., Traina, A., Vieira, M., Faloutsos, C.: The Omni-family of all-purpose access methods: a simple and effective way to make similarity search more efficient. VLDB J. 16(4), 483–505 (2007)
Vieira, M., et al.: On query result diversification. In: IEEE ICDE, pp. 1163–1174. IEEE (2011)
Yianilos, P.: Data structures and algorithms for nearest neighbor search in general metric spaces. In: ACM-SIAM SODA, pp. 311–321. SIAM (1993)
Zheng, K., Wang, H., Qi, Z., Li, J., Gao, H.: A survey of query result diversification. Knowl. Inf. Syst. 51(1), 1–36 (2016). https://doi.org/10.1007/s10115-016-0990-4
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Jasbick, D., Santos, L., de Oliveira, D., Bedo, M. (2020). Some Branches May Bear Rotten Fruits: Diversity Browsing VP-Trees. In: Satoh, S., et al. Similarity Search and Applications. SISAP 2020. Lecture Notes in Computer Science(), vol 12440. Springer, Cham. https://doi.org/10.1007/978-3-030-60936-8_11
DOI: https://doi.org/10.1007/978-3-030-60936-8_11
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60935-1
Online ISBN: 978-3-030-60936-8
eBook Packages: Computer Science, Computer Science (R0)