Abstract
In metric search, worst-case analysis is of little value, as the search invariably degenerates to a linear scan for ill-behaved data. Consequently, much effort has been expended on more nuanced descriptions of what performance might in fact be attainable, including heuristic baselines like the AESA family, as well as statistical proxies such as intrinsic dimensionality. This paper gets to the heart of the matter with an exact characterization of the best performance actually achievable for any given data set and query. Specifically, linear-time, objective-preserving reductions are established in both directions between optimal metric search and the minimum dominating set problem, whose greedy approximation becomes the equivalent of an oracle-based AESA, repeatedly selecting the pivot that eliminates the most remaining points. As an illustration, the AESA heuristic is adapted to downplay the role of previously eliminated points, yielding modest performance improvements over both the original and its younger relative iAESA2.
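The greedy, oracle-based pivot selection described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names `elimination_graph` and `greedy_dominating_set` are hypothetical, the search is assumed to be a range search, and a pivot p is assumed to eliminate an object o exactly when the triangle-inequality lower bound |d(q, p) - d(p, o)| exceeds the search radius. The oracle is modeled by letting the code read all query distances up front.

```python
from typing import Callable, Dict, List, Sequence, Set, TypeVar

T = TypeVar("T")


def elimination_graph(
    points: Sequence[T],
    query: T,
    radius: float,
    dist: Callable[[T, T], float],
) -> Dict[int, Set[int]]:
    """Map each candidate pivot to its closed out-neighbourhood.

    Assumption for this sketch: pivot i resolves itself (its distance to
    the query is computed) and eliminates every j whose lower bound
    |d(q, p_i) - d(p_i, p_j)| is greater than the search radius.
    """
    # Oracle knowledge: all query distances are assumed to be available.
    dq = [dist(query, p) for p in points]
    graph: Dict[int, Set[int]] = {}
    for i, p in enumerate(points):
        covered = {i}
        for j, o in enumerate(points):
            if j != i and abs(dq[i] - dist(p, o)) > radius:
                covered.add(j)
        graph[i] = covered
    return graph


def greedy_dominating_set(graph: Dict[int, Set[int]]) -> List[int]:
    """Repeatedly pick the node that covers the most unresolved nodes.

    Every node covers itself, so each round resolves at least one node
    and the loop terminates. Ties go to the smallest node index.
    """
    remaining = set(graph)
    chosen: List[int] = []
    while remaining:
        best = max(graph, key=lambda v: (len(graph[v] & remaining), -v))
        chosen.append(best)
        remaining -= graph[best]
    return chosen
```

Under these assumptions, the length of the returned list corresponds to the number of distance computations the greedy oracle strategy would spend. For example, with points `[0, 1, 2, 10, 11, 12]` on the real line, absolute difference as the distance, query `1`, and radius `0.5`, the point at index 1 coincides with the query and its lower bounds rule out every other point, so a single pivot suffices.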
Notes
- 1.
- 2.
Personal communication, July 2012.
- 3.
Note that the reductions are to and from two different versions of the dominating set problem (the directed and undirected version, respectively). At the price of slightly looser bounds, one could stick with just one of these.
- 4.
This is the worst case given that the optimal number of distance computations is some value \(\gamma\), not the more general, uninformative worst case of \(\Omega(n)\).
- 5.
Optimal kNN with upper bounds does not map as cleanly to dominating sets.
- 6.
The undirected version is most commonly discussed, with a reduction, e.g., from set covering [13, Th. A.1]. A similar reduction to the directed version is straightforward.
- 7.
In terms of vertices, not edges.
- 8.
Note that only the lower bound is relevant, as the upper bound is always greater than the search radius.
- 9.
If the new distance is allowed to use the original graph as part of its definition, the reduction can be performed in constant time—it is merely a reinterpretation.
- 10.
The upper bound is easily shown by reinterpreting the minimum dominating set problem for a directed graph G = (V, E) as the problem of covering V with the closed out-neighborhoods of G, translating the standard set covering approximation [26].
- 11.
That is, for any range search instance, there is a directed graph with the objects as its nodes for which the equivalence holds. Reducing in the other direction preserves the objective value, but not necessarily the number of nodes/objects.
- 12.
Chávez et al. say that such independence is a “reasonable approximation” [5].
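The covering reinterpretation mentioned in note 10 can be written out explicitly. The following is a sketch of the standard argument, not a quotation of the paper's statement: the symbols \(N^{+}[v]\), \(\Delta^{+}\) (maximum out-degree), \(H_k\) (the k-th harmonic number) and \(D_{\text{greedy}}\) are notation assumed here, \(\gamma\) is taken to be the optimal objective value as in note 4, and the exact form of the bound in the paper may differ.

```latex
\begin{align*}
N^{+}[v] &= \{v\} \cup \{\, u \in V : (v,u) \in E \,\}, \\
D \subseteq V \text{ dominates } G &\iff \bigcup_{v \in D} N^{+}[v] = V, \\
|D_{\text{greedy}}| &\le H_{\Delta^{+}+1}\,\gamma
  \le \bigl(1 + \ln(\Delta^{+}+1)\bigr)\,\gamma .
\end{align*}
```

The last line uses the standard greedy set covering guarantee of \(H_k\) for sets of size at most k, with \(k = \Delta^{+}+1\) because a closed out-neighborhood contains the vertex itself plus at most \(\Delta^{+}\) out-neighbors.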
References
Backurs, A., Indyk, P.: Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). In: Proceedings of the 47th Annual ACM Symposium on Theory of Computing (2015). https://doi.org/10.1145/2746539.2746612
Beecks, C., Uysal, M.S., Seidl, T.: Signature quadratic form distance. In: Proceedings of the ACM International Conference on Image and Video Retrieval. ACM, New York, NY, USA (2010). https://doi.org/10.1145/1816041.1816105
Boyar, J., Eidenbenz, S.J., Favrholdt, L.M., Kotrbčík, M., Larsen, K.S.: Online dominating set. Algorithmica 81(5), 1938–1964 (2018). https://doi.org/10.1007/s00453-018-0519-1
Bustos, B., Navarro, G., Chávez, E.: Pivot selection techniques for proximity searching in metric spaces. Pattern Recogn. Lett. 24(14), 2357–2366 (2003). https://doi.org/10.1016/S0167-8655(03)00065-5
Chávez, E., Navarro, G., Baeza-Yates, R., Marroquín, J.L.: Searching in metric spaces. ACM Comput. Surv. 33(3), 273–321 (2001). https://doi.org/10.1145/502807.502808
Chlebík, M., Chlebíková, J.: Approximation hardness of dominating set problems. In: Albers, S., Radzik, T. (eds.) ESA 2004. LNCS, vol. 3221, pp. 192–203. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30140-0_19
Das, A.: Partial domination in graphs. Iran. J. Sci. Technol. Trans. A Sci. 43(4), 1713–1718 (2018). https://doi.org/10.1007/s40995-018-0618-5
Edsberg, O., Hetland, M.L.: Indexing inexact proximity search with distance regression in pivot space. In: Proceedings of the 3rd International Conference on Similarity Search and Applications (2010). https://doi.org/10.1145/1862344.1862353
Figueroa, K., Navarro, G., Chávez, E.: Metric spaces library (2007). http://www.sisap.org/Metric_Space_Library.html
Figueroa, K., Chávez, E., Navarro, G., Paredes, R.: Speeding up spatial approximation search in metric spaces. J. Exp. Algorithmics 14, 3–6 (2010). https://doi.org/10.1145/1498698.1564506
Ford Jr., L.R., Johnson, S.M.: A tournament problem. Am. Math. Mon. 66(5), 37–40 (1959). https://doi.org/10.1080/00029890.1959.11989306
Gurobi Optimization, LLC.: Gurobi optimizer reference manual (2020). http://gurobi.com
Kann, V.: On the Approximability of NP-complete Optimization Problems. Ph.D. thesis, Department of Numerical Analysis and Computing Science, Royal Institute of Technology, Stockholm (1992)
Lee, C.: Domination in digraphs. J. Korean Math. Soc. 35(4), 843–853 (1998)
Lü, L., Zhou, T.: Link prediction in complex networks: a survey. Physica A Stat. Mech. Appl. 390(6), 1150–1170 (2011). https://doi.org/10.1016/j.physa.2010.11.027
Mao, R., Liu, X., Tang, H., Luo, Q., Chen, J., Wu, W.: Multivariate regression for pivot selection: a preliminary study. In: 2011 3rd Symposium on Web Society. IEEE (2011). https://doi.org/10.1109/SWS.2011.6101281
Murakami, T., Takahashi, K., Serita, S., Fujii, Y.: Probabilistic enhancement of approximate indexing in metric spaces. Inf. Syst. 38(7), 1007–1018 (2013). https://doi.org/10.1016/j.is.2012.05.012
Naidan, B., Hetland, M.L.: Shrinking data balls in metric indexes. In: DBKDA (2013)
Navarro, G.: Analyzing metric space indexes: what for? In: Proceedings of the 2009 2nd International Workshop on Similarity Search and Applications, SISAP 2009. IEEE Computer Society (2009). https://doi.org/10.1109/SISAP.2009.17
Pestov, V.: Lower bounds on performance of metric tree indexing schemes for exact similarity search in high dimensions. Algorithmica (2013). https://doi.org/10.1007/s00453-012-9638-2
Skopal, T.: Unified framework for exact and approximate search in dissimilarity spaces. ACM Trans. Database Syst. (TODS) 32(4), 1–45 (2007). https://doi.org/10.1145/1292609.1292619
Socorro, R., Micó, L., Oncina, J.: A fast pivot-based indexing algorithm for metric spaces. Pattern Recogn. Lett. 32(11), 1511–1516 (2011). https://doi.org/10.1016/j.patrec.2011.04.016
Telelis, O.A., Zissimopoulos, V.: Absolute \(o(\log m)\) error in approximating random set covering: an average case analysis. Inf. Process. Lett. 94(4), 171–177 (2005). https://doi.org/10.1016/j.ipl.2005.02.009
Traina Jr., C.: Distance exponent: a new concept for selectivity estimation in metric trees. In: Proceedings of the 16th International Conference on Data Engineering (2000). https://doi.org/10.1109/ICDE.2000.839409
Vidal Ruiz, E.: An algorithm for finding nearest neighbours in (approximately) constant average time. Pattern Recogn. Lett. 4(3), 145–157 (1986). https://doi.org/10.1016/0167-8655(86)90013-9
Williamson, D.P., Shmoys, D.B.: The Design of Approximation Algorithms. Cambridge University Press, Cambridge (2011)
Acknowledgements
The author would like to thank Ole Edsberg, both for discussions providing the initial idea for this paper, and for substantial later input. He would also like to thank Jon Marius Venstad and Bilegsaikhan Naidan for reading early drafts of the paper and providing feedback.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Hetland, M.L. (2020). Optimal Metric Search Is Equivalent to the Minimum Dominating Set Problem. In: Satoh, S., et al. Similarity Search and Applications. SISAP 2020. Lecture Notes in Computer Science(), vol 12440. Springer, Cham. https://doi.org/10.1007/978-3-030-60936-8_9
DOI: https://doi.org/10.1007/978-3-030-60936-8_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60935-1
Online ISBN: 978-3-030-60936-8
eBook Packages: Computer Science, Computer Science (R0)