Abstract
Metric data plays an important role in various settings, for example, in metric-based indexing, clustering, classification, and approximation algorithms in general. Due to measurement error, noise, or an inability to completely gather all the data, a collection of distances may not satisfy the basic metric requirements, most notably the triangle inequality. In this paper we initiate the study of the metric violation distance problem: given a set of pairwise distances, modify the minimum number of distances such that the resulting set forms a metric. Three variants of the problem are considered, based on whether distances are allowed to only decrease, only increase, or the general case which allows both decreases and increases. We show that while the decrease only variant is polynomial time solvable, the increase only and general variants are NP-Complete, and moreover cannot in polynomial time be approximated to any ratio better than the minimum vertex cover problem. We then provide approximation algorithms for the increase only and general variants of the problem, by proving interesting necessary and sufficient conditions on the optimal solution, which are used to approximately reduce to a purely combinatorial problem for which we provide matching asymptotic upper and lower bounds.
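To make the problem statement concrete, the following minimal Python sketch lists the triangle inequality violations in a small distance matrix and repairs them in the decrease-only setting. It assumes a symmetric matrix with a zero diagonal, and it assumes that the decrease-only variant can be handled by replacing each distance with the corresponding shortest-path distance; the abstract only states that this variant is polynomial time solvable, so that choice of method, like the names `triangle_violations` and `decrease_only_repair`, is an illustrative assumption.

```python
from itertools import combinations

def triangle_violations(d):
    """Return triples (i, j, k) with d[i][j] > d[i][k] + d[k][j]."""
    n = len(d)
    return [(i, j, k)
            for i, j in combinations(range(n), 2)
            for k in range(n)
            if k != i and k != j and d[i][j] > d[i][k] + d[k][j]]

def decrease_only_repair(d):
    """Assumed approach: Floyd-Warshall, lowering each entry to its shortest-path distance."""
    n = len(d)
    r = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if r[i][k] + r[k][j] < r[i][j]:
                    r[i][j] = r[i][k] + r[k][j]
    # Pairs whose distance had to be decreased.
    changed = [(i, j) for i, j in combinations(range(n), 2) if r[i][j] != d[i][j]]
    return r, changed

# Hypothetical input: the pair (0, 2) violates the triangle inequality via vertex 1.
d = [[0, 1, 5],
     [1, 0, 2],
     [5, 2, 0]]
repaired, changed = decrease_only_repair(d)
```

In this toy instance a single distance is modified, and the repaired matrix has no remaining violations.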
Acknowledgements
The authors thank Sariel Har-Peled for helping us understand the nature of the combinatorial problem arising from our chording procedure in Sect. 4.3. The authors also thank Hsien-Chih Chang, K. Alex Mills, and Amir Nayyeri for helpful discussions. Finally, the authors thank the reviewers for their valuable comments.
A preliminary version of this paper appeared in the Symposium on Discrete Algorithms (SODA), 2018.
Work on this paper was partially supported by NSF CRII Award 1566137 and CAREER Award 1750780.
A Matching Lower Bound
The following is a matching lower bound to the combinatorial problem from Lemma 4.7.
Lemma A.1
Let \(G=(V,E)\) be a graph whose edge set E is the union of the edges of a collection of 4-cycles, \({\mathbb {C}}\), such that no two 4-cycles in \({\mathbb {C}}\) can share a chord. (Note this applies to all chords in the complete graph on V, i.e. regardless of whether they appear in E.) Then in the worst case \(|{\mathbb {C}}| = \Omega (m^{4/3})\), where \(m=|E|\).
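The chord condition in the lemma can be checked mechanically. The sketch below is illustrative only: it assumes each 4-cycle is given as a tuple of four vertices in cyclic order and that the cycles in the collection are distinct; the function names are hypothetical.

```python
from collections import Counter

def chords(cycle):
    """Return the two chords of a 4-cycle given as (a, b, c, d) in cyclic order."""
    a, b, c, d = cycle
    return frozenset((a, c)), frozenset((b, d))

def is_chord_disjoint(cycles):
    """True if no chord is shared by two distinct cycles in the collection."""
    counts = Counter(ch for cyc in cycles for ch in chords(cyc))
    return all(v == 1 for v in counts.values())

# Hypothetical collections: the second pair shares the chord {0, 2}.
cycles_ok = [(0, 1, 2, 3), (0, 4, 5, 6)]
cycles_bad = [(0, 1, 2, 3), (0, 4, 2, 5)]
```

Note that the check looks only at vertex pairs, so it applies to every chord in the complete graph on V, whether or not the chord is an edge of E.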
Proof
Construct a graph \(G=(V,E)\), where V is the disjoint union of four sets of vertices \(X_1\), \(X_2\), \(X_3\), and \(X_4\), each containing exactly t vertices, where t is a value to be determined shortly. The edge set E is sampled as follows. For \(1\le i\le 4\), for each pair \((u,v) \in X_i \times X_{i+1}\) (where \(X_5 = X_1\)), the edge (u, v) is sampled into E independently with probability
\[ p = \frac{m}{8t^2}. \]
Let \({\mathbb {C}}\) be the set of 4-cycles defined by E. Any cycle \(C\in {\mathbb {C}}\) must contain exactly one vertex from each of \(X_1\), \(X_2\), \(X_3\), and \(X_4\). The probability that any quadruple of vertices \((i_1, i_2, i_3,i_4)\in X_1\times X_2\times X_3\times X_4\) defines a cycle in \({\mathbb {C}}\) is \(p^4\). As such, the expected size of \({\mathbb {C}}\) is
\[ \alpha = t^4 p^4. \]
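The sampling step and the expected cycle count can be explored empirically. The following is a minimal simulation sketch, not part of the proof: vertices are encoded as hypothetical pairs `(part, index)`, the functions `sample_edges` and `four_cycles` are illustrative names, and only quadruples with one vertex per part are enumerated, as in the argument above.

```python
import itertools
import random

def sample_edges(t, p, seed=0):
    """Sample each pair between consecutive parts (cyclically) independently with probability p."""
    rng = random.Random(seed)
    edges = set()
    for i in range(4):
        j = (i + 1) % 4
        for u in range(t):
            for v in range(t):
                if rng.random() < p:
                    edges.add(((i, u), (j, v)))
    return edges

def four_cycles(t, edges):
    """List quadruples (i1, i2, i3, i4), one index per part, whose four edges are all present."""
    out = []
    for q in itertools.product(range(t), repeat=4):
        if all(((i, q[i]), ((i + 1) % 4, q[(i + 1) % 4])) in edges for i in range(4)):
            out.append(q)
    return out

# Compare the empirical mean number of cycles with t**4 * p**4 for small, arbitrary t and p.
t, p = 5, 0.5
counts = [len(four_cycles(t, sample_edges(t, p, seed=s))) for s in range(200)]
print(sum(counts) / len(counts), (t * p) ** 4)
```

The brute-force enumeration takes time proportional to \(t^4\), so it is only practical for small t; it is meant as a sanity check on the expectation rather than as an efficient procedure.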
Consider such a cycle \(C = (i_1,i_2,i_3,i_4)\) that we know exists in the graph. Any cycle which shares the chord \(\{i_1,i_3\}\) with C clearly shares the vertices \(i_1\) and \(i_3\). Now such a cycle either shares a third vertex or not. The expected number of cycles which share the chord \(\{i_1,i_3\}\) and no other vertex is at most \(t^2p^4\). The expected number of cycles which share the chord \(\{i_1,i_3\}\) and one other vertex is at most \(2tp^2\). Let \(X_C\) be a random variable denoting the number of cycles sharing either chord (i.e., \(\{i_1,i_3\}\) or \(\{i_2,i_4\}\)) with C. Assuming \(tp^2 \le 1\), so that \(t^2p^4 = (tp^2)^2 \le tp^2\), we have
\[ \mathop {\mathbf {E}}\!\left[ {X_C} \right] \le 2\left( t^2p^4 + 2tp^2\right) \le 6tp^2. \]
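Assume further that \(6 t p^2\le 1/10\); then by Markov’s inequality we have
\[ \mathop {\mathbf {Pr}}\!\left[ {X_C \ge 1} \right] \le \mathop {\mathbf {E}}\!\left[ {X_C} \right] \le 6tp^2 \le \frac{1}{10}. \]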
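Let Y be a random variable denoting the number of cycles that exist in the graph and do not share a chord with any other cycle that exists in the graph. We have that
\[ \mathop {\mathbf {E}}\!\left[ {Y} \right] = \sum _{C} \mathop {\mathbf {Pr}}\!\left[ {C \text { exists}} \right] \cdot \mathop {\mathbf {Pr}}\!\left[ {X_C = 0 \mid C \text { exists}} \right] \ge \frac{9}{10}\, t^4p^4 = \frac{9}{10}\,\alpha , \]
where the sum ranges over all quadruples \(C \in X_1\times X_2\times X_3\times X_4\).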
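For a given list of cycles, the quantity Y can be computed directly. The sketch below assumes each cycle is a distinct quadruple \((i_1,i_2,i_3,i_4)\) with one index per part, so that the chord \(\{i_1,i_3\}\) is identified by the pair \((i_1,i_3)\) and the chord \(\{i_2,i_4\}\) by \((i_2,i_4)\); the function name is hypothetical.

```python
from collections import Counter

def isolated_cycles(cycles):
    """Keep quadruples whose chords {i1, i3} and {i2, i4} are used by no other cycle."""
    c13 = Counter((q[0], q[2]) for q in cycles)
    c24 = Counter((q[1], q[3]) for q in cycles)
    return [q for q in cycles
            if c13[(q[0], q[2])] == 1 and c24[(q[1], q[3])] == 1]

# Hypothetical input: the first two cycles share the chord between X_1 and X_3.
cycles = [(0, 0, 0, 0), (0, 1, 0, 2), (1, 3, 2, 3)]
survivors = isolated_cycles(cycles)  # Y is len(survivors)
```

Because a chord between \(X_1\) and \(X_3\) can never coincide with a chord between \(X_2\) and \(X_4\), the two chord types are counted separately.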
Note that as \(\alpha \) was the expected number of cycles overall, this implies \(\delta = \mathop {\mathbf {E}}\!\left[ {Y} \right] = \Theta (\alpha )\).
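Recall that we assumed \(6tp^2 \le 1/10\), which, plugging in \(p = m/(8t^2)\), becomes
\[ \frac{3m^2}{32\,t^3} \le \frac{1}{10}, \quad \text {that is,} \quad t^3 \ge \frac{15}{16}\, m^2. \]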
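Thus setting \(t= m^{2/3}\) (which up to constants maximizes \(\alpha \) subject to the above constraint) implies the expected number of cycles that do not share a chord is
\[ \delta = \Theta (\alpha ) = \Theta \left( t^4p^4\right) = \Theta \left( \frac{m^4}{t^4}\right) = \Theta \left( m^{4/3}\right) . \]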
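The parameter choice can be verified with exact arithmetic. The sketch below assumes \(p = m/(8t^2)\), the value forced by the expected edge count \(4t^2p = m/2\) used in the proof, and restricts to \(m = k^3\) so that \(t = m^{2/3} = k^2\) is an integer; the function name and the choice \(k = 10\) are illustrative.

```python
from fractions import Fraction

def parameters(k):
    """For m = k**3 and t = m**(2/3) = k**2, return (m, t, p, 6*t*p**2, alpha) exactly."""
    m = k ** 3
    t = k ** 2
    p = Fraction(m, 8 * t * t)   # assumed from 4 * t**2 * p = m / 2
    slack = 6 * t * p * p        # the proof needs this to be at most 1/10
    alpha = (t * p) ** 4         # expected number of 4-cycles, t**4 * p**4
    return m, t, p, slack, alpha

m, t, p, slack, alpha = parameters(10)
```

Under these assumptions \(6tp^2 = 3/32\) regardless of k, which is below \(1/10\), and \(\alpha = k^4/4096 = m^{4/3}/4096\).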
On the other hand, the expected number of edges is \(4t^2 p = m/2\), and moreover by the Chernoff bound with high probability is at most m. Thus by the probabilistic method there exists a graph where \(|E| \le m\) and the number of 4-cycles which don’t share a chord is \(\Omega (m^{4/3})\). (Note to match the lemma statement, in the above construction one should only keep edges which were in cycles that did not share a chord with any other cycle.) \(\square \)
Cite this article
Fan, C., Raichel, B. & Buskirk, G.V. Metric Violation Distance: Hardness and Approximation. Algorithmica 84, 1441–1465 (2022). https://doi.org/10.1007/s00453-022-00940-0
DOI: https://doi.org/10.1007/s00453-022-00940-0