Metric Violation Distance: Hardness and Approximation

Algorithmica

Abstract

Metric data plays an important role in various settings, for example, in metric-based indexing, clustering, classification, and approximation algorithms in general. Due to measurement error, noise, or an inability to completely gather all the data, a collection of distances may not satisfy the basic metric requirements, most notably the triangle inequality. In this paper we initiate the study of the metric violation distance problem: given a set of pairwise distances, modify the minimum number of distances such that the resulting set forms a metric. Three variants of the problem are considered, based on whether distances are allowed to only decrease, only increase, or the general case which allows both decreases and increases. We show that while the decrease only variant is polynomial time solvable, the increase only and general variants are NP-Complete, and moreover cannot in polynomial time be approximated to any ratio better than the minimum vertex cover problem. We then provide approximation algorithms for the increase only and general variants of the problem, by proving interesting necessary and sufficient conditions on the optimal solution, which are used to approximately reduce to a purely combinatorial problem for which we provide matching asymptotic upper and lower bounds.
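To make the problem statement concrete, the following is a minimal sketch (not the authors' algorithm) that lists the triples of points whose distances break the triangle inequality. The function name `violated_triangles` and the example matrix are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations

def violated_triangles(d):
    """Return all triples (i, j, k), i < j < k, that break the triangle inequality.

    `d` is assumed to be a symmetric matrix (list of lists) of non-negative
    distances with a zero diagonal. Hypothetical helper for illustration only.
    """
    bad = []
    for i, j, k in combinations(range(len(d)), 3):
        a, b, c = d[i][j], d[j][k], d[i][k]
        # A triple is consistent iff its largest distance is at most
        # the sum of the other two, i.e. 2 * max <= a + b + c.
        if 2 * max(a, b, c) > a + b + c:
            bad.append((i, j, k))
    return bad

# Illustrative input: only the distance between points 0 and 2 is too large.
d = [
    [0, 1, 5, 2],
    [1, 0, 1, 2],
    [5, 1, 0, 2],
    [2, 2, 2, 0],
]
print(violated_triangles(d))
```

In this example both violated triples contain the pair (0, 2), so decreasing that single distance to 2 yields a metric. Finding a minimum set of such modifications in general is the problem studied in the paper.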

Acknowledgements

The authors thank Sariel Har-Peled for helping us understand the nature of the combinatorial problem arising from our chording procedure in Sect. 4.3. The authors also thank Hsien-Chih Chang, K. Alex Mills, and Amir Nayyeri for helpful discussions. Finally, the authors thank the reviewers for their valuable comments.

Author information

Corresponding author

Correspondence to Benjamin Raichel.

Additional information

A preliminary version of this paper appeared in the Symposium on Discrete Algorithms (SODA), 2018.

Work on this paper was partially supported by NSF CRII Award 1566137 and CAREER Award 1750780.

A Matching Lower Bound

The following gives a matching lower bound for the combinatorial problem from Lemma 4.7.

Lemma A.1

Let \(G=(V,E)\) be a graph whose edge set E is the union of the edges of a collection of 4-cycles, \({\mathbb {C}}\), such that no two 4-cycles in \({\mathbb {C}}\) can share a chord. (Note this applies to all chords in the complete graph on V, i.e. regardless of whether they appear in E.) Then in the worst case \(|{\mathbb {C}}| = \Omega (m^{4/3})\), where \(m=|E|\).

Proof

Construct a graph \(G=(V,E)\), where V is the disjoint union of four sets of vertices \(X_1\), \(X_2\), \(X_3\), and \(X_4\), each containing exactly t vertices, where t is a value to be determined shortly. The edge set E is sampled as follows. For \(1\le i\le 4\), for each pair \((u,v) \in X_i \times X_{i+1}\) (where \(X_5 = X_1\)), the edge \((u,v)\) is sampled into E independently with probability

$$\begin{aligned} p = \frac{m}{8t^2} \ll 1. \end{aligned}$$

Let \({\mathbb {C}}\) be the set of 4-cycles defined by E. Any cycle \(C\in {\mathbb {C}}\) must contain exactly one vertex from each of \(X_1\), \(X_2\), \(X_3\), and \(X_4\). The probability that any quadruple of vertices \((i_1, i_2, i_3,i_4)\in X_1\times X_2\times X_3\times X_4\) defines a cycle in \({\mathbb {C}}\) is \(p^4\). As such, the expected size of \({\mathbb {C}}\) is

$$\begin{aligned} \alpha = p^4t^4 = \!\left( {\frac{m}{8t^2}}\right) ^4t^4 = \!\left( {\frac{m}{8t}}\right) ^4. \end{aligned}$$

Consider such a cycle \(C = (i_1,i_2,i_3,i_4)\) that we know exists in the graph. Any cycle which shares the chord \(\{i_1,i_3\}\) with C clearly shares the vertices \(i_1\) and \(i_3\). Now such a cycle either shares a third vertex or not. The expected number of cycles which share the chord \(\{i_1,i_3\}\) and no other vertex is at most \(t^2p^4\). The expected number of cycles which share the chord \(\{i_1,i_3\}\) and one other vertex is at most \(2tp^2\). Let \(X_C\) be a random variable denoting the number of cycles sharing either chord (i.e., \(\{i_1,i_3\}\) or \(\{i_2,i_4\}\)) with C. Assuming \(tp^2 \le 1\), we have

$$\begin{aligned} \mathop {\mathbf {E}}\!\left[ {X_C \mid C \text { exists}} \right] \le 2(2tp^2 + t^2 p^4) = 2(2 + t p^2)tp^2 \le 6tp^2. \end{aligned}$$

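Assume further that \(6 t p^2\le 1/10\). Then by Markov’s inequality, \(\mathop {\mathbf {Pr}}\!\left[ {X_C \ge 1 \mid C \text { exists}} \right] \le \mathop {\mathbf {E}}\!\left[ {X_C \mid C \text { exists}} \right] \le 1/10\), and hence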

$$\begin{aligned} \beta (C)&= \mathop {\mathbf {Pr}}\!\left[ { \text {no cycle shares a chord with } C \mid C \text { exists}} \right] \\&= 1 - \mathop {\mathbf {Pr}}\!\left[ {X_C \ge 1 \mid C \text { exists}} \right] \ge \frac{9}{10}. \end{aligned}$$

Let Y be a random variable denoting the number of cycles that exist in the graph and do not share a chord with any other cycle that exists in the graph. We have that

$$\begin{aligned} \delta&= \mathop {\mathbf {E}}\!\left[ {Y} \right] \\&=\sum _{C} \mathop {\mathbf {Pr}}\!\left[ { (\text {no cycle shares chord with } C) \cap \!\left( { C \text { exists}}\right) } \right] \\&=\sum _{C} \beta (C) \cdot \mathop {\mathbf {Pr}}\!\left[ { C \text { exists}} \right] \ge \frac{9}{10} \alpha . \end{aligned}$$

Note that as \(\alpha \) was the expected number of cycles overall, this implies \(\delta = \mathop {\mathbf {E}}\!\left[ {Y} \right] = \Theta (\alpha )\).

Recall that we assumed \(6tp^2 \le 1/10\), which, after substituting for p, becomes

$$\begin{aligned} \frac{1}{10}\ge 6t \!\left( {\frac{m}{8t^2}}\right) ^2 = \frac{3}{32} \frac{m^2}{t^3} \quad \implies \quad t \ge (30/32)^{1/3} m^{2/3}. \end{aligned}$$

Thus setting \(t= m^{2/3}\) (which, up to constants, maximizes \(\alpha \) subject to this constraint, since \(\alpha \) decreases as t grows) implies the expected number of cycles that do not share a chord is

$$\begin{aligned} \delta = \Theta (p^4t^4)&= \Theta \!\left( { \!\left( {\frac{m}{t^2}}\right) ^4 t^4 }\right) = \Theta \!\left( { \frac{m^4}{t^4} }\right) \\&= \Theta \!\left( { \frac{m^4}{m^{8/3}} }\right) = \Theta \!\left( { m^{4/3}}\right) . \end{aligned}$$
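As a sanity check of this parameter choice, the sketch below evaluates the quantities from the proof exactly for \(m = k^3\) and \(t = m^{2/3} = k^2\), so that t is an integer. The function name `check_parameters` and the restriction to perfect cubes are assumptions made only for this illustration.

```python
from fractions import Fraction

def check_parameters(k):
    """Exact quantities from the proof for m = k**3 and t = m**(2/3) = k**2."""
    m, t = k**3, k**2
    p = Fraction(m, 8 * t**2)        # sampling probability p = m / (8 t^2)
    tp2 = t * p**2                   # the quantity t p^2 bounded in the proof
    alpha = (p * t) ** 4             # expected number of 4-cycles, p^4 t^4
    expected_edges = 4 * t**2 * p    # expected number of edges, 4 t^2 p
    return p, tp2, alpha, expected_edges

for k in (10, 100):
    p, tp2, alpha, expected_edges = check_parameters(k)
    m = k**3
    assert 6 * tp2 <= Fraction(1, 10)        # assumption 6 t p^2 <= 1/10 holds
    assert expected_edges == Fraction(m, 2)  # matches 4 t^2 p = m / 2
    # The last column is alpha / m^(4/3), which should not depend on m.
    print(m, p, tp2, float(alpha), float(alpha) / m ** (4 / 3))
```

With this choice \(tp^2 = 1/64\) for every k, so both assumptions made in the proof hold, and \(\alpha = (m/(8t))^4 = m^{4/3}/8^4\).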

On the other hand, the expected number of edges is \(4t^2 p = m/2\), and by the Chernoff bound the number of edges is at most m with high probability. Thus by the probabilistic method there exists a graph where \(|E| \le m\) and the number of 4-cycles which do not share a chord is \(\Omega (m^{4/3})\). (Note that to match the lemma statement, in the above construction one should only keep edges which were in cycles that did not share a chord with any other cycle.) \(\square \)
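The random construction can be simulated on small instances. The following is a rough sketch under stated assumptions: vertices in each part are labelled 0 to t-1, only 4-cycles with one vertex in each of \(X_1,\ldots ,X_4\) are enumerated (as in the proof), and the function name `sample_chord_disjoint_cycles` and the example values of t and p are hypothetical.

```python
import random
from collections import Counter

def sample_chord_disjoint_cycles(t, p, seed=0):
    """Sample the 4-partite graph and return (num_edges, cycles, isolated).

    `cycles` lists tuples (a, b, c, d) with a in X_1, b in X_2, c in X_3,
    d in X_4 forming a 4-cycle; `isolated` keeps those cycles that share
    neither chord {a, c} nor chord {b, d} with any other listed cycle.
    """
    rng = random.Random(seed)
    # adj[i][u] = neighbours in X_{i+2} of vertex u in X_{i+1} (X_5 = X_1),
    # each potential edge kept independently with probability p.
    adj = [[{v for v in range(t) if rng.random() < p} for _ in range(t)]
           for _ in range(4)]
    num_edges = sum(len(nbrs) for part in adj for nbrs in part)

    cycles = []
    for a in range(t):
        for b in adj[0][a]:
            for c in adj[1][b]:
                for d in adj[2][c]:
                    if a in adj[3][d]:  # closing edge between X_4 and X_1
                        cycles.append((a, b, c, d))

    # Count how many cycles use each chord; chords are tagged by the pair of
    # parts they join so labels from different parts do not collide.
    chord_count = Counter()
    for a, b, c, d in cycles:
        chord_count[("13", a, c)] += 1
        chord_count[("24", b, d)] += 1

    isolated = [(a, b, c, d) for a, b, c, d in cycles
                if chord_count[("13", a, c)] == 1
                and chord_count[("24", b, d)] == 1]
    return num_edges, cycles, isolated

num_edges, cycles, isolated = sample_chord_disjoint_cycles(t=40, p=0.05)
print(num_edges, len(cycles), len(isolated))
```

The sketch only illustrates the counting argument empirically. It does not perform the final pruning step of discarding edges that lie on no chord-disjoint cycle.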

Cite this article

Fan, C., Raichel, B. & Buskirk, G.V. Metric Violation Distance: Hardness and Approximation. Algorithmica 84, 1441–1465 (2022). https://doi.org/10.1007/s00453-022-00940-0
