Abstract
The “relative earth mover distance” is a technical term introduced by Valiant and Valiant (43rd STOC, 2011), and extensively used in their work. They claimed that, for every two distributions, the relative earth mover distance upper-bounds the variation distance up to relabeling, but this claim was not used in their work. The claim appears as a special case of a result proved by Valiant and Valiant in a later work (48th STOC, 2016), but we found their proof too terse. The proof presented here is merely an elaboration of (this special case of) their proof.
Notes
- 1.
- 2.
Specifically, for two distributions presented by their probability functions \(p,q:D\!\rightarrow \![0,1]\), their variation distance equals \(0.5\cdot \sum _{i\in D}|p(i)-q(i)|\), which in turn equals \(\max _{S\subseteq D}\{p(S)-q(S)\}\), where \(p(S)=\sum _{i\in S}p(i)\) and \(q(S)\) is defined analogously. The set S may be viewed as the set of samples on which an observer (as discussed next) outputs the verdict 1.
- 3.
Here and in the sequel, the logarithm is to base 2. The proof of Theorem 3.1 as presented in Sect. 3 remains valid for any base \(b\in (1,e]\); our only reference to this base is that it (i.e., b) should satisfy \(\log _bz>1-(1/z)\) for every \(z>1\). It seems that Valiant and Valiant do mean to take \(b=2\) (although other parts of their text suggest \(b=e\)). Indeed, both \(b=2\) and \(b=e\) seem natural choices.
- 4.
As stated in Footnote 3, we assume that the logarithm is to base \(b\in (1,e]\). Indeed, here we use \(\log _b z+(1/z)>1\) for all \(z>1\), and this is the only place in the proof in which the choice of b matters.
- 5.
That is, letting \(\pi _p\) and \(\pi _q\) be permutations over [n] such that \(p(\pi _p(j))\le p(\pi _p(j+1))\) and \(q(\pi _q(j))\le q(\pi _q(j+1))\) for every \(j\in [n-1]\), in the \(i^\mathrm{th}\) iteration we transport one unit from location \(p(\pi _p(i))\) of \(h_p\) to location \(q(\pi _q(i))\) of \(h_q\).
- 6.
To see that (1) holds, note that the cost of \(\ell '\) equals the cost of \(\ell \) plus \(c\cdot |x^*-y^*|-c\cdot |x'-y^*|-c\cdot |x^*-y'|+c\cdot |x'-y'|\). Hence, we need to verify that the added value is not positive; equivalently, that \(|x^*-y^*|-|x^*-y'|\le |x'-y^*|-|x'-y'|\). Consider the following cases:
  1. The diagonal line \(y=x\) does not cross the rectangle spanned by \((x^*,y^*)\) and \((x',y')\) (i.e., either \(y'\le x^*\) or \(y^*\ge x'\)). If \(y'\le x^*\), then \(|y^*-x^*|-|y'-x^*|=y'-y^*=|y^*-x'|-|y'-x'|\), and otherwise \(|y^*-x^*|-|y'-x^*|=-(y'-y^*)=|y^*-x'|-|y'-x'|\).
  2. The diagonal line \(y=x\) separates one corner-point of the rectangle from the other three corner-points (e.g., \(y'>x^*\) but \(y<x\) for \((x,y)\in \{(x^*,y^*),(x',y^*),(x',y')\}\)). In this example case (i.e., when \((x^*,y')\) is the separated corner-point), \(|y^*-x^*|-|y'-x^*| < y'-y^* = |y^*-x'|-|y'-x'|\), and similarly for the case that \((x',y^*)\) is separated.
  3. The diagonal line \(y=x\) crosses both horizontal lines of the rectangle (i.e., \(y^*,y'\in [x^*,x']\)). In this case, \(|y^*-x^*|-|y'-x^*|=-(y'-y^*)\) and \(|y^*-x'|-|y'-x'|=y'-y^*\), so the inequality holds because \(y^*\le y'\).
  4. The diagonal line \(y=x\) crosses both vertical lines of the rectangle (i.e., \(x^*,x'\in [y^*,y']\)). In this case \(|y^*-x^*|-|y'-x^*|<|y^*-x'|-|y'-x'|\), since \(|y^*-x^*|<|y^*-x'|\) and \(|y'-x^*|>|y'-x'|\).
To see that (2) holds, recall that \(\ell '(x,y)=\ell (x,y)=m(x,y)\) for every \((x,y)<(x^*,y^*)\).
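The inequality behind (1) in Note 6 can also be sanity-checked numerically. The following is a minimal sketch, not part of the original proof: the function names are illustrative, the cost factor c is fixed to 1, and the ordering \(x^*<x'\), \(y^*<y'\) is an assumption read off from the case analysis in that note. It samples random integer points and reports the largest added cost it encounters, which should never be positive.

```python
import random


def added_cost(x_star, x_prime, y_star, y_prime):
    """Per-unit change in cost (c = 1) when the pairs (x', y*) and (x*, y')
    are replaced by the pairs (x*, y*) and (x', y')."""
    return (abs(x_star - y_star) - abs(x_prime - y_star)
            - abs(x_star - y_prime) + abs(x_prime - y_prime))


def largest_added_cost(trials=10_000, seed=0):
    """Largest added cost seen over random integer points.

    Assumption (taken from the case analysis): x* < x' and y* < y'.
    """
    rng = random.Random(seed)
    worst = None
    for _ in range(trials):
        # rng.sample returns two distinct values, so sorting gives a strict order.
        x_star, x_prime = sorted(rng.sample(range(100), 2))
        y_star, y_prime = sorted(rng.sample(range(100), 2))
        cost = added_cost(x_star, x_prime, y_star, y_prime)
        worst = cost if worst is None else max(worst, cost)
    return worst
```

Calling `largest_added_cost()` and checking that the result is at most 0 is a spot check only; the four cases in the note are what establish the inequality for all points.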
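The two expressions for the variation distance discussed in Note 2 can be compared by brute force on a small domain. The sketch below is illustrative only: the function names and the example distributions are assumptions, not taken from the source. It computes half the \(L_1\) distance and the largest value of \(p(S)-q(S)\) over all subsets \(S\subseteq D\), using exact rational arithmetic so that the comparison is not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations


def half_l1(p, q):
    """Half the L1 distance between two probability functions (dicts on the same domain)."""
    return Fraction(1, 2) * sum(abs(p[i] - q[i]) for i in p)


def largest_gap(p, q):
    """Largest value of p(S) - q(S) over all subsets S of the domain (brute force)."""
    domain = list(p)
    return max(
        sum((p[i] - q[i] for i in subset), Fraction(0))
        for r in range(len(domain) + 1)
        for subset in combinations(domain, r)
    )


# Hypothetical example distributions over D = {"a", "b", "c"}.
p = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
q = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
```

For these example distributions both `half_l1(p, q)` and `largest_gap(p, q)` evaluate to 1/4, attained for instance by the subset containing only "a". The enumeration is exponential in the domain size, so it is suitable only for small illustrations.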
References
Goldreich, O.: Introduction to Property Testing. Cambridge University Press, Cambridge (2017)
Goldreich, O., Ron, D.: On sample-based testers. In: 6th Innovations in Theoretical Computer Science, pp. 337–345 (2015)
Valiant, G., Valiant, P.: Estimating the unseen: an \(n/\log (n)\)-sample estimator for entropy and support size, shown optimal via new CLTs. In: 43rd ACM Symposium on the Theory of Computing, pp. 685–694 (2011). See ECCC TR10-180 for the algorithm, and TR10-179 for the lower bound
Valiant, G., Valiant, P.: Instance optimal learning. CoRR abs/1504.05321 (2015)
Valiant, G., Valiant, P.: Instance optimal learning of discrete distributions. In: 48th ACM Symposium on the Theory of Computing, pp. 142–155 (2016)
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Goldreich, O., Ron, D. (2020). On the Relation Between the Relative Earth Mover Distance and the Variation Distance (an Exposition). In: Goldreich, O. (eds) Computational Complexity and Property Testing. Lecture Notes in Computer Science, vol 12050. Springer, Cham. https://doi.org/10.1007/978-3-030-43662-9_9
DOI: https://doi.org/10.1007/978-3-030-43662-9_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-43661-2
Online ISBN: 978-3-030-43662-9
eBook Packages: Computer Science, Computer Science (R0)