Abstract
We address the problem of finding the maximum of the mutual information \(I(X;Y)\) of two finite-valued random variables \(X\) and \(Y\) given only the value of their coupling, i.e., the probability \(\Pr\{X=Y\}\). We obtain explicit lower and upper bounds on this maximum, which in some cases are optimal.
References
Erokhin, V., \(\varepsilon\)-Entropy of a Discrete Random Variable, Teor. Veroyatnost. i Primenen., 1958, vol. 3, no. 1, pp. 103–107 [Theory Probab. Appl. (Engl. Transl.), 1958, vol. 3, no. 1, pp. 97–100]. https://doi.org/10.1137/1103008
Berger, T., Rate Distortion Theory: A Mathematical Basis for Data Compression, Englewood Cliffs, NJ: Prentice-Hall, 1971.
Ho, S.-W. and Verdú, S., On the Interplay between Conditional Entropy and Error Probability, IEEE Trans. Inform. Theory, 2010, vol. 56, no. 12, pp. 5930–5942. https://doi.org/10.1109/TIT.2010.2080891
Pinsker, M.S., On Estimation of Information via Variation, Probl. Peredachi Inf., 2005, vol. 41, no. 2, pp. 3–8 [Probl. Inf. Transm. (Engl. Transl.), 2005, vol. 41, no. 2, pp. 71–75]. https://doi.org/10.1007/s11122-005-0012-8
Zhang, Z., Estimating Mutual Information via Kolmogorov Distance, IEEE Trans. Inform. Theory, 2007, vol. 53, no. 9, pp. 3280–3282. https://doi.org/10.1109/TIT.2007.903122
Prelov, V.V., Generalization of a Pinsker Problem, Probl. Peredachi Inf., 2011, vol. 47, no. 2, pp. 17–37 [Probl. Inf. Transm. (Engl. Transl.), 2011, vol. 47, no. 2, pp. 98–116]. https://doi.org/10.1134/S0032946011020037
Prelov, V.V., On Some Extremal Problems for Mutual Information and Entropy, Probl. Peredachi Inf., 2016, vol. 52, no. 4, pp. 3–13 [Probl. Inf. Transm. (Engl. Transl.), 2016, vol. 52, no. 4, pp. 319–328]. https://doi.org/10.1134/S0032946016040013
Acknowledgments
In conclusion, the author expresses his sincere gratitude to the reviewer for pointing out a number of inaccuracies in the paper, whose correction has improved its quality.
Additional information
Translated from Problemy Peredachi Informatsii, 2022, Vol. 58, No. 3, pp. 18–32. https://doi.org/10.31857/S0555292322030020
In memoriam Prof. E.M. Gabidulin
Appendix
Proof of Proposition 1.
The claim of this proposition is almost obvious. Indeed, the information \(I(X;Y)\) takes its maximum value \(\ln n\) if and only if \(X\) has a uniform distribution and the conditional entropy \(H(X\,|\,Y)\) vanishes, which means that \(X\) should be a deterministic function of \(Y\). Therefore, \(I_{\max}(\alpha)=\ln n\) if and only if \(n\) numbers \(\frac{1}{n}\) can be placed in an \(n\times n\) matrix of the joint distribution of \(X\) and \(Y\) in such a way that each column and each row of this matrix contains exactly one entry \(\frac{1}{n}\) while the sum of diagonal entries is \(\alpha\) (all other entries of the matrix being zero). It is clear that such an arrangement is possible if and only if \(\alpha=\frac{k}{n}\), where \(k\) is any integer from \(0\) to \(n\) except for \(k=n-1\). △
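As a numerical sanity check (not part of the original proof), the following sketch builds, for \(\alpha=\frac{k}{n}\), a joint distribution matrix whose nonzero entries \(\frac{1}{n}\) form a permutation with exactly \(k\) fixed points, and verifies that \(\Pr\{X=Y\}=\frac{k}{n}\) and \(I(X;Y)=\ln n\); the helper names are illustrative only.

```python
import math

def mutual_information(M):
    """I(X;Y) in nats for a joint distribution given as a matrix M."""
    px = [sum(row) for row in M]
    py = [sum(col) for col in zip(*M)]
    return sum(p * math.log(p / (px[i] * py[j]))
               for i, row in enumerate(M)
               for j, p in enumerate(row) if p > 0)

def permutation_joint(n, k):
    """n x n matrix with entries 1/n placed along a permutation having
    exactly k fixed points; such a permutation exists iff k != n - 1."""
    assert 0 <= k <= n and k != n - 1
    # identity on the first k indices, a cyclic shift (no fixed points) on the rest
    perm = list(range(k)) + [k + (i + 1) % (n - k) for i in range(n - k)]
    M = [[0.0] * n for _ in range(n)]
    for i, j in enumerate(perm):
        M[i][j] = 1.0 / n
    return M

n, k = 5, 2
M = permutation_joint(n, k)
coupling = sum(M[i][i] for i in range(n))
assert abs(coupling - k / n) < 1e-12                      # Pr{X = Y} = k/n
assert abs(mutual_information(M) - math.log(n)) < 1e-12   # I(X;Y) = ln n
```

Here \(X\) is uniform and \(X\) is a deterministic function of \(Y\) (one nonzero entry per column), so \(I(X;Y)=H(X)=\ln n\), exactly as the proof argues.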
Proof of Proposition 2.
First of all, note that throughout what follows, in the cases where we consider a probability distribution \(P_{X}=P=\{p_i,\: i\in\mathcal{N}\}\) of a random variable \(X\), we will assume for convenience that the components \(p_i\) of this distribution are arranged in descending order, so that \(p_1\ge p_2\ge\ldots\ge p_n>0\). In particular, we will always assume that \(p_{\min}=\min\limits_{i\in\mathcal{N}}p_i=p_n\).
To prove the lower bound (5), we use the following result from [7, Corollary 2]: For a given probability distribution \(P=\{p_i,\: i\in\mathcal{N}\}\) of a random variable \(X\), we have the equality
where \(\alpha_0\) is a solution of the equation
and the function \(\varphi(x,y)\) is defined in (4). Since \(\varphi(x,y)\) increases in each of its arguments and we consider the minimum over all distributions of \(X\) and \(Y\) such that \(\Pr\{Y=X\}=\alpha\) and only the minimal component \(p_n\) of the distribution \(P\) is given, (A.1) and (A.2) imply that this minimum is attained at any \(P\) for which \(p_{n-1}=p_n\); therefore, we obtain
Now, taking into account (A.3) and (A.4), we have
where \(H_1(\cdot)\) is defined in (2). In deriving (A.5), we have also used the fact that \(\max\limits_{X:\: p_{\min}=p_n}H(X)=H_1(p_n)\). It can easily be verified that \(H_1(p_n)-\varphi(\alpha,p_n)\) is a concave function of \(p_n\) and that its maximum in the interval \(p_n\in\bigl[\alpha,\frac{1}{n}\bigr]\) is attained at
if \(p^*_n(\alpha)\in\bigl[\alpha,\frac{1}{n}\bigr]\), i.e., if \(0\le\alpha\le\frac{1}{2n-1}\). If \(\frac{1}{2n-1}\le\alpha\le\frac{1}{n}\), then the maximum of \(H_1(p_n)-\varphi(\alpha,p_n)\) is attained at \(p_n=\alpha\). One can easily verify that
where \(J(\cdot)\) is defined in (3). Thus, (A.5) implies that
Now note that for any \(\alpha\in\bigl[0,\frac{1}{n}\bigr]\) we have another lower bound
Indeed, \(I(X;Y)=H_1(\alpha)\) if the entries \(p_{ij}\) of the matrix \(M=\left\|p_{ij}\right\|_{i,j=1}^n\) of the joint distribution of \(X\) and \(Y\) have the form
since in this case we have \(H(X)=H_1(\alpha)\) and \(H(X\,|\,Y)=0\).
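The explicit matrix referred to here did not survive extraction. A plausible arrangement consistent with the stated properties puts one entry \(\alpha\) on the diagonal and the remaining \(n-1\) entries \(\frac{1-\alpha}{n-1}\) off the diagonal, one per free row and column; the sketch below checks it numerically, assuming (since (2) is not reproduced in this chunk) that \(H_1(\alpha)\) is the entropy of the distribution \(\bigl\{\alpha,\frac{1-\alpha}{n-1},\ldots,\frac{1-\alpha}{n-1}\bigr\}\), as the arrangements in the proof of Proposition 4 suggest.

```python
import math

def mutual_information(M):
    """I(X;Y) in nats for a joint distribution given as a matrix M."""
    px = [sum(row) for row in M]
    py = [sum(col) for col in zip(*M)]
    return sum(p * math.log(p / (px[i] * py[j]))
               for i, row in enumerate(M)
               for j, p in enumerate(row) if p > 0)

def joint_for_H1(n, alpha):
    """One diagonal entry alpha; the other n-1 entries (1-alpha)/(n-1) are
    placed off the diagonal, one per remaining row and column (cyclic shift)."""
    M = [[0.0] * n for _ in range(n)]
    M[0][0] = alpha
    q = (1.0 - alpha) / (n - 1)
    for i in range(1, n):
        M[i][1 + i % (n - 1)] = q   # row i -> column i+1, last row -> column 1
    return M

n, alpha = 4, 0.15                  # example with 0 <= alpha <= 1/n
M = joint_for_H1(n, alpha)
H1 = -alpha * math.log(alpha) - (1 - alpha) * math.log((1 - alpha) / (n - 1))
assert abs(sum(M[i][i] for i in range(n)) - alpha) < 1e-12   # Pr{X = Y} = alpha
assert abs(mutual_information(M) - H1) < 1e-12               # I(X;Y) = H_1(alpha)
```

Each column has a single nonzero entry, so \(H(X\,|\,Y)=0\) and \(I(X;Y)=H(X)\), matching the argument in the text.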
It is obvious that the lower bound (A.7) is better than (A.6) if \(\frac{1}{2n-1}<\alpha\le\frac{1}{n}\), since \(\varphi(\alpha,\alpha)>0\) for \(\alpha>0\). It therefore remains to compare these bounds in the interval \(\alpha\in\bigl[0,\frac{1}{2n-1}\bigr]\). We have
since \(\frac{n(n-1)\alpha^2}{1-\alpha^2}\le\frac{1}{4}\). Therefore, the function \(J(\alpha)-H_1(\alpha)\) decreases with \(\alpha\) and is positive at \(\alpha=0\). Now let us show that
Indeed, one can easily check that
The inequality \(J\bigl(\frac{1}{3n-1}\bigr)-H_1\bigl(\frac{1}{3n-1}\bigr)<0\) follows from the fact that
Thus, we have
where \(\alpha^*\) is the unique solution of the equation \(J(\alpha^*)=H_1(\alpha^*)\) in the interval \(0\le\alpha^*<\frac{1}{3n-1}\). The lower bound (5) is proved.
Now let us prove equality (6) and the upper bound (7). To this end, note that
where
According to (A.3), we have
and we have seen that
and that this maximum is attained at
However, the above equality for the maximum is valid only if \(p^*_n(\alpha)\) satisfies our constraint \(p^*_n(\alpha)\ge 2\alpha\) (previously, we had the constraint \(p^*_n(\alpha)\ge\alpha\)), which is equivalent to the condition \(0\le\alpha\le\frac{1}{3n-1}\). But if \(\frac{1}{3n-1}\le\alpha\le\frac{1}{2n}\), then
is attained at \(p_n=2\alpha\). Thus, we have the equality
In the case where \(p_n\le 2\alpha\), a precise value of
is not known, and therefore we can only claim that
Therefore, (A.10)–(A.14) imply now that
Let us find a restriction on \(\alpha\) under which \(J(\alpha)\ge H_1(2\alpha)\). One can easily verify that the difference \(J(\alpha)-H_1(2\alpha)\) decreases with \(\alpha\), is positive at \(\alpha=0\), and is negative at \(\alpha=\frac{1}{3n-1}\). Indeed, we have
Therefore, \(J(\alpha)\ge H_1(2\alpha)\) in the interval \(0\le\alpha\le\alpha_*\) and \(J(\alpha)\le H_1(2\alpha)\) in the interval \(\alpha_*<\alpha<\frac{1}{3n-1}\), where \(\alpha_*\) is a solution of the equation \(J(\alpha_*)= H_1(2\alpha_*)\). In addition, it is obvious that \(\alpha_*<\alpha^*<\frac{1}{3n-1}\), where \(\alpha^*\) is a solution of the equation \(J(\alpha^*)= H_1(\alpha^*)\). Now (A.13)–(A.15) imply equality (6) and the upper bound (7).
In conclusion, note that the second equality in (A.13) leads to the lower bound
which, however, is weaker than the lower bound (5) proved above. To see this, it suffices to verify that
One can easily check that the difference \(H_1(\alpha)-[H_1(2\alpha)-\varphi(\alpha,2\alpha)]\) increases with \(\alpha\) in the interval \(\alpha\in\bigl[\frac{1}{3n-1},\frac{1}{2n} \bigr]\) and is positive at \(\alpha=\frac{1}{3n-1}\). Indeed, one can easily verify that in this interval we have
and
This completes the proof of Proposition 2. △
Proof of Proposition 3.
Note first of all that although the proof of this proposition is conceptually quite similar to that of Proposition 2 given above, it differs in several essential respects. In particular, in the case at hand, in contrast to Proposition 2, the lower bounds for \(I_{\max}(\alpha)\) differ significantly depending on whether \(n\le 14\) or \(n\ge 15\).
First, we again use the following result from [7, Corollary 2]: For a given probability distribution \(P=\{p_i,\: i\in\mathcal{N}\}\) of a random variable \(X\), we have the equality
Using (A.17), we obtain
where
In the proof of Proposition 2, we have shown that
and that this maximum is attained at
if \(0\le\alpha\le\frac{1}{2n-1}\) and at \(p_n=\alpha\) if \(\frac{1}{2n-1}\le\alpha\le\frac{1}{n}\).
In the considered case of large values of \(\alpha\), finding the maximum
differs from that in (A.21) by only replacing the parameter \(\alpha\) with \(1-\alpha\). Therefore, (A.18) and (A.19) imply the following lower bound for \(I_{\max}(\alpha)\):
moreover, the first equality for \(J'_1(\alpha)\) is attained at
and the second is attained at \(p_n=1-\alpha\).
Now note that we have another lower bound
Indeed, this lower bound follows from the fact that \(I(X;Y)=H_2(1-\alpha)\) if the entries of the matrix \(M=\left\|p_{ij}\right\|_{i,j=1}^n\) of the joint distribution of \(X\) and \(Y\) are given by
since for this joint distribution we have \(H(X)=H_2(1-\alpha)\) and \(H(X\,|\,Y)=0\). Note that, in contrast to the case of small values of \(\alpha\), where the lower bound \(I_{\max}(\alpha)\ge H_1(\alpha)\) was valid (see (A.7)), in the present case of large values of \(\alpha\) we cannot claim that \(I_{\max}(\alpha)\ge H_1(1-\alpha)\), since we are unable to construct a joint distribution of \(X\) and \(Y\) such that \(H(X)=H_1(1-\alpha)\) and \(H(X\,|\,Y)=0\).
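The matrix of this construction is likewise missing from the extraction. Under the same assumed form of \(H_k\) (the entropy of a distribution with \(k\) atoms \(\frac{x}{k}\) and \(n-k\) atoms \(\frac{1-x}{n-k}\), here with \(x=1-\alpha\)), one arrangement consistent with the stated properties puts \(n-2\) entries \(\frac{\alpha}{n-2}\) on the diagonal and the two entries \(\frac{1-\alpha}{2}\) off it; the sketch below is an illustration under these assumptions, not the author's exact matrix.

```python
import math

def mutual_information(M):
    """I(X;Y) in nats for a joint distribution given as a matrix M."""
    px = [sum(row) for row in M]
    py = [sum(col) for col in zip(*M)]
    return sum(p * math.log(p / (px[i] * py[j]))
               for i, row in enumerate(M)
               for j, p in enumerate(row) if p > 0)

def joint_for_H2(n, alpha):
    """n-2 diagonal entries alpha/(n-2); two entries (1-alpha)/2 swapped
    between the last two rows/columns, off the diagonal."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n - 2):
        M[i][i] = alpha / (n - 2)
    M[n - 2][n - 1] = (1 - alpha) / 2
    M[n - 1][n - 2] = (1 - alpha) / 2
    return M

n, alpha = 5, 0.85                   # example with 1 - 1/n <= alpha < 1
M = joint_for_H2(n, alpha)
H2 = -(1 - alpha) * math.log((1 - alpha) / 2) - alpha * math.log(alpha / (n - 2))
assert abs(sum(M[i][i] for i in range(n)) - alpha) < 1e-12   # Pr{X = Y} = alpha
assert abs(mutual_information(M) - H2) < 1e-12               # I(X;Y) = H_2(1 - alpha)
```

Again each column has a single nonzero entry, so \(H(X\,|\,Y)=0\) and \(I(X;Y)=H(X)=H_2(1-\alpha)\).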
Thus, it is necessary to compare the lower bounds (A.22) with (A.23) for different values of \(\alpha\) and choose the best of them. Compare now the first bound in (A.22) with (A.23) in the interval \(\alpha\in\bigl[1-\frac{1}{2n-1},1 \bigr]\). One can easily check that the difference \(J(1-\alpha)-H_2(1-\alpha)\) increases with \(\alpha\) in this interval, since
and this difference is positive at \(\alpha=1\). Therefore, we have to compare the values \(J(1-\alpha)\) with \(H_2(1-\alpha)\) at \(\alpha=1-\frac{1}{2n-1}\). We have
so that
Noting that the function \((n-1)\ln\frac{n-1}{n-2}\) decreases with \(n\) and taking into account that \(\ln 8\approx 2.07944\) and
we obtain that the lower bound (A.22) is better than (A.23) for all \(\alpha\in\bigl[1-\frac{1}{2n-1},1\bigr]\) if \(3\le n\le 14\) and for \(\alpha\in[\widetilde{\alpha},1]\) if \(n\ge 15\), where \(\widetilde{\alpha}\), \(1-\frac{1}{2n-1}<\widetilde{\alpha}<1\), is the unique solution of the equation
in the given interval. But if \(n\ge 15\) and \(\alpha\in\bigl[1-\frac{1}{2n-1},\widetilde{\alpha}\bigr]\), then (A.23) is better than (A.22). Thus, we have proved the validity of (11) and (12) in the interval \(\bigl[1-\frac{1}{2n-1},1\bigr]\).
Now let us compare the second bound in (A.22) with (A.23) in the interval \(\alpha\in\bigl[1-\frac{1}{n},1-\frac{1}{2n-1}\bigr]\). To this end, note that the difference
increases with \(\alpha\) and is negative at \(\alpha=1-\frac{1}{n}\), since one can easily check that
because \(\ln 8>2\) and
Now, noting that this difference at \(\alpha=1-\frac{1}{2n-1}\) equals
since one can easily check that
and that for \(J\bigl(\frac{1}{2n-1}\bigr)\) equality (A.26) is valid, again taking into account (A.27), we verify the validity of the lower bounds in (11) and (12) in the interval \(\alpha\in\bigl[1-\frac{1}{n},1-\frac{1}{2n-1}\bigr]\) too.
To prove equality (13) and the upper bound (14), it suffices to check that
and that
where \(\overline\alpha\in\bigl[1-\frac{1}{2n-1},1\bigr]\) is the unique solution of the equation
in this interval. The validity of (A.28) follows from the fact that the difference \(H_1(1-\alpha)-H_2(1-\alpha)\) decreases with \(\alpha\) in the interval \(\bigl[1-\frac{1}{n},1\bigr]\) and is positive at \(\alpha=1\). To prove (A.29) and (A.30), we should only verify that the difference \(J(1-\alpha)-H_1(1-\alpha)\) increases with \(\alpha\) in the interval \(\bigl[1-\frac{1}{2n-1},1\bigr]\), is negative at \(\alpha=1-\frac{1}{2n-1}\), and is positive at \(\alpha=1\). This completes the proof of Proposition 3. △
Proof of Proposition 4.
First note that in this case of “moderate” values of the parameter \(\alpha\), where \(\frac{1}{n}<\alpha<1-\frac{1}{n}\), we are unable to obtain an explicit expression for \(\min\limits_{Y:\: \Pr\{Y=X\}=\alpha}H(X\,|\,Y)\) for a given distribution of \(X\), unlike the cases of small and large values of \(\alpha\). Therefore, for moderate values of \(\alpha\), we are unable to obtain analogs of the lower bounds for \(I_{\max}(\alpha)\) given in Propositions 2 and 3.
For moderate values of \(\alpha\), it is natural to consider and compare two types of explicit lower bounds for \(I_{\max}(\alpha)\): one where the joint distribution of \(X\) and \(Y\) is such that \(\Pr\{Y=X\}=\alpha\) and \(H(X\,|\,Y)=0\), and one where \(X\) has a uniform distribution, since in the latter case the precise value of \(\min\limits_{Y:\: \Pr\{Y=X\}=\alpha}H(X\,|\,Y)\) is known (see (9)). In the first case, where it is assumed that the conditional entropy \(H(X\,|\,Y)\) is zero and \(\Pr\{Y=X\}=\alpha\), the matrix of the joint distribution of \(X\) and \(Y\) contains a single nonzero entry in each column and each row, while the sum of its diagonal entries is \(\alpha\). As can easily be verified, in the case where
to maximize the information \(I(X;Y)=H(X)\), we have to choose the best of the two ways to arrange nonzero entries in this joint distribution matrix:
1. \(k\) numbers \(\frac{\alpha}{k}\) are located on the diagonal, and the other \(n-k\) numbers \(\frac{1-\alpha}{n-k}\) are located outside the diagonal in such a way that each unoccupied row and each unoccupied column contains exactly one such entry;
2. \(k+1\) numbers \(\frac{\alpha}{k+1}\) are located on the diagonal, and the other \(n-k-1\) numbers \(\frac{1-\alpha}{n-k-1}\) are located outside the diagonal in such a way that each unoccupied row and each unoccupied column contains exactly one such entry.
In the first case we have \(H(X)=H_k(\alpha)\), and in the second we have \(H(X)=H_{k+1}(\alpha)\), where \(H_i(x)\) is defined in (2). But if \(k=n-2\), i.e., \(\frac{n-2}{n}<\alpha<\frac{n-1}{n}\), then the equality \(H(X\,|\,Y)=0\) is possible only in the first of these arrangements of nonzero entries in the joint distribution matrix, so in this case we have \(H(X)=H_{n-2}(\alpha)\). Now note that
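Under the assumption (since (2) is not reproduced here) that \(H_k(\alpha)\) is the entropy of the distribution with \(k\) atoms \(\frac{\alpha}{k}\) and \(n-k\) atoms \(\frac{1-\alpha}{n-k}\), which is what the two arrangements above give for \(H(X)\), the resulting lower bound can be evaluated numerically as the larger of the two values; note that at \(\alpha=\frac{k}{n}\) it reaches \(\ln n\), in agreement with Proposition 1. A hedged sketch:

```python
import math

def H_k(n, k, a):
    """Entropy (nats) of the distribution with k atoms a/k and n-k atoms
    (1-a)/(n-k): the value of H(X) for the k-diagonal arrangement."""
    h = 0.0
    if a > 0:
        h -= a * math.log(a / k)
    if a < 1:
        h -= (1 - a) * math.log((1 - a) / (n - k))
    return h

def lower_bound(n, alpha):
    """Best of the two arrangements for k/n < alpha < (k+1)/n, k <= n-3
    (for k = n-2 only the first arrangement is admissible)."""
    k = int(n * alpha)
    return max(H_k(n, k, alpha), H_k(n, k + 1, alpha))

n = 6
assert lower_bound(n, 0.45) <= math.log(n) + 1e-12    # never exceeds ln n
assert abs(H_k(n, 2, 2 / n) - math.log(n)) < 1e-12    # equals ln n at alpha = k/n
```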
where
Indeed, equalities (A.31) and (A.32) follow from the fact that the difference \(H_k(\alpha)-H_{k+1}(\alpha)\) decreases with \(\alpha\),
so \(\widehat{\alpha}(k)\) is a solution of the equation
Thus, the validity of the lower bound (18) is proved; moreover, its optimality at
follows from Proposition 1. The lower bound (19) is a direct consequence of equality (9). Now let us show that (18) is better than (19).
First assume that
In this case it suffices to show that
To this end, note that the difference
is a concave function of \(\alpha\) which vanishes at \(\alpha=\frac{k}{n}\). Therefore, to prove that in this case (18) is better than (19), it suffices to verify that
Simple calculations show that
The desired inequality \(H_k\bigl(\frac{2k+1}{2n}\bigr)>\ln n-\varphi\bigl(\frac{1}{n},\frac{1}{2n}\bigr)\) now follows from the fact that
This means that in the case at hand, where
the lower bound (18) is better than (19).
A similar proof shows that (18) is better than (19) in both remaining cases, where either
or \(k=n-2\), i.e.,
In the first of these cases, one proves that
and in the second case, that
This completes the proof of Proposition 4. △
Prelov, V.V., On One Extremal Problem for Mutual Information, Probl. Inf. Transm., 2022, vol. 58, no. 3, pp. 217–230. https://doi.org/10.1134/S0032946022030024