On One Extremal Problem for Mutual Information

Abstract

We address the problem of finding the maximum of the mutual information \(I(X;Y)\) of two finite-valued random variables \(X\) and \(Y\) given only the value of their coupling, i.e., the probability \(\Pr\{X=Y\}\). We obtain explicit lower and upper bounds on this maximum, which in some cases are optimal.

References

  1. Erokhin, V., \(\varepsilon\)-Entropy of a Discrete Random Variable, Teor. Veroyatnost. i Primenen., 1958, vol. 3, no. 1, pp. 103–107 [Theory Probab. Appl. (Engl. Transl.), 1958, vol. 3, no. 1, pp. 97–100]. https://doi.org/10.1137/1103008

  2. Berger, T., Rate Distortion Theory: A Mathematical Basis for Data Compression, Englewood Cliffs, NJ: Prentice-Hall, 1971.

  3. Ho, S.-W. and Verdú, S., On the Interplay between Conditional Entropy and Error Probability, IEEE Trans. Inform. Theory, 2010, vol. 56, no. 12, pp. 5930–5942. https://doi.org/10.1109/TIT.2010.2080891

  4. Pinsker, M.S., On Estimation of Information via Variation, Probl. Peredachi Inf., 2005, vol. 41, no. 2, pp. 3–8 [Probl. Inf. Transm. (Engl. Transl.), 2005, vol. 41, no. 2, pp. 71–75]. https://doi.org/10.1007/s11122-005-0012-8

  5. Zhang, Z., Estimating Mutual Information via Kolmogorov Distance, IEEE Trans. Inform. Theory, 2007, vol. 53, no. 9, pp. 3280–3282. https://doi.org/10.1109/TIT.2007.903122

  6. Prelov, V.V., Generalization of a Pinsker Problem, Probl. Peredachi Inf., 2011, vol. 47, no. 2, pp. 17–37 [Probl. Inf. Transm. (Engl. Transl.), 2011, vol. 47, no. 2, pp. 98–116]. https://doi.org/10.1134/S0032946011020037

  7. Prelov, V.V., On Some Extremal Problems for Mutual Information and Entropy, Probl. Peredachi Inf., 2016, vol. 52, no. 4, pp. 3–13 [Probl. Inf. Transm. (Engl. Transl.), 2016, vol. 52, no. 4, pp. 319–328]. https://doi.org/10.1134/S0032946016040013

Acknowledgments

The author expresses his sincere gratitude to the reviewer for pointing out a number of inaccuracies whose correction has improved the quality of the paper.

Additional information

Translated from Problemy Peredachi Informatsii, 2022, Vol. 58, No. 3, pp. 18–32. https://doi.org/10.31857/S0555292322030020

In memoriam Prof. E.M. Gabidulin

Appendix

Proof of Proposition 1.

The claim of this proposition is almost obvious. Indeed, the information \(I(X;Y)\) takes its maximum value \(\ln n\) if and only if \(X\) has a uniform distribution and the conditional entropy \(H(X\,|\,Y)\) vanishes, which means that \(X\) should be a deterministic function of \(Y\). Therefore, \(I_{\max}(\alpha)=\ln n\) if and only if \(n\) numbers \(\frac{1}{n}\) can be placed in an \(n\times n\) matrix of the joint distribution of \(X\) and \(Y\) in such a way that each column and each row of this matrix contains exactly one entry \(\frac{1}{n}\) while the sum of diagonal entries is \(\alpha\) (all other entries of the matrix being zero). It is clear that such an arrangement is possible if and only if \(\alpha=\frac{k}{n}\), where \(k\) is any integer from \(0\) to \(n\) except for \(k=n-1\). △
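As an illustrative numerical check (not part of the original proof), the following Python sketch builds such a permutation-type joint distribution for \(\alpha=k/n\) and verifies that \(\Pr\{X=Y\}=k/n\) and \(I(X;Y)=\ln n\); the helper functions and the particular values of \(n\) and \(k\) are our own choices.

```python
import numpy as np

def coupling_matrix(n, k):
    """Joint distribution: n entries 1/n forming a permutation matrix
    with exactly k fixed points (possible whenever k != n - 1)."""
    assert 0 <= k <= n and k != n - 1
    perm = list(range(k)) + [k + (i + 1) % (n - k) for i in range(n - k)]
    M = np.zeros((n, n))
    for i, j in enumerate(perm):
        M[i, j] = 1.0 / n
    return M

def mutual_information(M):
    """I(X;Y) in nats for a joint distribution matrix M."""
    px, py = M.sum(axis=1), M.sum(axis=0)
    mask = M > 0
    return float(np.sum(M[mask] * np.log(M[mask] / np.outer(px, py)[mask])))

n, k = 6, 2
M = coupling_matrix(n, k)
print(np.trace(M), k / n)                 # Pr{X = Y} equals alpha = k/n
print(mutual_information(M), np.log(n))   # I(X;Y) attains ln n
```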

Proof of Proposition 2.

First of all, note that throughout what follows, in the cases where we consider a probability distribution \(P_{X}=P=\{p_i,\: i\in\mathcal{N}\}\) of a random variable \(X\), we will assume for convenience that the components \(p_i\) of this distribution are arranged in descending order, so that \(p_1\ge p_2\ge\ldots\ge p_n>0\). In particular, we will always assume that \(p_{\min}=\min\limits_{i\in\mathcal{N}}p_i=p_n\).

To prove the lower bound (5), we use the following result from [7, Corollary 2]: For a given probability distribution \(P=\{p_i,\: i\in\mathcal{N}\}\) of a random variable \(X\), we have the equality

$$\min\limits_{Y:\: \Pr\{Y=X\}=\alpha}H(X\,|\, Y)= \begin{cases} \varphi(\alpha,p_n) & \text{if}\ 0\le\alpha\le\alpha_0,\\ \varphi(p_n-\alpha,p_{n-1}) & \text{if}\ \alpha_0\le\alpha\le p_n, \end{cases}$$
(A.1)

where \(\alpha_0\) is a solution of the equation

$$\varphi(p_n-\alpha_0,p_{n-1})= \varphi(\alpha_0,p_n)$$
(A.2)

and the function \(\varphi(x,y)\) is defined in (4). Since \(\varphi(x,y)\) increases in each of its arguments, and since we minimize over all distributions of \(X\) and \(Y\) with \(\Pr\{Y=X\}=\alpha\) for which only the minimal component \(p_n\) of the distribution \(P\) is given, relations (A.1) and (A.2) imply that this minimum is attained at any \(P\) with \(p_{n-1}=p_n\), in which case \(\alpha_0=p_n/2\); therefore, we obtain

$$\min\limits_{(X,Y):\: \Pr\{Y=X\}=\alpha,\: p_{\min}=p_n}H(X\,|\, Y) =\varphi(\alpha,p_n)\quad \text{if}\quad 0\le\alpha\le\frac{p_n}{2},$$
(A.3)
$$\min\limits_{(X,Y):\: \Pr\{Y=X\}=\alpha,\: p_{\min}=p_n}H(X\,|\, Y)=\varphi(p_n-\alpha,p_n) \le\varphi(\alpha,p_n)\quad \text{if}\quad \frac{p_n}{2}\le\alpha\le p_n.$$
(A.4)

Now, taking into account (A.3) and (A.4), we have

$$\begin{aligned}[b] I_{\max}(\alpha)&\ge\max\limits_{p_n:\:\alpha\le p_n\le\frac{1}{n}}\;\max\limits_{X:\: p_{\min}=p_n}\Bigl[ H(X)-\min\limits_{Y:\:\Pr\{Y=X\}=\alpha}H(X\,|\,Y)\Bigr]\\ &\ge\max\limits_{p_n:\:\alpha\le p_n\le\frac{1}{n}}\bigl[H_1(p_n)-\varphi(\alpha,p_n) \bigr],\end{aligned}$$
(A.5)

where \(H_1(\cdot)\) is defined in (2). In deriving (A.5), we have also used the fact that \(\max\limits_{X:\: p_{\min}=p_n}H(X)=H_1(p_n)\). It can easily be verified that \(H_1(p_n)-\varphi(\alpha,p_n)\) is a concave function of \(p_n\) and that its maximum in the interval \(p_n\in\bigl[\alpha,\frac{1}{n}\bigr]\) is attained at

$$p_n=p^*_n(\alpha)=\frac{1}{n}-\frac{n-1}{n}\alpha$$

if \(p^*_n(\alpha)\in\bigl[\alpha,\frac{1}{n}\bigr]\), i.e., if \(0\le\alpha\le\frac{1}{2n-1}\). If \(\frac{1}{2n-1}\le\alpha\le\frac{1}{n}\), then the maximum of \(H_1(p_n)-\varphi(\alpha,p_n)\) is attained at \(p_n=\alpha\). One can easily verify that

$$H_1(p^*_n(\alpha))-\varphi(\alpha,p^*_n(\alpha))=J(\alpha),$$

where \(J(\cdot)\) is defined in (3). Thus, (A.5) implies that

$$I_{\max}(\alpha)\ge \begin{cases} J(\alpha) & \text{if}\ 0\le\alpha\le\dfrac{1}{2n-1},\\[2pt] H_1(\alpha)-\varphi(\alpha,\alpha) & \text{if}\ \dfrac{1}{2n-1}<\alpha\le\dfrac{1}{n}. \end{cases}$$
(A.6)

Now note that for any \(\alpha\in\bigl[0,\frac{1}{n}\bigr]\) we have another lower bound

$$I_{\max}(\alpha)\ge H_1(\alpha)\quad \text{if}\quad \alpha\in\Bigl[0,\frac{1}{n}\Bigr].$$
(A.7)

Indeed, \(I(X;Y)=H_1(\alpha)\) if the entries \(p_{ij}\) of the matrix \(M=\left\|p_{ij}\right\|_{i,j=1}^n\) of the joint distribution of \(X\) and \(Y\) have the form

$$p_{ij}= \begin{cases} \alpha & \text{if}\ i=j=1,\\ \dfrac{1-\alpha}{n-1} & \text{if}\ j=i+1,\: i=2,3,\ldots,n-1,\ \text{or if}\ i=n\ \text{and}\ j=2,\\ 0 & \text{in all other cases}, \end{cases}$$

since in this case we have \(H(X)=H_1(\alpha)\) and \(H(X\,|\,Y)=0\).
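The construction above is easy to verify numerically. The following sketch (ours, for illustration only) assembles this joint distribution matrix and checks that \(\Pr\{X=Y\}=\alpha\), \(H(X\,|\,Y)=0\), and \(H(X)\) equals the entropy of the distribution \(\bigl(\alpha,\frac{1-\alpha}{n-1},\ldots,\frac{1-\alpha}{n-1}\bigr)\), which is how \(H_1(\alpha)\) enters the argument.

```python
import numpy as np

def H(p):
    """Shannon entropy (nats) of a probability vector (zeros ignored)."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

n, alpha = 5, 0.12                         # any alpha in [0, 1/n]
M = np.zeros((n, n))
M[0, 0] = alpha                            # p_{11} = alpha (0-based indices)
for i in range(1, n - 1):
    M[i, i + 1] = (1 - alpha) / (n - 1)    # rows 2,...,n-1: entry at column i+1
M[n - 1, 1] = (1 - alpha) / (n - 1)        # row n: entry at column 2

px, py = M.sum(axis=1), M.sum(axis=0)
H1 = -alpha * np.log(alpha) - (1 - alpha) * np.log((1 - alpha) / (n - 1))
print(np.trace(M), alpha)                  # Pr{X = Y} = alpha
print(H(px), H1)                           # H(X) = H_1(alpha)
print(round(H(M.flatten()) - H(py), 12))   # H(X|Y) = H(X,Y) - H(Y) = 0
```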

It is obvious that the lower bound (A.7) is better than (A.6) if \(\frac{1}{2n-1}<\alpha\le\frac{1}{n}\), since \(\varphi(\alpha,\alpha)>0\) for \(\alpha>0\). Therefore, it remains to compare these bounds in the interval \(\alpha\in\bigl[0,\frac{1}{2n-1}\bigr]\). We have

$$\left[J(\alpha)-H_1(\alpha)\right]'_{\alpha}=\ln \frac{n(n-1)\alpha^2}{1-\alpha^2}<0\quad\text{for}\quad \alpha\in\Bigl[0,\frac{1}{2n-1}\Bigr],$$

since \(\frac{n(n-1)\alpha^2}{1-\alpha^2}\le\frac{1}{4}\). Therefore, the function \(J(\alpha)-H_1(\alpha)\) decreases with \(\alpha\) and is positive at \(\alpha=0\). Now let us show that

$$J\biggl(\frac{1}{3n-1}\biggr)-H_1\biggl(\frac{1}{3n-1} \biggr)<0.$$

Indeed, one can easily check that

$$J\biggl(\frac{1}{3n-1}\biggr)=\ln(3n-1)- \frac{3n-2}{3n-1}\ln 3-\frac{2\ln 3}{3n-1},$$
(A.8)
$$H_1\biggl(\frac{1}{3n-1}\biggr)=\ln(3n-1)- \frac{3n-2}{3n-1}\ln 3-\frac{3n-2}{3n-1}\ln\biggl(1+\frac{1}{3n-3} \biggr).$$
(A.9)

The inequality \(J\bigl(\frac{1}{3n-1}\bigr)-H_1\bigl(\frac{1}{3n-1}\bigr)<0\) follows from the fact that

$$\frac{3n-2}{3n-1}\ln\biggl(1+\frac{1}{3n-3}\biggr)<\frac{2\ln 3}{3n-1}.$$
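This elementary inequality is easy to confirm numerically; the following short check (ours, not part of the proof) tests it over a range of \(n\).

```python
import math

# Check (3n-2)/(3n-1) * ln(1 + 1/(3n-3)) < 2*ln(3)/(3n-1) for n = 2, ..., 10000.
for n in range(2, 10001):
    lhs = (3 * n - 2) / (3 * n - 1) * math.log(1 + 1 / (3 * n - 3))
    rhs = 2 * math.log(3) / (3 * n - 1)
    assert lhs < rhs, n
print("inequality verified for all tested n")
```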

Thus, we have

$$\begin{aligned}J(\alpha)>H_1(\alpha)\quad &\text{if}\quad 0\le\alpha<\alpha^*,\\ J(\alpha)<H_1(\alpha)\quad &\text{if}\quad \alpha^*<\alpha<\frac{1}{3n-1},\end{aligned}$$

where \(\alpha^*\) is the unique solution of the equation \(J(\alpha^*)=H_1(\alpha^*)\) in the interval \(0\le\alpha^*<\frac{1}{3n-1}\). The lower bound (5) is proved.
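Since \(H_1(\cdot)\), \(J(\cdot)\), and \(\varphi(\cdot,\cdot)\) are defined in (2)–(4) of the main text, which is not reproduced in this appendix, the following sketch for locating \(\alpha^*\) numerically relies on our reading of those definitions, consistent with (A.8) and (A.9): \(H_1(x)\) is the entropy of \(\bigl(x,\frac{1-x}{n-1},\ldots,\frac{1-x}{n-1}\bigr)\), \(\varphi(x,y)=(x+y)\ln(x+y)-x\ln x-y\ln y\), and \(J(\alpha)=H_1(p_n^*(\alpha))-\varphi(\alpha,p_n^*(\alpha))\).

```python
import math

n = 10  # alphabet size; an illustrative choice

def phi(x, y):
    """phi(x, y) = (x+y)ln(x+y) - x ln x - y ln y (our reading of (4))."""
    t = lambda z: z * math.log(z) if z > 0 else 0.0
    return t(x + y) - t(x) - t(y)

def H1(x):
    """Entropy of the distribution (x, (1-x)/(n-1), ..., (1-x)/(n-1))."""
    return -x * math.log(x) - (1 - x) * math.log((1 - x) / (n - 1))

def J(a):
    """J(alpha) = H1(p*) - phi(alpha, p*) with p* = 1/n - (n-1)alpha/n."""
    p = 1 / n - (n - 1) * a / n
    return H1(p) - phi(a, p)

# J - H1 decreases on [0, 1/(3n-1)], is positive at 0 and negative at the
# right endpoint, so alpha* can be found by bisection.
lo, hi = 1e-12, 1 / (3 * n - 1)
for _ in range(200):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if J(mid) > H1(mid) else (lo, mid)
print("alpha* ~", lo, " (right endpoint 1/(3n-1) =", 1 / (3 * n - 1), ")")
```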

Now let us prove equality (6) and the upper bound (7). To this end, note that

$$I_{\max}(\alpha)=\max\{J_1(\alpha),J_2(\alpha)\}\quad \text{if}\quad 0\le\alpha\le\frac{1}{2n},$$
(A.10)

where

$$J_1(\alpha)=\max\limits_{p_n:\: 2\alpha\le p_n\le\frac{1}{n}}\,\max\limits_{X:\: p_{\min}=p_n}\Bigl[H(X)-\min\limits_{Y:\: \Pr\{Y=X\}=\alpha}H(X\,|\,Y)\Bigr],$$
(A.11)
$$J_2(\alpha)=\max\limits_{p_n:\: p_n\le 2\alpha}\,\max\limits_{X:\: p_{\min}=p_n}\Bigl[H(X)-\min\limits_{Y:\: \Pr\{Y=X\}=\alpha}H(X\,|\,Y)\Bigr].$$
(A.12)

According to (A.3), we have

$$\min\limits_{Y:\: \Pr\{Y=X\}=\alpha}H(X\,|\,Y)=\varphi(\alpha,p_n)\quad \text{if}\quad p_n\ge 2\alpha,$$

and we have seen that

$$\max\limits_{p_n:\: 2\alpha\le p_n\le\frac{1}{n}}\,\max\limits_{X:\: p_{\min}=p_n}[H(X)-\varphi(\alpha,p_n)]=J(\alpha)$$

and that this maximum is attained at

$$p_n=p^*_n(\alpha)=\frac{1}{n}-\frac{n-1}{n}\alpha.$$

However, the above equality for the maximum is valid only if \(p^*_n(\alpha)\) satisfies our constraint \(p^*_n(\alpha)\ge\ 2\alpha\) (previously, we had another constraint \(p^*_n(\alpha)\ge\alpha\)), which is equivalent to the condition \(0\le\alpha\le\frac{1}{3n-1}\). But if \(\frac{1}{3n-1}\le\alpha\le\frac{1}{2n}\), then

$$\max\limits_{p_n:\: 2\alpha\le p_n\le\frac{1}{n}}\,\max\limits_{X:\: p_{\min}=p_n}[H(X)-\varphi(\alpha,p_n)]$$

is attained at \(p_n=2\alpha\). Thus, we have the equality

$$J_1(\alpha)= \begin{cases} J(\alpha) & \text{if}\ 0\le\alpha\le\dfrac{1}{3n-1},\\ H_1(2\alpha)-\varphi(\alpha,2\alpha) & \text{if}\ \dfrac{1}{3n-1}<\alpha\le\dfrac{1}{2n}. \end{cases}$$
(A.13)

In the case where \(p_n\le 2\alpha\), a precise value of

$$\max\limits_{p_n:\: p_n\le 2\alpha}\,\max\limits_{X:\: p_{\min}=p_n}\Bigl[H(X)-\min\limits_{Y:\: \Pr\{Y=X\}=\alpha}H(X\,|\,Y)\Bigr]$$

is not known, and therefore we can only claim that

$$J_2(\alpha)\le\max\limits_{X:\: p_n\le 2\alpha}H(X)=H_1(2\alpha),\quad 0\le\alpha\le\frac{1}{2n}.$$
(A.14)

Therefore, (A.10)–(A.14) imply now that

$$I_{\max}(\alpha)=J(\alpha)\quad \text{if}\quad J(\alpha)\ge H_1(2\alpha).$$
(A.15)

Let us find a restriction on \(\alpha\) under which \(J(\alpha)\ge H_1(2\alpha)\). One can easily verify that the difference \(J(\alpha)-H_1(2\alpha)\) decreases with \(\alpha\), is positive at \(\alpha=0\), and is negative at \(\alpha=\frac{1}{3n-1}\). Indeed, we have

$$\begin{aligned} J\Bigl(\frac{1}{3n-1}\Bigr)&=\ln(3n-1)- \frac{3n}{3n-1}\ln 3\quad \text{(see (A.8))},\\ H_1\Bigl(\frac{2}{3n-1}\Bigr)&=\ln(3n-1)- \frac{3n}{3n-1}\ln 3+\frac{1}{3n-1}\ln \frac{27}{4}>J\Bigl(\frac{1}{3n-1}\Bigr). \end{aligned}$$

Therefore, \(J(\alpha)\ge H_1(2\alpha)\) in the interval \(0\le\alpha\le\alpha_*\) and \(J(\alpha)\le H_1(2\alpha)\) in the interval \(\alpha_*<\alpha<\frac{1}{3n-1}\), where \(\alpha_*\) is a solution of the equation \(J(\alpha_*)= H_1(2\alpha_*)\). In addition, it is obvious that \(\alpha_*<\alpha^*<\frac{1}{3n-1}\), where \(\alpha^*\) is a solution of the equation \(J(\alpha^*)= H_1(\alpha^*)\). Now (A.13)–(A.15) imply equality (6) and the upper bound (7).
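Under the same assumed expressions for \(H_1\), \(\varphi\), and \(J\) as in the previous sketch, both thresholds can be located by bisection, and the ordering \(\alpha_*<\alpha^*<\frac{1}{3n-1}\) checked numerically (again, an illustration of ours rather than part of the proof).

```python
import math

n = 10  # illustrative alphabet size

def phi(x, y):
    t = lambda z: z * math.log(z) if z > 0 else 0.0
    return t(x + y) - t(x) - t(y)           # our reading of (4)

def H1(x):
    return -x * math.log(x) - (1 - x) * math.log((1 - x) / (n - 1))

def J(a):
    p = 1 / n - (n - 1) * a / n              # p*_n(alpha)
    return H1(p) - phi(a, p)

def root(f, lo, hi, iters=200):
    """Bisection for a decreasing f with f(lo) > 0 > f(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return lo

a_star  = root(lambda a: J(a) - H1(a),     1e-12, 1 / (3 * n - 1))   # alpha^*
a_lower = root(lambda a: J(a) - H1(2 * a), 1e-12, 1 / (3 * n - 1))   # alpha_*
print(a_lower, "<", a_star, "<", 1 / (3 * n - 1),
      ":", a_lower < a_star < 1 / (3 * n - 1))
```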

In conclusion, note that the second equality in (A.13) leads to the lower bound

$$I_{\max}(\alpha)\ge H_1(2\alpha)-\varphi(\alpha,2\alpha)\quad \text{if}\quad \frac{1}{3n-1}\le\alpha\le\frac{1}{2n},$$
(A.16)

which, however, is weaker than the lower bound (5) proved above. To see this, it suffices to verify that

$$H_1(\alpha)>H_1(2\alpha)-\varphi(\alpha,2\alpha)\quad \text{if}\quad \frac{1}{3n-1}\le\alpha\le\frac{1}{2n}.$$

One can easily check that the difference \(H_1(\alpha)-[H_1(2\alpha)-\varphi(\alpha,2\alpha)]\) increases with \(\alpha\) in the interval \(\alpha\in\bigl[\frac{1}{3n-1},\frac{1}{2n} \bigr]\) and is positive at \(\alpha=\frac{1}{3n-1}\). Indeed, one can easily verify that in this interval we have

$$[H_1(\alpha)-H_1(2\alpha)+\varphi(\alpha,2\alpha)]'_{\alpha}= \ln\frac{27(n-1)\alpha(1-\alpha)}{(1-2\alpha)^2}>0$$

and

$$H_1\Bigl(\frac{1}{3n-1}\Bigr)- H_1\Bigl(\frac{2}{3n-1}\Bigr)+\varphi\Bigl(\frac{1}{3n-1}, \frac{2}{3n-1}\Bigr) =-\frac{3n-2}{3n-1}\ln \Bigl(1+\frac{1}{3n-3}\Bigr)+\frac{2\ln 3}{3n-1}> 0.$$

This completes the proof of Proposition 2. △

Proof of Proposition 3.

Note first of all that although the proof of this proposition is conceptually quite similar to that of Proposition 2 given above, it differs in some essential respects. In particular, in the case at hand, in contrast to Proposition 2, the lower bounds for \(I_{\max}(\alpha)\) differ significantly depending on whether \(n\le 14\) or \(n\ge 15\).

First, we again use the following result from [7, Corollary 2]: For a given probability distribution \(P=\{p_i,\: i\in\mathcal{N}\}\) of a random variable \(X\), we have the equality

$$\min\limits_{Y:\: \Pr\{Y=X\}=\alpha}H(X\,|\,Y)=\varphi(1-\alpha,p_n)\quad \text{if}\quad 1-p_n\le\alpha\le 1.$$
(A.17)

Using (A.17), we obtain

$$I_{\max}(\alpha)=\max\{J'_1(\alpha),J'_2(\alpha)\},$$
(A.18)

where

$$\begin{aligned}[b]J'_1(\alpha)&=\max\limits_{p_n:\: 1-\alpha\le p_n\le\frac{1}{n}}\,\max\limits_{X:\: p_{\min}=p_n}\Bigl[H(X)-\min\limits_{Y:\: \Pr\{Y=X\}=\alpha}H(X\,|\,Y)\Bigr]\\ &=\max\limits_{p_n:\: 1-\alpha\le p_n\le\frac{1}{n}}\,[H_1(p_n)-\varphi(1-\alpha,p_n)],\end{aligned}$$
(A.19)
$$\begin{aligned}[b]J'_2(\alpha)&=\max\limits_{p_n:\: p_n\le 1-\alpha}\,\max\limits_{X:\: p_{\min}=p_n}\,\Bigl[H(X)-\min\limits_{Y:\: \Pr\{Y=X\}=\alpha}H(X\,|\,Y)\Bigr]\\ &\le\max\limits_{p_n:\: p_n\le 1-\alpha}H(X)=H_1(1-\alpha).\end{aligned}$$
(A.20)

In the proof of Proposition 2, we have shown that

$$\max\limits_{p_n:\: \alpha\le p_n\le\frac{1}{n}}\,[H_1(p_n)-\varphi(\alpha,p_n)]= \begin{cases} J(\alpha) & \text{if}\ 0\le\alpha\le\dfrac{1}{2n-1},\\ H_1(\alpha)-\varphi(\alpha,\alpha) & \text{if}\ \dfrac{1}{2n-1}<\alpha\le\dfrac{1}{n} \end{cases}$$
(A.21)

and that this maximum is attained at

$$p_n=p^*_n(\alpha)=\frac{1}{n}-\frac{n-1}{n}\alpha$$

if \(0\le\alpha\le\frac{1}{2n-1}\) and at \(p_n=\alpha\) if \(\frac{1}{2n-1}\le\alpha\le\frac{1}{n}\).

In the considered case of large values of \(\alpha\), finding the maximum

$$\max\limits_{p_n:\: 1-\alpha\le p_n\le\frac{1}{n}}\,[H_1(p_n)-\varphi(1-\alpha,p_n)]$$

differs from that in (A.21) by only replacing the parameter \(\alpha\) with \(1-\alpha\). Therefore, (A.18) and (A.19) imply the following lower bound for \(I_{\max}(\alpha)\):

$$I_{\max}(\alpha)\ge J'_1(\alpha)= \begin{cases} J(1-\alpha) & \text{if}\ 1-\dfrac{1}{2n-1}\le\alpha\le 1,\\[2pt] H_1(1-\alpha)-\varphi(1-\alpha,1-\alpha) & \text{if}\ 1-\dfrac{1}{n}\le\alpha\le 1-\dfrac{1}{2n-1}; \end{cases} $$
(A.22)

moreover, the first equality for \(J'_1(\alpha)\) is attained at

$$p_n=p^*_n(1-\alpha)=\frac{1}{n}- \frac{n-1}{n}(1-\alpha)$$

and the second is attained at \(p_n=1-\alpha\).

Now note that we have another lower bound

$$I_{\max}(\alpha)\ge H_2(1-\alpha)\quad\text{for all}\quad \alpha\in\Bigl[1-\frac{1}{n}, 1\Bigr]. $$
(A.23)

Indeed, this lower bound follows from the fact that \(I(X;Y)=H_2(1-\alpha)\) if the entries of the matrix \(M=\left\|p_{ij}\right\|_{i,j=1}^n\) of the joint distribution of \(X\) and \(Y\) are given by

$$p_{ij}=\begin{cases}\dfrac{\alpha}{n-2} & \text{if}\ i=j=1,2,\ldots,n-2,\\ \dfrac{1-\alpha}{2} & \text{if}\ i=n-1,\: j=n\ \text{or if}\ i=n,\: j=n-1,\\ 0 & \text{in all other cases},\end{cases}$$

since for this joint distribution we have \(H(X)=H_2(1-\alpha)\) and \(H(X\,|\,Y)=0\). Note that, in contrast to the case of small values of \(\alpha\), where the lower bound \(I_{\max}(\alpha)\ge H_1(\alpha)\) was valid (see (A.7)), in the present case of large values of \(\alpha\) we cannot claim that \(I_{\max}(\alpha)\ge H_1(1-\alpha)\), since we are unable to construct a joint distribution of \(X\) and \(Y\) such that \(H(X)=H_1(1-\alpha)\) and \(H(X\,|\,Y)=0\).
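As with the construction for small \(\alpha\), this can be checked numerically. The sketch below (ours) assembles the displayed matrix and verifies that \(\Pr\{X=Y\}=\alpha\), \(H(X\,|\,Y)=0\), and \(H(X)\) equals the entropy of the distribution with two masses \(\frac{1-\alpha}{2}\) and \(n-2\) masses \(\frac{\alpha}{n-2}\), which is how \(H_2(1-\alpha)\) enters the argument.

```python
import numpy as np

def H(p):
    """Shannon entropy (nats) of a probability vector (zeros ignored)."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

n, alpha = 6, 0.9                          # any alpha in [1 - 1/n, 1]
M = np.zeros((n, n))
for i in range(n - 2):
    M[i, i] = alpha / (n - 2)              # diagonal block of size n-2
M[n - 2, n - 1] = (1 - alpha) / 2          # two swapped off-diagonal entries
M[n - 1, n - 2] = (1 - alpha) / 2

px, py = M.sum(axis=1), M.sum(axis=0)
H2 = -(1 - alpha) * np.log((1 - alpha) / 2) - alpha * np.log(alpha / (n - 2))
print(np.trace(M), alpha)                  # Pr{X = Y} = alpha
print(H(px), H2)                           # H(X) = H_2(1 - alpha)
print(round(H(M.flatten()) - H(py), 12))   # H(X|Y) = 0
```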

Thus, it is necessary to compare the lower bounds (A.22) and (A.23) for different values of \(\alpha\) and choose the better of them. We first compare the first bound in (A.22) with (A.23) in the interval \(\alpha\in\bigl[1-\frac{1}{2n-1},1 \bigr]\). One can easily check that the difference \(J(1-\alpha)-H_2(1-\alpha)\) increases with \(\alpha\) in this interval, since

$$ \bigl(J(1-\alpha)-H_2(1-\alpha)\bigr)'_{\alpha}=\ln \frac{2\alpha(2-\alpha)}{n(n-2)(1-\alpha)^2}>0,$$

and this difference is positive at \(\alpha=1\). Therefore, we have to compare the values \(J(1-\alpha)\) with \(H_2(1-\alpha)\) at \(\alpha=1-\frac{1}{2n-1}\). We have

$$J\Bigl(\frac{1}{2n-1}\Bigr)=\ln(2n-1)- \frac{2n\ln 2}{2n-1},$$
(A.24)
$$H_2\Bigl(\frac{1}{2n-1}\Bigr)=\ln(2n-1)- \frac{(2n-3)\ln 2}{2n-1}-\frac{2n-2}{2n-1}\ln \frac{n-1}{n-2},$$
(A.25)

so that

$$H_2\Bigl(\frac{1}{2n-1}\Bigr)= J\Bigl(\frac{1}{2n-1}\Bigr)+\frac{1}{2n-1} \Bigl[\ln 8-2(n-1)\ln \frac{n-1}{n-2}\Bigr].$$
(A.26)

Noting that the function \((n-1)\ln\frac{n-1}{n-2}\) decreases with \(n\) and taking into account that \(\ln 8\approx 2.07944\) and

$$2(n-1)\ln\frac{n-1}{n-2}\approx \begin{cases} 2.08111 & \text{if}\ n=14,\\ 2.07502 & \text{if}\ n=15, \end{cases}$$
(A.27)

we obtain that the lower bound (A.22) is better than (A.23) for all \(\alpha\in\bigl[1-\frac{1}{2n-1},1\bigr]\) if \(3\le n\le 14\) and for \(\alpha\in [\widetilde{\alpha},1]\) if \(n\ge 15\), where \(\widetilde{\alpha}\), \(1-\frac{1}{2n-1}<\widetilde{\alpha}<1\), is a unique solution of the equation

$$J(1-\widetilde{\alpha})=H_2(1-\widetilde{\alpha})$$

in the given interval. But if \(n\ge 15\) and \(\alpha\in\bigl[1-\frac{1}{2n-1},\widetilde{\alpha}\bigr]\), then (A.23) is better than (A.22). Thus, we have proved the validity of (11) and (12) in the interval \(\bigl[1-\frac{1}{2n-1},1\bigr]\).
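The numerical values in (A.27), and hence the threshold between \(n=14\) and \(n=15\), are easy to reproduce; the following check (ours) uses relation (A.26) to decide which bound wins at the endpoint \(\alpha=1-\frac{1}{2n-1}\).

```python
import math

ln8 = math.log(8)                                        # ~ 2.07944
for n in range(3, 31):
    val = 2 * (n - 1) * math.log((n - 1) / (n - 2))
    # By (A.26): val > ln 8  <=>  H_2(1/(2n-1)) < J(1/(2n-1)),
    # i.e., the bound (A.22) is better at alpha = 1 - 1/(2n-1).
    winner = "(A.22) better" if val > ln8 else "(A.23) better"
    print(n, round(val, 5), winner)
```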

Now let us compare the second bound in (A.22) with (A.23) in the interval \(\alpha\in\bigl[1-\frac{1}{n},1-\frac{1}{2n-1}\bigr]\). To this end, note that the difference

$$[H_1(1-\alpha)-\varphi(1-\alpha,1-\alpha)]-H_2(1-\alpha)$$

increases with \(\alpha\) and is negative at \(\alpha=1-\frac{1}{n}\), since one can easily check that

$$H_1\Bigl(\frac{1}{n}\Bigr)-\varphi\Bigl(\frac{1}{n}, \frac{1}{n}\Bigr)-H_2\Bigl(\frac{1}{n}\Bigr)= -\frac{3\ln 2}{n}+\frac{n-1}{n}\ln \biggl(1+\frac{1}{n-2}\biggr)<0,$$

because \(\ln 8>2\) and

$$(n-1)\ln \bigl(1+\frac{1}{n-2}\bigr)<\frac{n-1}{n-2}\le 2\quad \text{for}\quad n\ge 3.$$

Now, noting that this difference at \(\alpha=1-\frac{1}{2n-1}\) equals

$$-\frac{1}{2n-1}\biggl[\ln 8-2(n-1)\ln\frac{n-1}{n-2}\biggr],$$

since one can easily check that

$$H_1\Bigl(\frac{1}{2n-1}\Bigr)-\varphi\Bigl(\frac{1}{2n-1},\frac{1}{2n-1}\Bigr) =J\Bigl(\frac{1}{2n-1}\Bigr)$$

and that equality (A.26) holds for \(J\bigl(\frac{1}{2n-1}\bigr)\), and again taking into account (A.27), we verify the validity of the lower bounds in (11) and (12) in the interval \(\alpha\in\bigl[1-\frac{1}{n},1-\frac{1}{2n-1}\bigr]\) as well.

To prove equality (13) and the upper bound (14), it suffices to check that

$$H_1(1-\alpha)\ge H_2(1-\alpha)\quad\text{for all}\ \alpha,\: 1-\frac{1}{n}\le\alpha\le 1,$$
(A.28)

and that

$$J(1-\alpha)\ge H_1(1-\alpha)\quad\text{for}\ \alpha\in[\overline\alpha,1],$$
(A.29)
$$J(1-\alpha)\le H_1(1-\alpha)\quad\text{for}\ \alpha\in\Bigl[1- \frac{1}{2n-1},\overline\alpha\Bigr],$$
(A.30)

where \(\overline\alpha\in\bigl[1-\frac{1}{2n-1},1\bigr]\) is the unique solution of the equation

$$J(1-\overline\alpha)= H_1(1-\overline\alpha)$$

in this interval. The validity of (A.28) follows from the fact that the difference \(H_1(1-\alpha)-H_2(1-\alpha)\) decreases with \(\alpha\) in the interval \(\bigl[1-\frac{1}{n},1\bigr]\) and is positive at \(\alpha=1\). To prove (A.29) and (A.30), we should only verify that the difference \(J(1-\alpha)-H_1(1-\alpha)\) increases with \(\alpha\) in the interval \(\bigl[1-\frac{1}{2n-1},1\bigr]\), is negative at \(\alpha=1-\frac{1}{2n-1}\), and is positive at \(\alpha=1\). This completes the proof of Proposition 3. △

Proof of Proposition 4.

First note that in this case of “moderate” values of the parameter \(\alpha\), where \(\frac{1}{n}<\alpha<1-\frac{1}{n}\), we are unable to obtain an explicit expression for \(\min\limits_{Y:\: \Pr\{Y=X\}=\alpha}H(X\,|\,Y)\) for a given distribution of \(X\), unlike in the cases of small and large values of \(\alpha\). Therefore, for moderate values of \(\alpha\), we cannot obtain analogs of the lower bounds for \(I_{\max}(\alpha)\) given in Propositions 2 and 3.

For moderate values of \(\alpha\), it is natural to consider and compare two types of explicit lower bounds for \(I_{\max}(\alpha)\): one where the joint distribution of \(X\) and \(Y\) is such that \(\Pr\{Y=X\}=\alpha\) and \(H(X\,|\,Y)=0\), and one where \(X\) has a uniform distribution, since in the latter case the precise value of \(\min\limits_{Y:\: \Pr\{Y=X\}=\alpha}H(X\,|\,Y)\) is known (see (9)). In the first case, where the conditional entropy \(H(X\,|\,Y)\) is zero and \(\Pr\{Y=X\}=\alpha\), the matrix of the joint distribution of \(X\) and \(Y\) contains a single nonzero entry in each column and each row, while the sum of its diagonal entries is \(\alpha\). As can easily be verified, in the case where

$$\frac{k}{n}<\alpha\le\frac{k+1}{n},\quad k=1,2,\ldots,n-3,$$

to maximize the information \(I(X;Y)=H(X)\), we have to choose the better of the following two ways to arrange the nonzero entries in the joint distribution matrix:

  1. \(k\) numbers \(\frac{\alpha}{k}\) are located on the diagonal, and all the other \(n-k\) numbers \(\frac{1-\alpha}{n-k}\) are located outside the diagonal in such a way that each unoccupied row and each unoccupied column contains one such entry;

  2. \(k+1\) numbers \(\frac{\alpha}{k+1}\) are located on the diagonal, and all the other \(n-k-1\) numbers \(\frac{1-\alpha}{n-k-1}\) are located outside the diagonal in such a way that each unoccupied row and each unoccupied column contains one such entry.

In the first case we have \(H(X)=H_k(\alpha)\), and in the second we have \(H(X)=H_{k+1}(\alpha)\), where \(H_i(x)\) is defined in (2). But if \(k=n-2\), i.e., \(\frac{n-2}{n}<\alpha<\frac{n-1}{n}\), then the equality \(H(X\,|\,Y)=0\) is possible only for the first of these two arrangements of nonzero entries in the joint distribution matrix, so in this case we have \(H(X)=H_{n-2}(\alpha)\). Now note that

$$\max\{H_k(\alpha),H_{k+1}(\alpha)\}= \begin{cases} H_k(\alpha) & \text{if}\ \dfrac{k}{n}<\alpha\le\widehat{\alpha}(k),\\ H_{k+1}(\alpha) & \text{if}\ \widehat{\alpha}(k)\le\alpha\le\dfrac{k+1}{n}, \end{cases}$$
(A.31)

where

$$\widehat{\alpha}(k)= \ln\biggl(1+\frac{1}{n-k-1}\biggr)\Bigm/ \ln\biggl(1+\frac{n}{k(n-k-1)}\biggr),\quad k=1,2,\ldots,n-3.$$
(A.32)

Indeed, equalities (A.31) and (A.32) follow from the fact that the difference \(H_k(\alpha)-H_{k+1}(\alpha)\) decreases with \(\alpha\),

$$H_k\Bigl(\frac{k}{n}\Bigr)-H_{k+1}\Bigl(\frac{k}{n}\Bigr)>0,\qquad H_k\Bigl(\frac{k+1}{n}\Bigr)-H_{k+1}\Bigl(\frac{k+1}{n}\Bigr)<0,$$

so \(\widehat{\alpha}(k)\) is a solution of the equation

$$H_k(\widehat{\alpha}(k))=H_{k+1}(\widehat{\alpha}(k)).$$

Thus, the validity of the lower bound (18) is proved; moreover, its optimality at

$$\alpha=\frac{k}{n},\quad k=0,1,2,\ldots,n-2,n,$$

follows from Proposition 1. The lower bound (19) is a direct consequence of equality (9). Now let us show that (18) is better than (19).

First assume that

$$\frac{k}{n}<\alpha\le\frac{2k+1}{2n},\quad k=1,2,\ldots,n-3.$$

In this case it suffices to show that

$$H_k(\alpha)>\ln n-\varphi\Bigl(\frac{1}{n},\alpha-\frac{k}{n}\Bigr).$$

To this end, note that the difference

$$H_k(\alpha)-\ln n+\varphi\Bigl(\frac{1}{n},\alpha-\frac{k}{n}\Bigr)$$

is a concave function of \(\alpha\) which vanishes at \(\alpha=\frac{k}{n}\). Therefore, to prove that in this case (18) is better than (19), it suffices to verify that

$$H_k\Bigl(\frac{2k+1}{2n}\Bigr)>\ln n-\varphi\Bigl(\frac{1}{n}, \frac{1}{2n}\Bigr).$$

Simple calculations show that

$$ \begin{gathered} H_k\Bigl(\frac{2k+1}{2n}\Bigr)=\ln n-\frac{2k+1}{2n}\ln\Bigl(1+\frac{1}{2k}\Bigr) -\frac{2n-2k-1}{2n}\ln\Bigl(1-\frac{1}{2(n-k)}\Bigr),\\ \ln n-\varphi\Bigl(\frac{1}{n},\frac{1}{2n}\Bigr)= \ln n-\frac{1}{2n}\ln\frac{27}{4}. \end{gathered}$$

The desired inequality \(H_k\bigl(\frac{2k+1}{2n}\bigr)>\ln n-\varphi\bigl(\frac{1}{n},\frac{1}{2n}\bigr)\) now follows from the fact that

$$\frac{2k+1}{2n}\ln\Bigl(1+\frac{1}{2k}\Bigr) +\frac{2n-2k-1}{2n}\ln\Bigl(1-\frac{1}{2(n-k)}\Bigr) \le\frac{2k+1}{4nk}-\frac{2n-2k-1}{4n(n-k)}< \frac{1}{2n}\ln\frac{27}{4}.$$

This means that in the case at hand, where

$$\frac{k}{n}<\alpha\le\frac{2k+1}{2n},\quad k=1,2,\ldots,n-3,$$

the lower bound (18) is better than (19).
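A direct numerical confirmation of this case is straightforward; the check below (ours) uses the same reading of \(H_k\) as above and the value \(\varphi\bigl(\frac{1}{n},\frac{1}{2n}\bigr)=\frac{1}{2n}\ln\frac{27}{4}\) computed in the preceding display.

```python
import math

def Hk(n, k, x):
    """Entropy of a distribution with k masses x/k and n-k masses (1-x)/(n-k)."""
    return x * math.log(k / x) + (1 - x) * math.log((n - k) / (1 - x))

for n in range(4, 200):
    rhs = math.log(n) - math.log(27 / 4) / (2 * n)       # ln n - phi(1/n, 1/(2n))
    for k in range(1, n - 2):                            # k = 1, ..., n-3
        assert Hk(n, k, (2 * k + 1) / (2 * n)) > rhs, (n, k)
print("H_k((2k+1)/(2n)) > ln n - phi(1/n, 1/(2n)) verified")
```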

A similar proof shows that (18) is better than (19) in both remaining cases, where either

$$\frac{2k+1}{2n}\le\alpha\le\frac{k+1}{n},\quad k=1,2,\ldots,n-3,$$

or \(k=n-2\), i.e.,

$$\frac{n-2}{n}<\alpha<\frac{n-1}{n}.$$

In the first of these cases, one proves that

$$H_{k+1}(\alpha)>\ln n-\varphi\Bigl(\frac{1}{n},\frac{k+1}{n}-\alpha\Bigr),$$

and in the second case, that

$$H_{n-2}(\alpha)>\ln n-\varphi\Bigl(\frac{1}{n},\alpha-\frac{n-2}{n}\Bigr).$$

This completes the proof of Proposition 4. △

Cite this article

Prelov, V. On One Extremal Problem for Mutual Information. Probl Inf Transm 58, 217–230 (2022). https://doi.org/10.1134/S0032946022030024
