Abstract
We address the problem of finding the maximum of the mutual information \(I(X;Y)\) of two finite-valued random variables \(X\) and \(Y\) given only the value of their coupling, i.e., the probability \(\Pr\{X=Y\}\). We obtain explicit lower and upper bounds on this maximum, which in some cases are optimal.
References
Erokhin, V., \(\varepsilon\)-Entropy of a Discrete Random Variable, Teor. Veroyatnost. i Primenen., 1958, vol. 3, no. 1, pp. 103–107 [Theory Probab. Appl. (Engl. Transl.), 1958, vol. 3, no. 1, pp. 97–100]. https://doi.org/10.1137/1103008
Berger, T., Rate Distortion Theory: A Mathematical Basis for Data Compression, Englewood Cliffs, NJ: Prentice-Hall, 1971.
Ho, S.-W. and Verdú, S., On the Interplay between Conditional Entropy and Error Probability, IEEE Trans. Inform. Theory, 2010, vol. 56, no. 12, pp. 5930–5942. https://doi.org/10.1109/TIT.2010.2080891
Pinsker, M.S., On Estimation of Information via Variation, Probl. Peredachi Inf., 2005, vol. 41, no. 2, pp. 3–8 [Probl. Inf. Transm. (Engl. Transl.), 2005, vol. 41, no. 2, pp. 71–75]. https://doi.org/10.1007/s11122-005-0012-8
Zhang, Z., Estimating Mutual Information via Kolmogorov Distance, IEEE Trans. Inform. Theory, 2007, vol. 53, no. 9, pp. 3280–3282. https://doi.org/10.1109/TIT.2007.903122
Prelov, V.V., Generalization of a Pinsker Problem, Probl. Peredachi Inf., 2011, vol. 47, no. 2, pp. 17–37 [Probl. Inf. Transm. (Engl. Transl.), 2011, vol. 47, no. 2, pp. 98–116]. https://doi.org/10.1134/S0032946011020037
Prelov, V.V., On Some Extremal Problems for Mutual Information and Entropy, Probl. Peredachi Inf., 2016, vol. 52, no. 4, pp. 3–13 [Probl. Inf. Transm. (Engl. Transl.), 2016, vol. 52, no. 4, pp. 319–328]. https://doi.org/10.1134/S0032946016040013
Acknowledgments
In conclusion, the author expresses his sincere gratitude to the reviewer for pointing out a number of inaccuracies in the paper, whose correction has improved its quality.
Additional information
Translated from Problemy Peredachi Informatsii, 2022, Vol. 58, No. 3, pp. 18–32. https://doi.org/10.31857/S0555292322030020
In memoriam Prof. E.M. Gabidulin
Appendix
Proof of Proposition 1.
The claim of this proposition is almost obvious. Indeed, the information \(I(X;Y)\) takes its maximum value \(\ln n\) if and only if \(X\) has a uniform distribution and the conditional entropy \(H(X\,|\,Y)\) vanishes, which means that \(X\) should be a deterministic function of \(Y\). Therefore, \(I_{\max}(\alpha)=\ln n\) if and only if \(n\) numbers \(\frac{1}{n}\) can be placed in an \(n\times n\) matrix of the joint distribution of \(X\) and \(Y\) in such a way that each column and each row of this matrix contains exactly one entry \(\frac{1}{n}\) while the sum of diagonal entries is \(\alpha\) (all other entries of the matrix being zero). It is clear that such an arrangement is possible if and only if \(\alpha=\frac{k}{n}\), where \(k\) is any integer from \(0\) to \(n\) except for \(k=n-1\). △
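As a numerical sanity check (not part of the original proof), the following sketch builds, for \(\alpha=\frac{k}{n}\), a joint distribution matrix whose nonzero entries \(\frac{1}{n}\) form a permutation with exactly \(k\) fixed points, and verifies that \(\Pr\{X=Y\}=\frac{k}{n}\) and \(I(X;Y)=\ln n\); the helper names are illustrative only.

```python
import math

def mutual_information(M):
    """I(X;Y) in nats for a joint distribution given as a matrix M."""
    px = [sum(row) for row in M]
    py = [sum(col) for col in zip(*M)]
    return sum(p * math.log(p / (px[i] * py[j]))
               for i, row in enumerate(M)
               for j, p in enumerate(row) if p > 0)

def permutation_joint(n, k):
    """n x n matrix with entries 1/n placed along a permutation having
    exactly k fixed points; such a permutation exists iff k != n - 1."""
    assert 0 <= k <= n and k != n - 1
    # identity on the first k indices, a cyclic shift (no fixed points) on the rest
    perm = list(range(k)) + [k + (i + 1) % (n - k) for i in range(n - k)]
    M = [[0.0] * n for _ in range(n)]
    for i, j in enumerate(perm):
        M[i][j] = 1.0 / n
    return M

n, k = 5, 2
M = permutation_joint(n, k)
coupling = sum(M[i][i] for i in range(n))
assert abs(coupling - k / n) < 1e-12                      # Pr{X = Y} = k/n
assert abs(mutual_information(M) - math.log(n)) < 1e-12   # I(X;Y) = ln n
```

Here \(X\) is uniform and \(X\) is a deterministic function of \(Y\) (one nonzero entry per column), so \(I(X;Y)=H(X)=\ln n\), exactly as the proof argues.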
Proof of Proposition 2.
First of all, note that throughout what follows, in the cases where we consider a probability distribution \(P_{X}=P=\{p_i,\: i\in\mathcal{N}\}\) of a random variable \(X\), we will assume for convenience that the components \(p_i\) of this distribution are arranged in descending order, so that \(p_1\ge p_2\ge\ldots\ge p_n>0\). In particular, we will always assume that \(p_{\min}=\min\limits_{i\in\mathcal{N}}p_i=p_n\).
To prove the lower bound (5), we use the following result from [7, Corollary 2]: For a given probability distribution \(P=\{p_i,\: i\in\mathcal{N}\}\) of a random variable \(X\), we have the equality
where \(\alpha_0\) is a solution of the equation
and the function \(\varphi(x,y)\) is defined in (4). Since \(\varphi(x,y)\) increases in each of its arguments and we consider the minimum over all distributions of \(X\) and \(Y\) such that \(\Pr\{Y=X\}=\alpha\) and only the minimal component \(p_n\) of the distribution \(P\) is given, (A.1) and (A.2) imply that this minimum is attained at any \(P\) for which \(p_{n-1}=p_n\); therefore, we obtain
Now, taking into account (A.3) and (A.4), we have
where \(H_1(\cdot)\) is defined in (2). In deriving (A.5), we have also used the fact that \(\max\limits_{X:\: p_{\min}=p_n}H(X)=H_1(p_n)\). It can easily be verified that \(H_1(p_n)-\varphi(\alpha,p_n)\) is a concave function of \(p_n\) and that its maximum in the interval \(p_n\in\bigl[\alpha,\frac{1}{n}\bigr]\) is attained at
if \(p^*_n(\alpha)\in\bigl[\alpha,\frac{1}{n}\bigr]\), i.e., if \(0\le\alpha\le\frac{1}{2n-1}\). If \(\frac{1}{2n-1}\le\alpha\le\frac{1}{n}\), then the maximum of \(H_1(p_n)-\varphi(\alpha,p_n)\) is attained at \(p_n=\alpha\). One can easily verify that
where \(J(\cdot)\) is defined in (3). Thus, (A.5) implies that
Now note that for any \(\alpha\in\bigl[0,\frac{1}{n}\bigr]\) we have another lower bound
Indeed, \(I(X;Y)=H_1(\alpha)\) if the entries \(p_{ij}\) of the matrix \(M=\left\|p_{ij}\right\|_{i,j=1}^n\) of the joint distribution of \(X\) and \(Y\) have the form
since in this case we have \(H(X)=H_1(\alpha)\) and \(H(X\,|\,Y)=0\).
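The explicit matrix referred to here did not survive extraction. A plausible arrangement consistent with the stated properties puts one entry \(\alpha\) on the diagonal and the remaining \(n-1\) entries \(\frac{1-\alpha}{n-1}\) off the diagonal, one per free row and column; the sketch below checks it numerically, assuming (since (2) is not reproduced in this chunk) that \(H_1(\alpha)\) is the entropy of the distribution \(\bigl\{\alpha,\frac{1-\alpha}{n-1},\ldots,\frac{1-\alpha}{n-1}\bigr\}\), as the arrangements in the proof of Proposition 4 suggest.

```python
import math

def mutual_information(M):
    """I(X;Y) in nats for a joint distribution given as a matrix M."""
    px = [sum(row) for row in M]
    py = [sum(col) for col in zip(*M)]
    return sum(p * math.log(p / (px[i] * py[j]))
               for i, row in enumerate(M)
               for j, p in enumerate(row) if p > 0)

def joint_for_H1(n, alpha):
    """One diagonal entry alpha; the other n-1 entries (1-alpha)/(n-1) are
    placed off the diagonal, one per remaining row and column (cyclic shift)."""
    M = [[0.0] * n for _ in range(n)]
    M[0][0] = alpha
    q = (1.0 - alpha) / (n - 1)
    for i in range(1, n):
        M[i][1 + i % (n - 1)] = q   # row i -> column i+1, last row -> column 1
    return M

n, alpha = 4, 0.15                  # example with 0 <= alpha <= 1/n
M = joint_for_H1(n, alpha)
H1 = -alpha * math.log(alpha) - (1 - alpha) * math.log((1 - alpha) / (n - 1))
assert abs(sum(M[i][i] for i in range(n)) - alpha) < 1e-12   # Pr{X = Y} = alpha
assert abs(mutual_information(M) - H1) < 1e-12               # I(X;Y) = H_1(alpha)
```

Each column has a single nonzero entry, so \(H(X\,|\,Y)=0\) and \(I(X;Y)=H(X)\), matching the argument in the text.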
It is obvious that the lower bound (A.7) is better than (A.6) if \(\frac{1}{2n-1}<\alpha\le\frac{1}{n}\), since \(\varphi(\alpha,\alpha)>0\) for \(\alpha>0\). It therefore remains to compare these bounds in the interval \(\alpha\in\bigl[0,\frac{1}{2n-1}\bigr]\). We have
since \(\frac{n(n-1)\alpha^2}{1-\alpha^2}\le\frac{1}{4}\). Therefore, the function \(J(\alpha)-H_1(\alpha)\) decreases with \(\alpha\) and is positive at \(\alpha=0\). Now let us show that
Indeed, one can easily check that
The inequality \(J\bigl(\frac{1}{3n-1}\bigr)-H_1\bigl(\frac{1}{3n-1}\bigr)<0\) follows from the fact that
Thus, we have
where \(\alpha^*\) is the unique solution of the equation \(J(\alpha^*)=H_1(\alpha^*)\) in the interval \(0\le\alpha^*<\frac{1}{3n-1}\). The lower bound (5) is proved.
Now let us prove equality (6) and the upper bound (7). To this end, note that
where
According to (A.3), we have
and we have seen that
and that this maximum is attained at
However, the above equality for the maximum is valid only if \(p^*_n(\alpha)\) satisfies our constraint \(p^*_n(\alpha)\ge 2\alpha\) (previously, we had the constraint \(p^*_n(\alpha)\ge\alpha\)), which is equivalent to the condition \(0\le\alpha\le\frac{1}{3n-1}\). But if \(\frac{1}{3n-1}\le\alpha\le\frac{1}{2n}\), then
is attained at \(p_n=2\alpha\). Thus, we have the equality
In the case where \(p_n\le 2\alpha\), a precise value of
is not known, and therefore we can only claim that
Therefore, (A.10)–(A.14) imply now that
Let us find a restriction on \(\alpha\) under which \(J(\alpha)\ge H_1(2\alpha)\). One can easily verify that the difference \(J(\alpha)-H_1(2\alpha)\) decreases with \(\alpha\), is positive at \(\alpha=0\), and is negative at \(\alpha=\frac{1}{3n-1}\). Indeed, we have
Therefore, \(J(\alpha)\ge H_1(2\alpha)\) in the interval \(0\le\alpha\le\alpha_*\) and \(J(\alpha)\le H_1(2\alpha)\) in the interval \(\alpha_*<\alpha<\frac{1}{3n-1}\), where \(\alpha_*\) is a solution of the equation \(J(\alpha_*)= H_1(2\alpha_*)\). In addition, it is obvious that \(\alpha_*<\alpha^*<\frac{1}{3n-1}\), where \(\alpha^*\) is a solution of the equation \(J(\alpha^*)= H_1(\alpha^*)\). Now (A.13)–(A.15) imply equality (6) and the upper bound (7).
In conclusion, note that the second equality in (A.13) leads to the lower bound
which, however, is weaker than the lower bound (5) proved above. To see this, it suffices to verify that
One can easily check that the difference \(H_1(\alpha)-[H_1(2\alpha)-\varphi(\alpha,2\alpha)]\) increases with \(\alpha\) in the interval \(\alpha\in\bigl[\frac{1}{3n-1},\frac{1}{2n} \bigr]\) and is positive at \(\alpha=\frac{1}{3n-1}\). Indeed, one can easily verify that in this interval we have
and
This completes the proof of Proposition 2. △
Proof of Proposition 3.
Note first of all that although the proof of this proposition is conceptually quite similar to that of Proposition 2 given above, it differs in several essential respects. In particular, in the case at hand, in contrast to Proposition 2, the lower bounds for \(I_{\max}(\alpha)\) differ significantly depending on whether \(n\le 14\) or \(n\ge 15\).
First, we again use the following result from [7, Corollary 2]: For a given probability distribution \(P=\{p_i,\: i\in\mathcal{N}\}\) of a random variable \(X\), we have the equality
Using (A.17), we obtain
where
In the proof of Proposition 2, we have shown that
and that this maximum is attained at
if \(0\le\alpha\le\frac{1}{2n-1}\) and at \(p_n=\alpha\) if \(\frac{1}{2n-1}\le\alpha\le\frac{1}{n}\).
In the considered case of large values of \(\alpha\), finding the maximum
differs from that in (A.21) by only replacing the parameter \(\alpha\) with \(1-\alpha\). Therefore, (A.18) and (A.19) imply the following lower bound for \(I_{\max}(\alpha)\):
moreover, the first equality for \(J'_1(\alpha)\) is attained at
and the second is attained at \(p_n=1-\alpha\).
Now note that we have another lower bound
Indeed, this lower bound follows from the fact that \(I(X;Y)=H_2(1-\alpha)\) if the entries of the matrix \(M=\left\|p_{ij}\right\|_{i,j=1}^n\) of the joint distribution of \(X\) and \(Y\) are given by
since for this joint distribution we have \(H(X)=H_2(1-\alpha)\) and \(H(X\,|\,Y)=0\). Note that, in contrast to the case of small values of \(\alpha\), where the lower bound \(I_{\max}(\alpha)\ge H_1(\alpha)\) was valid (see (A.7)), in the present case of large values of \(\alpha\) we cannot claim that \(I_{\max}(\alpha)\ge H_1(1-\alpha)\), since we are unable to construct a joint distribution of \(X\) and \(Y\) such that \(H(X)=H_1(1-\alpha)\) and \(H(X\,|\,Y)=0\).
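The matrix of this construction is likewise missing from the extraction. Under the same assumed form of \(H_k\) (the entropy of a distribution with \(k\) atoms \(\frac{x}{k}\) and \(n-k\) atoms \(\frac{1-x}{n-k}\), here with \(x=1-\alpha\)), one arrangement consistent with the stated properties puts \(n-2\) entries \(\frac{\alpha}{n-2}\) on the diagonal and the two entries \(\frac{1-\alpha}{2}\) off it; the sketch below is an illustration under these assumptions, not the author's exact matrix.

```python
import math

def mutual_information(M):
    """I(X;Y) in nats for a joint distribution given as a matrix M."""
    px = [sum(row) for row in M]
    py = [sum(col) for col in zip(*M)]
    return sum(p * math.log(p / (px[i] * py[j]))
               for i, row in enumerate(M)
               for j, p in enumerate(row) if p > 0)

def joint_for_H2(n, alpha):
    """n-2 diagonal entries alpha/(n-2); two entries (1-alpha)/2 swapped
    between the last two rows/columns, off the diagonal."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n - 2):
        M[i][i] = alpha / (n - 2)
    M[n - 2][n - 1] = (1 - alpha) / 2
    M[n - 1][n - 2] = (1 - alpha) / 2
    return M

n, alpha = 5, 0.85                   # example with 1 - 1/n <= alpha < 1
M = joint_for_H2(n, alpha)
H2 = -(1 - alpha) * math.log((1 - alpha) / 2) - alpha * math.log(alpha / (n - 2))
assert abs(sum(M[i][i] for i in range(n)) - alpha) < 1e-12   # Pr{X = Y} = alpha
assert abs(mutual_information(M) - H2) < 1e-12               # I(X;Y) = H_2(1 - alpha)
```

Again each column has a single nonzero entry, so \(H(X\,|\,Y)=0\) and \(I(X;Y)=H(X)=H_2(1-\alpha)\).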
Thus, it is necessary to compare the lower bounds (A.22) with (A.23) for different values of \(\alpha\) and choose the best of them. Compare now the first bound in (A.22) with (A.23) in the interval \(\alpha\in\bigl[1-\frac{1}{2n-1},1 \bigr]\). One can easily check that the difference \(J(1-\alpha)-H_2(1-\alpha)\) increases with \(\alpha\) in this interval, since
and this difference is positive at \(\alpha=1\). Therefore, we have to compare the values \(J(1-\alpha)\) with \(H_2(1-\alpha)\) at \(\alpha=1-\frac{1}{2n-1}\). We have
so that
Noting that the function \((n-1)\ln\frac{n-1}{n-2}\) decreases with \(n\) and taking into account that \(\ln 8\approx 2.07944\) and
we obtain that the lower bound (A.22) is better than (A.23) for all \(\alpha\in\bigl[1-\frac{1}{2n-1},1\bigr]\) if \(3\le n\le 14\) and for \(\alpha\in[\widetilde{\alpha},1]\) if \(n\ge 15\), where \(\widetilde{\alpha}\), \(1-\frac{1}{2n-1}<\widetilde{\alpha}<1\), is the unique solution of the equation
in the given interval. But if \(n\ge 15\) and \(\alpha\in\bigl[1-\frac{1}{2n-1},\widetilde{\alpha}\bigr]\), then (A.23) is better than (A.22). Thus, we have proved the validity of (11) and (12) in the interval \(\bigl[1-\frac{1}{2n-1},1\bigr]\).
Now let us compare the second bound in (A.22) with (A.23) in the interval \(\alpha\in\bigl[1-\frac{1}{n},1-\frac{1}{2n-1}\bigr]\). To this end, note that the difference
increases with \(\alpha\) and is negative at \(\alpha=1-\frac{1}{n}\), since one can easily check that
because \(\ln 8>2\) and
Now, noting that this difference at \(\alpha=1-\frac{1}{2n-1}\) equals
since one can easily check that
and that for \(J\bigl(\frac{1}{2n-1}\bigr)\) equality (A.26) is valid, again taking into account (A.27), we verify the validity of the lower bounds in (11) and (12) in the interval \(\alpha\in\bigl[1-\frac{1}{n},1-\frac{1}{2n-1}\bigr]\) too.
To prove equality (13) and the upper bound (14), it suffices to check that
and that
where \(\overline\alpha\in\bigl[1-\frac{1}{2n-1},1\bigr]\) is the unique solution of the equation
in this interval. The validity of (A.28) follows from the fact that the difference \(H_1(1-\alpha)-H_2(1-\alpha)\) decreases with \(\alpha\) in the interval \(\bigl[1-\frac{1}{n},1\bigr]\) and is positive at \(\alpha=1\). To prove (A.29) and (A.30), we should only verify that the difference \(J(1-\alpha)-H_1(1-\alpha)\) increases with \(\alpha\) in the interval \(\bigl[1-\frac{1}{2n-1},1\bigr]\), is negative at \(\alpha=1-\frac{1}{2n-1}\), and is positive at \(\alpha=1\). This completes the proof of Proposition 3. △
Proof of Proposition 4.
First note that in this case of “moderate” values of the parameter \(\alpha\), where \(\frac{1}{n}<\alpha<1-\frac{1}{n}\), we are unable to obtain an explicit expression for \(\min\limits_{Y:\: \Pr\{Y=X\}=\alpha}H(X\,|\,Y)\) for a given distribution of \(X\), unlike the cases of small and large values of \(\alpha\). Therefore, for moderate values of \(\alpha\), we are unable to obtain analogs of the lower bounds for \(I_{\max}(\alpha)\) given in Propositions 2 and 3.
For moderate values of \(\alpha\), it is natural to consider and compare two types of explicit lower bounds for \(I_{\max}(\alpha)\): one where the joint distribution of \(X\) and \(Y\) is such that \(\Pr\{Y=X\}=\alpha\) and \(H(X\,|\,Y)=0\), and one where \(X\) has a uniform distribution, since in the latter case the precise value of \(\min\limits_{Y:\: \Pr\{Y=X\}=\alpha}H(X\,|\,Y)\) is known (see (9)). In the first case, where it is assumed that the conditional entropy \(H(X\,|\,Y)\) is zero and \(\Pr\{Y=X\}=\alpha\), the matrix of the joint distribution of \(X\) and \(Y\) contains a single nonzero entry in each column and each row, while the sum of its diagonal entries is \(\alpha\). As can easily be verified, in the case where
to maximize the information \(I(X;Y)=H(X)\), we have to choose the best of the two ways to arrange nonzero entries in this joint distribution matrix:
1. \(k\) numbers \(\frac{\alpha}{k}\) are located on the diagonal, and the other \(n-k\) numbers \(\frac{1-\alpha}{n-k}\) are located outside the diagonal in such a way that each unoccupied row and each unoccupied column contains exactly one such entry;
2. \(k+1\) numbers \(\frac{\alpha}{k+1}\) are located on the diagonal, and the other \(n-k-1\) numbers \(\frac{1-\alpha}{n-k-1}\) are located outside the diagonal in such a way that each unoccupied row and each unoccupied column contains exactly one such entry.
In the first case we have \(H(X)=H_k(\alpha)\), and in the second we have \(H(X)=H_{k+1}(\alpha)\), where \(H_i(x)\) is defined in (2). But if \(k=n-2\), i.e., \(\frac{n-2}{n}<\alpha<\frac{n-1}{n}\), then the equality \(H(X\,|\,Y)=0\) is possible only in the first of these arrangements of nonzero entries in the joint distribution matrix, so in this case we have \(H(X)=H_{n-2}(\alpha)\). Now note that
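Under the assumption (since (2) is not reproduced here) that \(H_k(\alpha)\) is the entropy of the distribution with \(k\) atoms \(\frac{\alpha}{k}\) and \(n-k\) atoms \(\frac{1-\alpha}{n-k}\), which is what the two arrangements above give for \(H(X)\), the resulting lower bound can be evaluated numerically as the larger of the two values; note that at \(\alpha=\frac{k}{n}\) it reaches \(\ln n\), in agreement with Proposition 1. A hedged sketch:

```python
import math

def H_k(n, k, a):
    """Entropy (nats) of the distribution with k atoms a/k and n-k atoms
    (1-a)/(n-k): the value of H(X) for the k-diagonal arrangement."""
    h = 0.0
    if a > 0:
        h -= a * math.log(a / k)
    if a < 1:
        h -= (1 - a) * math.log((1 - a) / (n - k))
    return h

def lower_bound(n, alpha):
    """Best of the two arrangements for k/n < alpha < (k+1)/n, k <= n-3
    (for k = n-2 only the first arrangement is admissible)."""
    k = int(n * alpha)
    return max(H_k(n, k, alpha), H_k(n, k + 1, alpha))

n = 6
assert lower_bound(n, 0.45) <= math.log(n) + 1e-12    # never exceeds ln n
assert abs(H_k(n, 2, 2 / n) - math.log(n)) < 1e-12    # equals ln n at alpha = k/n
```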
where
Indeed, equalities (A.31) and (A.32) follow from the fact that the difference \(H_k(\alpha)-H_{k+1}(\alpha)\) decreases with \(\alpha\),
so \(\widehat{\alpha}(k)\) is a solution of the equation
Thus, the validity of the lower bound (18) is proved; moreover, its optimality at
follows from Proposition 1. The lower bound (19) is a direct consequence of equality (9). Now let us show that (18) is better than (19).
First assume that
In this case it suffices to show that
To this end, note that the difference
is a concave function of \(\alpha\) which vanishes at \(\alpha=\frac{k}{n}\). Therefore, to prove that in this case (18) is better than (19), it suffices to verify that
Simple calculations show that
The desired inequality \(H_k\bigl(\frac{2k+1}{2n}\bigr)>\ln n-\varphi\bigl(\frac{1}{n},\frac{1}{2n}\bigr)\) now follows from the fact that
This means that in the case at hand, where
the lower bound (18) is better than (19).
A similar proof shows that (18) is better than (19) in both remaining cases, where either
or \(k=n-2\), i.e.,
In the first of these cases, one proves that
and in the second case, that
This completes the proof of Proposition 4. △
Prelov, V.V., On One Extremal Problem for Mutual Information, Probl. Inf. Transm., 2022, vol. 58, no. 3, pp. 217–230. https://doi.org/10.1134/S0032946022030024