Opportunistic Fair Scheduling in Wireless Networks: An Approximate Dynamic Programming Approach

Abstract

We consider the problem of temporal fair scheduling of queued data transmissions in wireless heterogeneous networks. We deal with both the throughput maximization problem and the delay minimization problem. Taking fairness constraints and the data arrival queues into consideration, we formulate the transmission scheduling problem as a Markov decision process (MDP) with fairness constraints. We study two categories of fairness constraints, namely temporal fairness and utilitarian fairness. We consider two criteria: infinite horizon expected total discounted reward and expected average reward. Applying the dynamic programming approach, we derive and prove explicit optimality equations for the above constrained MDPs, and give corresponding optimal fair scheduling policies based on those equations. A practical stochastic-approximation-type algorithm is applied to calculate the control parameters online in the policies. Furthermore, we develop a novel approximation method—temporal fair rollout—to achieve a tractable computation. Numerical results show that the proposed scheme achieves significant performance improvement for both throughput maximization and delay minimization problems compared with other existing schemes.
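
As a hedged illustration of the stochastic-approximation-type computation mentioned above, and not the paper's actual algorithm, the weights u(a) attached to the temporal-fairness constraints (each user a should be served at least a fraction C(a) of the time slots) could be adapted online with a projected step of the following form; all names and numbers are hypothetical.

```python
import numpy as np

def update_fairness_weights(u, chosen, C, step):
    """One projected stochastic-approximation step for the fairness weights u(a).

    u      : dict mapping each user a to its current weight u(a) >= 0
    chosen : the user actually scheduled in the current time slot
    C      : dict mapping each user a to its minimum time-share C(a)
    step   : step size (e.g. a diminishing sequence such as 1/k)

    Heuristic sketch: raise u(a) when user a is served less often than C(a)
    on average, lower it otherwise, and project back onto u(a) >= 0.
    """
    for a in u:
        served = 1.0 if a == chosen else 0.0
        u[a] = max(0.0, u[a] - step * (served - C[a]))
    return u

# Toy usage with three users and made-up fairness shares.
u = {0: 0.0, 1: 0.0, 2: 0.0}
C = {0: 0.2, 1: 0.3, 2: 0.3}
rng = np.random.default_rng(0)
for k in range(1, 1001):
    chosen = int(rng.integers(0, 3))   # stand-in for the scheduler's decision
    u = update_fairness_weights(u, chosen, C, step=1.0 / k)
print(u)
```

The drift of this update is C(a) minus the empirical service indicator, so a user that is starved relative to its target sees its weight grow until the scheduler favours it; projecting onto u(a) ≥ 0 keeps the weights valid.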


Notes

  1. An MDP is unichain if the transition matrix corresponding to every deterministic stationary policy consists of a single recurrent class plus a possibly empty set of transient states [12].
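
For illustration only (this check is not part of the paper), the unichain property can be tested numerically for a given deterministic stationary policy by forming its transition matrix and verifying that the chain has exactly one closed communicating class; the kernel P and policy below are hypothetical inputs.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components

def is_unichain_under_policy(P, policy):
    """Check the unichain property for one deterministic stationary policy.

    P      : (S, A, S) array with P[s, a, s'] = transition probability
    policy : length-S array giving the action chosen in each state

    The induced chain is unichain iff its state graph has exactly one closed
    strongly connected class (no probability leaves it); every other class
    is then transient.
    """
    S = P.shape[0]
    P_pi = P[np.arange(S), policy, :]                 # S x S transition matrix under the policy
    n, labels = connected_components(P_pi > 0, directed=True, connection='strong')
    closed = 0
    for c in range(n):
        inside = np.where(labels == c)[0]
        outside = np.setdiff1d(np.arange(S), inside)
        if P_pi[np.ix_(inside, outside)].sum() == 0:  # class is closed: no outgoing probability
            closed += 1
    return closed == 1
```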

References

  1. Zhang Z, Moola S, Chong EKP (2008) Approximate stochastic dynamic programming for opportunistic fair scheduling in wireless networks. In: Proc. 47th IEEE conference on decision and control, Cancun, 9–11 December 2008, pp 1404–1409

  2. Knopp R, Humblet P (1995) Information capacity and power control in single cell multiuser communications. In: Proc. IEEE int. conference on communications 1995, vol 1, pp 331–335

  3. Andrews M, Kumaran K, Ramanan K, Stolyar A, Whiting P, Vijayakumar R (2001) Providing quality of service over a shared wireless link. IEEE Commun Mag 39(2):150–153

  4. Bender P, Black P, Grob M, Padovani R, Sindhushyana N, Viterbi A (2000) CDMA/HDR: a bandwidth-efficient high-speed wireless data service for nomadic users. IEEE Commun Mag 38(7):70–77

  5. Andrews M (2005) A survey of scheduling theory in wireless data networks. In: Proc. 2005 IMA summer workshop on wireless communications

  6. Parkvall S, Dahlman E, Frenger P, Beming P, Persson M (2001) The high speed packet data evolution of WCDMA. In: Proc. IEEE VTC 2001, vol 3, pp 2287–2291

  7. Liu X, Chong EKP, Shroff NB (2001) Opportunistic transmission scheduling with resource-sharing constraints in wireless networks. IEEE J Sel Areas Commun 19(10):2053–2064

  8. Liu X, Chong EKP, Shroff NB (2003) A framework for opportunistic scheduling in wireless networks. Comput Netw 41(4):451–474

  9. Liu X, Chong EKP, Shroff NB (2004) Opportunistic scheduling: an illustration of cross-layer design. Telecommun Rev 14(6):947–959

  10. Wang HS, Moayeri N (1995) Finite-state Markov channel—a useful model for radio communication channels. IEEE Trans Veh Technol 43:163–171

  11. Kelly F (1997) Charging and rate control for elastic traffic. Eur Trans Telecommun 8:33–37

  12. Puterman ML (1994) Markov decision processes. Wiley, New York

  13. Bertsekas DP (2001) Dynamic programming and optimal control, 2nd ed. Athena, Belmont

  14. Ross SM (1970) Applied probability models with optimization applications. Dover, New York

  15. Derman C (1970) Finite state Markovian decision processes. Academic, New York

  16. Altman E (1998) Constrained Markov decision processes. Chapman and Hall/CRC, London

  17. Piunovskiy AB (1997) Optimal control of random sequences in problems with constraints. Kluwer, Dordrecht

  18. Feinberg EA, Shwartz A (eds) (2002) Handbook of Markov decision processes: methods and applications. Kluwer, Boston

  19. Ross KW (1989) Randomized and past-dependent policies for Markov decision processes with multiple constraints. Oper Res 37(3):474–477

  20. Piunovskiy AB, Mao X (2000) Constrained Markovian decision processes: the dynamic programming approach. Oper Res Lett 27:119–126

  21. Chen RC, Blankenship GL (2004) Dynamic programming equations for discounted constrained stochastic control. IEEE Trans Automat Contr 49(5):699–709

  22. Bertsekas DP, Tsitsiklis JN (1996) Neuro-dynamic programming. Athena, Belmont

  23. Bertsekas DP, Tsitsiklis JN, Wu C (1997) Rollout algorithms for combinatorial optimization. J Heuristics 3:245–262

  24. Bertsekas DP, Castanon DA (1999) Rollout algorithms for stochastic scheduling problems. J Heuristics 5(1):89–108

  25. Gesbert D, Slim-Alouini M (2004) How much feedback is multi-user diversity really worth? In: Proc. IEEE international conference on communications, pp 234–238

  26. Floren F, Edfors O, Molin BA (2003) The effect of feedback quantization on the throughput of a multiuser diversity scheme. In: Proc. IEEE global telecommunication conference (GLOBECOM’03), pp 497–501

  27. Svedman P, Wilson S, Cimini LJ Jr., Ottersten B (2004) A simplified opportunistic feedback and scheduling scheme for OFDM. In: Proc. IEEE vehicular technology conference 2004

  28. Al-Harthi Y, Tewfik A, Alouini MS (2007) Multiuser diversity with quantized feedback. IEEE Trans Wirel Commun 6(1):330–337

  29. Parekh AK, Gallager RG (1993) A generalized processor sharing approach to flow control in integrated services networks: the single-node case. IEEE/ACM Trans Netw 1(3):344–357

  30. Kushner HJ, Yin GG (2003) Stochastic approximation and recursive algorithms and applications, 2nd ed. Springer, New York

  31. Chen HF (2002) Stochastic approximation and its applications. Kluwer, Dordrecht

  32. Gilbert E (1960) Capacity of a burst-noise channel. Bell Syst Tech J 39:1253–1265

  33. Elliott EO (1963) Estimates of error rates for codes on burst-noise channels. Bell Syst Tech J 42:1977–1997

  34. Swarts F, Ferreira HC (1993) Markov characterization of channels with soft decision outputs. IEEE Trans Commun 41:678–682

Author information

Corresponding author

Correspondence to Edwin K. P. Chong.

Additional information

This research was supported in part by NSF under grant ECCS-0700559. Parts of an early version of this paper were presented at the IEEE Conference on Decision and Control 2008 [1].

Appendix

1.1 A Proof of Lemma 1

Proof

Let π be an arbitrary policy, and suppose that π chooses action a at time slot 0 with probability P a , a ∈ A. Then,

$$ V_{\pi}(s)=\sum_{a\in A}P_{a}\left[r(s,a)+ u(a) +\sum_{s'\in S}P(s'|s,a)W_{\pi}(s')\right], $$

where W π (s′) represents the expected discounted weighted reward with the weight u(π t ) incurred from time slot 1 onwards, given that π is employed and the state at time 1 is s′. However, it follows that

$$ W_{\pi}(s')\leq \alpha V_{\alpha}(s') $$

and hence that

$$\begin{array}{rl} V_{\pi}(s)&\leq \sum\limits_{a\in A}P_{a}\left\{r(s,a)+u(a)+\alpha\sum\limits_{s'\in S}P(s'|s,a)V_{\alpha}(s')\right\} \\ &\leq \sum\limits_{a\in A}P_{a}\max\limits_{a\in A}\left\{r(s,a)+u(a)+\alpha\sum\limits_{s'\in S}P(s'|s,a)V_{\alpha}(s')\right\} \\ &=\max\limits_{a\in A}\left\{r(s,a)+u(a)+\alpha\sum\limits_{s'\in S}P(s'|s,a)V_{\alpha}(s')\right\}.\label{Lemma1.2} \end{array} $$
(23)

Since π is arbitrary, Eq. 23 implies that

$$ \label{Lemma1.3} V_{\alpha}(s)\leq\max_{a\in A}\left\{r(s,a)+u(a)+\alpha\sum_{s'\in S}P(s'|s,a)V_{\alpha}(s')\right\}. $$
(24)

To go the other way, let \(a_0\) be such that

$$ \begin{array}{rcl} \label{Lemma1.4} && r(s,a_0)+u(a_0)+\alpha\sum\limits_{s'\in S}P(s'|s,a_0)V_{\alpha}(s') \\ &&\quad=\max\limits_{a\in A}\left\{r(s,a)+u(a)+\alpha\sum\limits_{s'\in S}P(s'|s,a)V_{\alpha}(s')\right\} \end{array} $$
(25)

and let π be the policy that chooses \(a_0\) at time 0 and, if the next state is s′, views the process as originating in state s′ and follows a policy \(\pi_{s'}\) such that \(V_{\pi_{s'}}(s')\geq V_{\alpha}(s')-\varepsilon\), s′ ∈ S. Hence,

$$ \begin{array}{rl} V_{\pi}(s)&=r(s,a_0)+u(a_0)+\alpha\sum\limits_{s'\in S}P(s'|s,a_0)V_{\pi_{s'}}(s') \\ &\geq r(s,a_0)+u(a_0)+\alpha\sum\limits_{s'\in S}P(s'|s,a_0)V_{\alpha}(s')-\alpha \varepsilon \end{array} $$

which, since V α (s) ≥ V π (s), implies that

$$ V_{\alpha}(s)\geq r(s,a_0)+u(a_0)+\alpha\sum_{s'\in S}P(s'|s,a_0)V_{\alpha}(s')-\alpha \varepsilon. $$

Hence, from Eq. 25, we have

$$ \label{Lemma1.5} V_{\alpha}(s)\geq \max_{a\in A}\left\{r(s,a)+u(a)+\alpha\sum_{s'\in S}P(s'|s,a)V_{\alpha}(s')\right\}-\alpha \varepsilon. $$
(26)

Since each \(\pi_{s'}\) can be chosen ε-optimal, ε is arbitrary; hence, combining Eqs. 24 and 26, we have

$$ V_{\alpha}(s)=\max_{a\in A}\left\{r(s,a)+u(a)+\alpha\sum_{s'\in S}P(s'|s,a)V_{\alpha}(s')\right\}, \quad s \in S. $$
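
Lemma 1 states that V α satisfies a standard Bellman-type optimality equation with the reward r(s,a) shifted by the weight u(a). As an illustrative sketch (not the paper's implementation), it can therefore be computed by ordinary value iteration on the shifted reward; the arrays below are hypothetical placeholders.

```python
import numpy as np

def weighted_value_iteration(P, r, u, alpha, tol=1e-8, max_iter=10_000):
    """Value iteration for V(s) = max_a { r(s,a) + u(a) + alpha * sum_s' P(s'|s,a) V(s') }.

    P : (S, A, S) transition kernel, r : (S, A) per-slot rewards,
    u : (A,) fairness weights, alpha : discount factor in (0, 1).
    Returns the value function V and a greedy (deterministic stationary) policy.
    """
    S, A = r.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        # Q[s, a] = r(s,a) + u(a) + alpha * E[V(X_1) | X_0 = s, action a]
        Q = r + u[None, :] + alpha * np.einsum('sat,t->sa', P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V, Q.argmax(axis=1)

# Hypothetical 2-state, 2-user example (numbers are placeholders).
P = np.array([[[0.9, 0.1], [0.5, 0.5]],
              [[0.2, 0.8], [0.6, 0.4]]])
r = np.array([[1.0, 0.3], [0.4, 1.2]])
u = np.array([0.0, 0.5])
V, policy = weighted_value_iteration(P, r, u, alpha=0.9)
print(V, policy)
```

The greedy policy returned here plays the role of the maximizing policy π* examined in Lemma 2 below.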

1.2 B Proof of Lemma 2

Proof

By applying the mapping \(T_{\pi^*}\) to V α , we obtain

$$ \begin{array}{rcl} (T_{\pi^*}V_\alpha)(s)&=&r(s,\pi^*(s))+u(\pi^*(s))+\alpha\sum\limits_{s'\in S}P(s'|s,\pi^*(s))V_\alpha(s')\\ &=&\max\limits_{a\in A}\left\{r(s,a)+u(a)+\alpha\sum\limits_{s'\in S}P(s'|s,a)V_\alpha(s')\right\}=V_\alpha(s), \end{array} $$

where the last equation follows from Lemma 1. Hence, by induction we have,

$$ T_{\pi^*}^nV_\alpha=V_\alpha , \quad \forall n. $$

Letting n→ ∞ and using the Banach fixed-point theorem yields the result,

$$ V_{\pi^*}(s)=V_\alpha(s), \quad \forall s \in S. $$
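
As a companion sketch to the proof above (purely illustrative, not from the paper), the fixed-point identity can be checked numerically by evaluating the greedy policy directly: for a fixed stationary policy, the u-weighted discounted value solves a linear system.

```python
import numpy as np

def evaluate_policy(P, r, u, policy, alpha):
    """Exact evaluation of a stationary policy for the u-weighted discounted reward.

    Solves (I - alpha * P_pi) V = r_pi + u_pi, the linear-equation form of the
    fixed-point relation T_pi V = V used in the proof above.
    P : (S, A, S) kernel, r : (S, A) rewards, u : (A,) weights, policy : length-S action array.
    """
    S = r.shape[0]
    idx = np.arange(S)
    P_pi = P[idx, policy, :]                  # (S, S) transition matrix under the policy
    r_pi = r[idx, policy] + u[policy]         # per-state weighted reward under the policy
    return np.linalg.solve(np.eye(S) - alpha * P_pi, r_pi)
```

Comparing this value with the output of the value-iteration sketch above gives a quick numerical check that the greedy policy attains V α, as Lemma 2 asserts.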

1.3 C Proof of Theorem 1

Proof

Let π be a policy satisfying the expected discounted temporal fairness constraint, and suppose there exists u:A→ℝ satisfying conditions 1–3. Then,

$$ \begin{array}{rcl} J_{\pi}(s)&=&\lim\limits_{T\to\infty}E_{\pi}\left[\left.\sum\limits_{t=0}^{T-1}\alpha^{t}r(X_t,\pi_t)\right|X_0=s\right]\\ &\leq& \lim\limits_{T\to\infty}E_{\pi}\left[\left.\sum\limits_{t=0}^{T-1}\alpha^{t}r(X_t,\pi_t)\right|X_0=s\right]\\ &&+\,\sum\limits_{a\in A}u(a)\left(\lim\limits_{T\to\infty}E_{\pi}\left[\left.\sum\limits^{T-1}_{t=0}\alpha^t\mathbf{1}_{\{\pi_t=a\}}\right|X_0=s\right]-C(a)\right)\\ &=&\lim\limits_{T\to\infty}E_{\pi}\left[\left.\sum\limits_{t=0}^{T-1}\alpha^{t}r(X_t,\pi_t)\right|X_0=s\right]+\lim\limits_{T\to\infty}E_{\pi}\left[\left.\sum\limits_{t=0}^{T-1}\alpha^tu(\pi_t)\right|X_0=s\right]-\sum\limits_{a \in A}u(a)C(a)\\ &=& \lim\limits_{T\to\infty}E_{\pi}\left[\left.\sum\limits_{t=0}^{T-1}\alpha^{t}[r(X_t,\pi_t)+u(\pi_t)]\right|X_0=s\right]-\sum\limits_{a\in A}u(a)C(a)\\ &=&V_{\pi}(s)-\sum\limits_{a\in A}u(a)C(a). \end{array} $$

Since \(V_{\pi}(s)\leq V_{\alpha}(s)=V_{\pi^*}(s)\) from Lemma 2, we have

$$ J_{\pi}(s)\leq V_{\pi^*}(s)-\sum_{a\in A}u(a)C(a)\label{Theorem1.1} $$
(27)
$$ \begin{array}{rl} &=\lim\limits_{T\to\infty}E_{\pi^*}\left[\left.\sum\limits_{t=0}^{T-1}\alpha^{t}[r(X_t,\pi^*_t)+u(\pi^*_t)]\right|X_0=s\right]-\sum\limits_{a\in A}u(a)C(a) \\ &=\lim\limits_{T\to\infty}E_{\pi^*}\left[\left.\sum\limits_{t=0}^{T-1}\alpha^{t}r(X_t,\pi^*_t)\right|X_0=s\right] \\ &\quad+\sum\limits_{a\in A}u(a)\left(\lim\limits_{T\to\infty}E_{\pi^*}\left[\left.\sum\limits^{T-1}_{t=0}\alpha^t\mathbf{1}_{\{\pi^*_t=a\}}\right|X_0=s\right]-C(a)\right) \\ &=\lim\limits_{T\to\infty}E_{\pi^*}\left[\left.\sum\limits_{t=0}^{T-1}\alpha^{t}r(X_t,\pi^*_t)\right|X_0=s\right] \\ &=J_{\pi^*}(s),\label{Theorem1.2} \end{array} $$
(28)

where the second part of Eq. 28 equals zero because of condition 3 on u. From Eq. 27, we get the corresponding optimal discounted reward as

$$ J_{\pi^*}(s)=V_{\pi^*}(s)-\sum_{a \in A}u(a)C(a) , \; \forall s \in S. $$
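
For concreteness, here is a small hypothetical numerical instance of this relation (all numbers invented for illustration): with two users, weights u(1) = 2 and u(2) = 1, and discounted temporal-fairness targets C(1) = 4 and C(2) = 6 (plausible when α = 0.9, since the total discounted time is 1/(1 − α) = 10),

$$ J_{\pi^*}(s) = V_{\pi^*}(s) - \sum_{a\in A}u(a)C(a) = V_{\pi^*}(s) - (2\cdot 4 + 1\cdot 6) = V_{\pi^*}(s) - 14, $$

so the constrained optimum is the unconstrained weighted optimum shifted down by a constant.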

1.4 D Proof of Theorem 2

Proof

Let π be a policy satisfying the expected average temporal fairness constraint, and let \(H_t = (X_0,\pi_0,\ldots,X_{t-1},\pi_{t-1},X_t,\pi_t)\) denote the history of the process up to time t. First, we have

$$E_{\pi}\left\{\sum_{t=1}^T[h(X_t)-E_{\pi}(h(X_t)|H_{t-1})]\right\}=0,$$

since

$$ \begin{array}{rcl} E_{\pi}\left\{\sum\limits_{t=1}^T[h(X_t)-E_{\pi}(h(X_t)|H_{t-1})]\right\} &=&\sum\limits_{t=1}^T E_{\pi}[h(X_t)-E_{\pi}(h(X_t)|H_{t-1})] \\ &=&\sum\limits_{t=1}^T \{E_{\pi}[h(X_t)]-E_{\pi}[E_{\pi}(h(X_t)|H_{t-1})]\} \\ &=&\sum\limits_{t=1}^T \{E_{\pi}[h(X_t)]-E_{\pi}[h(X_t)]\}=0. \end{array} $$

Also,

$$ \begin{array}{rcl} E_{\pi}[h(X_t)|H_{t-1}]&=&\sum\limits_{s'\in S}h(s')P(s'|X_{t-1},\pi_{t-1})\\ &=& r(X_{t-1},\pi_{t-1})+u(\pi_{t-1})+\sum\limits_{s'\in S}h(s')P(s'|X_{t-1},\pi_{t-1})\\ &&-\,r(X_{t-1},\pi_{t-1})-u(\pi_{t-1})\\ &\leq& \max\limits_{a\in A}\left\{r(X_{t-1},a)+u(a)+\sum\limits_{s'\in S}P(s'|X_{t-1},a)h(s')\right\}\\ &&-\,r(X_{t-1},\pi_{t-1})-u(\pi_{t-1})\\ &=& g+h(X_{t-1})-r(X_{t-1},\pi_{t-1})-u(\pi_{t-1}) \end{array} $$

with equality for π *, since π * is defined to take the maximizing action. Hence,

$$ \begin{array}{rcl} 0&\geq& E_{\pi}\left\{\sum\limits_{t=1}^T\left[h(X_t)-g-h(X_{t-1})+r(X_{t-1},\pi_{t-1})+u(\pi_{t-1})\right]\right\}\\ \Leftrightarrow g&\geq& E_{\pi}\frac{h(X_T)}{T}-E_{\pi}\frac{h(X_0)}{T}+E_{\pi}\frac{1}{T}\sum\limits_{t=1}^Tr(X_{t-1},\pi_{t-1})+E_{\pi}\frac{1}{T}\sum\limits_{t=1}^Tu(\pi_{t-1}). \end{array} $$

Letting T → ∞ and using the fact that h is bounded, we have that

$$ \begin{array}{rl} g&\geq J_{\pi}(s)+\lim\limits_{T\to \infty}E_{\pi}\frac{1}{T}\sum\limits_{t=0}^{T-1}u(\pi_t) \\ \Leftrightarrow g-\sum\limits_{a\in A}u(a)C(a) &\geq J_{\pi}(s)+\lim\limits_{T\to \infty}E_{\pi}\frac{1}{T}\sum\limits_{t=0}^{T-1}u(\pi_t)-\sum\limits_{a\in A}u(a)C(a) \\ &= J_{\pi}(s)+\lim\limits_{T\to \infty}E_{\pi}\left[\left.\frac{1}{T}\sum\limits_{t=0}^{T-1}\sum\limits_{a\in A}u(a)\mathbf{1}_{\{\pi_t=a\}}\right|X_0=s\right]-\sum\limits_{a\in A}u(a)C(a) \\ &= J_{\pi}(s)+\sum\limits_{a\in A}u(a)\left(\lim\limits_{T\to \infty}E_{\pi}\left[\left.\frac{1}{T}\sum\limits_{t=0}^{T-1}\mathbf{1}_{\{\pi_t=a\}}\right|X_0=s\right]-C(a)\right).\label{Theorem2.1} \end{array}$$
(29)

Since we know that u ≥ 0, and that the policy π satisfies the temporal fairness constraints, the second part of Eq. 29 is greater than or equal to zero. We get

$$ g-\sum_{a\in A}u(a)C(a)\geq J_{\pi}(s). $$

With policy π *, we have

$$ \begin{array}{rcl} g-\sum\limits_{a\in A}u(a)C(a)&=& J_{\pi^*}(s)+\sum\limits_{a\in A}u(a)\left(\lim\limits_{T\to\infty}E_{\pi^*}\left[\frac{1}{T}\sum\limits_{t=0}^{T-1}\mathbf{1}_{\{\pi_t^*=a\}}\right]-C(a)\right) \\ &=& J_{\pi^*}(s), \end{array} $$
(30)

where the second part of Eq. 30 equals zero because of condition 3 on u(a). Hence, the desired result is proven. □
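
The optimality equation used implicitly in this proof, g + h(s) = max over a of { r(s,a) + u(a) + Σ s′ P(s′|s,a) h(s′) }, can be solved for small unichain models by relative value iteration. The sketch below is illustrative only (not the paper's implementation), with hypothetical inputs.

```python
import numpy as np

def relative_value_iteration(P, r, u, ref_state=0, tol=1e-9, max_iter=100_000):
    """Relative value iteration for g + h(s) = max_a { r(s,a) + u(a) + sum_s' P(s'|s,a) h(s') }.

    Assumes a unichain (and aperiodic) model so that the gain g is constant.
    Returns the gain g, the relative value function h (normalised so that
    h[ref_state] = 0), and a greedy policy.
    P : (S, A, S) kernel, r : (S, A) rewards, u : (A,) fairness weights.
    """
    S, A = r.shape
    h = np.zeros(S)
    g = 0.0
    for _ in range(max_iter):
        Q = r + u[None, :] + np.einsum('sat,t->sa', P, h)
        Th = Q.max(axis=1)           # one application of the Bellman operator
        g = Th[ref_state]            # current gain estimate
        h_new = Th - g               # keep h anchored at the reference state
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    return g, h, Q.argmax(axis=1)
```

By Theorem 2, the constrained optimal average reward is then g − Σ a u(a)C(a) for every state s.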

1.5 E Proof of Theorem 3

Proof

Let π be a policy satisfying the expected discounted utilitarian fairness constraint, and suppose there exists ω:A→ℝ satisfying conditions 1–3. Then,

$$ \begin{array}{rl} J_{\pi}(s) &\leq J_{\pi}(s)+\sum\limits_{a\in A}\omega(a)\left(\lim\limits_{T\to\infty}E_{\pi}\left[\left.\sum\limits^{T-1}_{t=0}\alpha^tr(X_t,\pi_t)\mathbf{1}_{\{\pi_t=a\}}\right|X_0=s\right]-D(a)J_{\pi}(s)\right) \\ &=J_{\pi}(s)+\lim\limits_{T\to\infty}E_{\pi}\left[\left.\sum\limits_{t=0}^{T-1}\alpha^t\omega(\pi_t)r(X_t,\pi_t)\right|X_0=s\right]-\sum\limits_{a \in A}\omega(a)D(a)J_{\pi}(s) \\ &=\lim\limits_{T\to\infty}E_{\pi}\left[\left.\sum\limits_{t=0}^{T-1}\alpha^{t}(\kappa+\omega(\pi_t))r(X_t,\pi_t)\right|X_0=s\right] \\ &=U_{\pi}(s), \end{array} $$

where \(\kappa=1-\sum_{a\in A}D(a)\omega(a)\). Since \(U_{\pi}(s)\leq U_{\alpha}(s)=U_{\pi^*}(s)\) from Lemma 4, we have

$$ J_{\pi}(s)\leq U_{\pi^*}(s)\label{Theorem3.1} $$
(31)
$$ \begin{array}{rl} &=\lim\limits_{T\to\infty}E_{\pi^*}\left[\left.\sum\limits_{t=0}^{T-1}\alpha^{t}r(X_t,\pi^*_t)\right|X_0=s\right] \\ &\quad+\sum\limits_{a\in A}\omega(a)\left(\lim\limits_{T\to\infty}E_{\pi^*}\left[\left.\sum\limits^{T-1}_{t=0}\alpha^tr(X_t,\pi^*_t)\mathbf{1}_{\{\pi^*_t=a\}}\right|X_0=s\right]-D(a)J_{\pi^*}(s)\right) \\ &=J_{\pi^*}(s), \end{array} $$
(32)

where the second part of Eq. 32 equals zero because of condition 3 on ω. From Eq. 31, we get the corresponding optimal discounted reward as

$$ J_{\pi^*}(s)=U_{\pi^*}(s) , \; \forall s \in S. $$
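
As a small illustrative computation of the reward reweighting that appears here (numbers invented for illustration): with two users, utilitarian shares D(1) = 0.3 and D(2) = 0.2, and multipliers ω(1) = 1 and ω(2) = 0.5,

$$ \kappa = 1 - \sum_{a\in A}D(a)\omega(a) = 1 - (0.3\cdot 1 + 0.2\cdot 0.5) = 0.6, $$

so the unconstrained MDP solved by U weights user 1's per-slot throughput by κ + ω(1) = 1.6 and user 2's by κ + ω(2) = 1.1.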

1.6 F Proof of Theorem 4

Proof

Let π be a policy satisfying the expected average utilitarian fairness constraint, and let \(H_t = (X_0,\pi_0,\ldots,X_{t-1},\pi_{t-1},X_t,\pi_t)\) denote the history of the process up to time t. First, we have

$$E_{\pi}\left\{\sum_{t=1}^T[h(X_t)-E_{\pi}(h(X_t)|H_{t-1})]\right\}=0.$$

Also,

$$ \begin{array}{rcl} E_{\pi}[h(X_t)|H_{t-1}]&=&\sum\limits_{s'\in S}h(s')P(s'|X_{t-1},\pi_{t-1})\\ &=& (\kappa+\omega(\pi_{t-1}))r(X_{t-1},\pi_{t-1})+\sum\limits_{s'\in S}h(s')P(s'|X_{t-1},\pi_{t-1})\\ &&-\,(\kappa+\omega(\pi_{t-1}))r(X_{t-1},\pi_{t-1})\\ &\leq& \max\limits_{a\in A}\left\{(\kappa+\omega(a))r(X_{t-1},a)+\sum\limits_{s'\in S}P(s'|X_{t-1},a)h(s')\right\}\\ &&-\,(\kappa+\omega(\pi_{t-1}))r(X_{t-1},\pi_{t-1})\\ &=&g+h(X_{t-1})-(\kappa+\omega(\pi_{t-1}))r(X_{t-1},\pi_{t-1}) \end{array} $$

with equality for π *, since π * is defined to take the maximizing action. Hence,

$$ \begin{array}{rcl} 0&\geq& E_{\pi}\left\{\sum\limits_{t=1}^T\left[h(X_t)-g-h(X_{t-1})+(\kappa+\omega(\pi_{t-1}))r(X_{t-1},\pi_{t-1})\right]\right\}\\ \Leftrightarrow g&\geq&E_{\pi}\frac{h(X_T)}{T}-E_{\pi}\frac{h(X_0)}{T}+E_{\pi}\frac{1}{T}\sum\limits_{t=1}^T(\kappa+\omega(\pi_{t-1}))r(X_{t-1},\pi_{t-1})\\ \Leftrightarrow g&\geq&E_{\pi}\frac{h(X_T)}{T}-E_{\pi}\frac{h(X_0)}{T}+E_{\pi}\frac{1}{T}\sum\limits_{t=1}^Tr(X_{t-1},\pi_{t-1})\\ &&+\,E_{\pi}\frac{1}{T}\sum\limits_{t=1}^T\left(\omega(\pi_{t-1})-\sum\limits_{a\in A}D(a)\omega(a)\right)r(X_{t-1},\pi_{t-1}). \end{array} $$

Letting T → ∞ and using the fact that h is bounded, we have that

$$ \begin{array}{rl} g&\geq J_{\pi}(s)+\lim\limits_{T\to \infty}E_{\pi}\frac{1}{T}\sum\limits_{t=0}^{T-1}\left(\omega(\pi_t)-\sum\limits_{a\in A}D(a)\omega(a)\right)r(X_t,\pi_t) \\ \Leftrightarrow g &\geq J_{\pi}(s)+\sum\limits_{a\in A}\omega(a)\left(\lim\limits_{T\to\infty}E_{\pi}\left[\left.\frac{1}{T}\sum\limits^{T-1}_{t=0}r(X_t,\pi_t)\mathbf{1}_{\{\pi_t=a\}}\right|X_0=s\right]-D(a)J_{\pi}(s)\right).\label{Theorem4.1} \end{array} $$
(33)

Since we know that ω ≥ 0, and that the policy π satisfies the utilitarian fairness constraints, the second part of Eq. 33 is greater than or equal to zero. We get

$$ g\geq J_{\pi}(s). $$

With policy π *, we have

$$ \begin{array}{rcl} g&=& J_{\pi^*}(s)+\sum\limits_{a\in A}\omega(a)\left(\lim\limits_{T\to\infty}E_{\pi^*}\left[\left.\frac{1}{T}\sum\limits^{T-1}_{t=0}r(X_t,\pi^*_t)\mathbf{1}_{\{\pi^*_t=a\}}\right|X_0=s\right]-D(a)J_{\pi^*}(s)\right) \\ &=& J_{\pi^*}(s), \end{array} $$
(34)

where the second part of Eq. 34 equals zero because of condition 3 on ω(a). Hence, the desired result is proven. □
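
The optimality equation used in this proof is the average-reward analogue of Theorem 3's reweighting: g + h(s) = max over a of { (κ + ω(a))r(s,a) + Σ s′ P(s′|s,a) h(s′) }. As an illustrative sketch only, it can be solved by feeding the reweighted throughput into the same relative-value-iteration routine sketched after Theorem 2 (with u ≡ 0); the helper below builds that reweighted reward from hypothetical inputs.

```python
import numpy as np

def utilitarian_reward(r, omega, D):
    """Reweight per-slot throughput r(s,a) as (kappa + omega(a)) * r(s,a),
    with kappa = 1 - sum_a D(a) * omega(a), matching the optimality equation above.
    r : (S, A) throughputs, omega : (A,) multipliers, D : (A,) utilitarian shares."""
    kappa = 1.0 - float(np.dot(D, omega))
    return (kappa + omega)[None, :] * r   # broadcasts over states

# Hypothetical values, for illustration only.
r = np.array([[1.0, 0.3], [0.4, 1.2]])
print(utilitarian_reward(r, omega=np.array([1.0, 0.5]), D=np.array([0.3, 0.2])))
```

By Theorem 4, the gain g obtained from that routine equals the constrained optimal average throughput J π*(s).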

Cite this article

Zhang, Z., Moola, S. & Chong, E.K.P. Opportunistic Fair Scheduling in Wireless Networks: An Approximate Dynamic Programming Approach. Mobile Netw Appl 15, 710–728 (2010). https://doi.org/10.1007/s11036-009-0198-x
