Opportunistic Fair Scheduling in Wireless Networks: An Approximate Dynamic Programming Approach

Zhang, Zhi; Moola, Sudhir; Chong, Edwin K. P.

doi:10.1007/s11036-009-0198-x

Published: 04 August 2009

Mobile Networks and Applications, volume 15, pages 710–728 (2010)

Zhi Zhang¹,
Sudhir Moola¹ &
Edwin K. P. Chong¹

Abstract

We consider the problem of temporal fair scheduling of queued data transmissions in wireless heterogeneous networks. We deal with both the throughput maximization problem and the delay minimization problem. Taking fairness constraints and the data arrival queues into consideration, we formulate the transmission scheduling problem as a Markov decision process (MDP) with fairness constraints. We study two categories of fairness constraints, namely temporal fairness and utilitarian fairness. We consider two criteria: infinite horizon expected total discounted reward and expected average reward. Applying the dynamic programming approach, we derive and prove explicit optimality equations for the above constrained MDPs, and give corresponding optimal fair scheduling policies based on those equations. A practical stochastic-approximation-type algorithm is applied to calculate the control parameters online in the policies. Furthermore, we develop a novel approximation method—temporal fair rollout—to achieve a tractable computation. Numerical results show that the proposed scheme achieves significant performance improvement for both throughput maximization and delay minimization problems compared with other existing schemes.

Notes

An MDP is unichain if the transition matrix corresponding to every deterministic stationary policy consists of a single recurrent class plus a possibly empty set of transient states [12].

References

1. Zhang Z, Moola S, Chong EKP (2008) Approximate stochastic dynamic programming for opportunistic fair scheduling in wireless networks. In: Proc. 47th IEEE conference on decision and control, Cancun, 9–11 December 2008, pp 1404–1409
2. Knopp R, Humblet P (1995) Information capacity and power control in single cell multiuser communications. In: Proc. IEEE int. conference on communications 1995, vol 1, pp 331–335
3. Andrews M, Kumaran K, Ramanan K, Stolyar A, Whiting P, Vijayakumar R (2001) Providing quality of service over a shared wireless link. IEEE Commun Mag 39(2):150–153
4. Bender P, Black P, Grob M, Padovani R, Sindhushyana N, Viterbi A (2000) CDMA/HDR: a bandwidth-efficient high-speed wireless data service for nomadic users. IEEE Commun Mag 38(7):70–77
5. Andrews M (2005) A survey of scheduling theory in wireless data networks. In: Proc. 2005 IMA summer workshop on wireless communications
6. Parkvall S, Dahlman E, Frenger P, Beming P, Persson M (2001) The high speed packet data evolution of WCDMA. In: Proc. IEEE VTC 2001, vol 3, pp 2287–2291
7. Liu X, Chong EKP, Shroff NB (2001) Opportunistic transmission scheduling with resource-sharing constraints in wireless networks. IEEE J Sel Areas Commun 19(10):2053–2064
8. Liu X, Chong EKP, Shroff NB (2003) A framework for opportunistic scheduling in wireless networks. Comput Netw 41(4):451–474
9. Liu X, Chong EKP, Shroff NB (2004) Opportunistic scheduling: an illustration of cross-layer design. Telecommun Rev 14(6):947–959
10. Wang HS, Moayeri N (1995) Finite-state Markov channel: a useful model for radio communication channels. IEEE Trans Veh Technol 43:163–171
11. Kelly F (1997) Charging and rate control for elastic traffic. Eur Trans Telecommun 8:33–37
12. Puterman ML (1994) Markov decision processes. Wiley, New York
13. Bertsekas DP (2001) Dynamic programming and optimal control, 2nd ed. Athena, Belmont
14. Ross SM (1970) Applied probability models with optimization applications. Dover, New York
15. Derman C (1970) Finite state Markovian decision processes. Academic, New York
16. Altman E (1998) Constrained Markov decision processes. Chapman and Hall/CRC, London
17. Piunovskiy AB (1997) Optimal control of random sequences in problems with constraints. Kluwer, Dordrecht
18. Feinberg EA, Shwartz A (eds) (2002) Handbook of Markov decision processes: methods and applications. Kluwer, Boston
19. Ross KW (1989) Randomized and past-dependent policies for Markov decision processes with multiple constraints. Oper Res 37(3):474–477
20. Piunovskiy AB, Mao X (2000) Constrained Markovian decision processes: the dynamic programming approach. Oper Res Lett 27:119–126
21. Chen RC, Blankenship GL (2004) Dynamic programming equations for discounted constrained stochastic control. IEEE Trans Automat Contr 49(5):699–709
22. Bertsekas DP, Tsitsiklis JN (1996) Neuro-dynamic programming. Athena, Belmont
23. Bertsekas DP, Tsitsiklis JN, Wu C (1997) Rollout algorithms for combinatorial optimization. J Heuristics 3:245–262
24. Bertsekas DP, Castanon DA (1999) Rollout algorithms for stochastic scheduling problems. J Heuristics 5(1):89–108
25. Gesbert D, Slim-Alouini M (2004) How much feedback is multi-user diversity really worth? In: Proc. IEEE international conference on communications, pp 234–238
26. Floren F, Edfors O, Molin BA (2003) The effect of feedback quantization on the throughput of a multiuser diversity scheme. In: Proc. IEEE global telecommunication conference (GLOBECOM’03), pp 497–501
27. Svedman P, Wilson S, Cimini LJ Jr., Ottersten B (2004) A simplified opportunistic feedback and scheduling scheme for OFDM. In: Proc. IEEE vehicular technology conference 2004
28. Al-Harthi Y, Tewfik A, Alouini MS (2007) Multiuser diversity with quantized feedback. IEEE Trans Wirel Commun 6(1):330–337
29. Parekh AK, Gallager RG (1993) A generalized processor sharing approach to flow control in integrated services networks: the single-node case. IEEE/ACM Trans Netw 1(3):344–357
30. Kushner HJ, Yin GG (2003) Stochastic approximation and recursive algorithms and applications, 2nd ed. Springer, New York
31. Chen HF (2002) Stochastic approximation and its applications. Kluwer, Dordrecht
32. Gilbert E (1960) Capacity of a burst-noise channel. Bell Syst Tech J 39:1253–1265
33. Elliott EO (1963) Estimates of error rates for codes on burst-noise channels. Bell Syst Tech J 42:1977–1997
34. Swarts F, Ferreira HC (1993) Markov characterization of channels with soft decision outputs. IEEE Trans Commun 41:678–682

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Colorado State University, Ft. Collins, CO, 80523-1373, USA
Zhi Zhang, Sudhir Moola & Edwin K. P. Chong

Corresponding author

Correspondence to Edwin K. P. Chong.

Additional information

This research was supported in part by NSF under grant ECCS-0700559. Parts of an early version of this paper were presented at the IEEE Conference on Decision and Control 2008 [1].

Appendix

1.1 A Proof of Lemma 1

Proof

Let π be an arbitrary policy, and suppose that π chooses action a at time slot 0 with probability $P_a$, a ∈ A. Then,

$$ V_{\pi}(s)=\sum_{a\in A}P_{a}\left[r(s,a)+ u(a) +\sum_{s'\in S}P(s'|s,a)W_{\pi}(s')\right], $$

where $W_{\pi}(s')$ represents the expected discounted weighted reward, with the weight $u(\pi_t)$, incurred from time slot 1 onwards, given that π is employed and the state at time 1 is s′. However, it follows that

$$ W_{\pi}(s')\leq \alpha V_{\alpha}(s') $$

and hence that

$$\begin{array}{rl} V_{\pi}(s)&\leq \sum\limits_{a\in A}P_{a}\left\{r(s,a)+ u(a) +\alpha\sum\limits_{s'\in S}P(s'|s,a)V_{\alpha}(s')\right\} \\ &\!\leq\! \sum\limits_{a\in A}P_{a}\max\limits_{a\in A}\!\left\{r(s,a)\!+\! u(a) \!+\!\alpha\!\sum\limits_{s'\in S}P(s'|s,a)V_{\alpha}(s')\right\} \\ &=\max\limits_{a\in A}\left\{r(s,a)+ u(a) +\alpha\sum\limits_{s'\in S}P(s'|s,a)V_{\alpha}(s')\right\}.\label{Lemma1.2} \end{array} $$

(23)

Since π is arbitrary, Eq. 23 implies that

$$ \label{Lemma1.3} V_{\alpha}(s)\leq\max_{a\in A}\left\{r(s,a)+u(a)+\alpha\sum_{s'\in S}P(s'|s,a)V_{\alpha}(s')\right\}. $$

(24)

To go the other way, let $a_0$ be such that

$$ \begin{array}{rcl} \label{Lemma1.4} &&{\kern-6pt} r(s,a_0)+u(a_0)+\alpha\sum\limits_{s'\in S}P(s'|s,a_0)V_{\alpha}(s') \\ &&=\max\limits_{a\in A}\left\{r(s,a)+u(a)+\alpha\sum\limits_{s'\in S}P(s'|s,a)V_{\alpha}(s')\right\} \end{array} $$

(25)

and let π be the policy that chooses $a_0$ at time 0 and, if the next state is s′, views the process as originating in state s′ and follows a policy $\pi_{s'}$ such that $V_{\pi_{s'}}(s')\geq V_{\alpha}(s')-\varepsilon$, s′ ∈ S. Hence,

$$ \begin{array}{rl} V_{\pi}(s)&=r(s,a_0)+u(a_0)+\alpha\sum\limits_{s'\in S}P(s'|s,a_0)V_{\pi_{s'}}(s') \\ &\geq r(s,a_0)+u(a_0)+\alpha\sum\limits_{s'\in S}P(s'|s,a_0)V_{\alpha}(s')-\alpha \varepsilon \end{array} $$

which, since $V_{\alpha}(s)\geq V_{\pi}(s)$, implies that

$$ V_{\alpha}(s)\geq r(s,a_0)+u(a_0)+\alpha\sum_{s'\in S}P(s'|s,a_0)V_{\alpha}(s')-\alpha \varepsilon. $$

Hence, from Eq. 25, we have

$$ \label{Lemma1.5} V_{\alpha}(s)\!\geq\! \max_{a\in A}\left\{r(s,a)\!+\!u(a)\!+\!\alpha\sum_{s'\in S}P(s'|s,a)V_{\alpha}(s')\right\}\!-\!\alpha \varepsilon. $$

(26)

Since ε > 0 is arbitrary, Eqs. 24 and 26 together yield

$$ \begin{array}{rcl} V_{\alpha}(s)&=&\max\limits_{a\in A}\left\{r(s,a)+u(a) +\alpha\sum\limits_{s'\in S}P(s'|s,a)V_{\alpha}(s')\right\},\\ s &\in& S. \end{array} $$

□
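The optimality equation of Lemma 1 can be solved numerically by successive approximation. The following is a minimal sketch, assuming a hypothetical two-state, two-action MDP: the rewards `R`, weights `U`, transition probabilities `P`, and discount factor `ALPHA` are illustrative values, not taken from the paper.

```python
# Toy MDP; every number below is an illustrative assumption.
STATES = [0, 1]
ACTIONS = [0, 1]
ALPHA = 0.9                       # discount factor alpha
R = {(0, 0): 1.0, (0, 1): 0.5,    # r(s, a)
     (1, 0): 0.2, (1, 1): 2.0}
U = {0: 0.3, 1: 0.0}              # u(a), the per-action weight
P = {(0, 0): [0.7, 0.3],          # P(. | s, a) as a list over next states
     (0, 1): [0.4, 0.6],
     (1, 0): [0.5, 0.5],
     (1, 1): [0.2, 0.8]}


def backup(s, a, v):
    """r(s,a) + u(a) + alpha * sum_{s'} P(s'|s,a) v(s')."""
    expected_next = sum(p * v[s2] for s2, p in enumerate(P[(s, a)]))
    return R[(s, a)] + U[a] + ALPHA * expected_next


def bellman(v):
    """Right-hand side of the optimality equation applied to v."""
    return [max(backup(s, a, v) for a in ACTIONS) for s in STATES]


def value_iteration(tol=1e-12, max_iter=100_000):
    """Iterate the Bellman operator from zero until successive iterates agree."""
    v = [0.0 for _ in STATES]
    for _ in range(max_iter):
        new_v = bellman(v)
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


V_ALPHA = value_iteration()
# How far V_ALPHA is from satisfying the optimality equation exactly.
RESIDUAL = max(abs(x - y) for x, y in zip(bellman(V_ALPHA), V_ALPHA))
```

Because the operator is a contraction with modulus α, the iterates converge geometrically, and `RESIDUAL` measures how closely the computed values satisfy the equation.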

1.2 B Proof of Lemma 2

Proof

By applying the mapping $T_{\pi^*}$ to $V_\alpha$, we obtain

$$ \begin{array}{rcl} &&{\kern-30pt}(T_{\pi^*}V_\alpha)(s) \\ \quad{\kern6pt}&=&r(s,\pi^*(s))+u(\pi^*(s)) +\alpha\sum\limits_{s'\in S}P(s'|s,\pi^*(s))V_\alpha(s')\\ \quad&=&\max\limits_{a\in A}\left\{r(s,a)+u(a)+\alpha\sum\limits_{s'\in S}P(s'|s,a)V_\alpha(s')\right\}=V_\alpha(s), \end{array} $$

where the second equality holds because $\pi^*$ takes the maximizing action, and the last equality follows from Lemma 1. Hence, by induction, we have

$$ T_{\pi^*}^nV_\alpha=V_\alpha , \quad \forall n. $$

Letting n → ∞ and using the Banach fixed-point theorem yields the result,

$$ V_{\pi^*}(s)=V_\alpha(s), \quad \forall s \in S. $$

□
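Lemma 2 can be checked numerically on a small instance. The sketch below assumes a hypothetical two-state, two-action MDP (all numbers are illustrative, not from the paper): it computes $V_\alpha$, extracts the greedy policy π*, evaluates π* by iterating $T_{\pi^*}$ from zero, and compares the two value functions.

```python
# Toy MDP; every number below is an illustrative assumption.
STATES = [0, 1]
ACTIONS = [0, 1]
ALPHA = 0.9
R = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.2, (1, 1): 2.0}   # r(s, a)
U = {0: 0.3, 1: 0.0}                                       # u(a)
P = {(0, 0): [0.7, 0.3], (0, 1): [0.4, 0.6],
     (1, 0): [0.5, 0.5], (1, 1): [0.2, 0.8]}               # P(. | s, a)


def backup(s, a, v):
    """r(s,a) + u(a) + alpha * sum_{s'} P(s'|s,a) v(s')."""
    return R[(s, a)] + U[a] + ALPHA * sum(
        p * v[s2] for s2, p in enumerate(P[(s, a)]))


def fixed_point(operator, n_iter=2000):
    """Apply a contraction repeatedly, starting from the zero function."""
    v = [0.0 for _ in STATES]
    for _ in range(n_iter):
        v = operator(v)
    return v


# V_alpha: fixed point of the optimality operator (Lemma 1).
V_ALPHA = fixed_point(
    lambda v: [max(backup(s, a, v) for a in ACTIONS) for s in STATES])

# pi*: the action attaining the maximum in each state.
PI_STAR = [max(ACTIONS, key=lambda a: backup(s, a, V_ALPHA)) for s in STATES]

# V_{pi*}: fixed point of the policy operator T_{pi*}.
V_PI_STAR = fixed_point(
    lambda v: [backup(s, PI_STAR[s], v) for s in STATES])

GAP = max(abs(x - y) for x, y in zip(V_ALPHA, V_PI_STAR))
```

On this instance `GAP` is at the level of floating-point rounding, which is what the lemma predicts.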

1.3 C Proof of Theorem 1

Proof

Let π be a policy satisfying the expected discounted temporal fairness constraint, and suppose there exists $u:A\to\mathbb{R}$ satisfying conditions 1–3. Then,

$$ \begin{array}{rcl} &&{\kern-30pt}J_{\pi}(s)\\ {\kern6pt} \;\;&=&\lim\limits_{T\to\infty}E_{\pi}\left[\left.\sum\limits_{t=0}^{T-1}\alpha^{t}r(X_t,\pi_t)\right|X_0=s\right]\\ \;\;&\leq& \lim\limits_{T\to\infty}E_{\pi}\left[\left.\sum\limits_{t=0}^{T-1}\alpha^{t}r(X_t,\pi_t)\right|X_0=s\right]\\ &&+\!\sum\limits_{a\in A}u(a)\!\left(\lim\limits_{T\to\infty}E_{\pi}\left[\left.\sum\limits^{T-1}_{t=0}\alpha^t\mathbf{1}_{\{\pi_t=a\}}\right|X_0=s\right]\!-\!C(a)\right)\\ \;\;&=&\lim\limits_{T\to\infty}E_{\pi}\left[\left.\sum\limits_{t=0}^{T-1}\alpha^{t}r(X_t,\pi_t)\right|X_0=s\right]\\ &&+\lim\limits_{T\to\infty}E_{\pi}\left[\left.\sum\limits_{t=0}^{T-1}\alpha^tu(\pi_t)\right|X_0=s\right]-\sum\limits_{a \in A}u(a)C(a)\\ \;\;&=& \lim\limits_{T\to\infty}E_{\pi}\left[\left.\sum\limits_{t=0}^{T-1}\alpha^{t}[r(X_t,\pi_t)+u(\pi_t)]\right|X_0=s\right]\\ &&-\sum\limits_{a\in A}u(a)C(a)\\ \;\;&=&V_{\pi}(s)-\sum\limits_{a\in A}u(a)C(a). \end{array} $$

Since $V_{\pi}(s)\leq V_{\alpha}(s)=V_{\pi^*}(s)$ from Lemma 2, we have

$$ J_{\pi}(s)\leq V_{\pi^*}(s)-\sum_{a\in A}u(a)C(a)\label{Theorem1.1} $$

(27)

$$ \begin{array}{rcl} \quad&&=\lim\limits_{T\to\infty}E_{\pi^*}\left[\left.\sum\limits_{t=0}^{T-1}\alpha^{t}[r(X_t,\pi^*_t)+u(\pi^*_t)]\right|X_0=s\right] \\[5pt] &&{\kern6pt}-\sum\limits_{a\in A}u(a)C(a) \\[5pt] &&=\lim\limits_{T\to\infty}E_{\pi^*}\left[\left.\sum\limits_{t=0}^{T-1}\alpha^{t}r(X_t,\pi^*_t)\right|X_0=s\right] \\[5pt] &&{\kern6pt}+\,\sum\limits_{a\in A}u(a)\left(\lim\limits_{T\to\infty}E_{\pi^*}\!\left[\left.\sum\limits^{T-1}_{t=0}\alpha^t\mathbf{1}_{\{\pi^*_t=a\}}\right|X_0\!=\!s\right]\!-\!C(a)\!\right) \\[5pt] &&=\lim\limits_{T\to\infty}E_{\pi^*}\left[\left.\sum\limits_{t=0}^{T-1}\alpha^{t}r(X_t,\pi^*_t)\right|X_0=s\right] \\[5pt] &&=J_{\pi^*}(s),\label{Theorem1.2} \end{array} $$

(28)

where the second part of Eq. 28 equals zero because of condition 3 on u. From Eq. 27, we get the corresponding optimal discounted reward as

$$ J_{\pi^*}(s)=V_{\pi^*}(s)-\sum_{a \in A}u(a)C(a) , \; \forall s \in S. $$

□
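The algebraic step at the heart of this proof is that, for any policy, $V_{\pi}(s)-\sum_a u(a)C(a)$ equals $J_{\pi}(s)$ plus the u-weighted slack of the discounted action frequencies. The following sketch checks that identity for one stationary policy on a hypothetical two-state MDP; the model, the policy, and the targets `C` are illustrative assumptions.

```python
# Toy MDP; every number below is an illustrative assumption.
STATES = [0, 1]
ACTIONS = [0, 1]
ALPHA = 0.9
R = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.2, (1, 1): 2.0}   # r(s, a)
U = {0: 0.3, 1: 0.0}                                       # u(a)
C = {0: 2.0, 1: 3.0}                                       # hypothetical C(a)
P = {(0, 0): [0.7, 0.3], (0, 1): [0.4, 0.6],
     (1, 0): [0.5, 0.5], (1, 1): [0.2, 0.8]}               # P(. | s, a)
POLICY = [0, 1]            # stationary deterministic policy: state -> action


def discounted_sum(cost, n_iter=2000):
    """E[sum_t alpha^t cost(X_t, pi_t) | X_0 = s] for each s under POLICY."""
    x = [0.0 for _ in STATES]
    for _ in range(n_iter):
        x = [cost(s, POLICY[s]) + ALPHA * sum(
                 p * x[s2] for s2, p in enumerate(P[(s, POLICY[s])]))
             for s in STATES]
    return x


J = discounted_sum(lambda s, a: R[(s, a)])           # J_pi
V = discounted_sum(lambda s, a: R[(s, a)] + U[a])    # V_pi
# Discounted frequency with which each action b is chosen.
FREQ = {b: discounted_sum(lambda s, a, b=b: 1.0 if a == b else 0.0)
        for b in ACTIONS}

LHS = [V[s] - sum(U[a] * C[a] for a in ACTIONS) for s in STATES]
RHS = [J[s] + sum(U[a] * (FREQ[a][s] - C[a]) for a in ACTIONS)
       for s in STATES]
```

When u ≥ 0 and the frequencies meet their targets, the slack term is nonnegative, which gives the inequality used in the proof; it vanishes for π* under condition 3.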

1.4 D Proof of Theorem 2

Proof

Let π be a policy satisfying the expected average temporal fairness constraint, and let $H_t = (X_0,\pi_0,\ldots,X_{t-1},\pi_{t-1},X_t,\pi_t)$ denote the history of the process up to time t. First, we have

$$E_{\pi}\left\{\sum_{t=1}^T[h(X_t)-E_{\pi}(h(X_t)|H_{t-1})]\right\}=0,$$

since

$$ \begin{array}{rcl} &&{\kern-21pt}E_{\pi}\left\{\sum\limits_{t=1}^T[h(X_t)-E_{\pi}(h(X_t)|H_{t-1})]\right\} \\ {\kern6pt}\;\;&=&\sum\limits_{t=1}^T E_{\pi}[h(X_t)-E_{\pi}(h(X_t)|H_{t-1})] \\ \;\;&=&\sum\limits_{t=1}^T \{E_{\pi}[h(X_t)]-E_{\pi}[E_{\pi}(h(X_t)|H_{t-1})]\} \\ \;\;&=&\sum\limits_{t=1}^T \{E_{\pi}[h(X_t)]-E_{\pi}[h(X_t)]\}=0. \end{array} $$

Also,

$$ \begin{array}{rcl} E_{\pi}[h(X_t)|H_{t-1}]&=&\sum\limits_{s'\in S}h(s')P(s'|X_{t-1},\pi_{t-1})\\ &=&\; r(X_{t-1},\pi_{t-1})+u(\pi_{t-1})\\ &&+\sum\limits_{s'\in S}h(s')P(s'|X_{t-1},\pi_{t-1})\\ &&-r(X_{t-1},\pi_{t-1})-u(\pi_{t-1})\\ &\leq &\max\limits_{a\in A}\bigg\{r(X_{t-1},a)+u(a)\\ &&\phantom{\max\limits_{a\in A}\,\,}+\sum_{s'\in S}P(s'|X_{t-1},a)h(s')\bigg\}\\ &&-r(X_{t-1},\pi_{t-1})-u(\pi_{t-1})\\ &=& g\!+\!h(X_{t-1})\!-\!r(X_{t-1},\pi_{t-1})\!-\!u(\pi_{t-1}) \end{array} $$

with equality for $\pi^*$, since $\pi^*$ is defined to take the maximizing action. Hence,

$$ \begin{array}{rcl} 0&\geq& E_{\pi}\Bigg\{\sum\limits_{t=1}^T\big[h(X_t)-g-h(X_{t-1})+r(X_{t-1},\pi_{t-1})\Bigg.\\ &&\phantom{E_{\pi}\,\,}\Bigg.+u(\pi_{t-1})\big]\Bigg\}\\ \Leftrightarrow g&\geq& E_{\pi}\frac{h(X_T)}{T}-E_{\pi}\frac{h(X_0)}{T}+E_{\pi}\frac{1}{T}\sum\limits_{t=1}^Tr(X_{t-1},\pi_{t-1})\\ &&+E_{\pi}\frac{1}{T}\sum\limits_{t=1}^Tu(\pi_{t-1}). \end{array} $$

Letting T → ∞ and using the fact that h is bounded, we have that

$$ \begin{array}{rl} g&\geq J_{\pi}(X_0)+\lim\limits_{T\to \infty}E_{\pi}\frac{1}{T}\sum\limits_{t=0}^{T-1}u(\pi_t) \Leftrightarrow g-\sum\limits_{a\in A}u(a)C(a) \\&\geq J_{\pi}(X_0)+\lim\limits_{T\to \infty}E_{\pi}\frac{1}{T}\sum\limits_{t=0}^{T-1}u(\pi_t)-\sum\limits_{a\in A}u(a)C(a) \\ &= J_{\pi}(s)+\lim\limits_{T\to \infty}E_{\pi}\bigg[\left.\frac{1}{T}\sum\limits_{t=0}^{T-1}\sum\limits_{a\in A}u(a)\mathbf{1}_{\{\pi_t=a\}}\right|\, X_0=s\bigg] \\ &{\kern6pt}-\sum\limits_{a\in A}u(a)C(a) \\ &= J_{\pi}(s)+\sum\limits_{a\in A}u(a)\bigg(\lim\limits_{T\to \infty}E_{\pi}\bigg[\left.\frac{1}{T}\sum\limits_{t=0}^{T-1}\mathbf{1}_{\{\pi_t=a\}}\right|\, X_0=s\bigg] \\ &{\kern6pt}-C(a)\bigg).\label{Theorem2.1} \end{array}$$

(29)

Since we know that u ≥ 0, and that the policy π satisfies the temporal fairness constraints, the second part of Eq. 29 is greater than or equal to zero. We get

$$ g-\sum_{a\in A}u(a)C(a)\geq J_{\pi}(s). $$

With policy $\pi^*$, we have

$$ \begin{array}{l} g\!-\!\sum\limits_{a\in A}u(a)C(a)= J_{\pi^*}(s)\\ \qquad \qquad \quad\!+\!\sum\limits_{a\in A}u(a) \times \bigg(\!\lim\limits_{T\to\infty}\!E_{\pi^*}\left[\!\frac{1}{T}\sum\limits_{t=0}^{T-1}\mathbf{1}_{\{\pi_t^*=a\}}\!\right]\!-\!C(a)\!\bigg) \\ \qquad \qquad \qquad = J_{\pi^*}(s), \end{array} $$

(30)

where the second part of Eq. 30 equals zero because of condition 3 on u(a). Hence, the desired result is proven. □
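The proof relies on a pair (g, h) satisfying $g+h(s)=\max_{a\in A}\{r(s,a)+u(a)+\sum_{s'}P(s'|s,a)h(s')\}$. One standard way to compute such a pair is relative value iteration, which is swapped in here for illustration and is not claimed to be the authors' procedure. The sketch assumes a hypothetical two-state MDP with strictly positive transition probabilities (hence unichain and aperiodic); all numbers are illustrative.

```python
# Toy MDP; every number below is an illustrative assumption.
STATES = [0, 1]
ACTIONS = [0, 1]
R = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.2, (1, 1): 2.0}   # r(s, a)
U = {0: 0.3, 1: 0.0}                                       # u(a)
# All entries positive, so every policy induces a single aperiodic class.
P = {(0, 0): [0.7, 0.3], (0, 1): [0.4, 0.6],
     (1, 0): [0.5, 0.5], (1, 1): [0.2, 0.8]}               # P(. | s, a)


def bellman(h):
    """max_a { r(s,a) + u(a) + sum_{s'} P(s'|s,a) h(s') } for each s."""
    return [max(R[(s, a)] + U[a] + sum(
                    p * h[s2] for s2, p in enumerate(P[(s, a)]))
                for a in ACTIONS)
            for s in STATES]


def relative_value_iteration(n_iter=5000, ref=0):
    """Return (g, h) with h[ref] == 0 approximating the optimality equation."""
    h = [0.0 for _ in STATES]
    for _ in range(n_iter):
        th = bellman(h)
        # Subtract the reference-state value so the iterates stay bounded.
        h = [x - th[ref] for x in th]
    g = bellman(h)[ref]        # equals g + h[ref], and h[ref] is zero
    return g, h


G, H = relative_value_iteration()
RESIDUAL = max(abs(G + H[s] - bellman(H)[s]) for s in STATES)
```

Here `G` plays the role of g and `H` the role of the bounded function h; the optimal average reward in Theorem 2 is then g minus $\sum_a u(a)C(a)$ for a weight u that satisfies the theorem's conditions.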

1.5 E Proof of Theorem 3

Proof

Let π be a policy satisfying the expected discounted utilitarian fairness constraint, and suppose there exists $\omega:A\to\mathbb{R}$ satisfying conditions 1–3. Then,

$$ \begin{array}{rl} J_{\pi}(s) &\leq J_{\pi}(s)+\sum\limits_{a\in A}\omega(a)\Bigg(\lim\limits_{T\to\infty}E_{\pi}\Bigg[\left.\sum\limits^{T-1}_{t=0}\alpha^tr(X_t,\pi_t)\mathbf{1}_{\{\pi_t=a\}}\right| X_0=s\Bigg]-D(a)J_{\pi}(s)\Bigg) \\ &=J_{\pi}(s)+\lim\limits_{T\to\infty}E_{\pi}\left[\left.\sum\limits_{t=0}^{T-1}\alpha^t\omega(\pi_t)r(X_t,\pi_t)\right|X_0=s\right] \\ &{\kern12pt} -\sum\limits_{a \in A}\omega(a)D(a)J_{\pi}(s) \\ &=\lim\limits_{T\to\infty}E_{\pi}\left[\left.\sum\limits_{t=0}^{T-1}\alpha^{t}[(\kappa+\omega(\pi_t))r(X_t,\pi_t)]\right|X_0=s\right] \\ &=U_{\pi}(s), \end{array} $$

where $\kappa=1-\sum_{a\in A}D(a)\omega(a)$. Since $U_{\pi}(s)\leq U_{\alpha}(s)=U_{\pi^*}(s)$ from Lemma 4, we have

$$ J_{\pi}(s)\leq U_{\pi^*}(s)\label{Theorem3.1} $$

(31)

$$ \begin{array}{l} \qquad =\lim\limits_{T\to\infty}E_{\pi^*}\left[\left.\sum\limits_{t=0}^{T-1}\alpha^{t}r(X_t,\pi^*_t)\right|X_0=s\right] \\ + \sum\limits_{a\in A}\omega(a)\bigg(\!\lim\limits_{T\to\infty}E_{\pi^*}\bigg[\!\left.\sum\limits^{T-1}_{t=0}\alpha^tr(X_t,\pi^*_t)\mathbf{1}_{\{\pi^*_t\!=\!a\}}\right| X_0\!=\!s\!\bigg]\! \\ \qquad \qquad \qquad-\!D(a)J_{\pi^*}(s)\!\bigg) \\ \qquad =J_{\pi^*}(s), \end{array} $$

(32)

where the second part of Eq. 32 equals zero because of condition 3 on ω. From Eq. 31, we obtain the corresponding optimal discounted reward as

$$ J_{\pi^*}(s)=U_{\pi^*}(s) , \; \forall s \in S. $$

□
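The chain of equalities in this proof rests on the identity $U_{\pi}(s)=J_{\pi}(s)+\sum_a\omega(a)\big(J^a_{\pi}(s)-D(a)J_{\pi}(s)\big)$, where $J^a_{\pi}$ is shorthand introduced here for the discounted reward collected while action a is chosen. The sketch below verifies it for one stationary policy on a hypothetical two-state MDP; the model, the policy, `OMEGA`, and `D` are illustrative assumptions.

```python
# Toy MDP; every number below is an illustrative assumption.
STATES = [0, 1]
ACTIONS = [0, 1]
ALPHA = 0.9
R = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.2, (1, 1): 2.0}   # r(s, a)
OMEGA = {0: 0.4, 1: 0.1}                                   # omega(a)
D = {0: 0.2, 1: 0.3}                                       # hypothetical D(a)
P = {(0, 0): [0.7, 0.3], (0, 1): [0.4, 0.6],
     (1, 0): [0.5, 0.5], (1, 1): [0.2, 0.8]}               # P(. | s, a)
POLICY = [0, 1]            # stationary deterministic policy: state -> action

# kappa = 1 - sum_a D(a) * omega(a)
KAPPA = 1.0 - sum(D[a] * OMEGA[a] for a in ACTIONS)


def discounted_sum(cost, n_iter=2000):
    """E[sum_t alpha^t cost(X_t, pi_t) | X_0 = s] for each s under POLICY."""
    x = [0.0 for _ in STATES]
    for _ in range(n_iter):
        x = [cost(s, POLICY[s]) + ALPHA * sum(
                 p * x[s2] for s2, p in enumerate(P[(s, POLICY[s])]))
             for s in STATES]
    return x


J = discounted_sum(lambda s, a: R[(s, a)])                        # J_pi
U_PI = discounted_sum(lambda s, a: (KAPPA + OMEGA[a]) * R[(s, a)])  # U_pi
# Discounted reward earned while action b is the one being chosen.
J_BY_ACTION = {
    b: discounted_sum(lambda s, a, b=b: R[(s, a)] if a == b else 0.0)
    for b in ACTIONS}

RHS = [J[s] + sum(OMEGA[a] * (J_BY_ACTION[a][s] - D[a] * J[s])
                  for a in ACTIONS)
       for s in STATES]
```

The per-action rewards sum to `J`, so the identity reduces to collecting the coefficient κ + ω(a) on each reward term.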

1.6 F Proof of Theorem 4

Proof

Let π be a policy satisfying the expected average utilitarian fairness constraint, and let $H_t = (X_0,\pi_0,\ldots,X_{t-1},\pi_{t-1},X_t,\pi_t)$ denote the history of the process up to time t. First, we have

$$E_{\pi}\left\{\sum_{t=1}^T[h(X_t)-E_{\pi}(h(X_t)|H_{t-1})]\right\}=0.$$

Also,

$$ \begin{array}{rcl} E_{\pi}[h(X_t)|H_{t-1}]&=&\sum\limits_{s'\in S}h(s')P(s'|X_{t-1},\pi_{t-1})\\ &=& (\kappa+\omega(\pi_{t-1}))r(X_{t-1},\pi_{t-1})\\ &&+\,\sum\limits_{s'\in S}h(s')P(s'|X_{t-1},\pi_{t-1})\\ &&-\,(\kappa+\omega(\pi_{t-1}))r(X_{t-1},\pi_{t-1})\\ &\leq& \max\limits_{a\in A}\Bigg\{(\kappa+\omega(a))r(X_{t-1},a)\\ &&+\,\sum\limits_{s'\in S}P(s'|X_{t-1},a)h(s')\Bigg\}\\ &&-\,(\kappa+\omega(\pi_{t-1}))r(X_{t-1},\pi_{t-1})\\ &=&g+h(X_{t-1})-(\kappa+\omega(\pi_{t-1}))\\ &&\times\,r(X_{t-1},\pi_{t-1}) \end{array} $$

with equality for $\pi^*$, since $\pi^*$ is defined to take the maximizing action. Hence,

$$ \begin{array}{rcl} 0&\geq& E_{\pi}\Bigg\{\sum\limits_{t=1}^T[h(X_t)-g-h(X_{t-1})\\ && +\,(\kappa+\omega(\pi_{t-1}))r(X_{t-1},\pi_{t-1})]\Bigg\}\\ \Leftrightarrow g&\geq&E_{\pi}\frac{h(X_T)}{T}-E_{\pi}\frac{h(X_0)}{T}\\ &&+\,E_{\pi}\frac{1}{T}\sum\limits_{t=1}^T(\kappa+\omega(\pi_{t-1}))r(X_{t-1},\pi_{t-1})\\ \Leftrightarrow g&\geq&E_{\pi}\frac{h(X_T)}{T}-E_{\pi}\frac{h(X_0)}{T}+E_{\pi}\frac{1}{T}\sum\limits_{t=1}^Tr(X_{t-1},\pi_{t-1})\\ &&+\,E_{\pi}\frac{1}{T}\sum\limits_{t=1}^T\left(\omega(\pi_{t-1})-\sum\limits_{a\in A}D(a) \omega(a)\right)\\ &&\times\,r(X_{t-1},\pi_{t-1}). \end{array} $$

Letting T → ∞ and using the fact that h is bounded, we have that

$$ \begin{array}{rcl} g&\geq& J_{\pi}(X_0) \\ &&+\lim\limits_{T\to \infty}E_{\pi}\frac{1}{T}\sum\limits_{t=0}^{T-1}\left(\omega(\pi_{t})-\sum\limits_{a\in A}D(a)\omega(a)\right) r(X_{t},\pi_{t}) \\ \Leftrightarrow g &\geq& J_{\pi}(X_0) \\ &&+ \sum\limits_{a\in A}\omega(a)\Bigg(\lim\limits_{T\to\infty}E_{\pi}\Bigg[\left.\frac{1}{T}\sum\limits^{T-1}_{t=0}r(X_t,\pi_t)\mathbf{1}_{\{\pi_t=a\}}\right| X_0=s\Bigg] \\ &&- D(a)J_{\pi}(s)\Bigg). \label{Theorem4.1} \end{array} $$

(33)

Since we know that ω ≥ 0, and that the policy π satisfies the utilitarian fairness constraints, the second part of Eq. 33 is greater than or equal to zero. We get

$$ g\geq J_{\pi}(s). $$

With policy $\pi^*$, we have

$$ \begin{array}{rcl} g&=& J_{\pi^*}(s) \\ &&+\sum\limits_{a\in A}\omega(a)\Bigg(\lim\limits_{T\to\infty}E_{\pi^*}\Bigg[\left.\frac{1}{T}\sum\limits^{T-1}_{t=0}r(X_t,\pi^*_t)\mathbf{1}_{\{\pi^*_t=a\}}\right| X_0=s\Bigg] \\ &&- D(a)J_{\pi^*}(s)\Bigg) \\ &=& J_{\pi^*}(s), \end{array} $$

(34)

where the second part of Eq. 34 equals zero because of condition 3 on ω(a). Hence, the desired result is proven. □

About this article

Cite this article

Zhang, Z., Moola, S. & Chong, E.K.P. Opportunistic Fair Scheduling in Wireless Networks: An Approximate Dynamic Programming Approach. Mobile Netw Appl 15, 710–728 (2010). https://doi.org/10.1007/s11036-009-0198-x

Issue Date: October 2010