A constrained MDP-based vertical handoff decision algorithm for 4G heterogeneous wireless networks

Sun, Chi; Stevens-Navarro, Enrique; Shah-Mansouri, Vahid; Wong, Vincent W. S.

doi:10.1007/s11276-011-0335-x

Published: 27 March 2011

Wireless Networks, Volume 17, pages 1063–1081 (2011)

Chi Sun¹, Enrique Stevens-Navarro¹, Vahid Shah-Mansouri¹ & Vincent W. S. Wong¹

Abstract

The 4th generation wireless communication systems aim to provide users with the convenience of seamless roaming among heterogeneous wireless access networks. To achieve this goal, the support of vertical handoff is important in mobility management. This paper focuses on the vertical handoff decision algorithm, which determines the criteria under which vertical handoff should be performed. The problem is formulated as a constrained Markov decision process. The objective is to maximize the expected total reward of a connection subject to a constraint on the expected total access cost. In our model, a benefit function is used to assess the quality of the connection, and a penalty function is used to model the signaling incurred and call dropping. The user's velocity and location information are also considered when making handoff decisions. The policy iteration and Q-learning algorithms are employed to determine the optimal policy. Structural results on the optimal vertical handoff policy are derived by using the concept of supermodularity. We show that the optimal policy is a threshold policy in bandwidth, delay, and velocity. Numerical results show that our proposed vertical handoff decision algorithm outperforms other decision schemes over a wide range of conditions, including variations in connection duration, user's velocity, user's budget, traffic type, signaling cost, and monetary access cost.

Notes

The time between two successive decision epochs is on the order of seconds.

Acknowledgments

This work was supported by Bell Canada and the Natural Sciences and Engineering Research Council (NSERC) of Canada.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, The University of British Columbia, 2332 Main Mall, Vancouver, BC, V6T 1Z4, Canada
Chi Sun, Enrique Stevens-Navarro, Vahid Shah-Mansouri & Vincent W. S. Wong

Corresponding author

Correspondence to Vincent W. S. Wong.

Appendices

Appendix 1: Proof of Lemma 1

We will prove that $v_{\beta}(i, b_1, d_1, b_2, d_2, v, l)$ is monotone in the available bandwidth, delay, and velocity. The proof consists of two steps:

1. Prove that the reward function $r({\bf s}, a)$ is monotone in the available bandwidth, delay, and velocity.
2. Prove that the sum of transition probabilities $\textstyle \sum_{{\bf s}^{\prime} \in {\bf S}} P[{\bf s}^{\prime}|{\bf s},a]$ is monotone in the available bandwidth, delay, and velocity.

We first note that the only part of the reward function $r({\bf s}, a)$ that depends on the bandwidth is $f_b({\bf s}, a)$. Let $b^1_a$ and $b^2_a$ be two possible bandwidth values with $b^1_a \geq b^2_a$. We denote by $f_b({\bf s}^1, a)$ the value of $f_b({\bf s}, a)$ when $b_a = b^1_a$, and by $f_b({\bf s}^2, a)$ its value when $b_a = b^2_a$. From the definition of $f_b({\bf s}, a)$, we have $f_b({\bf s}^1, a) \geq f_b({\bf s}^2, a)$, since $f_b({\bf s}, a)$ is linearly proportional to $b_a$. As a result, the reward function $r({\bf s}, a)$ is monotonically non-decreasing in the available bandwidth.

Similarly, the only part of the reward function that depends on the delay is $f_d({\bf s}, a)$. Let $d^1_a$ and $d^2_a$ be two possible delay values with $d^1_a \geq d^2_a$. We denote by $f_d({\bf s}^1, a)$ the value of $f_d({\bf s}, a)$ when $d_a = d^1_a$, and by $f_d({\bf s}^2, a)$ its value when $d_a = d^2_a$. From the definition of $f_d({\bf s}, a)$, we have $f_d({\bf s}^1, a) \leq f_d({\bf s}^2, a)$, since $f_d({\bf s}, a)$ decreases linearly in $d_a$. As a result, the reward function $r({\bf s}, a)$ is monotonically non-increasing in the delay.

For the velocity, the only part of the reward function that depends on it is $-q({\bf s}, a)$. From the definition of $q({\bf s}, a)$ in (9), when the velocity $v$ becomes larger, the value of $q({\bf s}, a)$ becomes larger or remains the same, which means that $-q({\bf s}, a)$ becomes smaller or stays the same. Consequently, the reward function $r({\bf s}, a)$ is monotonically non-increasing in the velocity.

We assume that the transition probability function $P[{\bf s}^{\prime}|{\bf s},a]$ satisfies the first-order stochastic dominance condition. This implies that when the system is in a better state (e.g., larger bandwidth, lower delay), it evolves into the region of better states with a higher probability. For the available bandwidth, it implies that the sum of transition probabilities (i.e., $\textstyle\sum_{{\bf s}^{\prime} \in {\bf S}} P[{\bf s}^{\prime}|{\bf s},a]$) is monotonically non-decreasing in the available bandwidth. Similarly, if the delay (velocity) in the next decision epoch is stochastically decreasing with respect to the delay (velocity) in the current decision epoch, then the sum of transition probabilities (i.e., $\textstyle\sum_{{\bf s}^{\prime} \in {\bf S}} P[{\bf s}^{\prime}|{\bf s},a]$) is monotonically non-increasing in the delay (velocity).
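The first-order stochastic dominance assumption can be checked numerically on a given transition matrix. The sketch below is illustrative only: it assumes the usual tail-sum reading of the condition (for every level $m$, the probability of landing in a state at or above $m$ does not decrease as the current state improves), and the matrices `P_bandwidth` and `P_reversed` are hypothetical values, not taken from the paper.

```python
import numpy as np

def tail_sums(P):
    """tail[s, m] = sum of P[s, s2] over all next states s2 >= m."""
    return np.cumsum(P[:, ::-1], axis=1)[:, ::-1]

def satisfies_fosd(P, tol=1e-12):
    """True if each row first-order stochastically dominates the row before it.

    States are assumed to be ordered from worst (index 0) to best.
    """
    tails = tail_sums(P)
    # Every tail sum must be non-decreasing in the current state index.
    return bool(np.all(np.diff(tails, axis=0) >= -tol))

# Hypothetical 3-level bandwidth chain: better states tend to stay better.
P_bandwidth = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
])

# Hypothetical counterexample: the rows are in the opposite order.
P_reversed = P_bandwidth[::-1]

print(satisfies_fosd(P_bandwidth))
print(satisfies_fosd(P_reversed))
```

For a quantity such as delay, where smaller is better, the same check applies once the states are indexed from worst to best.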

Appendix 2: Proof of Theorem 1

To show that the optimal policy is monotonically non-increasing in the available bandwidth, we need to prove that $Q_{\beta}({\bf s}, a)$ is submodular in $(b_2, a)$. We will prove via mathematical induction that for a suitable initialization,

$$ \begin{aligned} & Q^{k+1}_{\beta}([i, b_1, d_1, b_2, d_2, v, l], 2)-Q^{k+1}_{\beta}([i, b_1, d_1, b_2, d_2, v, l], 1) \\ &\quad = r([i, b_1, d_1, b_2, d_2, v, l], 2; \beta) -r([i, b_1, d_1, b_2, d_2, v, l], 1; \beta) \\ &\qquad+\sum_{{\bf s}^{\prime} \in \, {\bf S}} \lambda P[b^{\prime}_1,d^{\prime}_1|b_1,d_1]P[b^{\prime}_2,d^{\prime}_2|b_2,d_2]P[v^{\prime}|v]P[l^{\prime}|l]\\ &\qquad \times \left(v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})-v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})\right), \end{aligned} $$

(34)

is monotonically non-increasing in the available bandwidth $b_2$. This holds if $v_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})$ has non-increasing difference in $b_2$. Select $v^{0}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})$ with non-increasing difference in $b_2$. Assume that $v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})$ has non-increasing difference in $b_2$, which implies that $Q^{k+1}_{\beta}([i, b_1, d_1, b_2, d_2, v, l], a)$ is submodular in $(b_2, a)$. We will now prove that $v^{k+1}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})$ also has non-increasing difference in $b_2$. That is,

$$ \begin{aligned} & v^{k+1}_{\beta}(i,b_1,d_1,b_2+1,d_2,v,l)-v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)\\ & \quad \leq v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)-v^{k+1}_{\beta}(i,b_1,d_1,b_2-1,d_2,v,l), \end{aligned} $$

(35)

or

$$ \begin{aligned} &v^{k+1}_{\beta}(i,b_1,d_1,b_2+1,d_2,v,l)-v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)\\ & \quad - \left(v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)\right.\\& \left.\quad -v^{k+1}_{\beta}(i,b_1,d_1,b_2-1,d_2,v,l)\right) \leq 0. \end{aligned}$$

(36)

We assume

$$ \begin{aligned} v^{k+1}_{\beta}(i,b_1,d_1,b_2+1,d_2,v,l)=\,& Q^{k+1}_{\beta}([i,b_1,d_1,b_2+1,d_2,v,l],a_2),\\ v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l) = \,& Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l],a_1), \end{aligned} $$

and

$$ v^{k+1}_{\beta}(i,b_1,d_1,b_2-1,d_2,v,l)=Q^{k+1}_{\beta}([i,b_1,d_1,b_2-1,d_2,v,l],a_0), $$

for some actions $a_2, a_1, a_0 \in {\bf A}_{\bf s}$. Thus, we can re-write (35) as

$$ \begin{aligned} & Q^{k+1}_{\beta}([i,b_1,d_1,b_2+1,d_2,v,l], a_2)-Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_1)\\ & \qquad - \left(Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_1)-Q^{k+1}_{\beta}([i,b_1,d_1,b_2-1,d_2,v,l], a_0)\right)\\ & \quad \leq 0,\end{aligned} $$

(37)

or

$$ \begin{array}{c} \underbrace{Q^{k+1}_{\beta}([i,b_1,d_1,b_2+1,d_2,v,l], a_2) - Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_2)}_{W_1}\\ \underbrace{+\,Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_2) - Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_1)}_{X_1}\\ \underbrace{-\,Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_1) + Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_0)}_{Y_1}\\ -\underbrace{\left(Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_0) - Q^{k+1}_{\beta}([i,b_1,d_1,b_2-1,d_2,v,l], a_0)\right)}_{Z_1}\\ \leq 0, \end{array} $$

where $X_1 \leq 0$ and $Y_1 \leq 0$ by optimality. Note that in $X_1$ and $Y_1$ the optimal action is $a_1$.
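As a quick check on this decomposition, the four groups telescope back to the left-hand side of (37). The shorthand ${\bf s}^{+}$, ${\bf s}$, ${\bf s}^{-}$ for the states with bandwidth $b_2+1$, $b_2$, $b_2-1$ (all other components fixed) and $Q$ for $Q^{k+1}_{\beta}$ is introduced here only for brevity and is not part of the original notation:

$$ \begin{aligned} W_1 + X_1 + Y_1 - Z_1 =\,& \left[Q({\bf s}^{+}, a_2) - Q({\bf s}, a_2)\right] + \left[Q({\bf s}, a_2) - Q({\bf s}, a_1)\right] \\ & + \left[Q({\bf s}, a_0) - Q({\bf s}, a_1)\right] - \left[Q({\bf s}, a_0) - Q({\bf s}^{-}, a_0)\right] \\ =\,& Q({\bf s}^{+}, a_2) - 2\,Q({\bf s}, a_1) + Q({\bf s}^{-}, a_0). \end{aligned} $$

Assuming, as "by optimality" suggests, that $a_1$ maximizes $Q({\bf s}, \cdot)$, both $X_1 = Q({\bf s}, a_2) - Q({\bf s}, a_1)$ and $Y_1 = Q({\bf s}, a_0) - Q({\bf s}, a_1)$ are non-positive, so the inequality follows once $W_1 \leq Z_1$ is established.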

In addition, it follows from the induction hypothesis that

$$ \begin{aligned} W_1 = & r([i, b_1, d_1, b_2+1, d_2, v, l], a_2; \beta) -r([i, b_1, d_1, b_2, d_2, v, l], a_2; \beta) \\ &+ \sum_{{\bf s}^{\prime} \in \, {\bf S}} \lambda \left(P[b^{\prime}_1,d^{\prime}_1|b_1,d_1]P[b^{\prime}_2,d^{\prime}_2|b_2+1,d_2]P[v^{\prime}|v]P[l^{\prime}\,|\,l]\right.\\ &\times v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,(b_2+1)^{\prime},d^{\prime}_2,v^{\prime},l^{\prime}) - P[b^{\prime}_1,d^{\prime}_1\,|\,b_1,d_1] \\ &\times \left. P[b^{\prime}_2,d^{\prime}_2\,|\,b_2,d_2]P[v^{\prime}\,|\,v]P[l^{\prime}\,|\,l]v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b_2^{\prime},d^{\prime}_2,v^{\prime},l^{\prime})\right)\\ \leq & r([i, b_1, d_1, b_2, d_2, v, l], a_0; \beta) -r([i, b_1, d_1, b_2-1, d_2, v, l], a_0; \beta) \\ &+ \sum_{{\bf s}^{\prime} \in \, {\bf S}} \lambda \left(P[b^{\prime}_1,d^{\prime}_1\,|\,b_1,d_1]P[b^{\prime}_2,d^{\prime}_2\,|\,b_2,d_2]P[v^{\prime}\,|\,v]P[l^{\prime}\,|\,l]\right.\\ & \times \left. v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b_2^{\prime},d^{\prime}_2,v^{\prime},l^{\prime}) -P[b^{\prime}_1,d^{\prime}_1\,|\,b_1,d_1]P[b^{\prime}_2,d^{\prime}_2\,|\,b_2-1,d_2] \right.\\ &\times \left. P[v^{\prime}\,|\,v]P[l^{\prime}\,|\,l]v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,(b_2-1)^{\prime},d^{\prime}_2,v^{\prime},l^{\prime})\right). \end{aligned} $$

The right-hand side (RHS) of the inequality comes from the expansion of $Z_1$, which implies that $W_1 \leq Z_1$. Therefore, it is shown that $v^{k+1}_{\beta}(i, b_1, d_1, b_2, d_2, v, l)$ satisfies (35), which implies that $Q_{\beta}({\bf s}, a)$ is submodular in $(b_2, a)$.
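To make the link between submodularity and a threshold policy concrete, the following sketch checks both properties on a small table of Q-values. It assumes two actions (1 and 2) and a single ordered state component standing in for $b_2$. The table `Q_toy` is a hypothetical example chosen for illustration, not a result from the paper.

```python
import numpy as np

def action_gap(Q):
    """Return Q(b, 2) - Q(b, 1) for each bandwidth level b.

    Q has shape (n_levels, 2): column 0 holds action 1, column 1 holds action 2.
    """
    return Q[:, 1] - Q[:, 0]

def is_submodular(Q, tol=1e-12):
    """Submodular in (b, a): the gap Q(b, 2) - Q(b, 1) never increases with b."""
    return bool(np.all(np.diff(action_gap(Q)) <= tol))

def greedy_policy(Q):
    """Pick the maximizing action (1 or 2) per level; ties go to action 1."""
    return np.argmax(Q, axis=1) + 1

def is_non_increasing(policy):
    """True if the chosen action never increases as the level index grows."""
    return bool(np.all(np.diff(policy) <= 0))

# Hypothetical Q-values for five bandwidth levels b_2 = 0, ..., 4.
Q_toy = np.array([
    [1.0, 3.0],
    [2.0, 3.5],
    [3.0, 3.5],
    [4.0, 4.0],
    [5.0, 4.5],
])

policy = greedy_policy(Q_toy)
print(is_submodular(Q_toy), policy, is_non_increasing(policy))
```

Because the gap can only shrink as the level grows, the set of levels where action 2 is strictly better forms a prefix of the level range, so the greedy policy switches at most once. That single switching point is the threshold.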

Appendix 3: Proof of Theorem 2

To show that the optimal policy is monotonically non-decreasing in the delay, we need to prove that $Q_{\beta}({\bf s}, a)$ is supermodular in $(d_2, a)$. We will prove via mathematical induction that, for a suitable initialization, (34) is monotonically non-decreasing in the delay $d_2$. This holds if $v_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})$ has non-increasing difference in $d_2$. Select $v^{0}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})$ with non-increasing difference in $d_2$. Assume that $v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})$ has non-increasing difference in $d_2$, which implies that $Q^{k+1}_{\beta}([i, b_1, d_1, b_2, d_2, v, l], a)$ is supermodular in $(d_2, a)$. We will now prove that $v^{k+1}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})$ also has non-increasing difference in $d_2$. That is,

$$ \begin{aligned} & v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2+1,v,l)-v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)\\ &\quad \leq v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)-v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2-1,v,l), \end{aligned}$$

(38)

or

$$ \begin{aligned} & v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2+1,v,l)-v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)\\ &\quad - \left(v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)-v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2-1,v,l)\right)\leq 0.\end{aligned} $$

(39)

We assume

$$ \begin{aligned} v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2+1,v,l) =\,& Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2+1,v,l],a_2),\\ v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l) = \,& Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l],a_1), \end{aligned} $$

and

$$ v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2-1,v,l)=Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2-1,v,l],a_0), $$

for some actions $a_2, a_1, a_0 \in {\bf A}_{\bf s}$. Thus, we can re-write (38) as

$$\begin{aligned}& Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2+1,v,l],a_2)-Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_1)\\& \quad - \left(Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_1)\right.\\& \quad \left.- Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2-1,v,l], a_0)\right) \leq 0,\end{aligned} $$

(40)

or

$$ \begin{array}{c} \underbrace{Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2+1,v,l], a_2) - Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_2)}_{W_2}\\ \underbrace{+\,Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_2) - Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_1)}_{X_2}\\ \underbrace{-\,Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_1) + Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_0)}_{Y_2}\\ -\underbrace{\left(Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_0) - Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2-1,v,l], a_0)\right)}_{Z_2}\\ \leq 0, \end{array} $$

where $X_2 \leq 0$ and $Y_2 \leq 0$ by optimality. Note that in $X_2$ and $Y_2$ the optimal action is $a_1$.

In addition, it follows from the induction hypothesis that

$$ \begin{aligned}W_2 = & r([i, b_1, d_1, b_2, d_2+1,v,l], a_2; \beta) -r([i, b_1, d_1, b_2, d_2,v,l], a_2; \beta) \\ &+ \sum_{{\bf s}^{\prime} \in \, {\bf S}} \lambda \left(P[b^{\prime}_1,d^{\prime}_1\,|\,b_1,d_1]P[b^{\prime}_2,d^{\prime}_2\,|\,b_2,d_2+1]P[v^{\prime}\,|\,v]P[l^{\prime}\,|\,l]\right.\\ &\times v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,(d_2+1)^{\prime},v^{\prime},l^{\prime}) -P[b^{\prime}_1,d^{\prime}_1\,|\,b_1,d_1]\\ &\times \left. P[b^{\prime}_2,d^{\prime}_2\,|\,b_2,d_2] P[v^{\prime}\,|\,v]P[l^{\prime}\,|\,l]v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d_2^{\prime},v^{\prime},l^{\prime})\right)\\ \leq & r([i, b_1, d_1, b_2, d_2,v,l], a_0; \beta) -r([i, b_1, d_1, b_2, d_2-1,v,l], a_0; \beta)\\ & + \sum_{{\bf s}^{\prime} \in \, {\bf S}} \lambda \left(P[b^{\prime}_1,d^{\prime}_1\,|\,b_1,d_1]P[b^{\prime}_2,d^{\prime}_2\,|\,b_2,d_2]P[v^{\prime}\,|\,v]P[l^{\prime}\,|\,l]\right.\\ &\times v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d_2^{\prime},v^{\prime},l^{\prime}) -P[b^{\prime}_1,d^{\prime}_1\,|\,b_1,d_1]\\ &\times P[b^{\prime}_2,d^{\prime}_2\,|\,b_2,d_2-1]P[v^{\prime}\,|\,v]P[l^{\prime}\,|\,l] \\ &\times \left. v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,(d_2-1)^{\prime},v^{\prime},l^{\prime})\right).\\ \end{aligned} $$

The RHS of the inequality comes from the expansion of $Z_2$, which implies that $W_2 \leq Z_2$. Therefore, it is shown that $v^{k+1}_{\beta}(i, b_1, d_1, b_2, d_2, v, l)$ satisfies (38), which implies that $Q_{\beta}({\bf s}, a)$ is supermodular in $(d_2, a)$.

Appendix 4: Proof of Theorem 3

To show that the optimal policy is monotonically non-decreasing in the velocity, we need to prove that $Q_{\beta}({\bf s}, a)$ is supermodular in $(v, a)$. We will prove via mathematical induction that, for a suitable initialization, (34) is monotonically non-decreasing in the velocity $v$. This holds if $v_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})$ has non-increasing difference in $v$. Select $v^{0}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})$ with non-increasing difference in $v$. Assume that $v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})$ has non-increasing difference in $v$, which implies that $Q^{k+1}_{\beta}([i, b_1, d_1, b_2, d_2, v, l], a)$ is supermodular in $(v, a)$. We will now prove that $v^{k+1}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})$ also has non-increasing difference in $v$. That is,

$$ \begin{aligned} & v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v+1,l)-v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)\\ & \quad \leq v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)-v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v-1,l), \end{aligned}$$

(41)

or

$$ \begin{aligned} & v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v+1,l)-v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)\\ &\quad - \left(v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)-v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v-1,l)\right)\leq 0. \end{aligned} $$

(42)

We assume

$$ \begin{aligned} v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v+1,l) =\, & Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v+1,l],a_2),\\ v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l) =\, & Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l],a_1), \end{aligned} $$

and

$$ v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v-1,l)=Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v-1,l],a_0), $$

for some actions $a_2, a_1, a_0 \in {\bf A}_{\bf s}$. Thus, we can re-write (41) as

$$ \begin{aligned} & Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v+1,l], a_2)-Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_1)\\ & \quad - \left(Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_1)\right. \\ & \quad \left.- Q^{k+1}_{\beta}\left([i,b_1,d_1,b_2,d_2,v-1,l], a_0\right)\right) \leq 0, \end{aligned} $$

(43)

or

$$ \begin{array}{c} \underbrace{Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v+1,l], a_2) - Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_2)}_{W_3}\\ \underbrace{+\,Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_2) - Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_1)}_{X_3}\\ \underbrace{-\,Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_1) + Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_0)}_{Y_3}\\ -\underbrace{\left(Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_0) - Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v-1,l], a_0)\right)}_{Z_3}\\ \leq 0, \end{array} $$

where $X_3 \leq 0$ and $Y_3 \leq 0$ by optimality. Note that in $X_3$ and $Y_3$ the optimal action is $a_1$.

In addition, it follows from the induction hypothesis that

$$ \begin{aligned} W_3 = & r([i, b_1, d_1, b_2, d_2,v+1,l], a_2; \beta) -r([i, b_1, d_1, b_2, d_2,v,l], a_2; \beta) \\ &+ \sum_{{\bf s}^{\prime} \in \, {\bf S}} \lambda \left(P[b^{\prime}_1,d^{\prime}_1\,|\,b_1,d_1]P[b^{\prime}_2,d^{\prime}_2\,|\,b_2,d_2]P[v^{\prime}\,|\,v+1]P[l^{\prime}\,|\,l]\right.\\ &\times v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,(v+1)^{\prime},l^{\prime}) -P[b^{\prime}_1,d^{\prime}_1\,|\,b_1,d_1] \\ &\times \left. P[b^{\prime}_2,d^{\prime}_2\,|\,b_2,d_2]P[v^{\prime}\,|\,v]P[l^{\prime}\,|\,l]v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})\right) \\ \leq & r([i, b_1, d_1, b_2, d_2,v,l], a_0; \beta) -r([i, b_1, d_1, b_2, d_2,v-1,l], a_0; \beta) \\ &+\sum_{{\bf s}^{\prime} \in \, {\bf S}} \lambda \left(P[b^{\prime}_1,d^{\prime}_1\,|\,b_1,d_1]P[b^{\prime}_2,d^{\prime}_2\,|\,b_2,d_2]P[v^{\prime}\,|\,v]P[l^{\prime}\,|\,l] \right. \\ &\times v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime}) -P[b^{\prime}_1,d^{\prime}_1\,|\,b_1,d_1]P[b^{\prime}_2,d^{\prime}_2\,|\,b_2,d_2] \\ &\times \left. P[v^{\prime}\,|\,v-1]P[l^{\prime}\,|\,l]v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,(v-1)^{\prime},l^{\prime})\right). \\ \end{aligned} $$

The RHS of the inequality comes from the expansion of $Z_3$, which implies that $W_3 \leq Z_3$. Therefore, it is shown that $v^{k+1}_{\beta}(i, b_1, d_1, b_2, d_2, v, l)$ satisfies (41), which implies that $Q_{\beta}({\bf s}, a)$ is supermodular in $(v, a)$.
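The inductions in Appendices 2 to 4 all run over the iterates $Q^{k+1}_{\beta}$ and $v^{k}_{\beta}$. A minimal sketch of that recursion is given below. It assumes that $v^{k+1}_{\beta}({\bf s}) = \max_{a} Q^{k+1}_{\beta}({\bf s}, a)$ and that $\lambda$ acts as a discount factor. The function name `value_iteration`, the array layout, and the two-state model are hypothetical choices for illustration, and the reward table is taken as already adjusted by the multiplier $\beta$.

```python
import numpy as np

def value_iteration(r_beta, P, lam, v0=None, tol=1e-10, max_iter=10_000):
    """Iterate Q^{k+1}(s, a) = r(s, a; beta) + lam * sum_s2 P[a, s, s2] * v^k(s2).

    r_beta: array (n_states, n_actions), reward already adjusted by beta.
    P:      array (n_actions, n_states, n_states), transition probabilities.
    lam:    discount factor in (0, 1).
    v0:     optional initial value function (the "suitable initialization").
    """
    n_states, _ = r_beta.shape
    v = np.zeros(n_states) if v0 is None else np.asarray(v0, dtype=float)
    for _ in range(max_iter):
        Q = r_beta + lam * np.einsum("ast,t->sa", P, v)
        v_next = Q.max(axis=1)  # v^{k+1}(s) = max_a Q^{k+1}(s, a)
        converged = np.max(np.abs(v_next - v)) < tol
        v = v_next
        if converged:
            break
    return v, Q, Q.argmax(axis=1)

# Hypothetical two-state model: action a moves the connection to state a.
r_beta = np.array([[1.5, 0.0],
                   [0.5, 2.0]])
P = np.zeros((2, 2, 2))
P[0, :, 0] = 1.0  # action 0 always leads to state 0
P[1, :, 1] = 1.0  # action 1 always leads to state 1
lam = 0.5

v, Q, policy = value_iteration(r_beta, P, lam)
print(v, policy)
```

In this toy model the fixed point can be checked by hand: staying in state 1 yields $2/(1-0.5) = 4$, and staying in state 0 yields $1.5/(1-0.5) = 3$, which beats switching ($0 + 0.5 \cdot 4 = 2$).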

About this article

Cite this article

Sun, C., Stevens-Navarro, E., Shah-Mansouri, V. et al. A constrained MDP-based vertical handoff decision algorithm for 4G heterogeneous wireless networks. Wireless Netw 17, 1063–1081 (2011). https://doi.org/10.1007/s11276-011-0335-x

Issue Date: May 2011