A constrained MDP-based vertical handoff decision algorithm for 4G heterogeneous wireless networks

Wireless Networks

Abstract

The 4th generation wireless communication systems aim to provide users with the convenience of seamless roaming among heterogeneous wireless access networks. To achieve this goal, the support of vertical handoff is important in mobility management. This paper focuses on the vertical handoff decision algorithm, which determines the criteria under which vertical handoff should be performed. The problem is formulated as a constrained Markov decision process. The objective is to maximize the expected total reward of a connection subject to the expected total access cost constraint. In our model, a benefit function is used to assess the quality of the connection, and a penalty function is used to model the signaling incurred and call dropping. The user’s velocity and location information are also considered when making handoff decisions. The policy iteration and Q-learning algorithms are employed to determine the optimal policy. Structural results on the optimal vertical handoff policy are derived by using the concept of supermodularity. We show that the optimal policy is a threshold policy in bandwidth, delay, and velocity. Numerical results show that our proposed vertical handoff decision algorithm outperforms other decision schemes in a wide range of conditions such as variations on connection duration, user’s velocity, user’s budget, traffic type, signaling cost, and monetary access cost.

Notes

  1. The time between two successive decision epochs is on the order of seconds.

Acknowledgments

This work was supported by Bell Canada and the Natural Sciences and Engineering Research Council (NSERC) of Canada.

Author information

Corresponding author

Correspondence to Vincent W. S. Wong.

Appendices

Appendix 1: Proof of Lemma 1

We prove that \(v_{\beta}(i,b_1,d_1,b_2,d_2,v,l)\) is monotone in available bandwidth, delay, and velocity. The proof consists of two steps:

  1. To prove that the reward function \(r({\bf s},a)\) is monotone in available bandwidth, delay, and velocity;

  2. To prove that the sum of transition probabilities \(\textstyle \sum_{{\bf s}^{\prime} \in {\bf S}} P[{\bf s}^{\prime}|{\bf s},a]\) is monotone in available bandwidth, delay, and velocity.

We first note that the only part of the reward function \(r({\bf s},a)\) that depends on the bandwidth is \(f_b({\bf s},a)\). Let \(b_a^1\) and \(b_a^2\) be two possible bandwidth values with \(b_a^1 \geq b_a^2\). We denote by \(f_b({\bf s}_1,a)\) the value of \(f_b({\bf s},a)\) when \(b_a = b_a^1\), and by \(f_b({\bf s}_2,a)\) the value of \(f_b({\bf s},a)\) when \(b_a = b_a^2\). From the definition of \(f_b({\bf s},a)\), we have \(f_b({\bf s}_1,a) \geq f_b({\bf s}_2,a)\), since \(f_b({\bf s},a)\) is linearly proportional to \(b_a\). As a result, the reward function \(r({\bf s},a)\) is monotonically non-decreasing in the available bandwidth.

Similarly, the only part of the reward function that depends on the delay is \(f_d({\bf s},a)\). Let \(d_a^1\) and \(d_a^2\) be two possible delay values with \(d_a^1 \geq d_a^2\). We denote by \(f_d({\bf s}_1,a)\) the value of \(f_d({\bf s},a)\) when \(d_a = d_a^1\), and by \(f_d({\bf s}_2,a)\) the value of \(f_d({\bf s},a)\) when \(d_a = d_a^2\). From the definition of \(f_d({\bf s},a)\), we have \(f_d({\bf s}_1,a) \leq f_d({\bf s}_2,a)\), since \(f_d({\bf s},a)\) decreases linearly with \(d_a\). As a result, the reward function \(r({\bf s},a)\) is monotonically non-increasing in the delay.

For the velocity, the only part of the reward function that depends on it is \(-q({\bf s},a)\). From the definition of \(q({\bf s},a)\) in (9), when the velocity \(v\) becomes larger, the value of \(q({\bf s},a)\) becomes larger or remains the same, which means that \(-q({\bf s},a)\) becomes smaller or stays the same. Consequently, the reward function \(r({\bf s},a)\) is monotonically non-increasing in velocity.

We assume that the transition probability function \(P[{\bf s}^{\prime}|{\bf s},a]\) satisfies the first order stochastic dominance condition. This implies that when the system is in a better state (e.g., larger bandwidth, lower delay), it will evolve into the region of better states with a higher probability. For the available bandwidth, this implies that the sum of transition probabilities (i.e., \(\textstyle\sum_{{\bf s}^{\prime} \in {\bf S}} P[{\bf s}^{\prime}|{\bf s},a]\)) is monotonically non-decreasing in the available bandwidth. Similarly, the condition that the delay (velocity) in the next decision epoch is stochastically decreasing with respect to the delay (velocity) in the current decision epoch ensures that the sum of transition probabilities (i.e., \(\textstyle\sum_{{\bf s}^{\prime} \in {\bf S}} P[{\bf s}^{\prime}|{\bf s},a]\)) is monotonically non-increasing in the delay (velocity).
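The stochastic dominance assumption can be checked numerically for a given one-dimensional transition matrix. The following is a minimal sketch, not the procedure used in the paper: it assumes that states are ordered from worst to best, reads first order stochastic dominance as "every upper-tail sum of the next row is at least the matching tail sum of the current row", and uses a hypothetical three-level bandwidth chain `P_bw` whose numbers are illustrative only.

```python
import numpy as np

def is_fosd_monotone(P, tol=1e-12):
    """Check first order stochastic dominance between consecutive rows of P.

    P[k, j] is the probability of moving from level k to level j, with levels
    sorted from worst to best (an assumption of this sketch). Row k+1 dominates
    row k when each upper-tail sum of row k+1 is >= the same tail sum of row k.
    """
    P = np.asarray(P, dtype=float)
    # tails[k, j] = sum of P[k, j'] over all j' >= j
    tails = np.cumsum(P[:, ::-1], axis=1)[:, ::-1]
    # Differences between consecutive rows must be non-negative (up to tol).
    return bool(np.all(np.diff(tails, axis=0) >= -tol))

# Hypothetical 3-level bandwidth chain; values are illustrative only.
P_bw = np.array([
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7],
])

# A chain that violates the condition: the better state drifts downward.
P_bad = np.array([
    [0.1, 0.2, 0.7],
    [0.7, 0.2, 0.1],
])

print(is_fosd_monotone(P_bw), is_fosd_monotone(P_bad))
```

The tolerance `tol` only absorbs floating-point rounding in the cumulative sums; it is not part of the condition itself.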

Appendix 2: Proof of Theorem 1

To show that the optimal policy is monotonically non-increasing in the available bandwidth, we need to prove that \(Q_{\beta}({\bf s},a)\) is submodular in \((b_2,a)\). We will prove via mathematical induction that, for a suitable initialization,

$$ \begin{aligned} & Q^{k+1}_{\beta}([i, b_1, d_1, b_2, d_2, v, l], 2)-Q^{k+1}_{\beta}([i, b_1, d_1, b_2, d_2, v, l], 1) \\ &\quad = r([i, b_1, d_1, b_2, d_2, v, l], 2; \beta) -r([i, b_1, d_1, b_2, d_2, v, l], 1; \beta) \\ &\qquad+\sum_{{\bf s}^{\prime} \in \, {\bf S}} \lambda P[b^{\prime}_1,d^{\prime}_1|b_1,d_1]P[b^{\prime}_2,d^{\prime}_2|b_2,d_2]P[v^{\prime}|v]P[l^{\prime}|l]\\ &\qquad \times \left(v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})-v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})\right), \end{aligned} $$
(34)

is monotonically non-increasing in the available bandwidth \(b_2\). It holds if \(v_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})\) has non-increasing difference in \(b_2\). Select \(v^{0}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})\) with non-increasing difference in \(b_2\). Assume that \(v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})\) has non-increasing difference in \(b_2\), which implies that \(Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l],a)\) is submodular in \((b_2,a)\). We will now prove that \(v^{k+1}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})\) also has non-increasing difference in \(b_2\). That is,

$$ \begin{aligned} & v^{k+1}_{\beta}(i,b_1,d_1,b_2+1,d_2,v,l)-v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)\\ & \quad \leq v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)-v^{k+1}_{\beta}(i,b_1,d_1,b_2-1,d_2,v,l), \end{aligned} $$
(35)

or

$$ \begin{aligned} &v^{k+1}_{\beta}(i,b_1,d_1,b_2+1,d_2,v,l)-v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)\\ & \quad - \left(v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)\right.\\& \left.\quad -v^{k+1}_{\beta}(i,b_1,d_1,b_2-1,d_2,v,l)\right) \leq 0. \end{aligned}$$
(36)

We assume

$$ \begin{aligned} v^{k+1}_{\beta}(i,b_1,d_1,b_2+1,d_2,v,l)=\,& Q^{k+1}_{\beta}([i,b_1,d_1,b_2+1,d_2,v,l],a_2),\\ v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l) = \,& Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l],a_1), \end{aligned} $$

and

$$ v^{k+1}_{\beta}(i,b_1,d_1,b_2-1,d_2,v,l)=Q^{k+1}_{\beta}([i,b_1,d_1,b_2-1,d_2,v,l],a_0), $$

for some actions \(a_2, a_1, a_0 \in {\bf A}_{\bf s}\). Thus, we can re-write (35) as

$$ \begin{aligned} & Q^{k+1}_{\beta}([i,b_1,d_1,b_2+1,d_2,v,l], a_2)-Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_1)\\ & \qquad - \left(Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_1)-Q^{k+1}_{\beta}([i,b_1,d_1,b_2-1,d_2,v,l], a_0)\right)\\ & \quad \leq 0,\end{aligned} $$
(37)

or

$$ \begin{array}{c} \underset{W_1}{\underbrace{Q^{k+1}_{\beta}([i,b_1,d_1,b_2+1,d_2,v,l], a_2) - Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_2)}}\\ \underset{X_1}{\underbrace{+Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_2) - Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_1)}}\\ \underset{Y_1}{\underbrace{-Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_1) + Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_0)}}\\ -\underset{Z_1}{\underbrace{\left(Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_0) - Q^{k+1}_{\beta}([i,b_1,d_1,b_2-1,d_2,v,l], a_0)\right)}}\\ \leq 0, \end{array} $$

where \(X_1 \leq 0\) and \(Y_1 \leq 0\) by optimality. Note that in \(X_1\) and \(Y_1\) the optimal action is \(a_1\).
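As a check on this grouping, the four terms telescope back to the left-hand side of (37). Writing \({\bf s}^{+}\), \({\bf s}\), and \({\bf s}^{-}\) for the states with bandwidth \(b_2+1\), \(b_2\), and \(b_2-1\), all other components being fixed (a shorthand introduced only for this remark),

$$ W_1 + X_1 + Y_1 - Z_1 = Q^{k+1}_{\beta}({\bf s}^{+},a_2) - 2\,Q^{k+1}_{\beta}({\bf s},a_1) + Q^{k+1}_{\beta}({\bf s}^{-},a_0). $$

Because \(X_1 \leq 0\) and \(Y_1 \leq 0\), it is therefore enough to show that \(W_1 \leq Z_1\).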

In addition, it follows from the induction hypothesis that

$$ \begin{aligned} W_1 = & r([i, b_1, d_1, b_2+1, d_2, v, l], a_2; \beta) -r([i, b_1, d_1, b_2, d_2, v, l], a_2; \beta) \\ &+ \sum_{{\bf s}^{\prime} \in \, {\bf S}} \lambda \left(P[b^{\prime}_1,d^{\prime}_1|b_1,d_1]P[b^{\prime}_2,d^{\prime}_2|b_2+1,d_2]P[v^{\prime}|v]P[l^{\prime}\,|\,l]\right.\\ &\times v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,(b_2+1)^{\prime},d^{\prime}_2,v^{\prime},l^{\prime}) - P[b^{\prime}_1,d^{\prime}_1\,|\,b_1,d_1] \\ &\times \left. P[b^{\prime}_2,d^{\prime}_2\,|\,b_2,d_2]P[v^{\prime}\,|\,v]P[l^{\prime}\,|\,l]v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b_2^{\prime},d^{\prime}_2,v^{\prime},l^{\prime})\right)\\ \leq & r([i, b_1, d_1, b_2, d_2, v, l], a_0; \beta) -r([i, b_1, d_1, b_2-1, d_2, v, l], a_0; \beta) \\ &+ \sum_{{\bf s}^{\prime} \in \, {\bf S}} \lambda \left(P[b^{\prime}_1,d^{\prime}_1\,|\,b_1,d_1]P[b^{\prime}_2,d^{\prime}_2\,|\,b_2,d_2]P[v^{\prime}\,|\,v]P[l^{\prime}\,|\,l]\right.\\ & \times \left. v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b_2^{\prime},d^{\prime}_2,v^{\prime},l^{\prime}) -P[b^{\prime}_1,d^{\prime}_1\,|\,b_1,d_1]P[b^{\prime}_2,d^{\prime}_2\,|\,b_2-1,d_2] \right.\\ &\times \left. P[v^{\prime}\,|\,v]P[l^{\prime}\,|\,l]v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,(b_2-1)^{\prime},d^{\prime}_2,v^{\prime},l^{\prime})\right). \end{aligned} $$

The right-hand side (RHS) of the inequality comes from the expansion of \(Z_1\), which implies that \(W_1 \leq Z_1\). Therefore, \(v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)\) satisfies (35), which implies that \(Q_{\beta}({\bf s},a)\) is submodular in \((b_2,a)\).
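The link between submodularity and a monotone policy can be illustrated on a small table of Q-values. The sketch below is an illustration under stated assumptions rather than the paper's algorithm: `Q_toy` is a hypothetical table indexed by a single ordered bandwidth level and two actions, with all other state components held fixed, and its numbers are invented for the example. It checks that the gap between the two actions is non-increasing in the level and reports the greedy action at each level.

```python
import numpy as np

def is_submodular_in_level_and_action(Q, tol=1e-12):
    """Check submodularity of Q[b, a] in (b, a) for two actions.

    Column 0 holds action 1 and column 1 holds action 2. With two actions,
    submodularity means Q[b, 1] - Q[b, 0] is non-increasing in the level b.
    """
    Q = np.asarray(Q, dtype=float)
    gap = Q[:, 1] - Q[:, 0]
    return bool(np.all(np.diff(gap) <= tol))

def greedy_actions(Q):
    """Return the maximizing action label (1 or 2) at each level b."""
    Q = np.asarray(Q, dtype=float)
    return (np.argmax(Q, axis=1) + 1).tolist()

# Hypothetical Q-values; rows are bandwidth levels, columns are actions 1 and 2.
Q_toy = np.array([
    [1.0, 3.0],
    [2.0, 3.5],
    [3.0, 3.2],
    [4.0, 3.0],
])

print(is_submodular_in_level_and_action(Q_toy))  # gap: 2.0, 1.5, 0.2, -1.0
print(greedy_actions(Q_toy))                     # switches once, from 2 to 1
```

In this toy table the greedy action changes exactly once as the level grows, which is the threshold structure that the theorem establishes for the optimal policy.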

Appendix 3: Proof of Theorem 2

To show that the optimal policy is monotonically non-decreasing in the delay, we need to prove that \(Q_{\beta}({\bf s},a)\) is supermodular in \((d_2,a)\). We will prove via mathematical induction that, for a suitable initialization, (34) is monotonically non-decreasing in the delay \(d_2\). The above holds if \(v_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})\) has non-increasing difference in \(d_2\). Select \(v^{0}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})\) with non-increasing difference in \(d_2\). Assume that \(v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})\) has non-increasing difference in \(d_2\), which implies that \(Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l],a)\) is supermodular in \((d_2,a)\). We will now prove that \(v^{k+1}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})\) also has non-increasing difference in \(d_2\). That is,

$$ \begin{aligned} & v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2+1,v,l)-v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)\\ &\quad \leq v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)-v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2-1,v,l), \end{aligned}$$
(38)

or

$$ \begin{aligned} & v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2+1,v,l)-v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)\\ &\quad - \left(v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)-v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2-1,v,l)\right)\leq 0.\end{aligned} $$
(39)

We assume

$$ \begin{aligned} v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2+1,v,l) =\,& Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2+1,v,l],a_2),\\ v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l) = \,& Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l],a_1), \end{aligned} $$

and

$$ v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2-1,v,l)=Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2-1,v,l],a_0), $$

for some actions \(a_2, a_1, a_0 \in {\bf A}_{\bf s}\). Thus, we can re-write (38) as

$$\begin{aligned}& Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2+1,v,l],a_2)-Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_1)\\& \quad - \left(Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_1)\right.\\& \quad \left.- Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2-1,v,l], a_0)\right) \leq 0,\end{aligned} $$
(40)

or

$$ \begin{array}{c} \underset{W_2}{\underbrace{Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2+1,v,l], a_2) - Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_2)}}\\ \underset{X_2}{\underbrace{+Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_2) - Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_1)}}\\ \underset{Y_2}{\underbrace{-Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_1) + Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_0)}}\\ -\underset{Z_2}{\underbrace{\left(Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_0) - Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2-1,v,l], a_0)\right)}}\\ \leq 0, \end{array} $$

where \(X_2 \leq 0\) and \(Y_2 \leq 0\) by optimality. Note that in \(X_2\) and \(Y_2\) the optimal action is \(a_1\).

In addition, it follows from the induction hypothesis that

$$ \begin{aligned}W_2 = & r([i, b_1, d_1, b_2, d_2+1,v,l], a_2; \beta) -r([i, b_1, d_1, b_2, d_2,v,l], a_2; \beta) \\ &+ \sum_{{\bf s}^{\prime} \in \, {\bf S}} \lambda \left(P[b^{\prime}_1,d^{\prime}_1\,|\,b_1,d_1]P[b^{\prime}_2,d^{\prime}_2\,|\,b_2,d_2+1]P[v^{\prime}\,|\,v]P[l^{\prime}\,|\,l]\right.\\ &\times v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,(d_2+1)^{\prime},v^{\prime},l^{\prime}) -P[b^{\prime}_1,d^{\prime}_1\,|\,b_1,d_1]\\ &\times \left. P[b^{\prime}_2,d^{\prime}_2\,|\,b_2,d_2] P[v^{\prime}\,|\,v]P[l^{\prime}\,|\,l]v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d_2^{\prime},v^{\prime},l^{\prime})\right)\\ \leq & r([i, b_1, d_1, b_2, d_2,v,l], a_0; \beta) -r([i, b_1, d_1, b_2, d_2-1,v,l], a_0; \beta)\\ & + \sum_{{\bf s}^{\prime} \in \, {\bf S}} \lambda \left(P[b^{\prime}_1,d^{\prime}_1\,|\,b_1,d_1]P[b^{\prime}_2,d^{\prime}_2\,|\,b_2,d_2]P[v^{\prime}\,|\,v]P[l^{\prime}\,|\,l]\right.\\ &\times v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d_2^{\prime},v^{\prime},l^{\prime}) -P[b^{\prime}_1,d^{\prime}_1\,|\,b_1,d_1]\\ &\times P[b^{\prime}_2,d^{\prime}_2\,|\,b_2,d_2-1]P[v^{\prime}\,|\,v]P[l^{\prime}\,|\,l] \\ &\times \left. v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,(d_2-1)^{\prime},v^{\prime},l^{\prime})\right).\\ \end{aligned} $$

The RHS of the inequality comes from the expansion of \(Z_2\), which implies that \(W_2 \leq Z_2\). Therefore, \(v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)\) satisfies (38), which implies that \(Q_{\beta}({\bf s},a)\) is supermodular in \((d_2,a)\).
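As a reading aid, the supermodularity condition can be written out for the two-action case. This is the standard increasing-differences definition rather than a statement taken from the paper, and the shorthand \({\bf s}(d_2)\) for the state \([i,b_1,d_1,b_2,d_2,v,l]\) with all components other than \(d_2\) held fixed is introduced only for this remark:

$$ Q_{\beta}({\bf s}(d_2+1),2)-Q_{\beta}({\bf s}(d_2+1),1) \geq Q_{\beta}({\bf s}(d_2),2)-Q_{\beta}({\bf s}(d_2),1). $$

If action 2 is optimal at delay \(d_2\), the right-hand side is non-negative, so the left-hand side is non-negative as well and action 2 remains optimal at \(d_2+1\). The optimal action is therefore non-decreasing in \(d_2\).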

Appendix 4: Proof of Theorem 3

To show that the optimal policy is monotonically non-decreasing in the velocity, we need to prove that \(Q_{\beta}({\bf s},a)\) is supermodular in \((v,a)\). We will prove via mathematical induction that, for a suitable initialization, (34) is monotonically non-decreasing in the velocity \(v\). It holds if \(v_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})\) has non-increasing difference in \(v\). Select \(v^{0}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})\) with non-increasing difference in \(v\). Assume that \(v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})\) has non-increasing difference in \(v\), which implies that \(Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l],a)\) is supermodular in \((v,a)\). We will now prove that \(v^{k+1}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})\) also has non-increasing difference in \(v\). That is,

$$ \begin{aligned} & v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v+1,l)-v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)\\ & \quad \leq v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)-v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v-1,l), \end{aligned}$$
(41)

or

$$ \begin{aligned} & v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v+1,l)-v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)\\ &\quad - \left(v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)-v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v-1,l)\right)\leq 0. \end{aligned} $$
(42)

We assume

$$ \begin{aligned} v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v+1,l) =\, & Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v+1,l],a_2),\\ v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l) =\, & Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l],a_1), \end{aligned} $$

and

$$ v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v-1,l)=Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v-1,l],a_0), $$

for some actions \(a_2, a_1, a_0 \in {\bf A}_{\bf s}\). Thus, we can re-write (41) as

$$ \begin{aligned} & Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v+1,l], a_2)-Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_1)\\ & \quad - \left(Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_1)\right. \\ & \quad \left.- Q^{k+1}_{\beta}\left([i,b_1,d_1,b_2,d_2,v-1,l], a_0\right)\right) \leq 0, \end{aligned} $$
(43)

or

$$ \begin{array}{c} \underset{W_3}{\underbrace{Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v+1,l], a_2) - Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_2)}}\\ \underset{X_3}{\underbrace{+Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_2) - Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_1)}}\\ \underset{Y_3}{\underbrace{-Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_1) + Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_0)}}\\ -\underset{Z_3}{\underbrace{\left(Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v,l], a_0) - Q^{k+1}_{\beta}([i,b_1,d_1,b_2,d_2,v-1,l], a_0)\right)}}\\ \leq 0, \end{array} $$

where \(X_3 \leq 0\) and \(Y_3 \leq 0\) by optimality. Note that in \(X_3\) and \(Y_3\) the optimal action is \(a_1\).

In addition, it follows from the induction hypothesis that

$$ \begin{aligned} W_3 = & r([i, b_1, d_1, b_2, d_2,v+1,l], a_2; \beta) -r([i, b_1, d_1, b_2, d_2,v,l], a_2; \beta) \\ &+ \sum_{{\bf s}^{\prime} \in \, {\bf S}} \lambda \left(P[b^{\prime}_1,d^{\prime}_1\,|\,b_1,d_1]P[b^{\prime}_2,d^{\prime}_2\,|\,b_2,d_2]P[v^{\prime}\,|\,v+1]P[l^{\prime}\,|\,l]\right.\\ &\times v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,(v+1)^{\prime},l^{\prime}) -P[b^{\prime}_1,d^{\prime}_1\,|\,b_1,d_1] \\ &\times \left. P[b^{\prime}_2,d^{\prime}_2\,|\,b_2,d_2]P[v^{\prime}\,|\,v]P[l^{\prime}\,|\,l]v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime})\right) \\ \leq & r([i, b_1, d_1, b_2, d_2,v,l], a_0; \beta) -r([i, b_1, d_1, b_2, d_2,v-1,l], a_0; \beta) \\ &+\sum_{{\bf s}^{\prime} \in \, {\bf S}} \lambda \left(P[b^{\prime}_1,d^{\prime}_1\,|\,b_1,d_1]P[b^{\prime}_2,d^{\prime}_2\,|\,b_2,d_2]P[v^{\prime}\,|\,v]P[l^{\prime}\,|\,l] \right. \\ &\times v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,v^{\prime},l^{\prime}) -P[b^{\prime}_1,d^{\prime}_1\,|\,b_1,d_1]P[b^{\prime}_2,d^{\prime}_2\,|\,b_2,d_2] \\ &\times \left. P[v^{\prime}\,|\,v-1]P[l^{\prime}\,|\,l]v^{k}_{\beta}(j,b^{\prime}_1,d^{\prime}_1,b^{\prime}_2,d^{\prime}_2,(v-1)^{\prime},l^{\prime})\right). \\ \end{aligned} $$

The RHS of the inequality comes from the expansion of \(Z_3\), which implies that \(W_3 \leq Z_3\). Therefore, \(v^{k+1}_{\beta}(i,b_1,d_1,b_2,d_2,v,l)\) satisfies (41), which implies that \(Q_{\beta}({\bf s},a)\) is supermodular in \((v,a)\).
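A policy that is monotonically non-decreasing in the velocity and takes only two actions is fully described by a single switching point. The sketch below illustrates this under stated assumptions: velocity is discretized into ordered levels, the policy is given as a plain list of action labels 1 and 2 with all other state components held fixed, and the helper name `switching_threshold` is hypothetical.

```python
def switching_threshold(policy):
    """Return the first velocity level at which action 2 is chosen.

    policy is a list of action labels (1 or 2) indexed by velocity level and
    is expected to be monotonically non-decreasing. If action 2 is never
    chosen, the number of levels is returned.
    """
    # Reject policies that switch back from action 2 to action 1.
    if any(cur > nxt for cur, nxt in zip(policy, policy[1:])):
        raise ValueError("policy is not monotonically non-decreasing")
    for level, action in enumerate(policy):
        if action == 2:
            return level
    return len(policy)

# Illustrative policy over four velocity levels: switch at level 2.
print(switching_threshold([1, 1, 2, 2]))
```

Storing one threshold per combination of the remaining state components, instead of one action per state, is the practical benefit of the threshold structure.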

Cite this article

Sun, C., Stevens-Navarro, E., Shah-Mansouri, V. et al. A constrained MDP-based vertical handoff decision algorithm for 4G heterogeneous wireless networks. Wireless Netw 17, 1063–1081 (2011). https://doi.org/10.1007/s11276-011-0335-x
