Online Learning and Optimization for Computation Offloading in D2D Edge Computing and Networks


Abstract

This paper introduces a framework of device-to-device edge computing and networks (D2D-ECN), a new paradigm for computation offloading and data processing that harnesses a group of resource-rich devices for the collaborative optimization of communication and computation. However, the computation process of task-intensive applications would be interrupted when the limited battery energy runs out. To tackle this issue, we apply energy harvesting technology in D2D-ECN to provide a green computation network and guarantee service continuity. Specifically, we design a reinforcement learning framework for a point-to-point offloading system to overcome the dynamics and uncertainty of renewable energy arrivals, channel states, and task generation rates. Furthermore, to cope with the high-dimensional, continuous-valued action space of an offloading system with multiple cooperating devices, we propose an online approach based on Lyapunov optimization for computation offloading and resource management that requires no a priori energy or network information. Numerical results demonstrate that our proposed scheme reduces the system operation cost while keeping task execution time low in D2D-ECN.



References

  1. Gubbi J, Buyya R, Marusic S, Palaniswami M (2013) Internet of things (IoT): a vision, architectural elements, and future directions. Future Gener Comput Syst 29(7):1645–1660


  2. ETSI ISG (2015) Mobile edge computing: a key technology towards 5G. ETSI White Paper 11:1–16


  3. Pu L, Chen C, Xu J, Fu X (2016) D2D fogging: an energy-efficient and incentive-aware task offloading framework via network-assisted D2D collaboration. IEEE J Sel Areas Commun 34(12):3887–3901


  4. Liu F et al (2013) Gearing resource-poor mobile devices with powerful clouds: architectures, challenges, and applications. IEEE Wireless Commun 20(3):14–22


  5. Zhang K et al (2016) Energy-efficient offloading for mobile edge computing in 5G heterogeneous networks. IEEE Access 4:5896–5907

  6. Shih Y-Y, Chung W-H, Pang A-C, Chiu T-C, Wei H-Y (2017) Enabling low-latency applications in fog-radio access networks. IEEE Netw 31(1):52–58


  7. Zhang K, Mao Y, Leng S, He Y, Zhang Y (2017) Mobile-edge computing for vehicular networks: a promising network paradigm with predictive off-loading. IEEE Veh Technol Mag 12(2):36–44

  8. Wang X, Leng S, Yang K (2017) Social-aware edge caching in fog radio access networks. IEEE Access 5:8492–8501


  9. Chen X, Pu L, Gao L, Wu W, Wu D (2017) Exploiting massive D2D collaboration for energy-efficient mobile edge computing. IEEE Wireless Commun 24(4):64–71

  10. Ti N, Le L (2017) Computation offloading leveraging computing resources from edge cloud and mobile peers. In: Proceedings of the IEEE International Conference on Communications (ICC)

  11. Meng X, Wang W, Zhang Z (2017) Delay-constrained hybrid computation offloading with cloud and fog computing. IEEE Access 5:21355–21367


  12. Gorlatova M, Wallwater A, Zussman G (2011) Networking low-power energy harvesting devices: measurements and algorithms. In: Proceedings of the IEEE Conf. Comput. Commun. (INFOCOM), pp 1602–1610

  13. Dhillon H, Li Y, Nuggehalli P, Pi Z, Andrews J (2014) Fundamentals of heterogeneous cellular networks with energy harvesting. IEEE Trans Wireless Commun 13(5):2782–2797


  14. Mao Y, Zhang J, Letaief KB (2016) Dynamic computation offloading for Mobile-Edge computing with energy harvesting devices. IEEE J Sel Areas Commun 34(12):3590–3605


  15. Fan B, Leng S, Yang K (2016) A dynamic bandwidth allocation algorithm in mobile networks with big data of users and networks. IEEE Netw 30(1):6–10

  16. Mastronarde N, van der Schaar M (2011) Fast reinforcement learning for energy-efficient wireless communication. IEEE Trans Signal Process 59(12):6262–6266

  17. Wei Y, Yu FR, Song M, Han Z (2017) User scheduling and resource allocation in HetNets with hybrid energy supply: an actor-critic reinforcement learning approach. IEEE Trans Wireless Commun 17(1):680–692

  18. Xu J, Chen L, Ren S (2017) Online learning for offloading and autoscaling in energy harvesting mobile edge computing. IEEE Trans Cogn Commun Netw 3(3):361–373


  19. Mao Y, You C, Zhang J, Huang K, Letaief KB (2017) A survey on mobile edge computing: the communication perspective. IEEE Commun Surveys Tuts 19(4):2322–2358

  20. Miettinen AP, Nurminen JK (2010) Energy efficiency of mobile clients in cloud computing. In: Proceedings of the USENIX Conf. Hot Topics Cloud Comput. (HotCloud), Boston, MA, USA, pp 1–7

  21. Wang Q, Leng S, Fu H, Zhang Y (2012) An IEEE 802.11p-based multichannel MAC scheme with channel coordination for vehicular ad hoc networks. IEEE Trans Intell Transp Syst 13(2):449–458

  22. Rajan D, Sabharwal A, Aazhang B (2004) Delay-bounded packet scheduling of bursty traffic over wireless channels. IEEE Trans Inform Theory 50(1):125–144


  23. Burd TD, Brodersen RW (1996) Processor design for portable systems. Kluwer J VLSI Signal Process Syst 13(2):203–221


  24. Altman E (1999) Constrained Markov decision processes. Chapman and Hall/CRC, Boca Raton

  25. Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge


  26. Salodkar N, Bhorkar A, Karandikar A, Borkar VS (2008) An on-line learning algorithm for energy efficient delay constrained scheduling over a fading channel. IEEE J Sel Areas Commun 26(4):732–742


  27. Neely MJ (2010) Stochastic network optimization with application to communication and queueing systems. Morgan & Claypool, San Rafael, CA

  28. Gibilisco P, Hiai F, Petz D (2009) Quantum covariance, quantum Fisher information, and the uncertainty relations. IEEE Trans Inform Theory 55(1):439–443


  29. Gorlatova M, Wallwater A, Zussman G (2012) Networking low-power energy harvesting devices: measurements and algorithms. IEEE Trans Mobile Comput 12(9):1853–1865

  30. Mitchell TM (1997) Machine learning. McGraw-Hill, New York


Acknowledgments

This work is supported by the joint fund of the Ministry of Education of China and China Mobile (MCM 20160304), the Fundamental Research Funds for the Central Universities, China (ZYGX2016Z011), and EU H2020 Project COSAFE (MSCA-RISE-2018-824019).

Author information

Corresponding author

Correspondence to Supeng Leng.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Convergence and optimality analysis of Algorithm 1

Given the Lagrange multiplier η, Algorithm 1 converges to the optimal state-action value function (i.e., Q(t) approaches the optimal \(Q^{*}\)) after a finite number of iterations, since it satisfies the following conditions [30]: (i) the state transition probability \(p(s^{\prime} | s,a)\) in Eq. 10 is stationary for any observed state \(s \in S\) and corresponding action \(a \in A\); (ii) the Lagrangian cost ρ in Eq. 13 is bounded, that is, the unconstrained cost satisfies \(\left| \rho(s,a) \right| < \rho_{\max}\) for each possible state-action pair (s,a), where \(\rho_{\max}\) is a constant; (iii) each state-action pair is explored infinitely often across episodes. As a consequence, \(Q_{\eta}\) converges to the optimal \(Q_{\eta}^{*}\) with probability 1. Moreover, it is worth noting that the execution time of the interactions between the agent and the external network environment has a considerable impact on the feasibility and efficiency of the Q-learning algorithm.
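To make the update rule behind these conditions concrete, the following is a minimal sketch of one tabular Q-learning step on the Lagrangian cost. The discounted-cost formulation, discount factor, and function names are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def q_update(Q, s, a, s_next, rho, alpha, gamma=0.9):
    """One tabular Q-learning step minimizing the Lagrangian cost rho(s, a).

    Q     : |S| x |A| array of state-action values (costs-to-go)
    rho   : observed Lagrangian cost, e.g. rho(s, a) = c(s, a) + eta * d(s, a)
    alpha : learning rate alpha(t) satisfying the conditions in Eq. 28
    gamma : discount factor (assumed discounted-cost formulation)
    """
    td_target = rho + gamma * Q[s_next].min()   # cost is minimized, hence min
    Q[s, a] += alpha * (td_target - Q[s, a])    # temporal-difference correction
    return Q
```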

In addition, we impose the following conditions on the learning rates α(t) and β(t) to guarantee the convergence stated in Theorem 1 [26]:

$$ \left\{ \begin{array}{l} \sum\nolimits_{t = 0}^{\infty} {\alpha (t)} = \infty ,\quad \sum\nolimits_{t = 0}^{\infty} {\beta (t)} = \infty \\ \sum\nolimits_{t = 0}^{\infty} {{\alpha^{2}}(t)} < \infty ,\quad \sum\nolimits_{t = 0}^{\infty} {{\beta^{2}}(t)} < \infty \\ \underset{t \to \infty }{\lim } \frac{{\beta (t)}}{{\alpha (t)}} = 0 \end{array} \right. $$
(28)

Consequently, Eq. 19 is guaranteed to yield the optimal policy within a finite number of iterations. ■
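For concreteness, one hypothetical pair of step-size schedules satisfying all three conditions in Eq. 28 is sketched below; the exponents are illustrative choices, not taken from the paper.

```python
def alpha(t):
    # Fast timescale for the Q-value updates: sum of alpha(t) diverges
    # (exponent 0.6 <= 1), sum of alpha(t)^2 converges (exponent 1.2 > 1).
    return 1.0 / (t + 1) ** 0.6

def beta(t):
    # Slow timescale for the Lagrange multiplier eta: same divergence /
    # convergence pattern, and beta(t) / alpha(t) = (t + 1)**(-0.3) -> 0,
    # so eta is updated more slowly than the Q-values.
    return 1.0 / (t + 1) ** 0.9
```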

Appendix B: Proof for theorem 1

Based on the inequality \(\max {[x,0]^{2}} \le {x^{2}}\), the battery energy state transition can be bounded as

$$\begin{array}{@{}rcl@{}} {[b(t + 1)]^{2}} &\le& {[b(t)]^{2}} + {[e(t)]^{2}} + {[g(t)]^{2}}\\ &&- 2b(t)[g(t) - e(t)] \end{array} $$
(29)
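For completeness, inequality (29) follows by expanding the square, assuming battery dynamics of the form \(b(t+1) = \max [b(t) - g(t) + e(t), 0]\) (an assumption consistent with the \(\max {[x,0]^{2}} \le {x^{2}}\) step; the exact dynamics are defined in the body of the paper):

$$\begin{array}{@{}rcl@{}} {[b(t + 1)]^{2}} &\le& {[b(t) - g(t) + e(t)]^{2}}\\ &=& {[b(t)]^{2}} + {[e(t)]^{2}} + {[g(t)]^{2}}\\ && - 2b(t)[g(t) - e(t)] - 2e(t)g(t) \end{array} $$

where the cross term \(-2e(t)g(t) \le 0\) is dropped because \(e(t)\) and \(g(t)\) are nonnegative.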

Then, substituting the Lyapunov function (23) and the above inequality (29) into the Lyapunov drift function (24), we obtain

$$ E\left\{ {L(t + 1) - L(t)} \right\} \le \frac{1}{2}E\left\{ {\left. {{{[e(t)]}^{2}} + {{[g(t)]}^{2}} - 2b(t)[g(t) - e(t)]} \right|b(t)} \right\} $$
(30)

Since the renewable energy arrivals and the energy consumption are bounded, i.e., \(\left| {e(t)} \right| \le {e_{\max }}\) and \(\left| {g(t)} \right| \le {g_{\max }}\), we define the constant

$$ \vartheta = \frac{1}{2}\left({e_{\max}^{2} + g_{\max}^{2}} \right) $$
(31)

Therefore, adding \(\pi E\left \{ {c(t)} \right \}\) to both sides of inequality (30) yields Eq. 26. ■
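Inequality (30) underpins the standard drift-plus-penalty technique [27]: in each slot, the controller can greedily minimize the right-hand side of (30) plus the π-weighted operation cost. A minimal per-slot sketch follows, where actions, cost, and energy_use are assumed interfaces rather than the paper's definitions.

```python
def drift_plus_penalty_action(b, actions, cost, energy_use, e, pi):
    """Pick the action minimizing the Lyapunov drift bound plus pi * cost.

    b          : current battery level b(t)
    actions    : iterable of feasible offloading / resource decisions
    cost       : cost(a) -> operation cost c(t) under action a
    energy_use : energy_use(a) -> energy consumption g(t) under action a
    e          : harvested energy e(t) observed in this slot
    pi         : drift-penalty tradeoff weight
    """
    def objective(a):
        g = energy_use(a)
        drift_bound = 0.5 * (e ** 2 + g ** 2) - b * (g - e)  # RHS of (30)
        return drift_bound + pi * cost(a)
    return min(actions, key=objective)
```

A larger π trades a larger battery-queue drift for a lower time-average operation cost, which is the usual Lyapunov tradeoff.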


Cite this article

Qiao, G., Leng, S. & Zhang, Y. Online Learning and Optimization for Computation Offloading in D2D Edge Computing and Networks. Mobile Netw Appl 27, 1111–1122 (2022). https://doi.org/10.1007/s11036-018-1176-y
