Mobile Edge Computing Against Smart Attacks with Deep Reinforcement Learning in Cognitive MIMO IoT Systems


Abstract

In wireless Internet of Things (IoT) systems, multi-input multi-output (MIMO) and cognitive radio (CR) techniques are often integrated into the mobile edge computing (MEC) architecture to improve spectrum efficiency and transmission reliability. However, such a CR-based MIMO IoT system is exposed to a variety of smart attacks from the wireless environment, and even the MEC servers themselves are vulnerable to these attacks. In this paper, we investigate a secure communication problem in a cognitive MIMO IoT system comprising a primary user (PU), a secondary user (SU), a smart attacker, and several MEC servers. The goal of our system design is to optimize the utility of the SU in terms of both efficiency and security. In the CR scenario, the SU chooses an idle MEC server not occupied by the PU, allocates a proper offloading rate for its computation tasks, and offloads those tasks with an appropriate transmit power, while the attacker selects one type of smart attack. We then propose two deep reinforcement learning based resource allocation strategies that find the utility-maximizing policy without channel state information (CSI): the Dyna architecture and Prioritized Sweeping based Edge Server Selection (DPESS) strategy, and the Deep Q-network based Edge Server Selection (DESS) strategy. In particular, the convergence of the DESS scheme is significantly accelerated by training a convolutional neural network (CNN) with experience replay and stochastic gradient descent (SGD). In addition, we derive the Nash equilibrium of the modeled MEC game against smart attacks and the conditions for its existence under both schemes. Compared with the traditional Q-learning algorithm, the proposed DPESS and DESS schemes improve the average utility and secrecy capacity of the SU. Numerical simulations verify the superior efficiency and security of our proposals, as well as the faster convergence of the DESS strategy.
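To make the ingredients of the DESS strategy concrete, the following minimal sketch shows the generic pattern of deep Q-learning with experience replay and SGD for edge-server selection. This is not the authors' implementation: the state, action space, environment, and reward below are hypothetical placeholders, the paper's CNN is replaced here by a small fully connected network, and the target network used in standard DQN is omitted for brevity.

```python
"""Minimal DQN-style sketch of edge-server selection (hypothetical, not the
authors' DESS code): the SU picks a (server, power-level) pair, stores the
transition in a replay buffer, and updates a Q-network by SGD."""
import random
from collections import deque

import torch
import torch.nn as nn

N_SERVERS, N_POWER_LEVELS = 4, 5           # assumed action space: server x power
N_ACTIONS = N_SERVERS * N_POWER_LEVELS
STATE_DIM = 8                              # assumed observation size

class QNet(nn.Module):
    """Small MLP stand-in for the CNN described in the paper."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, s):
        return self.net(s)

def toy_env_step(state, action):
    """Hypothetical environment: a real system would compute the SU's utility
    from channel gains, offloading cost, and the attacker's action; here both
    next state and reward are random placeholders."""
    return torch.randn(STATE_DIM), torch.randn(()).item()

qnet = QNet()
optimizer = torch.optim.SGD(qnet.parameters(), lr=1e-2)  # SGD, as in the paper
replay = deque(maxlen=10_000)                            # experience replay
gamma, eps, batch_size = 0.95, 0.1, 32

state = torch.randn(STATE_DIM)
for step in range(1000):
    # Epsilon-greedy selection over flattened (server, power-level) actions.
    if random.random() < eps:
        action = random.randrange(N_ACTIONS)
    else:
        with torch.no_grad():
            action = int(qnet(state).argmax())
    next_state, reward = toy_env_step(state, action)
    replay.append((state, action, reward, next_state))
    state = next_state

    if len(replay) >= batch_size:
        batch = random.sample(list(replay), batch_size)
        s = torch.stack([b[0] for b in batch])
        a = torch.tensor([b[1] for b in batch])
        r = torch.tensor([b[2] for b in batch])
        s2 = torch.stack([b[3] for b in batch])
        with torch.no_grad():
            target = r + gamma * qnet(s2).max(dim=1).values  # TD target
        q = qnet(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = nn.functional.mse_loss(q, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Sampling minibatches uniformly from the replay buffer breaks the temporal correlation of consecutive transitions, which is one reason experience replay typically stabilizes and speeds up training relative to plain Q-learning.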



Author information

Correspondence to Xiang Chen.

Additional information


The work is supported in part by the Science, Technology and Innovation Commission of Shenzhen Municipality (No. JCYJ20170816151823313), NSFC (No. U1734209, No. 61501527), the State Key Project of the Research and Development Plan (No. 2017YFE0121300-6), the 54th Research Institute of China Electronics Technology Group Corporation (No. B0105), the Guangdong R&D Project in Key Areas (No. 2019B010156004), and the Guangdong Provincial Special Fund for Modern Agriculture Industry Technology Innovation Teams (No. 2019KJ122).

Appendix A: Proof of Theorem 1

Proof

By (2), we have

$$ U(0, p)=xL- \rho e_{0}- \nu t_{0} -xL \frac{\rho p+ \nu}{B\log_{2} \det \left( \mathbf{I}+ \frac{p}{N_{U}} {\mathbf{H}}_{UM} {\mathbf{H}}_{UM}^{T} \right)}. $$
(10)

Let \(A\) and the function \(f(p)\) denote \(xL- \rho e_{0}- \nu t_{0}\) and \(-xL \frac {\log _{2} \det \left (\mathbf {I}+ \frac {p}{N_{U}} {\mathbf {H}}_{UM} {\mathbf {H}}_{UM}^{T} \right )}{\rho p+ \nu }\), respectively. Thus, we obtain the equivalent expression of the utility as

$$ U=A+1/(Bf(p)). $$
(11)

From the expression of f(p), it follows that

$$ \frac{\partial f(p)}{\partial p}|_{p=0} = - \frac{\text{Tr}[{\mathbf{H}}_{UM} {\mathbf{H}}_{UM}^{T}]}{N_{M} xL\nu b\ln 2}<0. $$
(12)
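The trace in the numerator of (12) stems from the standard derivative identity for a log-determinant: for any square matrix \(\mathbf{M}\),

$$ \frac{d}{dp} \ln \det \left( \mathbf{I}+ p\mathbf{M} \right) = \text{Tr}\left[ \left( \mathbf{I}+ p\mathbf{M} \right)^{-1} \mathbf{M} \right], \quad\text{so}\quad \frac{d}{dp} \ln \det \left( \mathbf{I}+ p\mathbf{M} \right) \Big|_{p=0} = \text{Tr}[\mathbf{M}], $$

applied with \(\mathbf{M}= \frac{1}{N_{U}}{\mathbf{H}}_{UM} {\mathbf{H}}_{UM}^{T}\), together with the factor \(1/\ln 2\) from converting \(\log_{2}\) to the natural logarithm.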

By (9), we can obtain

$$ \frac{\partial f(p)}{\partial p}|_{p=\overline{p}} <0. $$
(13)

If condition (8) holds, we have \(\frac {\partial ^{2} f(p)}{\partial p^{2}}>0\). Thus, \(\frac {\partial f(p)}{\partial p}\) increases monotonically with p and remains negative over the whole power range. From (11), we get

$$ \frac{\partial U}{\partial p}=(\frac{-\partial f(p)}{\partial p})/(Bf^{2}(p))>0. $$
(14)

Therefore, the utility of the SU increases with p, so \(U_{U}(0, p)\) attains its maximum at \(U_{U}(0, \overline {p})\). Hence, from (14),

$$ \overline{p} = \arg\max \limits_{\underline{p}\leq p\leq \tilde{p}} U_{U}(0, p) $$
(15)

holds for \(\overline {p}\).

By the definition of (1), we have

$$ U_{A}(0, \overline{p}) = -\log_{2} \det \left( \mathbf{I}+ \frac{\overline{p}}{N_{U}} {\mathbf{H}}_{UM} {\mathbf{H}}_{UM}^{T} \right) -\omega_{0}, $$
(16)
$$ U_{A}(1, \overline{p}) = -\log_{2} \det \left( \mathbf{I}+ \frac{\overline{p}}{N_{U}} {\mathbf{H}}_{UM} {\mathbf{H}}_{UM}^{T} \right) + \log_{2} \det \left( \mathbf{I}+ \frac{p_{S}}{N_{A}} {\mathbf{H}}_{MA} {\mathbf{H}}_{MA}^{T} \right) -\omega_{1}, $$
(17)
$$ U_{A}(2, \overline{p}) = - \log_{2} \det \left( \frac{\mathbf{I}+ \frac{\overline{p}}{N_{U}} {\mathbf{H}}_{UM} {\mathbf{H}}_{UM}^{T}}{\mathbf{I}+ \frac{p_{J}}{N_{A}} {\mathbf{H}}_{MA} {\mathbf{H}}_{MA}^{T}} \right)-\omega_{2}, $$
(18)
$$ \begin{array}{@{}rcl@{}} U_{A}(3, \overline{p}) = &&-\log_{2} \det \left( \mathbf{I}+ \frac{\overline{p}}{N_{U}} {\mathbf{H}}_{UM} {\mathbf{H}}_{UM}^{T} \right)\\ && +\log_{2} \det \left( \mathbf{I}+ \frac{\overline{p}}{N_{U}} {\mathbf{H}}_{UA} {\mathbf{H}}_{UA}^{T} \right)-\omega_{3}. \end{array} $$
(19)

If conditions (5)-(7) hold, we obtain

$$ U_{A}(0, \overline{p})-U_{A}(1, \overline{p}) \geq 0, $$
(20)
$$ U_{A}(0,\overline{p})-U_{A}(2, \overline{p}) \geq 0, $$
(21)
$$ U_{A}(0, \overline{p})-U_{A}(3, \overline{p}) \geq 0. $$
(22)

Therefore, the equality

$$ 0 = \arg \max \limits_{g\in \mathcal{A}_{A}}U_{A}(g,\overline{p}) $$
(23)

holds for g = 0.

Then by (15) and (23), we can conclude that the NE of the MEC system is \((\overline {p},0)\). By (9), we obtain

$$ \begin{array}{@{}rcl@{}} \frac{\partial U_{U}}{\partial x}=1&&-\frac{\rho p +\nu}{B\log_{2} \det (\mathbf{I}+\frac{p}{N_{U}}\mathbf{H}_{UM}\mathbf{H}_{UM}^{T})} \\ &&+ \phi(\rho e_{0} +\nu t_{0}) > \phi (\rho kf^{2} +\nu f^{-1})>0, \end{array} $$
(24)
$$ \frac{\partial^{2} U_{U}}{\partial x^{2}}=0, $$
(25)

which implies that the utility of the SU is linear and increasing in x at the NE. Thus, the utility attains its maximum when x is maximal, i.e.,

$$ \begin{array}{@{}rcl@{}} U_{U}=\overline{x}L(1&&-\frac{\rho \overline{p}+\nu }{B\log_{2} {\det} (\mathbf{I}+ \frac{ \overline{p}}{N_{U}}\mathbf{H}_{UM}\mathbf{H}_{UM}^{T})})\\ &&+L\phi (\rho kf^{2} +\nu f^{-1})(1-\overline{x}). \end{array} $$
(26)

Thus, the SU achieves the optimal utility if conditions (5)-(9) hold. □
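As an informal numerical illustration of the monotonicity claim in (14) (not part of the original proof), one can sample a random channel matrix and evaluate \(U(0, p)\) from (10) over a grid of transmit powers; within a feasible power range satisfying conditions (5)-(9), the utility should increase with \(p\). All parameter values in the sketch below are arbitrary placeholders.

```python
import numpy as np

# Informal sanity check of Eq. (10) / claim (14). All numerical values here
# are arbitrary placeholders; the paper does not specify them.
rng = np.random.default_rng(0)
N_U = N_M = 4                 # assumed antenna counts
x, L = 0.5, 100.0             # assumed offloading rate and task size
rho, nu = 0.1, 0.05           # assumed energy and latency weights
e0, t0 = 1.0, 1.0             # assumed local energy and time costs
B = 10.0                      # assumed bandwidth

H = rng.standard_normal((N_M, N_U))   # random stand-in for H_UM

def utility(p):
    """U(0, p) from Eq. (10)."""
    rate = B * np.log2(np.linalg.det(np.eye(N_M) + (p / N_U) * H @ H.T))
    return x * L - rho * e0 - nu * t0 - x * L * (rho * p + nu) / rate

ps = np.linspace(0.1, 5.0, 50)
us = np.array([utility(p) for p in ps])
print("U(0, p) increasing over tested grid:", bool(np.all(np.diff(us) > 0)))
```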


Cite this article

Ge, S., Lu, B., Xiao, L. et al. Mobile Edge Computing Against Smart Attacks with Deep Reinforcement Learning in Cognitive MIMO IoT Systems. Mobile Netw Appl 25, 1851–1862 (2020). https://doi.org/10.1007/s11036-020-01572-w
