
Reinforcement Learning Approach to Generate Zero-Dynamics Attacks on Control Systems Without State Space Models

  • Conference paper
Computer Security – ESORICS 2023 (ESORICS 2023)

Abstract

Stealthy attacks on control systems are designed to go unnoticed, which makes them a severe threat to critical infrastructure such as power systems, smart grids, and vehicular networks. This paper investigates a subset of these attacks known as zero-dynamics attacks. While previous work on zero-dynamics attacks has highlighted that generating attack signals requires highly accurate knowledge of the system’s state space model, our approach requires no such knowledge. We propose a deep-reinforcement-learning-based attacker that generates attack signals without prior knowledge of the system’s state space. We develop several attackers and detectors iteratively, until neither the attackers nor the detectors improve any further. We also show that the reinforcement-learning-based attacker executes an attack in the same manner as the theoretical attacker described in the previous literature.


References

  1. Alabugin, S.K., Sokolov, A.N.: Applying of generative adversarial networks for anomaly detection in industrial control systems. In: 2020 Global Smart Industry Conference (GloSIC), pp. 199–203. IEEE (2020)


  2. Anderson, B.D.: Output-nulling invariant and controllability subspaces. IFAC Proc. Vol. 8(1), 337–345 (1975)


  3. Aoufi, S., Derhab, A., Guerroumi, M.: Survey of false data injection in smart power grid: attacks, countermeasures and challenges. J. Inf. Secur. Appl. 54, 102518 (2020)


  4. Electricity Information Sharing and Analysis Center (E-ISAC): Analysis of the cyber attack on the Ukrainian power grid: defense use case. Electr. Inf. Shar. Anal. Center (E-ISAC) 388, 1–29 (2016)


  5. Chen, Y., Huang, S., Liu, F., Wang, Z., Sun, X.: Evaluation of reinforcement learning-based false data injection attack to automatic voltage control. IEEE Trans. Smart Grid 10(2), 2158–2169 (2018)


  6. Dash, P., Karimibiuki, M., Pattabiraman, K.: Out of control: stealthy attacks against robotic vehicles protected by control-based techniques. In: Proceedings of the 35th Annual Computer Security Applications Conference, pp. 660–672 (2019)


  7. Deng, R., Xiao, G., Lu, R., Liang, H., Vasilakos, A.V.: False data injection on state estimation in power systems – attacks, impacts, and defense: a survey. IEEE Trans. Industr. Inf. 13(2), 411–423 (2016)


  8. Duan, J., et al.: Deep-reinforcement-learning-based autonomous voltage control for power grid operations. IEEE Trans. Power Syst. 35(1), 814–817 (2019)


  9. Feng, C., Li, T., Zhu, Z., Chana, D.: A deep learning-based framework for conducting stealthy attacks in industrial control systems. arXiv preprint arXiv:1709.06397 (2017)

  10. Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: International Conference on Machine Learning, pp. 1861–1870. PMLR (2018)


  11. Harshbarger, S.: The impact of zero-dynamics stealthy attacks on control systems: stealthy attack success probability and attack prevention (2022). https://krex.k-state.edu/dspace/handle/2097/42853

  12. Harshbarger, S., Hosseinzadehtaher, M., Natarajan, B., Vasserman, E., Shadmand, M., Amariucai, G.: (A little) ignorance is bliss: the effect of imperfect model information on stealthy attacks in power grids. In: 2020 IEEE Kansas Power and Energy Conference (KPEC), pp. 1–6. IEEE (2020)


  13. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)


  14. Johansson, K.H.: The quadruple-tank process: a multivariable laboratory process with an adjustable zero. IEEE Trans. Control Syst. Technol. 8(3), 456–465 (2000)


  15. Kim, S., Park, K.J.: A survey on machine-learning based security design for cyber-physical systems. Appl. Sci. 11(12), 5458 (2021)


  16. Langner, R.: Stuxnet: dissecting a cyberwarfare weapon. IEEE Secur. Privacy 9(3), 49–51 (2011)


  17. Li, C., Qiu, M.: Reinforcement Learning for Cyber-Physical Systems: With Cybersecurity Case Studies. Chapman and Hall/CRC, London (2019)


  18. Liu, Z., Wang, Q., Ye, Y., Tang, Y.: A GAN-based data injection attack method on data-driven strategies in power systems. IEEE Trans. Smart Grid 13(4), 3203–3213 (2022)


  19. Sayghe, A., Zhao, J., Konstantinou, C.: Evasion attacks with adversarial deep learning against power system state estimation. In: 2020 IEEE Power & Energy Society General Meeting (PESGM), pp. 1–5. IEEE (2020)


  20. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)


  21. Sutton, R.S., McAllester, D., Singh, S., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Advances in Neural Information Processing Systems 12 (1999)


  22. Teixeira, A., Pérez, D., Sandberg, H., Johansson, K.H.: Attack models and scenarios for networked control systems. In: Proceedings of the 1st International Conference on High Confidence Networked Systems, pp. 55–64 (2012)


  23. Teixeira, A., Shames, I., Sandberg, H., Johansson, K.H.: Revealing stealthy attacks in control systems. In: 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1806–1813. IEEE (2012)


  24. Teixeira, A., Sou, K.C., Sandberg, H., Johansson, K.H.: Secure control systems: a quantitative risk management approach. IEEE Control Syst. Mag. 35(1), 24–45 (2015)


  25. Zenati, H., Foo, C.S., Lecouat, B., Manek, G., Chandrasekhar, V.R.: Efficient GAN-based anomaly detection. arXiv preprint arXiv:1802.06222 (2018)

  26. Zhang, R., Venkitasubramaniam, P.: Stealthy control signal attacks in linear quadratic Gaussian control systems: detectability reward tradeoff. IEEE Trans. Inf. Forensics Secur. 12(7), 1555–1570 (2017)



Acknowledgement

This publication was made possible by NPRP grant #12C-33905-SP-165 from the Qatar National Research Fund (a member of Qatar Foundation). The findings achieved herein are solely the responsibility of the authors. We appreciate the anonymous reviewers’ valuable suggestions and comments.

Author information


Corresponding author

Correspondence to Bipin Paudel.


Appendices

A Proof for Attack Generation

Since the matrix D is zero, we specialize the solution in [2] as shown below. We need a value of \(F_2\) that satisfies

$$\begin{aligned} \begin{bmatrix} V & B \end{bmatrix} \begin{bmatrix} F_1\\ F_2 \end{bmatrix} V = AV. \end{aligned}$$
(8a)

From this, we can easily obtain

$$\begin{aligned} \begin{bmatrix} F_1\\ F_2 \end{bmatrix} VV^+ = \begin{bmatrix} V & B \end{bmatrix}^+ AVV^+, \end{aligned}$$
(8b)

where \(\begin{bmatrix} V & B \end{bmatrix}^+\) and \(V^+\) denote the Moore-Penrose pseudo-inverses of \(\begin{bmatrix} V & B \end{bmatrix}\) and V, respectively.

If we choose \(\begin{bmatrix} F_1\\ F_2 \end{bmatrix}\) to be the right-hand side of (8b), that is,

$$\begin{aligned} \begin{bmatrix} F_1\\ F_2 \end{bmatrix}= \begin{bmatrix} V & B \end{bmatrix}^+ AVV^+, \end{aligned}$$
(9)

then, using the pseudo-inverse property that \(M^+MM^+=M^+\) for a general matrix M, we see that this choice of \(\begin{bmatrix}F_1\\ F_2\end{bmatrix}\) satisfies (8b). Substituting the same choice into the left-hand side of (8a), and using the property that \(MM^+M=M\) for a general matrix M, we get

$$\begin{aligned} \begin{bmatrix} V & B \end{bmatrix} \begin{bmatrix} F_1\\ F_2 \end{bmatrix}V= \begin{bmatrix} V & B \end{bmatrix} \begin{bmatrix} V & B \end{bmatrix}^+ AV, \end{aligned}$$
(10)

the right-hand side of which is equal to AV whenever \(\begin{bmatrix} V & B\end{bmatrix}\) has linearly independent rows, meaning that our choice of \(\begin{bmatrix}F_1\\ F_2\end{bmatrix}\) satisfies (8a). What remains is to set \(F=-F_2\).
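As a sanity check, this construction is easy to verify numerically. The following sketch is our own illustration, not code from the paper; the dimensions and random matrices are hypothetical, chosen so that \(\begin{bmatrix} V & B \end{bmatrix}\) generically has linearly independent rows. It confirms that the choice in (9) satisfies (8a):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: n states, m inputs, k-dimensional subspace
# spanned by the columns of V, with [V  B] square (n = k + m).
n, m, k = 4, 2, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
V = rng.standard_normal((n, k))

VB = np.hstack([V, B])                  # the block matrix [V  B]
assert np.linalg.matrix_rank(VB) == n   # rows linearly independent

# Choice (9): [F1; F2] = [V  B]^+ A V V^+
F12 = np.linalg.pinv(VB) @ A @ V @ np.linalg.pinv(V)

# Verify (8a): [V  B] [F1; F2] V == A V
print(np.allclose(VB @ F12 @ V, A @ V))  # True

F2 = F12[k:, :]                          # bottom block, shape (m, n)
F = -F2                                  # the attacker's feedback gain
```

With these generic matrices, \(\begin{bmatrix} V & B \end{bmatrix}\) is square and invertible, so \(\begin{bmatrix} V & B \end{bmatrix}\begin{bmatrix} V & B \end{bmatrix}^+\) reduces to the identity, which is exactly the condition used at the end of the proof.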

B Architectures and Hyperparameters

Table 4 shows the architecture of the neural network models associated with the attackers: the number of nodes in each layer, the activation functions in the hidden and output layers, and the learning rate and optimizer used to update each model's parameters. Table 5 lists the hyperparameter settings for these models. The target network of each attacker is updated with a soft update whose coefficient is 0.005. The output of each attacker is bounded by a Tanh activation, so we scale it by a constant, given in the same table, to obtain attack signals of the desired magnitude; both mechanisms are illustrated in the sketch following the table captions below. Table 5 also provides, for each attacker, the values of the parameters a and b from the reward function in Sect. 4.2. Finally, Table 6 gives the architectures and hyperparameters of the LSTM detectors; all detectors share the same architecture and differ only in their learning rates, which are given in the table.

Table 4. Architecture of the neural network models associated with the attackers. Attackers \(ATT_1\) and \(ATT_2\) share one architecture; \(ATT_3\), \(ATT_4\), and \(ATT_5\) share another
Table 5. Hyperparameter settings for the attackers based on the SAC algorithm
Table 6. Architecture of the LSTM-based detectors
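For concreteness, the sketch below illustrates the two mechanisms described above: a soft (Polyak) target-network update with coefficient 0.005 and the rescaling of a Tanh-bounded actor output. The function names and the scaling constant are ours, not the paper's implementation; the per-attacker constants are those in Table 5.

```python
import numpy as np

TAU = 0.005          # soft-update coefficient used for each attacker
ACTION_SCALE = 5.0   # hypothetical constant; per-attacker values are in Table 5

def soft_update(target_params, online_params, tau=TAU):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t
            for w, t in zip(online_params, target_params)]

def scaled_attack_signal(actor_output):
    """The Tanh output lies in [-1, 1]; rescale to the desired magnitude."""
    return ACTION_SCALE * np.tanh(actor_output)
```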

C Attacker Rewards and Performance

Fig. 4. Reward plots of five consecutive attackers

Fig. 5. Cosine similarity as described in Sect. 3.2.
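The metric plotted in Fig. 5 is the standard cosine similarity between two signal vectors, which the paper uses (per Sect. 3.2) to compare the RL-generated attack signal with the theoretical one. A minimal sketch of the computation, with made-up example vectors rather than the paper's data, is:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two attack-signal vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical signals: a value near 1 indicates the RL attacker's
# signal closely matches the direction of the theoretical one.
rl_signal = np.array([0.9, 2.1, 2.9])
theoretical_signal = np.array([1.0, 2.0, 3.0])
print(cosine_similarity(rl_signal, theoretical_signal))
```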


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Paudel, B., Amariucai, G. (2024). Reinforcement Learning Approach to Generate Zero-Dynamics Attacks on Control Systems Without State Space Models. In: Tsudik, G., Conti, M., Liang, K., Smaragdakis, G. (eds) Computer Security – ESORICS 2023. ESORICS 2023. Lecture Notes in Computer Science, vol 14347. Springer, Cham. https://doi.org/10.1007/978-3-031-51482-1_1

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-51482-1_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-51481-4

  • Online ISBN: 978-3-031-51482-1

  • eBook Packages: Computer Science, Computer Science (R0)
