
Reinforcement Learning Approach to Generate Zero-Dynamics Attacks on Control Systems Without State Space Models

  • Conference paper
Computer Security – ESORICS 2023 (ESORICS 2023)

Abstract

Stealthy attacks on control systems are designed to go unnoticed, which makes them a severe threat to critical infrastructure such as power systems, smart grids, and vehicular networks. This paper investigates a subset of these attacks known as zero-dynamics attacks. While previous work on zero-dynamics attacks has highlighted that generating attack signals requires highly accurate knowledge of the system’s state space model, our approach requires no such knowledge. We propose a deep-reinforcement-learning-based attacker that generates attack signals without prior knowledge of the system’s state space. We develop several attackers and detectors iteratively, until neither the attackers nor the detectors improve any further. We also show that the reinforcement-learning-based attacker executes an attack in the same manner as the theoretical attacker described in the previous literature.


References

  1. Alabugin, S.K., Sokolov, A.N.: Applying of generative adversarial networks for anomaly detection in industrial control systems. In: 2020 Global Smart Industry Conference (GloSIC), pp. 199–203. IEEE (2020)


  2. Anderson, B.D.: Output-nulling invariant and controllability subspaces. IFAC Proc. Vol. 8(1), 337–345 (1975)


  3. Aoufi, S., Derhab, A., Guerroumi, M.: Survey of false data injection in smart power grid: attacks, countermeasures and challenges. J. Inf. Secur. Appl. 54, 102518 (2020)


  4. Electricity Information Sharing and Analysis Center (E-ISAC): Analysis of the cyber attack on the Ukrainian power grid: defense use case. Electr. Inf. Shar. Anal. Center (E-ISAC) 388, 1–29 (2016)


  5. Chen, Y., Huang, S., Liu, F., Wang, Z., Sun, X.: Evaluation of reinforcement learning-based false data injection attack to automatic voltage control. IEEE Trans. Smart Grid 10(2), 2158–2169 (2018)


  6. Dash, P., Karimibiuki, M., Pattabiraman, K.: Out of control: stealthy attacks against robotic vehicles protected by control-based techniques. In: Proceedings of the 35th Annual Computer Security Applications Conference, pp. 660–672 (2019)


  7. Deng, R., Xiao, G., Lu, R., Liang, H., Vasilakos, A.V.: False data injection on state estimation in power systems – attacks, impacts, and defense: a survey. IEEE Trans. Industr. Inf. 13(2), 411–423 (2016)


  8. Duan, J., et al.: Deep-reinforcement-learning-based autonomous voltage control for power grid operations. IEEE Trans. Power Syst. 35(1), 814–817 (2019)


  9. Feng, C., Li, T., Zhu, Z., Chana, D.: A deep learning-based framework for conducting stealthy attacks in industrial control systems. arXiv preprint arXiv:1709.06397 (2017)

  10. Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: International Conference on Machine Learning, pp. 1861–1870. PMLR (2018)


  11. Harshbarger, S.: The impact of zero-dynamics stealthy attacks on control systems: stealthy attack success probability and attack prevention (2022). https://krex.k-state.edu/dspace/handle/2097/42853

  12. Harshbarger, S., Hosseinzadehtaher, M., Natarajan, B., Vasserman, E., Shadmand, M., Amariucai, G.: (A little) ignorance is bliss: the effect of imperfect model information on stealthy attacks in power grids. In: 2020 IEEE Kansas Power and Energy Conference (KPEC), pp. 1–6. IEEE (2020)


  13. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)


  14. Johansson, K.H.: The quadruple-tank process: a multivariable laboratory process with an adjustable zero. IEEE Trans. Control Syst. Technol. 8(3), 456–465 (2000)


  15. Kim, S., Park, K.J.: A survey on machine-learning based security design for cyber-physical systems. Appl. Sci. 11(12), 5458 (2021)


  16. Langner, R.: Stuxnet: dissecting a cyberwarfare weapon. IEEE Secur. Privacy 9(3), 49–51 (2011)


  17. Li, C., Qiu, M.: Reinforcement Learning for Cyber-Physical Systems: With Cybersecurity Case Studies. Chapman and Hall/CRC, London (2019)


  18. Liu, Z., Wang, Q., Ye, Y., Tang, Y.: A GAN-based data injection attack method on data-driven strategies in power systems. IEEE Trans. Smart Grid 13(4), 3203–3213 (2022)


  19. Sayghe, A., Zhao, J., Konstantinou, C.: Evasion attacks with adversarial deep learning against power system state estimation. In: 2020 IEEE Power & Energy Society General Meeting (PESGM), pp. 1–5. IEEE (2020)


  20. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)


  21. Sutton, R.S., McAllester, D., Singh, S., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Advances in Neural Information Processing Systems 12 (1999)


  22. Teixeira, A., Pérez, D., Sandberg, H., Johansson, K.H.: Attack models and scenarios for networked control systems. In: Proceedings of the 1st International Conference on High Confidence Networked Systems, pp. 55–64 (2012)


  23. Teixeira, A., Shames, I., Sandberg, H., Johansson, K.H.: Revealing stealthy attacks in control systems. In: 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1806–1813. IEEE (2012)


  24. Teixeira, A., Sou, K.C., Sandberg, H., Johansson, K.H.: Secure control systems: a quantitative risk management approach. IEEE Control Syst. Mag. 35(1), 24–45 (2015)


  25. Zenati, H., Foo, C.S., Lecouat, B., Manek, G., Chandrasekhar, V.R.: Efficient GAN-based anomaly detection. arXiv preprint arXiv:1802.06222 (2018)

  26. Zhang, R., Venkitasubramaniam, P.: Stealthy control signal attacks in linear quadratic Gaussian control systems: detectability reward tradeoff. IEEE Trans. Inf. Forensics Secur. 12(7), 1555–1570 (2017)



Acknowledgement

This publication was made possible by NPRP grant #12C-33905-SP-165 from the Qatar National Research Fund (a member of Qatar Foundation). The findings achieved herein are solely the responsibility of the authors. We appreciate the anonymous reviewers’ valuable suggestions and comments.

Author information


Corresponding author

Correspondence to Bipin Paudel.


Appendices

A Proof for Attack Generation

Since the matrix D is zero, we specialize the solution in [2] as shown below. We need a value of \(F_2\) that satisfies

$$\begin{aligned} \begin{bmatrix} V & B \end{bmatrix} \begin{bmatrix} F_1\\ F_2 \end{bmatrix} V = AV. \end{aligned}$$
(8a)

From this, we can easily obtain

$$\begin{aligned} \begin{bmatrix} F_1\\ F_2 \end{bmatrix} VV^+ = \begin{bmatrix} V & B \end{bmatrix}^+ AVV^+, \end{aligned}$$
(8b)

where \(\begin{bmatrix} V & B \end{bmatrix}^+\) and \(V^+\) denote the Moore-Penrose pseudo-inverses of \(\begin{bmatrix} V & B \end{bmatrix}\) and V, respectively.

If we choose \(\begin{bmatrix} F_1\\ F_2 \end{bmatrix}\) to be the right-hand side of (8b), that is,

$$\begin{aligned} \begin{bmatrix} F_1\\ F_2 \end{bmatrix}= \begin{bmatrix} V & B \end{bmatrix}^+ AVV^+, \end{aligned}$$
(9)

then, using the pseudo-inverse property that \(M^+MM^+=M^+\) for a general matrix M, we see that this choice of \(\begin{bmatrix}F_1\\ F_2\end{bmatrix}\) satisfies (8b). Substituting the same choice into the left-hand side of (8a), and using the property that \(MM^+M=M\) for a general matrix M, we get

$$\begin{aligned} \begin{bmatrix} V & B \end{bmatrix} \begin{bmatrix} F_1\\ F_2 \end{bmatrix}V= \begin{bmatrix} V & B \end{bmatrix} \begin{bmatrix} V & B \end{bmatrix}^+ AV, \end{aligned}$$
(10)

the right-hand side of which is equal to AV whenever \(\begin{bmatrix} V & B\end{bmatrix}\) has linearly independent rows, meaning that our choice of \(\begin{bmatrix}F_1\\ F_2\end{bmatrix}\) satisfies (8a). What remains is to set \(F=-F_2\).
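As a sanity check, this construction is easy to verify numerically. The following sketch is our own illustration, not code from the paper; the dimensions and random matrices are hypothetical, chosen so that \(\begin{bmatrix} V & B \end{bmatrix}\) generically has linearly independent rows. It confirms that the choice in (9) satisfies (8a):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: n states, m inputs, k-dimensional subspace
# spanned by the columns of V, with [V  B] square (n = k + m).
n, m, k = 4, 2, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
V = rng.standard_normal((n, k))

VB = np.hstack([V, B])                  # the block matrix [V  B]
assert np.linalg.matrix_rank(VB) == n   # rows linearly independent

# Choice (9): [F1; F2] = [V  B]^+ A V V^+
F12 = np.linalg.pinv(VB) @ A @ V @ np.linalg.pinv(V)

# Verify (8a): [V  B] [F1; F2] V == A V
print(np.allclose(VB @ F12 @ V, A @ V))  # True

F2 = F12[k:, :]                          # bottom block, shape (m, n)
F = -F2                                  # the attacker's feedback gain
```

With these generic matrices, \(\begin{bmatrix} V & B \end{bmatrix}\) is square and invertible, so \(\begin{bmatrix} V & B \end{bmatrix}\begin{bmatrix} V & B \end{bmatrix}^+\) reduces to the identity, which is exactly the condition used at the end of the proof.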

B Architectures and Hyperparameters

Table 4 shows the architecture of the neural network models associated with the attackers: the number of nodes in each layer, the activation functions in the hidden and output layers, and the learning rate and optimizer used to update each model's parameters. Table 5 lists the hyperparameter settings for these models. The target network of each attacker is updated with a soft update whose coefficient is 0.005. The output of each attacker is bounded by a Tanh activation, so we scale it by a constant, given in the same table, to obtain attack signals of the desired magnitude; both mechanisms are illustrated in the sketch following the table captions below. Table 5 also provides, for each attacker, the values of the parameters a and b from the reward function in Sect. 4.2. Finally, Table 6 gives the architectures and hyperparameters of the LSTM detectors; all detectors share the same architecture and differ only in their learning rates, which are given in the table.

Table 4. Architecture of the neural network models associated with the attackers. Attackers \(ATT_1\) and \(ATT_2\) share one architecture; \(ATT_3\), \(ATT_4\), and \(ATT_5\) share another
Table 5. Hyperparameter settings for the attackers based on the SAC algorithm
Table 6. Architecture of the LSTM-based detectors
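For concreteness, the sketch below illustrates the two mechanisms described above: a soft (Polyak) target-network update with coefficient 0.005 and the rescaling of a Tanh-bounded actor output. The function names and the scaling constant are ours, not the paper's implementation; the per-attacker constants are those in Table 5.

```python
import numpy as np

TAU = 0.005          # soft-update coefficient used for each attacker
ACTION_SCALE = 5.0   # hypothetical constant; per-attacker values are in Table 5

def soft_update(target_params, online_params, tau=TAU):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t
            for w, t in zip(online_params, target_params)]

def scaled_attack_signal(actor_output):
    """The Tanh output lies in [-1, 1]; rescale to the desired magnitude."""
    return ACTION_SCALE * np.tanh(actor_output)
```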

C Attacker Rewards and Performance

Fig. 4. Reward plots of five consecutive attackers

Fig. 5. Cosine similarity as described in Sect. 3.2.
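The metric plotted in Fig. 5 is the standard cosine similarity between two signal vectors, which the paper uses (per Sect. 3.2) to compare the RL-generated attack signal with the theoretical one. A minimal sketch of the computation, with made-up example vectors rather than the paper's data, is:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two attack-signal vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical signals: a value near 1 indicates the RL attacker's
# signal closely matches the direction of the theoretical one.
rl_signal = np.array([0.9, 2.1, 2.9])
theoretical_signal = np.array([1.0, 2.0, 3.0])
print(cosine_similarity(rl_signal, theoretical_signal))
```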


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Paudel, B., Amariucai, G. (2024). Reinforcement Learning Approach to Generate Zero-Dynamics Attacks on Control Systems Without State Space Models. In: Tsudik, G., Conti, M., Liang, K., Smaragdakis, G. (eds) Computer Security – ESORICS 2023. ESORICS 2023. Lecture Notes in Computer Science, vol 14347. Springer, Cham. https://doi.org/10.1007/978-3-031-51482-1_1

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-51482-1_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-51481-4

  • Online ISBN: 978-3-031-51482-1

  • eBook Packages: Computer Science, Computer Science (R0)
