Abstract
Cyber-Physical Systems (CPS) combine interdependent physical and software components and are typically designed by teams of mechanical, electrical, and software engineers. This interdisciplinary nature makes CPS difficult to design with safety guarantees. Incorporating autonomy increases design complexity and, especially, the difficulty of providing safety assurances. Vision-based reinforcement learning is an increasingly popular family of machine learning algorithms that may be used to provide autonomy for CPS. Understanding how visual stimuli trigger various actions is critical for trustworthy autonomy. In this chapter, we introduce reinforcement learning in the context of Microsoft’s AirSim drone simulator. Specifically, we guide the reader through the steps needed to create a drone simulation environment suitable for experimenting with vision-based reinforcement learning. We also explore how existing vision-oriented deep learning analysis methods may be applied toward safety verification in vision-based reinforcement learning applications.
Notes
1. In the notation for this example, the subscripts denote “options” rather than time, which is their usual meaning in this chapter.
2. \(\mathrm{softmax}(x_i \mid \boldsymbol{x}) := \frac{e^{x_i}}{\sum_{j=1}^{\vert \boldsymbol{x} \vert} e^{x_j}}\), where \(\boldsymbol{x}\) is a vector of reals.
3. This is because our RL policy is memoryless. If an RNN or LSTM were used instead of a vanilla CNN, the policy would gain memory, and it would be possible for the drone to learn to bump into cubes that it can no longer see.
4. Softmax of the network’s logits.
5. Normalization is defined as g ← (g − μ(g)) / σ(g), where scalar operations are applied element-wise to the vector.
6. For the cube collection task used in this chapter, we used a simple CNN with a grayscale image as input, so we generate grayscale images for action visualization.
7. In this case, the training set consists of the set of images captured by the drone during its episodes.
8. ReLU stands for rectified linear unit and is defined as ReLU(x) = max(0, x).
9. When using Grad-CAM, a and s are sampled from the policy and the environment while the policy controls the drone, whereas with CMV (presented in the previous subsection), s was generated by the method and a was specified.
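
The definitions in notes 2, 5, and 8 can be written out in a few lines of plain Python. The following is a minimal sketch, not the chapter's own code: the function names are illustrative, the softmax subtracts the maximum entry for numerical stability (a common choice that does not change the result), and the normalization assumes the population standard deviation and a non-constant input vector, since the notes do not specify either.

```python
import math


def softmax(x):
    """Softmax over a list of reals (note 2): e^{x_i} / sum_j e^{x_j}."""
    m = max(x)  # assumption: shift by the max to avoid overflow; result is unchanged
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def normalize(g):
    """Element-wise g <- (g - mean(g)) / std(g) (note 5).

    Assumption: population standard deviation, and std(g) > 0.
    """
    mu = sum(g) / len(g)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in g) / len(g))
    return [(v - mu) / sigma for v in g]


def relu(x):
    """Rectified linear unit (note 8): max(0, x)."""
    return max(0.0, x)


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]       # hypothetical network outputs
    print(softmax(logits))         # action probabilities, as in note 4
    print(normalize([1.0, 2.0, 3.0]))
    print(relu(-3.0), relu(2.5))
```

Applying `softmax` to a network's logits gives the action probabilities mentioned in note 4; the outputs are non-negative and sum to one.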
References
R.N. Charette, This car runs on code. IEEE Spectr. 46(3), 3 (2009)
A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems (2012), pp. 1097–1105
R.J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, in Reinforcement Learning (Springer, Berlin, 1992), pp. 5–32
S. Shah, D. Dey, C. Lovett, A. Kapoor, AirSim: high-fidelity visual and physical simulation for autonomous vehicles, in Field and Service Robotics (2017)
L. van der Maaten, G. Hinton, Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008)
K. Simonyan, A. Vedaldi, A. Zisserman, Deep inside convolutional networks: visualising image classification models and saliency maps. Preprint, arXiv:1312.6034 (2013)
C. Olah, A. Mordvintsev, L. Schubert, Feature visualization. Distill (2017)
R.R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-CAM: visual explanations from deep networks via gradient-based localization, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017), pp. 618–626
Copyright information
© 2018 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Luo, J., Green, S., Feghali, P., Legrady, G., Koç, Ç.K. (2018). Reinforcement Learning and Trustworthy Autonomy. In: Koç, Ç.K. (eds) Cyber-Physical Systems Security. Springer, Cham. https://doi.org/10.1007/978-3-319-98935-8_10
DOI: https://doi.org/10.1007/978-3-319-98935-8_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98934-1
Online ISBN: 978-3-319-98935-8
eBook Packages: Computer Science, Computer Science (R0)