Reinforcement Learning and Trustworthy Autonomy

Chapter in Cyber-Physical Systems Security

Abstract

Cyber-Physical Systems (CPS) possess physical and software interdependence and are typically designed by teams of mechanical, electrical, and software engineers. The interdisciplinary nature of CPS makes them difficult to design with safety guarantees. When autonomy is incorporated, design complexity and, especially, the difficulty of providing safety assurances are increased. Vision-based reinforcement learning is an increasingly popular family of machine learning algorithms that may be used to provide autonomy for CPS. Understanding how visual stimuli trigger various actions is critical for trustworthy autonomy. In this chapter we introduce reinforcement learning in the context of Microsoft’s AirSim drone simulator. Specifically, we guide the reader through the necessary steps for creating a drone simulation environment suitable for experimenting with vision-based reinforcement learning. We also explore how existing vision-oriented deep learning analysis methods may be applied toward safety verification in vision-based reinforcement learning applications.
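As a taste of the environment setup the chapter walks through, the sketch below connects to a running AirSim instance over its official Python API, takes off, and captures a single camera frame of the kind a vision-based policy consumes. This is a minimal sketch, assuming the airsim Python package and a local multirotor simulation; it is not the chapter's exact code, and the grayscale conversion simply mirrors the single-channel CNN input described in the notes below.

```python
import numpy as np
import airsim  # pip install airsim; assumes the simulator is already running

# Connect to the local AirSim instance and take API control of the drone.
client = airsim.MultirotorClient()
client.confirmConnection()
client.enableApiControl(True)
client.armDisarm(True)
client.takeoffAsync().join()

# Request one uncompressed scene image from the front camera ("0").
response = client.simGetImages([
    airsim.ImageRequest("0", airsim.ImageType.Scene, False, False)
])[0]

# Decode the raw bytes into an H x W x 3 array, then convert to grayscale,
# matching the single-channel input used by the chapter's simple CNN.
rgb = np.frombuffer(response.image_data_uint8, dtype=np.uint8)
rgb = rgb.reshape(response.height, response.width, 3)
gray = rgb.mean(axis=2).astype(np.uint8)
```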


Notes

  1. In the notation for this example, subscripts denote “options,” unlike elsewhere in this chapter, where subscripts denote time.

  2. \(\mathrm{softmax}(x_i \mid \boldsymbol{x}) := \frac{e^{x_i}}{\sum_{j=1}^{\vert \boldsymbol{x} \vert} e^{x_j}}\), where \(\boldsymbol{x}\) is a vector of reals.
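As an aside (ours, not the chapter's), a numerically stable implementation subtracts the maximum element before exponentiating, which leaves the result unchanged:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    # Subtracting max(x) prevents overflow in exp() and does not change
    # the output, since the factor e^{-max(x)} cancels in the ratio.
    z = np.exp(x - np.max(x))
    return z / z.sum()
```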

  3. This is because our RL policy is memoryless. If an RNN or LSTM were used instead of a vanilla CNN, the policy would gain memory, and the drone could learn to bump into cubes it can no longer see.

  4. Softmax of the network’s logits.

  5. Normalization is defined as \(g \leftarrow (g - \mu(g)) / \sigma(g)\), where the scalar operations are applied element-wise to the vector.
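A minimal NumPy sketch of this element-wise normalization; the small epsilon guarding against a zero standard deviation is our addition, not part of the chapter's definition:

```python
import numpy as np

def normalize(g: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # g <- (g - mu(g)) / sigma(g), applied element-wise; eps avoids
    # division by zero when every element of g is identical.
    return (g - g.mean()) / (g.std() + eps)
```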

  6. For the cube collection task used in this chapter, we used a simple CNN with a grayscale image as input, so we generate grayscale images for action visualization.

  7. In this case, the training set consists of the images captured by the drone during its episodes.

  8. ReLU stands for rectified linear unit and is defined as \(\mathrm{ReLU}(x) = \max(0, x)\).
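In NumPy this is a one-line element-wise operation:

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    # ReLU(x) = max(0, x), applied element-wise to the array.
    return np.maximum(0.0, x)
```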

  9. When using Grad-CAM, a and s are sampled from the policy and the environment while the policy controls the drone, whereas with CMV (presented in the previous subsection) s is generated by the method and a is specified.
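For readers who want to reproduce such maps, the sketch below shows one common way to compute Grad-CAM [8] for a chosen action in PyTorch. It is a hedged illustration rather than the chapter's implementation: model (a CNN policy returning action logits), conv_layer (its last convolutional layer), image (a CHW tensor), and action (an integer index) are assumed inputs.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, conv_layer, image, action):
    """Coarse localization map for one action, following Grad-CAM [8]."""
    activations, gradients = [], []

    # Hooks capture the conv layer's feature maps and their gradients.
    h1 = conv_layer.register_forward_hook(
        lambda mod, inp, out: activations.append(out))
    h2 = conv_layer.register_full_backward_hook(
        lambda mod, gin, gout: gradients.append(gout[0]))

    logits = model(image.unsqueeze(0))   # shape: (1, num_actions)
    logits[0, action].backward()         # gradient of the chosen action logit
    h1.remove(); h2.remove()

    # Global-average-pool the gradients to weight each feature map,
    # then keep only positive contributions (ReLU).
    weights = gradients[0].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * activations[0]).sum(dim=1, keepdim=True))

    # Upsample to the input resolution and rescale to [0, 1].
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                        align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```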

References

  1. R.N. Charette, This car runs on code. IEEE Spectr. 46(3), 3 (2009)

  2. A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems (2012), pp. 1097–1105

  3. R.J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, in Reinforcement Learning (Springer, Berlin, 1992), pp. 5–32

  4. S. Shah, D. Dey, C. Lovett, A. Kapoor, AirSim: high-fidelity visual and physical simulation for autonomous vehicles, in Field and Service Robotics (2017)

  5. L. van der Maaten, G. Hinton, Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008)

  6. K. Simonyan, A. Vedaldi, A. Zisserman, Deep inside convolutional networks: visualising image classification models and saliency maps. Preprint, arXiv:1312.6034 (2013)

  7. C. Olah, A. Mordvintsev, L. Schubert, Feature visualization, in Distill (2017)

  8. R.R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-CAM: visual explanations from deep networks via gradient-based localization, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017), pp. 618–626

Author information

Correspondence to Jieliang Luo.


Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Luo, J., Green, S., Feghali, P., Legrady, G., Koç, Ç.K. (2018). Reinforcement Learning and Trustworthy Autonomy. In: Koç, Ç.K. (eds) Cyber-Physical Systems Security. Springer, Cham. https://doi.org/10.1007/978-3-319-98935-8_10

  • DOI: https://doi.org/10.1007/978-3-319-98935-8_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-98934-1

  • Online ISBN: 978-3-319-98935-8

  • eBook Packages: Computer Science, Computer Science (R0)
