Reinforcement Learning and Trustworthy Autonomy

Chapter in Cyber-Physical Systems Security

Abstract

Cyber-Physical Systems (CPS) possess physical and software interdependence and are typically designed by teams of mechanical, electrical, and software engineers. The interdisciplinary nature of CPS makes them difficult to design with safety guarantees. When autonomy is incorporated, design complexity and, especially, the difficulty of providing safety assurances are increased. Vision-based reinforcement learning is an increasingly popular family of machine learning algorithms that may be used to provide autonomy for CPS. Understanding how visual stimuli trigger various actions is critical for trustworthy autonomy. In this chapter we introduce reinforcement learning in the context of Microsoft’s AirSim drone simulator. Specifically, we guide the reader through the necessary steps for creating a drone simulation environment suitable for experimenting with vision-based reinforcement learning. We also explore how existing vision-oriented deep learning analysis methods may be applied toward safety verification in vision-based reinforcement learning applications.
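As a taste of the environment setup the chapter walks through, the sketch below connects to a running AirSim instance over its official Python API, takes off, and captures a single camera frame of the kind a vision-based policy consumes. This is a minimal sketch, assuming the airsim Python package and a local multirotor simulation; it is not the chapter's exact code, and the grayscale conversion simply mirrors the single-channel CNN input described in the notes below.

```python
import numpy as np
import airsim  # pip install airsim; assumes the simulator is already running

# Connect to the local AirSim instance and take API control of the drone.
client = airsim.MultirotorClient()
client.confirmConnection()
client.enableApiControl(True)
client.armDisarm(True)
client.takeoffAsync().join()

# Request one uncompressed scene image from the front camera ("0").
response = client.simGetImages([
    airsim.ImageRequest("0", airsim.ImageType.Scene, False, False)
])[0]

# Decode the raw bytes into an H x W x 3 array, then convert to grayscale,
# matching the single-channel input used by the chapter's simple CNN.
rgb = np.frombuffer(response.image_data_uint8, dtype=np.uint8)
rgb = rgb.reshape(response.height, response.width, 3)
gray = rgb.mean(axis=2).astype(np.uint8)
```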


Notes

  1. In the notation for this example, subscripts denote “options,” unlike elsewhere in this chapter, where subscripts denote time.

  2. \(\mathrm{softmax}(x_i \mid \boldsymbol{x}) := \frac{e^{x_i}}{\sum_{j=1}^{\vert \boldsymbol{x} \vert} e^{x_j}}\), where \(\boldsymbol{x}\) is a vector of reals.
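As an aside (ours, not the chapter's), a numerically stable implementation subtracts the maximum element before exponentiating, which leaves the result unchanged:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    # Subtracting max(x) prevents overflow in exp() and does not change
    # the output, since the factor e^{-max(x)} cancels in the ratio.
    z = np.exp(x - np.max(x))
    return z / z.sum()
```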

  3. This is because our RL policy is memoryless. If an RNN or LSTM were used instead of a vanilla CNN, the policy would gain memory, and the drone could learn to bump into cubes it can no longer see.

  4. Softmax of the network’s logits.

  5. Normalization is defined as \(g \leftarrow (g - \mu(g)) / \sigma(g)\), where the scalar operations are applied element-wise to the vector.
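A minimal NumPy sketch of this element-wise normalization; the small epsilon guarding against a zero standard deviation is our addition, not part of the chapter's definition:

```python
import numpy as np

def normalize(g: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # g <- (g - mu(g)) / sigma(g), applied element-wise; eps avoids
    # division by zero when every element of g is identical.
    return (g - g.mean()) / (g.std() + eps)
```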

  6. For the cube collection task used in this chapter, we used a simple CNN with a grayscale image as input, so we generate grayscale images for action visualization.

  7. In this case, the training set consists of the images captured by the drone during its episodes.

  8. ReLU stands for rectified linear unit and is defined as \(\mathrm{ReLU}(x) = \max(0, x)\).
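In NumPy this is a one-line element-wise operation:

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    # ReLU(x) = max(0, x), applied element-wise to the array.
    return np.maximum(0.0, x)
```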

  9. When using Grad-CAM, a and s are sampled from the policy and the environment while the policy controls the drone, whereas with CMV (presented in the previous subsection) s is generated by the method and a is specified.
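For readers who want to reproduce such maps, the sketch below shows one common way to compute Grad-CAM [8] for a chosen action in PyTorch. It is a hedged illustration rather than the chapter's implementation: model (a CNN policy returning action logits), conv_layer (its last convolutional layer), image (a CHW tensor), and action (an integer index) are assumed inputs.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, conv_layer, image, action):
    """Coarse localization map for one action, following Grad-CAM [8]."""
    activations, gradients = [], []

    # Hooks capture the conv layer's feature maps and their gradients.
    h1 = conv_layer.register_forward_hook(
        lambda mod, inp, out: activations.append(out))
    h2 = conv_layer.register_full_backward_hook(
        lambda mod, gin, gout: gradients.append(gout[0]))

    logits = model(image.unsqueeze(0))   # shape: (1, num_actions)
    logits[0, action].backward()         # gradient of the chosen action logit
    h1.remove(); h2.remove()

    # Global-average-pool the gradients to weight each feature map,
    # then keep only positive contributions (ReLU).
    weights = gradients[0].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * activations[0]).sum(dim=1, keepdim=True))

    # Upsample to the input resolution and rescale to [0, 1].
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                        align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```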

References

  1. R.N. Charette, This car runs on code. IEEE Spectr. 46(3), 3 (2009)

  2. A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems (2012), pp. 1097–1105

  3. R.J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, in Reinforcement Learning (Springer, Berlin, 1992), pp. 5–32

  4. S. Shah, D. Dey, C. Lovett, A. Kapoor, AirSim: high-fidelity visual and physical simulation for autonomous vehicles, in Field and Service Robotics (2017)

  5. L. van der Maaten, G. Hinton, Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008)

  6. K. Simonyan, A. Vedaldi, A. Zisserman, Deep inside convolutional networks: visualising image classification models and saliency maps. Preprint, arXiv:1312.6034 (2013)

  7. C. Olah, A. Mordvintsev, L. Schubert, Feature visualization, in Distill (2017)

  8. R.R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-CAM: visual explanations from deep networks via gradient-based localization, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017), pp. 618–626

Author information

Correspondence to Jieliang Luo.


Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Luo, J., Green, S., Feghali, P., Legrady, G., Koç, Ç.K. (2018). Reinforcement Learning and Trustworthy Autonomy. In: Koç, Ç.K. (eds) Cyber-Physical Systems Security. Springer, Cham. https://doi.org/10.1007/978-3-319-98935-8_10

  • DOI: https://doi.org/10.1007/978-3-319-98935-8_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-98934-1

  • Online ISBN: 978-3-319-98935-8

  • eBook Packages: Computer Science, Computer Science (R0)
