
Safer Reinforcement Learning for Agents in Industrial Grid-Warehousing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12566)

Abstract

In mission-critical, real-world environments, there is typically little tolerance for failure, which makes interaction with learning algorithms particularly challenging. In such settings, current state-of-the-art reinforcement learning algorithms struggle to learn optimal control policies safely. The resulting loss of control can lead to equipment breakage and even personal injury.

Model-based reinforcement learning, on the other hand, aims to encode the environment's transition dynamics into a predictive model. The transition dynamics define the mapping from one state to the next, conditioned on an action. With a sufficiently accurate predictive model, an agent should be able to learn optimal behavior, even in real environments.
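As an informal illustration of these transition dynamics (the notation below is ours and not taken from the paper), a predictive model with parameters theta approximates the conditional distribution over next states, and the policy is then optimized against the model's predictions:

    % True dynamics: the next state is drawn conditioned on the current state and action.
    s_{t+1} \sim p(s_{t+1} \mid s_t, a_t)
    % Learned predictive model with parameters \theta approximating those dynamics.
    \hat{s}_{t+1} \sim p_{\theta}(\hat{s}_{t+1} \mid s_t, a_t)
    % The policy \pi is improved against the return predicted by the learned model.
    \max_{\pi} \; \mathbb{E}_{a_t \sim \pi,\ \hat{s}_{t+1} \sim p_{\theta}} \Big[ \textstyle\sum_{t \ge 0} \gamma^{t} \, r(\hat{s}_t, a_t) \Big]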

At the heart of the paper is the introduction of the Safer Dreaming Variational Autoencoder, a novel method that combines a constrained criterion, external knowledge, and risk-directed exploration to learn good policies. Using model-based reinforcement learning, we show that the proposed method performs comparably to model-free algorithms under safety constraints, with a substantially lower risk of entering catastrophic states.
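To make the interplay of a constrained criterion and risk-directed exploration concrete, the following is a minimal, illustrative Python sketch of risk-aware action selection. It is our own simplification under stated assumptions, not the authors' implementation: the callables estimate_return and estimate_risk, the weight lambda_risk, and the risk_budget threshold are hypothetical names introduced only for illustration.

    import numpy as np

    def select_action(state, actions, estimate_return, estimate_risk,
                      lambda_risk=1.0, risk_budget=0.1):
        """Risk-directed, constrained action selection (illustrative sketch only).

        estimate_return(state, a): predicted return for action a, e.g. from a learned model.
        estimate_risk(state, a):   predicted probability of reaching a catastrophic state.
        """
        scores = []
        for a in actions:
            risk = estimate_risk(state, a)
            if risk > risk_budget:
                # Constrained criterion: actions exceeding the risk budget are ruled out.
                scores.append(-np.inf)
            else:
                # Risk-directed scoring: trade predicted return off against predicted risk.
                scores.append(estimate_return(state, a) - lambda_risk * risk)
        return actions[int(np.argmax(scores))]

In this sketch, the hard risk budget plays the role of a constrained criterion, while the risk penalty biases exploration away from states the model predicts to be dangerous.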


Notes

  1. A Mixture Density Network combined with a Recurrent Neural Network (see the illustrative sketch following these notes).

  2. All models in this paper are neural network approximations.

  3. https://github.com/cair/deep-warehouse.

  4. https://github.com/cair/deep-rts.

  5. The agent in DeepRTS walks to the nearest gold deposit when its inventory is empty. When the inventory is full, it returns the gold to the base.
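As a minimal sketch of the world-model component referred to in note 1, the following PyTorch snippet shows one common way to combine a Mixture Density Network with a recurrent network; the class name, layer sizes, and default hyperparameters are our own assumptions and are not taken from the paper.

    import torch
    import torch.nn as nn

    class MDNRNN(nn.Module):
        """Minimal MDN-RNN sketch: an LSTM over latent states and actions feeding a
        mixture-density head that predicts a Gaussian mixture over the next latent state."""

        def __init__(self, latent_dim=32, action_dim=4, hidden_dim=256, n_mixtures=5):
            super().__init__()
            self.latent_dim = latent_dim
            self.n_mixtures = n_mixtures
            self.rnn = nn.LSTM(latent_dim + action_dim, hidden_dim, batch_first=True)
            # Per mixture component: one weight logit plus a mean and log-std per latent dim.
            self.head = nn.Linear(hidden_dim, n_mixtures * (1 + 2 * latent_dim))

        def forward(self, z, a, hidden=None):
            # z: (batch, time, latent_dim), a: (batch, time, action_dim)
            out, hidden = self.rnn(torch.cat([z, a], dim=-1), hidden)
            params = self.head(out)
            logits, mu, log_sigma = torch.split(
                params,
                [self.n_mixtures,
                 self.n_mixtures * self.latent_dim,
                 self.n_mixtures * self.latent_dim],
                dim=-1,
            )
            mu = mu.view(*mu.shape[:-1], self.n_mixtures, self.latent_dim)
            log_sigma = log_sigma.view(*log_sigma.shape[:-1], self.n_mixtures, self.latent_dim)
            return logits, mu, log_sigma, hidden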


Author information


Corresponding author

Correspondence to Per-Arne Andersen.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Andersen, PA., Goodwin, M., Granmo, OC. (2020). Safer Reinforcement Learning for Agents in Industrial Grid-Warehousing. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2020. Lecture Notes in Computer Science, vol. 12566. Springer, Cham. https://doi.org/10.1007/978-3-030-64580-9_14


  • DOI: https://doi.org/10.1007/978-3-030-64580-9_14


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-64579-3

  • Online ISBN: 978-3-030-64580-9

  • eBook Packages: Computer Science, Computer Science (R0)
