
Safer Reinforcement Learning for Agents in Industrial Grid-Warehousing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12566)

Abstract

In mission-critical, real-world environments, there is typically little tolerance for failure, which makes interaction with learning algorithms particularly challenging. In such settings, current state-of-the-art reinforcement learning algorithms struggle to learn optimal control policies safely. The resulting loss of control can lead to equipment breakage and even personal injury.

Model-based reinforcement learning, on the other hand, aims to encode the environment's transition dynamics into a predictive model. The transition dynamics define the mapping from one state to the next, conditioned on an action. With a sufficiently accurate predictive model, an agent should be able to learn optimal behavior, even in real environments.
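As an informal illustration of these transition dynamics (the notation below is ours and not taken from the paper), a predictive model with parameters theta approximates the conditional distribution over next states, and the policy is then optimized against the model's predictions:

    % True dynamics: the next state is drawn conditioned on the current state and action.
    s_{t+1} \sim p(s_{t+1} \mid s_t, a_t)
    % Learned predictive model with parameters \theta approximating those dynamics.
    \hat{s}_{t+1} \sim p_{\theta}(\hat{s}_{t+1} \mid s_t, a_t)
    % The policy \pi is improved against the return predicted by the learned model.
    \max_{\pi} \; \mathbb{E}_{a_t \sim \pi,\ \hat{s}_{t+1} \sim p_{\theta}} \Big[ \textstyle\sum_{t \ge 0} \gamma^{t} \, r(\hat{s}_t, a_t) \Big]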

At the heart of the paper is the introduction of the Safer Dreaming Variational Autoencoder, a novel method that combines a constrained criterion, external knowledge, and risk-directed exploration to learn good policies. Using model-based reinforcement learning, we show that the proposed method performs comparably to model-free algorithms under safety constraints, with a substantially lower risk of entering catastrophic states.
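To make the interplay of a constrained criterion and risk-directed exploration concrete, the following is a minimal, illustrative Python sketch of risk-aware action selection. It is our own simplification under stated assumptions, not the authors' implementation: the callables estimate_return and estimate_risk, the weight lambda_risk, and the risk_budget threshold are hypothetical names introduced only for illustration.

    import numpy as np

    def select_action(state, actions, estimate_return, estimate_risk,
                      lambda_risk=1.0, risk_budget=0.1):
        """Risk-directed, constrained action selection (illustrative sketch only).

        estimate_return(state, a): predicted return for action a, e.g. from a learned model.
        estimate_risk(state, a):   predicted probability of reaching a catastrophic state.
        """
        scores = []
        for a in actions:
            risk = estimate_risk(state, a)
            if risk > risk_budget:
                # Constrained criterion: actions exceeding the risk budget are ruled out.
                scores.append(-np.inf)
            else:
                # Risk-directed scoring: trade predicted return off against predicted risk.
                scores.append(estimate_return(state, a) - lambda_risk * risk)
        return actions[int(np.argmax(scores))]

In this sketch, the hard risk budget plays the role of a constrained criterion, while the risk penalty biases exploration away from states the model predicts to be dangerous.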


Notes

  1. A Mixture Density Network combined with a Recurrent Neural Network (see the illustrative sketch following these notes).

  2. All models in this paper are neural network approximations.

  3. https://github.com/cair/deep-warehouse.

  4. https://github.com/cair/deep-rts.

  5. The agent in DeepRTS walks to the nearest gold deposit when its inventory is empty. When the inventory is full, it returns the gold to the base.
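As a minimal sketch of the world-model component referred to in note 1, the following PyTorch snippet shows one common way to combine a Mixture Density Network with a recurrent network; the class name, layer sizes, and default hyperparameters are our own assumptions and are not taken from the paper.

    import torch
    import torch.nn as nn

    class MDNRNN(nn.Module):
        """Minimal MDN-RNN sketch: an LSTM over latent states and actions feeding a
        mixture-density head that predicts a Gaussian mixture over the next latent state."""

        def __init__(self, latent_dim=32, action_dim=4, hidden_dim=256, n_mixtures=5):
            super().__init__()
            self.latent_dim = latent_dim
            self.n_mixtures = n_mixtures
            self.rnn = nn.LSTM(latent_dim + action_dim, hidden_dim, batch_first=True)
            # Per mixture component: one weight logit plus a mean and log-std per latent dim.
            self.head = nn.Linear(hidden_dim, n_mixtures * (1 + 2 * latent_dim))

        def forward(self, z, a, hidden=None):
            # z: (batch, time, latent_dim), a: (batch, time, action_dim)
            out, hidden = self.rnn(torch.cat([z, a], dim=-1), hidden)
            params = self.head(out)
            logits, mu, log_sigma = torch.split(
                params,
                [self.n_mixtures,
                 self.n_mixtures * self.latent_dim,
                 self.n_mixtures * self.latent_dim],
                dim=-1,
            )
            mu = mu.view(*mu.shape[:-1], self.n_mixtures, self.latent_dim)
            log_sigma = log_sigma.view(*log_sigma.shape[:-1], self.n_mixtures, self.latent_dim)
            return logits, mu, log_sigma, hidden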


Author information


Corresponding author

Correspondence to Per-Arne Andersen.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Andersen, PA., Goodwin, M., Granmo, OC. (2020). Safer Reinforcement Learning for Agents in Industrial Grid-Warehousing. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2020. Lecture Notes in Computer Science, vol. 12566. Springer, Cham. https://doi.org/10.1007/978-3-030-64580-9_14


  • DOI: https://doi.org/10.1007/978-3-030-64580-9_14


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-64579-3

  • Online ISBN: 978-3-030-64580-9

  • eBook Packages: Computer Science, Computer Science (R0)
