Safe Exploration for Active Learning with Gaussian Processes

Schreiter, Jens; Nguyen-Tuong, Duy; Eberts, Mona; Bischoff, Bastian; Markert, Heiner; Toussaint, Marc

doi:10.1007/978-3-319-23461-8_9

Conference paper
First Online: 01 January 2015

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9286)

Abstract

This paper considers the problem of safe exploration in the active learning context. Safe exploration is especially important when sampling data from technical and industrial systems, e.g. combustion engines and gas turbines, where critical and unsafe measurements need to be avoided. The objective is to learn data-based regression models of such systems from a limited budget of measured, i.e. labelled, points while ensuring that critical regions of the considered systems are avoided during measurement. We propose an approach for learning such models and exploring new data regions based on Gaussian processes (GPs). In particular, we employ a problem-specific GP classifier to identify safe and unsafe regions, while using a differential entropy criterion to explore relevant data regions. We give a theoretical analysis of the proposed algorithm, including an upper bound on the probability of failure. To demonstrate the efficiency and robustness of our safe exploration scheme in the active learning setting, we test the approach on a policy exploration task for the inverse pendulum hold-up problem.
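The selection rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the problem-specific GP classifier is replaced here by plain GP regression on +1/-1 safety labels, and the kernel, its hyperparameters, the confidence factor `nu`, and all function names are assumptions made for illustration. Because the differential entropy of a Gaussian, 0.5 * log(2 * pi * e * sigma^2), increases monotonically with the variance, the entropy criterion reduces to picking the candidate with the largest predictive variance among those judged safe.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.5, signal_var=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_query, noise_var=1e-4):
    """Posterior mean and variance of a zero-mean GP at x_query."""
    k_tt = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_qt @ alpha
    v = np.linalg.solve(chol, k_qt.T)
    var = rbf_kernel(x_query, x_query).diagonal() - np.sum(v * v, axis=0)
    return mean, np.maximum(var, 1e-12)  # clip tiny negative round-off


def differential_entropy(var):
    """Differential entropy of a univariate Gaussian with variance var."""
    return 0.5 * np.log(2.0 * np.pi * np.e * var)


def select_next_query(x_train, safety_labels, candidates, nu=2.0):
    """Return the safe candidate with maximal predictive entropy, or None.

    Assumption: a candidate counts as safe when the lower confidence
    bound of the safety GP (mean - nu * std) is non-negative.
    """
    mean_s, var_s = gp_posterior(x_train, safety_labels, candidates)
    safe = mean_s - nu * np.sqrt(var_s) >= 0.0
    if not np.any(safe):
        return None
    # The predictive variance of the regression GP depends only on the
    # inputs, so placeholder targets are sufficient here.
    _, var_f = gp_posterior(x_train, np.zeros(len(x_train)), candidates)
    entropy = differential_entropy(var_f)
    entropy[~safe] = -np.inf  # never pick a candidate judged unsafe
    return float(candidates[np.argmax(entropy)])


if __name__ == "__main__":
    x_train = np.array([-0.1, 0.0, 0.1])   # points measured so far
    labels = np.ones(3)                    # all observed as safe (+1)
    candidates = np.linspace(-2.0, 2.0, 81)
    print(select_next_query(x_train, labels, candidates))
```

With a zero-mean safety GP, candidates far from any observed safe point have a safety mean near zero and a large variance, so they fail the confidence-bound test. Exploration therefore expands outward from the known safe region rather than jumping into unexplored territory.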

Author information

Authors and Affiliations

Robert Bosch GmbH, 70442, Stuttgart, Germany
Jens Schreiter, Duy Nguyen-Tuong, Mona Eberts, Bastian Bischoff & Heiner Markert
University of Stuttgart, MLR Laboratory, 70569, Stuttgart, Germany
Marc Toussaint

Corresponding author

Correspondence to Jens Schreiter.

Editor information

Editors and Affiliations

Huawei Noah’s Ark Lab, Shatin, Hong Kong
Albert Bifet
Siemens AG Corporate Technology, München, Germany
Michael May
IBM Research Brazil, Rio de Janeiro, Brazil
Bianca Zadrozny
Universitat Politècnica de Catalunya, Barcelona, Spain
Ricard Gavalda
Università di Pisa, Pisa, Italy
Dino Pedreschi
Eurecat / Yahoo Labs, Barcelona, Spain
Francesco Bonchi
University of Porto - INESC TEC, Porto, Portugal
Jaime Cardoso
Otto-von-Guericke University, Magdeburg, Germany
Myra Spiliopoulou

About this paper

Cite this paper

Schreiter, J., Nguyen-Tuong, D., Eberts, M., Bischoff, B., Markert, H., Toussaint, M. (2015). Safe Exploration for Active Learning with Gaussian Processes. In: Bifet, A., et al. Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2015. Lecture Notes in Computer Science, vol 9286. Springer, Cham. https://doi.org/10.1007/978-3-319-23461-8_9

DOI: https://doi.org/10.1007/978-3-319-23461-8_9
Published: 29 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23460-1
Online ISBN: 978-3-319-23461-8
eBook Packages: Computer Science, Computer Science (R0)
