Abstract
This paper considers the problem of safe exploration in active learning. Safe exploration is especially important when sampling data from technical and industrial systems, e.g. combustion engines and gas turbines, where critical and unsafe measurements must be avoided. The objective is to learn data-based regression models of such systems from a limited budget of measured, i.e. labelled, points while ensuring that critical regions of the systems are avoided during measurement. We propose an approach based on Gaussian processes (GPs) for learning such models and exploring new data regions. In particular, we employ a problem-specific GP classifier to identify safe and unsafe regions, and use a differential entropy criterion to explore relevant data regions. We present a theoretical analysis of the proposed algorithm and provide an upper bound on the probability of failure. To demonstrate the efficiency and robustness of our safe exploration scheme in the active learning setting, we test the approach on a policy exploration task for the inverted pendulum hold-up problem.
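The selection rule outlined in the abstract (explore where the model is most uncertain, but only within regions judged safe) can be illustrated with a small sketch. This is not the algorithm of the paper. It assumes a plain GP regression on safety labels in {-1, +1} as a stand-in for the problem-specific GP classifier, and it treats a candidate as safe when the posterior mean of that safety function minus nu standard deviations is non-negative. The function names, the squared-exponential kernel, the hyperparameters, the threshold nu and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.5):
    """Unit-variance squared-exponential kernel between a (n, d) and b (m, d)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(x_train, y_train, x_query, noise=1e-2):
    """Posterior mean and variance of a zero-mean GP regression model."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_qt @ alpha
    v = np.linalg.solve(chol, k_qt.T)
    # Prior variance is 1.0 because the kernel has unit variance.
    var = 1.0 - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 1e-12)


def differential_entropy(var):
    """Differential entropy of a univariate Gaussian with variance var."""
    return 0.5 * np.log(2.0 * np.pi * np.e * var)


def select_next_point(x_train, y_train, safety_labels, candidates, nu=2.0):
    """Index of the safe candidate with maximal predictive entropy, or None."""
    # Regression GP: how uncertain is the model at each candidate?
    _, var_f = gp_posterior(x_train, y_train, candidates)
    # Safety GP (assumed stand-in for a GP classifier): labels are +1 / -1.
    mean_g, var_g = gp_posterior(x_train, safety_labels, candidates)
    safe = mean_g - nu * np.sqrt(var_g) >= 0.0
    if not np.any(safe):
        return None
    entropy = np.where(safe, differential_entropy(var_f), -np.inf)
    return int(np.argmax(entropy))


# Toy 1-D example: three safe starting measurements around x = 0.
x_train = np.array([[-0.2], [0.0], [0.2]])
y_train = np.sin(3.0 * x_train[:, 0])
safety_labels = np.ones(3)
candidates = np.linspace(-2.0, 2.0, 81).reshape(-1, 1)

idx = select_next_point(x_train, y_train, safety_labels, candidates)
print("next query:", candidates[idx, 0])
```

Because the entropy of a Gaussian grows monotonically with its variance, maximising it is equivalent to picking the most uncertain candidate. In this sketch the safety constraint fails far from the initial measurements, where the safety GP reverts to its zero prior mean, so queries are restricted to the neighbourhood of points already known to be safe.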
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Schreiter, J., Nguyen-Tuong, D., Eberts, M., Bischoff, B., Markert, H., Toussaint, M. (2015). Safe Exploration for Active Learning with Gaussian Processes. In: Bifet, A., et al. Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2015. Lecture Notes in Computer Science, vol 9286. Springer, Cham. https://doi.org/10.1007/978-3-319-23461-8_9
DOI: https://doi.org/10.1007/978-3-319-23461-8_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23460-1
Online ISBN: 978-3-319-23461-8
eBook Packages: Computer Science, Computer Science (R0)