Abstract
Imitation Learning (IL) algorithms mimic expert behavior in order to perform specific tasks, but it remains unclear how well these strategies fare when learning from sub-optimal data (faulty experts). Studying how IL approaches learn under different degrees of observation quality can benefit tasks such as optimizing data collection, producing interpretable models, reducing bias when using sub-optimal experts, and more. In this work, we therefore provide extensive experiments to verify how different IL agents perform under various degrees of expert optimality. We experiment with four IL algorithms, three that learn in a self-supervised manner and one, Behavioral Cloning (BC), that uses ground-truth action labels, in four different environments (tasks), and we compare them using optimal and sub-optimal experts. To assess each agent, we compute two metrics: Performance and Average Episodic Reward. Our experiments show that the self-supervised IL approaches are relatively resilient to sub-optimal experts, which is not the case for the supervised approach. We also observe that sub-optimal experts are sometimes beneficial, since they seem to act as a form of regularization that prevents the models from overfitting the data. You can replicate our experiments with the code in our GitHub repository (https://github.com/NathanGavenski/How-resilient-IL-methods-are).
N. Gavenski and J. Monteiro contributed equally to this work.
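The abstract does not spell out how the two evaluation metrics are computed. The sketch below shows one plausible implementation, assuming the classic Gym rollout API and the normalization commonly used in related work, where Performance scores a random policy at 0 and the expert at 1; the function names and the `policy(obs)` interface are illustrative, not taken from the paper's repository.

```python
import numpy as np


def average_episodic_reward(env, policy, episodes=100):
    """Average Episodic Reward (AER): mean undiscounted return over rollouts.

    Assumes the classic Gym API, where env.step returns
    (obs, reward, done, info) and env.reset returns obs.
    """
    returns = []
    for _ in range(episodes):
        obs, done, episode_return = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            episode_return += reward
        returns.append(episode_return)
    return float(np.mean(returns))


def performance(agent_aer, random_aer, expert_aer):
    """Performance: AER normalized so a random policy scores 0, the expert 1."""
    return (agent_aer - random_aer) / (expert_aer - random_aer)
```

For example, on a task such as CartPole-v1 one would estimate `random_aer` with a uniformly random policy and `expert_aer` with the expert that produced the demonstrations, then report `performance(agent_aer, random_aer, expert_aer)` for each degree of expert optimality.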
Acknowledgment
This work was carried out in cooperation with Banco Cooperativo Sicredi.