Unsupervised Task Clustering for Multi-task Reinforcement Learning

  • Conference paper
Machine Learning and Knowledge Discovery in Databases. Research Track (ECML PKDD 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12975)

Abstract

Meta-learning, transfer learning and multi-task learning have recently laid a path towards more generally applicable reinforcement learning agents that are not limited to a single task. However, most existing approaches implicitly assume a uniform similarity between tasks. We argue that this assumption is limiting in settings where the relationship between tasks is unknown a priori. In this work, we propose a general approach to automatically cluster similar tasks together during training. Our method, inspired by the expectation-maximization algorithm, finds clusters of related tasks and uses them to improve sample complexity. We achieve this by designing an agent with multiple policies. In the expectation step, we evaluate the performance of the policies on all tasks and assign each task to the best-performing policy. In the maximization step, each policy trains by sampling tasks from its assigned set. This method is intuitive, simple to implement and orthogonal to other multi-task learning algorithms. We show the generality of our approach by evaluating on simple discrete and continuous control tasks, as well as complex bipedal walker tasks and Atari games. Results show improvements in sample complexity as well as broader applicability compared to other approaches.
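The alternation described in the abstract can be summarized in a short sketch. The following Python snippet is an illustrative outline only, not the authors' implementation: the callables `evaluate_return` and `train_policy`, the policy objects and the loop sizes are hypothetical placeholders standing in for an evaluation rollout and an off-the-shelf reinforcement learning update.

```python
import random
from collections import defaultdict

def em_task_clustering(tasks, policies, evaluate_return, train_policy,
                       num_iterations=100, steps_per_iteration=1000):
    """Alternate an expectation step (task-to-policy assignment) with a
    maximization step (training each policy on its assigned tasks)."""
    assignments = {}
    for _ in range(num_iterations):
        # E-step: evaluate every policy on every task and assign each task
        # to the policy that currently achieves the highest return on it.
        assignments = defaultdict(list)
        for task in tasks:
            returns = [evaluate_return(policy, task) for policy in policies]
            best = max(range(len(policies)), key=lambda i: returns[i])
            assignments[best].append(task)

        # M-step: each policy trains on tasks sampled from its assigned set.
        # A policy without any assigned task falls back to sampling from all
        # tasks so that it keeps exploring (cf. footnote 1 below).
        for i, policy in enumerate(policies):
            pool = assignments[i] if assignments[i] else list(tasks)
            for _ in range(steps_per_iteration):
                task = random.choice(pool)
                train_policy(policy, task)
    return dict(assignments)
```

The fallback for policies with no assigned task mirrors footnote 1 below: such clusters sample from all tasks purely for exploration, while they remain empty when the clustering objective is evaluated.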

J. Ackermann and O. Richter contributed equally. Johannes Ackermann did his part while visiting ETH Zurich.


Notes

  1.

    Note that assigning all tasks to a cluster that was not assigned any task is done only for exploration; in the evaluation of our objective, these clusters remain empty.

  2.

    The Appendix and implementations of all our experiments can be found at https://github.com/JohannesAck/EMTaskClustering.


Author information


Corresponding authors

Correspondence to Johannes Ackermann or Oliver Richter.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Ackermann, J., Richter, O., Wattenhofer, R. (2021). Unsupervised Task Clustering for Multi-task Reinforcement Learning. In: Oliver, N., Pérez-Cruz, F., Kramer, S., Read, J., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol. 12975. Springer, Cham. https://doi.org/10.1007/978-3-030-86486-6_14


  • DOI: https://doi.org/10.1007/978-3-030-86486-6_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86485-9

  • Online ISBN: 978-3-030-86486-6

  • eBook Packages: Computer Science, Computer Science (R0)
