Unsupervised Task Clustering for Multi-task Reinforcement Learning

  • Conference paper
Machine Learning and Knowledge Discovery in Databases. Research Track (ECML PKDD 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12975)

Abstract

Meta-learning, transfer learning and multi-task learning have recently laid a path towards more generally applicable reinforcement learning agents that are not limited to a single task. However, most existing approaches implicitly assume a uniform similarity between tasks. We argue that this assumption is limiting in settings where the relationship between tasks is unknown a priori. In this work, we propose a general approach to automatically cluster similar tasks together during training. Our method, inspired by the expectation-maximization algorithm, finds clusters of related tasks and uses them to improve sample complexity. We achieve this by designing an agent with multiple policies. In the expectation step, we evaluate the performance of the policies on all tasks and assign each task to the best-performing policy. In the maximization step, each policy trains by sampling tasks from its assigned set. This method is intuitive, simple to implement and orthogonal to other multi-task learning algorithms. We show the generality of our approach by evaluating on simple discrete and continuous control tasks, as well as complex bipedal walker tasks and Atari games. Results show improvements in sample complexity as well as broader applicability compared to other approaches.
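The alternation described in the abstract can be summarized in a short sketch. The following Python snippet is an illustrative outline only, not the authors' implementation: the callables `evaluate_return` and `train_policy`, the policy objects and the loop sizes are hypothetical placeholders standing in for an evaluation rollout and an off-the-shelf reinforcement learning update.

```python
import random
from collections import defaultdict

def em_task_clustering(tasks, policies, evaluate_return, train_policy,
                       num_iterations=100, steps_per_iteration=1000):
    """Alternate an expectation step (task-to-policy assignment) with a
    maximization step (training each policy on its assigned tasks)."""
    assignments = {}
    for _ in range(num_iterations):
        # E-step: evaluate every policy on every task and assign each task
        # to the policy that currently achieves the highest return on it.
        assignments = defaultdict(list)
        for task in tasks:
            returns = [evaluate_return(policy, task) for policy in policies]
            best = max(range(len(policies)), key=lambda i: returns[i])
            assignments[best].append(task)

        # M-step: each policy trains on tasks sampled from its assigned set.
        # A policy without any assigned task falls back to sampling from all
        # tasks so that it keeps exploring (cf. footnote 1 below).
        for i, policy in enumerate(policies):
            pool = assignments[i] if assignments[i] else list(tasks)
            for _ in range(steps_per_iteration):
                task = random.choice(pool)
                train_policy(policy, task)
    return dict(assignments)
```

The fallback for policies with no assigned task mirrors footnote 1 below: such clusters sample from all tasks purely for exploration, while they remain empty when the clustering objective is evaluated.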

J. Ackermann and O. Richter contributed equally. Johannes Ackermann did his part while visiting ETH Zurich.


Notes

  1.

    Note that assigning all tasks to a cluster that was not assigned any task is done only for exploration; in the evaluation of our objective, these clusters remain empty.

  2.

    The Appendix and implementations of all our experiments can be found at https://github.com/JohannesAck/EMTaskClustering.


Author information


Corresponding authors

Correspondence to Johannes Ackermann or Oliver Richter.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Ackermann, J., Richter, O., Wattenhofer, R. (2021). Unsupervised Task Clustering for Multi-task Reinforcement Learning. In: Oliver, N., Pérez-Cruz, F., Kramer, S., Read, J., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol. 12975. Springer, Cham. https://doi.org/10.1007/978-3-030-86486-6_14


  • DOI: https://doi.org/10.1007/978-3-030-86486-6_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86485-9

  • Online ISBN: 978-3-030-86486-6

  • eBook Packages: Computer Science, Computer Science (R0)
