MA-TREX: Multi-agent Trajectory-Ranked Reward Extrapolation via Inverse Reinforcement Learning

  • Conference paper
  • First Online:
Knowledge Science, Engineering and Management (KSEM 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12275)

Abstract

Trajectory-ranked reward extrapolation (T-REX) provides a general framework for inferring users’ intentions from sub-optimal demonstrations. However, it becomes inflexible in multi-agent scenarios because of the high complexity introduced by rational behaviors such as cooperation and communication. In this paper, we propose a novel Multi-Agent Trajectory-ranked Reward EXtrapolation framework (MA-TREX), which adopts inverse reinforcement learning to infer demonstrators’ cooperative intentions in environments with high-dimensional state-action spaces. Specifically, to reduce the dependence on demonstrators, MA-TREX uses self-generated demonstrations to iteratively extrapolate the reward function. Moreover, a knowledge transfer method is adopted in the iteration process, so that the self-generated data required in subsequent rounds is only one third of the initial demonstrations. Experimental results on several multi-agent collaborative tasks demonstrate that MA-TREX can effectively surpass the demonstrators and quickly and stably reach the same level of reward as the ground truth.
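The ranking objective at the core of T-REX-style reward extrapolation can be made concrete with a short sketch. The snippet below is a minimal illustration, not the authors' implementation: it assumes a PyTorch reward network over the joint (all-agent concatenated) state-action vector, trained with the Bradley-Terry pairwise ranking loss so that higher-ranked trajectories receive higher predicted returns. All names and dimensions (`MultiAgentRewardNet`, `trex_ranking_loss`, `joint_dim`) are hypothetical.

```python
# Minimal sketch of a T-REX-style ranking update for a multi-agent
# reward model. Hypothetical names and dimensions; not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiAgentRewardNet(nn.Module):
    """Maps a joint state-action vector (all agents concatenated)
    to a scalar per-step reward."""
    def __init__(self, joint_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def predicted_return(self, traj: torch.Tensor) -> torch.Tensor:
        # traj: (T, joint_dim); sum per-step rewards into a return.
        return self.net(traj).sum()

def trex_ranking_loss(reward_net: MultiAgentRewardNet,
                      worse: torch.Tensor,
                      better: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: push P(better > worse) toward 1, where
    P = exp(R_better) / (exp(R_worse) + exp(R_better))."""
    returns = torch.stack([reward_net.predicted_return(worse),
                           reward_net.predicted_return(better)])
    # Cross-entropy with "better" (index 1) as the target class.
    return F.cross_entropy(returns.unsqueeze(0), torch.tensor([1]))

# Toy usage with random data: 2 agents x 10 features, 50-step trajectories.
net = MultiAgentRewardNet(joint_dim=20)
optim = torch.optim.Adam(net.parameters(), lr=1e-4)
worse, better = torch.randn(50, 20), torch.randn(50, 20)
optim.zero_grad()
loss = trex_ranking_loss(net, worse, better)
loss.backward()
optim.step()
```

In the iterative scheme the abstract describes, the learned reward would then train a multi-agent policy, the policy's self-generated rollouts would be ranked and used to refit the reward network, and reusing the previous round's weights (the knowledge-transfer step) is what allows later rounds to get by with roughly a third of the initial demonstrations.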

This work was supported in part by the National Natural Science Foundation of China under grants 61876069, 61572226, and 61902145, and by the Jilin Province Key Scientific and Technological Research and Development Project under grants 20180201067GX and 20180201044GX.



Author information


Corresponding author

Correspondence to Bo Yang.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Huang, S., Yang, B., Chen, H., Piao, H., Sun, Z., Chang, Y. (2020). MA-TREX: Multi-agent Trajectory-Ranked Reward Extrapolation via Inverse Reinforcement Learning. In: Li, G., Shen, H., Yuan, Y., Wang, X., Liu, H., Zhao, X. (eds) Knowledge Science, Engineering and Management. KSEM 2020. Lecture Notes in Computer Science, vol 12275. Springer, Cham. https://doi.org/10.1007/978-3-030-55393-7_1


  • DOI: https://doi.org/10.1007/978-3-030-55393-7_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-55392-0

  • Online ISBN: 978-3-030-55393-7

  • eBook Packages: Computer Science, Computer Science (R0)
