Limited Information Opponent Modeling

  • Conference paper
  • First Online:
Artificial Neural Networks and Machine Learning – ICANN 2023 (ICANN 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14261))

Abstract

The goal of opponent modeling is to model the opponent's policy so as to maximize the reward of the main agent. Most prior works fail to handle scenarios in which opponent information is limited. To this end, we propose a Limited Information Opponent Modeling (LIOM) approach that extracts opponent policy representations across episodes using only self-observations. LIOM introduces a novel policy-based data augmentation method that extracts opponent policy representations offline via contrastive learning and incorporates them as additional inputs for training a general response policy. During online testing, LIOM responds dynamically to opponent policies by extracting opponent policy representations from recent historical trajectory data and combining them with the general policy. Moreover, LIOM ensures a lower bound on expected rewards by balancing conservatism and exploitation. Experimental results demonstrate that LIOM accurately extracts opponent policy representations even when the opponent's information is limited, generalizes to some extent to unseen policies, and outperforms existing opponent modeling algorithms.
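As a rough illustration of the mechanism the abstract describes, the sketch below encodes windows of self-observations into embeddings trained with an InfoNCE-style contrastive objective (segments collected against the same opponent policy are treated as positive pairs) and conditions a response policy on the resulting embedding. The module names, shapes, pairing scheme, and the specific choice of InfoNCE are assumptions made for this example and are not details taken from the paper.

```python
# Minimal sketch (not the authors' code) of contrastive opponent-embedding
# learning from self-observation trajectories. All names, shapes, and the
# "same opponent policy => positive pair" scheme are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrajectoryEncoder(nn.Module):
    """Encodes a fixed-length window of self-observations into a policy embedding."""
    def __init__(self, obs_dim, embed_dim=64, hidden=128):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, embed_dim)

    def forward(self, traj):               # traj: (batch, T, obs_dim)
        _, h = self.gru(traj)              # h: (1, batch, hidden)
        z = self.head(h.squeeze(0))        # (batch, embed_dim)
        return F.normalize(z, dim=-1)      # unit-norm embeddings for cosine similarity

def info_nce(anchor_z, positive_z, temperature=0.1):
    """InfoNCE loss: each anchor's positive is a segment drawn from the same
    opponent policy; the other positives in the batch act as negatives."""
    logits = anchor_z @ positive_z.t() / temperature           # (B, B) similarity matrix
    labels = torch.arange(anchor_z.size(0), device=anchor_z.device)
    return F.cross_entropy(logits, labels)

class ConditionedPolicy(nn.Module):
    """Response policy that takes the current observation plus the opponent embedding."""
    def __init__(self, obs_dim, embed_dim, act_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs, z):
        return self.net(torch.cat([obs, z], dim=-1))            # action logits

# Toy usage: two trajectory segments per opponent policy form a positive pair.
obs_dim, act_dim, T, B = 8, 4, 20, 16
enc = TrajectoryEncoder(obs_dim)
pol = ConditionedPolicy(obs_dim, 64, act_dim)
seg_a = torch.randn(B, T, obs_dim)   # segment 1 for each opponent policy
seg_b = torch.randn(B, T, obs_dim)   # segment 2 from the same opponent policy
loss = info_nce(enc(seg_a), enc(seg_b))
loss.backward()

# At test time, embed the most recent window of self-observations and
# condition the general response policy on it.
with torch.no_grad():
    z = enc(seg_a[:1])
action_logits = pol(torch.randn(1, obs_dim), z)
```

The offline/online split in the abstract would correspond to training the encoder and conditioned policy on a pre-collected pool of episodes, then at test time re-encoding only the most recent trajectory window; how the paper balances conservatism and exploitation is not reflected in this toy example.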

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant No. 62106172), the “New Generation of Artificial Intelligence” Major Project of Science & Technology 2030 (Grant No. 2022ZD0116402), and the Science and Technology on Information Systems Engineering Laboratory (Grant Nos. WDZC20235250409 and WDZC20205250407).

Author information

Corresponding author

Correspondence to Jianye Hao.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Lv, Y., Yu, Y., Zheng, Y., Hao, J., Wen, Y., Yu, Y. (2023). Limited Information Opponent Modeling. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14261. Springer, Cham. https://doi.org/10.1007/978-3-031-44198-1_42

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-44198-1_42

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44197-4

  • Online ISBN: 978-3-031-44198-1

  • eBook Packages: Computer Science, Computer Science (R0)
