Abstract
The goal of opponent modeling is to model the opponent's policy so as to maximize the reward of the main agent. Most prior works fail to handle scenarios where opponent information is limited. To this end, we propose a Limited Information Opponent Modeling (LIOM) approach that extracts opponent policy representations across episodes using only self-observations. LIOM introduces a novel policy-based data augmentation method that extracts opponent policy representations offline via contrastive learning and incorporates them as additional inputs for training a general response policy. During online testing, LIOM responds dynamically to opponent policies by extracting opponent policy representations from recent historical trajectory data and combining them with the general policy. Moreover, LIOM guarantees a lower bound on expected rewards by balancing conservatism and exploitation. Experimental results demonstrate that LIOM accurately extracts opponent policy representations even when opponent information is limited, generalizes to some degree to unseen policies, and outperforms existing opponent modeling algorithms.
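The abstract's offline stage contrasts trajectory embeddings so that episodes against the same opponent policy attract each other and episodes against different opponents repel. As a rough illustration of that idea (not the paper's actual architecture), the sketch below uses a hypothetical linear encoder over self-observation trajectories and the standard InfoNCE contrastive loss, where row `i` of `positives` is the positive pair for anchor `i` and all other rows serve as negatives:

```python
import numpy as np

def encode(trajectory, W):
    """Toy encoder: project each self-observation, mean-pool over time,
    L2-normalize. `trajectory` is (T, obs_dim); `W` is (obs_dim, embed_dim)."""
    h = (trajectory @ W).mean(axis=0)
    return h / (np.linalg.norm(h) + 1e-8)

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE over L2-normalized embeddings (N, d): anchor i should be most
    similar to positives[i]; every other row acts as a negative."""
    logits = anchors @ positives.T / temperature          # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                    # -log p(positive)

# Illustrative usage: embeddings of episodes, paired by opponent identity.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
episodes = [rng.normal(size=(5, 4)) for _ in range(6)]
emb = np.stack([encode(t, W) for t in episodes])
aligned = info_nce_loss(emb, emb)          # each anchor paired with itself
mismatched = info_nce_loss(emb, emb[::-1]) # deliberately wrong pairing
```

In practice the encoder would be a trained network and positives would come from the paper's policy-based data augmentation; the key property shown here is that correctly matched pairs yield a lower loss than mismatched ones.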
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Grant No. 62106172), the "New Generation of Artificial Intelligence" Major Project of Science & Technology 2030 (Grant No. 2022ZD0116402), and the Science and Technology on Information Systems Engineering Laboratory (Grant Nos. WDZC20235250409 and WDZC20205250407).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Lv, Y., Yu, Y., Zheng, Y., Hao, J., Wen, Y., Yu, Y. (2023). Limited Information Opponent Modeling. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14261. Springer, Cham. https://doi.org/10.1007/978-3-031-44198-1_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44197-4
Online ISBN: 978-3-031-44198-1