Abstract
When Artificial Intelligence is applied to Doudizhu, a traditional Chinese poker game, many challenging issues arise from the characteristics of the game. One of these is sparse reward: valid feedback is obtained only at the end of a round. To address this, this paper proposes a deep neural framework, DQN-IRL (Deep Q-Network with Inverse Reinforcement Learning), which tackles the sparse reward problem in Doudizhu. The experimental results demonstrate the effectiveness of DQN-IRL in terms of winning rate.
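To make the sparse reward issue concrete, the sketch below (Python/PyTorch, not taken from the paper) shows one way an IRL-learned reward network could densify Doudizhu's terminal-only feedback inside a standard DQN temporal-difference update. All names, network shapes, and the state/action encoding sizes (STATE_DIM, N_ACTIONS, QNet, RewardNet) are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 54, 309   # assumed encoding sizes, for illustration only

class QNet(nn.Module):
    """Standard DQN action-value network."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, N_ACTIONS))
    def forward(self, s):
        return self.net(s)

class RewardNet(nn.Module):
    """Hypothetical IRL reward model; its training on expert trajectories is omitted."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, s):
        return self.net(s)

q_net, reward_net = QNet(), RewardNet()
target_net = QNet()
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-4)
gamma = 0.99

def dqn_step(s, a, env_r, s_next, done):
    """One TD update. env_r is zero until the round ends (sparse);
    reward_net supplies a dense intermediate signal."""
    dense_r = env_r + reward_net(s_next).squeeze(-1).detach()
    with torch.no_grad():
        target = dense_r + gamma * target_net(s_next).max(-1).values * (1 - done)
    q = q_net(s).gather(-1, a.unsqueeze(-1)).squeeze(-1)
    loss = nn.functional.mse_loss(q, target)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy usage with random tensors, only to show the shapes involved.
s = torch.randn(32, STATE_DIM); s_next = torch.randn(32, STATE_DIM)
a = torch.randint(0, N_ACTIONS, (32,)); done = torch.zeros(32)
env_r = torch.zeros(32)          # sparse: no feedback until the round ends
dqn_step(s, a, env_r, s_next, done)

In the actual DQN-IRL framework, the dense signal would come from a reward model fitted to expert play via inverse reinforcement learning; the sketch only illustrates how such a signal slots into the Bellman target.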
Code availability
No code or data are available.
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Author information
Authors and Affiliations
Contributions
All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by YK and HS. The code was modified by HS, XW, and YR. The first draft of the manuscript was written by HS, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Ethics Approval
This is an observational study. The XYZ Research Ethics Committee has confirmed that no ethical approval is required.
Consent to Participate
Informed consent was obtained from all individual participants included in the study.
Consent for Publication
The authors affirm that human research participants provided informed consent for publication of the images in the paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kong, Y., Shi, H., Wu, X. et al. Application of DQN-IRL Framework in Doudizhu’s Sparse Reward. Neural Process Lett 55, 9467–9482 (2023). https://doi.org/10.1007/s11063-023-11209-0