Abstract
When Artificial Intelligence is applied to Doudizhu, a traditional Chinese poker game, many challenging issues arise from the characteristics of the game. One of these is sparse reward: valid feedback is obtained only at the end of a round. To address this, this paper proposes a deep neural framework, DQN-IRL (Deep Q-Network with Inverse Reinforcement Learning), which tackles the sparse reward problem in Doudizhu. The experimental results demonstrate the effectiveness of DQN-IRL in terms of winning rate.
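To make the sparse reward issue concrete, the sketch below (Python/PyTorch, not taken from the paper) shows one way an IRL-learned reward network could densify Doudizhu's terminal-only feedback inside a standard DQN temporal-difference update. All names, network shapes, and the state/action encoding sizes (STATE_DIM, N_ACTIONS, QNet, RewardNet) are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 54, 309   # assumed encoding sizes, for illustration only

class QNet(nn.Module):
    """Standard DQN action-value network."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, N_ACTIONS))
    def forward(self, s):
        return self.net(s)

class RewardNet(nn.Module):
    """Hypothetical IRL reward model; its training on expert trajectories is omitted."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, s):
        return self.net(s)

q_net, reward_net = QNet(), RewardNet()
target_net = QNet()
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-4)
gamma = 0.99

def dqn_step(s, a, env_r, s_next, done):
    """One TD update. env_r is zero until the round ends (sparse);
    reward_net supplies a dense intermediate signal."""
    dense_r = env_r + reward_net(s_next).squeeze(-1).detach()
    with torch.no_grad():
        target = dense_r + gamma * target_net(s_next).max(-1).values * (1 - done)
    q = q_net(s).gather(-1, a.unsqueeze(-1)).squeeze(-1)
    loss = nn.functional.mse_loss(q, target)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy usage with random tensors, only to show the shapes involved.
s = torch.randn(32, STATE_DIM); s_next = torch.randn(32, STATE_DIM)
a = torch.randint(0, N_ACTIONS, (32,)); done = torch.zeros(32)
env_r = torch.zeros(32)          # sparse: no feedback until the round ends
dqn_step(s, a, env_r, s_next, done)

In the actual DQN-IRL framework, the dense signal would come from a reward model fitted to expert play via inverse reinforcement learning; the sketch only illustrates how such a signal slots into the Bellman target.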
Code availability
No code or data are available.
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Author information
Authors and Affiliations
Contributions
All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by YK and HS. The code was modified by HS, XW, and YR. The first draft of the manuscript was written by HS, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Ethics Approval
This is an observational study. The XYZ Research Ethics Committee has confirmed that no ethical approval is required.
Consent to Participate
Informed consent was obtained from all individual participants included in the study.
Consent for Publication
The authors affirm that human research participants provided informed consent for publication of the images in the paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kong, Y., Shi, H., Wu, X. et al. Application of DQN-IRL Framework in Doudizhu’s Sparse Reward. Neural Process Lett 55, 9467–9482 (2023). https://doi.org/10.1007/s11063-023-11209-0