A Q-based policy gradient optimization approach for Doudizhu

Applied Intelligence

Abstract

Deep reinforcement learning (DRL) has recently achieved superhuman performance in a variety of games, including Atari, Go, and no-limit Texas hold’em. However, the technique has not been fully explored for Doudizhu, a card game popular in Asia that involves both confrontation and cooperation among multiple players under imperfect information. In this paper we present NV-Dou, a new deep reinforcement learning approach for Doudizhu. It adopts a variant of neural fictitious self-play to approximate the Nash equilibria of the game. The network’s loss functions integrate a Q-based policy gradient (mean actor-critic) with advantage learning and proximal policy optimization, and parametric noise is applied to the fully connected layers. Experimental results show that NV-Dou needs only a few hours of training and achieves nearly state-of-the-art performance compared with well-known open implementations for Doudizhu such as RHCP, CQL, and MCTS.
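To make the combination of ingredients named above concrete, the sketch below shows, in PyTorch, one plausible way to write a noisy fully connected layer (factorized parameter noise in the style of NoisyNet) and a mean-actor-critic policy loss that sums the PPO-style clipped surrogate over all legal actions using the advantage Q(s, a) − V(s). This is a minimal illustration assuming standard definitions of these techniques; the layer sizes, the legal-action masking, the exact weighting of the terms, and the function names are assumptions for exposition and are not taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Fully connected layer with factorized Gaussian parameter noise (NoisyNet-style)."""
    def __init__(self, in_features, out_features, sigma0=0.5):
        super().__init__()
        bound = in_features ** -0.5
        self.mu_w = nn.Parameter(torch.empty(out_features, in_features).uniform_(-bound, bound))
        self.sigma_w = nn.Parameter(torch.full((out_features, in_features), sigma0 * bound))
        self.mu_b = nn.Parameter(torch.empty(out_features).uniform_(-bound, bound))
        self.sigma_b = nn.Parameter(torch.full((out_features,), sigma0 * bound))
        self.in_features, self.out_features = in_features, out_features

    @staticmethod
    def _f(x):
        # Factorized-noise transform f(x) = sign(x) * sqrt(|x|)
        return x.sign() * x.abs().sqrt()

    def forward(self, x):
        eps_in = self._f(torch.randn(self.in_features, device=x.device))
        eps_out = self._f(torch.randn(self.out_features, device=x.device))
        weight = self.mu_w + self.sigma_w * torch.outer(eps_out, eps_in)
        bias = self.mu_b + self.sigma_b * eps_out
        return F.linear(x, weight, bias)


def policy_loss(pi, pi_old, q, legal_mask, clip_eps=0.2):
    # pi, pi_old: (B, A) current / behaviour action probabilities
    # q:          (B, A) critic estimates Q(s, a); legal_mask: (B, A) in {0, 1}
    q = q.detach()
    v = (pi * q * legal_mask).sum(dim=1, keepdim=True)   # V(s) = sum_a pi(a|s) Q(s, a)
    adv = (q - v) * legal_mask                            # advantage A(s, a) = Q - V
    ratio = pi / pi_old.clamp_min(1e-8)                   # PPO probability ratio
    clipped = ratio.clamp(1 - clip_eps, 1 + clip_eps)
    surrogate = torch.min(ratio * adv, clipped * adv)
    # Mean actor-critic: average over all legal actions rather than sampled ones.
    return -(pi_old * surrogate * legal_mask).sum(dim=1).mean()

In a full agent such layers and losses would sit inside the best-response network of the fictitious self-play loop, alongside a separate critic loss and average-policy network; those training details go beyond what the abstract states.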


Notes

  1. https://ninesun.blog.csdn.net/article/details/70787814

  2. http://rlcard.org/games.html#action-abstraction-of-dou-dizhu

  3. https://en.wikipedia.org/wiki/Dou_dizhu

  4. https://github.com/qq456cvb/doudizhu-C

  5. https://github.com/qq456cvb/doudizhu-C

  6. http://rlcard.org/games.html#action-abstraction-of-dou-dizhu

  7. https://github.com/qq-ship/NV-Dou

  8. https://ninesun.blog.csdn.net/article/details/70787814


Acknowledgements

We thank the anonymous reviewers, whose comments greatly improved this work. This work was supported by the National Natural Science Foundation of China under Grants 61976065 and U1836205, the Guizhou Science and Technology Foundation under Grant Qiankehejichu[2020]1Y420, and the Guizhou Science Support Project (No. 2022-259).

Author information

Corresponding author

Correspondence to Yisong Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Table 2 The action categories in Doudizhu
Table 3 Game records played by NV-Dou and RHCP
Table 4 Game records played by RHCP and NV-Dou
Table 5 Game records played by RHCP and NV-Dou

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Cite this article

Yu, X., Wang, Y., Qin, J. et al. A Q-based policy gradient optimization approach for Doudizhu. Appl Intell 53, 15372–15389 (2023). https://doi.org/10.1007/s10489-022-04281-x
