A black-box model for predicting difficulty of word puzzle games: a case study of Wordle

  • Regular paper
Knowledge and Information Systems

Abstract

The popular word-guessing game Wordle has attracted widespread attention since its public release in late 2021. Much of that attention has gone to finding the optimal playing strategy. This article instead proposes a black-box prediction model that predicts the difficulty level of words in the game and uncovers the deeper regularities in the game data. The model achieves high accuracy on new datasets and remains stable on similar tasks. It is divided into two parts: a feature extraction model for the game input content and a rule extraction model for the game output content. From the input content, the research extracts word attributes, including word frequency, letter frequency, part of speech, number of repeated letters, and word meaning score. From the output content, it reduces the seven proportions of players across the different numbers of tries to two indices using the CRITIC method. Finally, it builds a multiple regression model based on gradient boosting decision trees, which reaches a prediction accuracy of 95% for the difficulty level of new words. The black-box prediction model can provide valuable insights for game designers and developers, and the research offers an innovative method for predicting and understanding user behavior in online games, contributing to the broader field of data science. Integrating data-driven methods into the gaming industry opens new possibilities for understanding player interactions and improving game development strategies.
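The CRITIC weighting step named in the abstract can be illustrated with a minimal sketch. It follows the standard CRITIC recipe (min-max normalisation, standard deviation as contrast, one minus Pearson correlation as conflict) and is not the authors' implementation. The abstract does not say how the seven proportions are grouped into two indices, so the sketch only computes one weight per column and a single weighted score per word. The function names `pearson` and `critic_weights` and the numbers in `rows` are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def critic_weights(rows):
    """Return one CRITIC weight per column of `rows` (a list of equal-length lists)."""
    n = len(rows)
    if n < 2:
        raise ValueError("CRITIC needs at least two observations")
    cols = [list(c) for c in zip(*rows)]
    # Min-max normalise each criterion to [0, 1]; constant columns become all zeros.
    norm = []
    for c in cols:
        lo, hi = min(c), max(c)
        span = hi - lo
        norm.append([(v - lo) / span if span else 0.0 for v in c])
    # Contrast intensity: sample standard deviation of each normalised column.
    sigma = []
    for c in norm:
        m = sum(c) / n
        sigma.append(sqrt(sum((v - m) ** 2 for v in c) / (n - 1)))
    # Information content of column j: sigma_j * sum_k (1 - r_jk).
    m_cols = len(norm)
    info = [
        sigma[j] * sum(1 - pearson(norm[j], norm[k]) for k in range(m_cols))
        for j in range(m_cols)
    ]
    total = sum(info)
    if total == 0:
        raise ValueError("all criteria are constant; weights are undefined")
    return [c / total for c in info]


# Made-up example: each row is one word, each column the percentage of players
# finishing in 1, 2, 3, 4, 5, 6 tries or failing. These are NOT data from the paper.
rows = [
    [0, 4, 21, 38, 26, 9, 2],
    [1, 9, 35, 34, 16, 5, 0],
    [0, 2, 11, 24, 32, 24, 7],
    [0, 5, 23, 31, 24, 14, 3],
]
weights = critic_weights(rows)
# One weighted composite score per word, using the raw percentages.
composite = [sum(w * v for w, v in zip(weights, row)) for row in rows]
```

Columns that vary more across words and correlate less with the other columns receive larger weights, which is the property that makes CRITIC useful for compressing several correlated outcome proportions into fewer indices.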

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant Nos. 42271391, 62006214, and 42101439; the Joint Funds of Equipment Pre-Research and Ministry of Education of China under Grant No. 8091B022148; the 14th Five-Year Pre-Research Project of Civil Aerospace of China; and the Hubei Excellent Young and Middle-Aged Science and Technology Innovation Team Plan Project under Grant No. T2021031.

Author information

Contributions

CY wrote the code for the game output model and the main text of the paper. SL wrote the code for the game input model and contributed to the writing. LJ wrote the code for the regression model and contributed to the writing. All authors took part in the review, revised the details of the paper, and helped establish the model. Dai Guangming is responsible for responding to review comments.

Corresponding author

Correspondence to Xiaoyu Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shi, L., Chen, Y., Lin, J. et al. A black-box model for predicting difficulty of word puzzle games: a case study of Wordle. Knowl Inf Syst 66, 1729–1750 (2024). https://doi.org/10.1007/s10115-023-01992-6

  • DOI: https://doi.org/10.1007/s10115-023-01992-6
