A black-box model for predicting difficulty of word puzzle games: a case study of Wordle

Shi, Ling; Chen, Yingke; Lin, Jiaxuan; Chen, Xiaoyu; Dai, Guangming

doi:10.1007/s10115-023-01992-6

Regular paper
Published: 14 October 2023

Volume 66, pages 1729–1750, (2024)

Knowledge and Information Systems

Abstract

The popular word-guessing game Wordle has attracted widespread attention since its surge in popularity in 2022. Much of this attention has focused on finding the optimal solving strategy. In contrast, this article proposes a black-box model that accurately predicts the difficulty level of words in the game, with the aim of uncovering the underlying patterns in the game data. The model achieves high accuracy on new datasets and shows strong stability on similar tasks. It consists of two parts: a feature extraction model for the game's input content and a rule extraction model for its output content. From the input content, it extracts word attributes, including word frequency, letter frequency, part of speech, number of letter repetitions, and a word-meaning score. From the output content, it uses the CRITIC method to reduce the seven proportions of players who finish in different numbers of tries to two indices. Finally, it builds a multiple regression model based on gradient boosting decision trees (GBDT), which predicts the difficulty level of new words with 95% accuracy. We believe the black-box prediction model can provide valuable insights for game designers and developers. The research also offers an innovative method for predicting and understanding user behavior in online games, contributing to the broader field of data science, and shows how data-driven methods can deepen the understanding of player interactions and improve game development strategies.
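The abstract names two concrete steps that a short sketch can illustrate: counting letter repetitions in a word, and weighting criteria with the CRITIC method (CRiteria Importance Through Intercriteria Correlation). The Python below is a minimal sketch under stated assumptions, not the authors' implementation. `letter_repeats` is a hypothetical reading of "times of letter repetitions" (word length minus the number of distinct letters). `critic_weights` follows the usual CRITIC recipe: min-max normalise each criterion, measure its contrast by the standard deviation, measure its conflict with the other criteria by summing 1 - r over the Pearson correlations, and take weights proportional to contrast times conflict. Every criterion is assumed to be of the "larger is better" type, and `composite_index` is a hypothetical helper that collapses the criteria into one weighted score. The abstract does not say how the seven try proportions are grouped into the two indices, so that grouping is not modelled here.

```python
from math import sqrt
from statistics import fmean, pstdev


def letter_repeats(word: str) -> int:
    """Hypothetical feature: extra occurrences of repeated letters.

    Assumption: defined as word length minus the number of distinct letters,
    so 'eerie' (5 letters, 3 distinct) has 2 repeats.
    """
    w = word.lower()
    return len(w) - len(set(w))


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def min_max(column: list[float]) -> list[float]:
    """Scale a criterion to [0, 1], treating larger values as better."""
    lo, hi = min(column), max(column)
    if hi == lo:
        raise ValueError("CRITIC needs non-constant criteria")
    return [(x - lo) / (hi - lo) for x in column]


def critic_weights(rows: list[list[float]]) -> list[float]:
    """CRITIC weights for the columns (criteria) of `rows` (one row per item).

    Assumption: every criterion is benefit-type; a cost-type criterion
    would be normalised as (max - x) / (max - min) instead.
    """
    columns = [min_max(list(col)) for col in zip(*rows)]
    info = []
    for col_j in columns:
        contrast = pstdev(col_j)  # spread of the normalised criterion
        # Conflict: sum of (1 - r_jk) over all criteria; r_jj = 1 adds 0.
        conflict = sum(1 - pearson(col_j, col_k) for col_k in columns)
        info.append(contrast * conflict)
    total = sum(info)
    return [c / total for c in info]


def composite_index(rows: list[list[float]], weights: list[float]) -> list[float]:
    """Hypothetical helper: weighted sum of normalised criteria per row."""
    columns = [min_max(list(col)) for col in zip(*rows)]
    return [
        sum(w * col[i] for w, col in zip(weights, columns))
        for i in range(len(rows))
    ]
```

For example, with toy rows `[[1, 1, 4], [2, 2, 1], [3, 3, 3], [4, 4, 2]]` (invented numbers, not Wordle data), the two identical columns end up sharing weight because they carry redundant information, while the third, negatively correlated column receives the largest weight; the weights always sum to 1.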

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant Nos. 42271391, 62006214 and 42101439; the Joint Funds of Equipment Pre-Research and Ministry of Education of China under Grant No. 8091B022148; the 14th Five-year Pre-research Project of Civil Aerospace of China; and the Hubei Excellent Young and Middle-Aged Science and Technology Innovation Team Plan Project under Grant No. T2021031.

Author information

Ling Shi, Yingke Chen and Jiaxuan Lin have contributed equally to this work.

Authors and Affiliations

School of Computer Science, China University of Geosciences, Jincheng Road, Wuhan, 430078, Hubei, China
Ling Shi, Yingke Chen, Jiaxuan Lin, Xiaoyu Chen & Guangming Dai

Contributions

CY was responsible for the code of the game output model and for writing the main paper; SL was responsible for the code of the game input model and for writing the paper; and LJ was responsible for the code of the regression model and for writing the paper. All authors participated in reviewing the paper, revising its details, and establishing its model. Dai Guangming was responsible for responding to review comments.

Corresponding author

Correspondence to Xiaoyu Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shi, L., Chen, Y., Lin, J. et al. A black-box model for predicting difficulty of word puzzle games: a case study of Wordle. Knowl Inf Syst 66, 1729–1750 (2024). https://doi.org/10.1007/s10115-023-01992-6

Received: 19 May 2023
Revised: 31 July 2023
Accepted: 15 September 2023
Published: 14 October 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s10115-023-01992-6

Keywords
