Abstract
The popular word-guessing game Wordle has attracted widespread attention since its release in late 2021. Much of that attention has gone to finding the optimal solving strategy. In contrast, this article proposes a black-box prediction model that accurately predicts the difficulty level of words in the game and thereby uncovers the underlying patterns in the game data. The model achieves high accuracy on new datasets and remains stable on similar tasks. It consists of two parts: a feature extraction model for the game's input content (the target words) and a rule extraction model for the game's output content (the players' results). From the input content, it extracts word attributes, including word frequency, letter frequency, part of speech, the number of repeated letters, and a word meaning score. From the output content, it uses the CRITIC method to reduce the seven proportions of players finishing in different numbers of tries into two indices. Finally, it builds a multiple regression model based on gradient boosting decision trees, which reaches 95% accuracy in predicting the difficulty level of new words. The model can offer valuable insights to game designers and developers, and the research provides a new method for predicting and understanding user behavior in online games, contributing to the broader field of data science. More generally, integrating data-driven methods into the gaming industry opens new ways to understand player interactions and to improve game development strategies.
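The CRITIC step named in the abstract can be illustrated with a short sketch. CRITIC (Diakoulaki, Mavrotas and Papayannakis 1995) gives each criterion j an information content C_j = σ_j Σ_k (1 − r_jk): its contrast (the standard deviation of the min-max normalized column) times its conflict with the other criteria (one minus the Pearson correlation). These are then normalized to weights w_j = C_j / Σ_k C_k. The Python below is a minimal, standard-library-only sketch of that textbook form, not the authors' code. It assumes that rows are words, that columns are the seven try-distribution shares, and that the benefit/cost direction flags are our own choice. The abstract does not say how the seven shares are grouped into two indices, so the sketch only produces a single weighted composite score. All function names are hypothetical.

```python
"""Sketch of the CRITIC weighting method (Diakoulaki et al., 1995).

Assumption: rows are words (puzzle days) and columns are criteria such as the
share of players who solved in 1..6 tries or failed. How the paper groups
these into its two indices is not stated in the abstract and is not modeled.
"""
from math import sqrt
from typing import List, Sequence


def normalize_columns(rows: Sequence[Sequence[float]],
                      benefit: Sequence[bool]) -> List[List[float]]:
    """Min-max scale each column to [0, 1]; flip columns marked as cost."""
    columns = [list(col) for col in zip(*rows)]
    if len(columns) != len(benefit):
        raise ValueError("need one benefit flag per column")
    out = []
    for col, is_benefit in zip(columns, benefit):
        lo, hi = min(col), max(col)
        if hi == lo:
            out.append([0.0] * len(col))  # constant column: no contrast
        elif is_benefit:
            out.append([(x - lo) / (hi - lo) for x in col])
        else:
            out.append([(hi - x) / (hi - lo) for x in col])
    return out  # column-major


def pstdev(values: Sequence[float]) -> float:
    """Population standard deviation. Using the sample version would rescale
    every column by the same factor and leave the final weights unchanged."""
    mean = sum(values) / len(values)
    return sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; treated as 0.0 when either column is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def critic_weights(rows, benefit) -> List[float]:
    """C_j = sigma_j * sum_k (1 - r_jk); w_j = C_j / sum_k C_k."""
    cols = normalize_columns(rows, benefit)
    info = [pstdev(cj) * sum(1.0 - pearson(cj, ck) for ck in cols) for cj in cols]
    total = sum(info)
    if total == 0:
        raise ValueError("no contrast or conflict in the data; weights undefined")
    return [c / total for c in info]


def composite_scores(rows, benefit, weights) -> List[float]:
    """Weighted sum of the normalized criteria for each row."""
    cols = normalize_columns(rows, benefit)
    return [sum(w * col[i] for w, col in zip(weights, cols)) for i in range(len(rows))]


if __name__ == "__main__":
    # Made-up toy numbers, NOT real Wordle statistics: % of players finishing
    # in 1, 2, 3, 4, 5, 6 tries, or failing (X), for four hypothetical words.
    rows = [
        [1, 5, 23, 39, 24, 7, 1],
        [0, 2, 16, 38, 30, 12, 2],
        [0, 4, 21, 35, 26, 12, 2],
        [1, 8, 29, 35, 20, 6, 1],
    ]
    # Assumed directions: fewer solves within 4 tries and more solves in
    # 5-6 tries or failures both count toward "harder".
    benefit = [False, False, False, False, True, True, True]
    w = critic_weights(rows, benefit)
    print([round(x, 3) for x in w])
    print([round(s, 3) for s in composite_scores(rows, benefit, w)])
```

Min-max normalization ties each score to the range of the data it was computed on, so adding new words can shift earlier scores. A fixed reference range would avoid this if stable indices are needed.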
Acknowledgements
This work is supported by the National Natural Science Foundation of China under Grant Nos. 42271391, 62006214 and 42101439, the Joint Funds of Equipment Pre-Research and Ministry of Education of China under Grant No. 8091B022148, the 14th Five-Year Pre-Research Project of Civil Aerospace of China, and the Hubei Excellent Young and Middle-Aged Science and Technology Innovation Team Plan Project under Grant No. T2021031.
Author information
Contributions
CY was responsible for the code of the game output model and for writing the main paper; SL was responsible for the code of the game input model and for writing the paper; and LJ was responsible for the code of the regression model and for writing the paper. All authors participated in reviewing the paper, revising its details, and building the model. Dai Guangming was responsible for responding to review comments.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shi, L., Chen, Y., Lin, J. et al. A black-box model for predicting difficulty of word puzzle games: a case study of Wordle. Knowl Inf Syst 66, 1729–1750 (2024). https://doi.org/10.1007/s10115-023-01992-6