Abstract
This study investigates biases in large language models (LLMs) applied to narrative tasks, specifically game story generation and story ending classification. We use popular LLMs, GPT-3.5, GPT-4, and Llama 2, to generate game stories and to classify their endings into three categories: positive, negative, and neutral. Our analysis reveals a notable bias towards positive-ending stories in the LLMs examined. Moreover, we observe that GPT-4 and Llama 2 tend to classify stories into categories they were not instructed to use, underscoring the importance of carefully designing downstream systems that consume LLM-generated outputs. These findings lay the groundwork for systems that incorporate LLMs in game story generation and classification, and they emphasize the need to remain vigilant about biases and system performance. By acknowledging and correcting these biases, we can build fairer and more accurate applications of LLMs in narrative-based tasks.
Notes
1. Converting a raw text string into a key-value object in memory.
2. As the temperature increases, the output from the model becomes more stochastic. The valid range for ChatGPT is 0 to 2, where 1 is the default value.
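The two notes above describe plumbing the experiment depends on: converting a model's raw text reply into a key-value object in memory, and the temperature parameter's 0 to 2 range. A minimal Python sketch of both (the function names, the brace-extraction heuristic, and the `None` fallback are illustrative assumptions, not details from the paper):

```python
import json


def build_request(prompt: str, temperature: float = 1.0) -> dict:
    """Assemble ChatGPT-style request parameters.

    Temperature is clamped to the API's valid range of 0-2 (1 is the
    default; higher values make the output more stochastic).
    """
    return {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": max(0.0, min(2.0, temperature)),
    }


def parse_reply(raw: str):
    """Convert a raw text string into a key-value object in memory.

    LLM replies often wrap JSON in extra prose, so extract the outermost
    braces before parsing; returns None when no valid object is found.
    """
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
```

For example, `parse_reply('Sure! {"ending": "positive"}')` yields `{"ending": "positive"}`, while a reply containing no JSON object yields `None`, which a downstream classifier would need to handle explicitly.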
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Taveekitworachai, P. et al. (2023). What Is Waiting for Us at the End? Inherent Biases of Game Story Endings in Large Language Models. In: Holloway-Attaway, L., Murray, J.T. (eds) Interactive Storytelling. ICIDS 2023. Lecture Notes in Computer Science, vol 14384. Springer, Cham. https://doi.org/10.1007/978-3-031-47658-7_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47657-0
Online ISBN: 978-3-031-47658-7
eBook Packages: Computer Science; Computer Science (R0)