Abstract
Large language models (LLMs) such as ChatGPT present immense opportunities, but without proper training for users (and potentially oversight), they also carry risks of misuse. We argue that current approaches focusing predominantly on transparency and explainability fall short in addressing the diverse needs and concerns of various user groups. We highlight the limitations of existing methodologies and propose a framework anchored in user-centric guidelines. In particular, we argue that LLM users should be given guidelines on which tasks LLMs can do well and which they cannot, which tasks require further guidance or refinement by the user, and context-specific heuristics. We further argue that (some) users should be taught to formulate and refine adequate prompts, be provided with good procedures for prompt iteration, and be taught efficient ways to verify outputs. We suggest that shifting the focus away from the technology itself and toward its use within contextualized sociotechnical systems can help resolve many of the issues related to LLMs. We further emphasize the role of real-world case studies in shaping these guidelines, ensuring they are grounded in practical, applicable strategies. As with any technology, risks of misuse can be managed through education, regulation, and responsible development.
Change history
10 March 2025
A Correction to this paper has been published: https://doi.org/10.1007/s10676-025-09824-7
Notes
For example, Harvard, the University of California, Berkeley, and the University of Missouri have spearheaded efforts to codify guidelines on responsible and ethical use of LLMs within the university context. See https://provost.harvard.edu/guidelines-using-chatgpt-and-other-generative-ai-tools-harvard, https://ethics.berkeley.edu/privacy/appropriate-use-chatgpt-and-similar-ai-tools, https://oai.missouri.edu/chatgpt-artificial-intelligence-and-academic-integrity/.
Importantly, our concern is with mitigation of unintentional or possibly negligent misuse stemming from user ignorance regarding the limitations of these systems. Willful and malicious misuse will still obviously present a problem, but mitigation strategies for this will need to be crafted along very different lines, in keeping with the different nature of such misuses. Exploration of this is beyond the scope of the current article.
See Wood (2024) for further exploration of challenges in using XAI to improve effective and responsible use of AI-enabled systems.
E.g., Bowman (2023) and Zhao et al. (2023). See also the discussion presented in layman’s terms at https://www.linkedin.com/pulse/when-llm-experts-say-we-dont-know-how-pallav-sharda-2tpyc/.
More broadly, emphasis on XAI, assuming it can be fully achieved, may undermine more institutional and human-centric approaches. See Wood (2024).
Some might argue that “rules of thumb” or heuristics for guiding LLM use are not amenable to empirical testing or verification. What we have in mind, however, is a general ability to empirically check whether guidelines improve use of LLMs (in terms of users accomplishing the tasks they are employing LLMs for), and in this respect, it should be possible to empirically examine whether guidelines are indeed improving use, detracting from it, or having a negligible impact. The precise impact of various guidelines, and their implementation, would further provide useful running data for the improvement of user interfaces with an eye to ever more effective and responsible LLM use. See also Barman et al. (2024a, 2024b).
For candidate approaches in this direction, see, e.g., Wang et al. (2024a, 2024b) and Watkins (2023), as well as https://www.dpc.sa.gov.au/__data/assets/pdf_file/0007/936745/Guideline-13.1-Use-of-Large-Language-Model-AI-Tools-Utilities.pdf and https://www.isc.upenn.edu/security/LLM-guide. See Johri et al. (2023) for more meta-level guidelines embedded within a specific context, i.e., LLM use in the field of medicine.
References
Abid, A., Farooqi, M., & Zou, J. (2021). Large language models associate Muslims with violence. Nature Machine Intelligence, 3(6), 461–463.
Agarwal, V., Thureja, N., Garg, M. K., Dharmavaram, S., & Kumar, D. (2024). “Which LLM should I use?”: Evaluating LLMs for tasks performed by undergraduate computer science students in India. Preprint retrieved from arXiv:2402.01687.
Arrieta, A. B., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., ... & Herrera, F. (2020). Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82–115.
Augenstein, I., Baldwin, T., Cha, M., Chakraborty, T., Ciampaglia, G. L., Corney, D., ... & Zagni, G. (2023). Factuality challenges in the era of large language models. Preprint retrieved from arXiv:2310.05189.
Barman, D., Guo, Z., & Conlan, O. (2024a). The dark side of language models: Exploring the potential of LLMs in multimedia disinformation generation and dissemination. Machine Learning with Applications, 16, 100545.
Barman, K. G., Caron, S., Claassen, T., & De Regt, H. (2024b). Towards a benchmark for scientific understanding in humans and machines. Minds and Machines, 34(1), 1–16.
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT ‘21), 610–623. https://doi.org/10.1145/3442188.3445922
Bills, S., Cammarata, N., Mossing, D., Tillman, H., Gao, L., Goh, G., Sutskever, I., Leike, J., Wu, J., & Saunders, W. (2023). Language models can explain neurons in language models. https://openaipublic.blob.core.windows.net/neuron-explainer/paper/index.html
Boge, F. J. (2022). Two dimensions of opacity and the deep learning predicament. Minds and Machines, 32(1), 43–75.
Boiko, D. A., MacKnight, R., & Gomes, G. (2023). Emergent autonomous scientific research capabilities of large language models. Preprint retrieved from https://arxiv.org/abs/2304.05332
Burrell, J. (2016). How the machine ‘thinks’: Understanding opacity in machine learning algorithms. Big Data & Society, 3(1), 2053951715622512.
Buruk, O. (2023). Academic writing with GPT-3.5: Reflections on practices, efficacy and transparency. Preprint retrieved from arXiv:2304.11079.
Chen, C., & Shu, K. (2023). Combating misinformation in the age of LLMs: Opportunities and challenges. Preprint retrieved from arXiv:2311.05656.
Choi, E. (2023). A comprehensive inquiry into the use of ChatGPT: Examining general, educational, and disability-focused perspectives. International Journal of Arts Humanities and Social Sciences. https://doi.org/10.56734/ijahss.v4n11a1
Conmy, A., Mavor-Parker, A. N., Lynch, A., Heimersheim, S., & Garriga-Alonso, A. (2023). Towards automated circuit discovery for mechanistic interpretability. Preprint retrieved from arXiv:2304.14997.
Conmy, A., Mavor-Parker, A., Lynch, A., Heimersheim, S., & Garriga-Alonso, A. (2023b). Towards automated circuit discovery for mechanistic interpretability. Advances in Neural Information Processing Systems, 36, 16318–16352.
de Fine Licht, K. (2023). Integrating large language models into higher education: Guidelines for effective implementation. Computer Sciences & Mathematics Forum, 8(1), 65.
Dergaa, I., Chamari, K., Zmijewski, P., & Ben Saad, H. (2023). From human writing to artificial intelligence generated text: Examining the prospects and potential threats of ChatGPT in academic writing. Biology of Sport, 40(2), 615–622. https://doi.org/10.5114/biolsport.2023.125623
Durán, J. M. (2021). Dissecting scientific explanation in AI (sXAI): A case for medicine and healthcare. Artificial Intelligence, 297, 103498.
Eloundou, T., Manning, S., Mishkin, P., & Rock, D. (2023). GPTs are GPTs: An early look at the labor market impact potential of large language models. Preprint retrieved from arXiv:2303.10130.
Essel, H. B., Vlachopoulos, D., Essuman, A. B., & Amankwa, J. O. (2024). ChatGPT effects on cognitive skills of undergraduate students: Receiving instant responses from AI-based conversational large language models (LLMs). Computers and Education: Artificial Intelligence, 6, 100198.
Extance, A. (2023). ChatGPT has entered the classroom: How LLMs could transform education. Nature, 623(7987), 474–477.
Fan, L., Li, L., Ma, Z., Lee, S., Yu, H., & Hemphill, L. (2023). A bibliometric review of large language models research from 2017 to 2023. Preprint retrieved from https://doi.org/10.48550/arXiv.2304.02020
Fear, K., & Gleber, C. (2023). Shaping the future of older adult care: ChatGPT, advanced AI, and the transformation of clinical practice. JMIR Aging, 6(1), e51776.
Ferrara, E. (2023). Should ChatGPT be biased? Challenges and risks of bias in large language models. Preprint retrieved from arXiv:2304.03738.
Gallegos, I. O., Rossi, R. A., Barrow, J., Tanjim, M. M., Kim, S., Dernoncourt, F., ... & Ahmed, N. K. (2023). Bias and fairness in large language models: A survey. Preprint retrieved from arXiv:2309.00770.
Girotra, K., Meincke, L., Terwiesch, C., & Ulrich, K. T. (2023). Ideas are dimes a dozen: Large language models for idea generation in innovation. Available at SSRN 4526071.
Guo, Y., & Lee, D. (2023). Leveraging ChatGPT for enhancing critical thinking skills. Journal of Chemical Education, 100(12), 4876–4883.
Hadi, M. U., Al-Tashi, Q., Qureshi, R., Shah, A., Muneer, A., Irfan, M., Zafar, A., Shaikh, M. B., Akhtar, N., Wu, J., Mirjalili, S., & Shah, M. (2023). Large language models: A comprehensive survey of its applications, challenges, limitations, and future prospects. Preprint retrieved from https://doi.org/10.36227/techrxiv.23589741.v4
Hadi, M. U., Qureshi, R., Shah, A., Irfan, M., Zafar, A., Shaikh, M. B., ... & Mirjalili, S. (2023). Large language models: A comprehensive survey of its applications, challenges, limitations, and future prospects. Authorea Preprints.
Humphreys, P. (2009). The philosophical novelty of computer simulation methods. Synthese, 169, 615–626.
Inagaki, T., Kato, A., Takahashi, K., Ozaki, H., & Kanda, G. N. (2023). LLMs can generate robotic scripts from goal-oriented instructions in biological laboratory automation. Preprint retrieved from https://doi.org/10.48550/arXiv.2304.10267
Jablonka, K. M., Ai, Q., Al-Feghali, A., Badhwar, S., Bocarsly, J. D., Bran, A. M., Bringuier, S., Brinson, L. C., Choudhary, K., Circi, D., Cox, S., de Jong, W. A., Evans, M. L., Gastellu, N., Genzling, J., Gil, M. V., Gupta, A. K., Hong, Z., Imran, A., ... Blaiszik, B. (2023). 14 examples of how LLMs can transform materials science and chemistry: A reflection on a large language model hackathon. Digital Discovery, 2(5), 1233–1250. https://doi.org/10.1039/d3dd00113j
Johri, S., Jeong, J., Tran, B. A., Schlessinger, D. I., Wongvibulsin, S., Cai, Z. R., ... & Rajpurkar, P. (2023). Guidelines for rigorous evaluation of clinical LLMs for conversational reasoning. medRxiv, 2023–09.
Kasneci, E., Seßler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F., ... & Kasneci, G. (2023). ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences, 103, 102274.
Kim, J. K., Chua, M., Rickard, M., & Lorenzo, A. (2023). ChatGPT and large language model (LLM) chatbots: The current state of acceptability and a proposal for guidelines on utilization in academic medicine. Journal of Pediatric Urology, 19, 598.
Lee, J., Le, T., Chen, J., & Lee, D. (2023). Do language models plagiarize? In Proceedings of the ACM Web Conference 2023 (pp. 3637–3647). ACM. https://doi.org/10.1145/3543507.3583199
Li, Y., Du, M., Song, R., Wang, X., & Wang, Y. (2023). A survey on fairness in large language models. Preprint retrieved from arXiv:2308.10149.
Liao, Q. V., & Vaughan, J. W. (2023). AI transparency in the age of LLMs: A human-centered research roadmap. Preprint retrieved from arXiv:2306.01941.
Lin, Z. (2023). Why and how to embrace AI such as ChatGPT in your academic life. Royal Society Open Science, 10(8), 230658. https://doi.org/10.1098/rsos.230658
Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30.
Meng, K., Bau, D., Andonian, A., & Belinkov, Y. (2022). Locating and editing factual associations in GPT. Advances in Neural Information Processing Systems, 35, 17359–17372.
Mishra, A., Soni, U., Arunkumar, A., Huang, J., Kwon, B. C., & Bryan, C. (2023). PromptAid: Prompt exploration, perturbation, testing and iteration using visual analytics for large language models. Preprint retrieved from arXiv:2304.01964.
Mittelstadt, B., Wachter, S., & Russell, C. (2023). To protect science, we must use LLMs as zero-shot translators. Nature Human Behaviour, 7(11), 1830–1832.
Noy, S., & Zhang, W. (2023). Experimental evidence on the productivity effects of generative artificial intelligence. Available at SSRN 4375283.
OpenAI. (2023). GPT-4 technical report. Preprint retrieved from arXiv:2303.08774.
Pan, Y., Pan, L., Chen, W., Nakov, P., Kan, M.-Y., & Wang, W. Y. (2023). On the risk of misinformation pollution with large language models. Preprint retrieved from https://doi.org/10.48550/arXiv.2305.13661
Qadir, J. (2023). Engineering education in the era of ChatGPT: Promise and pitfalls of generative AI for education. In 2023 IEEE Global Engineering Education Conference (EDUCON). IEEE.
Rakap, S. (2023). Chatting with GPT: Enhancing individualized education program goal development for novice special education teachers. Journal of Special Education Technology. https://doi.org/10.1177/01626434231211295
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). Model-agnostic interpretability of machine learning. Preprint retrieved from arXiv:1606.05386.
Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206–215.
Salinas, A., Shah, P., Huang, Y., McCormack, R., & Morstatter, F. (2023). The unequal opportunities of large language models: Examining demographic biases in job recommendations by ChatGPT and LLaMA. In Proceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (pp. 1–15). ACM.
Schramowski, P., Turan, C., Andersen, N., & Herbert, F. (2022). Large pre-trained language models contain human-like biases of what is right and wrong to do. Nature Machine Intelligence, 4(3), 258–268. https://doi.org/10.1038/s42256-022-00458-8
De Silva, D., Mills, N., El-Ayoubi, M., Manic, M., & Alahakoon, D. (2023). ChatGPT and generative AI guidelines for addressing academic integrity and augmenting pre-existing chatbots. In 2023 IEEE International Conference on Industrial Technology (ICIT) (pp. 1–6). IEEE. https://doi.org/10.1109/ICIT58465.2023.10143123
Sun, Z. (2023). A short survey of viewing large language models in legal aspect. Preprint retrieved from arXiv:2303.09136.
Valentino, M., & Freitas, A. (2022). Scientific explanation and natural language: A unified epistemological-linguistic perspective for explainable AI. Preprint retrieved from arXiv:2205.01809.
Vidgof, M., Bachhofner, S., & Mendling, J. (2023). Large language models for business process management: Opportunities and challenges. Preprint retrieved from https://doi.org/10.48550/arXiv.2304.04309
Wang, J., Ma, W., Sun, P., Zhang, M., & Nie, J. Y. (2024a). Understanding user experience in large language model interactions. Preprint retrieved from arXiv:2401.08329.
Wang, L., Chen, X., Deng, X., Wen, H., You, M., Liu, W., & Li, J. (2024b). Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs. npj Digital Medicine, 7(1), 41.
Watkins, R. (2023). Guidance for researchers and peer-reviewers on the ethical use of Large Language Models (LLMs) in scientific research workflows. AI and Ethics. https://doi.org/10.1007/s43681-023-00294-5
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q. V., & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35, 24824–24837. https://doi.org/10.48550/arXiv.2201.11903
Williams, N., Ivanov, S., & Buhalis, D. (2023). Algorithmic ghost in the research shell: Large language models and academic knowledge creation in management research. Preprint retrieved from https://doi.org/10.48550/arXiv.2303.07304
Wood, N. G. (2024). Explainable AI in the military domain. Ethics and Information Technology, 26(2), 1–13.
Xiao, Z., Yuan, X., Liao, Q. V., Abdelghani, R., & Oudeyer, P.-Y. (2023). Supporting qualitative analysis with large language models: Combining codebook with GPT-3 for deductive coding. In Companion Proceedings of the 28th International Conference on Intelligent User Interfaces (pp. 75–78). ACM. https://doi.org/10.1145/3581754.3584101
Yadav, G. (2023). Scaling evidence-based instructional design expertise through large language models. Preprint retrieved from https://doi.org/10.48550/arXiv.2306.01006
Yan, L., Sha, L., Zhao, L., Li, Y., Martinez-Maldonado, R., Chen, G., Li, X., Jin, Y., & Gašević, D. (2023). Practical and ethical challenges of large language models in education: A systematic literature review. Preprint retrieved from https://doi.org/10.48550/arXiv.2303.13379
Yell, M. M. (2023). Social studies, ChatGPT, and lateral reading. Social Education, 87(3), 138–141.
Zhao, H., Chen, H., Yang, F., Liu, N., Deng, H., Cai, H., ... & Du, M. (2023). Explainability for large language models: A survey. Preprint retrieved from arXiv:2309.01029.
Zolanvari, M., Yang, Z., Khan, K., Jain, R., & Meskin, N. (2021). TRUST XAI: Model-agnostic explanations for AI with a case study on IIoT security. IEEE Internet of Things Journal.
Funding
This work was funded by Fonds Wetenschappelijk Onderzoek (Grant numbers: 1229124N for Kristian González Barman and 1255724N for Pawel Pawlowski) and the Czech Science Foundation (Grant number 24-12638I for Nathan Wood).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no conflicts of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this article was revised: the affiliation and email address of one of the authors were corrected.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Barman, K.G., Wood, N. & Pawlowski, P. Beyond transparency and explainability: on the need for adequate and contextualized user guidelines for LLM use. Ethics Inf Technol 26, 47 (2024). https://doi.org/10.1007/s10676-024-09778-2