Abstract
We investigate the utility of large pretrained language models (PLMs) for automatically generating educational assessment questions. While PLMs have shown increasing promise across a wide range of natural language applications, including question generation, they can produce unreliable and undesirable content. For high-stakes applications such as educational assessments, it is critical not only that the generated content be of high quality but also that it relate to the specific material being assessed. In this paper, we investigate the impact of various PLM prompting strategies on the quality of generated questions. We design a series of generation scenarios and evaluate the resulting questions via both automatic metrics and manual examination. Through this empirical evaluation, we identify the prompting strategy most likely to yield high-quality questions. Finally, we demonstrate the educational utility of questions generated with the best-performing strategy: when presented alongside human-authored questions, a subject matter expert, despite their expertise, could not reliably distinguish the generated questions from the human-authored ones.
Z. Wang and J. Valdez contributed equally.
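To make the prompting setup concrete, the sketch below shows one way few-shot prompting for question generation can be implemented with the Hugging Face `transformers` text-generation pipeline. The model name, exemplar passages, and decoding parameters here are illustrative assumptions for exposition, not the configuration evaluated in the paper.

```python
# A minimal sketch of few-shot prompting for educational question
# generation. The model, exemplars, and decoding settings are
# illustrative assumptions, not the paper's actual setup.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2-large")

# Few-shot prompt: each exemplar pairs a course passage with a
# human-authored assessment question; the target passage comes last.
prompt = (
    "Passage: Mitochondria produce ATP through cellular respiration.\n"
    "Question: Which organelle is responsible for producing ATP?\n\n"
    "Passage: Enzymes lower the activation energy of chemical reactions.\n"
    "Question: How do enzymes affect the activation energy of a reaction?\n\n"
    "Passage: Osmosis is the diffusion of water across a semipermeable membrane.\n"
    "Question:"
)

outputs = generator(
    prompt,
    max_new_tokens=30,
    num_return_sequences=3,   # sample several candidates for later filtering
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    return_full_text=False,   # keep only the generated continuation
)

for o in outputs:
    # Take the first non-empty line of each continuation as a candidate question.
    lines = o["generated_text"].strip().splitlines()
    if lines:
        print(lines[0].strip())
```

In practice, candidates sampled this way would then be screened with automatic quality metrics and human review, mirroring the evaluation pipeline described in the abstract.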
Acknowledgements
This work was supported by NSF grants 1842378, 1917713, and 2118706, ONR grant N0014-20-1-2534, AFOSR grant FA9550-18-1-0478, and a Vannevar Bush Faculty Fellowship (ONR grant N00014-18-1-2047). We thank Prof. Sandra Adams (Excelsior College), Prof. Tyler Rust (California State University), and Prof. Julie Dinh (Baruch College, CUNY) for contributing their subject matter and instructional expertise. We also thank the anonymous reviewers for their thoughtful feedback on the manuscript.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, Z., Valdez, J., Basu Mallick, D., Baraniuk, R.G. (2022). Towards Human-Like Educational Question Generation with Large Language Models. In: Rodrigo, M.M., Matsuda, N., Cristea, A.I., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2022. Lecture Notes in Computer Science, vol 13355. Springer, Cham. https://doi.org/10.1007/978-3-031-11644-5_13